Asian Development Bank#

This module handles the Asian Transport Outlook (ATO) source maintained by the Asian Development Bank (ADB, initially) and Asian Infrastructure Investment Bank (AIIB, more recently), specifically the ATO National Database.

In particular, it converts data from the ATO native Excel file format—both the 2022-10-07 and 2024-05-20 formats—to SDMX and extracts metadata.

ATO National Database format#

The ATO native Excel file format is characterized by the following. There is an ATO National Database User Guide that contains some of the information below, but does not describe the file format.

  • Data flow IDs like TAS-PAT-001(1), wherein:

    • TAS is the ID of a ‘category’ code with a corresponding name like “Transport Activity & Services”. Individual files (‘workbooks’) contain data for flows within one category.

    • PAT is the ID of a ‘subcategory’ code with a corresponding name like “Passenger Activity Transit”.

    • 001(1) is the ID of an ‘indicator’ code with a corresponding name like “Passengers Kilometer Travel - Railways”.

    • All data flows with the same initial part like TAS-PAT-001 contain data for the same measure. The final part (1) indicates data from alternate sources for the same measure.

  • Files contain a “TOC” sheet and further individual sheets. See read_sheet(), which reads these sheets, for a detailed description of the apparent format.

  • The files are periodically updated.

  • The update schedule is not fixed in advance.

  • Previous versions of the files do not appear to be available.

  • The file metadata contains a “Created by:” field with information like “2022-10-07, 11:41:56, openpyxl”. At least two different dates have been observed:

    • 2022-10-07

    • 2024-05-20.

Code reference#

Asian Development Bank (ADB) provider.

Submodules

cli

CLI for adb.

Module data

CL_ECONOMY

List of all "ECONOMY" codes appearing in processed data.

CS_MEASURE

List of all measures (indicators) appearing in processed data.

FILES

Mapping from short codes for ATO data categories to file names.

transport_data.adb.CL_ECONOMY = <Codelist ECONOMY (0 items): Asian Transport Outlook subject economy>[source]#

List of all “ECONOMY” codes appearing in processed data.

transport_data.adb.CS_MEASURE = <ConceptScheme MEASURE (0 items): Asian Transport Outlook measures (indicators)>[source]#

List of all measures (indicators) appearing in processed data.

Todo

Validate against the master list of indicators; or read from that file and validate IDs appearing in data files.

transport_data.adb.FILES = {'ACC': ('ATO Workbook (ACCESS & CONNECTIVITY (ACC)).xlsx', 'sha256:21c3b7e662d932f5cc61c22489acb3cf0e8a70200abc2372c7fe212602903fd7'), 'APH': ('ATO Workbook (AIR POLLUTION & HEALTH (APH)).xlsx', 'sha256:b06c102a1184ed83d77673146599e11c2b6c81d784ebcee6b46f4d43713e899a'), 'CLC': ('ATO Workbook (CLIMATE CHANGE (CLC)).xlsx', 'sha256:7fe2d4bb656508bf3194406d25ed04c1ac403daaaf1ada74847a2c330efcfb2a'), 'INF': ('ATO Workbook (INFRASTRUCTURE (INF)).xlsx', 'sha256:30b9d05330838809ff0d53b2e61ade935eccdb4b73c0509fd1fc49b405699ac3'), 'MIS': ('ATO Workbook (MISCELLANEOUS (MIS)).xlsx', 'sha256:2ef3cdc5e6363cdca1f671bbf12bf0463fe8a4210cb49b3d32ebd2440c6fe6df'), 'POL': ('ATO Workbook (TRANSPORT POLICY (POL)).xlsx', 'sha256:fbf23b012590b631239654d255d23ccb70fa717b466be8343a5b0f1e8b4ce720'), 'RSA': ('ATO Workbook (ROAD SAFETY (RSA)).xlsx', 'sha256:a4285d129c2739c8660a07a5d1c9902ec21c3cd4d13a2a9dfe9d49daca2c0dd5'), 'SEC': ('ATO Workbook (SOCIO-ECONOMIC (SEC)).xlsx', 'sha256:b5d2ee5d07b5554ef262436ef898afd976fbb4956bd7cb850829cb7391d207c0'), 'TAS': ('ATO Workbook (TRANSPORT ACTIVITY & SERVICES (TAS)).xlsx', 'sha256:628d4e9774f84d30d80706e982e4ddf7187d77f8676e69765c307701da1caf77')}[source]#

Mapping from short codes for ATO data categories to file names.

Functions

convert(part)

convert_sheet(df, aa)

Convert df and aa from read_sheet() into SDMX data structures.

dataset_to_metadata_reports(ds, msd)

Convert the attributes of ATO ds to 1 or more MetadataReport.

expand(fname)

fetch(*parts[, dry_run])

format_data_provider(value)

Format the ATO “Source” data attribute as TDC DATA_PROVIDER metadata.

get_agencies()

prepare(aa)

Prepare an empty data set and associated structures.

provides()

read_sheet(ef, sheet_name)

Read a single sheet.

validate_economy(df)

Validate codes for the "ECONOMY" dimension of df against CL_ECONOMY.

transport_data.adb.convert(part: str)[source]#
transport_data.adb.convert_sheet(df: DataFrame, aa: AnnotableArtefact)[source]#

Convert df and aa from read_sheet() into SDMX data structures.

transport_data.adb.dataset_to_metadata_reports(ds: DataSet, msd: MetadataStructureDefinition) Iterable[MetadataReport][source]#

Convert the attributes of ATO ds to 1 or more MetadataReport.

The metadata reports conform to the TDC metadata structure ( org.metadata.get_msd()).

If ds contains per-series values for attributes named “Source”, “Source (2024-11)”, or similar, then additional metadata reports are generated, one for each series (=GEO, or ‘economy’) and each distinct upstream source indicated by these attribute values.

transport_data.adb.expand(fname: str) str[source]#
transport_data.adb.fetch(*parts, dry_run: bool = False)[source]#
transport_data.adb.format_data_provider(value: str) str[source]#

Format the ATO “Source” data attribute as TDC DATA_PROVIDER metadata.

This makes more explicit how the ATO has handled upstream data.

transport_data.adb.get_agencies()[source]#
transport_data.adb.prepare(aa: AnnotableArtefact) Tuple[DataSet, Callable][source]#

Prepare an empty data set and associated structures.

transport_data.adb.provides()[source]#
transport_data.adb.read_sheet(ef: ExcelFile, sheet_name: str) Tuple[DataFrame, AnnotableArtefact][source]#

Read a single sheet.

This function handles the particular layout of sheets in files like those listed in FILES. These combine data and metadata.

  • Row 1 is a title row.

  • Cell range A2:B10 contain a set of metadata fields, with the field name in column A and the value in column B.

  • Rows 11:13 contain no data or metadata; only a link back to a table of contents sheet.

  • Row 14 contains a label “Series” centre-spanned across

  • Row 15 contains column labels, described below.

  • Row 16 and onwards contain data, followed by two blank rows, and two rows with attribution/acknowledgements.

  • Columns labeled (i.e. in row 15) “Economy Code” and “Economy Name” contain codes and names, respectively, for the geographic units.

  • Columns with numeric labels describe time periods, specifically years, that are part of observation keys.

  • Some sheets have additional columns with non-numeric labels like “Remarks”, “Source (2022-04)”, etc.; these give annotations applying to the observations on the same row (i.e. for a single “Economy Code” and 1 or more time periods).

Note

Sheets in the POL category have a different format.

transport_data.adb.validate_economy(df: DataFrame) DataFrame[source]#

Validate codes for the “ECONOMY” dimension of df against CL_ECONOMY.

  • Every unique pair of (Economy Code, Economy Name) is converted to a Code.

  • These are added to CL_ECONOMY. If a Code with the same ID already exists, it is checked for an exact match (name, description, etc.)

  • The “Economy Code” column of df is renamed “ECONOMY”, and contains only values from CL_ECONOMY. The “Economy Name” column is dropped.