Asian Development Bank#
This module handles the Asian Transport Outlook (ATO) source maintained by the Asian Development Bank (ADB, initially) and Asian Infrastructure Investment Bank (AIIB, more recently), specifically the ATO National Database.
In particular, it converts data from the ATO native Excel file format—both the 2022-10-07 and 2024-05-20 formats—to SDMX and extracts metadata.
ATO National Database format#
The ATO native Excel file format is characterized by the following. There is an ATO National Database User Guide that contains some of the information below, but does not describe the file format.
Data flow IDs like
TAS-PAT-001(1)
, wherein:TAS
is the ID of a ‘category’ code with a corresponding name like “Transport Activity & Services”. Individual files (‘workbooks’) contain data for flows within one category.PAT
is the ID of a ‘subcategory’ code with a corresponding name like “Passenger Activity Transit”.001(1)
is the ID of an ‘indicator’ code with a corresponding name like “Passengers Kilometer Travel - Railways”.All data flows with the same initial part like
TAS-PAT-001
contain data for the same measure. The final part(1)
indicates data from alternate sources for the same measure.
Files contain a “TOC” sheet and further individual sheets. See
read_sheet()
, which reads these sheets, for a detailed description of the apparent format.The files are periodically updated.
The update schedule is not fixed in advance.
Previous versions of the files do not appear to be available.
The file metadata contains a “Created by:” field with information like “2022-10-07, 11:41:56, openpyxl”. At least two different dates have been observed:
2022-10-07
2024-05-20.
Code reference#
Asian Development Bank (ADB) provider.
Submodules
Module data
List of all "ECONOMY" codes appearing in processed data. |
|
List of all measures (indicators) appearing in processed data. |
|
Mapping from short codes for ATO data categories to file names. |
- transport_data.adb.CL_ECONOMY = <Codelist ECONOMY (0 items): Asian Transport Outlook subject economy>[source]#
List of all “ECONOMY” codes appearing in processed data.
- transport_data.adb.CS_MEASURE = <ConceptScheme MEASURE (0 items): Asian Transport Outlook measures (indicators)>[source]#
List of all measures (indicators) appearing in processed data.
Todo
Validate against the master list of indicators; or read from that file and validate IDs appearing in data files.
- transport_data.adb.FILES = {'ACC': ('ATO Workbook (ACCESS & CONNECTIVITY (ACC)).xlsx', 'sha256:21c3b7e662d932f5cc61c22489acb3cf0e8a70200abc2372c7fe212602903fd7'), 'APH': ('ATO Workbook (AIR POLLUTION & HEALTH (APH)).xlsx', 'sha256:b06c102a1184ed83d77673146599e11c2b6c81d784ebcee6b46f4d43713e899a'), 'CLC': ('ATO Workbook (CLIMATE CHANGE (CLC)).xlsx', 'sha256:7fe2d4bb656508bf3194406d25ed04c1ac403daaaf1ada74847a2c330efcfb2a'), 'INF': ('ATO Workbook (INFRASTRUCTURE (INF)).xlsx', 'sha256:30b9d05330838809ff0d53b2e61ade935eccdb4b73c0509fd1fc49b405699ac3'), 'MIS': ('ATO Workbook (MISCELLANEOUS (MIS)).xlsx', 'sha256:2ef3cdc5e6363cdca1f671bbf12bf0463fe8a4210cb49b3d32ebd2440c6fe6df'), 'POL': ('ATO Workbook (TRANSPORT POLICY (POL)).xlsx', 'sha256:fbf23b012590b631239654d255d23ccb70fa717b466be8343a5b0f1e8b4ce720'), 'RSA': ('ATO Workbook (ROAD SAFETY (RSA)).xlsx', 'sha256:a4285d129c2739c8660a07a5d1c9902ec21c3cd4d13a2a9dfe9d49daca2c0dd5'), 'SEC': ('ATO Workbook (SOCIO-ECONOMIC (SEC)).xlsx', 'sha256:b5d2ee5d07b5554ef262436ef898afd976fbb4956bd7cb850829cb7391d207c0'), 'TAS': ('ATO Workbook (TRANSPORT ACTIVITY & SERVICES (TAS)).xlsx', 'sha256:628d4e9774f84d30d80706e982e4ddf7187d77f8676e69765c307701da1caf77')}[source]#
Mapping from short codes for ATO data categories to file names.
Functions
|
|
|
Convert df and aa from |
|
Convert the attributes of ATO ds to 1 or more |
|
|
|
|
|
Format the ATO “Source” data attribute as TDC |
|
Prepare an empty data set and associated structures. |
|
|
|
Read a single sheet. |
|
Validate codes for the "ECONOMY" dimension of df against |
- transport_data.adb.convert_sheet(df: DataFrame, aa: AnnotableArtefact)[source]#
Convert df and aa from
read_sheet()
into SDMX data structures.
- transport_data.adb.dataset_to_metadata_reports(ds: DataSet, msd: MetadataStructureDefinition) Iterable[MetadataReport] [source]#
Convert the attributes of ATO ds to 1 or more
MetadataReport
.The metadata reports conform to the TDC metadata structure (
org.metadata.get_msd()
).If ds contains per-series values for attributes named “Source”, “Source (2024-11)”, or similar, then additional metadata reports are generated, one for each series (=GEO, or ‘economy’) and each distinct upstream source indicated by these attribute values.
- transport_data.adb.format_data_provider(value: str) str [source]#
Format the ATO “Source” data attribute as TDC
DATA_PROVIDER
metadata.This makes more explicit how the ATO has handled upstream data.
- transport_data.adb.prepare(aa: AnnotableArtefact) Tuple[DataSet, Callable] [source]#
Prepare an empty data set and associated structures.
- transport_data.adb.read_sheet(ef: ExcelFile, sheet_name: str) Tuple[DataFrame, AnnotableArtefact] [source]#
Read a single sheet.
This function handles the particular layout of sheets in files like those listed in
FILES
. These combine data and metadata.Row 1 is a title row.
Cell range A2:B10 contain a set of metadata fields, with the field name in column A and the value in column B.
Rows 11:13 contain no data or metadata; only a link back to a table of contents sheet.
Row 14 contains a label “Series” centre-spanned across
Row 15 contains column labels, described below.
Row 16 and onwards contain data, followed by two blank rows, and two rows with attribution/acknowledgements.
Columns labeled (i.e. in row 15) “Economy Code” and “Economy Name” contain codes and names, respectively, for the geographic units.
Columns with numeric labels describe time periods, specifically years, that are part of observation keys.
Some sheets have additional columns with non-numeric labels like “Remarks”, “Source (2022-04)”, etc.; these give annotations applying to the observations on the same row (i.e. for a single “Economy Code” and 1 or more time periods).
Note
Sheets in the
POL
category have a different format.
- transport_data.adb.validate_economy(df: DataFrame) DataFrame [source]#
Validate codes for the “ECONOMY” dimension of df against
CL_ECONOMY
.Every unique pair of (Economy Code, Economy Name) is converted to a
Code
.These are added to
CL_ECONOMY
. If a Code with the same ID already exists, it is checked for an exact match (name, description, etc.)The “Economy Code” column of df is renamed “ECONOMY”, and contains only values from
CL_ECONOMY
. The “Economy Name” column is dropped.