Code reference

Handle TDC-structured metadata.

Submodules

report

Generate reports about TDC-structured metadata.

spreadsheet

Non-standard TDC Excel file format for collecting metadata.

Module data

CONCEPTS

Concepts and metadata attributes in the TDC metadata structure.

transport_data.org.metadata.CONCEPTS

Dictionary mapping each concept ID to a (name, description) tuple:

COMMENT – Comment

  Any other information about the metadata values, for instance discrepancies or unclear or missing information.

  Precede comments with initials; append to existing comments to keep chronological order; and include a date (for example, “2024-07-24”) if helpful.

DATAFLOW – Data flow ID

  A unique identifier for the data flow (=data source, data set, etc.).

  We suggest to use IDs like ‘VN001’, where ‘VN’ is the ISO 3166 alpha-2 country code, and ‘001’ is a unique number. The value MUST match the name of the sheet in which it appears.

DATA_DESCR – Data description

  Any information about the data flow that does not fit in other attributes.

  Until or unless other metadata attributes are added to this metadata structure/template, this MAY include:

  - Any conditions on data access, e.g. publicly available, proprietary, fee or subscription required, available on request, etc.
  - Frequency of data updates.
  - Any indication of quality, including third-party references that indicate data quality.

DATA_PROVIDER – Data provider

  Organization or individual that provides the data and any related metadata.

  This can be as general (“IEA”) or specific (organization unit/department, specific person responsible, contact details, etc.) as appropriate.

DIMENSION – Dimensions

  Formally, the “statistical concept used in combination with other statistical concepts to identify a statistical series or individual observations.”

  Record all dimensions of the data, either in a bulleted or numbered list, or separated by semicolons. In parentheses, give some indication of the scope and/or resolution of the data along each dimension. Most data have at least time and space dimensions.

  Example:

  - TIME_PERIOD (annual, 5 years up to 2021)
  - REF_AREA (whole country; VN only)
  - Vehicle type (12 different types: […])
  - Emissions species (CO2 and 4 others)

MEASURE – Measure (‘indicator’)

  Statistical concept for which data are provided in the data flow.

  If the data flow contains data for multiple measures, give each one separated by semicolons. Example: “Number of cars; passengers per vehicle”.

  This SHOULD NOT duplicate the value for ‘UNIT_MEASURE’. Example: “Annual driving distance per vehicle”, not “Kilometres per vehicle”.

METHOD – Methodology

  Any information about methods used by the data provider to collect, process, or prepare the data.

UNIT_MEASURE – Unit of measure

  Unit in which the data values are expressed.

  If ‘MEASURE’ contains 2+ items separated by semicolons, give the respective units in the same way and order. If there are no units, write ‘dimensionless’, ‘1’, or similar.

URL – URL or web address

  Location on the Internet with further information about the data flow.

Concepts and metadata attributes in the TDC metadata structure.
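Because each value in CONCEPTS is a (name, description) tuple, the entries can be unpacked directly when building labels or templates. The following is a minimal sketch using a two-entry excerpt copied from the value shown above; CONCEPTS_EXCERPT and summarize() are hypothetical names for illustration and are not part of the module.

```python
# Two entries copied from the CONCEPTS value shown above. The real dict has
# more entries; this excerpt only keeps the example self-contained.
CONCEPTS_EXCERPT = {
    "METHOD": (
        "Methodology",
        "Any information about methods used by the data provider to collect, "
        "process,\nor prepare the data.",
    ),
    "URL": (
        "URL or web address",
        "Location on the Internet with further information about the data flow.",
    ),
}


def summarize(concepts):
    """Return one 'ID: name' string per concept (hypothetical helper)."""
    return [f"{cid}: {name}" for cid, (name, _description) in concepts.items()]


for line in summarize(CONCEPTS_EXCERPT):
    print(line)
```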

Functions

contains_data_for(mdr, ref_area)

Return True if mdr contains data for ref_area.

dfd_id(mdr)

Return the ID of the dataflow targeted by mdr.

get_cs_common()

Create a shared concept scheme for the concepts referenced by dimensions.

get_msd()

Generate and return the TDC metadata structure definition.

groupby(mds[, key])

Group metadata reports in mds according to a key function.

make_ra(mda_id, value)

Generate a ReportedAttribute for mda_id with the given value.

make_tok(dfd)

Generate a TargetObjectKey that refers to dfd.

map_dims_to_ids(mds)

Return a mapping from unique concept IDs used for dimensions to data flow IDs.

map_values_to_ids(mds, mda_id)

Return a mapping from unique reported attribute values to data flow IDs.

merge_ato(mds)

Extend mds with metadata reports for ADB ATO data flows.

unique_dfd_id(mdr, existing)

Generate a unique data flow definition (DFD) ID for mdr.

transport_data.org.metadata.contains_data_for(mdr: MetadataReport, ref_area: str) → bool

Return True if mdr contains data for ref_area.

True is returned if either of the following holds:

  1. The referenced data flow definition has an ID that starts with ref_area.

  2. The country’s ISO 3166 alpha-2 code, alpha-3 code, official name, or common name appears in the value of the DATA_DESCR metadata attribute.

Parameters:

ref_area (str) – ISO 3166 alpha-2 code for a country. Passed to pycountry.countries.lookup().
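The two checks described above can be sketched with plain strings. This is an illustration only, not the module's implementation: Country is a hypothetical hand-written stand-in for the record that pycountry.countries.lookup() would return, the function takes the data flow ID and the DATA_DESCR value directly instead of a MetadataReport, and plain substring matching is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Country:
    """Hypothetical stand-in for a country record with the four identifiers."""

    alpha_2: str
    alpha_3: str
    name: str
    official_name: str


def contains_data_for_sketch(dfd_id: str, data_descr: str, country: Country) -> bool:
    """Apply the two documented checks to plain strings (simplified)."""
    # Check 1: the data flow ID starts with the alpha-2 code, e.g. "VN001".
    if dfd_id.startswith(country.alpha_2):
        return True
    # Check 2: any identifier of the country appears in the description.
    # Substring matching is an assumption; the real function may be stricter.
    identifiers = (
        country.alpha_2,
        country.alpha_3,
        country.name,
        country.official_name,
    )
    return any(identifier in data_descr for identifier in identifiers)


# Hand-written example record, not fetched from pycountry.
VN = Country("VN", "VNM", "Viet Nam", "Socialist Republic of Viet Nam")
print(contains_data_for_sketch("VN001", "", VN))
print(contains_data_for_sketch("XX001", "Covers Viet Nam only", VN))
```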

transport_data.org.metadata.dfd_id(mdr: MetadataReport) → str

Return the ID of the dataflow targeted by mdr.

transport_data.org.metadata.get_cs_common() → ConceptScheme

Create a shared concept scheme for the concepts referenced by dimensions.

Concepts in this scheme have an annotation tdc-aka, which is a list of alternate IDs recognized for the concept.
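One way to picture a list of alternate IDs is as an alias table that maps several spellings onto one canonical concept ID. The sketch below uses a plain dict rather than an SDMX annotation. The alias values and the resolve() helper are hypothetical, chosen only for illustration; how the tdc-aka annotation is actually stored and consumed is not shown in this reference.

```python
# Hypothetical alias table: canonical concept ID -> alternate IDs.
# The alternate IDs are invented for this example.
AKA = {
    "TIME_PERIOD": ["TIME", "YEAR"],
    "REF_AREA": ["AREA", "COUNTRY"],
}


def resolve(concept_id: str, aka: dict[str, list[str]]) -> str:
    """Return the canonical ID for concept_id, or concept_id if unrecognized."""
    for canonical, alternates in aka.items():
        if concept_id == canonical or concept_id in alternates:
            return canonical
    return concept_id


print(resolve("YEAR", AKA))
print(resolve("Vehicle type", AKA))
```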

transport_data.org.metadata.get_msd() → MetadataStructureDefinition

Generate and return the TDC metadata structure definition.

transport_data.org.metadata.groupby(mds: MetadataSet, key=typing.Callable[[ForwardRef('v21.MetadataReport')], typing.Hashable]) → dict[Hashable, list[MetadataReport]]

Group metadata reports in mds according to a key function.

Similar to itertools.groupby().
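Judging by the return annotation, the result is a dict of lists rather than the lazy iterator of consecutive runs that itertools.groupby() yields. The grouping idea can be sketched as follows, using plain strings in place of metadata reports; groupby_sketch() is a hypothetical name and this is not the module's implementation.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


def groupby_sketch(
    items: Iterable[T], key: Callable[[T], Hashable]
) -> dict[Hashable, list[T]]:
    """Collect items into lists keyed by key(item), without prior sorting."""
    groups: defaultdict[Hashable, list[T]] = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


# Group example data flow IDs by their two-letter prefix.
ids = ["VN001", "TH001", "VN002"]
print(groupby_sketch(ids, key=lambda s: s[:2]))
```

Unlike itertools.groupby(), this approach does not require the input to be sorted by the key, because every item with the same key ends up in the same list.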

transport_data.org.metadata.make_ra(mda_id: str, value: Any) → OtherNonEnumeratedAttributeValue

Generate a ReportedAttribute for mda_id with the given value.

transport_data.org.metadata.make_tok(dfd: BaseDataflow) → TargetObjectKey

Generate a TargetObjectKey that refers to dfd.

transport_data.org.metadata.map_dims_to_ids(mds: MetadataSet) → dict[str, set[str]]

Return a mapping from unique concept IDs used for dimensions to data flow IDs.

transport_data.org.metadata.map_values_to_ids(mds: MetadataSet, mda_id: str) → dict[str, set[str]]

Return a mapping from unique reported attribute values to data flow IDs.
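The shape of the returned mapping (value to set of data flow IDs) can be illustrated with plain dicts. In this sketch, reports is a hypothetical stand-in that maps each data flow ID to its reported attribute values; the real function reads these from a MetadataSet, and the example data are invented.

```python
from collections import defaultdict


def map_values_to_ids_sketch(
    reports: dict[str, dict[str, str]], mda_id: str
) -> dict[str, set[str]]:
    """Invert reports: each distinct value of mda_id -> IDs that report it."""
    result: defaultdict[str, set[str]] = defaultdict(set)
    for dfd_id, attributes in reports.items():
        if mda_id in attributes:
            result[attributes[mda_id]].add(dfd_id)
    return dict(result)


# Invented example data: two flows share a provider, one reports none.
reports = {
    "VN001": {"DATA_PROVIDER": "IEA"},
    "VN002": {"DATA_PROVIDER": "IEA"},
    "TH001": {"DATA_PROVIDER": "ADB"},
    "TH002": {},
}
print(map_values_to_ids_sketch(reports, "DATA_PROVIDER"))
```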

transport_data.org.metadata.merge_ato(mds: MetadataSet) → None

Extend mds with metadata reports for ADB ATO data flows.

transport_data.org.metadata.unique_dfd_id(mdr: MetadataReport, existing: set[str]) → str

Generate a unique data flow definition (DFD) ID for mdr.