Development

The transport_data code is:

  • Under development. This means its features are not stable and may change at any time. For example, portions of the code may be migrated to other repositories and packages without advance notice or deprecation.

  • Unofficial. The TDCI handles data from data providers such as ato, jrc, and others. However, the derived data products—in particular, SDMX-formatted data and structures—produced by transport_data are strictly unofficial and, unless explicitly stated, have not been checked or validated by the original providers.

Design goals

  • The code is simple, modular, and flat.

  • There is one module per data provider (e.g. ato, estat, jrc).

    • This makes the following process possible:

      • TDCI develops an initial or prototype module for (meta)data from a provider.

      • The provider decides to take over maintenance/development of that code.

      • The module, along with all code in its subdirectory, is excised from transport_data, and the excised code is adjusted to depend on transport_data.

    • Modules for different data providers have roughly similar semantics (such as function names), but these conventions are not (for now) strictly enforced.

  • Code for handling a specific format is collected in a single module, e.g. iamc, and reused from there.

  • Data is processed from various original formats to SDMX objects, then stored and manipulated as SDMX wherever possible. This ensures that common utilities for manipulating SDMX-structured (meta)data can be applied regardless of the original provider or format(s).

  • transport_data does not duplicate data, metadata, or structural information from data providers. Wherever possible, these are processed or retrieved from original sources, files, etc. transport_data only adds metadata where it is missing from these original sources and has been obtained through independent work by TDCI participants.

  • As little workflow/orchestration code as possible is written. The individual functions/CLI commands in transport_data are kept generic, so they can eventually be incorporated as atomic workflow elements in a framework to be chosen later, or used on the back end of the TDC web UI and other systems.
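
The one-module-per-provider goal, with roughly similar but loosely enforced semantics, can be sketched as follows. This is a minimal illustration, not transport_data's actual code: the conventional function names fetch and convert, and the helpers make_provider and check_conventions, are hypothetical. Stand-in module objects are built with types.ModuleType, and the check warns, rather than raises, when a module deviates from the convention.

```python
"""Sketch: provider modules with roughly similar semantics, loosely checked.

Assumption: the function names "fetch" and "convert" are hypothetical
conventions chosen for illustration; they are not transport_data's API.
"""
import types
import warnings

# Hypothetical names that provider modules are encouraged (not required) to define.
CONVENTIONAL_NAMES = ("fetch", "convert")


def make_provider(name: str, **functions) -> types.ModuleType:
    """Build a stand-in module object for a data provider (e.g. "ato", "jrc")."""
    module = types.ModuleType(f"transport_data.{name}")
    for func_name, func in functions.items():
        setattr(module, func_name, func)
    return module


def check_conventions(module: types.ModuleType) -> list[str]:
    """Return the conventional names a module lacks, warning instead of raising."""
    missing = [n for n in CONVENTIONAL_NAMES if not callable(getattr(module, n, None))]
    if missing:
        warnings.warn(f"{module.__name__} lacks conventional function(s): {missing}")
    return missing


# Two hypothetical provider modules; the second deviates from the convention.
ato = make_provider("ato", fetch=lambda: "raw ato data", convert=lambda raw: raw.upper())
jrc = make_provider("jrc", fetch=lambda: "raw jrc data")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning for inspection
    results = {m.__name__: check_conventions(m) for m in (ato, jrc)}
```

Warning instead of raising keeps the convention advisory; a stricter check could be adopted later without changing the provider modules themselves.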
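
The goals of collecting format handling in one module and converting everything to SDMX can be illustrated together. This sketch assumes a simplified IAMC-style table (identifier columns Model, Scenario, Region, Variable and Unit, followed by one column per year). A single hypothetical reader, read_iamc, turns rows into SDMX-like observations keyed by dimension values, including the conventional SDMX time dimension TIME_PERIOD. Observation, read_iamc and filter_obs are illustrative stand-ins, not classes from an SDMX library or the contents of transport_data's iamc module, and the sample values are invented.

```python
"""Sketch: one shared format reader feeding SDMX-like observations.

Assumptions: Observation, read_iamc and filter_obs are hypothetical stand-ins.
They mimic the SDMX idea of an observation identified by a key of dimension
values; they are not part of any SDMX library or of transport_data.
"""
import csv
import io
from dataclasses import dataclass

# Identifier columns of the IAMC-style table; all remaining columns are years.
IAMC_ID_COLUMNS = ("Model", "Scenario", "Region", "Variable", "Unit")


@dataclass(frozen=True)
class Observation:
    key: tuple[tuple[str, str], ...]  # (dimension ID, value) pairs, SDMX-style
    value: float
    unit: str  # kept as an attribute rather than a dimension

    def dim(self, dim_id: str) -> str:
        return dict(self.key)[dim_id]


def read_iamc(text: str) -> list[Observation]:
    """Convert IAMC-style 'wide' CSV text (one column per year) to observations."""
    observations = []
    for row in csv.DictReader(io.StringIO(text)):
        # Series key: every identifier column except "Unit".
        series_key = tuple((dim, row[dim]) for dim in IAMC_ID_COLUMNS[:-1])
        for column, cell in row.items():
            if column is None or column in IAMC_ID_COLUMNS or not cell:
                continue  # skip identifier columns, overflow fields, empty cells
            key = series_key + (("TIME_PERIOD", column),)
            observations.append(Observation(key, float(cell), row["Unit"]))
    return observations


def filter_obs(observations, **criteria) -> list[Observation]:
    """Provider-agnostic utility: select observations by dimension values."""
    return [o for o in observations if all(o.dim(k) == v for k, v in criteria.items())]


# Invented sample input; any provider module publishing this format could reuse read_iamc().
SAMPLE = """Model,Scenario,Region,Variable,Unit,2020,2030
M1,Baseline,World,Energy|Transport,EJ/yr,120.5,131.0
M1,Baseline,Asia,Energy|Transport,EJ/yr,45.0,
"""

obs = read_iamc(SAMPLE)
world_2030 = filter_obs(obs, Region="World", TIME_PERIOD="2030")
```

Because filter_obs only sees dimension IDs and values, it applies equally to observations that originated from any provider or format.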
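
The rule that transport_data adds metadata only where the original source lacks it can be expressed as a merge in which provider values always take precedence. A minimal sketch, assuming metadata is held in plain dicts; the supplement helper and the field names are hypothetical. Recording the provenance of each field keeps TDCI additions distinguishable from provider metadata.

```python
"""Sketch: supplement provider metadata without duplicating or overriding it.

Assumption: metadata is modelled as plain dicts; "supplement" is a
hypothetical helper, not part of transport_data.
"""


def supplement(original: dict, additions: dict) -> tuple[dict, dict]:
    """Merge TDCI-supplied metadata into metadata retrieved from a provider.

    Returns (merged, provenance). Provider values always win; an addition is
    used only for a field that the original source leaves missing or empty.
    """
    merged = dict(original)  # copy, so the retrieved original is never mutated
    provenance = {
        field: "provider" for field, value in original.items() if value not in (None, "")
    }
    for field, value in additions.items():
        if field not in provenance:  # only fill gaps in the original
            merged[field] = value
            provenance[field] = "TDCI"
    return merged, provenance


retrieved = {"title": "Vehicle registrations", "unit": None}  # as fetched from the provider
tdci_notes = {"title": "A different title", "unit": "vehicles"}  # from independent TDCI work
merged, provenance = supplement(retrieved, tdci_notes)
```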
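
Keeping functions generic and atomic means each operation is an ordinary function, with any CLI command as a thin wrapper around it; scheduling and chaining are left to whatever workflow framework or back end is chosen later. The sketch below assumes a hypothetical convert_file step that merely normalizes a JSON file, and uses the standard-library argparse module for the wrapper; transport_data's real commands and options may differ.

```python
"""Sketch: a generic, atomic operation exposed both as a function and a CLI.

Assumptions: convert_file and its command-line wrapper are hypothetical;
transport_data's real CLI commands and options may differ.
"""
import argparse
import json
import tempfile
from pathlib import Path


def convert_file(source: Path, target: Path) -> int:
    """One atomic step: read JSON records from source, write them normalized to target.

    No workflow logic here (no scheduling, retries or chaining), so an
    orchestration framework, a web back end or a shell script can all call it.
    """
    records = json.loads(source.read_text())
    target.write_text(json.dumps(records, indent=2, sort_keys=True))
    return len(records)


def main(argv=None) -> int:
    """Thin CLI wrapper: parse arguments, then delegate to the function."""
    parser = argparse.ArgumentParser(description="Convert one file (sketch).")
    parser.add_argument("source", type=Path)
    parser.add_argument("target", type=Path)
    args = parser.parse_args(argv)
    count = convert_file(args.source, args.target)
    print(f"Converted {count} record(s)")
    return 0


# Usage: the same step called directly (as a workflow element would) and via the CLI.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "in.json"), Path(tmp, "out.json")
    src.write_text('[{"b": 1, "a": 2}]')
    n_direct = convert_file(src, dst)
    exit_code = main([str(src), str(dst)])
    output = json.loads(dst.read_text())
```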

SDMX usage conventions

→ moved to Data standards.

Code style