transport_data.util.pooch.Pooch

class transport_data.util.pooch.Pooch(base_url, registry: dict | Path | None = None, urls: dict | None = None, retry_if_failed: int = 0, allow_updates: bool = True, module: str = '', expand: Callable | None = None, processor: str | None = None)

Bases: pooch.Pooch

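A pooch.Pooch subclass that derives its local storage path from module, can expand arbitrary arguments into registry file names, and can unzip fetched archives automatically.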

Parameters:
  • module (str) – The path argument to pooch.Pooch is derived automatically from this, within the transport-data directory.

  • expand (callable, optional) – If given, a function that receives any argument and returns a file name found within the Pooch registry.

  • processor (str, optional) – If exactly “unzip”, every fetch() operation unzips the archive contents into the same directory as the downloaded file.
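As an illustration of the expand parameter, the following is a minimal sketch of a callable that maps an arbitrary argument to a file name in a registry. The file names, the REGISTRY_FILES set, and the function body are hypothetical; only the idea that the callable returns a registry file name comes from the description above.

```python
# Hypothetical set of file names, standing in for the Pooch registry.
REGISTRY_FILES = {"population_2020.csv", "population_2021.csv"}


def expand(year) -> str:
    """Map an arbitrary argument (here, a year) to a registry file name."""
    fname = f"population_{year}.csv"
    if fname not in REGISTRY_FILES:
        raise ValueError(f"No registry entry for {year!r}")
    return fname


print(expand(2020))
```

A function like this would be passed as the expand argument of the constructor. How path_for() and fetch() invoke it is not described on this page, so that part is left out of the sketch.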

__init__(base_url, registry: dict | Path | None = None, urls: dict | None = None, retry_if_failed: int = 0, allow_updates: bool = True, module: str = '', expand: Callable | None = None, processor: str | None = None) → None

Methods

__init__(base_url[, registry, urls, ...])

fetch(fname, *args, **kwargs)

Get the absolute path to a file in the local storage.

get_url(fname)

Get the full URL to download a file in the registry.

is_available(fname, *args, **kwargs)

Check availability of a remote file without downloading it.

load_registry(fname)

Load entries from a file and add them to the registry.

load_registry_from_doi()

Populate the registry using the data repository API.

path_for(*args, **kwargs)

Return a filename and local cache path for the data file.

Attributes

abspath

Absolute path to the local storage

registry_files

List of file names on the registry

property abspath

Absolute path to the local storage

fetch(fname: str, *args, **kwargs) → str

Get the absolute path to a file in the local storage.

If the file is not in the local storage, it will be downloaded. If the hash of the file in local storage doesn’t match the one in the registry, Pooch downloads a new copy, because a mismatch is taken as a sign that the file was updated in the remote storage. If the hash of the downloaded file still doesn’t match the one in the registry, an exception is raised to warn of possible file corruption.
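The decision described above can be sketched with the standard library alone. This is not the library's implementation: the helper names sha256_of and decide_action are hypothetical, the action labels are chosen for illustration, and SHA256 is assumed as the hash algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def decide_action(path: Path, known_hash: str) -> str:
    """Pick what to do with a local file given the hash from the registry."""
    if not path.exists():
        return "download"  # nothing in local storage yet
    if sha256_of(path) != known_hash.lower():
        return "update"  # local copy differs from the registry entry
    return "fetch"  # local copy is current; use it as-is


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.csv"
    expected = hashlib.sha256(b"a,b\n1,2\n").hexdigest()
    print(decide_action(target, expected))  # file is missing
    target.write_bytes(b"stale")
    print(decide_action(target, expected))  # hash mismatch
    target.write_bytes(b"a,b\n1,2\n")
    print(decide_action(target, expected))  # hash matches
```

After a "download" or "update", the hash would be checked once more against the registry, and a second mismatch would raise an exception.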

Post-processing actions sometimes need to be taken on downloaded files (unzipping, conversion to a more efficient format, etc). If these actions are time or memory consuming, it would be best to do this only once right after the file is downloaded. Use the processor argument to specify a function that is executed after the download to perform these actions. See Processors: Post-download actions for details.
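A processor might look like the sketch below, which unzips an archive next to the downloaded file. It assumes the three-argument call convention (fname, action, pooch) that pooch documents for processors, and assumes that action is one of "download", "update" or "fetch". The function name unzip_beside is hypothetical, and this is not the code behind the “unzip” option of this class.

```python
import zipfile
from pathlib import Path


def unzip_beside(fname, action, pooch):
    """Extract a ZIP archive into the directory that holds the archive.

    Returns the paths of the extracted files. The ``pooch`` argument is
    accepted for compatibility with the assumed signature but is unused.
    """
    archive = Path(fname)
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries; keep only regular members.
        members = [m for m in zf.namelist() if not m.endswith("/")]
        targets = [archive.parent / m for m in members]
        # Extract after a new download, or if any extracted file is missing.
        fresh = action in ("download", "update")
        if fresh or not all(t.exists() for t in targets):
            zf.extractall(archive.parent)
    return [str(t) for t in targets]
```

Extracting only when the archive is new or its contents are missing keeps the expensive step from running on every fetch() call.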

Custom file downloaders can be provided through the downloader argument. By default, Pooch will determine the download protocol from the URL in the registry. If the server for a given file requires authentication (username and password), use a downloader that supports these features. Downloaders can also be used to print custom messages (like a progress bar), etc. See Downloaders: Customizing the download for details.
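A custom downloader can be as small as the following sketch, which prints a message to stderr before copying the remote content. It assumes the (url, output_file, pooch) call convention that pooch documents for downloaders and treats output_file as a file system path. The name announcing_downloader is hypothetical.

```python
import shutil
import sys
import urllib.request


def announcing_downloader(url, output_file, pooch):
    """Download ``url`` to ``output_file`` after printing a custom message."""
    print(f"Downloading {url}", file=sys.stderr)
    # Stream the response to disk without loading it fully into memory.
    with urllib.request.urlopen(url) as response, open(output_file, "wb") as out:
        shutil.copyfileobj(response, out)
```

It would be passed per call, for example fetch(fname, downloader=announcing_downloader). Authentication or other protocols would need a more capable downloader than this one.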

Parameters:
  • fname (str) – The file name (relative to the base_url of the remote data storage) to fetch from the local storage.

  • processor (None or callable) – If not None, then a function (or callable object) that will be called before returning the full path and after the file has been downloaded. See Processors: Post-download actions for details.

  • downloader (None or callable) – If not None, then a function (or callable object) that will be called to download a given URL to a provided local file name. See Downloaders: Customizing the download for details.

  • progressbar (bool or an arbitrary progress bar object) – If True, will print a progress bar of the download to standard error (stderr). Requires tqdm to be installed. Alternatively, an arbitrary progress bar object can be passed. See Using custom progress bars for details.

Returns:

full_path – The absolute path (including the file name) of the file in the local storage.

Return type:

str

get_url(fname)

Get the full URL to download a file in the registry.

Parameters:

fname (str) – The file name (relative to the base_url of the remote data storage) to fetch from the local storage.

is_available(fname, *args, **kwargs) → bool

Check availability of a remote file without downloading it.

Use this method when working with large files to check if they are available for download.

Parameters:
  • fname (str) – The file name (relative to the base_url of the remote data storage).

  • downloader (None or callable) – If not None, then a function (or callable object) that will be called to check the availability of the file on the server. See Downloaders: Customizing the download for details.

Returns:

status – True if the file is available for download. False otherwise.

Return type:

bool
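A downloader that supports availability checks could be sketched as follows. It assumes, as with pooch's built-in downloaders, that the check is requested through a check_only keyword and answered with a boolean; that convention is not stated on this page. The name checking_downloader is hypothetical.

```python
import urllib.error
import urllib.request


def checking_downloader(url, output_file, pooch, check_only=False):
    """Download ``url``, or only report whether it can be opened."""
    if check_only:
        # A HEAD request asks for headers only, so no content is transferred.
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request):
                return True
        except (urllib.error.URLError, OSError):
            return False
    with urllib.request.urlopen(url) as response, open(output_file, "wb") as out:
        out.write(response.read())
    return None
```

With such a downloader, is_available(fname, downloader=checking_downloader) would be expected to return the boolean from the check_only branch.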

load_registry(fname)

Load entries from a file and add them to the registry.

Use this if you are managing many files.

Each line of the file should have a file name and its hash, separated by a space. The hash can specify the checksum algorithm using the “alg:hash” format. If no algorithm is provided, SHA256 is used by default. Only one file per line is allowed. Custom download URLs for individual files can be specified as a third element on the line. Line comments can be added and must be prefixed with #.
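The registry file format described above can be illustrated with a small parser. This is a sketch of the format, not the parser the library uses: the function parse_registry is hypothetical, the hashes in the example are shortened placeholders, and storing each entry as an (algorithm, digest) tuple is an illustrative choice.

```python
def parse_registry(text: str):
    """Parse registry text into ``(registry, urls)`` dictionaries."""
    registry, urls = {}, {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) not in (2, 3):
            raise ValueError(
                f"line {lineno}: expected 2 or 3 elements, got {len(parts)}"
            )
        name, checksum = parts[0], parts[1].lower()
        # "alg:hash" selects the algorithm; a bare hash defaults to SHA256.
        algorithm, _, digest = checksum.rpartition(":")
        registry[name] = (algorithm or "sha256", digest)
        if len(parts) == 3:
            urls[name] = parts[2]  # optional custom download URL
    return registry, urls


example = """\
# name hash [url]
a.csv 0123abcd
b.zip md5:4567ef https://example.com/b.zip
"""
registry, urls = parse_registry(example)
print(registry)
print(urls)
```

Splitting on whitespace means this sketch does not handle file names that contain spaces.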

Parameters:

fname (str | fileobj) – Path (or open file object) to the registry file.

load_registry_from_doi() → None

Populate the registry using the data repository API.

This version differs from the parent class in using doi_to_repository() from the current module.

path_for(*args, **kwargs)

Return a filename and local cache path for the data file.

property registry_files

List of file names on the registry