Datacube Tools

This (still experimental) module prepares SAR scenes processed by pyroSAR for ingestion into an Open Data Cube.

```python
from pyroSAR.datacube_util import Product, Dataset
from pyroSAR.ancillary import find_datasets, groupby

# find pyroSAR files by metadata attributes
archive_s1 = '/.../sentinel1/GRD/processed'
scenes_s1 = find_datasets(archive_s1, sensor=('S1A', 'S1B'), acquisition_mode='IW')

# group the found files by their file basenames;
# files with the same basename are considered to belong to the same dataset
grouped = groupby(scenes_s1, 'outname_base')

# define the polarization units describing the data sets
units = {'VV': 'backscatter VV', 'VH': 'backscatter VH'}

# create a new product
with Product(name='S1_GRD_index',
             product_type='gamma0',
             description='Gamma Naught RTC backscatter') as prod:

    for dataset in grouped:
        with Dataset(dataset, units=units) as ds:

            # add the dataset to the product
            prod.add(ds)

            # parse datacube indexing YMLs from product and dataset metadata
            prod.export_indexing_yml(ds, 'yml_index_outdir')

    # write the product YML
    prod.write('yml_product')

    # print the product metadata that is written to the product YML
    print(prod)
```
class pyroSAR.datacube_util.Dataset(filename, units='DN')[source]

Bases: object

A general class describing dataset information required for creating ODC YML files.

Parameters:
  • filename (str, list) – the file(s) from which to create the dataset

  • units (str, dict) – the units of the dataset measurements; either one unit for all measurements or a dictionary per polarization (default: 'DN')
__add__(dataset)[source]

Override the + operator. This is intended to easily combine two Dataset objects that were created from different files belonging to the same measurement, e.g. two GeoTIFFs with one polarization each.

Parameters:
  • dataset (Dataset) – the dataset to add to the current one

Returns:
  the combination of the two

Return type:
  Dataset

__radd__(dataset)[source]

Similar to Dataset.__add__() but for the function sum(), e.g. sum([Dataset1, Dataset2]).

Parameters:
  • dataset (Dataset) – the dataset to add to the current one

Returns:
  the combination of the two

Return type:
  Dataset
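
The pairing of `__add__` and `__radd__` is what makes `sum()` work: `sum()` starts from the integer 0, so the first addition is `0 + dataset`, which Python resolves via `__radd__`. A toy class (not pyroSAR's code) illustrates the pattern:

```python
class Bag:
    """Toy stand-in illustrating the __add__/__radd__ pattern of Dataset."""

    def __init__(self, names):
        self.names = list(names)

    def __add__(self, other):
        # combine the file names registered in two objects
        return Bag(self.names + other.names)

    def __radd__(self, other):
        # sum() starts with 0; 0 + Bag must return the Bag unchanged
        if other == 0:
            return self
        return self.__add__(other)

combined = sum([Bag(['a_VV.tif']), Bag(['a_VH.tif'])])
print(combined.names)  # ['a_VV.tif', 'a_VH.tif']
```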

close()[source]
property filenames
Returns:
  all file names registered in the dataset

Return type:
  dict

property identifier
Returns:
  a unique dataset identifier

Return type:
  str

property units
Returns:
  all measurement unit names registered in the dataset

Return type:
  dict

class pyroSAR.datacube_util.Product(definition=None, name=None, product_type=None, description=None)[source]

Bases: object

A class for describing an ODC product definition

Parameters:
  • definition (str, list, None) – the source of the product definition; either an existing product YML, a list of Dataset objects, or None. In the latter case the product is defined using the parameters name, product_type and description.

  • name (str) – the name of the product in the data cube

  • product_type (str) – the type of measurement defined in the product, e.g. gamma0

  • description (str) – the description of the product and its measurements

add(dataset)[source]

Add a dataset to the abstracted product description. This first performs a check whether the dataset is compatible with the product and its already existing measurements. If a measurement in the dataset does not yet exist in the product description it is added.

Parameters:
  • dataset (Dataset) – the dataset whose description is to be added

check_integrity(dataset, allow_new_measurements=False)[source]

Check whether a dataset is compatible with the product definition.

Parameters:
  • dataset (Dataset) – the dataset to be checked

  • allow_new_measurements (bool) – allow new measurements to be added to the product definition? If not, and the dataset contains measurements that are not defined in the product, an error is raised.

Raises:
  RuntimeError

close()[source]
export_indexing_yml(dataset, outdir)[source]

Write a YML file named {Dataset.identifier}_dcindex.yml, which can be used for indexing a dataset in an Open Data Cube. The file contains information from both the product and the dataset; a check is first performed to ensure that the dataset matches the product definition. A unique ID is issued using uuid.uuid4().

Parameters:
  • dataset (Dataset) – the dataset for which to export a file

  • outdir (str) – the directory to write the file to
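
The filename pattern and the uuid.uuid4() ID described above can be sketched as follows; the identifier value is hypothetical, and the real method additionally writes the product and dataset metadata into the file:

```python
import os
import uuid

identifier = 'S1A__IW___A_20180829T170656'  # hypothetical Dataset.identifier value
outdir = 'yml_index_outdir'

# filename pattern: {identifier}_dcindex.yml
outname = os.path.join(outdir, '{}_dcindex.yml'.format(identifier))

# each exported YML is issued a fresh unique ID
uid = str(uuid.uuid4())

print(outname)
```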

export_ingestion_yml(outname, product_name, ingest_location, chunking)[source]

Write a YML file, which can be used for ingesting indexed datasets into an Open Data Cube.

Parameters:
  • outname (str) – the name of the YML file to write

  • product_name (str) – the name of the product in the ODC

  • ingest_location (str) – the location of the ingested NetCDF files

  • chunking (dict) – a dictionary with keys 'x', 'y' and 'time'; determines the size of the NetCDF files ingested into the datacube, e.g. {'x': 512, 'y': 512, 'time': 1}
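
For orientation, a sketch of where the chunking values end up in such a file; the fragment follows the general shape of Open Data Cube ingestion configurations, and all key names and values other than chunking are illustrative assumptions rather than the exact output of this method:

```yaml
# illustrative fragment; keys other than 'chunking' are assumptions
output_type: S1_GRD_index_ingested
location: /ingest_location
storage:
  chunking:
    x: 512
    y: 512
    time: 1
```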

property measurements
Returns:
  a dictionary with measurement names as keys

Return type:
  dict of dict
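
Building on the units defined in the example at the top, the returned mapping might look like the following sketch; the inner keys shown are assumptions about typical ODC measurement metadata, not a guaranteed schema:

```python
# hypothetical shape of Product.measurements for the example product;
# the inner keys ('dtype', 'nodata', ...) are assumptions
measurements = {
    'VV': {'name': 'VV', 'units': 'backscatter VV', 'dtype': 'float32', 'nodata': -99},
    'VH': {'name': 'VH', 'units': 'backscatter VH', 'dtype': 'float32', 'nodata': -99},
}

# measurement names are the keys; each value describes one measurement
print(sorted(measurements.keys()))  # ['VH', 'VV']
```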

write(ymlfile)[source]

Write the product definition to a YML file.

Parameters:
  • ymlfile (str) – the file to write to