Datacube Tools

This (still experimental) module prepares SAR scenes processed by pyroSAR for ingestion into an Open Data Cube.

```python
from pyroSAR.datacube_util import Product, Dataset
from pyroSAR.ancillary import find_datasets, groupby

# find pyroSAR files by metadata attributes
archive_s1 = '/.../sentinel1/GRD/processed'
scenes_s1 = find_datasets(archive_s1, sensor=('S1A', 'S1B'), acquisition_mode='IW')

# group the found files by their file basenames;
# files with the same basename are considered to belong to the same dataset
grouped = groupby(scenes_s1, 'outname_base')

# define the polarization units describing the data sets
units = {'VV': 'backscatter VV', 'VH': 'backscatter VH'}

# create a new product
with Product(name='S1_GRD_index',
             product_type='gamma0',
             description='Gamma Naught RTC backscatter') as prod:

    for dataset in grouped:
        with Dataset(dataset, units=units) as ds:

            # add the dataset to the product
            prod.add(ds)

            # parse datacube indexing YMLs from product and dataset metadata
            prod.export_indexing_yml(ds, 'yml_index_outdir')

    # write the product YML
    prod.write('yml_product')

    # print the product metadata that is written to the product YML
    print(prod)
```
class pyroSAR.datacube_util.Dataset(filename, units='DN')[source]

Bases: object

A general class describing dataset information required for creating ODC YML files.

Parameters:
  • filename (str, list) – the file(s) from which to create the dataset

  • units (str, dict) – the units of the dataset measurements; either one unit for all measurements or a dictionary per polarization (default: 'DN')
__add__(dataset)[source]

Override the + operator. This is intended to easily combine two Dataset objects that were created from different files belonging to the same measurement, e.g. two GeoTIFFs with one polarization each.

Parameters:
  • dataset (Dataset) – the dataset to add to the current one

Returns:
  the combination of the two

Return type:
  Dataset

__radd__(dataset)[source]

Similar to Dataset.__add__() but for the function sum(), e.g. sum([Dataset1, Dataset2]).

Parameters:
  • dataset (Dataset) – the dataset to add to the current one

Returns:
  the combination of the two

Return type:
  Dataset
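
The pairing of `__add__` and `__radd__` is what makes `sum()` work: `sum()` starts from the integer 0, so the first addition is `0 + dataset`, which Python resolves via `__radd__`. A toy class (not pyroSAR's code) illustrates the pattern:

```python
class Bag:
    """Toy stand-in illustrating the __add__/__radd__ pattern of Dataset."""

    def __init__(self, names):
        self.names = list(names)

    def __add__(self, other):
        # combine the file names registered in two objects
        return Bag(self.names + other.names)

    def __radd__(self, other):
        # sum() starts with 0; 0 + Bag must return the Bag unchanged
        if other == 0:
            return self
        return self.__add__(other)

combined = sum([Bag(['a_VV.tif']), Bag(['a_VH.tif'])])
print(combined.names)  # ['a_VV.tif', 'a_VH.tif']
```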

close()[source]
property filenames
Returns:
  all file names registered in the dataset

Return type:
  dict

property identifier
Returns:
  a unique dataset identifier

Return type:
  str

property units
Returns:
  all measurement unit names registered in the dataset

Return type:
  dict

class pyroSAR.datacube_util.Product(definition=None, name=None, product_type=None, description=None)[source]

Bases: object

A class for describing an ODC product definition

Parameters:
  • definition (str, list, None) – the source of the product definition; either an existing product YML, a list of Dataset objects, or None. In the latter case the product is defined using the parameters name, product_type and description.

  • name (str) – the name of the product in the data cube

  • product_type (str) – the type of measurement defined in the product, e.g. gamma0

  • description (str) – the description of the product and its measurements

add(dataset)[source]

Add a dataset to the abstracted product description. This first performs a check whether the dataset is compatible with the product and its already existing measurements. If a measurement in the dataset does not yet exist in the product description it is added.

Parameters:
  • dataset (Dataset) – the dataset whose description is to be added

check_integrity(dataset, allow_new_measurements=False)[source]

Check whether a dataset is compatible with the product definition.

Parameters:
  • dataset (Dataset) – the dataset to be checked

  • allow_new_measurements (bool) – allow new measurements to be added to the product definition? If not, and the dataset contains measurements that are not defined in the product, an error is raised.

Raises:
  RuntimeError

close()[source]
export_indexing_yml(dataset, outdir)[source]

Write a YML file named {Dataset.identifier}_dcindex.yml, which can be used for indexing a dataset in an Open Data Cube. The file contains information from both the product and the dataset; a check is first performed to ensure that the dataset matches the product definition. A unique ID is issued using uuid.uuid4().

Parameters:
  • dataset (Dataset) – the dataset for which to export a file

  • outdir (str) – the directory to write the file to
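
The filename pattern and the uuid.uuid4() ID described above can be sketched as follows; the identifier value is hypothetical, and the real method additionally writes the product and dataset metadata into the file:

```python
import os
import uuid

identifier = 'S1A__IW___A_20180829T170656'  # hypothetical Dataset.identifier value
outdir = 'yml_index_outdir'

# filename pattern: {identifier}_dcindex.yml
outname = os.path.join(outdir, '{}_dcindex.yml'.format(identifier))

# each exported YML is issued a fresh unique ID
uid = str(uuid.uuid4())

print(outname)
```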

export_ingestion_yml(outname, product_name, ingest_location, chunking)[source]

Write a YML file, which can be used for ingesting indexed datasets into an Open Data Cube.

Parameters:
  • outname (str) – the name of the YML file to write

  • product_name (str) – the name of the product in the ODC

  • ingest_location (str) – the location of the ingested NetCDF files

  • chunking (dict) – a dictionary with keys 'x', 'y' and 'time'; determines the size of the NetCDF files ingested into the datacube, e.g. {'x': 512, 'y': 512, 'time': 1}
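
For orientation, a sketch of where the chunking values end up in such a file; the fragment follows the general shape of Open Data Cube ingestion configurations, and all key names and values other than chunking are illustrative assumptions rather than the exact output of this method:

```yaml
# illustrative fragment; keys other than 'chunking' are assumptions
output_type: S1_GRD_index_ingested
location: /ingest_location
storage:
  chunking:
    x: 512
    y: 512
    time: 1
```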

property measurements
Returns:
  a dictionary with measurement names as keys

Return type:
  dict of dict
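
Building on the units defined in the example at the top, the returned mapping might look like the following sketch; the inner keys shown are assumptions about typical ODC measurement metadata, not a guaranteed schema:

```python
# hypothetical shape of Product.measurements for the example product;
# the inner keys ('dtype', 'nodata', ...) are assumptions
measurements = {
    'VV': {'name': 'VV', 'units': 'backscatter VV', 'dtype': 'float32', 'nodata': -99},
    'VH': {'name': 'VH', 'units': 'backscatter VH', 'dtype': 'float32', 'nodata': -99},
}

# measurement names are the keys; each value describes one measurement
print(sorted(measurements.keys()))  # ['VH', 'VV']
```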

write(ymlfile)[source]

Write the product definition to a YML file.

Parameters:
  • ymlfile (str) – the file to write to