Datacube Tools

This (still experimental) module is intended to ease the preparation of SAR scenes processed by pyroSAR for ingestion into an Open Data Cube.

from pyroSAR.datacube_util import Product, Dataset
from pyroSAR.ancillary import find_datasets, groupby

# find pyroSAR files by metadata attributes
archive_s1 = '/.../sentinel1/GRD/processed'
scenes_s1 = find_datasets(archive_s1, sensor=('S1A', 'S1B'), acquisition_mode='IW')

# group the found files by their file basenames
# files with the same basename are considered to belong to the same dataset
grouped = groupby(scenes_s1, 'outname_base')

# define the polarization units describing the data sets
units = {'VV': 'backscatter VV', 'VH': 'backscatter VH'}

# create a new product
with Product(name='S1_GRD_index',
             product_type='gamma0',
             description='Gamma Naught RTC backscatter') as prod:

    for dataset in grouped:
        with Dataset(dataset, units=units) as ds:

            # add the dataset to the product
            prod.add(ds)

            # parse datacube indexing YMLs from product and data set metadata
            prod.export_indexing_yml(ds, 'yml_index_outdir')

    # write the product YML
    prod.write('yml_product')

    # print the product metadata which is written to the product YML
    print(prod)

class pyroSAR.datacube_util.Dataset(filename, units='DN')[source]

Bases: object

A general class describing dataset information required for creating ODC YML files

Parameters
  • filename (str, list) – the file name or list of file names from which to create the dataset

  • units (str, dict) – the measurement units of the file(s); either a single unit name or a dictionary mapping measurement names to unit names, e.g. {'VV': 'backscatter VV'}

__add__(dataset)[source]

Override the + operator. This is intended for conveniently combining two Dataset objects that were created from different files belonging to the same measurement, e.g. two GeoTIFFs with one polarization each.

Parameters

dataset (Dataset) – the dataset to add to the current one

Returns

the combination of the two

Return type

Dataset
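
A minimal sketch of the + operator, combining two single-polarization GeoTIFFs of the same scene into one dataset; the file names and unit values are placeholders:

# two hypothetical GeoTIFFs belonging to the same measurement
ds_vv = Dataset('scene_abc_VV_gamma0.tif', units='backscatter VV')
ds_vh = Dataset('scene_abc_VH_gamma0.tif', units='backscatter VH')

# combine both polarizations into a single dataset description
ds = ds_vv + ds_vh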

__radd__(dataset)[source]

Similar to Dataset.__add__() but for the built-in function sum(), e.g. sum([Dataset1, Dataset2]).

Parameters

dataset (Dataset) – the dataset to add to the current one

Returns

the combination of the two

Return type

Dataset
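
Accordingly, a list of datasets can be combined with the built-in sum(); a sketch reusing the hypothetical objects from above:

# equivalent to ds_vv + ds_vh; sum() starts at 0, which triggers __radd__()
ds = sum([ds_vv, ds_vh])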

close()[source]

property filenames
Returns

all file names registered in the dataset

Return type

dict

property identifier
Returns

a unique dataset identifier

Return type

str

property units
Returns

all measurement unit names registered in the dataset

Return type

dict
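
A short sketch of inspecting a dataset via these properties; the file name and unit are placeholders:

with Dataset('scene_abc_VV_gamma0.tif', units='backscatter VV') as ds:
    print(ds.identifier)  # unique dataset identifier (str)
    print(ds.filenames)   # measurement names mapped to file names (dict)
    print(ds.units)       # measurement names mapped to unit names (dict)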

class pyroSAR.datacube_util.Product(definition=None, name=None, product_type=None, description=None)[source]

Bases: object

A class for describing an ODC product definition

Parameters
  • definition (str, list, None) – the source of the product definition; either an existing product YML, a list of Dataset objects, or None. In the latter case the product is defined using the parameters name, product_type and description.

  • name (str) – the name of the product in the data cube

  • product_type (str) – the type of measurement defined in the product, e.g. gamma0

  • description (str) – the description of the product and its measurements
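
Besides defining a new product via name, product_type and description as in the example at the top, an existing product YML can be passed via definition; a sketch with a hypothetical file name:

# load an existing product definition, e.g. one previously written with Product.write()
with Product(definition='S1_GRD_index_product.yml') as prod:
    print(prod.measurements.keys())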

add(dataset)[source]

Add a dataset to the abstracted product description. This first checks whether the dataset is compatible with the product and its already existing measurements. If a measurement in the dataset does not yet exist in the product description, it is added.

Parameters

dataset (Dataset) – the dataset whose description is to be added

check_integrity(dataset, allow_new_measurements=False)[source]

Check whether a dataset is compatible with the product definition.

Parameters
  • dataset (Dataset) – the dataset to be checked

  • allow_new_measurements (bool) – allow new measurements to be added to the product definition? If not, and the dataset contains measurements that are not defined in the product, an error is raised.

Raises

RuntimeError
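
A sketch of an explicit compatibility check before adding a dataset, with prod and ds as in the example at the top:

try:
    prod.check_integrity(ds, allow_new_measurements=False)
except RuntimeError as err:
    print('dataset does not match the product definition: {}'.format(err))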

close()[source]

export_indexing_yml(dataset, outdir)[source]

Write a YML file named {Dataset.identifier}_dcindex.yml, which can be used for indexing a dataset in an Open Data Cube. The file contains information from both the product and the dataset; a check is first performed to verify that the dataset matches the product definition. A unique ID is issued using uuid.uuid4().

Parameters
  • dataset (Dataset) – the dataset for which to export a file

  • outdir (str) – the directory to write the file to

export_ingestion_yml(outname, product_name, ingest_location, chunking)[source]

Write a YML file that can be used for ingesting indexed datasets into an Open Data Cube.

Parameters
  • outname (str) – the name of the YML file to write

  • product_name (str) – the name of the product in the ODC

  • ingest_location (str) – the location of the ingested NetCDF files

  • chunking (dict) – a dictionary with keys 'x', 'y' and 'time' which determines the size of the NetCDF files ingested into the datacube, e.g. {'x': 512, 'y': 512, 'time': 1}
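
A sketch of writing an ingestion YML for the product of the example at the top; the output name, product name and ingest location are placeholders:

prod.export_ingestion_yml(outname='S1_GRD_ingest.yml',
                          product_name='S1_GRD_ingested',
                          ingest_location='/data/ingested',
                          chunking={'x': 512, 'y': 512, 'time': 1})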

property measurements
Returns

a dictionary with measurement names as keys

Return type

dict of dict
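
For example, the names of the measurements registered so far can be listed like this:

print(list(prod.measurements.keys()))  # e.g. ['VV', 'VH']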

write(ymlfile)[source]

Write the product definition to a YML file.

Parameters

ymlfile (str) – the file to write to