Datacube Tools¶
This (still experimental) module is intended to easily prepare SAR scenes processed by pyroSAR for ingestion into an Open Data Cube.
from pyroSAR.datacube_util import Product, Dataset
from pyroSAR.ancillary import find_datasets, groupby

# find pyroSAR files by metadata attributes
archive_s1 = '/.../sentinel1/GRD/processed'
scenes_s1 = find_datasets(archive_s1, sensor=('S1A', 'S1B'), acquisition_mode='IW')

# group the found files by their file basenames
# files with the same basename are considered to belong to the same dataset
grouped = groupby(scenes_s1, 'outname_base')

# define the polarization units describing the data sets
units = {'VV': 'backscatter VV', 'VH': 'backscatter VH'}

# create a new product
with Product(name='S1_GRD_index',
             product_type='gamma0',
             description='Gamma Naught RTC backscatter') as prod:

    for dataset in grouped:
        with Dataset(dataset, units=units) as ds:

            # add the dataset to the product
            prod.add(ds)

            # parse datacube indexing YMLs from product and dataset metadata
            prod.export_indexing_yml(ds, 'yml_index_outdir')

    # write the product YML
    prod.write('yml_product')

# print the product metadata which is written to the product YML
print(prod)
- class pyroSAR.datacube_util.Dataset(filename, units='DN')[source]¶
Bases: object
A general class describing dataset information required for creating ODC YML files
- Parameters
filename (str or list) – the file or list of files from which to create the dataset description
units (str or dict) – the units of the measurements, e.g. 'DN' (default) or a dictionary mapping polarizations to unit descriptions
- __add__(dataset)[source]¶
Override the + operator. This is intended to easily combine two Dataset objects created from different files belonging to the same measurement, e.g. two GeoTIFFs with one polarization each.
- __radd__(dataset)[source]¶
similar to Dataset.__add__() but for function sum(), e.g. sum([Dataset1, Dataset2])
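The interplay between __add__, __radd__ and the built-in sum() can be shown with a minimal toy class (a self-contained sketch, not pyroSAR's actual implementation): sum() starts from 0, so its first operation is 0 + object, which Python resolves via __radd__.

```python
# Toy illustration of the __add__/__radd__ pattern used by Dataset.
# sum([a, b]) evaluates as (0 + a) + b; the first step needs __radd__.
class Toy:
    def __init__(self, files):
        self.files = list(files)

    def __add__(self, other):
        # combine the file lists of two objects
        return Toy(self.files + other.files)

    def __radd__(self, other):
        # sum() begins with the integer 0; just return a copy of self
        if other == 0:
            return Toy(self.files)
        return self.__add__(other)

a = Toy(['scene_VV.tif'])
b = Toy(['scene_VH.tif'])
combined = sum([a, b])  # works only because __radd__ handles 0 + a
```

The file names here are placeholders; with pyroSAR's Dataset the same pattern merges per-polarization files into one dataset description.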
- class pyroSAR.datacube_util.Product(definition=None, name=None, product_type=None, description=None)[source]¶
Bases: object
A class for describing an ODC product definition
- Parameters
definition (str, list or None) – the source of the product definition; either an existing product YML file, a list of Dataset objects, or None. In the latter case the product is defined using the parameters name, product_type and description.
name (str) – the name of the product in the data cube
product_type (str) – the type of measurement defined in the product, e.g. gamma0
description (str) – the description of the product and its measurements
- add(dataset)[source]¶
Add a dataset to the abstracted product description. This first performs a check whether the dataset is compatible with the product and its already existing measurements. If a measurement in the dataset does not yet exist in the product description it is added.
- Parameters
dataset (Dataset) – the dataset whose description is to be added
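The merge behaviour described above can be sketched in plain Python (a hypothetical simplification using dictionaries, not pyroSAR's actual code): measurements already known to the product are checked for compatibility, and unknown ones are appended.

```python
# Toy sketch: a product absorbs new measurements from a dataset
# while rejecting conflicting definitions of existing ones.
product_measurements = {'VV': {'dtype': 'float32'}}
dataset_measurements = {'VV': {'dtype': 'float32'},
                        'VH': {'dtype': 'float32'}}

for name, meta in dataset_measurements.items():
    if name in product_measurements:
        # existing measurement: definitions must agree
        if product_measurements[name] != meta:
            raise RuntimeError('incompatible measurement: {}'.format(name))
    else:
        # new measurement: added to the product description
        product_measurements[name] = meta
```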
- check_integrity(dataset, allow_new_measurements=False)[source]¶
Check whether a dataset is compatible with the product definition.
- Parameters
dataset (Dataset) – the dataset to check
allow_new_measurements (bool) – allow the dataset to introduce measurements not yet defined in the product?
- Raises
an error if the dataset is incompatible with the product definition
- export_indexing_yml(dataset, outdir)[source]¶
Write a YML file named {Dataset.identifier()}_dcindex.yml, which can be used for indexing a dataset in an Open Data Cube. The file will contain information from the product and the dataset; a test is first performed to check whether the dataset matches the product definition. A unique ID is issued using uuid.uuid4().
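The naming and ID scheme can be illustrated with standard-library code alone; the identifier value below is a made-up placeholder, not a real scene identifier.

```python
import uuid

# hypothetical dataset identifier; pyroSAR derives the real one
# from the dataset metadata via Dataset.identifier
identifier = 'S1A__IW___A_20150222T170750'

# the indexing YML is named {identifier}_dcindex.yml
outname = '{}_dcindex.yml'.format(identifier)

# a unique ID like this is written into the YML
unique_id = str(uuid.uuid4())
```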
- export_ingestion_yml(outname, product_name, ingest_location, chunking)[source]¶
Write a YML file, which can be used for ingesting indexed datasets into an Open Data Cube.
- Parameters
outname (str) – the name of the YML file to write
product_name (str) – the name of the product in the ODC
ingest_location (str) – the location of the ingested NetCDF files
chunking (dict) – a dictionary with keys ‘x’, ‘y’ and ‘time’; determines the size of the netCDF files ingested into the datacube; e.g. {‘x’: 512, ‘y’: 512, ‘time’: 1}
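A short sketch of assembling the chunking argument described above; the file and product names in the commented call are hypothetical placeholders.

```python
# Example chunking definition: one netCDF tile of 512 x 512 pixels
# per time step, matching the example in the parameter description.
chunking = {'x': 512, 'y': 512, 'time': 1}

# basic sanity check before handing it to export_ingestion_yml
assert set(chunking) == {'x', 'y', 'time'}

# hypothetical call (names are placeholders):
# prod.export_ingestion_yml('ingest_s1_grd.yml', 'S1_GRD_index',
#                           '/path/to/netcdf', chunking)
```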