Archive

Archive – Utility for storing SAR image metadata in a database

drop_archive – drop (delete) a scene database

class pyroSAR.archive.Archive(dbfile, custom_fields=None, postgres=False, user='postgres', password='1234', host='localhost', port=5432, cleanup=True, legacy=False)[source]

Bases: SceneArchive

Utility for storing SAR image metadata in a database

Parameters:
  • dbfile (str) – the filename for the SpatiaLite database. This might either point to an existing database or will be created otherwise. If postgres is set to True, this will be the name for the PostgreSQL database.

  • custom_fields (dict[str, Any] | None) – a dictionary containing additional non-standard database column names and data types; the names must be attributes of the SAR scenes to be inserted (i.e. id.attr) or keys in their meta attribute (i.e. id.meta['attr'])

  • postgres (bool) – enable postgres driver for the database. Default: False

  • user (str) – required for postgres driver: username to access the database. Default: ‘postgres’

  • password (str) – required for postgres driver: password to access the database. Default: ‘1234’

  • host (str) – required for postgres driver: host where the database is hosted. Default: ‘localhost’

  • port (int) – required for postgres driver: port number to the database. Default: 5432

  • cleanup (bool) – check whether all registered scenes exist and remove missing entries?

  • legacy (bool) – open an outdated database in legacy mode to import it into a new database. Opening an outdated database without legacy mode raises a RuntimeError.

Examples

Ingest all Sentinel-1 scenes in a directory and its subdirectories into the database:

>>> from pyroSAR import Archive, identify
>>> from spatialist.ancillary import finder
>>> dbfile = '/.../scenelist.db'
>>> archive_s1 = '/.../sentinel1/GRD'
>>> scenes_s1 = finder(archive_s1, [r'^S1.*\.zip$'], regex=True, recursive=True)
>>> with Archive(dbfile) as archive:
...     archive.insert(scenes_s1)
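
With regex=True the search pattern is a regular expression, in which an unescaped dot matches any character. A pattern such as ^S1.*.zip is therefore looser than a literal .zip extension. The following is a minimal sketch using only Python's re module; it assumes the pattern is applied to file basenames with re.search, which this page does not confirm, and the file names are hypothetical.

```python
import re

# Hypothetical file names as they might appear in a download folder.
names = [
    'S1A_IW_GRDH_1SDV_20150330T170734.zip',       # a zipped scene
    'S1A_IW_GRDH_1SDV_20150330T170734.zip.part',  # an incomplete download
    'S1A_IW_GRDH_1SDV_unzip_log.txt',             # an unrelated log file
]

loose = re.compile(r'^S1.*.zip')      # '.' before 'zip' matches any character
strict = re.compile(r'^S1.*\.zip$')   # literal '.zip' at the end of the name

# Assumption: matching is done with re.search on the basename.
loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]

print(len(loose_hits))
print(strict_hits)
```

The loose pattern accepts all three names, while the escaped and anchored pattern keeps only the file that really ends in .zip.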

Select all Sentinel-1 A/B scenes stored in the database that

  • overlap with a test site

  • were acquired in Ground-Range-Detected (GRD) Interferometric Wide Swath (IW) mode before 2018

  • contain a VV polarization image

  • have not been processed to directory outdir before

>>> from pyroSAR import Archive
>>> from spatialist import Vector
>>> archive = Archive('/.../scenelist.db')
>>> site = Vector('/path/to/site.shp')
>>> outdir = '/path/to/processed/results'
>>> maxdate = '20171231T235959'
>>> selection_proc = archive.select(vectorobject=site, processdir=outdir,
...                                 maxdate=maxdate, sensor=['S1A', 'S1B'],
...                                 product='GRD', acquisition_mode='IW', vv=1)
>>> archive.close()

Alternatively, the with statement can be used, here to check whether one particular scene is already registered in the database:

>>> from pyroSAR import identify, Archive
>>> scene = identify('S1A_IW_SLC__1SDV_20150330T170734_20150330T170801_005264_006A6C_DA69.zip')
>>> with Archive('/.../scenelist.db') as archive:
...     print(archive.is_registered(scene.scene))

When postgres is set to True, a PostgreSQL database is created at the given host. The connection arguments user, password, host and port are then required.

>>> from pyroSAR import Archive, identify
>>> from spatialist.ancillary import finder
>>> dbfile = 'scenelist_db'
>>> archive_s1 = '/.../sentinel1/GRD'
>>> scenes_s1 = finder(archive_s1, [r'^S1.*\.zip$'], regex=True, recursive=True)
>>> with Archive(dbfile, postgres=True, user='user', password='password', host='host', port=5432) as archive:
...     archive.insert(scenes_s1)

Importing an old database:

>>> from pyroSAR import Archive
>>> db_new = 'scenes.db'
>>> db_old = 'scenes_old.db'
>>> with Archive(db_new) as db:
...     with Archive(db_old, legacy=True) as archive_old:
...         db.import_outdated(archive_old)

add_tables(tables)[source]

Add tables to the database. Each table is passed as a sqlalchemy.schema.Table object.

Note

Columns using Geometry must have setting management=True for SQLite, for example: geometry = Column(Geometry('POLYGON', management=True, srid=4326))

Parameters:

tables (Table | list[Table]) – The table(s) to be added to the database.

Return type:

None

cleanup()[source]

Remove all scenes from the database that are no longer stored in their registered location.

Return type:

None
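
The idea behind this check can be sketched without a database. The following is a minimal illustration using only the standard library, not pyroSAR's implementation; the helper find_missing, the list standing in for the database table, and the file names are all hypothetical.

```python
import os
import tempfile


def find_missing(registered_paths):
    """Return the registered paths that no longer exist on disk."""
    return [p for p in registered_paths if not os.path.exists(p)]


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, 'scene_a.zip')
    absent = os.path.join(tmp, 'scene_b.zip')
    open(present, 'w').close()  # only scene_a actually exists

    # A plain list stands in for the table of registered scenes.
    registry = [present, absent]

    missing = find_missing(registry)
    registry = [p for p in registry if p not in missing]

    missing_names = [os.path.basename(p) for p in missing]
    remaining_names = [os.path.basename(p) for p in registry]

print(missing_names)
print(remaining_names)
```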

close()[source]

close the database connection

Return type:

None

drop_element(scene, with_duplicates=False)[source]

Drop a scene from the data table. If the duplicates table contains a matching entry, it will be moved to the data table.

Parameters:
  • scene (str) – a SAR scene

  • with_duplicates (bool) – True: delete the matching entry in the duplicates table; False: move the matching entry from the duplicates table into the data table

Return type:

None

drop_table(table)[source]

Drop a table from the database.

Parameters:

table (str) – the table name

Return type:

None

export2shp(path, table='data')[source]

export the database to a shapefile

Parameters:
  • path (str) – the path of the shapefile to be written. Existing files with the same name are overwritten. If path contains a folder that does not exist, it is created. If the file extension is missing, ‘.shp’ is appended.

  • table (str) – the table to write to the shapefile; either ‘data’ (default) or ‘duplicates’

Return type:

None
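
The path handling described for the path parameter can be sketched as follows. This is a minimal illustration with the standard library and not the code used by export2shp; the helper name normalize_shp_path is hypothetical, and leaving other extensions untouched is an assumption.

```python
import os
import tempfile


def normalize_shp_path(path):
    """Append '.shp' if no extension is given and create the parent folder."""
    root, ext = os.path.splitext(path)
    if ext == '':
        # Assumption: only a missing extension is completed;
        # any other extension is left as it is.
        path = root + '.shp'
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = normalize_shp_path(os.path.join(tmp, 'exports', 'scenes'))
    folder_created = os.path.isdir(os.path.join(tmp, 'exports'))
    out_name = os.path.basename(out)

print(out_name, folder_created)
```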

filter_scenelist(scenelist)[source]

Filter a list of scenes by file names already registered in the database.

Parameters:

scenelist (list[str | ID]) – the scenes to be filtered

Return type:

list[str | ID]

Returns:

The objects of scenelist for all scenes whose basename is not yet registered in the database.
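
The basename comparison can be illustrated with plain strings. This sketch is not the library's implementation: the helper filter_by_basename is hypothetical, a Python list stands in for the database, and only file name strings are handled, whereas the method itself also accepts ID objects.

```python
import os


def filter_by_basename(scenelist, registered_basenames):
    """Keep the scenes whose file basename is not in the registered set."""
    registered = set(registered_basenames)
    return [s for s in scenelist if os.path.basename(s) not in registered]


# Hypothetical content of the database and a list of candidate files.
registered = ['S1A_scene_001.zip']
candidates = ['/data/a/S1A_scene_001.zip', '/data/b/S1A_scene_002.zip']

new_scenes = filter_by_basename(candidates, registered)
print(new_scenes)
```

Because only the basename is compared, a copy of an already registered scene in a different folder is filtered out as well.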

get_colnames(table='data')[source]

Return the names of all columns of a table.

Return type:

list[str]

Returns:

the column names of the chosen table

get_tablenames(return_all=False)[source]

Return the names of all tables in the database

Parameters:

return_all (bool) – by default, only the tables data and duplicates are returned. Set to True to also get all other tables and views that were created automatically.

Return type:

list[str]

Returns:

the table names

get_unique_directories()[source]

Get a list of directories containing registered scenes

Return type:

list[str]

Returns:

the directory names

import_outdated(dbfile)[source]

import an older database

Parameters:

dbfile (str | Archive) – the old database. If this is a string, the name of a CSV file is expected.

Return type:

None

insert(scene_in, pbar=False, test=False)[source]

Insert one or many scenes into the database

Parameters:
  • scene_in (str | ID | list[str | ID]) – a SAR scene or a list of scenes to be inserted

  • pbar (bool) – show a progress bar?

  • test (bool) – should the insertion only be tested or directly be committed to the database?

Return type:

None

is_registered(scene)[source]

Simple check if a scene is already registered in the database.

Parameters:

scene (str | ID) – the SAR scene

Return type:

bool

Returns:

is the scene already registered?

move(scenelist, directory, pbar=False)[source]

Move a list of files while keeping the database entries up to date. If a scene is registered in the database (in either the data or duplicates table), the scene entry is directly changed to the new location.

Parameters:
  • scenelist (list[str]) – the file locations

  • directory (str) – a folder to which the files are moved

  • pbar (bool) – show a progress bar?

Return type:

None
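
Moving files while keeping a registry in sync can be sketched as follows. This is a minimal standard-library illustration, not pyroSAR's implementation; the helper move_and_update and the dictionary standing in for the data and duplicates tables are hypothetical.

```python
import os
import shutil
import tempfile


def move_and_update(scenelist, directory, registry):
    """Move files into `directory` and rewrite matching registry entries."""
    os.makedirs(directory, exist_ok=True)
    for old in scenelist:
        new = os.path.join(directory, os.path.basename(old))
        shutil.move(old, new)
        # Update every entry that still points to the old location.
        for name, location in registry.items():
            if location == old:
                registry[name] = new


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'incoming', 'scene_a.zip')
    os.makedirs(os.path.dirname(src))
    open(src, 'w').close()

    registry = {'scene_a': src}  # hypothetical stand-in for the database
    target = os.path.join(tmp, 'archive')
    move_and_update([src], target, registry)

    moved_ok = os.path.isfile(registry['scene_a'])
    new_parent = os.path.basename(os.path.dirname(registry['scene_a']))
    old_gone = not os.path.exists(src)

print(moved_ok, new_parent, old_gone)
```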

select(sensor=None, product=None, acquisition_mode=None, mindate=None, maxdate=None, vectorobject=None, date_strict=True, processdir=None, recursive=False, polarizations=None, return_value='scene', **kwargs)[source]

select scenes from the database

Parameters:
  • sensor (str | list[str] | None) – the satellite sensor(s)

  • product (str | list[str] | None) – the product type(s)

  • acquisition_mode (str | list[str] | None) – the sensor’s acquisition mode(s)

  • mindate (str | datetime | None) – the minimum acquisition date; strings must be in format YYYYmmddTHHMMSS; default: None

  • maxdate (str | datetime | None) – the maximum acquisition date; strings must be in format YYYYmmddTHHMMSS; default: None

  • vectorobject (Vector | None) – a geometry with which the scenes need to overlap. The object may only contain one feature.

  • date_strict (bool) –

    treat dates as strict limits or also allow flexible limits to incorporate scenes whose acquisition period overlaps with the defined limit?

    • strict: start >= mindate & stop <= maxdate

    • not strict: stop >= mindate & start <= maxdate

  • processdir (str | None) – A directory to be scanned for already processed scenes; the selected scenes will be filtered to those that have not yet been processed. Default: None

  • recursive (bool) – (only if processdir is not None) should the subdirectories of processdir also be scanned?

  • polarizations (list[str] | None) – a list of polarization strings, e.g. ['HH', 'VV']

  • return_value (str | list[str]) –

    the query return value(s). Options:

    • geometry_wkb: the scene’s footprint geometry formatted as WKB

    • geometry_wkt: the scene’s footprint geometry formatted as WKT

    • mindate: the acquisition start datetime in UTC formatted as YYYYmmddTHHMMSS

    • maxdate: the acquisition end datetime in UTC formatted as YYYYmmddTHHMMSS

    • all further database column names (see get_colnames())

  • **kwargs (Any) – any further arguments (columns), which are registered in the database. See get_colnames()

Return type:

list[str | bytes] | list[tuple[str | bytes]]

Returns:

If a single return_value is specified: list of values for that attribute. If multiple return_values are specified: list of tuples containing the requested attributes. The return value type is bytes for geometry_wkb and str for all others.
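
The two date_strict rules listed above can be illustrated outside of any database. The following is a minimal sketch using only the standard library, not pyroSAR's query code; the helper in_range and the example timestamps are hypothetical. It parses strings in the documented YYYYmmddTHHMMSS format and applies the strict and the non-strict comparison to a scene whose acquisition straddles maxdate.

```python
from datetime import datetime

FMT = '%Y%m%dT%H%M%S'  # the documented YYYYmmddTHHMMSS format


def in_range(start, stop, mindate, maxdate, date_strict=True):
    """Apply the strict or the flexible date rule to one acquisition."""
    start, stop, mindate, maxdate = (
        datetime.strptime(x, FMT) for x in (start, stop, mindate, maxdate)
    )
    if date_strict:
        # the acquisition must lie entirely inside the limits
        return start >= mindate and stop <= maxdate
    # the acquisition only has to overlap with the limits
    return stop >= mindate and start <= maxdate


# Hypothetical scene acquired across the turn of the year 2017/2018.
start, stop = '20171231T235950', '20180101T000015'
mindate, maxdate = '20170101T000000', '20171231T235959'

strict_hit = in_range(start, stop, mindate, maxdate, date_strict=True)
flexible_hit = in_range(start, stop, mindate, maxdate, date_strict=False)
print(strict_hit, flexible_hit)
```

The scene is rejected under the strict rule because it ends after maxdate, and accepted under the flexible rule because it starts before maxdate.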

select_duplicates(outname_base=None, scene=None, value='id')[source]

Select scenes from the duplicates table. If both outname_base and scene are None, all scenes in the table are returned; otherwise only those scenes are returned that match whichever of the two is not None.

Parameters:
  • outname_base (str | None) – the basename of the scene

  • scene (str | None) – the scene name

  • value (Literal['id', 'scene']) – the return value; either ‘id’ or ‘scene’

Return type:

list[str]

Returns:

the selected scene(s)

property size: tuple[int, int]

get the number of scenes registered in the database

Returns:

the number of scenes in (1) the main table and (2) the duplicates table

static to_str(string, encoding='utf-8')[source]

Return type:

str

class pyroSAR.archive.SceneArchive(*args, **kwargs)[source]

Bases: Protocol

Common interface for scene archive backends.

Implementations may represent local databases, STAC catalogs, remote APIs, or other scene repositories, but should expose a consistent select method and support context-manager usage.

__enter__()[source]

Enter the archive context.

Return type:

SceneArchive

__exit__(exc_type, exc_val, exc_tb)[source]

Exit the archive context and release resources if necessary.

Return type:

None

close()[source]

Release open resources.

Implementations that do not hold resources may implement this as a no-op.

Return type:

None

static select(sensor=None, product=None, acquisition_mode=None, mindate=None, maxdate=None, vectorobject=None, date_strict=True, return_value='scene')[source]

Select scenes matching the query parameters.

Parameters:
  • sensor (str | list[str] | None) – One sensor or a list of sensors.

  • product (str | list[str] | None) – One product type or a list of product types.

  • acquisition_mode (str | list[str] | None) – One acquisition mode or a list of acquisition modes.

  • mindate (str | datetime | None) – Minimum acquisition date/time.

  • maxdate (str | datetime | None) – Maximum acquisition date/time.

  • vectorobject (Vector | None) – Spatial search geometry.

  • date_strict (bool) – Whether date filtering should be strict.

  • return_value (str | list[str]) – One return field or a list of return fields.

  • **kwargs – Backend-specific optional query arguments.

Return type:

list[Any]

Returns:

The query result. Implementations may return a list of scalar values or tuples depending on return_value.
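
A backend that satisfies this interface by structure alone could look like the sketch below. It is an illustration under stated assumptions and not part of pyroSAR: SceneArchiveLike is a hypothetical, reduced stand-in for the protocol (with select as an instance method and only two of the documented parameters), and InMemoryArchive with its rows is invented for the example. It shows context-manager usage and the scalar versus tuple results depending on return_value.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class SceneArchiveLike(Protocol):
    """Hypothetical, reduced stand-in for the interface described above."""

    def __enter__(self) -> 'SceneArchiveLike': ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> None: ...
    def close(self) -> None: ...
    def select(self, sensor=None, return_value='scene') -> List[Any]: ...


class InMemoryArchive:
    """Toy backend holding scene records as dictionaries."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

    def close(self):
        self.closed = True

    def select(self, sensor=None, return_value='scene'):
        sensors = [sensor] if isinstance(sensor, str) else sensor
        hits = [r for r in self.rows
                if sensors is None or r['sensor'] in sensors]
        if isinstance(return_value, str):
            # a single field: list of scalar values
            return [r[return_value] for r in hits]
        # several fields: list of tuples
        return [tuple(r[k] for k in return_value) for r in hits]


rows = [
    {'scene': '/data/s1a_001.zip', 'sensor': 'S1A', 'product': 'GRD'},
    {'scene': '/data/s1b_002.zip', 'sensor': 'S1B', 'product': 'SLC'},
]

with InMemoryArchive(rows) as archive:
    scenes = archive.select(sensor='S1A')
    pairs = archive.select(sensor=['S1A', 'S1B'],
                           return_value=['sensor', 'product'])

conforms = isinstance(archive, SceneArchiveLike)  # structural check only
closed_after_with = archive.closed
print(scenes, pairs, conforms, closed_after_with)
```

The isinstance check only verifies that the required methods are present; it does not validate signatures or return types.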

pyroSAR.archive.drop_archive(archive)[source]

drop (delete) a scene database

Parameters:

archive (Archive) – the database to be deleted

See also

sqlalchemy_utils.functions.drop_database()

Return type:

None

Examples

>>> import os
>>> from pyroSAR.archive import Archive, drop_archive
>>> pguser = os.environ.get('PGUSER')
>>> pgpassword = os.environ.get('PGPASSWORD')
>>> db = Archive('test', postgres=True, port=5432, user=pguser, password=pgpassword)
>>> drop_archive(db)