By Anastasios Mantas, University of Athens

A data cube is a multidimensional array of values. It is a natural data structure for storing analysis-ready Earth observation (EO) data and other kinds of multidimensional data. For this reason, a number of data cube infrastructures targeting EO data have been developed recently (e.g., the Open Data Cube infrastructure in Australia, and the Euro Data Cube and Earth System Data Cube funded by ESA). These data cube infrastructures offer libraries and APIs (e.g., datacube, YAXArrays) to store and query multidimensional data. However, before data cube infrastructures became a trend, there had already been substantial research and development on array database management systems (DBMSs) that offer declarative query languages for modeling and querying multidimensional data.

The notion of semantics for data cube systems was introduced by Augustin et al. (DATA, 2019). The key distinction between traditional data cube systems and semantic ones is that the former (almost exclusively) store numbers, while the latter also offer ways to store high-level concepts denoting the meaning behind those numbers. For instance, a 10-day average land surface temperature above 35°C can be concisely described as a heat wave. But how exactly can we build such a data cube system, so that we can leverage semantics and present information in the most comprehensible manner?
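To make the heat wave example concrete, here is a minimal Python sketch of turning stored numbers into a high-level concept. It is an illustration only, not how Plato derives concepts: the function names, the labels "HeatWave" and "Normal", and the temperature series are all hypothetical, and a real system would read the values from a data cube rather than from a list.

```python
def rolling_mean(values, window):
    """Return the mean of every full window of consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def label_heat_waves(daily_lst_celsius, window=10, threshold=35.0):
    """Attach a concept label to each full window, based on its average."""
    return ["HeatWave" if mean > threshold else "Normal"
            for mean in rolling_mean(daily_lst_celsius, window)]


# Hypothetical daily land surface temperatures (°C) for one grid cell:
# 5 mild days, 10 hot days, 5 mild days.
series = [30.0] * 5 + [38.0] * 10 + [30.0] * 5
labels = label_heat_waves(series)
```

Each entry of `labels` describes one 10-day window, so a query can ask for "heat waves" instead of spelling out the numeric condition.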

Plato enables the semantic enrichment of the Earth System Data Cube (ESDC). The system allows users to query EO data, other Linked Open Data, and information/knowledge extracted from the data, using a semantic query language, thus creating new value chains.

The development of Plato allows us to express the following classes of queries:

  1. Queries on the Earth Observation data.
  2. Semantic queries on the low-level content of the image.
  3. Semantic queries on the high-level content of the image.
  4. Any of the above query classes together with a spatial and temporal extent.
  5. Any of the above query classes together with a reference to an external data source.

In current data cubes, only queries of Classes 1 and 2 are supported. With Plato implemented in DeepCube, queries of all of the above classes can be posed. This is a significant research output of the project, which is also applicable to three Use Cases: Climate induced Migration in Africa, Fire Hazard Forecasting in the Mediterranean, and Copernicus Services for Sustainable Tourism.

The architectural overview of the system is shown in Figure 1 below. The core of Plato is the Ontop system. Ontop creates virtual geospatial Resource Description Framework (RDF) graphs on top of geospatial data models, such as the ones supported by ESDC. The geometries are then mapped to GeoSPARQL geometry literals using ontologies and Ontology-Based Data Access (OBDA)/R2RML mappings. Ontop can be used as a standard SPARQL endpoint that executes GeoSPARQL queries on top of ESDC. It can therefore be used alongside other tools that produce, manage, explore, and visualize geospatial RDF data.
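Conceptually, a mapping is a template that says which RDF triples a row of source data stands for. The Python sketch below illustrates that idea only. It is not Ontop's mapping syntax, and unlike Ontop, which keeps the graph virtual, it writes the triples out explicitly. The `ex:` namespace, the class `Observation`, the property `hasLST` and the row layout are hypothetical; the `geo:` terms come from the GeoSPARQL vocabulary.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
EX = "http://example.org/plato#"  # hypothetical namespace, for illustration


def row_to_triples(row):
    """Render one (id, lat, lon, lst) row as N-Triples-style statements."""
    subject = f"<{EX}observation/{row['id']}>"
    geometry = f"<{EX}geometry/{row['id']}>"
    # WKT points list the x coordinate (longitude) first, then y (latitude).
    wkt = f"POINT({row['lon']} {row['lat']})"
    return [
        f"{subject} <{RDF_TYPE}> <{EX}Observation> .",
        f'{subject} <{EX}hasLST> "{row["lst"]}"^^<{XSD_DOUBLE}> .',
        f"{subject} <{GEO}hasGeometry> {geometry} .",
        f'{geometry} <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .',
    ]


# Hypothetical row, as it might come out of a cube-backed table.
triples = row_to_triples({"id": 1, "lat": 38.0, "lon": 23.5, "lst": 36.2})
```

In an OBDA setting these triples are never stored. The mapping is used to translate a query over the graph into a query over the underlying tables.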

At the bottom, the data reside either in a standard data cube or in auxiliary data sources (e.g., CSV files or shapefiles). In order to utilize Ontop, we had to face the challenge of the impedance mismatch between a data model (an ontology) based on directed graphs of classes, instances, properties, attributes and values, and the Analysis Ready Data (ARD) stored in a data cube. The technologies we used to address this issue are PostGIS (a PostgreSQL extension for geospatial support), virtual tables implemented as Multicorn Foreign Data Wrappers (FDWs), and xarray. FDWs are libraries that can communicate with external data sources while hiding connection and data retrieval details. Multicorn is a PostgreSQL extension which enables us to implement our own FDWs in Python. Last but not least, xarray is a Python package which allows us to access data cubes conveniently.
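A virtual table of this kind can be sketched as a small Multicorn wrapper. The example below is a simplified illustration under stated assumptions, not Plato's actual FDW: `ToyCubeFDW`, `TOY_CUBE` and the column names are hypothetical, and an in-memory dictionary stands in for a cube that would normally be opened with xarray. Because the `multicorn` module is importable only inside PostgreSQL, the sketch falls back to a stand-in base class so that it also runs on its own.

```python
from collections import namedtuple

try:
    from multicorn import ForeignDataWrapper  # present only inside PostgreSQL
except ImportError:
    class ForeignDataWrapper:  # stand-in so the sketch runs standalone
        def __init__(self, options, columns):
            pass

# Hypothetical stand-in for a data cube: (time, lat, lon) -> temperature.
TOY_CUBE = {
    ("2020-07-01", 38.0, 23.5): 36.2,
    ("2020-07-01", 38.0, 24.0): 33.9,
    ("2020-07-02", 38.0, 23.5): 37.1,
}

# Mirrors the attributes of the filter objects Multicorn passes to execute().
Qual = namedtuple("Qual", "field_name operator value")


class ToyCubeFDW(ForeignDataWrapper):
    """Expose the toy cube as rows of a virtual table."""

    def __init__(self, options, columns):
        super().__init__(options, columns)
        # Name of the value column; "lst" is an assumed default.
        self.variable = options.get("variable", "lst")

    def execute(self, quals, columns):
        """Yield one dict per cube cell, restricted to the requested columns."""
        for (time, lat, lon), value in TOY_CUBE.items():
            row = {"time": time, "lat": lat, "lon": lon, self.variable: value}
            if all(self._matches(row, qual) for qual in quals):
                yield {name: row.get(name) for name in columns}

    @staticmethod
    def _matches(row, qual):
        # Only equality is pushed down here; other filters pass through
        # and are left for PostgreSQL to apply.
        if qual.operator == "=" and qual.field_name in row:
            return row[qual.field_name] == qual.value
        return True
```

Applying the equality filter inside the wrapper reduces how much data is read from the source, which matters when the cube is large.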

Figure 1: Architecture of Plato

So, what are the steps to develop an application in Plato?

First, based on the description of the input sources for the application, we need to build: a) an ontology that captures the knowledge of the specific domain, and b) a mapping file to map ontology concepts to the data sources. To populate the materialized tables, we then have to import every related CSV file and shapefile into PostGIS. Next, we must define the virtual tables, i.e., implement the FDWs for retrieving data from the cubes. Finally, we construct GeoSPARQL queries based on the application requirements and execute them through a GeoSPARQL endpoint, which is used as an API. Optionally, the results can be visualized using Sextant, a web-based app for interacting with time-evolving linked geospatial data. Figures 2 and 3 illustrate the workflow of our system, given a simple query for the Fire Hazard Use Case (Greek dataset).
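The last step, sending a GeoSPARQL query to the endpoint, can be sketched with the Python standard library. The endpoint URL, the `ex:` namespace, the class `ex:BurnedArea` and the bounding polygon are assumptions made for illustration; they are not taken from Plato's ontology. The request follows the standard SPARQL protocol (form-encoded `query` parameter, JSON results), and the sketch builds it without sending it.

```python
import json
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

ENDPOINT = "http://localhost:8080/sparql"  # hypothetical endpoint URL

# Hypothetical query: areas whose geometry lies within a bounding polygon.
QUERY = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex:   <http://example.org/plato#>

SELECT ?area ?wkt WHERE {
  ?area a ex:BurnedArea ;
        geo:hasGeometry ?g .
  ?g geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt,
    "POLYGON((19 34, 29 34, 29 42, 19 42, 19 34))"^^geo:wktLiteral))
}
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol POST request."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    }
    return Request(endpoint, data=body, headers=headers, method="POST")


def bindings_to_rows(payload):
    """Flatten a SPARQL JSON result document into plain dictionaries."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in payload["results"]["bindings"]]


request = build_request(ENDPOINT, QUERY)

# Hand-written sample response in the SPARQL JSON results format.
sample = json.loads("""
{"head": {"vars": ["area", "wkt"]},
 "results": {"bindings": [
   {"area": {"type": "uri", "value": "http://example.org/plato#area/1"},
    "wkt": {"type": "literal", "value": "POINT(23.5 38.0)"}}]}}
""")
rows = bindings_to_rows(sample)
```

Against a running endpoint, the request would be sent with `urllib.request.urlopen(request)` and the response body passed through `json.loads` before calling `bindings_to_rows`.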

Figure 2: Example query in Plato (natural language)
Figure 3: Visualization of query results in Sextant

Plato is currently the only semantic data cube system using OBDA technologies. In summary, this new system allows users to query Earth Observation data and information/knowledge extracted from that data, using a semantic query language. Each query is rewritten using ontology axioms and mappings, and is executed at the data sources (data cubes and other external data sources). The answers are then collected and returned to the user.