Earth System Data Cube

Image source: Mahecha, Miguel (2017): Earth System Data Cube. figshare. Figure. https://doi.org/10.6084/m9.figshare.4822930.v2

The Earth System Data Cube (ESDC) was jointly developed by the Max Planck Institute for Biogeochemistry and Brockmann Consult GmbH in the context of the ESA-funded project Earth System Data Lab, a virtual lab for accessing a wide array of Earth observations across space, time and variables. ESDC organizes raster data (a process known as “cubing”) into a structure that enables efficient spatio-temporal querying and analysis of time series of satellite imagery. The technology transforms multi-modal Earth Observation data into Analysis Ready Data (ARD) and provides tools for detecting trends and running analyses at scale without relying on in-memory-only operations. Extended functionalities of ESDC include the handling of gridded data at multiple resolutions in space and time, support for categorical data types, storage layouts optimized for either time series access or spatial slicing, and interfaces to Machine Learning algorithms (e.g. chunking and batch generation) that use both spatial and temporal convolutions.
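The trade-off between time series access and spatial slicing mentioned above can be illustrated with a little arithmetic. The following is a minimal sketch, not ESDC code: the cube size, both chunk layouts and the helper `chunks_touched` are hypothetical and chosen only to show why one chunk layout rarely serves both access patterns well.

```python
from math import ceil, prod


def chunks_touched(chunk_shape, request_shape):
    """Count the chunks a read must load.

    Assumes the request starts at the cube origin, so it is aligned
    with the chunk grid. Shapes are (time, latitude, longitude).
    """
    if len(chunk_shape) != len(request_shape):
        raise ValueError("chunk and request must have the same rank")
    return prod(ceil(r / c) for r, c in zip(request_shape, chunk_shape))


# Hypothetical cube: 1000 time steps on a 720 x 1440 grid.
TIME_SERIES_CHUNKS = (1000, 16, 16)  # long in time, small in space
MAP_CHUNKS = (1, 720, 1440)          # one full map per time step

pixel_series = (1000, 1, 1)   # full time series at a single grid cell
global_map = (1, 720, 1440)   # one complete map at a single time step

series_on_ts_layout = chunks_touched(TIME_SERIES_CHUNKS, pixel_series)  # 1
series_on_map_layout = chunks_touched(MAP_CHUNKS, pixel_series)         # 1000
map_on_ts_layout = chunks_touched(TIME_SERIES_CHUNKS, global_map)       # 4050
map_on_map_layout = chunks_touched(MAP_CHUNKS, global_map)              # 1
```

Under these assumed numbers, a layout tuned for time series makes a single map read touch thousands of chunks, and the reverse holds for a map-oriented layout, which is why the layout has to be chosen with the dominant access pattern in mind.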

The Earth System Data Cube seeks to be a service to the scientific community that facilitates access to and exploitation of multivariate datasets in the Earth sciences. The atmosphere, terrestrial biosphere, hydrosphere, pedosphere and oceans are characterized by a series of complex phenomena whose interactions control the dynamics of the Earth system as a whole. Today we are confronted with a multitude of long-term and spatially connected data streams that monitor these phenomena and help us to quantify, for example, the condition of terrestrial vegetation, its seasonal dynamics, trends and extreme anomalies. However, it remains very difficult to analyze multiple data streams jointly and so understand the interactions between the Earth’s subsystems. The obstacles range from poor data discoverability and formatting inconsistencies to incompatible spatio-temporal resolutions and access restrictions. ESDC overcomes these barriers in a generic way. The core of the ESDC is the data in analysis-ready form, together with tools and methods to generate, access and exploit it.

Image source: Mahecha, M. D., Gans, F., Brandt, G., Christiansen, R., Cornell, S. E., Fomferra, N., Kraemer, G., Peters, J., Bodesheim, P., Camps-Valls, G., Donges, J. F., Dorigo, W., Estupinan-Suarez, L. M., Gutierrez-Velez, V. H., Gutwin, M., Jung, M., Londoño, M. C., Miralles, D. G., Papastefanou, P., and Reichstein, M.: Earth system data cubes unravel global multivariate dynamics, Earth Syst. Dynam., 11, 201–234, https://doi.org/10.5194/esd-11-201-2020, 2020.

Data Cubes – a data cube essentially consists of screened data, also called Analysis Ready Data (ARD), with the dimensions “latitude”, “longitude”, “time” and “variable”. Further dimensions can be added as the result of an analysis. Currently, ESDC supports a common spatio-temporal grid; DeepCube will extend this to data cubes in which information layers are stored at heterogeneous spatio-temporal resolutions.
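To make the cube structure described above concrete, here is a minimal sketch that uses a plain NumPy array in place of a real data cube. It is an illustration under stated assumptions, not the ESDC data model: the dimension order, the variable names, the random values and the helper `select` are all hypothetical.

```python
import numpy as np

# Hypothetical dimension order and variable names, for illustration only.
DIMS = ("time", "latitude", "longitude", "variable")
VARIABLES = ("air_temperature", "precipitation")

# A tiny cube on a common grid: 12 time steps, 4 x 8 grid cells, 2 variables.
rng = np.random.default_rng(0)
cube = rng.random((12, 4, 8, len(VARIABLES)))


def select(cube, **indexers):
    """Index the cube by dimension name instead of by axis position."""
    index = [slice(None)] * cube.ndim
    for name, idx in indexers.items():
        index[DIMS.index(name)] = idx
    return cube[tuple(index)]


# Time series of one variable at one grid cell.
series = select(
    cube, latitude=2, longitude=5, variable=VARIABLES.index("precipitation")
)

# Map of one variable at one time step.
snapshot = select(cube, time=0, variable=0)

# An analysis can introduce a new dimension: computing three quantiles
# along "time" yields a cube of shape (quantile, latitude, longitude, variable).
quantiles = np.quantile(cube, [0.1, 0.5, 0.9], axis=DIMS.index("time"))
```

The last step shows how a dimension can appear as the result of an analysis: the time axis is consumed and a quantile axis takes its place.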

Framework – ESDC offers a framework for efficiently mapping user-defined functions onto these data cubes. It forms a virtual laboratory, accessed through JupyterLab, that lets users concentrate fully on exploring high-dimensional data across domains. Users write their own functions, which are then mapped over the dimensions of the cube. Currently we support Python, Julia and R. ESDC is committed to open-source computation and open data usage. Dynamic resource allocation and rapid scalability are the cornerstones of ESDC for data analysis in the cloud.
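The idea of mapping a user-defined function over a cube dimension can be sketched as follows. This is a NumPy analogy rather than the ESDC API: `linear_trend` and `map_over_time` are hypothetical names, the cube is synthetic, and `np.apply_along_axis` stands in for whatever mapping machinery the framework actually provides.

```python
import numpy as np


def linear_trend(series):
    """User-defined function: least-squares slope of one time series."""
    t = np.arange(series.size)
    slope, _intercept = np.polyfit(t, series, deg=1)
    return slope


def map_over_time(cube, func, time_axis=0):
    """Apply func to every 1-D time series in the cube.

    The time axis is consumed, so a (time, lat, lon) cube
    becomes a (lat, lon) map of results.
    """
    return np.apply_along_axis(func, time_axis, cube)


# Synthetic cube with a known slope at every grid cell (assumed data).
time = np.arange(24, dtype=float)
true_slopes = np.linspace(-1.0, 1.0, 12).reshape(3, 4)
cube = time[:, None, None] * true_slopes  # shape (24, 3, 4)

trend_map = map_over_time(cube, linear_trend)
```

The user only writes the per-series function; how it is distributed over the remaining dimensions is left to the mapping layer, which is the separation of concerns the framework description emphasizes.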

Interested in learning more? Contact us!
Fabian Gans, [email protected]