Proposed Definition and Goals of Earth Science Data Analytics

From Earth Science Information Partners (ESIP)

[The following is from Steve Kempler's email to the ESIP ExComm from 9 Dec 2015]

Earth Science Data Analytics (ESDA) Definition and Goals

Earth Science Data Analytics encapsulates the processes and workflows for working with data, specifically from the perspective of scientific needs. Definitions of data analytics found in the literature are based on problems solved in the business world, which are characterized by finite inputs, constraints, and well-defined solution sets. Analyzing heterogeneous Earth science data calls for very different methods and a different orientation toward results. Specifically, Earth science data analytics involves problems that are solved using a vast number of combinations of inputs and applied methods, and whose results cannot be known in advance but continue to evolve. Thus, there is a need to define Earth Science Data Analytics (ESDA). Having a clear ESDA definition (currently absent from the literature) will facilitate the development of ESDA techniques and tools that focus on Earth science. In addition, specifying science-driven ESDA goals will further refine data analytics methods so that they address the specific needs of Earth science researchers in maximizing the co-utilization of today’s and tomorrow’s large, heterogeneous datasets.

The ESDA Cluster proposes that the ESIP Federation adopt the following Earth Science Data Analytics definition:

The process of examining, preparing, reducing, and analyzing large amounts of spatial (multi-dimensional), temporal, or spectral data using a variety of data types to uncover patterns, correlations and other information, to better understand our Earth.

This encompasses:

  • Data Preparation – Preparing heterogeneous data so that they can be jointly analyzed
  • Data Reduction – Correcting, ordering and simplifying data in support of analytic objectives
  • Data Analysis – Applying techniques/methods to derive results
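The three stages above can be illustrated with a minimal sketch. It assumes two hypothetical hourly temperature records of the same place: a station record in kelvin and a satellite product in degrees Celsius, each keyed by hour with None marking gaps. The function names `prepare`, `reduce_rows`, and `analyze`, and the 60 °C plausibility threshold, are illustrative assumptions and not part of the ESDA proposal.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical record format: hour -> value, with None marking a missing value.
Series = Dict[int, Optional[float]]
Row = Tuple[int, Optional[float], Optional[float]]


def prepare(station_k: Series, satellite_c: Series) -> List[Row]:
    """Data preparation: convert the station record to deg C and keep only
    hours present in both records, so the two can be analyzed jointly."""
    shared = set(station_k) & set(satellite_c)
    rows = []
    for t in shared:
        k = station_k[t]
        rows.append((t, None if k is None else k - 273.15, satellite_c[t]))
    return rows


def reduce_rows(rows: List[Row], max_abs_c: float = 60.0) -> List[Tuple[int, float, float]]:
    """Data reduction: drop missing or implausible values (a simple quality
    screen, threshold assumed), then order the rows by time."""
    kept = [
        (t, a, b) for t, a, b in rows
        if a is not None and b is not None
        and abs(a) <= max_abs_c and abs(b) <= max_abs_c
    ]
    return sorted(kept)  # tuples sort by hour first


def analyze(rows: List[Tuple[int, float, float]]) -> Dict[str, float]:
    """Data analysis: mean difference and Pearson correlation between records."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two aligned samples")
    a = [r[1] for r in rows]
    b = [r[2] for r in rows]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant record")
    return {
        "n": n,
        "mean_diff": mean_a - mean_b,
        "pearson_r": cov / math.sqrt(var_a * var_b),
    }


station = {0: 288.15, 1: 289.15, 2: None, 3: 291.15, 4: 292.15}
satellite = {0: 15.5, 1: 16.4, 2: 17.0, 3: 18.6, 5: 20.0}
result = analyze(reduce_rows(prepare(station, satellite)))
```

Here hours 4 and 5 are dropped during preparation because only one record covers them, and hour 2 is dropped during reduction because the station value is missing. Real ESDA preparation would typically also involve regridding, reprojection, and collocation, which this sketch omits.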

The ESDA Cluster proposes that the ESIP Federation adopt the following Goals of Earth Science Data Analytics:

ESDA Goals (read each goal as: Earth science data analytics are needed ...)

  1. To calibrate data
  2. To validate data (note: validation need not be performed via data intercomparison)
  3. To assess data quality
  4. To perform coarse data preparation (e.g., subsetting data, mining data, transforming data, recovering data)
  5. To intercompare datasets (i.e., any data intercomparison; could be used to better define validation/quality)
  6. To tease out information from data
  7. To glean knowledge from data and information
  8. To forecast/predict/model phenomena (i.e., a special kind of conclusion)
  9. To derive conclusions (i.e., conclusions that do not easily fall into another type)
  10. To derive new analytics tools
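Goals 2 and 5 (validation and intercomparison) are often summarized with a few standard difference statistics. The sketch below is an illustration under stated assumptions, not a method prescribed by the ESDA Cluster: it assumes the test and reference datasets have already been collocated into equal-length sequences, and the function name `intercompare` is hypothetical. It reports the mean bias, the root-mean-square error (RMSE), and the unbiased RMSE (the spread of the differences once the bias is removed, using the identity RMSE² = bias² + σ² with the population standard deviation σ).

```python
import math
from typing import Dict, Sequence


def intercompare(test: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """Compare a test dataset with a reference dataset sampled at the same,
    already collocated points, and return common summary statistics."""
    if len(test) != len(reference) or len(test) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    diffs = [t - r for t, r in zip(test, reference)]
    n = len(diffs)
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Unbiased RMSE: RMSE with the mean bias removed (population std of diffs).
    # max() guards against tiny negative values from floating-point rounding.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    return {"n": n, "bias": bias, "rmse": rmse, "ubrmse": ubrmse}


stats = intercompare([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 2.0, 4.0])
```

Separating bias from ubRMSE shows whether disagreement between datasets is mainly a constant offset, which calibration might address, or random scatter.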

Articulating a definition of Earth Science Data Analytics through the ESIP Federation is important because the ESIP Federation is a leading organization in the advancement of Earth science information management and technology and, through its members, is in a position to lead the way in understanding, defining, promoting, developing, and implementing Earth science data analytics in support of advancing the usage and usability of ‘big’, heterogeneous data. If the Federation adopts them, the ESDA Cluster will publish the ESIP ESDA definition and goals, a breakthrough in understanding and promulgating the increasingly sophisticated requirements involved in Earth science research.