Earth Science Data Analytics/2015-8-20 Telecon

From Federation of Earth Science Information Partners

ESDA Telecon notes – 8/20/15

Known Attendees:

ESIP Host (Annie Burgess), Steve Kempler, Chung-lin Shie, Tiffany Mathews, Sean Barberie, Beth Huffer, Byron Peters, Denise, Jennifer Davis, Emily Northup, Soren Scott, Brand Niemann, Joan Aron, Rob Casey, Ward Fleir, Robert Downs, Victor Zlotnicki, Ethan McMahon

Agenda:

1. Summer sessions recap

2. (Coming closer to) finalizing ESDA Definitions and Types

3. Use Cases

4. Next Steps (further characterize types --> requirements)

5. Open Mic


Presentations:

None, this time.

Use Case Information: https://docs.google.com/document/d/1U1mAt4ZjJqXeNmtRoE4VbI1nBgS1v7DzeHib_7mzOF8/edit

Notes:

Thank you all for attending and participating in a very productive meeting.


First, allow me to apologize for having been remiss in not introducing Sean Barberie to our group. Sean is an ESIP student fellow who asked to join and help out the ESDA Cluster, as fellows do. Eager to learn, sharp with computer science/technology, and a good 'fellow', he will soon graduate (or has recently graduated ... hint hint).


Steve started by recapping the two ESIP ESDA summer sessions: 'Teaching Science Data Analytics Skills, and the Earth Science Data Scientist' (http://commons.esipfed.org/node/7999) and 'The Need for Earth Science Data Analytics to Facilitate Community Resilience (and other applications)' (http://commons.esipfed.org/node/7998). In the former session, three guest speakers shared personal experience and insights into the expertise/training needed to be a Data Scientist and perform Data Analytics, along with examples of Earth science research that requires such skills. The latter session made progress in developing a proposed ESIP Earth science data analytics definition and in defining Earth science data analytics types (renamed 'goals' in today's telecon). This discussion continued today...


At the start of the discussion, the proposed Data Analytics definition unique to Earth science was:

The process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:

  • Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
  • Data Reduction – Smartly removing data that do not fit research criteria
  • Data Analysis – Applying techniques/methods to derive results


Sean reminded us that Earth science data are spatial and temporal in nature, and we realized that our definition needs to recognize this characteristic of Earth science data to make it unique to ESDA. Tiffany, Joan, and others concurred, and Victor provided text that enhanced our ESDA definition, as follows:


The process of examining large amounts of spatial (3D), temporal, and/or spectral data of a variety of data types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:

  • Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
  • Data Reduction – Smartly removing data that do not fit research criteria
  • Data Analysis – Applying techniques/methods to derive results


Like it?


We next moved into a discussion about ESDA types and were reminded why a clear definition of ESDA types is important: to better identify key needs that data analytics tools/techniques can be developed to address. (Data Analytics tools typically do not address all ESDA needs, and it is possible that some ESDA needs are not addressed, at least not very well, by current Data Analytics tools.) Bottom line: to specify ESDA tool/technique requirements by type.


It was quickly decided that 'types' was neither a good term for specifying the different ways ESDA are used nor indicative of results-oriented Earth science data analytics. In addition, 'types' is a term used to describe data analytics that is more applicable to the business data analytics world. To avoid confusion, we blessed the term 'goals', and have now defined ESDA Goals as a means of distinguishing the key needs that data analytics tools/techniques can be developed for.


ESDA Goals (read: Earth science data analytics needed ...)

1. To calibrate data

2. To validate data (note: it does not have to be via data intercomparison)

3. To assess data quality

4. To perform coarse data preparation (e.g., subsetting, data mining, transformations, data recovery)

5. To intercompare data (i.e., any data intercomparison; could be used to better define validation/quality)

6. To tease out information from data

7. To glean knowledge from data and information

8. To forecast/predict phenomena (i.e., a special kind of conclusion)

9. To derive conclusions (i.e., conclusions that do not easily fall into another type)

10. To derive new analytics tools


Chung-Lin and Rob advocated for an 11th ESDA Goal: 'Improve data quality'. Post-telecon communications have suggested it would be just as good to redefine Goal #3, above, as 'Assess and Improve Data Quality'. However, to scratch all itches, 'Assess and Improve Data Quality (includes: evaluate and assess data; improve dataset; determine improvement actions)' is also offered. Note that this simply adds a description to the goal; adding such descriptions is an action to be performed for all goals.


Thoughts?


Other goal suggestions:

  • Validate forecast models. However, Goal #2 does not distinguish what data to validate, so it already covers this case.
  • Goal #1, Calibration, is needed before achieving all other goals. However, calibration can also be an end goal in itself, one that requires its own analytics techniques.


During the telecon, Steve provided a 'to do' list describing our road ahead, which included:


Almost done:

1. Finalize ESDA Definition and Goal categories

Start thinking about next:

2. Write a white paper for ESIP Executive Committee proposing that the ESDA Definitions and Goal categories be ESIP approved

3. Acquire many more use cases

4. Characterize use cases by Goal categories and other analytics driving considerations

5. Derive requirements from #4

6. Survey existing data analytics tools/techniques

7. Write our paper describing ... all the above




Going to AGU?

The following Data Analytics/Big Data-related sessions are scheduled at AGU in December:

  • Advanced Information Systems to Support Climate Projection Data Analysis

Gerald L Potter, Tsengdar J Lee, Dean Norman Williams, and Chris A Mattmann

  • Big Data Analytics for Scientific Data

Emily Law, Michael M Little, Daniel J Crichton, and Padma A Yanamandra-Fisher

  • Big Data in Earth Science – From Hype to Reality

Kwo-Sen Kuo, Rahul Ramachandran, Ben James Kingston Evans, and Mike M Little

  • Big Data in the Geosciences: New Analytics Methods and Parallel Algorithms

Jitendra Kumar and Forrest M Hoffman

  • Computing Big Earth Data

Michael M Little, Darren L Smith, Piyush Mehrotra, and Daniel Duffy

  • Geophysical Science Data Analytics Use Case Scenarios

Steven J Kempler, Robert R Downs, Tiffany Joi Mathews, and John S Hughes

  • Man vs. Machine - Machine Learning and Cognitive Computing in the Earth Sciences

Jens F Klump, Xiaogang Ma, Jess Robertson and Peter A Fox

  • New approaches for designing Big Data databases

David W Gallaher and Glenn Grant

  • Partnerships and Big Data Facilities in a Big Data World

Kenneth S Casey and Danie Kinkade

  • Towards a Career in Data Science: Pathways and Perspectives

Karen I Stocks, Lesley A Wyborn, Ruth Duerr, and Lynn Yarmey


Next Telecon:

Thursday, September 17, 2015, 3:00 EST


Agenda:

Among other things, finalize the ESDA Definition and Goals, and begin preparing a statement for ESIP approval

Actions:

Steve: Finish adding ESDA Goal Description text

All: Review and provide comments on the ESDA Definition and Goal categories