Earth Science Data Analytics/2015-9-17 Telecon

From Earth Science Information Partners (ESIP)
 
Latest revision as of 13:06, November 6, 2015

ESDA Telecon notes – 9/17/15

Known Attendees:

ESIP Host (Annie Burgess), Steve Kempler, Tiffany Mathews, Sean Barberie, Beth Huffer, Denise, Robert Downs, Thomas Hearty


Agenda:

1. Use Cases

A. Updated Use Case Template with cluster recommendations. Added: Dominant Data Analytics

B. Use Case Status – one more added

2. Finalize ESDA Definition and Goals

3. Associate Analytics Tools/Techniques Requirements with Analytics Goals

4. Are we ready to propose an ESIP ESDA Definition and Goals statement for ESIP approval?

5. Open Mic


Presentations:

None, this time.

Use Case Information: https://docs.google.com/document/d/1U1mAt4ZjJqXeNmtRoE4VbI1nBgS1v7DzeHib_7mzOF8/edit


Notes:

Thank you all for attending and participating in our telecon.


Discussion began with finalizing the ESIP ESDA Definition and Goals for endorsement by the ESIP Federation. The process for gaining ESIP approval was then described:

  • Bring a short white paper describing what is to be endorsed to the ESIP Executive Committee (ExComm).
  • This is followed by a 30-day review cycle.
  • Questions, suggestions, and recommendations for updating the proposed endorsement are provided, or an explanation for not endorsing is given.
  • The endorsement is put up for a vote.

Steve will initiate the writing of the white paper and is seeking writing and reviewing partners.


After a few more minor tweaks, this ESIP cluster's definition for Earth Science Data Analytics is:

The process of examining large amounts of spatial (3D), temporal, and/or spectral data of a variety of data types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:

  • Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
  • Data Reduction – Smartly removing data that do not fit research criteria
  • Data Analysis – Applying techniques/methods to derive results


The goals of Earth Science Data Analytics, into which such analytics can be categorized, include:


ESDA Goals (read: Earth science data analytics needed ...)

1. To calibrate data

2. To validate data (note: this need not be done via data intercomparison)

3. To assess data quality

4. To perform coarse data preparation (e.g., subsetting, data mining, transformations, data recovery)

5. To intercompare data (i.e., any data intercomparison; could be used to better define validation/quality)

6. To tease out information from data

7. To glean knowledge from data and information

8. To forecast/predict phenomena (i.e., a special kind of conclusion)

9. To derive conclusions (i.e., conclusions that do not easily fall into another type)

10. To derive new analytics tools


These will be the basis for the ESIP Federation definition and goals for Earth Science Data Analytics.


During the telecon, Steve reviewed a 'to do' list describing our road ahead, which included:

Done:

1. Finalize ESDA Definition and Goal categories

Initiate:

2. Write a white paper for ESIP Executive Committee proposing that the ESDA Definitions and Goal categories be ESIP approved

3. Acquire many more use cases

4. Characterize use cases by Goal categories and other analytics driving considerations

5. Derive requirements from #4

6. Survey existing data analytics tools/techniques

7. Write our paper describing ... all the above


Questions to think about:

What is the best way to record use cases, their associated requirements, and matching tools? A forum?



Going to AGU?

The following Data Analytics / Big Data related sessions are scheduled at AGU in December:

  • Advanced Information Systems to Support Climate Projection Data Analysis

Gerald L Potter, Tsengdar J Lee, Dean Norman Williams, and Chris A Mattmann

  • Big Data Analytics for Scientific Data

Emily Law, Michael M Little, Daniel J Crichton, and Padma A Yanamandra-Fisher

  • Big Data in Earth Science – From Hype to Reality

Kwo-Sen Kuo, Rahul Ramachandran, Ben James Kingston Evans, and Mike M Little

  • Big Data in the Geosciences: New Analytics Methods and Parallel Algorithms

Jitendra Kumar and Forrest M Hoffman

  • Computing Big Earth Data

Michael M Little, Darren L. Smith, Piyush Mehrotra, and Daniel Duffy

  • Geophysical Science Data Analytics Use Case Scenarios

Steven J Kempler, Robert R Downs, Tiffany Joi Mathews, and John S Hughes

  • Man vs. Machine - Machine Learning and Cognitive Computing in the Earth Sciences

Jens F Klump, Xiaogang Ma, Jess Robertson, and Peter A Fox

  • New approaches for designing Big Data databases

David W Gallaher and Glenn Grant

  • Partnerships and Big Data Facilities in a Big Data World

Kenneth S Casey and Danie Kinkade

  • Towards a Career in Data Science: Pathways and Perspectives

Karen I Stocks, Lesley A Wyborn, Ruth Duerr, and Lynn Yarmey


Next Telecon:

Thursday, November 12, 2015, 3:00 EST


Agenda:

Among other things, discuss the statement for ESIP approval; discuss the process for matching use case requirements with capabilities of existing tools.

Actions:

Steve: Initiate draft endorsement paper

Volunteers: Review endorsement paper, when ready

All: Think about process for matching use case requirements with capabilities of existing tools.