Difference between revisions of "Earth Science Data Analytics/2015-8-20 Telecon"

From Earth Science Information Partners (ESIP)
 
(12 intermediate revisions by the same user not shown)
Line 38: Line 38:
  
  
The telecon started with a Data Analytics definition unique Earth science that looked like this:
+
The following was the Data Analytics definition unique to Earth science at the start of the discussion:
  
 
The process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:
 
The process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:
  * Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
+
* Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
  * Data Reduction – Smartly removing data that do not fit research criteria
+
* Data Reduction – Smartly removing data that do not fit research criteria
  * Data Analysis – Applying techniques/methods to derive results
+
* Data Analysis – Applying techniques/methods to derive results
  
  
We then discussed our 2 ESIP ESDA sessions.
+
Prompted by Sean, who reminded us that Earth science data is spatial and temporal in nature, we realized that our definition needs to recognize this characteristic to make it unique to ESDA. Tiffany, Joan, and others concurred, and Victor provided text that enhanced our ESDA definition, as follows:
  
1.  Teaching Science Data Analytics Skills, and the Earth Science Data Scientist (http://commons.esipfed.org/node/7999)
 
  
We will have 4 speakers, who will provide their experiences in being, or needing, a Data Scientist in their work.  The goal of this session is to discuss and extract real project data scientist/analytics experience needs, initiated by presentation and discussed by session participants.  Of special interest is bringing together people who have needs for data scientists (data analytics) and will be able to articulate those needs by the end of the session, and/or stirring ideas for the use of data analytics in their research or for building tools/services for others.
+
'''The process of examining large amounts of spatial (3D), temporal, and/or spectral data of a variety of data types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:'''
 +
* '''Data Preparation – Preparing heterogeneous data so that they can ‘play’ together'''
 +
* '''Data Reduction – Smartly removing data that do not fit research criteria'''
 +
* '''Data Analysis – Applying techniques/methods to derive results'''
  
Presenters:
 
  
Wade Bishop, School of Information Sciences, University of Tennessee
+
Like it?
  
Peter Fox, Earth & Environmental Sciences, Tetherless World Constellation, Rensselaer Polytechnic Institute
 
  
Lewis McGibbney, Computer Science for Data Intensive Systems Group, Jet Propulsion Laboratory
+
We next moved into a discussion about ESDA types, and were reminded why it is important to have a clear definition of ESDA types:  to better identify key needs that data analytics tools/techniques can be developed to address (i.e., Data Analytics tools typically do not address all ESDA needs; it is possible that ESDA needs are not addressed, at least not very well, by current Data Analytics tools).  Bottom line:  To specify ESDA tool/technique requirements by type.
  
Karen Stocks, Director, Geological Data Center, Scripps Institution of Oceanography
 
  
 +
'''It was quickly decided that 'types' was not a good term for specifying the different ways ESDA is used, nor is it indicative of results-oriented Earth science data analytics'''.  In addition, 'types' is a term more commonly applied to data analytics in the business world.  To avoid confusion, we blessed the term 'goals', and now have '''defined ESDA Goals as a means for distinguishing key needs that data analytics tools/techniques can be developed for.'''
  
2. The Need for Earth Science Data Analytics to Facilitate Community Resilience (and other applications) (http://commons.esipfed.org/node/7998)
 
  
This will be an open-forum, idea-sharing discussion session.  This cluster session will review our current work (for new participants), followed by discussion on the extent of social, economic, and environmental issues, as well as science research, in which the advancement of Earth science data analytics has had an impact. The goal of this discussion is to gain sufficient information to categorize how Earth science data analytics has come to be used in our society, and identify use cases that exemplify this.
+
'''ESDA Goals (read: Earth science data analytics needed ...)'''
  
 +
'''1    To calibrate data'''
  
Today, we discussed:    - What should our summer session goals be and how do we get there? 
+
'''2    To validate data (note it does not have to be via data intercomparison)'''
  
The following goals were identified:
+
'''3    To assess data quality'''
  
 +
'''4    To perform coarse data preparation (e.g., subsetting, data mining, transformations, data recovery)'''
  
1. Identify use cases that address societal issues, specifically
+
'''5    To intercompare data (i.e., any data intercomparison; could be used to better define validation/quality)'''
  
2.  Identify datasets (Earth science and otherwise) needed to address the use case issues
+
'''6    To tease out information from data'''
  
3.  Identify techniques that may be applied to gleaning information out of the data targeted at addressing the issues
+
'''7    To glean knowledge from data and information'''
  
 +
'''8    To forecast/predict phenomena (i.e., a special kind of conclusion)'''
  
What we should keep in the back of our mind is how we can show the significance of Earth science data analytics in addressing societal issues.
+
'''9    To derive conclusions (i.e., that do not easily fall into another type)'''
  
 +
'''10  To derive new analytics tools'''
  
The telecon ended with a reminder by Annie that presenting an ESDA cluster poster at the summer meeting would be good.  Describing the activities, speakers, and things we learned would be a good idea.  We'll put a poster together based on information we shared through our website.
 
  
 +
Chung-Lin and Rob advocated for an 11th ESDA Goal:  'Improve data quality'.  Post-telecon communications have suggested it would be just as good to redefine Goal #3, above, as:  'Assess and Improve Data Quality'.  However, to scratch all itches, 'Assess and Improve Data Quality (includes:  Evaluate, assess data; Improve dataset; Determine improvement actions)' is also offered.  Note that this just provides more description of the goal, which is an action to be performed for all goals.
  
After our telecon Rob Casey shared a very insightful e-mail regarding ESDA's relationship to community resilience.  (Thanks Rob)  With permission, I am providing Rob's insights here, and feel they can stimulate further discussion of this in July (unfortunately, Rob won't be able to make the meeting).
 
  
---------
+
Thoughts?
From Rob:
 
  
I think one of the first things to consider, which can help answer the 'why' of ESDA, is what does it mean to benefit society?  What does society need from science?  What is community resilience?
 
  
A community is resilient if it can effectively respond to unexpected events.
+
Other goal suggestions:
  
A community is resilient if it can prepare or engineer for dangerous eventualities.
+
* Validate forecast models.  But Goal #2 does not distinguish what data to validate. 
 +
* Goal #1, Calibration, is needed before achieving all other goals.  But calibration can be an end goal in itself that requires its own analytics techniques.
  
A community is resilient if it can avert a dangerous event through preemptive action.
 
  
 +
During the telecon, Steve provided a 'to do' list to describe our road ahead, that included:
  
The state of affairs with science as applied to benefitting society has been observation of past data to explain phenomena and detection systems that can serve as warning measures to protect a population from a danger in progress.
 
  
ESDA can serve to take us beyond this state of affairs at a number of levels:
+
Almost done:
  
* Gathering of a larger variety of datasets to apply correlations and identify possible precursors to adverse events -- this can improve early warning systems and enhance predictive algorithms to plot the course or effect of damaging events
+
1.  Finalize ESDA Definition and Goal categories
  
* Gathering and computation of a large enough set of data at high resolution to create detailed and reliable simulations of adverse events to establish hazard probabilities -- this informs disaster preparedness planning as well as engineering preparation
+
Start thinking about next:
  
*  Establishing associations of possible cause and effect occurrences, and applying these to predictive models.  Knowing causes of adverse effects can establish a target for mitigation or avoidance.  Accurate prediction of future effects can motivate society to act more quickly and effectively to curtail the looming issue.
+
2.  Write a white paper for ESIP Executive Committee proposing that the ESDA Definitions and Goal categories be ESIP approved
  
* Refining the quality of gathered data will improve its usefulness and accuracy for long-tail studies.  Data preparation is a key ingredient to useful and meaningful analytics.  A proper program of data governance and continual data improvement ensures that data is always available and always useful.
+
3. Acquire many more additional use cases
 +
 
 +
4.  Characterize use cases by Goal categories and other analytics driving considerations
 +
 
 +
5.  Derive requirements from #4
 +
 
 +
6.  Survey existing data analytics tools/techniques
 +
 
 +
7.  Write our paper describing ... all the above
  
*  Good analytics also requires good tools and good visuals.  Can the right people see the information they need easily and readily?  Is it presented in such a way that it is useful, meaningful, and comprehensible to scientists and decision makers?  Is there a base of tool technology that allows for grassroots growth of data analysis with community contribution?
 
  
The theme to this session will need to apply what is possible with ESDA technology and techniques and bring it back to where society will benefit.  Society needs to evolve from studying a problem to predicting and even preventing a problem to attain maximum resilience.  So many of our ills that deal with nature and anthropogenic effects on nature can be readily listed and an agency studying it can be identified.
 
 
----------
 
----------
  
  
FYI:
+
Going to AGU?
  
The following Data Analytics / Big Data related sessions are listed to occur at the AGU next December:
+
The following Data Analytics / Big Data related sessions are listed to occur at the AGU in December:
  
 
* '''Advanced Information Systems to Support Climate Projection Data Analysis'''
 
* '''Advanced Information Systems to Support Climate Projection Data Analysis'''
Line 152: Line 158:
 
Karen I Stocks, Lesley A Wyborn, Ruth Duerr, and Lynn Yarmey
 
Karen I Stocks, Lesley A Wyborn, Ruth Duerr, and Lynn Yarmey
  
===Next Meetings:===
 
Tuesday, July 14, 2015, 8:30 PST, at Asilomar, first thing Tuesday morning
 
  
Thursday, July 16, 2015, 10:30 PST, at Asilomar
+
===Next Telecon:===
 +
 
 +
Thursday, September 17, 2015, 3:00 EST
  
  
 
Agenda:   
 
Agenda:   
  
See above
+
Among other things, finalize ESDA Definition and Goals, and begin preparing statement for ESIP approval
  
 +
===Actions:===
  
===Actions:===
+
Steve: Finish adding ESDA Goal Description text
  
All: Be at Asilomar
+
All: Review and provide comments on ESDA Definition and Goal categories

Latest revision as of 07:18, August 24, 2015

ESDA Telecon notes – 8/20/15

Known Attendees:

ESIP Host (Annie Burgess), Steve Kempler, Chung-lin Shie, Tiffany Mathews, Sean Barberie, Beth Huffer, Byron Peters, Denise, Jennifer Davis, Emily Northup, Soren Scott, Brand Niemann, Joan Aron, Rob Casey, Ward Fleir, Robert Downs, Victor Zlotnicki, Ethan McMahon

Agenda:


1. Summer sessions recap

2. (Coming closer to) Finalize ESDA Definitions and Types

3. Use Cases

4. Next Steps (further characterize types --> requirements)

5. Open Mic


Presentations:

None, this time.

Use Case Information: https://docs.google.com/document/d/1U1mAt4ZjJqXeNmtRoE4VbI1nBgS1v7DzeHib_7mzOF8/edit

Notes:

Thank you all for attending and participating in a very productive meeting.


Firstly, allow me to apologize for being remiss in not introducing Sean Barberie to our group. Sean is an ESIP student fellow, who asked to join and help out the ESDA Cluster, as fellows do. Eager to learn, sharp with computer science/technology, and a good 'fellow', he is soon graduating (or recently graduated ... hint hint).


Steve started by recapping the two ESIP ESDA summer sessions: 'Teaching Science Data Analytics Skills, and the Earth Science Data Scientist' (http://commons.esipfed.org/node/7999) and 'The Need for Earth Science Data Analytics to Facilitate Community Resilience (and other applications)' (http://commons.esipfed.org/node/7998). In the former session, three guest speakers shared personal experience and insights into the expertise/training needed to be a Data Scientist and perform Data Analytics, along with examples of Earth science research that require such skills. The latter session made progress in developing a proposed ESIP Earth science data analytics definition and in defining Earth science data analytics types (renamed goals in today's telecon). This discussion continued today...


The following was the Data Analytics definition unique to Earth science at the start of the discussion:

The process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:

  • Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
  • Data Reduction – Smartly removing data that do not fit research criteria
  • Data Analysis – Applying techniques/methods to derive results


Prompted by Sean, who reminded us that Earth science data is spatial and temporal in nature, we realized that our definition needs to recognize this characteristic to make it unique to ESDA. Tiffany, Joan, and others concurred, and Victor provided text that enhanced our ESDA definition, as follows:


The process of examining large amounts of spatial (3D), temporal, and/or spectral data of a variety of data types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:

  • Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
  • Data Reduction – Smartly removing data that do not fit research criteria
  • Data Analysis – Applying techniques/methods to derive results


Like it?
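
As a rough illustration of the three activities named in the definition above, here is a minimal Python sketch. The observation records, units, latitude band, and function names (`prepare`, `reduce_to_region`, `daily_mean`) are all hypothetical and were not discussed on the telecon; the sketch only shows how preparation, reduction, and analysis might chain together on a tiny spatial/temporal dataset.

```python
from statistics import mean

# Hypothetical observations: (latitude, longitude, day, value, unit).
# Two made-up sources that report temperature in different units.
station_obs = [
    (10.0, 20.0, 1, 290.0, "K"),
    (10.0, 20.0, 2, 292.0, "K"),
    (50.0, 20.0, 1, 280.0, "K"),
]
satellite_obs = [
    (10.0, 20.0, 1, 18.0, "C"),
    (10.0, 20.0, 2, 20.0, "C"),
]


def prepare(records):
    """Data Preparation: convert every value to kelvin so the sources can 'play' together."""
    prepared = []
    for lat, lon, day, value, unit in records:
        kelvin = value + 273.15 if unit == "C" else value
        prepared.append((lat, lon, day, kelvin))
    return prepared


def reduce_to_region(records, lat_min, lat_max):
    """Data Reduction: keep only records inside the latitude band under study."""
    return [r for r in records if lat_min <= r[0] <= lat_max]


def daily_mean(records):
    """Data Analysis: average the values reported for each day."""
    by_day = {}
    for _lat, _lon, day, kelvin in records:
        by_day.setdefault(day, []).append(kelvin)
    return {day: mean(values) for day, values in sorted(by_day.items())}


combined = prepare(station_obs) + prepare(satellite_obs)
tropics = reduce_to_region(combined, -23.5, 23.5)  # assumed research criterion
result = daily_mean(tropics)
print(result)
```

Real ESDA work would of course involve far larger datasets and richer formats; the point here is only that each of the three bullets maps to a distinct step.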


We next moved into a discussion about ESDA types, and were reminded why it is important to have a clear definition of ESDA types: to better identify key needs that data analytics tools/techniques can be developed to address (i.e., Data Analytics tools typically do not address all ESDA needs; it is possible that ESDA needs are not addressed, at least not very well, by current Data Analytics tools). Bottom line: To specify ESDA tool/technique requirements by type.


It was quickly decided that 'types' was not a good term for specifying the different ways ESDA is used, nor is it indicative of results-oriented Earth science data analytics. In addition, 'types' is a term more commonly applied to data analytics in the business world. To avoid confusion, we blessed the term 'goals', and now have defined ESDA Goals as a means for distinguishing key needs that data analytics tools/techniques can be developed for.


ESDA Goals (read: Earth science data analytics needed ...)

1 To calibrate data

2 To validate data (note it does not have to be via data intercomparison)

3 To assess data quality

4 To perform coarse data preparation (e.g., subsetting, data mining, transformations, data recovery)

5 To intercompare data (i.e., any data intercomparison; could be used to better define validation/quality)

6 To tease out information from data

7 To glean knowledge from data and information

8 To forecast/predict phenomena (i.e., a special kind of conclusion)

9 To derive conclusions (i.e., that do not easily fall into another type)

10 To derive new analytics tools


Chung-Lin and Rob advocated for an 11th ESDA Goal: 'Improve data quality'. Post-telecon communications have suggested it would be just as good to redefine Goal #3, above, as: 'Assess and Improve Data Quality'. However, to scratch all itches, 'Assess and Improve Data Quality (includes: Evaluate, assess data; Improve dataset; Determine improvement actions)' is also offered. Note that this just provides more description of the goal, which is an action to be performed for all goals.


Thoughts?


Other goal suggestions:

  • Validate forecast models. But Goal #2 does not distinguish what data to validate.
  • Goal #1, Calibration, is needed before achieving all other goals. But calibration can be an end goal in itself that requires its own analytics techniques.


During the telecon, Steve provided a 'to do' list to describe our road ahead, that included:


Almost done:

1. Finalize ESDA Definition and Goal categories

Start thinking about next:

2. Write a white paper for ESIP Executive Committee proposing that the ESDA Definitions and Goal categories be ESIP approved

3. Acquire many more additional use cases

4. Characterize use cases by Goal categories and other analytics driving considerations

5. Derive requirements from #4

6. Survey existing data analytics tools/techniques

7. Write our paper describing ... all the above




Going to AGU?

The following Data Analytics / Big Data related sessions are listed to occur at the AGU in December:

  • Advanced Information Systems to Support Climate Projection Data Analysis

Gerald L Potter, Tsengdar J Lee, Dean Norman Williams, and Chris A Mattmann

  • Big Data Analytics for Scientific Data

Emily Law, Michael M Little, Daniel J Crichton, and Padma A Yanamandra-Fisher

  • Big Data in Earth Science – From Hype to Reality

Kwo-Sen Kuo, Rahul Ramachandran, Ben James Kingston Evans, and Mike M Little

  • Big Data in the Geosciences: New Analytics Methods and Parallel Algorithms

Jitendra Kumar and Forrest M Hoffman

  • Computing Big Earth Data

Michael M Little, Darren L. Smith, Piyush Mehrotra, and Daniel Duffy

  • Geophysical Science Data Analytics Use Case Scenarios

Steven J Kempler, Robert R Downs, Tiffany Joi Mathews, and John S Hughes

  • Man vs. Machine - Machine Learning and Cognitive Computing in the Earth Sciences

Jens F Klump, Xiaogang Ma, Jess Robertson and Peter A Fox

  • New approaches for designing Big Data databases

David W Gallaher and Glenn Grant

  • Partnerships and Big Data Facilities in a Big Data World

Kenneth S Casey and Danie Kinkade

  • Towards a Career in Data Science: Pathways and Perspectives

Karen I Stocks, Lesley A Wyborn, Ruth Duerr, and Lynn Yarmey


Next Telecon:

Thursday, September 17, 2015, 3:00 EST


Agenda:

Among other things, finalize ESDA Definition and Goals, and begin preparing statement for ESIP approval

Actions:

Steve: Finish adding ESDA Goal Description text

All: Review and provide comments on ESDA Definition and Goal categories