Earth Science Data Analytics/2015-8-20 Telecon

From Earth Science Information Partners (ESIP)

ESDA Telecon notes – 8/20/15

Known Attendees:

ESIP Host (Annie Burgess), Steve Kempler, Chung-lin Shie, Tiffany Mathews, Sean Barberie, Beth Huffer, Byron Peters, Denise, Jennifer Davis, Emily Northup, Soren Scott, Brand Niemann, Joan Aron, Rob Casey, Ward Fleir, Robert Downs, Victor Zlotnicki, Ethan McMahon

Agenda:

1. Summer sessions recap

2. (Coming closer to) Finalize ESDA Definitions and Types

3. Use Cases

4. Next Steps (further characterize types --> requirements)

5. Open Mic


Presentations:

None, this time.

Use Case Information: https://docs.google.com/document/d/1U1mAt4ZjJqXeNmtRoE4VbI1nBgS1v7DzeHib_7mzOF8/edit

Notes:

Thank you all for attending and participating in a very productive meeting.


Firstly, allow me to apologize for being remiss in not introducing Sean Barberie to our group. Sean is an ESIP student fellow who asked to join and help out the ESDA Cluster, as fellows do. Eager to learn, sharp with computer science/technology, and a good 'fellow', he is soon graduating (or recently graduated ... hint hint).


Steve started by recapping the two ESIP ESDA summer sessions: 'Teaching Science Data Analytics Skills, and the Earth Science Data Scientist' (http://commons.esipfed.org/node/7999) and 'The Need for Earth Science Data Analytics to Facilitate Community Resilience (and other applications)' (http://commons.esipfed.org/node/7998). In the former session, three guest speakers shared personal experience and insights into the expertise and training needed to be a Data Scientist and to perform Data Analytics, along with examples of Earth science research that requires such skills. The latter session made progress in developing a proposed ESIP Earth science data analytics definition, and in defining Earth science data analytics types (renamed goals in today's telecon). This discussion continued today...


The telecon started with a Data Analytics definition unique to Earth science that looked like this:

The process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information, involving one or more of the following:

  - Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
  - Data Reduction – Smartly removing data that do not fit research criteria
  - Data Analysis – Applying techniques/methods to derive results
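As an illustration only, the three activities in the definition above can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the example and was not discussed on the telecon: the sample temperature values, the two input formats, the function names (prepare, reduce_rows, analyze), and the choice of a Pearson correlation as the analysis step.

```python
from datetime import date
from math import sqrt

# Hypothetical sample inputs (not from the telecon): two sources reporting
# the same quantity with different date formats and different units.
STATION_TEMPS_F = {
    "2015-07-01": 68.0, "2015-07-02": 77.0, "2015-07-03": 86.0,
    "2015-07-04": 95.0, "2015-07-05": 59.0,  # last day has no satellite match
}
SATELLITE_TEMPS_C = [
    ("07/01/2015", 19.5), ("07/02/2015", 25.5),
    ("07/03/2015", 29.0), ("07/04/2015", 36.0),
]


def prepare(station_f, satellite_c):
    """Data Preparation: align both sources on date and convert to Celsius."""
    station = {date.fromisoformat(d): (f - 32.0) * 5.0 / 9.0
               for d, f in station_f.items()}
    satellite = {}
    for text, celsius in satellite_c:
        month, day, year = (int(part) for part in text.split("/"))
        satellite[date(year, month, day)] = celsius
    shared_days = sorted(station.keys() & satellite.keys())
    return [(d, station[d], satellite[d]) for d in shared_days]


def reduce_rows(rows, max_celsius):
    """Data Reduction: keep only rows that fit the (assumed) research criterion."""
    return [row for row in rows if row[1] <= max_celsius]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def analyze(rows):
    """Data Analysis: correlate station and satellite temperatures."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])


if __name__ == "__main__":
    kept = reduce_rows(prepare(STATION_TEMPS_F, SATELLITE_TEMPS_C), max_celsius=32.0)
    print(len(kept), "rows kept; correlation =", round(analyze(kept), 3))
```

In real Earth science work each step would be far more involved (regridding, quality flags, statistical modeling), but the split into three separate functions mirrors the three parts of the definition.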


We then discussed our 2 ESIP ESDA sessions.

1. Teaching Science Data Analytics Skills, and the Earth Science Data Scientist (http://commons.esipfed.org/node/7999)

We will have 4 speakers, who will describe their experiences in being, or needing, a Data Scientist in their work. The goal of this session is to discuss and extract real project data scientist/analytics experience needs, initiated by the presentations and discussed by session participants. Of special interest is bringing together people who have needs for data scientists (data analytics) and who, by the end of the session, will be able to articulate those needs and/or generate ideas for the use of data analytics in their research or for building tools/services for others.

Presenters:

Wade Bishop, School of Information Sciences, University of Tennessee

Peter Fox, Earth & Environmental Sciences, Tetherless World Constellation, Rensselaer Polytechnic Institute

Lewis McGibbney, Computer Science for Data Intensive Systems Group, Jet Propulsion Laboratory

Karen Stocks, Director, Geological Data Center, Scripps Institution of Oceanography


2. The Need for Earth Science Data Analytics to Facilitate Community Resilience (and other applications) (http://commons.esipfed.org/node/7998)

This will be an open-forum, idea-sharing discussion session. This cluster session will review our current work (for new participants), followed by discussion on the extent of social, economic, and environmental issues, as well as science research, in which the advancement of Earth science data analytics has had an impact. The goal of this discussion is to gain sufficient information to categorize how Earth science data analytics has come to be used in our society, and to identify use cases that exemplify this.


Today, we discussed: What should our summer session goals be, and how do we get there?

The following goals were identified:


1. Identify use cases that address societal issues, specifically

2. Identify datasets (Earth science and otherwise) needed to address the use case issues

3. Identify techniques that may be applied to glean information from the data, targeted at addressing the issues


What we should keep in the back of our minds is how we can show the significance of Earth science data analytics in addressing societal issues.


The telecon ended with a reminder by Annie that presenting an ESDA cluster poster at the summer meeting would be good. Describing the activities, speakers, and things we learned would be a good idea. We'll put a poster together based on information we shared through our website.


After our telecon, Rob Casey shared a very insightful e-mail regarding ESDA's relationship to community resilience. (Thanks, Rob.) With permission, I am providing Rob's insights here, and feel they can stimulate further discussion in July (unfortunately, Rob won't be able to make the meeting).


From Rob:

I think one of the first things to consider, which can help answer the 'why' of ESDA, is what does it mean to benefit society? What does society need from science? What is community resilience?

A community is resilient if it can effectively respond to unexpected events.

A community is resilient if it can prepare or engineer for dangerous eventualities.

A community is resilient if it can avert a dangerous event through preemptive action.


The state of affairs with science as applied to benefiting society has been twofold: observation of past data to explain phenomena, and detection systems that can serve as warning measures to protect a population from a danger in progress.

ESDA can serve to take us beyond this state of affairs at a number of levels:

  • Gathering of a larger variety of datasets to apply correlations and identify possible precursors to adverse events -- this can improve early warning systems and enhance predictive algorithms to plot the course or effect of damaging events
  • Gathering and computation of a large enough set of data at high resolution to create detailed and reliable simulations of adverse events to establish hazard probabilities -- this informs disaster preparedness planning as well as engineering preparation
  • Establishing associations of possible cause and effect occurrences, and applying these to predictive models. Knowing causes of adverse effects can establish a target for mitigation or avoidance. Accurate prediction of future effects can motivate society to act more quickly and effectively to curtail the looming issue.
  • Refining the quality of gathered data will improve its usefulness and accuracy for long-tail studies. Data preparation is a key ingredient to useful and meaningful analytics. A proper program of data governance and continual data improvement ensures that data is always available and always useful.
  • Good analytics also requires good tools and good visuals. Can the right people see the information they need easily and readily? Is it presented in such a way that it is useful, meaningful, and comprehensible to scientists and decision makers? Is there a base of tool technology that allows for grassroots growth of data analysis with community contribution?

The theme of this session will need to apply what is possible with ESDA technology and techniques and bring it back to where society will benefit. Society needs to evolve from studying a problem to predicting and even preventing a problem to attain maximum resilience. So many of our ills that deal with nature and anthropogenic effects on nature can be readily listed, and an agency studying each can be identified.



FYI:

The following Data Analytics / Big Data related sessions are listed to occur at the AGU next December:

  • Advanced Information Systems to Support Climate Projection Data Analysis

Gerald L Potter, Tsengdar J Lee, Dean Norman Williams, and Chris A Mattmann

  • Big Data Analytics for Scientific Data

Emily Law, Michael M Little, Daniel J Crichton, and Padma A Yanamandra-Fisher

  • Big Data in Earth Science – From Hype to Reality

Kwo-Sen Kuo, Rahul Ramachandran, Ben James Kingston Evans, and Mike M Little

  • Big Data in the Geosciences: New Analytics Methods and Parallel Algorithms

Jitendra Kumar and Forrest M Hoffman

  • Computing Big Earth Data

Michael M Little, Darren L. Smith, Piyush Mehrotra, and Daniel Duffy

  • Geophysical Science Data Analytics Use Case Scenarios

Steven J Kempler, Robert R Downs, Tiffany Joi Mathews, and John S Hughes

  • Man vs. Machine - Machine Learning and Cognitive Computing in the Earth Sciences

Jens F Klump, Xiaogang Ma, Jess Robertson and Peter A Fox

  • New approaches for designing Big Data databases

David W Gallaher and Glenn Grant

  • Partnerships and Big Data Facilities in a Big Data World

Kenneth S Casey and Danie Kinkade

  • Towards a Career in Data Science: Pathways and Perspectives

Karen I Stocks, Lesley A Wyborn, Ruth Duerr, and Lynn Yarmey

Next Meetings:

Tuesday, July 14, 2015, 8:30 PST, at Asilomar, first thing Tuesday morning

Thursday, July 16, 2015, 10:30 PST, at Asilomar


Agenda:

See above


Actions:

All: Be at Asilomar