Earth Science Data Analytics/2016-6-16 Telecon

From Federation of Earth Science Information Partners

ESDA Telecon notes – 6/16/16

Known Attendees:

ESIP Hosts (Annie Burgess), Steve Kempler, Chung-Lin Shie, Robert Downs, Beth Huffer, Shea Caspersen, Tiffany Mathews, Byron Peters, Ethan McMahon, Sudhir Shrestha


Agenda:

1. Tools and Techniques – Status

2. How can we best validate our work with ESDA users?

3. ESIP Summer Meeting Planning. Session Title: Earth Science Data Analytics Tools, Techniques and More (http://commons.esipfed.org/node/9140)

4. Open Mic – What else should we be addressing?


Presentations:

Reviewed tools/techniques matrix


Notes:

Thank you all for attending and participating in our telecon.


Announcement: Bar will be giving a very valuable presentation at the July ESIP ESDA Cluster session on her experiences as a student of Data Science. It will be very interesting to hear firsthand from someone who is going through the paces of 'what it takes to be a Data Scientist'.


In this telecon, we explored and agreed on a different approach to organizing and categorizing the information we have so far gathered: ESDA types, techniques, tools, and goals, and how we relate them to each other.


Thus far, our approach has been: use cases --> derive requirements --> map tools/techniques for preparation, for reduction, for analysis... then gap analysis: determine requirements that have no tools/techniques.


Because we cannot get enough use cases to know all the requirements, we became somewhat stymied.


Our new approach is: for preparation, for reduction, for analysis --> known techniques --> map to known tools... then gap analysis: determine known techniques that have no tools.


We can do this because we already know the techniques and we already know the tools. But what about requirements? Requirements that have no known tools/techniques mapped to them will either surface when we discover them or, if they require a new technique, may very well come with a technique created by the author of the requirement, and maybe even an author-created tool.


The more tangible gap analysis that we can pursue is: determine known techniques that have no tools. Tools, after all, are what we all like to build, right?
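As a rough illustration only, the new approach (ESDA types --> known techniques --> known tools --> gaps) could be sketched in a few lines of Python. The technique and tool names below are hypothetical placeholders, not entries from our matrix, and the function name find_gaps is likewise just an assumption for this sketch.

```python
from typing import Dict, List, Set

# The three ESDA types discussed in the telecon.
ESDA_TYPES = ("Data Preparation", "Data Reduction", "Data Analysis")

# Hypothetical placeholder data: which techniques apply to each ESDA type.
techniques_by_type: Dict[str, Set[str]] = {
    "Data Preparation": {"technique_a", "technique_b"},
    "Data Reduction": {"technique_b", "technique_c"},
    "Data Analysis": {"technique_c", "technique_d"},
}

# Hypothetical placeholder data: which known tools implement each technique.
# A technique that is absent here (or maps to an empty set) has no known tool.
tools_by_technique: Dict[str, Set[str]] = {
    "technique_a": {"tool_1"},
    "technique_c": {"tool_1", "tool_2"},
}


def find_gaps(
    techniques: Dict[str, Set[str]],
    tools: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """For each ESDA type, list the known techniques that have no known tool."""
    gaps: Dict[str, List[str]] = {}
    for esda_type, names in techniques.items():
        gaps[esda_type] = sorted(t for t in names if not tools.get(t))
    return gaps


gaps = find_gaps(techniques_by_type, tools_by_technique)
for esda_type in ESDA_TYPES:
    print(f"{esda_type}: {gaps[esda_type]}")
```

With the placeholder data, technique_b and technique_d come out as gaps because nothing maps a tool to them. In practice the two dictionaries would be filled in from the techniques matrix and the tools survey.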


Discussion led to Action Item #1, below.


To Do List:

Done:

1. Finalize ESDA Definition and Goal categories

2. Write letter to ESIP Executive Committee proposing that the ESDA Definitions and Goal categories be ESIP approved

3. Characterize use cases by Goal categories and other analytics driving considerations

4. Derive requirements from #3

Underway:

5. Survey existing data analytics tools/techniques

6. Write our paper describing ... all the above


Next Meeting:

Durham


Agenda:

1. Observations in learning to be a Data Scientist - Bar

2. Techniques mapped to ESDA types and goals – Status

3. Tools mapped to techniques, and gaps - Status

4. How can we best validate our work with ESDA users?

5. Next Steps

6. Open Mic – What else should we be addressing?


Actions:

1. All: From the Techniques/Goals/Types matrix (https://docs.google.com/spreadsheets/d/1Xg4zYqAqrfu6NMdQtTYJn50J8kXpjpQBC0exX3XYOVQ/edit#gid=0)...

Go down the ESDA Type columns entitled Data Preparation, Data Reduction, and Data Analysis, and delete the techniques (starting on row 16) that can NOT be applied to that column's ESDA type. The techniques that remain are thus those that CAN be applied to that column's ESDA type. If you are not sure what a technique is, please see the technique descriptions, starting on line 33, in this Google Spreadsheet: https://docs.google.com/spreadsheets/d/1Xv8qySG4k6p8Y3rOYLonWlahwKPT86MCuJoIT3wske8/edit#gid=0

Please ask for permission to edit. So that we don't edit over each other, please download a copy to edit and send it back to Steve, or ask Steve to send you a copy to edit.

Please spend a little time on this and send your edits back by Friday, June 24. That would be great.

Step 2 would be mapping techniques to goals. More to come on this.


2. All - For our Cluster face to face in Durham: Please identify one or two multi-data researchers who would be willing to provide insights into their experiences and needs for accessing and preparing data for co-analysis of heterogeneous data.