Earth Science Data Analytics/2016-8-18 Telecon

From Earth Science Information Partners (ESIP)
Revision as of 08:44, September 11, 2016 by Stevek

ESDA Telecon notes – 8/18/16

Known Attendees:

ESIP Hosts (Annie Burgess), Steve Kempler, Emily Northup, Beth Huffer, Byron Peters, Lindsay Barbieri, Chung-Lin Shie, Shea Caspersen, Tiffany Mathews, Angela Li, Robert Downs, Joan Aron, Tripp Corbett


Agenda:

1. Next steps for ESDA Cluster

2. Open Mic – What else should we be addressing?


Presentations:

None


Notes:

Thank you all for attending and participating in our telecon.


On the topic of where the ESDA Cluster can go from here, the following ideas, consistent with our goals, were floated:

  Validate our work
  Prototype
  Have a sandbox for using the tools
  List DA projects that people are working on
  List potential problems that DA could be applied to
  List challenges to doing more DA
  List potential solutions to the challenges
  Move from tabular data to data visualization

We can also become more involved with interagency work:

Intro/background related to Earth system science

Current EPA/ORD/Earth science related work

  Infrastructure
  Analytics

Interagency work/potential collaborations

  EPA/CDC ecological niche modeling of vector-borne diseases (potential)
  NASA/NOAA/USGS/EPA CyAN project
  EPA/Chesbay Conservancy automated visual recognition

Tools/techniques discussion/update? (Could speak to the difference between machine learning/big data and traditional statistics/statistical learning.)


Other moving thoughts:

- What tools/techniques can we address first? What problem to solve?
- Invite and drill into projects already utilizing data analytics (e.g., EPA projects)

1. Tiffany: Focus on what individuals are working on that uses particular ESDA techniques. Should be use case driven. Bring light to a tool/technique via a use case. Would need to choose candidates.

2. Prototype: Utilize an ESDA tool to solve a data problem. Could have breakout groups to look at different tools/techniques.

2A. Joan: Look at what other ESIP Clusters are doing that can be helpful. For example, talk to Dave Jones, Disaster Cluster, about the data analytics he is utilizing.

     Shea: Brought up Chris Mattmann’s SciSpark talk.
     Can we be the conduit between ESIP ESDA tool interests (e.g., SciSpark) and heterogeneous data issues (e.g., Disaster Data)? This may be a good prototype that might experimentally benefit both groups, and be exemplary for further such prototyping.

Also, let’s take the opportunity to entice others interested in ESDA technologies.

Also, let’s provide an activity slide for Annie for SciDataCon (Steve can do this).


Discussion led to Action Item #1, below.


To Do List:

Done:

1. Finalize ESDA Definition and Goal categories

2. Write letter to ESIP Executive Committee proposing that the ESDA Definitions and Goal categories be ESIP approved

3. Characterize use cases by Goal categories and other analytics driving considerations

4. Derive requirements from #3

Underway:

5. Survey existing data analytics tools/techniques

6. Write our paper describing ... all the above


Next Meeting:

Durham


Agenda:

1. Observations in learning to be a Data Scientist - Bar

2. Techniques mapped to ESDA types and goals – Status

3. Tools mapped to techniques, and gaps - Status

4. How can we best validate our work with ESDA users?

5. Next Steps

6. Open Mic – What else should we be addressing?


Actions:

1. All: From the Techniques/Goals/Types matrix (https://docs.google.com/spreadsheets/d/1Xg4zYqAqrfu6NMdQtTYJn50J8kXpjpQBC0exX3XYOVQ/edit#gid=0)...

Go down the ESDA Type columns titled Data Preparation, Data Reduction, and Data Analysis, and delete the techniques (starting on row 16) that can NOT be applied to that column's ESDA type. The techniques that remain are thus those that CAN be applied to that column's ESDA type. If you are not sure what a technique is, please see the technique descriptions in the Google Spreadsheet: https://docs.google.com/spreadsheets/d/1Xv8qySG4k6p8Y3rOYLonWlahwKPT86MCuJoIT3wske8/edit#gid=0 starting on line 33.

Please ask for permission to edit. So that we don't edit over each other, please download a copy to edit, and send back to Steve, or ask Steve to send you a copy to edit.

Please spend a little time on this and send your edits back by Friday, June 24. That would be great.

Step 2 would be mapping techniques to goals. More to come on this.
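
For anyone who prefers to work on their downloaded copy programmatically, the column pruning described in this action can be sketched as below. This is only a minimal illustration: the technique names and the applicability judgments are made-up placeholders, not entries from the actual spreadsheet, and the function name `prune_columns` is hypothetical.

```python
# Hypothetical example data: the technique names and applicability sets below
# are illustrative placeholders and are NOT taken from the ESDA spreadsheet.
ESDA_TYPES = ["Data Preparation", "Data Reduction", "Data Analysis"]

candidate_techniques = [
    "Regridding",
    "Principal component analysis",
    "Clustering",
]

# For each technique, the ESDA types a reviewer judged it applicable to.
applicable = {
    "Regridding": {"Data Preparation"},
    "Principal component analysis": {"Data Reduction", "Data Analysis"},
    "Clustering": {"Data Analysis"},
}


def prune_columns(techniques, applicable, esda_types):
    """Return, per ESDA type column, only the techniques that CAN be applied."""
    return {
        esda_type: [t for t in techniques if esda_type in applicable.get(t, set())]
        for esda_type in esda_types
    }


pruned = prune_columns(candidate_techniques, applicable, ESDA_TYPES)
for esda_type, remaining in pruned.items():
    print(f"{esda_type}: {', '.join(remaining) or '(none)'}")
```

Each column starts with the full technique list and keeps only the entries judged applicable, which mirrors deleting the non-applicable rows by hand.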


2. All - For our Cluster face to face in Durham: Please identify one or two multi-data researchers who would be willing to provide insights into their experiences and needs for accessing and preparing data for co-analysis of heterogeneous data.