Difference between revisions of "Earth Science Data Analytics/2016-1-21 Telecon"

From Earth Science Information Partners (ESIP)
 
Latest revision as of 19:21, February 6, 2016

ESDA Telecon notes – 1/21/16

Known Attendees:

ESIP Host (Erin Robinson), Steve Kempler, Tiffany Mathews, Lindsay Barberie, Chung-Lin Shie, Robert Downs, Beth Huffer, Joan Aron, Brian Johnson, Byron Peters, Abby Benson, H. Joe Lee


Agenda:

1. ESIP Cluster Meeting recap

2. ESDA for Data Preparation, Data Reduction, Data Analysis – Where can Information Technology make the biggest impact?

3. Tools and Techniques – How can we best organize the plethora of tools and techniques we have uncovered?

4. Open Mic – What else should we be addressing?


Presentations:

None, this time.

Use Case Information: https://docs.google.com/document/d/1U1mAt4ZjJqXeNmtRoE4VbI1nBgS1v7DzeHib_7mzOF8/edit


Notes:

Thank you all for attending and participating in our telecon.


Excellent discussion… Thanks Bar, for taking great notes.


Steve started by summarizing the January ESIP ESDA Cluster Meeting. The ESDA Cluster continues to attract people who are interested in learning about data analytics, what it means, and how it 'fits' in the work we do. This has made the cluster, unlike most other ESIP groups, more academic in nature. Data Analytics takes on new methodologies of data analysis born out of the innovative opportunities created by researchers and afforded by the explosion of readily available heterogeneous information. This is new ground, and we are all learning the landscape, ultimately to understand the technology/methodology gaps we can fill. At the Cluster meeting, we discussed our work in collecting use cases, the potential Earth science data analytics tools and techniques employed, our AGU study (which entails studying the analysis methodologies of several dozen science projects), and next steps.

The discussion turned to what we most recently learned and how we should go forward with it. Of the three types of ESDA (Data Preparation, Data Reduction, and Data Analysis), the last is the most difficult to develop an approach for, because science research/analysis is very individual; libraries of mathematical tools already exist; and the plethora of specific research models are unique and well understood by the researcher, leaving us no real opportunity to add value for large groups of users per model.

The group decided that heterogeneous Data Preparation is where the most pain points are, and that tools/techniques for heterogeneous data preparation should be addressed first. Addressing Data Preparation needs will directly help Data Analysis, even if not specific research analysis.

With this approach, it was noted that we need to invite more scientists to our cluster to provide more insights into their experiences and needs regarding the co-analysis of heterogeneous data. Perhaps we should institute a 'science advisory board' (maybe ESIP would/will). Also, we should include applications researchers, who naturally work with various datasets, often not even in the same discipline, to derive results from their studies.

Brian suggested we invite people from the SMAP science team and ISAT2 'early adopters', two communities he is familiar with. Likewise, Chung-Lin suggested GPM data users. Airborne data users who also work with satellite data are excellent candidates as well. See action below.

Also discussed: Darwin Core (standards) and ESRI (GIS) tools, both of which facilitate a better understanding of multi-data analysis.


To Do List:

Done:

1. Finalize ESDA Definition and Goal categories

2. Write letter to ESIP Executive Committee proposing that the ESDA Definitions and Goal categories be ESIP approved

3. Characterize use cases by Goal categories and other analytics driving considerations

4. Derive requirements from #3

Underway:

5. Further validate requirements with (many) more additional use cases

6. Survey existing data analytics tools/techniques

7. Write our paper describing ... all the above


Questions to think about:

What is the best way to record use cases, and associated requirements, and matching tools? A forum?


Next Telecon:

February 18, 2016 ESDA Telecon XX (our 20th telecon)


Agenda:

Presentation: Let's invite the ESIP community and present the work of our previous 19 telecons and 5 face-to-faces


Actions:

1. Brian and Chung-Lin - Please identify one or two multi-data researchers who would be willing to provide insights into their experiences and needs for accessing and preparing data for co-analysis of heterogeneous data. Let's invite them to our March telecon for an informal discussion. (If we can get 3 to 5 people, we can have a 'panel' Q&A session.)

2. Steve, others to be contacted: Complete categorization of all identified ESDA tools/techniques by Data Preparation, Data Reduction, and Data Analysis. (https://docs.google.com/spreadsheets/d/1zMczlWZnQUiubyfcLjwDIQ-SGm7Sm5C2Wo9bc75V-qk/edit#gid=0)