Difference between revisions of "Earth Science Data Analytics/2016-11-17 Telecon"

From Earth Science Information Partners (ESIP)

Revision as of 12:18, December 5, 2016

ESDA Telecon notes – 11/17/16

Known Attendees:

ESIP Host (Bruce Caron), Lindsay Barbieri, Steve Kempler, Beth Huffer, Tiffany Mathews, Shea Caspersen, Dan Zalles, Joan Aron, Abby Benson, Chung-Lin Shie, Robert Casey, Tripp Corbett


Agenda:

- Continuing discussion on ESDA ESIP Winter Session

- Report back on two other ESIP Winter Meeting Sessions that are shaping up:

    1. ESIP Drone Cluster: Data Linking and Data Fusion. This session may pertain to Earth Science Data Analytics, especially for geospatial data analysis and management, and may be of interest to this group.
    2. Informatics in Social-Environmental Systems. This session may pertain to Earth Science Data Analytics, especially for linking, analyzing, and managing heterogeneous data from different disciplines within the earth sciences and the social sciences.

- Discussion of other potential ESIP Sessions that may be of interest to this group


Notes:

Thanks to all again for participating in a very productive telecon.

The first thing we did was review the highlights from our previous telecon. That is, regarding which direction to take as we transition into the next phase of ESDA Cluster efforts, the possibilities include:

1. Examining/prototyping technical analytics solutions

2. Soliciting challenges from scientists; Connect Earth Scientists with Data Scientists

The consensus in today’s telecon was to do both: solicit challenges from data users, and use these challenges to examine specific technical analytics solutions. It became clear that, to accomplish this, we would need to invite researchers to describe work that requires multiple datasets, and thus data analytics, to address their data analysis challenges.

Discussion from here focused on preparations for the ESIP Meeting ESDA Cluster Session.

Bar raised the idea of tying in other ESIP sessions (at the January Federation Meeting) that may also be addressing challenges requiring data analytics. This makes sense: we should start with the requirements to which data analytics can be applied. One session with which the ESDA Cluster can have a mutually beneficial connection is the Informatics in Social-Environmental Systems session (http://commons.esipfed.org/node/9504).

For reference, the goals of the Informatics in Social-Environmental Systems session are: (1) scope the S-E landscape: hear from people working on a variety of collaborative social-environmental systems initiatives; (2) identify and synthesize key informatics challenges within S-E initiatives; (3) develop or improve on a framework that incorporates informatics in S-E initiatives by drawing on ESIP knowledge and expertise; (4) plan goals for S-E Informatics work after the session, and situate ESIP within those goals (publication, workshops/testbed, ESIP future and collaboration with clusters, etc.).


The Informatics in Social-Environmental Systems session will have researchers invited to discuss their challenges in bringing data to decision making. This is a key group that may be able to employ data analytics to support their work. Perhaps we can also invite some of these guests to the ESDA Cluster session, along with our own guests whom we would like to have describe data usage challenges and research data needs.

Beth also suggested that the Semantic Community Engagement Plan session (http://commons.esipfed.org/node/9545) will be engaging outside people who may be able to contribute to the ESDA Cluster discussion.

For reference, the abstract for the Semantic Community Engagement Plan session: We'll discuss how to promote the adoption of semantic technologies by other ESIP and non-ESIP groups. In particular, we will try to develop a plan for promoting use of the new ontology portal. Topics for discussion will include ideas for how to train others in using the portal; how to promote portal use; governance


Thus, we all agreed that it is time to transition to bringing in different perspectives: perspectives on data usage challenges surfaced by experienced data users. (This is different from past invited speakers who ‘taught’ us data analytics through descriptions of their research.)


Dan suggested having a panel where scientists can tell their ‘stories’ to Data Scientists. This was met with group consensus. Shea has some possible candidates who can serve as the responding Data Scientists.


ACTION to Each of Us: Identify people who can contribute to a better understanding of the broader-ranging data challenges. Provide Bar with 1 or 2 names of individuals whom we can invite to our Cluster session. Please do this soon so we have enough time to invite them. Note that potential guests who live in the Washington, DC area have a better chance of attending (and at less personal cost). If you know a good candidate who is already attending the ESIP meeting, please also let Bar know.


ACTION, Beth and Bar: If interested, discuss potential synergy between the Semantic and ESDA groups


We also need help leading the Cluster Session. Those who will be present at the meeting are Bar and Beth. Shea is not sure if he will be attending. Dan, Abby, Steve, and Tiffany will not be able to attend, but will help with the preparations (agenda planning, e-mails, phone calls, agenda review, etc.).


As part of the discussion of linking ESDA Cluster efforts with other ESIP groups, Bar suggested that there may be synergy with the Drone Cluster, which is looking at ways to link drone imagery/measurements to satellite imagery/measurements. These are two sources of heterogeneous data that together can bear much fruit, and a place where data analytics can potentially play a role. This represents another excellent application needing heterogeneous data preparation, reduction, and analysis techniques: in short, “a data analytics problem”.


Thus, it appears we have a few options in front of us where we can apply our understanding of data analytics and help those interested in the application. Having people who represent these applications discuss their needs with the ESDA Cluster will hopefully lead attendees at the ESDA Session to take an interest in helping solve these applications' data challenges.


Bar ended the telecon by describing the benefits of having an ESIP Student Fellow available to support the ESDA Cluster. Bruce said he would get back to Bar on this.


Excellent telecon!


Actions:

ACTION to Each of Us: Identify people who can contribute to a better understanding of the broader-ranging data challenges. Provide Bar with 1 or 2 names of individuals whom we can invite to our Cluster session. Please do this soon so we have enough time to invite them. Note that potential guests who live in the Washington, DC area have a better chance of attending (and at less personal cost). If you know a good candidate who is already attending the ESIP meeting, please also let Bar know.

ACTION, Beth and Bar: If interested, discuss potential synergy between the Semantic and ESDA groups


Next Meeting:

November 17, 2016


Agenda:

1. Next Steps

2. Open Mic – What else should we be addressing?