Earth Science Data Analytics/2016-10-20 Telecon
ESDA Telecon notes – 10/20/16
Attendees: ESIP Host (Bruce Caron), Lindsay Barbieri, Steve Kempler, Beth Huffer, Byron Peters, Shea Caspersen, Dan Zalles, Abby Benson, Chung-Lin Shie, Robert Downs, Tripp Corbett, Paul Lomieux
Agenda:
◦ Go over Survey Results (for fun & context)
◦ Talk about the full list of Earth Science data challenges we solicited
◦ Discuss the ones we prepped the community with
▪ Maybe have a collaborative document where people can add their solutions as we talk about them.
◦ META Discussion: Is this direction helpful? How can we bring more ES / DS people together in a useful way? Can we do something like this for the Winter Session?
◦ ESIP Winter Session: Need an abstract proposal by Oct 31st --> who wants to help structure it? Who will be at the ESIP Meeting? (Bar will no longer be a student fellow!)
Bar reported on the survey she led, which asked people on the ESDA mailing list:
1. Do you consider yourself a Data Scientist and/or an Earth Scientist?
Data Scientist?: Yes – 14; No – 11; Sometimes – 2
Earth Scientist?: Yes – 14; No – 14
2. What are 1-2 data / analytical challenges you see in your Earth Science Discipline(s)? (challenges could be specific or broad, technical or social):
- Managing data in the way needed to answer a given question while still making it useful for others
- Ensuring the data are reproducible.
- Data sharing
o Proprietary mindset in data collection / generation
o Not knowing how to share data openly: where / what format / how to document / make citable
- Legacy data: the additional problems of missing information and degraded items/information/technology
o the vague idea that it’s around here somewhere, but where exactly and can I still read that file?
- Data discovery: actually locating data can be problematic even if you know it exists
o Example: we want to use geophysical well logs of a particular type for a particular offshore area, and we know that BOEM has these logs. But figuring out which logs in particular we want, and then how to gain access to them (let alone being able to assess their usability), has been a lesson in “omg please why don’t you use controlled vocabularies” and “how many different ways can you actually describe an electric log?”
- Integrating datasets from multiple data providers into a common standard
o Example: Biological data are usually captured using methods and structure that fit the particular focused research question of the PI. This works fine until the data need to be integrated into a global data system.
- Frequently, researchers don't want to spend time after the research is complete aligning the data with a standard.
o If we could figure out an automated way of aligning biological data with a standard, that would reduce the burden on PIs and on managers of global integrated databases of moving data into the system
- Creating/developing/providing data services that enable users to efficiently (i.e., properly and quickly) acquire the datasets they want/need from the massive volume of Earth Science data products available in the US and/or (literally) everywhere around the world.
- Making data findable by scientists, across multiple repositories, websites, data assembly centers, etc.
- Connecting related data: connecting data from the same sample/cruise/project distributed across different repositories; connecting different versions of data and processed products to raw data in a way that lets scientists know which ones they need to use; connecting data in repositories to publications.
3. Which Earth Science discipline(s) are you most familiar with?
- Climate data
- Geoscience (geology and geophysics)
- Meteorology: Water and Energy Cycles; Large-Scale Atmospheric Circulations; Atmospheric Dynamics; Hurricanes; Clouds
- Oceanography: Air-Sea Interactions; Turbulent Fluxes
We then discussed how to use this information to formulate a session for the Winter Meeting that would address the theme that began to surface from our discussion and pave the way for our next phase of work:
Connecting the data usage challenges of Earth Scientists to Data Scientists' ability to support Earth science research through the use of data analytics.
More specifically, the following comments/ideas were discussed to draw out the theme of the next ESIP ESDA Cluster face-to-face meeting:
- Need to think about how we can move forward
- Possible TestBed options: Analytics “Readiness” Levels → understand the different steps needed to make data easily analyzable. Providing a framework around analytics readiness levels could help scientists and data providers secure funding (“with x amount of additional funding we could become analytics ready”). For example, machine level would be a high readiness level; user friendliness would be a low readiness level.
- Elicit scientists’ data challenges and attempt to match data scientists with Earth scientists to help solve those challenges – link to Federation solutions
- Make a Federation wide ‘Call to Scientists’ to attend the ESDA session
- How do we communicate between data and scientists?
- Provide a metadata template that includes information that makes the data easy to use
- Near Real Time (NRT) data: meeting the need for intermediate products requires predictive analytics software for NRT data
- Explore Instring Analysis, Machine Learning
- Survey different perspectives at the ESDA Session as a stimulus for next steps
- Build on previous work – Technologies/tools previously identified
Thus, we have two potential directions:
1. Examining/prototyping technical analytics solutions
2. Soliciting challenges from scientists; Connect Earth Scientists with Data Scientists
And we can provide both perspectives by:
1. ‘Calling all Scientists’ to attend the ESDA Session
2. Solicit Earth Scientists’ data usage challenges
3. Make the connections between Earth Scientists and Data Scientists
4. Then, explore technical solutions that may address challenges
Next telecon – November 17, 2016
1. Next Steps
2. Open Mic – What else should we be addressing?