ESDA Telecon Notes – 2/20/14
Attendees: ESIP Host (Carol or Erin), H. Joe Lee, Steve Kempler, Nancy Hoebelheinrich, Tiffany Mathews, Joan Aron, Glenn Rutledge, Robert Downs, Chung-Lin Shie, Manil, S Doman Bennett, Ethan Davis, Karen Moe, Chris L, Robert Casey, Ken Keiser, Bob Chen, Stephane Beland, Eric, Deborah Smith, Smiley, Rahul Ramachandran, Phil Jones, Kent Yang, Wo Chang, Sara Graves, John Evans, Emily Law, Ryan Bowe, Seung Hee Kim, Crystal Freebourn, Rudy Husar, Brand Niemann, Jay Morris (CLASS), Thomas Huang, Ed Armstrong, Aleksandar Jelenak, Chris Lynnes
1 Analytics and Data Scientists in the Federation (what we can contribute to the field)
- Flesh out what expertise we have in the Federation on Analytics Techniques and Data Science. This could produce a written summary of our experience and expertise. Regarding Data Scientists: what would you like a Data Scientist to do to help you in your work?
2 Other Activities:
RDA Big Data Analytics Interest Group (Infrastructure Working Group) - Rahul will provide us a briefing
NIST Big Data Program - Wo Chang will provide us a briefing
- The RDA and NIST Big Data initiatives have jointly focused their interests on the Big Data Analytics Interest Group, and in particular on its Infrastructure Working Group, whose aim is ‘to establish best practices implementation guidelines for how to deploy and manage big data applications using NIST Big Data Reference Architecture (NBD-RA) and other big data architectures along with best technologies available today to meet the ever challenging big data application demands’
- As a group, should we collaborate with this effort? How can we contribute?
3 The Data Scientist as a data user
- Data Scientists are, obviously, data users. Data Scientists: what are your needs in accessing and using Earth science data? We are looking for people with experience.
More than 40 people attended this telecon, so interest is clearly high. As with any start-up group tackling a broad area that can be approached in many ways, we will need to coalesce around one, or perhaps more, directions.
The purpose of this telecon was to open discussion of Earth Science Data Analytics and the Data Scientist, and so begin the coalescing process that should lead to ESIP contributions that, ultimately, advance Earth science.
The notes below capture the start of that process and several potentially actionable ideas that have come forth so far. Please feel free to add comments to these meeting notes or send me an e-mail.
- We should look at, and inventory, related activities pursued outside ESIP (Emily L)
- John Schnase (GSFC) is pursuing relevant work on ‘Climate Analytics-as-a-Service’ (Chris L)
- We should also look into inviting individuals from other groups (e.g., CODATA, NSF, IEEE) (Bob C, who will help identify points of contact)
- There is a growing body of literature on data analytics, e.g., “Doing Data Science” by Cathy O’Neil and Rachel Schutt (Bob C)
- Very nice presentation: ‘Demystifying Data Science’ by Natasha Balac (http://bigdatawg.nist.gov/_uploadfiles/M0169_v1_9072641833.pdf). I am curious whether, and how, this presentation resonates with ESIP Data Scientists.
- NIST provides an excellent list of ‘Big Data Analytics’ reading material: http://bigdatawg.nist.gov/_uploadfiles/M0264_v1_5728417524.pdf
Ideas (potential directions) and Other Notes:
- Idea: What does analytics mean in Earth science? Current tools are crude. We can help users find what they are looking for (Chris L)
- Idea: Can we define the analytics toolset, focusing on Earth science? (Sara G)
- Idea: We can assemble end-to-end team(s) that together address various aspects of data analytics (and, more broadly, Data Science). This would also surface gaps in our expertise (Bob C)
- Note: Data Science is much broader than analytics (Sara G, others), so let’s not treat them as the same thing. We can address both topics, but not as a single topic.
RDA Highlights (thanks to Rahul)
- Idea: We can provide ESIP Earth science expertise to support RDA activities (e.g., use cases) (Sara G, Nancy H)
- Idea: We can identify cross domain commonalities (Emily L)
NIST Highlights (thanks to Wo) – see presentation
- Idea: We can better understand NIST activities and offer ESIP expertise to them
Post-Telecon Comments:
- Idea: Data Supplier vs. Data User perspectives. We can surface and organize the analytics needs and use cases from both perspectives (as noted below; related to Bob’s idea above)
Comment 1 (from Rudy H):
- Another way to delineate Data Science and Data Analytics is along the Data Creator/Provider <---> Data End User axis. The perspectives and needs of Data Science and Data Analytics differ greatly depending on where you are along that axis. Typically, a real gap exists between the two perspectives.
Comment 2 (from Joan A):
- My main comment is that the telecon tended to focus more on the suppliers of tools. This should be complemented by attention to the demand side. I am thinking of environmental monitoring and protection decision-makers who need to interact with the suppliers of the technologies. ESIP has a niche in contributing to this understanding. Bob Chen's comments about examining the whole process, and the comments about use cases, fit in here. I have a particular interest, from a user's perspective, in how data analytics and data sharing can support better decisions linking environmental protection and public health.
- Idea: We could focus on collecting case studies in which organizations have implemented big data solutions to problems, carried out analytics and quality assurance, and enabled policy makers to make informed decisions based on the end products of data science. From this body of work, which can highlight both successes and failures, I think the group can begin to form recommendations on how organizations should proceed in data science given their particular goals. It can also serve as a research base for data scientists and IT staff considering alternatives to their own approaches. (Rob C)
Next Telecon:
- Targeting: March 20, 3:00 EST
- Looking for help setting the agenda (contact Steve), drawing on the ideas above: Eric K?, Brand N? (to help address Data Scientist-related activities), Emily L?, others?
- Invite two guest speakers to discuss their analytics activities