Difference between revisions of "Earth Science Data Analytics/2014-03-20 Telecon"

From Earth Science Information Partners (ESIP)

Latest revision as of 09:53, October 8, 2021

ESDA Telecom notes – 3/20/14

Known Attendees:

ESIP Host (Erin), Bamshad Mobasher, Steve Kempler, Seung Hee Kim, John Schnase, Joan Aron, Helen Conover, Robert Downs, Ari Posner, Emily Law, Fritz Vanwijngaarden, Chung-Lin Shie, Jennifer Davis, Rama, Bruce Caron, Brand Niemann, Anjanette Hawk, Rudy Husar, Thomas Huang, Deborah Smith, Smiley, John Farley, Sara Graves, Beth Huffer

Agenda:

1. Topics to better understand, so far:

- Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community: Sorting out Data Science and Data Analytics

2. Two Guest Speakers – What are people doing with Data Analytics

- Dr. John Schnase, NASA/GSFC: Hands on Experience: Big Data Challenges

- Prof. Bamshad Mobasher, Professor of Data Analytics, DePaul University: Data Analytics Masters Degree Overview

3. ESDA Activities

Discussion: These are solid activities that have been suggested so far:

- Compile use cases (include producer/supplier and data user analytics utilization)

- Compile analytics tools (internal and external to ESIP)

- Do gap analysis


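Referenced Material:

  • Schnase: MERRA Analytic Services paper (SchnaseCAaaS.pdf)
  • Ralph Kahn, "Why we need huge datasets of Earth observations…" (http://semanticommunity.info/@api/deki/files/28785/Kahn_GMU_22Feb_2013.ppt)

Presentations:

  • Steve's telecom Presentation (ESIP_SteveK_ESDA.ppt)
  • Brand's Presentation: Sorting out Data Science and Data Analytics (http://semanticommunity.info/Data_Science/ESIP_Earth_Sciences_Data_Analytics#Guest_Speaker)
  • John's Presentation: Hands on Experience: Big Data Challenges (SchnaseMERRA.pdf)
  • Bamshad's Presentation: DePaul University: Data Analytics Masters Degree Overview (DePaulU.pdf)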

Notes:

Good News: We had three excellent speakers, who discussed: Data Science/Data Analytics (Brand Niemann); an application of Data Analytics to the MERRA (data assimilation) dataset (John); and the approach and overview of DePaul University’s Data Analytics Master’s program (Bamshad).

Bad News: The telecom convener did not plan enough time for the presentations, which deservedly used more, so we did not get to Agenda Item 3. (More on item 3 later.)

The cluster started with some thoughts on Cluster objectives and direction, based on ideas from February’s telecom (see notes from February’s telecom). Basically, it seems that this Cluster can serve multiple purposes, addressing members’ various levels of understanding of, and interest in, Data Analytics. These include:

- ‘Academic’ discussions that allow all of us to be better educated and on the same page in understanding the various aspects of Data Analytics

- Bringing in guest speakers to give overviews of external efforts and teach us more about the broader use of Data Analytics. (We can always invite speakers back to learn more.)

- Activities that ESIP members can actually address and tackle

As a start, this will lay the groundwork for our understanding; as the field evolves, and as the individual and collective interests of this cluster evolve, the cluster objectives can evolve in turn. This will be put forward as the basis of the ESDA cluster mission/objectives. Please take a look at it at the top of our Wiki ‘ESDA Home Page’. Please comment on what you think of it: does it address your expectations, and what else should we include?


Take a look at Brand’s presentation. It provides a real breadth of information regarding Data Science, Data Analytics, what Data Scientists do, current activities in the field, and more. Remember: ‘…try to make a story out of the data’.

John’s presentation was equally interesting, describing how he applies analytics (MapReduce) to the MERRA datasets.
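John’s talk described applying MapReduce to the MERRA datasets. The notes do not describe his actual implementation, so the following is only a hedged illustration of the MapReduce pattern itself: a pure-Python sketch that runs a map, shuffle, and reduce over a few hypothetical records and computes a per-variable mean. The record layout and the variable names `surface_temp` and `precip` are assumptions made for illustration, not MERRA field names.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical record layout: (variable name, timestep, grid-cell value).
# These values are illustrative placeholders, not actual MERRA data.
Record = tuple[str, int, float]

RECORDS: list[Record] = [
    ("surface_temp", 0, 280.0),
    ("surface_temp", 1, 284.0),
    ("precip", 0, 2.0),
    ("precip", 1, 4.0),
    ("surface_temp", 2, 282.0),
]


def map_phase(records: Iterable[Record]) -> Iterator[tuple[str, tuple[float, int]]]:
    """Emit (key, (partial_sum, count)) pairs keyed by variable name."""
    for variable, _timestep, value in records:
        yield variable, (value, 1)


def shuffle_phase(
    pairs: Iterable[tuple[str, tuple[float, int]]]
) -> dict[str, list[tuple[float, int]]]:
    """Group emitted values by key, as a MapReduce framework does between phases."""
    groups: dict[str, list[tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[tuple[float, int]]]) -> dict[str, float]:
    """Combine the (sum, count) partials for each key into a mean."""
    means: dict[str, float] = {}
    for key, partials in groups.items():
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        means[key] = total / count
    return means


if __name__ == "__main__":
    print(reduce_phase(shuffle_phase(map_phase(RECORDS))))
    # {'surface_temp': 282.0, 'precip': 3.0}
```

Emitting (sum, count) pairs rather than raw values keeps the reduce step associative, which is what lets a real framework pre-combine partial results on each node before the shuffle.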

Not to be outdone, Bamshad gave a great overview of DePaul University’s Data Analytics program: the types of courses taught, a little of the philosophy behind the program, and the domain areas on which the program focuses.

BTW, here is my new favorite predictive analytics figure, describing the CRISP-DM process, found in both Brand’s and Bamshad’s presentations. However, I would replace ‘Business Understanding’ with ‘Domain Expertise’ to make it more generic.


[Figure: CRISP-DM predictive analytics process (predanal.png)]
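The CRISP-DM cycle in the figure can be written down as an ordered list of its six standard phases. The sketch below is a hypothetical helper, not something taken from Brand’s or Bamshad’s presentations: it applies the substitution suggested above (‘Business Understanding’ becomes ‘Domain Expertise’) and steps through the cycle, which restarts at the first phase after Deployment.

```python
# The six phases of the standard CRISP-DM process model, in order.
CRISP_DM_PHASES = (
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
)


def generic_phases(phases: tuple[str, ...] = CRISP_DM_PHASES) -> tuple[str, ...]:
    """Return the phases with 'Business Understanding' replaced by 'Domain Expertise'."""
    return tuple(
        "Domain Expertise" if phase == "Business Understanding" else phase
        for phase in phases
    )


def next_phase(current: str, phases: tuple[str, ...]) -> str:
    """Return the phase after `current`; after the last phase the cycle restarts."""
    i = phases.index(current)
    return phases[(i + 1) % len(phases)]


if __name__ == "__main__":
    phases = generic_phases()
    print(phases[0])                         # Domain Expertise
    print(next_phase("Deployment", phases))  # Domain Expertise
```

This models only the outer loop of the diagram; the full CRISP-DM figure also shows inner feedback arrows (for example, between Data Preparation and Modeling) that a simple successor function does not capture.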



Time ran out before we could discuss the third agenda item. It will be discussed at the next telecom (April 17) and is provided here for your contemplation:

ESDA Activity

  • Compile use cases (include producer/supplier and data user analytics utilization) – Need 2 to 4 owners
  • Compile analytics tools (internal and external to ESIP) – Need 2 to 4 owners (preferably different)
  • Do gap analysis – Need 2 to 4 owners (different people, or some from the above groups)
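One way to picture how these three activities fit together: the compiled use cases list the analytics capabilities they need, the compiled tool inventory lists what each tool offers, and the gap analysis is whatever remains uncovered. The sketch below assumes that simple set-based model; the use-case names, tool names, and capability labels are hypothetical placeholders, not items the cluster has compiled.

```python
# Hypothetical inventories; every name here is an illustrative placeholder.
USE_CASES: dict[str, set[str]] = {
    "use_case_a": {"subsetting", "time_series_stats"},
    "use_case_b": {"regridding", "anomaly_detection"},
}
TOOLS: dict[str, set[str]] = {
    "tool_x": {"subsetting", "regridding"},
    "tool_y": {"time_series_stats"},
}


def gap_analysis(
    use_cases: dict[str, set[str]], tools: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Return, per use case, the needed capabilities that no compiled tool provides."""
    # Union of every capability offered by any tool (empty set if no tools).
    available = set().union(*tools.values())
    gaps: dict[str, set[str]] = {}
    for name, needs in use_cases.items():
        missing = needs - available
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    print(gap_analysis(USE_CASES, TOOLS))  # {'use_case_b': {'anomaly_detection'}}
```

A real gap analysis would also weigh factors such as tool maturity, data formats, and scale, which a set difference does not capture; the sketch only shows the basic coverage check.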

And Potential Future Activities (as of today):

- Examine project-long case studies to determine how successfully data analytics was used in each project (i.e., lessons learned)
- Oh yeah: Create a Cluster Mission Statement and Objectives
- Report out to the Federation All


For reference, I repeat some of the key ideas that came out of the February telecom.

▪ We can define the analytics toolset (focusing on Earth science)

▪ We can assemble end-to-end team(s) that together address various aspects of data analytics (and, more broadly, Data Science). This would also surface gaps in our expertise.

▪ We can better understand and provide potential ESIP expertise to NIST activities

▪ Data Supplier vs. Data User perspectives. We can surface/organize the analytics needs and use cases from both perspectives

▪ Another dimension for delineating Data Scientist and Data Analytics is the Data Creator/Provider <---> Data End User axis. The perspectives and needs of Data Science and Data Analytics differ greatly depending on where you are along that axis, and typically a real gap exists between the two perspectives.

▪ Idea: We can consider focusing on collecting case studies where organizations have implemented big data solutions to problems, carried out analytics and quality assurance, and enabled policy makers to make informed decisions based on the end products of data science. From this body of work, which can highlight both successes and failures, I think the group can begin to form recommendations on how organizations should proceed in data science based on their particular goals. It can also serve as a research base for data scientists and IT staff considering alternatives to their own approaches.

Next Telecon:

  • April 17, 3:00 EST (third Thursday of each month)
  • Agenda (as of now)

- Analytics-related topic to better understand. DOES ANYBODY HAVE A TOPIC THEY WISH TO BETTER UNDERSTAND?

- Listen and Learn - We will have 2 guest speakers to discuss their Analytics activities

- ESDA Activities