Difference between revisions of "Earth Science Data Analytics/2014-04-17 Telecon"

From Earth Science Information Partners (ESIP)
 
(32 intermediate revisions by the same user not shown)
Line 1: Line 1:
SDA Telecom notes – 4/17/14
+
 
 +
ESDA Telecom notes – 4/17/14
  
 
===Known Attendees:===
 
===Known Attendees:===
  
 
+
Erin (ESIP Host), Steve Kempler, Brand Niemann, Seung Hee Kim, Robert Downs, Josh Young, chung-lin shie, Ken Keiser, Rudy Husar, fritz vanwijngaarden, Eric Kihn, John, Tiffany Mathews, suhung shen, Rahul Ramachandran, Walt Baskin, Joan Aron
  
 
===Agenda:===
 
===Agenda:===
  
10 minutes – Steve
+
1 Present new Cluster Information Sharing Websites - Steve
  
 
Introduction to the Earth Science Data Analytics Discussion Forum - http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum
 
Introduction to the Earth Science Data Analytics Discussion Forum - http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum
 +
 
Introduction to the Use Case Collection webpage - http://wiki.esipfed.org/index.php/Use_Case_Collection
 
Introduction to the Use Case Collection webpage - http://wiki.esipfed.org/index.php/Use_Case_Collection
  
  
10 minutes – Joan Aron – To Present:
+
2 – Joan Aron – To Present: Data Analytics Needs Scenario
 
 
Data Analytics Needs Scenario
 
 
 
  
10 minutes – Rudy Husar – To present:
 
  
User-Oriented Data Analytics and Tools in DataFed
+
3 – Rudy Husar – To present:  User-Oriented Data Analytics and Tools in DataFed
  
  
20 minutes – Tiffany Mathews – To lead discussion:
+
4 – Tiffany Mathews – To lead discussion:
  
 
'enabling users to leverage data to observe more phenomena than what can be identified when studying an average'.
 
'enabling users to leverage data to observe more phenomena than what can be identified when studying an average'.
Line 31: Line 29:
  
  
Referenced Material:
+
Presentations:
* [[Media: SchnaseCAaaS.pdf|Schnase: MERRA Analytic Services paper]]
+
* [[Media: Aron-Data Analytics Needs Scenario.pptx|Joan Aron: Data Analytics Needs Scenario - 4/17/14]]
* [http://semanticommunity.info/@api/deki/files/28785/Kahn_GMU_22Feb_2013.ppt Ralph Kahn, "Why we need huge datasets of Earth observations…"]
+
* [[Media: Rudy140417_ESIP_DataAnalytics2.pptx |  Rudy Husar: User-Oriented Data Analytics and Tools using the Federated Data System DataFed - 4/17/14]]
 +
* [[Media: ASDC Analytics Discussion.pdf|  Tiffany Mathews: Atmospheric Science Data Center Sample Analytics Use Cases - 4/17/14]]
  
 
Presentations:
 
* [[Media: ESIP_SteveK_ESDA.ppt| Steve's telecom Presentation]]
 
* [http://semanticommunity.info/Data_Science/ESIP_Earth_Sciences_Data_Analytics#Guest_Speaker Brand's Presentation: Sorting out Data Science and Data Analytics]
 
* [[Media: SchnaseMERRA.pdf|  John's Presentation: Hands on Experience: Big Data Challenges]]
 
* [[Media: DePaulU.pdf|  Bamshad's Presentation: DePaul University: Data Analytics Masters Degree Overview]]
 
  
 
===Notes:===
 
===Notes:===
  
Good News:  We had 3 excellent speakers to discuss: Data Science/Data Analytics (Brand Niemann); An application of Data Analytics on the MERRA (data assimilation) dataset (John), and; Data Analytics Master Program Approach/Overview at DePaul University (Bamshad)
+
Today, from Joan, Rudy, and Tiffany, we received three excellent, insightful presentations regarding the need of data analytics from a user perspective, and a data discovery perspective, as well as useful tools that can help the data user.  Please give them a look via the links above.
  
Bad News: The Telecom Convener did not plan enough time for what the presentations deservingly used, thus we did not get to Agenda Item 3. (More on item 3 later)
 
  
The cluster started with some thoughts for Cluster objectives and direction based on February’s telecom ideas (see notes from February’s telecom). Basically, It seems that this Cluster can serve multiple purposes to address the various levels of members understanding and interests regarding Data Analytics. This includes:
+
'''The ESDA Discussion Forum is open for topic requests, ideas, references, and continued telecom discussion -
 +
http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum'''
  
-  ‘Academic’ discussions that allow all of us to be better educated and on the same page in understanding the various aspects of Data Analytics
 
  
Bringing in guest speakers to describe overviews of external efforts and further teach us about the broader use of Data Analytics.  (We can always invite speakers back to learn more)
+
Highlights from Joan's presentation:<br />
 +
- Provides an end user perspective for data analytics tools/technique needs: Risk Analysis, trends of Near Real Time data<br />
 +
- Need for linking continuous data from various sources<br />
 +
- Use case: Linking Climate and Air Quality<br />
  
-  Activities that ESIP members can actually address and tackle
 
  
As a start, this will lay groundwork for our understanding, as the field evolves, and the individual and collective interests of this cluster evolve, in turn, the cluster objectives can evolve.
+
Highlights from Rudy's presentation:<br />
This will be put out as the basis of the ESDA cluster mission/objectives.  Please take a look at it at the top of our Wiki ‘ESDA Home Page’. 
+
- Also, provides an end user perspective for Air Quality Decision Systems needed analytics<br />
Please provide comments on what you think of it, does it address your expectations, and/or what else we should include.
+
- DataFed provides a shared data pool (multiple sources), data browser, event screening, data and trend analysis<br />
  
  
Take a look at Brand’s presentation.  It provides a real breadth of information regarding Data Science, Data Analytics, what Data Scientists do, current activities in the field, more.  Remember: ‘…try to make a story out of the data’.
+
Highlights from Tiffany's presentation<br />
 +
- From the data provider point of view, provides this excellent perspective:  "enable users to leverage data to observe more phenomena than what can be identified by studying an average"<br />
 +
- Discussed dataset inter-calibrations, inter-comparisons, finding data that is meaningful, and being able to analyze original source data associated with higher level data of interest.<br />
  
John’s presentation was equally interesting, describing how he applies analytics (MapReduce) to the MERRA datasets.
 
  
Not to be outdone, Bamshad gave a great overview of DePaul University’s Data Analytics program, the types of course taught, a little philosophy behind the program, and the domain areas on which the program focus.
+
Tiffany next led a discussion to answer the following questions:<br />
  
BTW, here is my new favorite predictive analytics figure describing the CRISP-DM process found in both, Brand and Bamshad’s presentations. Only I would substitute ‘Business Understanding’ with ‘Domain Expertise’, to make it more generic.
+
1. What are your most time consuming data tasks that can leverage analytics?<br />
 +
2. Identify and discuss different types of analytics<br />
 +
3. What kind of data analytics is needed for specific use cases?<br />
 +
4. Identify tools and technologies that address different types of analytics<br />
  
  
[[Image:predanal.png|500px]]
+
(Of course,) We did not get through all questions, but after a very good discussion, '''we decided to post the questions on the 'ESDA discussion Forum' (http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum)  and continue discussion on the forum (I encourage all to participate with questions, answers, and experience)'''
  
  
 +
Discussion highlights (thus far), focusing on the different types of data analytics:<br />
 +
[[Image:onemoretype.png|500px]]
  

Time ran out to discuss the third agenda item.  This will be discussed at the next telecom (April 17), and provided here for your contemplation:
 
ESDA Activity
 
- Compile use cases (include producer/supplier and data user analytics utilization) - Need 2 to 4 owners
 
- Compile analytics tools (internal and external to ESIP) – Need 2 to 4 owners (preferably different)
 
- Do gap analysis – Need to 2 to 4 owners (different or some from above groups)
 
  
And Potential Future Activities (as of today)
 
- Examine project long case studies to determine successfulness of using data analytics in the project (i.e., lessons learned)
 
- Oh yeah:  Create a Cluster Mission Statement and Objectives
 
- Report out to the Federation All
 
  
 +
- Getting data, in particular, meaningful data is very time consuming<br />
 +
- Metadata is very useful in accessing and understanding data to determine its meaningfulness<br />
 +
- Using semantics to acquire information in metadata needs to be further pursued<br />
 +
- Making data usable in system (i.e., analytics tool, decision support, etc.) is time consuming; Automating process is sometimes difficult<br />
  
For reference, I repeat some of the key ideas that came out of the February telecom. 
+
- Types of analytics needed:  Provider - Analytics to make data more usable<br />
 +
- Types of analytics needed:  Provider/User - For data integration; Combine data from 2 or more data sources; what is the best way to do this (<-- end goal dependent)<br />
 +
- This is the figure (I believe) Rudy was alluding to, when referring to Big Data Value Chain:
  
▪ We can define the analytics toolset (focusing on Earth science)
+
[[Image:analyticsvaluechain.tiff|500px]]
  
▪ We can assemble end-to-end team(s) that together address various aspects of data analytics (and, more broadly, Data Science. This would also surface gaps in our expertise.
+
 +
- Using analytics to combine data tools, and be able to reverse out of analytics to get back to the original data<br />
 +
- Tools: Needed for identifying new information from a combination of existing data <br />
 +
- Tools: For linking data to causes (thus working backwards: result --> cause --> data)<br />
 +
- Tools: Data fusion - for example, for environmental data analysis<br />
  
▪ We can better understand and provide potential ESIP expertise to NIST activities
+
- But…who should apply data analytics?<br />
 +
Producers (e.g., science teams), the data experts; Providers (e.g., data centers), who know how to build infrastructure/framework to support advancing data analysis; Users (e.g., researchers, decision support), who know exactly what their goals are<br />
 +
- An answer:  All… but the key, is to make sure knowledge, experience, and needs, are shared amongst all the groupings.<br />
  
▪ Data Supplier vs. Data User perspectives. We can surface/organize the analytics needs and use cases from both perspectives
 
  
▪ Another dimension of delineating Data Scientist and Data Analytics is along the Data Creator/Provider < --- > Data End User axis. -- The perspectives and the needs of Data Science and Data Analytics are very different where you are along that axis. -- Typically a real gap exists between the two perspectives
+
'''Discussion continued on Discussion Forum:  http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum'''
  
▪ Idea: We can consider focusing on the collection of case studies where organizations have implemented big data solutions to problems, carried out analytics, quality assurance, and have allowed policy makers to make informed decisions based on the end products of data science. From this body of work, which can highlight both successes and failures, I think that the group can begin to form recommendations on how organizations should proceed in data science based on their particular goals. It can also serve as a bed of research for data scientists and IT staff to consider alternatives to their own approaches.
 
  
 
===Next Telecon:===
 
===Next Telecon:===
* April 17, 3:00 EST (third Thursday of each month)
+
* May 22, 3:00 EST
 
* Agenda (as of now)
 
* Agenda (as of now)
  
- Analytics related topic to better understand.  DOES ANYBODY HAVE A TOPIC THEY WISH TO BETTER UNDERSTAND
+
- Listen and Learn - We will have 2 guest speakers to discuss their Analytics activities
  
- Listen and Learn - We will have 2 guest speakers to discuss their Analytics activities
+
- Continued discussion from last telecom:  Types of Analytics, and Tools/Techniques best suited for each type
  
- ESDA Activities
+
- ESDA Activities - Use Case Collection webpage - http://wiki.esipfed.org/index.php/Use_Case_Collection

Latest revision as of 14:54, September 4, 2014

ESDA Telecon notes – 4/17/14

Known Attendees:

Erin (ESIP Host), Steve Kempler, Brand Niemann, Seung Hee Kim, Robert Downs, Josh Young, chung-lin shie, Ken Keiser, Rudy Husar, fritz vanwijngaarden, Eric Kihn, John, Tiffany Mathews, suhung shen, Rahul Ramachandran, Walt Baskin, Joan Aron

Agenda:

1 – Present new Cluster Information Sharing Websites – Steve

Introduction to the Earth Science Data Analytics Discussion Forum - http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum

Introduction to the Use Case Collection webpage - http://wiki.esipfed.org/index.php/Use_Case_Collection


2 – Joan Aron – To Present: Data Analytics Needs Scenario


3 – Rudy Husar – To present: User-Oriented Data Analytics and Tools in DataFed


4 – Tiffany Mathews – To lead discussion:

'enabling users to leverage data to observe more phenomena than what can be identified when studying an average'.

Tiffany will initiate discussion with her presentation entitled: "Atmospheric Science Data Center Sample Data Analytics Use Cases."


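Presentations:
- Joan Aron: Data Analytics Needs Scenario - 4/17/14
- Rudy Husar: User-Oriented Data Analytics and Tools using the Federated Data System DataFed - 4/17/14
- Tiffany Mathews: Atmospheric Science Data Center Sample Analytics Use Cases - 4/17/14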


Notes:

Today, from Joan, Rudy, and Tiffany, we received three excellent, insightful presentations on the need for data analytics from a user perspective and a data discovery perspective, as well as on useful tools that can help the data user. Please give them a look via the links above.


The ESDA Discussion Forum is open for topic requests, ideas, references, and continued telecon discussion - http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum


Highlights from Joan's presentation:
- Provides an end user perspective for data analytics tools/technique needs: Risk Analysis, trends of Near Real Time data
- Need for linking continuous data from various sources
- Use case: Linking Climate and Air Quality


Highlights from Rudy's presentation:
- Also provides an end user perspective, on the analytics needed for Air Quality Decision Systems
- DataFed provides a shared data pool (multiple sources), data browser, event screening, data and trend analysis


Highlights from Tiffany's presentation:
- From the data provider point of view, provides this excellent perspective: "enable users to leverage data to observe more phenomena than what can be identified by studying an average"
- Discussed dataset inter-calibrations, inter-comparisons, finding data that is meaningful, and being able to analyze original source data associated with higher level data of interest.


Tiffany next led a discussion to answer the following questions:

1. What are your most time consuming data tasks that can leverage analytics?
2. Identify and discuss different types of analytics
3. What kind of data analytics is needed for specific use cases?
4. Identify tools and technologies that address different types of analytics


Of course, we did not get through all the questions, but after a very good discussion we decided to post them on the ESDA Discussion Forum (http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum) and continue the discussion there. I encourage all to participate with questions, answers, and experience.


Discussion highlights (thus far), focusing on the different types of data analytics:
[Figure: Onemoretype.png]


- Getting data, in particular meaningful data, is very time consuming
- Metadata is very useful in accessing and understanding data to determine its meaningfulness
- Using semantics to acquire information in metadata needs to be further pursued
- Making data usable in a system (e.g., analytics tool, decision support) is time consuming; automating the process is sometimes difficult

- Types of analytics needed: Provider - Analytics to make data more usable
- Types of analytics needed: Provider/User - For data integration; combine data from 2 or more data sources; what is the best way to do this (<-- end goal dependent)
- This is the figure (I believe) Rudy was alluding to, when referring to Big Data Value Chain:

[Figure: Analyticsvaluechain.tiff]


- Using analytics to combine data tools, and be able to reverse out of analytics to get back to the original data
- Tools: Needed for identifying new information from a combination of existing data
- Tools: For linking data to causes (thus working backwards: result --> cause --> data)
- Tools: Data fusion - for example, for environmental data analysis
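
As a rough illustration of the data-integration and traceability points above, the sketch below joins two small, made-up record sets on a shared timestamp and keeps a pointer back to each source record, so a combined value can be traced to its origin. The source names, field names, and values are all hypothetical and were not discussed on the telecon; this is one possible approach under those assumptions, not a recommended method.

```python
from collections import defaultdict

# Hypothetical records from two made-up sources; every name and value is illustrative.
climate = [
    {"id": "c1", "time": "2014-04-01", "temp_c": 18.5},
    {"id": "c2", "time": "2014-04-02", "temp_c": 21.0},
]
air_quality = [
    {"id": "a1", "time": "2014-04-01", "pm25": 12.0},
    {"id": "a2", "time": "2014-04-03", "pm25": 30.0},
]


def combine_by_time(sources):
    """Merge records that share a timestamp and note which source record supplied each value."""
    merged = defaultdict(lambda: {"values": {}, "provenance": {}})
    for source_name, records in sources.items():
        for record in records:
            row = merged[record["time"]]
            for key, value in record.items():
                if key in ("id", "time"):
                    continue  # bookkeeping fields, not measurements
                row["values"][key] = value
                # Keep (source, record id) so the combined value can be traced back.
                row["provenance"][key] = (source_name, record["id"])
    return dict(merged)


combined = combine_by_time({"climate": climate, "air_quality": air_quality})
for time in sorted(combined):
    print(time, combined[time]["values"], combined[time]["provenance"])
```

Timestamps present in only one source are kept with whatever values exist; whether to keep, drop, or interpolate such gaps depends on the end goal, as noted in the discussion.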

- But…who should apply data analytics?
Producers (e.g., science teams), the data experts; Providers (e.g., data centers), who know how to build infrastructure/framework to support advancing data analysis; Users (e.g., researchers, decision support), who know exactly what their goals are
- An answer: All… but the key is to make sure knowledge, experience, and needs are shared among all the groups.


Discussion continued on Discussion Forum: http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum


Next Telecon:

  • May 22, 3:00 EST
  • Agenda (as of now)

- Listen and Learn - We will have 2 guest speakers to discuss their Analytics activities

- Continued discussion from last telecon: Types of Analytics, and Tools/Techniques best suited for each type

- ESDA Activities - Use Case Collection webpage - http://wiki.esipfed.org/index.php/Use_Case_Collection