Difference between revisions of "Earth Science Data Analytics/2014-11-20 Telecon"

From Earth Science Information Partners (ESIP)

Latest revision as of 17:41, November 21, 2014

ESDA Telecon notes – 11/20/14

Known Attendees:

ESIP Host (Erin), Steve Kempler, Chung-lin Shie, Suhung Shen, Tiffany Mathews, Ethan McMahon, Robert Downs, Brand Niemann

Agenda:

CONGRATULATIONS TO ERIN!


1 – Recap of our last telecon on Diagnostic Analytics


2 - Discussion:  Discoveritive and Predictive Analytics


3 – Planning ahead discussion: Winter ESIP Meeting ESDA Planning: Sessions; Suggestions for guest speakers; Are we starting to learn enough to write a paper on the Types of Data Analytics Utilized in (the various phases of) Earth Science?


4 - Open Mic – Thoughts, Ideas


Presentations:

  • Steve Kempler: ESDA Cluster Discussion slides, November 20, 2014 (2014-11-20 ESDA.pdf)


Notes:

Thank you all for attending.

Continuing our review of the various types of Data Analytics, with the objective of clarifying and specifically defining each type, one by one, we discussed Discoveritive, Predictive, and Prescriptive Data Analytics.

As a reminder:

Types of Data Analytics

Descriptive Analytics: You can quickly understand "what happened" during a given period in the past and verify if a campaign was successful or not based on simple parameters.

Diagnostic Analytics: If you want to go deeper into the data you have collected from users in order to understand "Why some things happened," you can use … intelligence tools to get some insights.

Discoveritive Analytics: The use of data and analysis tools/models to discover information

Predictive Analytics: If you can collect contextual data and correlate it with other user behavior datasets, as well as expand user data … you enter a whole new area where you can get real insights.

Prescriptive Analytics: Once you get to the point where you can consistently analyze your data to predict what's going to happen, you are very close to being able to understand what you should do in order to maximize good outcomes and also prevent potentially bad outcomes. This is on the edge of innovation today, but it's attainable!


The following Discoveritive Data Analytics definitions were offered:

- "Tell me something that I don't know" is the definition of data mining: discovering unexpected patterns and relationships in data. (http://online-behavior.com/emetrics/data-discovery-1073)

- Four types of discovery analytics: visual discovery, data discovery, information discovery and event discovery (http://www.information-management.com/blogs/3-major-trends-in-new-discovery-analytics-10024769-1.html)


The following Predictive Data Analytics definitions were offered:

- Encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events

- Combines techniques from statistics, data mining and machine learning to find meaning from large amounts of data…and predict where you’re going.

- Predictive analytics is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends.

- Predictive analytics does not tell you what will happen in the future. It forecasts what might happen in the future with an acceptable level of reliability, and includes what-if scenarios and risk assessment (http://www.webopedia.com/TERM/P/predictive_analytics.html)

- Predictive analytics is the branch of data mining concerned with the prediction of future probabilities and trends. (http://searchcrm.techtarget.com/definition/predictive-analytics)

- While regression analysis is commonly used, other classes of methods deserve proper mention, e.g. Bayesian networks, artificial neural networks, decision trees, and support vector machines. More important are the non-linear analysis and probability-based approaches that underpin many of these methods.
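
To illustrate the last point, here is a minimal sketch (not something presented on the telecon) that compares a least-squares line with a depth-1 decision tree, often called a "stump", on a series with an abrupt shift. The data values and the helper names fit_line, fit_stump, and squared_error are made up for illustration. It uses only the Python standard library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the split on x that minimizes
    the summed squared error of predicting each side by its mean.
    Returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        err = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or err < best[0]:
            best = (err, threshold, left_mean, right_mean)
    return best[1:]

def squared_error(predict, xs, ys):
    """Sum of squared residuals for a prediction function."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))

# Made-up observations with an abrupt regime shift between x=4 and x=5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 2.1, 1.9, 2.0, 6.0, 6.1, 5.9, 6.0]

a, b = fit_line(xs, ys)
t, lo, hi = fit_stump(xs, ys)

line_error = squared_error(lambda x: a + b * x, xs, ys)
stump_error = squared_error(lambda x: lo if x < t else hi, xs, ys)

print(f"line:  y = {a:.2f} + {b:.2f}x, squared error = {line_error:.2f}")
print(f"stump: split at x = {t}, squared error = {stump_error:.2f}")
```

On data like this the stump fits far better than the straight line, because a single threshold captures the jump that a linear trend smooths over. Real Earth science series would of course call for proper validation on held-out data rather than a comparison of in-sample error.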


Bonus: The following Prescriptive Data Analytics definitions were offered:

- Prescriptive analytics goes beyond descriptive and predictive models by recommending one or more courses of action and showing the likely outcome of each decision

- Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions to benefit from the predictions and showing the decision maker the implications of each decision option.
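
As a rough sketch of these two definitions (our own illustration, not a method discussed on the telecon), the simplest prescriptive step takes a predicted probability for each scenario, scores every candidate action by its probability-weighted outcome, and ranks the actions. The irrigation scenario, the probabilities, the benefit values, and the function names below are all hypothetical.

```python
def expected_outcome(scenarios):
    """Probability-weighted average over (probability, outcome) pairs."""
    return sum(p * outcome for p, outcome in scenarios)

def recommend(options):
    """Score each option by expected outcome and rank best first.
    Returns a list of (expected_outcome, option_name) tuples."""
    scored = [(expected_outcome(s), name) for name, s in options.items()]
    return sorted(scored, reverse=True)

# Hypothetical inputs: a predictive model says rain has probability 0.3.
# Outcomes are benefits in arbitrary units for each (action, scenario).
p_rain = 0.3
options = {
    "irrigate now": [(p_rain, 2.0), (1 - p_rain, 8.0)],
    "wait":         [(p_rain, 10.0), (1 - p_rain, 1.0)],
}

ranking = recommend(options)
for score, name in ranking:
    print(f"{name}: expected benefit {score:.1f}")
```

The ranked list shows the decision maker both the recommended action and the likely outcome of each alternative, which is the distinction the definitions above draw between prescriptive and predictive analytics.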


Providing examples, use cases, and additional understanding is highly encouraged.

Please contact Steve


We next talked about our two sessions at the Federation Meeting in January.

1 - Earth Science Data Analytics 101:

Purpose: To ‘educate’ the ESIP community on what Earth Science Data Analytics means, and provide exemplary use cases.

Cluster Goal: Bring in speakers to present their Data Analytics use cases and get the innovation juices flowing, generating ideas, techniques, and collaborations that facilitate the use of data analytics.

Draft Agenda:

- Introduction to Earth science data analytics – (15 min)

- 3 or 4 use case speakers (10-15 min each). I have 2 already… any suggestions?

- Current Data Analytics technologies useful in Earth science (15 min)

- Panel – Q&A (all speakers)

Excellent suggestions were made to ensure that speakers target our interest in learning how they perform data analytics in their research.


2 - Earth Science Data Analytics 201:

Purpose: To scope a study that would meaningfully benefit ESIP and the broader community; develop an outline for the study.

Cluster Goal: Discuss publishing our findings and generating a library of Data Analytics methodologies.

Discussion and work breakout. This is where we will further discuss and develop a more detailed outline for a paper that describes Earth science data analytics methodologies. It seems the bulk of our research at this time would be gathering and characterizing use cases. This led to the possibility of creating an Earth science data analytics library of such methodologies.

More to come. Be a part of some groundbreaking work.


The following was provided to initiate discussion of such a paper:

1. Take what we learn, refine, and define about the different types of Data Analytics

- Descriptive Analytics

- Diagnostic Analytics

- Discoveritive Analytics

- Predictive Analytics

- Prescriptive Analytics

2. Associate exemplary Earth science use cases to each type

3. Associate Data Analytics techniques/tools to each type

4. Associate user categories to each type

5. Describe skills and expertise needed for each type

- Currently, we talk about our expertise and experience, but they seldom seem to connect to each other

- This will help us, the industry, and, hopefully, educators focus our understanding and interests regarding Earth Science Data Analytics.


REMINDER: AGU sessions that pertain to Data Analytics:

- Teaching Science Data Analytics Skills Needed to Facilitate Heterogeneous Data/Information Research: The Future Is Here - Session ID#: 1879

- Identifying and Better Understanding Data Science Activities, Experiences, Challenges, and Gaps Areas - Session ID#: 1809

- Advancing Analytics using Big Data Climate Information System - Session ID#: 3022

- Big Data in the Geosciences: New Analytics Methods and Parallel Algorithm - Session ID#: 3292

- Leveraging Enabling Technologies and Architectures to enable Data Intensive Science - Session ID#: 3041

- Open source solutions for analyzing big earth observation data - Session ID#: 3080

- Technology Trends for Big Science Data Management - Session ID#: 2525


Next Telecon:

  • No telecon. Face to face January 7 at the Federation Meeting in Washington
  • Agenda: See planned sessions above