Difference between revisions of "Earth Science Data Analytics/2014-10-23 Telecon"

From Earth Science Information Partners (ESIP)

Latest revision as of 08:53, November 14, 2014

ESDA Telecon notes – 10/23/14

Known Attendees:

ESIP Host (Erin), Steve Kempler, Jennifer Wei, Beth Huffer, Chung-lin Shie, S Doman Bennett, Seung Hee Kim, Suhung Shen, Tiffany, Eric Kihn, George Djorgovski, Ethan McMahon, Robert Downs

Agenda:

1 – Steve Kempler - Recap of our last telecon on Descriptive Analytics


2 – Guest Speaker: George Djorgovski, Caltech, who is interested in the roles of computation in knowledge discovery.


3 – Discussion: Diagnostic Analytics


4 – Steve Kempler - Planning Ahead Discussion

Presentations:


Notes:

Thank you all for attending a very interesting and 'getting more focused' telecon. Much appreciation goes to our speaker, Dr. George Djorgovski, who shared very interesting insights into the complexity of data analysis as we are beginning to know it today.

After a review of the highlights from our Frisco Cluster Meeting, and of the discussion about Descriptive Data Analytics from our August telecon (see link to presentation, above), Dr. Djorgovski discussed (unfortunately without slides) his experiences in working with large amounts of multivariate data. Dr. Djorgovski's relevant interests include: development of e-Science/cyber-infrastructure, the roles of computation in knowledge discovery, Astroinformatics, the Virtual Observatory, and advanced data-mining and exploration techniques. Some of the main points I was able to extract (others, please edit in your highlights) include:

- Not only are we dealing with large amounts of data, but we are also dealing with increasing data growth

- What is interesting is the need to deal with multi-dimensional data, fusing data, and matching data to models

- However, there are no tools that can analyze these datasets: classify data, look for outliers, correlate multi-dimensional data

- Data rich science: Data contains knowledge, but it is currently not easily obtainable

- Machine learning would be helpful for knowledge discovery

- Need computer scientists, mathematicians, applied computer science

- But one solution is not globally applicable. That is why we need to look for commonalities between problems, and to apply domain knowledge of the application

- Caltech, in partnership with JPL, has developed a student curriculum for Big Data Analytics (https://www.coursera.org/course/bigdataschool).

- Discussion: Dealing with data correlation - How to obtain causality? Need more context, and need more variables. Both point to the need for Data Analytics tools that address causation. Discoveritive Data Analytics?
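
To make two of the tasks named above concrete (looking for outliers and correlating multi-dimensional data), here is a minimal Python sketch. It was not presented at the telecon. The function names, the z-score threshold, and the sample values are all hypothetical and chosen only for illustration. Note that a high correlation found this way says nothing about causality, which is exactly the gap raised in the discussion.

```python
import math
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return indices of values lying more than `threshold` population
    standard deviations from the mean (a simple, assumed outlier rule)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named variables."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical sample data: the last temperature reading is an obvious outlier.
temperature = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 40.0]
humidity = [2.0 * t + 1.0 for t in temperature]  # perfectly linear in temperature

outlier_indices = zscore_outliers(temperature)
matrix = correlation_matrix({"temperature": temperature, "humidity": humidity})
print(outlier_indices)
print(matrix[("temperature", "humidity")])
```

A z-score rule and Pearson correlation are only the simplest possible choices; real multi-dimensional Earth science data would usually call for more robust methods and domain knowledge.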


Next, continuing our effort to review the various types of Data Analytics, with the objective of clarifying and specifically defining each type, one by one, we discussed Diagnostic Data Analytics.


As a reminder:

Types of Data Analytics

Descriptive Analytics: You can quickly understand "what happened" during a given period in the past and verify if a campaign was successful or not based on simple parameters.

Diagnostic Analytics: If you want to go deeper into the data you have collected from users in order to understand "Why some things happened," you can use … intelligence tools to get some insights.

Discoveritive Analytics: The use of data and analysis tools/models to discover information

Predictive Analytics: If you can collect contextual data and correlate it with other user behavior datasets, as well as expand user data … you enter a whole new area where you can get real insights.

Prescriptive Analytics: Once you get to the point where you can consistently analyze your data to predict what's going to happen, you are very close to being able to understand what you should do in order to maximize good outcomes and also prevent potentially bad outcomes. This is on the edge of innovation today, but it's attainable!


We also noted that the diagram in the presentation, showing the different types of Data Analytics, needs to be revised to de-emphasize the data quality and timeliness relationship between the various types. It also needs to be made more applicable to Earth science Data Analytics, thus putting our 'mark' on the subject. Tiffany and I are going to give it a go.


The following Diagnostic Data Analytics definitions were offered:

- Determine why something happened, using content analytics and natural language processing to cull insights found in documents, email, websites, social media and so on. Understand the root cause of geophysical changes through more detailed analysis and visualizations. (modified from: http://www.ibm.com/analytics/us/en/analytics-tools.html)

- Diagnostic analytics looks deeper into what has happened and seeks to understand why a problem or event of interest occurs. How do various measurable events and actions in the focal domain relate to each other? (http://www.lifescaleanalytics.com/files/lifescale/files/brief_descriptivetoprescriptive.pdf)

- Diagnostic data analytics is used to answer the question “Why is it happening?”. It strives to identify root causes, key factors, and unseen patterns (http://webcache.googleusercontent.com/search?q=cache:abygIyZBFLIJ:www.ag-ai.nl/download/17445-21-3-art.Parekh.pdf+&cd=8&hl=en&ct=clnk&gl=us&client=safari)
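
The difference between the descriptive question ("what happened?") and the diagnostic question ("why is it happening?") can be illustrated with a small Python sketch. This is an assumption-laden illustration, not something offered at the telecon: the variable names (`pm25`, `fire_count`, `wind_speed`), the sample values, and the idea of ranking candidate drivers by correlation strength are all hypothetical. Ranking by correlation only suggests where to look for root causes; it does not establish them.

```python
import math
import statistics


def describe(values):
    """Descriptive step: summarize what happened with simple aggregates."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither sequence is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_candidate_drivers(target, candidates):
    """Diagnostic step (a first pass only): rank candidate explanatory
    variables by the absolute strength of their correlation with the target."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical daily observations, invented for this example.
pm25 = [12.0, 15.0, 14.0, 30.0, 45.0, 20.0]
candidates = {
    "wind_speed": [5.0, 4.0, 6.0, 5.0, 4.0, 6.0],
    "fire_count": [0.0, 1.0, 1.0, 5.0, 9.0, 2.0],
}

summary = describe(pm25)                           # what happened?
ranked = rank_candidate_drivers(pm25, candidates)  # where to look for why?
print(summary)
print(ranked)
```

In this made-up data the particulate peak tracks the fire counts much more closely than the wind speed, so a diagnostic analysis would examine fires first and then bring in more context and more variables before claiming a cause.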


The following comparison between Descriptive and Diagnostic Data Analytics was also discussed:


[Image: Descriptive vs Diagnostic.png]

Providing examples, use cases, and additional understanding is highly encouraged.


If you are interested in participating, please contact Steve:

Our telecon concluded with discussion on authoring a paper along the lines of: Types of Data Analytics Utilized in (the various data analysis phases of) Earth Science.

The following work plan can lead us to the development of such a paper:

1. Take what we learn about the different types of Data Analytics, then refine and define each type

- Descriptive Analytics

- Diagnostic Analytics

- Discoveritive Analytics

- Predictive Analytics

- Prescriptive Analytics

2. Associate exemplary Earth science use cases to each type

3. Associate Data Analytics techniques/tools to each type

4. Associate user categories to each type

5. Describe skills and expertise needed for each type

- Currently, we talk about our expertise and experience, but they seldom seem to connect to each other

- This will help us, the industry, and, hopefully, educators focus our understanding and interests regarding Earth Science Data Analytics.


REMINDER: AGU sessions that pertain to Data Analytics:

- Teaching Science Data Analytics Skills Needed to Facilitate Heterogeneous Data/Information Research: The Future Is Here - Session ID#: 1879

- Identifying and Better Understanding Data Science Activities, Experiences, Challenges, and Gaps Areas - Session ID#: 1809

- Advancing Analytics using Big Data Climate Information System - Session ID#: 3022

- Big Data in the Geosciences: New Analytics Methods and Parallel Algorithm - Session ID#: 3292

- Leveraging Enabling Technologies and Architectures to enable Data Intensive Science - Session ID#: 3041

- Open source solutions for analyzing big earth observation data - Session ID#: 3080

- Technology Trends for Big Science Data Management - Session ID#: 2525



Next Telecon:

  • November 20, 3:00 EST
  • Agenda (as of now)

- Listen and Learn - We will have a guest speaker to discuss their Analytics activities

- ESDA Activities - Discuss definitions for Discoveritive and Predictive Analytics

- Publication potential - Discuss Cluster collaborative paper loosely titled: Data Analytics in Earth Science Research… but we can discuss

- Winter ESIP Meeting session planning