Earth Science Data Analytics/2014-08-21 Telecon

From Earth Science Information Partners (ESIP)

Latest revision as of 09:35, November 13, 2014

ESDA Telecon notes – 8/21/14

Known Attendees:

ESIP Host (Erin), Steve Kempler, Ward, Chung-lin Shie, H. Joe Lee, Sara Graves, Joan Aron, Suhung Shen, Seung Hee Kim, Robert Downs, Smiley

Agenda:

1 – Steve Kempler - Welcome back from Frisco


2 – Postponed to a future telecon: Guest Speaker: George Djorgovski, Caltech, who is interested in the roles of computation in knowledge discovery.


3 – Discussion: Descriptive Analytics

Presentations:
  • Steve Kempler: ESDA Cluster Discussion slides, August 21, 2014


Notes:

Prior to working through the agenda, a list of AGU sessions that pertain to Data Analytics was provided:

- Teaching Science Data Analytics Skills Needed to Facilitate Heterogeneous Data/Information Research: The Future Is Here - Session ID#: 1879

- Identifying and Better Understanding Data Science Activities, Experiences, Challenges, and Gaps Areas - Session ID#: 1809

- Advancing Analytics using Big Data Climate Information System - Session ID#: 3022

- Big Data in the Geosciences: New Analytics Methods and Parallel Algorithm - Session ID#: 3292

- Leveraging Enabling Technologies and Architectures to enable Data Intensive Science - Session ID#: 3041

- Open source solutions for analyzing big earth observation data - Session ID#: 3080

- Technology Trends for Big Science Data Management - Session ID#: 2525


As previously reported, the ESDA Cluster session in Frisco drew people with a variety of interests, most of whom came to learn more about Data Analytics and its application to Earth science data in furthering our understanding of our planet. The following observations from the Frisco session were reviewed:

- The ESDA goal is to facilitate turning information into knowledge

- The ESDA Cluster, which is attracting a lot of interest, continues to ‘churn’ through the process of maturing its understanding of this new paradigm, Data Analytics and Data Science, and of its impacts.

- Session participants comprised technologists and data users, with the majority attending to ‘learn’.

- Thus, in the early stages of this Cluster's life, we continue to emphasize learning, which will doubtlessly evolve into applying (shaping) the knowledge we gain into implementable techniques that facilitate the use and advancement of data analytics and data science.


Next, we reviewed the various types of Data Analytics, with the objective of clarifying and specifically defining each type, one by one. Our goal is to clearly characterize data analytics by type in terms of purpose, available tools, users, usages, and use cases. Through this ordering, we (hopefully) will be able to enable analytics connections and identify gaps per specific community. Right now, as we learn more about data analytics, we seem to be throwing every use case, user, etc. into the same sink. However, by applying definitions, we will be able to make clear connections between particular data analytics usage scenarios: those of the same type, those that share tools, and those that have common methodologies.


Types of Data Analytics

Descriptive Analytics: You can quickly understand "what happened" during a given period in the past and verify if a campaign was successful or not based on simple parameters.

Diagnostic Analytics: If you want to go deeper into the data you have collected from users in order to understand "Why some things happened," you can use … intelligence tools to get some insights.

Discoveritive Analytics: The use of data and analysis tools/models to discover information

Predictive Analytics: If you can collect contextual data and correlate it with other user behavior datasets, as well as expand user data … you enter a whole new area where you can get real insights.

Prescriptive Analytics: Once you get to the point where you can consistently analyze your data to predict what's going to happen, you are very close to being able to understand what you should do in order to maximize good outcomes and also prevent potentially bad outcomes. This is on the edge of innovation today, but it's attainable!


At this telecon, we tackled the first type of Data Analytics: Descriptive Analytics.

The following Descriptive Analytics definitions were offered:

Descriptive Analytics: You can quickly understand "what happened" during a given period in the past and verify if a campaign was successful or not based on simple parameters.

What does Descriptive Data Analytics mean? What does it do? How is it used? Examples! Where in Earth science would this be used? Which users?

- The purpose of descriptive analytics is to summarize and tell you what has happened in the past

- "the simplest class of analytics," one that allows you to condense big data into smaller, more useful nuggets of information. http://community.lithium.com/t5/Science-of-Social-blog/Big-Data-Reduction-2-Understanding-Predictive-Analytics/ba-p/79616

- compute descriptive statistics (e.g., counts, sums, averages, percentages, simple arithmetic) that summarize certain groupings or filtered versions of the data, which are typically simple counts of some events. They are mostly based on standard aggregate functions. http://community.lithium.com/t5/Science-of-Social-blog/Big-Data-Reduction-1-Descriptive-Analytics/ba-p/77766

- The purpose of descriptive analytics is simply to summarize and tell you what happened. For example, number of posts, mentions, fans, followers, page views, kudos, +1s, check-ins, pins, etc. …simple event counters.

- Other descriptive analytics may be results of simple arithmetic operations, such as share of voice, average response time, % index, average number of replies per post, etc. http://community.lithium.com/t5/Science-of-Social-blog/Big-Data-Reduction-1-Descriptive-Analytics/ba-p/77766

- Following the Netflix approach, Amazon uses "Descriptive" analytics to process what you have purchased in the past, to predict what books, videos, and things you might like in the future

- Descriptive analytics answers the question, "What happened…?" It looks at data and information to describe the current situation in a way that trends, patterns and exceptions become apparent http://www.mu-sigma.com/analytics/ecosystem/dipp.html

- Descriptive statistics is the discipline of quantitatively describing the main features of a collection of information, or the quantitative description itself http://en.wikipedia.org/wiki/Descriptive_statistics

- Natural Hazards: Looking for patterns and trends; bringing in heterogeneous datasets, summarized together, to detect patterns

Erin to provide slides: air quality ‘use case’
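
As a rough illustration of the definitions above, here is a minimal Python sketch of descriptive analytics: it groups records and computes counts, sums, averages, and a percentage. It was not presented at the telecon. The station names, the readings, and the threshold are hypothetical values made up for this example, loosely modeled on an air quality use case, and `summarize` is an illustrative helper rather than part of any ESDA tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily PM2.5 readings as (station, micrograms per cubic meter).
# These values are invented for illustration only.
readings = [
    ("station_a", 8.0),
    ("station_a", 12.0),
    ("station_a", 40.0),
    ("station_b", 5.0),
    ("station_b", 7.0),
]

# Hypothetical cutoff, used only to count "high" days in this sketch.
THRESHOLD = 35.0


def summarize(records, threshold):
    """Group readings by station and compute simple aggregate statistics."""
    grouped = defaultdict(list)
    for station, value in records:
        grouped[station].append(value)

    summary = {}
    for station, values in grouped.items():
        # Count the readings above the threshold for this station.
        high = sum(1 for v in values if v > threshold)
        summary[station] = {
            "count": len(values),
            "sum": sum(values),
            "mean": mean(values),
            "pct_high": 100.0 * high / len(values),
        }
    return summary


summary = summarize(readings, THRESHOLD)
for station, stats in sorted(summary.items()):
    print(station, stats)
```

Each output row answers only "what happened" at a station over the period. Nothing here explains why a value was high or predicts future values, which is where the other analytics types would begin.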


Providing additional examples, experiences, and understanding is highly encouraged. Our research and collective notes could contribute greatly to understanding how data analytics can be used to advance Earth science.


Next Telecon:

  • October 16, 3:00 EST (Apologies, I will be out of the office 3 Thursdays in September)
  • Agenda (as of now)

- Listen and Learn - We will have 2 guest speakers to discuss their Analytics activities

- ESDA Activities - Discuss definitions for Diagnostic and Discoveritive Analytics

- Publication potential - Discuss a Cluster collaborative paper loosely entitled: Data Analytics in Earth Science Research… but we can discuss

- Proposed Winter ESIP Meeting sessions