Difference between revisions of "Earth Science Data Analytics/2014-06-26 Telecon"

From Earth Science Information Partners (ESIP)
 
(12 intermediate revisions by the same user not shown)
Line 1: Line 1:
SDA Telecom notes – 6/26/14
+
ESDA Telecom notes – 6/26/14
  
 
===Known Attendees:===
 
===Known Attendees:===
Line 15: Line 15:
  
  
4 – Steve - Preparations for ESIP ESDA Breakout Meeting in Frisco\
+
4 – Steve - Preparations for ESIP ESDA Breakout Meeting in Frisco
  
  
Line 26: Line 26:
  
  
Today, Tiffany provided a demonstration of UV-CDAT and described ClimatePipes, two visualization analytics tools.  Tiffany followed with a continuation of last month's discussion on different types of data analytics.  Steve followed, showing the ESDA Use Case Gathering website and the Data Analytics Tools/Techniques Inventory website.
+
Ralph Kahn, an atmospheric research scientist specializing in aerosols, is very experienced in bringing together datasets from various remote sensing instruments to further his research.  Ralph was therefore able to give us a perspective on using data analytics for specific science research and discovery that we had not seen before.  Unlike the more familiar uses of data analytics, which draw relationships that provide the means for predictions and, potentially, prescriptions, Ralph introduced us to his methods for identifying patterns and relationships between heterogeneous datasets to glean specific science research findings.  Ralph indicated that the methods employed (scatterplots, difference plots, binning, etc.) are specific to the data and the analysis being pursued.  Research-intensive data analytics are an integral part of the research being pursued and of understanding that research.  In a follow-up discussion, Ralph observed that applications communities, who tend to perform more routine operations on data, may be able to benefit more from tools that provide data analytics methodologies (as opposed to one-off methods created for specific research).  In conclusion, Ralph described two big issues we face: people over-interpret the data, and the easier it is to make “pretty plots,” the more this tends to happen. (The latter refers to the misuse of data discovery tools to perform or demonstrate science.)
  
 +
Joan next spoke about 'Data Publications in Data Browsers for Earth System Science', introducing a potential effort to: 'Develop best practices for Earth System Science data publications in data browsers by mining scientific publications, producing data publications and disseminating results'.  This will be discussed further in Frisco.
  
From Tiffany's UV-CDAT and ClimatePipes demonstration/discussion<br />
+
Steve concluded the telecon with a brief discussion of preparations for the ESIP ESDA cluster breakout in Frisco.  The schedule is ambitious: we will have a guest speaker, followed by analysis of the data analytics use cases and methodologies we have recorded thus far.  See the Meeting in Frisco agenda, below.
  
'''UV-CDAT http://uv-cdat.llnl.gov/ Description''': UV-CDAT brings together two active projects -- Ultrascale Visualization Climate Data Analysis Tools and Visual Data Exploration and Analysis of Ultra-large Climate Data, with the intent to  deliver new capabilities to the climate-science community. This project’s vision is to provide large-scale visualization and analysis for both observational and model-generated climate data, with the goal of delivering new capabilities into the hands of the climate scientists. The integrated software product, the Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT), is intended to be a powerful and complete front-end to a rich set of visual-data exploration and analysis capabilities well suited for climate-data analysis problems. UV-CDAT builds on the following key technologies: the Climate Data Analysis Tools (CDAT) framework; ParaView; VisTrails; and VisIt.
+
Travel safely and we'll see you next week in Frisco.
  
Additional Info: The NCCS at GSFC is developing climate data analysis and visualization tools for UV-CDAT that provide data analysis capabilities for the Earth System Grid (ESG). These tools feature workflow interfaces, interactive 3D data exploration, hyperwall and stereo visualization, automated provenance generation, parallel task execution, and streaming data parallel pipelines. NASA’s DV3D is a UV-CDAT package that enables exploratory analysis of diverse and rich data sets from various sources, including the Earth System Grid Federation (ESGF). Additionally, Python scripts can easily be generated.
 
  
'''ClimatePipes Description''': ClimatePipes is a  web-based application platform/"IDE" for science data analysis. It can be used to create and run analysis workflows and visualizations.
 
Additional Info: The front-end uses HTML5, WebGL, and CSS3 for geospatial visualizations. The back-end is built using the Visualization Toolkit (VTK), Climate Data Analysis Tools (CDAT), and other climate and geospatial data processing tools such as GDAL and PROJ4. ParaView Web, D3, and Canvas are also used for some visualizations. ClimatePipes offers look-up tools and works with UV-CDAT and MongoDB. It can read NetCDF, offers a Python web service infrastructure, and supports workflows and provenance tools using VisTrails. Python was chosen as the server-side language, with CherryPy (http://www.cherrypy.org/) as the web server.  jQuery (http://jquery.com/) and Bootstrap are being used as the supporting frameworks for a consistent interactive cross-browser experience.
 
  
 
+
===Next Telecon -> Meeting in Frisco:===
Tiffany next continued discussion to answer the following questions:<br />
+
* July 10, 2:00 MST
 
+
* Room: Ptarmigan A, and will post WebEx access
1. What are your most time consuming data tasks that can leverage analytics?<br />
+
* Agenda (as of now):
2. Identify and discuss different types of analytics<br />
+
** Review: What we have done
3. What kind of data analytics is needed for specific use cases?<br />
+
** Guest Speaker: Peter Fox - the role of the Data Scientist in facilitating the definition and subsequent usability of Data Analytics to further Earth science research
4. Identify tools and technologies that address different types of analytics<br />
+
** Analysis: Gleaning out information from our Data Analytics Use Case and Data Analytics Tools matrices - Use Case Collection webpage -  
 
+
***    Use Case Matrix Analysis – Gleaning out Data Analytics needs (http://wiki.esipfed.org/index.php/Use_Case_Collection) <br />
 
+
***    Data Analytics Tools Matrix – Gleaning out what tools can provide (http://wiki.esipfed.org/index.php/Analytics_Tools) <br />
Discussion focused on the different types of data analytics:<br />
+
***    Summary of past speakers – Gleaning out highlights:  Data Analytics needs and/or tools and their targets (http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Telecom_Presentations) <br />
[[Image:onemoretype.png|500px]]
+
** Additional Topics:
 
+
*** Gaps Analysis… Matching our user needs with our known available tools <br />
 
+
*** Data Publications in Data Browsers for Earth System Science <br />
 
+
*** Tool Matchup update – Matches tools with data.  Interest here: matching Data Analytics tools with data.  Notes: This is user dependent.  Who are the target users?  Do they start with data or tools?  Visit the User Model Matrix<br />
In particular, question 3, regarding use cases, and question 4, regarding tools and technologies, led to a 'tour', by Steve, through the ESDA information gathering pages.  Namely:<br />
+
**Looking Forward: Where we are going
 
 
'''Use Case Collection webpage''' - http://wiki.esipfed.org/index.php/Use_Case_Collection
 
 
 
'''Data Analytics Tools/Techniques Collection webpage''' - http://wiki.esipfed.org/index.php/Analytics_Tools
 
 
 
 
 
In the spirit of compiling use cases, analytics tools/techniques, and performing gap analysis between use case analytics needs and available tools/techniques, telecon participants volunteered to provide data analytics use cases.  This is as simple as providing the following information:
 
 
 
Use Case Name:
 
Provided By:
 
Brief Description:
 
Key Analytics Needs:
 
 
 
ESDA members are all encouraged to provide use cases they may have come across or are faced with. Thanks Beth, Robert, Suhung
 
 
 
 
 
The 'tour' continued with a walk through the Data Analytics Tools/Techniques Collection webpage.  When additional analytics tools/techniques were solicited, Tiffany offered a list that she has been compiling.  Others are encouraged to share, as well.
 
 
 
You can edit the websites or, if easier, feel free to send use cases and tools to:  Steven.J.Kempler@nasa.gov.
 
 
 
 
 
'''Additional Discussion:'''<br />
 
- Preparation for using specific analytics tools may be difficult, or not possible, if the tool cannot support specific data characteristics.  Beth discussed the ESIP Semantic Web Cluster's ToolMatch project as a way for us to track tools for data analytics:
 
 
 
ToolMatch Service (http://wiki.esipfed.org/index.php/ToolMatch):  Finding Tools for Your Data & Data for Your Tools.  ToolMatch is intended to be a service based on community-built semantic web applications that will provide data users with the means to match their datasets with a comprehensive list of useful, appropriate tools, and also provide data tool developers with datasets or data collections that will work with their tools.<br />
 
 
 
Next ToolMatch telecon: Tuesday May 27 at 4pm Eastern time <br />
 
Call-in toll-free number (US/Canada): 1-877-668-4493, code: 231 033 48 <br />
 
WebEx: https://esipfed.webex.com/esipfed/j.php?MTID=m98ad38879252b9000f6a489a8b2fad48,  If a password is required, enter the Meeting Password: 23103348  <br />
 
 
 
 
 
- Exemplary Use Cases: Looking for correlations across multiple variables; Bringing multiple datasets together utilizing Giovanni <br />
 
 
 
- Add a use case column to the tools inventory matrix to indicate who would find the tools useful (e.g., data producer, user, etc.) <br />
 
 
 
Preparations for ESIP ESDA Breakout Meeting in Frisco (30 min)– Steve
 
 
 
Potential topics for ESIP ESDA cluster meeting (not ordered)<br />
 
1.    Data Analytics tools matrix – Gleaning out what tools can provide<br />
 
2.    Use case Matrix Analysis – Gleaning out Data Analytics needs<br />
 
3.    Summary of past speakers – Gleaning out highlights:  Data Analytics needs and/or tools and their targets<br />
 
4.    Gaps Analysis… What Gaps are we filling?  <br />
 
5.    Tool Matchup group update – Matches tools with data.  Interest here: matching Data Analytics tools with data.  This is user dependent.  Who are the target users?  Do they start with data or tools?  Visit the User Model Matrix<br />
 
6.    Invited Speaker (waiting to hear back): Role of the Data Scientist in facilitating the definition and subsequent usability of Data Analytics to further Earth science research<br />
 
7.    Data Publications in Data Browsers for Earth System Science<br />
 
 
 
Earth Science Data Analytics Discussion Forum - http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Discussion_Forum
 
 
 
Use Case Collection webpage - http://wiki.esipfed.org/index.php/Use_Case_Collection
 
 
 
Data Analytics Tools/Techniques Collection webpage - http://wiki.esipfed.org/index.php/Analytics_Tools
 
 
 
 
 
===Next Telecon:===
 
* June 26, 3:00 EST
 
* Agenda (as of now)
 
 
 
- Listen and Learn - We will have 2 guest speakers to discuss their Analytics activities
 
 
 
- ESDA Activities - Use Case Collection webpage - http://wiki.esipfed.org/index.php/Use_Case_Collection
 
 
 
- Preparation for Frisco
 

Latest revision as of 08:53, October 24, 2014

ESDA Telecon notes – 6/26/14

Known Attendees:

ESIP Hosts (Erin, Cheryl), Robert Casey, Emily Law, Steve, Chung-Lin Shie, Ralph Kahn, Joan Aron, John Evans, Eric Kihn, Tiffany Mathews, plus 18 dial-ins

Agenda:

1 – Steve Kempler - Recap of last telecon


2 – Guest presenter: Ralph Kahn, Scientist, describing his views and experiences using large amounts of heterogeneous data in his research


3 – Joan - Data Publications in Data Browsers for Earth System Science


4 – Steve - Preparations for ESIP ESDA Breakout Meeting in Frisco


Presentations:

  • Ralph Kahn: Global, Satellite-Remote-Sensing Aerosol Studies: What We Do, and Why It Matters (Presentation posting pending agreement by co-authors)


Notes:

Ralph Kahn, an atmospheric research scientist specializing in aerosols, is very experienced in bringing together datasets from various remote sensing instruments to further his research. Ralph was therefore able to give us a perspective on using data analytics for specific science research and discovery that we had not seen before. Unlike the more familiar uses of data analytics, which draw relationships that provide the means for predictions and, potentially, prescriptions, Ralph introduced us to his methods for identifying patterns and relationships between heterogeneous datasets to glean specific science research findings. Ralph indicated that the methods employed (scatterplots, difference plots, binning, etc.) are specific to the data and the analysis being pursued. Research-intensive data analytics are an integral part of the research being pursued and of understanding that research. In a follow-up discussion, Ralph observed that applications communities, who tend to perform more routine operations on data, may be able to benefit more from tools that provide data analytics methodologies (as opposed to one-off methods created for specific research). In conclusion, Ralph described two big issues we face: people over-interpret the data, and the easier it is to make “pretty plots,” the more this tends to happen. (The latter refers to the misuse of data discovery tools to perform or demonstrate science.)

Joan next spoke about 'Data Publications in Data Browsers for Earth System Science', introducing a potential effort to: 'Develop best practices for Earth System Science data publications in data browsers by mining scientific publications, producing data publications and disseminating results'. This will be discussed further in Frisco.

Steve concluded the telecon with a brief discussion of preparations for the ESIP ESDA cluster breakout in Frisco. The schedule is ambitious: we will have a guest speaker, followed by analysis of the data analytics use cases and methodologies we have recorded thus far. See the Meeting in Frisco agenda, below.

Travel safely and we'll see you next week in Frisco.


Next Telecon -> Meeting in Frisco:

  • July 10, 2:00 MST
  • Room: Ptarmigan A, and will post WebEx access
  • Agenda (as of now):
    • Review: What we have done
    • Guest Speaker: Peter Fox - the role of the Data Scientist in facilitating the definition and subsequent usability of Data Analytics to further Earth science research
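    • Analysis: Gleaning out information from our Data Analytics Use Case and Data Analytics Tools matrices
      • Use Case Matrix Analysis – Gleaning out Data Analytics needs (http://wiki.esipfed.org/index.php/Use_Case_Collection)
      • Data Analytics Tools Matrix – Gleaning out what tools can provide (http://wiki.esipfed.org/index.php/Analytics_Tools)
      • Summary of past speakers – Gleaning out highlights: Data Analytics needs and/or tools and their targets (http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics/Telecom_Presentations)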
    • Additional Topics:
      • Gaps Analysis… Matching our user needs with our known available tools
      • Data Publications in Data Browsers for Earth System Science
      • Tool Matchup update – Matches tools with data. Interest here: matching Data Analytics tools with data. Notes: This is user dependent. Who are the target users? Do they start with data or tools? Visit the User Model Matrix
    • Looking Forward: Where we are going