'''[[Interoperability and Technology/Past Tech Dive Webinar Series|Past Tech Dive Webinars]] (2015-2022)'''
  
= Tech Dive Webinars =
==July 11th: Update on OGC GeoZarr Standards Working Group==

[https://www.briannapagan.com/ Dr. Brianna Rita Pagán]

Zarr is a cloud-native data format for n-dimensional arrays that enables access to data in compressed chunks of the original array. Zarr facilitates portability and interoperability on both object stores and hard disks.

As a generic data format, Zarr has become increasingly popular for geospatial purposes, and in June 2022 OGC endorsed Zarr V2.0 as an OGC Community Standard. The purpose of the GeoZarr SWG is to have an explicitly geospatial Zarr standard (GeoZarr) adopted by OGC that establishes flexible and inclusive conventions for the Zarr cloud-native format, meeting the diverse requirements of the geospatial domain. These conventions aim to provide a clear and standardized framework for organizing and describing data that ensures unambiguous representation.

[[File:ITI_July_2024.png|thumb|IT&I July 2024]]
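To make the chunked-access idea concrete, here is a minimal sketch using the zarr-python library. The store name, array shape, and chunk sizes are illustrative assumptions, not GeoZarr requirements:

<pre>
# Minimal sketch of Zarr's chunked storage model (zarr-python).
# Store name, shape, and chunking below are illustrative assumptions.
import numpy as np
import zarr

# Create a 3-D array (time, y, x) in a local directory store,
# split into one-timestep, 256x256 compressed chunks.
z = zarr.open("surface_temp.zarr", mode="w",
              shape=(365, 1024, 1024), chunks=(1, 256, 256), dtype="f4")
z[0] = np.random.rand(1024, 1024).astype("f4")

# Reading a window only touches the compressed chunks it overlaps --
# the same property that makes Zarr efficient on object stores.
window = z[0, :256, :256]
print(window.shape, float(window.mean()))
</pre>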
 
 
'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/l3o11uLdm7E?si=cOBaSNFpzuYjQU3P" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</html>

==9 Feb 2023: "February 2023 - Rants & Raves"==

The conversation built on the "rants and raves" session from the 2023 January ESIP Meeting, starting with very short presentations and an in-depth discussion on interoperability and the Committee's next steps.

* Mike Mahoney: Make Reproducibility Easy
* Dave Blodgett: FAIR data and Science Data Gateways
* Doug Fils: Web architecture and the Semantic Web
* Megan Carter: Opening Doors for Collaboration
* Yuhan (Douglas) Rao: Where are we for AI-ready data?

I had a couple of major takeaways from the Winter Meeting:

* We have come a long way in IT interoperability, but most of our tools are based on tried-and-true fundamentals. We should all know more about those fundamentals.
* There are a TON of unique entry points to things that, at the end of the day, do more or less the same thing. These are opportunities to work together and share tools.
* The "shiny object" is a great way to build enthusiasm and trigger ideas, and we need to better capture that enthusiasm and grow a shared knowledge base.

So with that, I want to suggest three core activities:

# We seek out presentations that explore foundational aspects of interoperability. I want to help build an awareness of the basics that we all kind of know but either take for granted, haven't learned yet, or have straight up forgotten.
# We ask speakers to explore how a given solution fits into multiple domains' information systems and to discuss the tension among the diverse use cases accommodated by an IT solution targeted at interoperability. We are especially interested in the expense and risk of adopting dependencies versus the efficiency that can be gained from pre-built dependencies.
# We look for opportunities to take small but meaningful steps to record the core aspects of these sessions in the form of web resources like the ESIP wiki or even Wikipedia. On this front, we will aim to construct a summary wiki page for each meeting, assembled from a working notes document and the presenting author's contribution.

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/cS7TrLmSu5U" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>
  
==June 13th: "Evaluation and recommendation of practices for publication of reproducible data and software releases in the USGS"==

[https://www.usgs.gov/centers/community-for-data-integration-cdi/science/evaluation-and-recommendation-practices#overview Alicia Rhoades, Dave Blodgett, Ellen Brown, Jesse Ross]

USGS Fundamental Science Practices recognize data and software as separate information product types. In practice (e.g., in model application), data are rarely complete without workflow code, and workflows are often treated as software that includes data. This project assembled a cross-mission-area team to build an understanding of current practices and develop a recommended path. The project conducted 27 interviews with USGS employees in a wide range of staff roles from across the bureau, and analyzed existing data and software releases to establish an evidence base of current practices for implemented information products. The project team recommends that a workshop be held at the next Community for Data Integration face-to-face meeting or other venue. The workshop should consider the full findings of this project and plan specific actions the Community can take, or recommendations the Community can advocate to the Fundamental Science Practices Advisory Council or others.

[[File:ITI_June_2024.png|thumb|IT&I June 2024]]

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/orjBINgaXag?si=a41TWK1vZZsXLyph" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</html>

==10 Nov 2022: "AWS for Research - Stepping Into the Cloud"==

The cloud is often seen as a binary alternative to local resources for research. This talk with Scott Friedman from the Amazon Web Services (AWS) Higher Education Division explores how a hybrid approach enables many of the best elements of both local and cloud resources by addressing some key researcher challenges and concerns that surround them.

aws.amazon.com

Presenter(s):

Scott Friedman, AWS

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/vr46-SIT2QU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>
  
==May 9th: "Achieving FAIR water quality data exchange thanks to international OGC water standards"==

[https://orcid.org/0000-0001-7656-1830 Sylvain Grellet - BRGM]

Leveraging international standards (OGC, ISO), the OGC/WMO Water Quality Interoperability Experiment aims to bridge the gap in water quality data exchange (surface and ground water). This presentation will also give feedback on the methodology applied on this journey: how to build on existing international standards (OGC/ISO 19156 Observations, Measurements and Samples; OGC SensorThings API) while answering domain needs and maximizing community effect.

[[File:ITI_May_2024.png|thumb|IT&I FAIR water quality data]]

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/AlYnSNWJYy0?si=nQfGtfJ51cJM60v8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</html>

'''<u>Slides</u>'''

[[File:FAIR_water_quality_data_OGC_Grellet-compressed.pdf|Slides from May 2024 IT&I]]

'''<u>Minutes:</u>'''

*Emphasis on international water data standards.
*Introduced OGC – international standards with contributions from public, private, and academic stakeholders.
*The Hydrology Domain Working Group has been around since circa 2007.
**This presentation is about its latest activity, the Water Quality Interoperability Experiment.
*Relying on a baseline of conceptual and implementation modeling from the Hydro Domain Working Group and more general community work like Observations, Measurements and Samples.
*Considering both in-situ (sample observations) and ex-situ (laboratory) observations.
*Core data models support everything the IE has needed with some key extensions, and the models are designed to support extensions.
*In terms of FAIR access, SensorThings is very capable for observational data, and OGC API - Features supports geospatial needs well.
*Introduced a separation between "sensor" and "procedure" – the sensor is the thing you used, the procedure is the thing you do.
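As a rough illustration of the SensorThings point above, the sketch below queries a hypothetical SensorThings v1.1 endpoint with the standard OData-style query options; the base URL is a placeholder, not a real service:

<pre>
# Minimal sketch of an OGC SensorThings API query. The endpoint URL is
# an illustrative assumption; entity paths and $-options follow the
# SensorThings v1.1 specification.
import requests

BASE = "https://example.org/sta/v1.1"  # assumption: a SensorThings service

# List monitoring "Things" with their Locations, five at a time.
resp = requests.get(
    f"{BASE}/Things",
    params={"$expand": "Locations", "$top": 5},
    timeout=30,
)
resp.raise_for_status()
for thing in resp.json().get("value", []):
    print(thing.get("name"), thing.get("Locations"))
</pre>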
  
Hook Hua - MAAP System Architect
+
==April 11th: "A Home for Earth Science Data Professionals - ESIP Communities of Practice"==
  
'''<u>Recording</u>''':<br />
+
[https://www.esipfed.org/about/people/#people_bios-1-4 (Allison Mills)]
<html>
 
<iframe width="560" height="315" src="https://www.youtube.com/embed/fqFk3U1TloM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
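For a feel of what the openEO API looks like from a client, here is a minimal sketch using the openEO Python client; the collection id is an assumption (available collections vary by back end), and real workloads typically require authentication:

<pre>
# Minimal sketch using the openEO Python client (pip install openeo).
# The collection id is an illustrative assumption; most back ends
# require authentication before processing.
import openeo

connection = openeo.connect("https://openeo.cloud")  # openEO Platform endpoint
# connection.authenticate_oidc()  # typically required for real workloads

cube = connection.load_collection(
    "SENTINEL2_L2A",  # assumption: collection ids vary by back end
    spatial_extent={"west": 11.0, "south": 46.0, "east": 11.2, "north": 46.2},
    temporal_extent=["2022-06-01", "2022-06-30"],
    bands=["B04", "B08"],
)
ndvi = cube.ndvi()        # convenience helper in the Python client
ndvi.download("ndvi.nc")  # executes the workflow server-side, fetches result
</pre>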
 
Presenter(s):

Alexander Jacob, Eurac Research

Matthias Mohr, University of Muenster (WWU)

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/hGPKwslUGzM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>

==10 June 2022: "MAAP - NASA Multi-Mission Algorithm and Analysis Platform"==

MAAP is a collaborative effort between NASA and ESA to support above-ground biomass research. MAAP brings together relevant data, algorithms, and computing capabilities in a common cloud environment to address the challenges of sharing and processing data from field, airborne, and satellite measurements.

Link: https://earthdata.nasa.gov/esds/maap

Presenter(s):

George Chang - MAAP Platform Development Lead

Hook Hua - MAAP System Architect

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/fqFk3U1TloM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>

==April 11th: "A Home for Earth Science Data Professionals - ESIP Communities of Practice"==

[https://www.esipfed.org/about/people/#people_bios-1-4 Allison Mills]

Earth Science Information Partners (ESIP) is a nonprofit funded by cooperative agreements with NASA, NOAA, and USGS. To empower the use and stewardship of Earth science data, we support twice-annual meetings, virtual collaborations, microfunding grants, graduate fellowships, and partnerships with 170+ organizations. Our model is built on an ever-evolving quilt of collaborative tools: guest speaker Allison Mills will share insights on the behind-the-scenes IT structures that support our communities of practice.

[[File:ITI_April2024.png|thumb|IT&I ESIP Communities of Practice]]

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/6loiBWpgMGE?si=yvIMfhKNbDrX_cn_" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</html>
  
'''<u>Minutes:</u>'''

*Going to talk about the IT infrastructure behind the ESIP cyber presence.
*Shared the ESIP Vision and Mission – BIG goals!!
*Played a video about what ESIP is as a community.
*But how do we actually "build a community"?
*Virtual collaborations need digital tools.
*<nowiki>https://esipfed.org/collaborate</nowiki>
**Needs a front door and a welcome mat!
**"It doesn't matter how nice your doormat is if your porch is rotten."
**Tools: homepage, Slack, update email, and people directory.
**"We take easy collaboration for granted."
*<nowiki>https://esipfed.org/lab</nowiki>
**Microfunding – build in time for learning objectives.
**RFP system, GitHub, figshare, people directory.
**"Learning objectives are a key component of an ESIP Lab project."
*<nowiki>https://esipfed.org/meetings</nowiki>
**Website, agendas, Eventbrite, QiqoChat + Zoom, Google Docs.

Problem: our emails bounce! Needed to get into the weeds of DNS and "DMARC" policies – Domain-based Message Authentication, Reporting, and Conformance.
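For readers unfamiliar with DMARC, a domain's policy is just a DNS TXT record published at <code>_dmarc.&lt;domain&gt;</code>. A minimal sketch using the dnspython package (the domain and the record shown in the comment are illustrative placeholders):

<pre>
# Minimal sketch: a DMARC policy is a DNS TXT record at _dmarc.<domain>.
# Uses the dnspython package; the domain below is an illustrative placeholder.
import dns.resolver

answers = dns.resolver.resolve("_dmarc.example.org", "TXT")
for record in answers:
    # A typical record looks like:
    #   "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.org"
    # where p= tells receivers what to do with mail failing SPF/DKIM alignment.
    print(record.to_text())
</pre>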
  
Problem: Twitter is now X.

Decided to focus on platforms where engagement is higher.

Problem: old wiki pages are way, way outdated.

Focus on creating new web pages that replace, update, and maintain community content.

Problem: "I can't use platform XYZ."

Try to go the extra mile to adapt so that these issues are overcome.

==14 April 2022: "GeoCollaborate - Shared Geospatial Data Platform"==

Info Link: Put your geospatial data to work (geocollaborate.com)

Description:

GeoCollaborate overview.

StormCenter Communications was ESIP's 2019 Partner of the Year and is a big contributor to ESIP, primarily as lead of the ESIP Disasters Cluster. <nowiki>https://youtu.be/Zhv_Q4FOwrw</nowiki>. Dave and Ellen Prager will provide a demonstration of GeoCollaborate and conduct a deep dive describing the compute infrastructure and bespoke tools driving the platform.

GeoCollaborate® is a patented collaborative software platform that achieves total and true commonality and permits collaboration across all stakeholders accessing data using interactive web maps, GIS platforms, or Common Operating Pictures (COPs). In the same way that map services bring real-time map information to maps, GeoCollaborate® is a network service that permits real-time data sharing and collaboration across an unlimited number of disparate web maps. The concept behind GeoCollaborate® is simple: it lets anyone securely author the content of a lead web map, share content, and collaborate in real time or offline with other follower web maps with nothing more than a browser and a network connection.

'''<u>Recording</u>''':<br />
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/rXNv2sM59wo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe></html>

==10 March 2022: "National Park Service - Web Mapping Tools"==

NPMap - GIS, Cartography & Mapping (U.S. National Park Service) (nps.gov)

Topic:

The NPMap suite of web map tools enables NPS employees and partners to tell the story of the nation's most cherished places using innovative mapping techniques and technologies. Maps built using NPMap Builder, Park Tiles, NPMap.js, and the NPMap Symbol Library make these places come alive for visitors to our national parks.

Presenter(s): Jim McAndrew

'''<u>Recording</u>''':<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/1IHEYqwe2QQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>

==10 February 2022: "The National Map – Image Delivery Platform"==

Time: Thursday, February 10 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Orthoimagery Meta Raster Format (MRF) - Spatial Data Service'''

'''Topic:'''

Developed as a more efficient data delivery mechanism for high-resolution imagery (also employed by NASA/EROS for Landsat imagery), the system provides a highly available service that reduces the need for costly image file storage.

'''Presenter(s): Liz Huselid and Kwin Keuter'''

'''<u>Recording</u>''':<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/_RN1bxs6N0w" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>

==March 15th: "Creating operational decision ready data with remote sensing and machine learning"==

[https://www.voyagersearch.com/ Brian Goldin]

[[File:ITI_March2024.png|thumb|IT&I Operational Remote Sensing 2024]]

As organizations grapple with information overload, timely and reliable insights remain elusive, particularly in disaster scenarios. Voyager's participation in the OGC Disaster Pilot 2023 aimed to address these challenges by streamlining data integration and discovery processes. Leveraging innovative data conditioning and enrichment techniques, alongside machine learning models, Voyager transformed raw data into actionable intelligence. Through operational pipelines, we linked diverse datasets with machine learning models, automating the generation of new observations to provide decision-makers with timely insights during critical moments. This presentation will explore Voyager's role in enhancing disaster response capabilities, showcasing how innovative integration of technology along with open standards can improve decision-making processes on a global scale.

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/TFGLnVljAlY?si=LzpWoMWZx_3YMk0H" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</html>
  
'''<u>Minutes:</u>'''

Providing insights from the OGC Disaster Pilot 2023.

The goal of the work is to provide timely and reliable insights based on huge volumes of data: "Overcome information overload in critical moments."

Example: the 2022 Callao oil spill in Peru. A tsunami hit an oil tanker transferring oil to land. There was possibly useful data from many remote sensing products, but it was hard to combine them all in the moment of responding to the spill (a slide shows dozens of data sources).

Goal: build a centralized and actionable inventory of data resources.

#Connect and read data,
#build pipelines to enrich data sources,
#populate a registry of data sources,
#construct a processing framework that can operate over the registry,
#build a user experience framework that can execute the processing framework.

The focus is on an adaptable processing framework for model execution. At this scale and for this purpose, it's critical to have a receipt of what was completed, with basic results, in a searchable registry. This allows model results to trigger notifications or be searched against a record of model runs that have been run previously.

For the pilot: focused on wildfire, drought, oil spill, and climate.

"What indicators do decision makers need to make the best decisions?" And what remote sensing processing models can be run in operations to provide these indicators?

Fire Damage Assessment: detected building footprints using a remote sensing building detection model. A fire detection model can then run in real time, cross-referenced with the building footprints.

Needs identified:
*Stronger / more consistent "model metadata"
*Data governance / fitness-for-use metadata
*Better standards that provide linkages between systems
*Better public-private partnerships
*A better data licensing and sharing framework

"This is not rocket science, it's really just building a good metadata registry."

==13 January 2022: "Water Data Visualization" - CANCELLED==

Time: Thursday, January 13 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

Topic:

USGS Water - Data Visualization and Information Dissemination

Presenter(s):

Cee Nell - Data Visualization Specialist, USGS VizLab

Lindsay Platt - Data Scientist, USGS Integrated Information Dissemination Division

==09 December 2021: "rOpenSci - Open and Reproducible Research Using Shared Data and Re-Usable Software"==

Time: Thursday, December 9 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Topic:'''

We are pleased to host Stefanie Butland and Carl Boettiger from rOpenSci. Stefanie will provide an introduction and overview of the rOpenSci community and recent projects.

rOpenSci creates technical and community infrastructure for open and reproducible research in the R language. It features a curated collection of over 300 R packages, an open software peer review system for community-contributed packages, a platform for building, testing, and publishing R packages, documentation, and community engagement programs to support scientific R users and developers.

'''About The Presenter(s):'''

'''Stefanie Butland''', Community Manager, stefanie@ropensci.org

'''Carl Boettiger''', Co-Founder and Strategic Advisor, cboettig@berkeley.edu

==14 October 2021: "OpenTopography - High Resolution Topography Data and Tools"==

Time: Thursday, October 14 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

We are pleased to host Chris Crosby, co-founder and co-PI of OpenTopography. OpenTopography is an NSF-funded data facility with a mission to facilitate access to earth-science-oriented high resolution topography data, tools, and resources.

'''Summary:'''

Overview of OpenTopography. Current projects and initiatives. Recent data updates. Creating 2D and 3D visualizations with the R package Rayshader.
+
'''About The Presenters:'''

'''Chris Crosby''', co-founder and co-PI of OpenTopography, crosby@unavco.com

'''<u>Recording</u>''':<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/zoxCsZ_Uu68" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>

==February 15th: "Creating Great Data Products in the Cloud"==

[https://radiant.earth/about/ Jed Sundwall]

[[File:ITI_Feb2024.png|thumb|IT&I Cloud Data Products 2024]]

Competition within the public cloud sector has reliably led to reductions in object storage costs, continual improvement in performance, and a commodification of services that has made cloud-based object storage a viable solution for sharing almost any volume of data. Assuming that this is true, what are the best ways to create data products in a cloud environment? This presentation will include an overview of lessons learned at Radiant Earth as they've advocated for adoption of cloud-native geospatial data formats and best practices.
'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/4cWGJcOcAEA?si=NYWSSB7DiGK2nrMN" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>

'''<u>Minutes:</u>'''

Jed is executive director of Radiant Earth – the focus is on human cooperation on a global scale.

Two major initiatives – the Cloud-Native Geospatial Foundation and Source Cooperative:
*Cloud-native geospatial is about adoption of efficient approaches.
*Source is about providing easy and accessible infrastructure.

What does "Cloud Native" mean? https://guide.cloudnativegeo.org/ – partial reads, parallel reads, easy access to metadata. Leveraging market pressure to make object stores cheaper and more scalable.

"Pace Layering" – https://jods.mitpress.mit.edu/pub/issue3-brand/release/2

Observation: software is getting cheaper and cheaper to build – it gets harder to create software monopolies in the way Microsoft or ESRI have. This leads to a lot of diversity and a proliferation of "primitive" standards and de facto interoperability arrangements.

'''Source Cooperative'''

Borrowed a lot from GitHub architecturally: a repository with a README, and a browser of the contents in the browser.

Within this, what makes a great data product?

"Our data model is the Web."

People will deal with messy data if it's super valuable. Case in point: IRS 990 data on non-profits was shared in a TON of XML schemas, and people came together to organize it and work with it. Another story: a building footprint dataset released in the morning had been matched up into at least four products by the end of the day.

Shout out to https://www.geoffmulgan.com/ and https://jscaseddon.co/ (see https://jscaseddon.co/2024/02/science-for-steering-vs-for-decision-making/).

"We don't have institutions that are tasked with producing great data products and making them available to the world!"

https://radiant.earth/blog/2023/05/we-dont-talk-about-open-data/

[[File:Meme hackathons.png|thumb]]

"There's a server somewhere where there's some stuff" – this is very different from a local hard drive where everything is indexed. A cloud-native approach puts the index (metadata) up front in a way that lets you figure out what you need.

A file's metadata gives you the information you need to ask for just the part of the file that you actually need. There are also formats where you don't need to do range requests; instead, the file is broken up into many, many objects that are indexed. In both cases, the metadata is a map to the content. Figuring out the right size of the content's bits is kind of an art form.

https://www.goodreads.com/en/book/show/172366
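A minimal sketch of the partial-read idea described above, using a plain HTTP range request; the URL and byte range are placeholders:

<pre>
# Minimal sketch of a "partial read": an HTTP range request fetches
# just the bytes you need from an object store. URL and byte range
# are illustrative placeholders.
import requests

url = "https://example.org/data/big_image.tif"  # assumption: any HTTP-hosted file
resp = requests.get(url, headers={"Range": "bytes=0-16383"}, timeout=30)

# 206 Partial Content means the server honored the range request.
# A cloud-optimized format keeps its metadata near the front so a
# client can plan further reads without downloading the whole file.
print(resp.status_code, len(resp.content))
</pre>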
  
Q: I was thinking of your example of Warren Buffett's daily spreadsheet (gedanken experiment)... How do you see data quality or data importance (incl. data provider trustworthiness) being effectively conveyed to users?

A: We want to focus on verification of who people are and rely on reputational considerations to establish importance.

Q: I agree with you about the importance of social factors in how people make decisions. What do you think the implications are of this for metadata for open data on the cloud?

A: Tracking data's impact and use is an important thing to keep track of. Using metadata as concrete records of observations and how data has been used is where this becomes important.

Q: What about the really important kernels of information that we use to, say, calibrate remote sensing products, that are really small but super important? How do we make sure those don't get drowned out?

A: We need to be careful not to overemphasize "everything is open" if we can't keep really important datasets in the spotlight.

==09 September 2021: "TerrainR - Generating 3D Landscape Visualizations Using R and Unity"==

Time: Thursday, September 9 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Summary:'''

We are pleased to host Mike Mahoney, a PhD candidate at SUNY-ESF working on predictive modeling and visualization with a focus on natural systems. Mike has developed terrainr, an R package that can retrieve data from USGS National Map services and perform transformations to produce 2D and 3D landscape visualizations using the Unity engine.

terrainr makes it easy to identify your area of interest from point data, retrieve geospatial data (including orthoimagery and DEMs) for areas of interest within the United States from the National Map family of APIs, and then process that data into larger, joined images or crop it into tiles that can be imported into the Unity rendering engine. At the absolute simplest level, terrainr provides a convenient and consistent API for downloading data from the National Map.

'''About The Presenters:'''

'''Mike Mahoney''', Graduate Researcher, SUNY-ESF, mike.mahoney.218@gmail.com

'''<u>Recording</u>''':<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/xWZ7QQMr_AQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>

==12 Aug 2021: "Developing an Open-Source Workflow and Toolset for Quantifying Lacustrine Sedimentation Using Publicly Available Data"==

Time: Thursday, August 12 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Summary:''' We are pleased to host Jake Gearon, a graduate researcher pursuing his Ph.D. at the nexus of sedimentology, remote sensing, and informatics at Indiana University. Jake developed a data repository using AWS RDS that aggregates global lake level data from a number of distributed resources. He has created a standardized data schema and a harvest mechanism to scrape data into a cloud-based data platform, and made the aggregated data available through AWS API Gateway. To assist with access to and use of the data resource, Jake developed LakePy, a Python package that can be run within a Jupyter notebook.

In this month's presentation, Jake will provide an overview of the data pipeline and demonstrate how to set up and use LakePy in a Jupyter notebook environment.

'''About The Presenters:'''

'''Jake Gearon''', Graduate Researcher, Indiana University (Sedimentology, Remote Sensing, and Informatics), jake.gearon@gmail.com

'''<u>Recording</u>''':<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/N_kWZOf-OVA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>

==10 June 2021: "USGS Hydro Network Linked Data Index Tools"==

'''Time:''' Thursday, June 10 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Summary:''' We are pleased to host Dave Blodgett of the Integrated Modeling and Prediction Division of the USGS Water Mission Area, Anders Hopkins of the USGS Web Informatics and Mapping team, and Taher Chegini from the University of Houston. They will present on the Hydro Network-Linked Data Index and a variety of client tools that work with it.

The Hydro Network-Linked Data Index (NLDI) is a system that can index data to NHDPlus V2 catchments and offers a search service to discover indexed information. Data linked to the NLDI includes active NWIS stream gages, Water Quality Portal sites, and outlets of HUC12 watersheds. The NLDI is a core product of the Open Water Data Initiative and is being developed as an open-source project.

Outline:

*Intro and overview, lecture format (Dave – 10 minutes)
*NLDI NWIS & WQP integration demo (Dave – 5 minutes)
*StreamStats use of NLDI context and catchment splitting demo (Anders – 5 minutes)
*StreamStats down-slope trace and time of travel demo (Anders – 5 minutes)
*Cross-section and model data discovery and retrieval (Dave for Rich – 5 minutes)
*HyRiver overview, lecture format (Taher – 5 minutes)
*HyRiver use of the NLDI demo (Taher – 5 minutes)
*nhdplusTools and dataRetrieval use of NLDI demo (Dave – 5 minutes)
*geoconnex.us and adding data sources to the NLDI (Dave – 5 minutes)
*Questions (10 minutes)

For more, see: https://waterdata.usgs.gov/blog/nldi-intro/ and the HyRiver demo notebook: https://nbviewer.jupyter.org/github/cheginit/pangeo_showcase21/blob/main/notebooks/esip.ipynb

'''<u>Recording</u>''':<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/Wz8Y5G9oy-M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
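As a taste of the NLDI's search service, the sketch below follows the request pattern from the nldi-intro blog post linked above; the gage id and navigation mode are illustrative, and paths should be verified against the current service:

<pre>
# Minimal sketch of an NLDI query, following the pattern documented in
# the nldi-intro blog post. Site id and navigation mode ("UM" = upstream
# main) are illustrative; verify paths against the current service.
import requests

NLDI = "https://labs.waterdata.usgs.gov/api/nldi"

# Navigate upstream-main from an NWIS gage and fetch flowlines
# within 50 km, returned as GeoJSON.
resp = requests.get(
    f"{NLDI}/linked-data/nwissite/USGS-05428500/navigation/UM/flowlines",
    params={"distance": 50},
    timeout=30,
)
resp.raise_for_status()
print(len(resp.json().get("features", [])), "flowline features")
</pre>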
==January 11th: "Using Earth Observations for Sustainable Development"==

"Using Earth Observation Technologies when Assessing Environmental, Social, Policy and Technical factors to Support Sustainable Development in Developing Countries"

[https://www.media.mit.edu/people/shariful/overview/ Sharif Islam]

Earth Observation (EO) technologies, such as satellites and remote sensing, provide a comprehensive view of the Earth's surface, enabling real-time monitoring and data acquisition. Within the environmental domain, EO facilitates tracking land use changes, deforestation, and biodiversity, thereby supporting evidence-based conservation efforts. Social factors, encompassing population dynamics and urbanization trends, can be analyzed to inform inclusive and resilient development strategies. EO also assumes a crucial role in policy formulation by furnishing accurate and up-to-date information on environmental conditions, thereby supporting informed decision-making. Furthermore, technical aspects, like infrastructure development and resource management, benefit from EO's ability to provide detailed insights into terrain characteristics and natural resource distribution. The integration of Earth Observation across these domains yields a comprehensive understanding of the intricate interplay between environmental, social, policy, and technical factors, fostering a more sustainable and informed approach to development initiatives. In this presentation, I will discuss our lab's work in Bangladesh, Angola, and other countries, covering topics such as coastal erosion, drought, and air pollution.

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/PhEg9bTd1JU?si=EUfOaz3nzEFdOOsb" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>

'''<u>Minutes:</u>'''

Plan to share data from NASA and USGS that was used in his PhD work. Applied the EVDT (Environment, Vulnerability, Decision, Technology) Framework. Studied a variety of hazards – coastal erosion, air pollution, drought, deforestation, etc.

'''Coastal Erosion in Bangladesh'''

*Displacement, loss of land, major economic drain.
*Studied the situation in the Bay of Bengal.
*Used Landsat to study coastal erosion from the 1980s to the present.
*Coastal erosion rates upwards of 300 m/yr!
*Combined survey data and Landsat observations.

'''Air Pollution and Mortality in South Asia'''

*Able to show change in air pollution over time using remote sensing.

'''Drought in Angola and Brazil'''

Used SMAP (Soil Moisture Active Passive) and developed the same index as the US Drought Monitor, applying SMAP observations over time. Applied a social vulnerability model using these data to identify vulnerable populations.

'''Deforestation in Ghana'''

Used Landsat to identify land converted from forest to mining and urban use. Significant amounts of land went to mining (gold mining and others).

'''Water Hyacinth in a Major Fishery Lake in Benin'''

Impact on fishery and transportation; rotting hyacinth is a big issue. Helped develop a decision support system to guide management practices.

'''Mangrove Loss in Brazil'''

Combined information from economic impacts, urban plans, and remote sensing to help build a decision support tool.

==13 May 2021: "Visualization of Landsat Data stored in the Cloud using LandsatLook 2.0"==

'''Time:''' Thursday, May 13 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Presentation Abstract''':

With existing USGS tools you can search for Landsat scenes, view thumbnail browse images of scenes, and download scene-based data, all derived from the 45+ year USGS archive of Landsat scene-based products. Now that Landsat scene data has been unpacked and stored in the cloud, LandsatLook 2.0 can access that data and present it to users in ways that were not previously possible. Thanks to the cloud-optimized GeoTIFF format, we're no longer limited by scene-based display rules. Pixel mosaics in an area of interest can be queried and displayed without accessing entire scenes, effectively allowing users to select their own scene boundaries.

LandsatLook 2.0 presents a more pleasing view of the data using mosaicked pixels without scene boundaries. Users can also overlay scene, tile, or political boundaries to help determine an area of interest. Selecting and viewing band combinations of the available imagery is super simple. LandsatLook 2.0 also allows users to select and visualize supplied algorithms like NDVI. Users can display or download an animated GIF file of a selected time series over their selected areas of interest. LandsatLook 2.0 can also apply quality band information to filter out cloudy pixels and replace them with valid pixels from a different collection date. LandsatLook 2.0 provides download capabilities in several ways: users can download the GIF produced of their time series, and they can also download data based on their selection criteria, narrowed by selecting certain bands or an area of interest. They are not limited to the shapes of scenes anymore.

All of this is made possible by the storage of Landsat data in the cloud, which allows on-the-fly manipulation of data for tremendous user benefit. The EROS team is excited about these new changes and we hope you are too.
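A minimal sketch of the cloud-optimized GeoTIFF access pattern described above, using the rasterio library; the URL is a placeholder:

<pre>
# Minimal sketch: read one small window of a remote cloud-optimized
# GeoTIFF without downloading the whole scene. Uses the rasterio
# library; the URL is an illustrative placeholder.
import rasterio
from rasterio.windows import Window

url = "https://example.org/landsat/scene_B4.tif"  # assumption: a COG over HTTP

with rasterio.open(url) as src:
    # rasterio/GDAL reads the COG header first, then fetches only the
    # byte ranges needed for this 512x512 pixel window.
    block = src.read(1, window=Window(0, 0, 512, 512))
    print(block.shape, src.crs)
</pre>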
  
'''Presenters:'''

<u>Kristi Kline</u>, Project Manager, Landsat and Sentinel-2 Archive and Access (LSAA) Project, U.S. Geological Survey (USGS) Earth Resources Observation & Science (EROS) Center. Ph: 605-594-2585. Email: kkline@usgs.gov

<u>Kelly Lemig</u>, KBR | User Services Task Lead, Contractor to the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center. Ph: 605-594-2744. Email: klemig@contractor.usgs.gov

'''<u>Recording</u>''':<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/EtJc4X2ml4U" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>

==8 April 2021: "Developing A New Geologic Map Database and 3D Geologic Model of The Great Basin and Rocky Mountains"==

'''Time:''' Thursday, April 8 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Summary:''' We are pleased to host Joe Colgan, Paco Van Sistine, and Kenzie Turner from the USGS Geosciences and Environmental Change Science Center (GECSC). The Geologic Framework of the Intermountain West project was launched with the goal of producing a new digital geologic map database and 3D geologic model of a transect from the Rio Grande rift to the Basin and Range, based on a synthesis of existing geologic maps with targeted new mapping, subsurface data, and other data sets. This effort integrates disparate map data, resolves inconsistent stratigraphic assignments, and provides an integrated regional geologic map database to further the National Cooperative Geologic Mapping Program's strategic goal of mapping the Nation. The activity has implemented new techniques for digital compilation and new procedures for reviewing and publishing large digital geologic data sets.

'''About The Presenters:'''

<u>Joseph Colgan</u>, Research Geologist, USGS Geosciences and Environmental Change Science Center, jcolgan@usgs.gov

<u>Kenzie Turner</u>, Research Geologist, USGS Geosciences and Environmental Change Science Center, kturner@usgs.gov

<u>Darren (Paco) Van Sistine</u>, Geographer, USGS Geosciences and Environmental Change Science Center, dvansistine@usgs.gov

'''Recording:'''<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/So43w-4jY10" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
  
==11 March 2021: "Earth Data Extraction, Exploration and Visualization Using the AppEEARS Platform" Tom Maiersperger & Chris Torbert==

'''Time:''' Thursday, March 11 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Summary:''' We are pleased to host Tom Maiersperger and Chris Torbert from the USGS Earth Resources Observation and Science (EROS) Center. Tom and Chris will provide an overview of the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS). The Land Processes Distributed Active Archive Center (LP DAAC) operates as a partnership between the U.S. Geological Survey (USGS) and the National Aeronautics and Space Administration (NASA) and is a component of NASA's Earth Observing System Data and Information System (EOSDIS). The LP DAAC processes, archives, and distributes land data products to hundreds of thousands of users in the earth science community. LP DAAC land data products are made universally accessible and support the ongoing monitoring of Earth's land dynamics and environmental systems to facilitate interdisciplinary research, education, and decision-making.

The LP DAAC developed AppEEARS as a data subsetter that provides data access and data value exploration for a variety of data from federal archives. AppEEARS offers users a simple and efficient way to perform data access and transformation processes. By enabling users to subset data spatially, temporally, and by layer, the volume of data downloaded for analysis is greatly reduced.

'''About the Presenters:'''

<u>Thomas K. Maiersperger</u>, Project Scientist, Land Processes Distributed Active Archive Center (LP DAAC), U.S. Geological Survey (USGS) Earth Resources Observation & Science (EROS) Center. Email: tmaiersperger@usgs.gov

<u>Chris Torbert</u>, Land Processes Distributed Active Archive Center (LP DAAC) Manager (acting), U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center. Email: ctorbert@usgs.gov

'''Recording:'''<br /><html><iframe width="560" height="315" src="https://www.youtube.com/embed/WY_760yX_a0" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>

==November 9th: "Persistent Unique Well Identifiers: Why does California need well IDs?"==

[[File:ITI_Nov_Wells.png|thumb|IT&I CA Wells November 2023]]

[https://cawaterdata.org/teams/hannah-ake/ Hannah Ake]

Groundwater is a critical resource for farms, urban and rural communities, and ecosystems in California, supplying approximately 40 percent of California's total water supply in average water years and, in some regions of the state, up to 60 percent in dry years. Regardless of water year type, some communities rely entirely on groundwater for drinking water supplies year-round. However, California lacks a uniform well identification system, which has real impacts on those who manage and depend upon groundwater. Clearly identifying wells, both existing and newly constructed, is vital to maintaining a statewide well inventory that can be more easily monitored to ensure the wellbeing of people, the environment, and the economy, while supporting the sustainable use of groundwater. A uniform well ID program has not yet been accomplished at a scale like California's, but it is achievable, as evidenced by great successes in other states. Learn more about why a well ID program will be so important to tackle in California and offer your thoughts about how to untangle some of the particularly thorny technical challenges.

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/dvxOHh86QVQ?si=GtgSG62nbj2aVMR0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>

'''<u>Minutes:</u>'''

*Groundwater is 40-60% of California's water supply.
*~2 million groundwater wells!
*As many as 15k new wells are constructed each year.

The Sustainable Groundwater Management Act frames groundwater sustainability agencies that develop groundwater sustainability plans. There is a need to account for groundwater use to ensure the plans are achieved.

Problem: there is no dedicated funding (or central coordinator) to create and maintain a statewide well inventory.

*The Department of Water Resources develops standards.
*The State Water Resources Control Board has a statewide ordinance.
*Cities and local districts adopt local ordinances.
*A local enforcement agency administers and enforces each ordinance.

There are a lot of IDs in use: five different identifiers can be used for the same well.

Solution: create a statewide well inventory keyed by a compound ID (a single ID that stands in for many others) drawn from multiple ID systems – a meaningless identifier that links the others to each other.

A number of states have well ID programs; trying to learn from what other states have done.
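A minimal sketch of the "meaningless compound identifier" idea: mint an opaque statewide ID and use it purely as a crosswalk between existing agency IDs. All names and identifiers here are illustrative assumptions:

<pre>
# Minimal sketch of a compound well identifier: an opaque statewide ID
# that links existing agency IDs together. All names and IDs below are
# illustrative assumptions.
import uuid
from dataclasses import dataclass, field

@dataclass
class WellRecord:
    # Opaque, meaning-free statewide ID; carries no location or owner info.
    state_id: str = field(default_factory=lambda: f"CA-{uuid.uuid4().hex[:12]}")
    # Crosswalk of the existing identifiers that refer to the same hole.
    agency_ids: dict[str, str] = field(default_factory=dict)

well = WellRecord(agency_ids={
    "dwr_wcr": "WCR2020-001234",     # hypothetical well completion report no.
    "swrcb": "L10007654321",         # hypothetical state board ID
    "local_permit": "KERN-18-0042",  # hypothetical county permit no.
})
print(well.state_id, "->", well.agency_ids)
</pre>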
Going forward with some kind of identifier system that spans all local and federal identifier systems.

*Q: Will this include federal wells? – Yes!
*Q: Will this actually be a new well identifier minted by someone? – Yes.
*Q: If someone drills a well, do they have to register it? – Yes, but it's the local enforcing agency that collects the information.
*Q: What if a well is deepened? Do we update the ID? – This has caused real problems in the past. We end up with multiple IDs for the same hole through time.
**It seems to make sense to mint a new one to keep things simple.

Link mentioned early in the talk: https://groundwateraccounting.org/

Reference during Q&A: https://docs.ogc.org/per/20-067.html#_cerdi_vvg_selfie_demonstration

==11 February 2021: "Elevation Data Processing At Scale - Deploying Open Source GeoTools Using Docker, Kubernetes" Josh Trahern & Andrew Bulen==

'''Time:''' Thursday, February 11 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Summary:''' We are pleased to host Josh Trahern (Project Manager) and Andrew Bulen (Lead Developer) from the USGS National Geospatial Technical Operations Center (NGTOC).

The scheduled presentation highlights the Lev8 and Valid8 web applications produced by the Elevation team. These tools are used by Production Operations to support the 3D Elevation Program (3DEP). The 3DEP dataset is a compilation of data from a variety of existing high-precision datasets, such as lidar data, contour maps, the USGS DEM collection, SRTM, and other sources, which are combined into a seamless dataset designed to cover all of the United States and its territories.

'''About the Presenters:''' The U.S. Geological Survey National Geospatial Technical Operations Center (NGTOC) provides leadership and world-class technical expertise in the acquisition and management of trusted geospatial data, services, and map products for the Nation. NGTOC supports The National Map as part of the National Geospatial Program (NGP).

'''Recording:'''

<html><iframe width="560" height="315" src="https://www.youtube.com/embed/nYsteOP9CUE" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
  
==10 December 2020: "Environmental Data Retrieval API" EDR-API Standard Working Group Members==

'''Time:''' Thursday, December 10 (3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Summary:''' The '''OGC API - Environmental Data Retrieval''' candidate standard is part of the OGC API suite of standards. [https://ogcapi.ogc.org/ OGC API standards] define modular API building blocks to spatially enable Web APIs in a consistent way. [http://openapis.org/ OpenAPI] is used to define the reusable API building blocks.

*Dave Blodgett (U.S. Geological Survey) and Chris Little (UK Met Office): general overview and discussion of how EDR fits with API-Common and O&M
*Mark Burgoyne (UK Met Office): details of the API design and demonstration application
*Lei Hu (Wuhan University): demo and experience implementing
*Steve Olson and Shane Mill (U.S. National Oceanic and Atmospheric Administration): demo and plans for EDR
*Tom Kralidis (Meteorological Service of Canada and pygeoapi): demo and the future of EDR in pygeoapi

[http://docs.opengeospatial.org/DRAFTS/19-086.html See the draft specification here.]

[https://github.com/opengeospatial/ogcapi-environmental-data-retrieval EDR GitHub repository here.]

<html><iframe width="560" height="315" src="https://www.youtube.com/embed/URVvMioUpZc" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
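To make the EDR building-block idea concrete, here is a minimal sketch of a position query as defined in the draft specification linked above; the server URL, collection id, and parameter name are placeholders:

<pre>
# Minimal sketch of an OGC API - EDR "position" query. The server URL,
# collection id, and parameter name are illustrative placeholders;
# query parameter names follow the draft EDR specification.
import requests

EDR = "https://example.org/edr"  # assumption: an OGC API - EDR endpoint

resp = requests.get(
    f"{EDR}/collections/demo-forecast/position",
    params={
        "coords": "POINT(-97.0 38.0)",        # WKT point, per the EDR spec
        "parameter-name": "air_temperature",  # which variable(s) to return
        "datetime": "2020-12-10T00:00:00Z/2020-12-11T00:00:00Z",
        "f": "CoverageJSON",                  # a typical EDR response encoding
    },
    timeout=30,
)
print(resp.status_code)
</pre>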
+
'''<u>Minutes:</u>'''
  
<br />
+
(missing first ~15 minutes of recording -- apologies)
  
==12 November 2020: "OGC API Update" Gobe Hobona==
+
Circa 2015 OGC GeoRabble
  
'''Time:'''
+
*Took a critical look at the status of publishing standards.
Thursday, 12, November (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
+
*Couldn't we format these specs in a kind of tutorial form?
  
'''GoToMeeting''':
+
* Lots of snippets and tutorial content in the specs.
https://www.gotomeeting.com/join/533510693
+
**E.g. http://opengeospatial.github.io/e-learning/index.html
 +
*Multiple representations of specifications – that OGC staff could maintain
  
'''Summary:'''
+
9 years later
This presentation will describe an emerging suite of Web Application Programming Interface (API) standards by the Open Geospatial Consortium (OGC). The OGC API standards define modular API building blocks to spatially enable Web APIs in a consistent way [https://ogcapi.ogc.org]. The standards make use of the OpenAPI specification for defining API building blocks that describe consistent behavior for accessing resources such as vector feature data, coverage data, metadata records, tiled data, tiled maps, and geospatial processes.  The presentation will introduce the standards and also cover the utility of the standards in supporting Earth Science. Examples from several recent and current projects will be shown to demonstrate how Earth Scientists can make use of the standards. 
 
  
'''About the Presenter:'''
+
*What makes this hard?
Dr. Gobe Hobona is the OGC's Director of Product Management, Standards. In this role he manages coordination between Standards Working Groups (SWGs) and set priorities for standards development (in cooperation with SWG chairs). He also provides oversight of OGC Application Programming Interface (API) evolution and harmonization activities. He holds a PhD in Geomatics from Newcastle University. He also holds a Bachelor of Science with Honours in Geographic Information Science from Newcastle University.  He is a professional member of both the Royal Institution of Chartered Surveyors (RICS) and the Association for Computing Machinery (ACM).
+
**Standards must be unambiguous AND procurable.
 +
**The modular specification is a model for this balance.
  
More: https://ogcapi.ogc.org/
+
Standards are based around testable requirements that relate to conformance classes.
  
'''Recording:'''
+
Swaggerhub and ReDoc as a way to show a richer collection of information for multiple users.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/3y46zovMsvo" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
+
Specification are much more modular (core and extensions)
  
<br />
+
Developer website: https://developer.ogc.org/
  
==8 October 2020: "Quantifying and Communicating Climate Change Risk at First Street" Ed Kearns==
+
Going to be including persistent  demonstrator (example implementations) that are "in the wild".
  
'''Summary:'''
+
https://www.ogc.org/initiatives/open-science/
The First Street Foundation is quantifying and communicating the flood and inundation risks posed by a changing environment. First Street has created a nation-wide assessment of flood risk for the contiguous United States (CONUS) and the District of Columbia, and is now sharing that assessment for free with the public through Flood FactorTM. Risk scores for each of the approximately 142 million properties in CONUS were built upon open data from the National Oceanic and Atmospheric Administration (NOAA), the United States Geological Survey (USGS), the US Army Corps of Engineers (USACE), and the Federal Emergency Management Agency (FEMA). Flood risk from rivers, precipitation, sea level rise, and coastal storms have been combined into a single risk assessment methodology to raise individuals' awareness of their risks, and empower them to take steps to reduce their current and future flood risk exposure.
 
  
'''Recording:'''
Moving to an "OGC Building Blocks" model in which blocks are registered across multiple platforms and linked to lots of examples.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/0hQarFk9tFs" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
Building blocks are richly described and nuanced but linked back to specific requirements in a specification.
  
<br />
https://blocks.ogc.org/
==10 September 2020: "SELFIE and geoconnex.us update" Dave Blodgett==
 
  
'''Summary:'''
https://sn80uo0zmbg.typeform.com/to/gcwDDNB6?typeform-source=blocks.ogc.org
This month, I’ll be providing an update on the outcomes of the Second Environmental Linked Features Interoperability Experiment (<nowiki>https://github.com/opengeospatial/SELFIE</nowiki>) and a project using the SELFIE outcomes, <nowiki>https://geoconnex.us</nowiki>. The SELFIE project explored a Web architecture for linking environmental features and observational data with a focus on adoptability and W3C spatial data on the Web best practices. It is wrapped up and in review with the OGC Geosemantics Domain Working Group. Geoconnex.us, a US-focused effort being developed by the Internet of Water project at Duke and the USGS Water Mission Area, is building on the outcomes of SELFIE to build a system of linked data and a knowledge network for earth science data.
 
  
'''Slides:'''
A lot of this focused on APIs – what about data models?
Blodgett, David (2020): IT&I Tech Dive: Second Environmental Linked Features Interoperability Experiment. ESIP. Presentation. https://doi.org/10.6084/m9.figshare.12937445
 
  
'''Recording:'''
*Worked on APIs first because it was current. Also thinking about how to apply similar concepts to data models.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/gBqOV4c_QB8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
==September 14th: "Water data standardization: Navigating the AntiCommons" ==
  
==13 August 2020: "ESIP Summer Meeting Highlights"==
[[File:ITI_Sept_IoW_Kyle-Onda.png|thumb|IT&I IoW September 2023]]
  
'''Time'''
[https://internetofwater.org/about/people/kyle-onda/ Kyle Onda]
Thursday, 13, August 2020 (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
 
==11 June 2020: "ESIP Collaboration Infrastructure 2.0" Erin Robinson, Ike Hecht, Lucas Cioffi, and Sheila Rabun==
 
  
'''Summary:'''
We all know interoperability rests on data standards and API standards. Many open standards are less prominent in the open water data space than proprietary solutions. This is because proprietary data management solutions are often bundled with very easy-to-use implementing software and, more importantly, client software that addresses basic use cases. We’re giving people blueprints when they need houses. Community standards-making processes should invest in end-user tools if they want to gain traction. The good news is that the newest generation of standards is much easier to develop against, which has led to reference implementations around which end-user tools are much easier to create than before.
ESIP has been a virtual organization for over 20 years with two in person meetings a year, but our collaborative infrastructure was lagging. Due to COVID-19, ESIP leadership decided to move our next two in-person meetings to virtual meetings ([https://www.esipfed.org/monday-updates/special-esip-update-2020-esip-summer-2021-esip-winter-meetings-going-virtual announcement]) and to invest funding that would have been spent on travel into the ESIP collaborative infrastructure. We are currently upgrading a few components:
 
  
*Mediawiki upgrade from v1.19 to 1.34 of the ESIP wiki
'''<u>Recording</u>''':<br />
*Utilizing QiqoChat to bring together our asynchronous workspaces with our virtual conferences and meetings
<html>
*Becoming an ORCID member to gain access to ORCID API keys to integrate ORCID authentication into the wiki
<iframe width="560" height="315" src="https://www.youtube.com/embed/miFwXB-E1V8?si=Xr2SF_okCxLv2lL2" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>
  
We will have three brief presentations on these components from [https://www.mediawiki.org/wiki/User:Tosfos Ike Hecht], [http://www.wikiworks.com/ WikiWorks] on the ESIP Wiki, [https://www.linkedin.com/in/lucascioffi/ Lucas Cioffi], [https://esip.qiqochat.com/ QiQoChat] lead developer on the technical side of QiQoChat and [https://orcid.org/0000-0002-1196-6279 Sheila Rabun], ORCID US Community Specialist on the ORCID API. [https://www.esipfed.org/about/leadership/staff/erin-robinson Erin Robinson], executive director of ESIP will introduce the session. 
'''<u>Minutes:</u>'''
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/gLTlTvGEjh8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
AntiCommons – the name comes from a social science background
  
==14 May 2020: "CUAHSI HydroShare Update" Jerad Bales, Anthony Castronova, Jeff Horsburgh==
Tragedy of the commons - two solutions, enclose (privatize) or regulate
  
'''Summary:'''
Tragedy of the anticommons – as opposed to common resources, these are resources that don't get used up, as with open data. Inefficiency and underutilization are common.
HydroShare (https://www.hydroshare.org) is a platform for sharing hydrologic resources (data, models, model instances, geographic coverages, etc.), enabling the scientific community to more easily and freely share products, including the data, models, and workflow scripts used to create scientific publications. HydroShare also includes a variety of social functions, such as resource sharing within a specified group, publication with a DOI, and support for integrating external applications to view and use resources without downloading them. This presentation will provide an overview of HydroShare, details of CUAHSI Compute resources which can be accessed through HydroShare or in a standalone mode, and the metadata model used in HydroShare. This presentation will also describe some community resources held by HydroShare, including comprehensive information on recent hurricanes and the complete Critical Zone Observatory data library.
 
  
More info: https://www.hydroshare.org
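A minimal sketch of programmatic access (not from the presentation), assuming the hs_restclient Python package and anonymous access, which is sufficient for public resources:

<pre>
# Minimal sketch: browsing public HydroShare resources with hs_restclient
# (pip install hs_restclient). Field names follow the HydroShare REST API.
from hs_restclient import HydroShare

hs = HydroShare()  # anonymous; enough for public resources

# Print the identifier and title of a handful of public resources.
for i, resource in enumerate(hs.resources()):
    print(resource["resource_id"], resource["resource_title"])
    if i == 4:
        break
</pre>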
Two solutions: expropriation (like eminent domain or public data) or incentivization.
  
'''About the presenters:'''
Example – consolidate urban sprawl into higher density housing to get more open space and room for business.
  
Jerad Bales is the Executive Director of the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), Tony Castronova is Hydrologic Scientist at CUAHSI, and Jeff Horsburgh is an associate professor in the Civil and Environmental Engineering department at Utah State University
Introducing the Internet of Water.  
Noting that in the PNW, there are >800 USGS stream gages and >400 from other organizations. Only the USGS gages are broadly known.
  
'''Slides:''' [https://www.hydroshare.org/resource/c2d7d41310af472aaec9dce57928487e/ On HydroShare.]
Thinking about open data as an anticommons – environmental data is normally publicly available, but only in ways that are convenient to the data providers and the software that they use.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/UvCDhHWFOT0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
Discussion of the variety of standardized vs bespoke modes of data dissemination.
  
==9 April 2020: "Unidata Science Gateway" Julien Chastang==
Example of Nebraska – GUI with download and separate custom API
USGS has the same basic scheme where an ETL goes from data management software to a custom web service system.
  
'''Summary:'''
What's going on here? Limited resources lead to a focus on existing users and needs, and on ease of administration.
With the goal of better serving our community, Unidata is investigating how its technologies can best make use of cloud computing. The observation that science students and professionals are spending too much time distracted by software that is difficult to access, install, and use, motivates Unidata’s investigation. In addition, cloud computing can tackle a class of problems that cannot be approached by traditional, local computing methods because of its ability to scale and its capacity to store large quantities of data. Cloud computing accelerates scientific workflows, discoveries, and collaborations by reducing research and data friction. We aim to improve “time to science” with the NSF-funded Jetstream cloud. We describe a Unidata science gateway on Jetstream. With the aid of open-source cloud computing projects such as OpenStack, Docker, and JupyterHub, we deploy a variety of scientific computing resources on Jetstream for our scientific community. These systems can be leveraged with data-proximate Jupyter notebooks, and remote visualization clients such as the Unidata Integrated Data Viewer (IDV) and AWIPS CAVE. This gateway will enable students and scientists to spend less time managing their software and more time doing science.
 
  
More info: https://science-gateway.unidata.ucar.edu/
Tools that meet this need tend not to focus on the needs of new users or on standardization.
  
Slides: https://doi.org/10.6084/m9.figshare.12124065.v1
Most organizations don't need standards – they need software. Both server and CLIENT software.  
  
'''About the presenter:'''
New specs and efforts ARE heading in this direction.  
I am a scientific software developer for the Unidata Program Center at UCAR (University Corporation for Atmospheric Research) in Boulder, Colorado. I have been employed at UCAR  since 1999 and at Unidata since 2010. I obtained a bachelor's degree in molecular, cellular and developmental biology in 1994 and a master's degree in computer science in 2000. I am passionate about the application of computing technology to science and math. During my employment at Unidata I have advocated for open-source, cloud computing and Python related technologies. I began at Unidata as a software developer supporting the Integrated Data Viewer (IDV). More recently, I have been focused on Unidata science gateway efforts with the objective of facilitating science for the Unidata community with web technologies.
OGC-API, SensorThings, etc.
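A minimal sketch of what that direction looks like on the wire, assuming a hypothetical SensorThings v1.1 endpoint (the base URL below is illustrative):

<pre>
# Minimal sketch of the OGC SensorThings API JSON/OData access pattern.
# The base URL is hypothetical; substitute a real SensorThings v1.1 server.
import requests

base = "https://example.org/v1.1"

# Things, Datastreams, Observations, etc. are plain JSON collections.
things = requests.get(f"{base}/Things", params={"$top": 3}, timeout=30).json()
for thing in things["value"]:
    print(thing["@iot.id"], thing["name"])

# OData-style query options ($top, $orderby, $filter, ...) compose naturally.
obs = requests.get(
    f"{base}/Observations",
    params={"$top": 3, "$orderby": "phenomenonTime desc"},
    timeout=30,
).json()
print(len(obs["value"]))
</pre>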
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/hdGu8XPW6Rg" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
Promising developments around proxying non-standard APIs and in the use of structured data "decoration" to make documentation more standard.
  
==12 March 2020: "DGGS in action: provision of rapid response during Australian bushfires and other applications"==
== August 10th: "Learning to love the upside down: Quarto and the two data science worlds" ==
  
'''Speakers'''
[[File:ITI_August_Quarto.png|thumb|IT&I Quarto August 10th]]
Shane Crossman and Irina Bastrakova
 
  
'''Summary'''
[https://cscheid.net/v2/ Carlos Scheidegger]
Everything has a location. Location can be defined using descriptive terms (e.g. place names), geometry (e.g. geographic coordinates) and/or index notations (e.g. statistical boundaries). However, existing approaches and disconnected infrastructures limit our ability to discover, access and integrate spatial data across organisation and jurisdiction boundaries to produce up-to-date reliable information. The Location Index project (LOC-I) aims to introduce a consistent way to access, analyse and use location data to support the effective integration of socio-economic, statistical and environmental data from multiple data providers to support the spatially enabled delivery of Government policies and initiatives.
 
 
The devastation caused by the Australian bushfires highlighted the need for a new approach to rapid data integration. The total burnt area during autumn-summer 2019-2020 was 72,000 square miles, roughly equivalent to half of Montana, or the areas of North Dakota and Delaware combined. Rapid response in the provision of information on areas affected by the bushfires was required to support evaluation of the impact, as well as planning of the recovery process and support for families, businesses and the environment. This presentation will discuss the application of the Discrete Global Grid System (DGGS) in bringing together diverse, complex information from multiple sources to support the response process. The presentation will also discuss testing of the DGGS capability in other use cases.
 
  
Slides here: https://doi.org/10.6084/m9.figshare.12032592.v1
There are two wonderful data science worlds. You can be a jupyter expert: you work on jupyter notebooks, with access to myriad Julia, Python, and R packages, and excellent technical documentation systems. You can also be a knitr and rmarkdown expert: you work on rmarkdown notebooks, with access to myriad Julia, Python, and R packages, and excellent technical documentation systems.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/AF_9JBuPdHI" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
But what if your colleague works on the wrong side of the fence? What if you spent years learning one of them, only to find that the job you love is in an organization that uses the other? In this talk, I’m going to tell you about quarto, a system for technical communication (software documentation, academic papers, websites, etc) that aspires to let you choose any of these worlds.
  
==13 February 2020: "Urban Flooding Open Knowledge Network": Mike Johnson==
If you’re one to worry about Conway’s law and what this two-worlds situation does to an organization’s talent pool, or if you live in one side of the world and want to be able to collaborate with folks on the other side, I think you’ll find something of value in what I have to say. 
  
We’re back this month with a webinar on the Urban Flooding Open Knowledge Network project from Mike Johnson, a graduate student from UC Santa Barbara. This is an exciting stakeholder-driven knowledge network project with emphasis on prototyping interfaces and web resources, some of which Mike will demonstrate for us.
I’m also going to complain about software, mostly the one I write. Mostly.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/HuJBKp0yCOo" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
Slides: https://cscheid.net/static/2023-esip-quarto-talk/
  
==12 December 2019: "Location Integration Project -- DGGS and Linked Data": Matthew Purss and Shane Crossman (postponed)==
'''<u>Recording</u>''':<br />
 
<html>
'''Summary'''
<iframe width="560" height="315" src="https://www.youtube.com/embed/uQ3yZjM1bj8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
Following presentations on DGGS and similar technologies for indexing, this presentation will explore use cases and technical implementations for using DGGS and linked data together in the Loc-I project.
</html>
 
 
==14 November 2019: "Location Integration Project": Matthew Purss and Shane Crossman==
 
 
 
'''Summary'''
 
Everything has a location. Location can be defined using descriptive terms (e.g. place names, suburb names, and rivers), position and geometry (e.g. geographic coordinates like latitude and longitude) and/or index notations (e.g. mesh block or parcel identifiers). A street address is useful for service delivery; land parcels for revenue raising and investment; coordinates and grids for positioning and monitoring changes in the landscape; various administrative boundaries for law enforcement and service management; and statistical (e.g. health, society or economy) information to analyse and improve life in Australia. However, existing approaches and disconnected infrastructures coupled with the myriad of ways to describe and store location information limit our ability to discover, access and integrate spatial data across organization and jurisdiction boundaries to produce reliable and actionable information. The Location Index project (LOC-I) aims to introduce a consistent way to access, analyse and use location data to support the effective integration of socio-economic, statistical and environmental data from multiple data providers to support the spatially enabled delivery of Government policies and initiatives.
 
 
The flexible, and standards based, spatial data infrastructures being developed by Geoscience Australia and its partners under LOC-I include Internet of Things (IoT), Linked Data and Discrete Global Grid Systems (DGGS) technologies integrated with cloud-based data discovery and access tools. This will enable LOC-I to democratize spatial data applications, where people will be able to do spatial data integration operations without needing GIS specialists. This presentation will provide an overview of LOC-I, DGGS and how DGGS technologies can be used as a tool to spatially enable and integrate socio-economic and geospatial data together.
 
  
http://locationindex.org
'''<u>Minutes:</u>'''
  
[http://wiki.esipfed.org/images/f/fb/Loc-I_webinar_-_15Nov2019.pptx Download the slides here.]
Carlos was in a tenured computer science position at the University of Arizona.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/fRSYsuNP1X0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
Hating bad software makes a software developer a good developer.  
  
==10 October 2019: "ELFIE: Environmental Linked Features Interoperability Experiment":==
Two data science worlds:  
  
'''Summary'''
tidyverse (with R and markdown)
This webinar will provide an update on activities of the OGC Environmental Linked Features Interoperability Experiment (ELFIE) and a preview of activities of the Second ELFIE (SELFIE). This will be an interactive demonstration-based session focused on practical application of JSON-LD to link features and observations.
 
  
Presenter: [https://www.usgs.gov/staff-profiles/david-l-blodgett David Blodgett, Civil Engineer, U.S. Geological Survey Water Mission Area]
*Cohesive, hard to run things out of order.
* Doesn't store output.
  
[https://opengeospatial.github.io/ELFIE/ More info on ELFIE here.]
Jupyter (python and notebooks)
  
'''Recording'''
*Notebook saves intermediate outputs.
*State can be messed up easily – cells aren't linear steps.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/lvy9vdv96-s" frameborder="0" allowfullscreen></iframe></html>
Quarto:
  
==12 September 2019: "STARE: SpatioTemporal Adaptive Resolution Encoding for Scalable Integrative Analysis": Michael Rilee==
*Acts as a compatibility layer for tidyverse and jupyter ecosystems.
*Emulates RMarkdown with multi-language support (see the sketch below).
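A minimal sketch of such a document (illustrative, not from the talk): one .qmd file mixes a YAML header, markdown, and an executable Python cell, and running ''quarto render example.qmd'' builds it. Swapping in the knitr engine and an {r} cell is the other side of the fence.

<pre>
---
title: "Minimal Quarto example"
format: html
jupyter: python3
---

Some narrative text, written in (Pandoc) markdown.

```{python}
#| echo: true
import math
print(math.pi)
```
</pre>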
  
'''Summary'''
Rant:
Title: STARE: SpatioTemporal Adaptive Resolution Encoding for Scalable Integrative Analysis
 
  
Abstract: The Earth Science enterprise has generated a great volume and variety of high-quality, high-spatiotemporal resolution data (e.g. level 1 and 2 swath data), that could be integrated to open wide arenas of new, improved scientific advances. Yet aligning and integrating different kinds of Earth Science data is a laborious process, leading most researchers to focus on more generic, high level data products that are more easily compared. Dealing with the great volume and variety is the goal of the NASA/ACCESS-17 STARE project. STARE is a unifying indexing scheme addressing variety and is well suited for applying distributed storage and computing resources to address volume. We will describe the STARE approach and how it relates to other geolocation schemes, e.g. DGGS or simulation grids.
Quarto gets you a webpage and PDF output.  
  
Presenter: Michael Rilee, Ph.D., Rilee Systems Technologies LLC.
– note that the PDF requirement is not great.
  
Bio: Michael Rilee, PI of the STARE project, has been involved with high-end computing and modeling and simulation for about 25 years. For about the past 5 years or so he has been researching advanced computing techniques for a range of Earth Science applications including parallel computing, array databases, regridding, kriging, and atmospheric chemistry. Before that he was involved with several NASA efforts involving spacecraft autonomy, on-board data processing, and high-end computing. He was initially trained in plasma astrophysics, receiving his Ph.D. from Cornell University.
Quarto is kind of just a huge wrapper around pandoc.
  
[https://esip.figshare.com/articles/STARE_Spatiotemporal_Adaptive_Resolution_Encoding_for_Integrative_Compatibility_Across_Variety_of_Volume/8942291 See poster about STARE here.]
Quarto documentation is intractably hard to build out.
[http://odl.unl.edu/index.php/map_interface_user_guide/ See demo here.]
 
[https://www.researchgate.net/profile/Kwo-Sen_Kuo See related research here.]
 
  
'''Recording'''
Consider Conway's Law – an organization that creates a large system will create a system that is a copy of the organization's communication structure.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/uodeCiYc6Mg" frameborder="0" allowfullscreen></iframe></html>
– Quarto is meant to allow whole organizations with different technical tools to exist in the same communication structure (same system).
 
 
==8 August 2019: "The Challenge of Location and How Discrete Global Grid Systems can enable Spatial Data Integration.": Matthew Purss==
 
 
 
'''Summary'''
 
Everything has a location. Location can be defined using descriptive terms (e.g. place names, suburb names, and rivers), position and geometry (e.g. geographic coordinates like latitude and longitude) and/or index notations (e.g. mesh block or parcel identifiers). A street address is useful for service delivery; land parcels for revenue raising and investment; coordinates and grids for positioning and monitoring changes in the landscape; various administrative boundaries for law enforcement and service management; and statistical (e.g. health, society or economy) information to analyse and improve life in Australia. However, existing approaches and disconnected infrastructures coupled with the myriad of ways to describe and store location information limit our ability to discover, access and integrate spatial data across organisation and jurisdiction boundaries to produce reliable and actionable information. The Location Index project (LOC-I) aims to introduce a consistent way to access, analyse and use location data to support the effective integration of socio-economic, statistical and environmental data from multiple data providers to support the spatially enabled delivery of Government policies and initiatives.
 
 
   
 
   
The flexible, and standards based, spatial data infrastructures being developed by Geoscience Australia and its partners under LOC-I include Internet of Things (IoT), Linked Data and Discrete Global Grid Systems (DGGS) technologies integrated with cloud-based data discovery and access tools. This will enable LOC-I to democratise spatial data applications, where people will be able to do spatial data integration operations without needing GIS specialists. This presentation will provide an overview of LOC-I, DGGS and how DGGS technologies can be used as a tool to spatially enable and integrate socio-economic and geospatial data together.
Quarto tries to make kinda hard things easy while not making really hard things impossible.
  
Dr. Matthew Brian John Purss is a Senior Advisor on Geospatial Standards at Geoscience Australia. He is a founding co-chair of the Open Geospatial Consortium's (OGC) Discrete Global Grid Systems (DGGS) Domain and Standards Working Groups. He is a geophysicist with over 20 years' experience in the exploration, research and government sectors and holds a PhD in Exploration Geophysics from Monash University where he studied grid based approaches to magnetic and electromagnetic modelling applications. Prior to initiating and leading the international standardisation of DGGS infrastructures.
Quarto can convert jupyter notebooks (with cached outputs) into markdown and vice versa.
  
http://locationindex.org
Issue is, you need to know a variety of other languages (YAML, CSS, Javascript, LaTeX, etc.)
  
[http://wiki.esipfed.org/images/c/c4/ESIP_DGGS_Webinar_presentation_2019_08_08.pptx Download the slides here.]
– "unavoidable but kinda gross"
  
'''Recording'''
You can edit Quarto in RStudio or VS Code, or any text editor.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/Gbe6-N_Zisw" frameborder="0" allowfullscreen></iframe></html>
For collaboration, Quarto projects can use jupyter or knitr engines. E.g. in a single web site, you can build one page with jupyter and another page with knitr.
  
==11 July 2019: "ADIwg Open Source Metadata Toolkit": Josh Bradley, Dennis Walworth==
– you can embed an ipynb cell in a notebook.
  
'''Summary'''
Orchestrating computation is hard – quarto has to take input from existing computation – which can be awkward / complex.
Josh Bradley is a data manager for Fish & Wildlife Service, Science Data Applications and project manager for the ADIwg mdTools project.
 
Dennis Walworth is the data manager for the USGS Alaska Science Center and collaborator on the ADIwg mdTools project.
 
  
*[https://docs.google.com/presentation/d/e/2PACX-1vTAXbGlKL8QZ4G8pSHDOoeOLrfH3xWHNmspLeaMAocJI2zl0fKn5g-x7PAirekLAi3vImIL6752T0k7/pub?start=false&loop=false&delayms=3000&slide=id.p1 See slides here.]
Quarto is extensible – CSS themes, OJS for interactive webpages, Pandoc extensions.
*mdEditor website: https://www.mdeditor.org/
 
*Development version: https://dev.mdeditor.org/
 
*mdEditor User Manual: https://adiwg.gitbooks.io/mdeditor/content/
 
*GitHub Issues: https://github.com/adiwg/mdEditor/issues
 
  
'''Recording'''
Can also write your own shortcodes.
==July 13th 2023: "Tools to Assist Simulation Based Researchers in Deciding What Project Outputs to Preserve and Share"==
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/ClRmCU2Ae5c" frameborder="0" allowfullscreen></iframe></html>
[[File:ITI_July_EarthCube.png|thumb|IT&I EarthCube Model RCN July 13th]]
  
==6 June 2019: "Google Dataset Search: Facilitating data discovery in an open ecosystem.": Chris Gorgolewski==
[https://staff.ucar.edu/users/schuster Doug Schuster]
  
'''Summary'''
This presentation will highlight findings from the NSF EarthCube Research Coordination Network project titled “What About Model Data? - Best Practices for Preservation and Replicability” (https://modeldatarcn.github.io/), which suggest that most simulation-based research projects only need to preserve and share selected model outputs, along with the full simulation experiment workflow, to communicate knowledge. Challenges related to meeting community open science expectations will also be highlighted.
  
There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical to facilitating reproducibility of research results, enabling scientists to build on others’ work, and providing data journalists easier access to information and its provenance. In this talk, I will discuss recently launched Google Dataset Search, which provides search capabilities over potentially all dataset repositories on the Web. I will talk about the open ecosystem for describing and citing datasets that we hope to encourage and the technical details on how we went about building Dataset Search. Finally, I will highlight research challenges in building a vibrant, heterogeneous, and open ecosystem where data becomes a first-class citizen.
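The open ecosystem rests on schema.org ''Dataset'' markup embedded in repository pages; below is a minimal, illustrative sketch (all values invented) of that JSON-LD, generated with Python for concreteness:

<pre>
# Minimal sketch of schema.org Dataset markup (JSON-LD), the mechanism
# Dataset Search crawls. All field values below are illustrative.
import json

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example gridded temperature product",
    "description": "Illustrative description for markup purposes only.",
    "url": "https://example.org/datasets/temperature",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}

# A publisher embeds this in a <script type="application/ld+json"> tag.
print(json.dumps(dataset, indent=2))
</pre>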
Slides available here: [[File:ModelDataRCN-2023-07-13-ESIP-IT&I_v2.pdf|thumb]]
  
https://toolbox.google.com/datasetsearch
https://modeldatarcn.github.io/
https://www.blog.google/products/search/making-it-easier-discover-datasets/
 
  
'''Recording'''
Rubric: https://modeldatarcn.github.io/rubrics-worksheets/Descriptor-classifications-worksheet-v2.0.pdf
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/1PSE2hQ7mBo?start=3" frameborder="0" allowfullscreen></iframe></html>
''Open science expectations for simulation based research. Frontiers in Climate, 2021. https://doi.org/10.3389/fclim.2021.763420''
  
==9 May 2019: "The SpatioTemporal Asset Catalog (STAC) specification": Chris Holmes==
'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/ulk0mQSQNzQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>
  
'''Summary'''
'''<u>Minutes:</u>'''
  
The SpatioTemporal Asset Catalog (STAC) specification is an emerging standard to make it easier to find geospatial information. It aims to enable a cloud-native geospatial future by providing a common layer of metadata for search and discovery, while playing well with the web and existing geospatial standards.
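As a minimal sketch of the search pattern (the endpoint and collection id below name one public Earth Search instance and may change; treat them as assumptions), a STAC API takes a JSON body at /search and returns an ordinary GeoJSON FeatureCollection:

<pre>
# Minimal sketch of a STAC API item search. Endpoint and collection id are
# assumptions; substitute any STAC API.
import requests

stac_api = "https://earth-search.aws.element84.com/v1"

body = {
    "collections": ["sentinel-2-l2a"],
    "bbox": [-105.3, 39.9, -105.1, 40.1],
    "datetime": "2020-06-01T00:00:00Z/2020-06-30T23:59:59Z",
    "limit": 3,
}
items = requests.post(f"{stac_api}/search", json=body, timeout=30).json()

# Each item is a GeoJSON Feature with an "assets" dict of downloadable files.
for item in items["features"]:
    print(item["id"], sorted(item["assets"])[:3])
</pre>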
Primary motivation: What are data management requirements for simulation projects?
  
See the [http://stacspec.org/ SpatioTemporal Asset Catalog (STAC) specification] web site here.
Project ran May 2020 to Jul 2022
  
'''Time''': Thursday, 9, May, 2019 (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
We clearly shouldn't preserve ALL data / output from projects. It's just too expensive.
  
'''Recording'''
Project broke down the components of data associated with a project: forcings, code/documentation, selected outputs.
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/emXgkNutUTo" frameborder="0" allowfullscreen></iframe></html>
But what outputs to share?!?
  
Project developed a rubric of what to preserve / share.
  
==11 Apr 2019: "Pachyderm": John Karabaic==
"Is your project a data production project or a knowledge production project"
  
'''Summary'''
"How hard is it to rerun your workflow?"
  
See the Pachyderm one pager here: https://cdn2.hubspot.net/hubfs/4751021/Business_One_Pager.pdf
"How much will it cost to store and serve the data?"
  
"Pachyderm lets you deploy and manage multi-stage, language-agnostic data pipelines while maintaining complete reproducibility and provenance."
Rubric gives guidance on how much of a project's outputs should be preserved.
  
'''Time''': Thursday, 11, April, 2019 (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
So this is all well and good, but it falls onto PIs and funding agencies.
  
'''Join meeting''':
What are the ethical and professional considerations of these trade offs?
  
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
What are the incentives in place currently? Sharing is not necessarily seen as a benefit to the author.
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
  
'''Speaker''': Michael Masters (Pachyderm)
==June 8 2023: "Reproducible Data Pipelines in Modern Data Science: what they are, how to use them, and examples you can use!"==
See Docs Here:
 
http://docs.pachyderm.io/en/latest/getting_started/local_installation.html 
 
  
Also see intro talk here: https://changelog.com/practicalai/23
[[File:ITI_June_Pipeline.png|thumb|IT&I Reproducible Pipelines June 8th]]
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/za0JNgL4JGA" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></html>
[https://www.usgs.gov/staff-profiles/julie-padilla Julie Padilla]
  
==14 Mar 2019: "Integrating SciServer and OceanSpy to enable easy access to oceanographic model output": Mattia Almansi (Johns Hopkins University)==
Modern scientific workflows face common challenges including accommodating growing volumes and complexity of data and the need to update analyses as new data becomes available or project needs change. The use of better practices around reproducible workflows and the use of automated data analysis pipelines can help overcome these challenges and more efficiently translate open data to actionable scientific insights. These data pipelines are transparent, reproducible, and robust to changes in the data or analysis, and therefore promote efficient, open science. In this presentation, participants will learn what makes a reproducible data pipeline and what differentiates it from a workflow as well as the key organizational concepts for effective pipeline development.
  
'''Time''': Thursday, 14, March, 2019 (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/K8EOY_HLlho" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>
  
'''Summary''':  OceanSpy is an open-source and user-friendly Python package that enables scientists and interested amateurs to use ocean model data sets with out-of-the-box analysis tools.  OceanSpy builds on software packages developed by the Pangeo community (in particular xarray, dask, and xgcm). I will show how OceanSpy can be used on model outputs stored on the Johns Hopkins SciServer system, negating the need for the user to own a computing cluster or even download the data. OceanSpy accelerates and facilitates exploration (including visualization) of terascale data. It is designed to operate on petascale ocean simulations hosted by SciServer in the coming months.
'''<u>Minutes:</u>'''
  
'''Join meeting''':
Motivation –
*What if we find bad data in an input?
*What if we need to rerun something with new data?
*Can we reproduce findings from previous work?
  
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
Need to be able to "trace" what we did and the way we do it needs to be reliable.
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
  
'''Speaker''': Mattia is a PhD student at Johns Hopkins University under Prof. Tom Haine.  He's a physical oceanographer, primarily interested in the dynamics governing the circulation of the Subpolar North Atlantic. To address the targets of his research he uses high-resolution numerical simulations and available observations.
A "workflow" is a sequence of steps going from start to finish of some activity or process.
  
'''Links'''
A "pipeline" is a programmatic implementation of a workflow that requires little to no interaction.
  
*[https://github.com/malmans2/oceanspy OceanSpy Github Page]
In a pipeline, if one workflow step or input gets changed, we can track what is "downstream" of it.
*[http://www.sciserver.org/ SciServer web page]
 
  
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/GO_hNIcFw8o" frameborder="0" allowfullscreen></iframe></html>
Note that different steps of the workflow may be influenced by  different people. So a given step of a pipeline could be contributed by different programmers. But each person would be contributing a component of a consistent pipeline.
  
==14 Feb 2019: "Cloud Native Geoprocessing of Earth Observation Satellite Data with Pangeo": Scott Henderson (University of Washington)==
There is a difference between writing scripts and building a reproducible pipeline.
Better to break it into steps: script -> organize -> encapsulate into functions -> assemble pipeline.
  
'''Summary''': NASA has estimated that by 2025, it will be storing upwards of 250 Petabytes (PB) of its data using commercial Cloud services (e.g. Amazon Web Services [AWS]). This presentation will focus on efforts funded by a NASA ACCESS 2017 grant to transition the Earth Science community into Cloud computing by developing technologies that build on top of the growing Pangeo ecosystem. In particular, the integration of JupyterHub with Kubernetes and several high-level Python packages (i.e. Xarray, Dask, Rasterio, Intake, PyViz), are enabling Cloud-native workflows that circumvent the bottleneck of downloading large amounts of data. These tools work best with emerging Cloud-native storage solutions for satellite imagery (i.e. NASA’s CMR, STAC, COGs). In this presentation, Scott gives update on the Pangeo project and showcases a few example workflows using large public archives of optical and radar satellite data.
Focus is on R targets – snakemake is equivalent in python.
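A minimal Snakefile sketch of that idea (file names and scripts are illustrative, not from the webinar): each rule declares its inputs and outputs, so changing one input reruns only its downstream steps, and expand() fans a rule out over a list of inputs:

<pre>
# Minimal Snakemake sketch. Paths and scripts are illustrative.
SITES = ["siteA", "siteB"]

rule all:
    input:
        "results/report.txt"

# One clean step per site via the {site} wildcard (map/reduce-style fan-out).
rule clean_site:
    input:
        "data/{site}.csv"
    output:
        "clean/{site}.csv"
    shell:
        "python scripts/clean.py {input} {output}"

# The report depends on every cleaned file; touching data/siteA.csv
# reruns only clean_site for siteA and then this rule.
rule report:
    input:
        expand("clean/{site}.csv", site=SITES)
    output:
        "results/report.txt"
    shell:
        "python scripts/report.py {input} > {output}"
</pre>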
  
'''Speaker''': Scott Henderson has a PhD in Geological Sciences from Cornell University and is currently a postdoctoral fellow at the University of Washington eScience Institute. Scott studies geologic hazards with satellite-based synthetic aperture radar.
Key concepts for going from script to workflow:
*Functions stored separately from the workflow script.
*Steps clearly organized in the script.
*Steps wrapped in pipeline targets so they can be tracked.
  
'''Links'''
Pipeline software keeps track of whether things have changed and what needs to be rerun.
Allows visualization of the workflow inputs, functions, and steps.
  
*[https://docs.google.com/presentation/d/1evNXCddIllXUt4a5jKfmlO2197sp8T6I9L650cYZcsk/edit?usp=sharing Google Slides]
How do steps of the pipeline get related to each other?
*[https://github.com/scottyhq/esip-tech-dive Landsat NDVI (data on AWS, compute on Google)]
They are named and the target names get passed to downstream targets.
*[https://github.com/pangeo-data/pangeo-tutorial-agu-2018 AGU 2018 Tutorial material (various examples)]
 
*[https://github.com/scottyhq/grfn_pangeo_demo Getting ready for NISAR data]
 
*[https://github.com/scottyhq/stac-intake-landsat STAC catalogs, Intake, mosaics]
 
*[https://medium.com/pangeo/cloud-native-geoprocessing-of-earth-observation-satellite-data-with-pangeo-997692d91ca2 blog post for context]
 
  
'''Recording'''
Chat questions about branching.
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/yQon6Wh-lN4" frameborder="0" allowfullscreen></iframe></html>
Dynamic branching lets you run the same target for a list of inputs in a map/reduce pattern.  
  
==13 Dec 2018: "Developing JupyterLab Extensions": Ian Rose (Berkeley)==
Pipelines can have outputs that are reports that render pipeline results in a nice form.
  
'''Summary''': JupyterLab extensions can customize or enhance any part of JupyterLab. They can provide new themes, file viewers and editors, or renderers for rich outputs in notebooks. Extensions can add items to the menu or command palette, keyboard shortcuts, or settings in the settings system. Extensions can provide an API for other extensions to use and can depend on other extensions. In fact, the whole of JupyterLab itself is simply a collection of extensions that are no more powerful or privileged than any custom extension. In this talk Ian will demonstrate how to build a JupyterLab extension.
Pipeline templates: a pipeline can adopt a pre-determined standard template, which helps enforce best practices and gives a quick and easy starting point.
  
'''Time''': Thursday, 13 Dec, 2018, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
Note that USGS data science has a template for a common pattern.
  
'''Join meeting''':  
What's a best practice for tracking container function and reproducibility?
Versioned Git / Docker for code and environment.
For data, it is context dependent. Generally, try to pull down citeable / persistent sources. If sources are not persistent, you can cache inputs for later reuse / reproducibility.
Data change detection / caching is a really tricky thing but many people are working on the problem. https://cboettig.github.io/contentid/, https://dvc.org/
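The core idea those tools build on can be sketched in a few lines (illustrative): record a content hash for each input and treat a changed hash as the signal that downstream steps must rerun.

<pre>
# Minimal sketch of content-based change detection for pipeline inputs.
# File names are illustrative.
import hashlib
from pathlib import Path

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

recorded = Path("input.csv.sha256")
current = sha256_of("input.csv")

if not recorded.exists() or recorded.read_text().strip() != current:
    print("input changed; downstream steps must rerun")
    recorded.write_text(current)
</pre>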
  
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
https://learning.nceas.ucsb.edu/2021-11-delta/session-3-programmatic-metadata-and-data-access.html#reproducible-data-access
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
  
'''Speaker''': Ian Rose is a postdoctoral fellow at the Berkeley Institute for Data Science, where he is a core developer on Project Jupyter, working on JupyterLab.  He has a PhD in geophysics from Berkeley.
 
  
'''Links''':
==11 May 2023: "Software Procurement Has Failed Us Completely, But No More!"==
  
*https://jupyterlab.readthedocs.io/en/stable/
[[File:ITI_May_Software.png|thumb|IT&I Software Procurement May 11th]]
*https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
 
*https://jupyterlab.readthedocs.io/en/stable/developer/extension_dev.html
 
  
'''Recording'''
[https://waldo.jaquith.org/ Waldo Jaquith]
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/cUEutudPmAo" frameborder="0" allowfullscreen></iframe></html>
 
  
The way we buy custom software is terrible for everybody involved, and has become a major obstacle to agencies achieving their missions. There are solutions, if we would just use them! By combining the standard practices of user research, Agile software development, open source, modular procurement, and time & materials contracts, we can make procurement once again serve the needs of government.
  
==8 November 2018: "Intake: Lightweight tools for loading and sharing data in data science projects": Martin Durant (Anaconda)==
Slides available here: [[File:2023-05-Jaquith.pdf|thumb]]
  
'''Summary''': Intake is a set of free open-source Python tools that help load data from a variety of formats into familiar containers like Pandas dataframes, Xarray datasets, and more. Boilerplate data loading code can be transformed into reusable Intake plugins.  Datasets can be described for easy reuse and sharing using Intake catalog files.  Martin will give an overview of Intake and demonstrate use via Jupyter Notebooks.  
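A minimal sketch of the pattern (catalog contents and paths are illustrative, not from the talk): a YAML catalog describes a source once, and any script can then load it by name.

<pre>
# Minimal Intake sketch. The catalog file and CSV path are illustrative.
#
# catalog.yml
# -----------
# sources:
#   example_table:
#     driver: csv
#     args:
#       urlpath: "data/example.csv"

import intake

cat = intake.open_catalog("catalog.yml")
print(list(cat))               # entry names defined in the catalog

df = cat.example_table.read()  # load the entry into a pandas DataFrame
print(df.head())
</pre>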
'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/V4-3WZ5hN5k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>
  
'''Time''': Thursday, 8 November, 2018, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
'''<u>Minutes:</u>'''
  
'''Join meeting''':
Recognizing that software procurement is one of the primary ways that software / IT systems advance, Waldo set out to understand that space as a software developer.
  
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
'''Healthcare.gov '''
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
  
'''Speaker''': Martin Durant is a data scientist and software developer at Anaconda who specializes in Python data engineering tools, instruction and solutions.  He also has a PhD in Astrophysics.  
Contract given to CGI Federal for $93M – cost ~$1.7B by launch.<br>
Low single-digit numbers of people actually made it through the system.<br>
Senior leaders were given the impression that things were in good shape.<br>
The developers working on the site knew it wasn't going to work out (per IG report).<br>
*strategic misrepresentation – things are represented as rosier and rosier as you go up the chain of command<br>
On launch, things went very badly, and the recovery was actually quite quick and positive.<br>
  
'''Links''':
Waldo recommends reading the IG report on healthcare.gov. This article ''("The OIG Report Analyzing Healthcare.gov's Launch: What's There And What's Not", Health Affairs Blog, February 24, 2016. https://dx.doi.org/10.1377/hblog20160224.053370<nowiki/>)'' provides a path to the IG report ''(HealthCare.gov - CMS Management of the Federal Marketplace: An OIG Case Study (OEI-06-14-00350), https://oig.hhs.gov/oei/reports/oei-06-14-00350.pdf<nowiki/>)'' and additional perspective.
  
*https://intake.readthedocs.io
'''Rhode Island Unified Health Infrastructure'''
*https://github.com/ContinuumIO/intake
 
  
'''Recording'''
($364M to Deloitte) "Big Bang" deployment – they let the people running the old systems go on the day of the new system launch.
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/PSD7r3JFml0" frameborder="0" allowfullscreen></iframe></html>
They "outsourced" a mission critical function to a contractor.
  
We don't tend to hear about relatively smaller projects because they are less likely to fail and garner less attention.
  
==11 October 2018: "SpatioTemporal Feature Registry: ESIP idea campaign and working example in USGS": Sky Bristol (USGS)==
Outsourcing as it started in the ~90s was one thing when the outsourcing was for internal agency software. It's different when the systems are actually public interfaces to stakeholders or are otherwise mission critical.
  
'''Summary''': Building a National Biogeographic Map, an analysis platform for exploring biodiversity conservation measures and stressors, sparked the need for a reliable data source with a wide variety of identified places. These need to be assembled in a sustainable and robust way that keeps track of provenance and processing steps so we can build reports for decisionmakers that are trustworthy and consistent. We started an idea campaign in the ESIP Lab with some questions we have about how best to do this work. This talk will share what we have in place so far, including a registry of sources, a data processing pipeline, an integrated index, a REST API, and a working web application. Technologies include USGS ScienceBase, PostgreSQL/PostGIS, ElasticSearch, Python Flask, and other Python processing codes using GDAL and other libraries.
'''[[File:2023-05-Jaquith.pdf|See slides with big numbers and study sources!!]]'''
  
'''Time''': Thursday, 11 October, 2018, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
It's common for software to meet contract requirements but NOT meet end user needs.
  
'''Join meeting''':
Requirements complexity is fractal. There is no complete / comprehensive set of requirements.
No live meeting, as something came up at the last minute and Sky can't make it. He did record a 30 minute presentation, however.  See "Recording" below.  See also the sample Jupyter notebooks in the "Links" section below.
 
  
'''Speaker''': Sky Bristol is the branch chief for Biogeographic Characterization in the USGS Core Science Analytics, Synthesis, and Library Program. Unless he can be skiing bumps, he likes doing cool things to help people make better decisions using data.
… federal contractors interpreting requirements as children trying to resist getting out the door ...
  
'''Links''':
There is little to no potential to update or improve requirements due to contract structure.
https://github.com/skybristol/notebooks/blob/master/SFR%20Exploration.ipynb
 
  
'''Demos not memos!'''
  
'''Recording'''
Memorable statements:
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/J7T59-H_W4o" frameborder="0" allowfullscreen></iframe></html>
*Outsourced ability to accomplish the agency’s mission
*Load-bearing software systems on which the agency depends to complete its mission.
*The mission of many agencies is mediated by technology.
  
==9 August 2018: "EarthSim: Flexible Environmental Simulation Workflows Entirely Within Jupyter Notebooks": Dharhas Pothina (ERDC)==
But no more! – approach developed by 18F
  
'''Summary''': Building environmental simulation workflows is typically a slow process involving multiple  proprietary desktop tools that do not interoperate well. In this work, we demonstrate building flexible, lightweight workflows entirely in Jupyter notebooks.  The goal is to provide a set of tools that can easily be reconfigured and repurposed as needed to rapidly solve specific emerging issues. As part of this work, extensive improvements were made to several general-purpose open source packages, including support for annotating and editing plots and maps in Bokeh and HoloViews, rendering large triangular meshes and regridding large raster data in HoloViews, GeoViews, and Datashader, and widget libraries for Param.
System of six parts –
  
'''Time''': Thursday, August 9, 2018, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
1. User-centered design<br>
2. Agile software development<br>
3. Product ownership<br>
4. DevOps<br>
5. Building out of loosely coupled parts<br>
6. Modular contracting<br>
  
'''Join meeting''':  
[[File:Agil-control-model.png|frame|Roles for government and vendors in agile contracting]]
  
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
"You don't know what people need till you talk to them."
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
  
'''Speaker''':
Basic premise of agile is good. Focus is on finished software being developed every two weeks.
Dharhas Pothina is the Associate Technical Director of the Information Technology Laboratory, US Army Engineer Research and Development Center, in Vicksburg, MS. He was formerly the Water Informatics Lead at the Texas Water Development Board. He holds a Ph.D in Civil Engineering from the University of Texas at Austin.
 
  
'''Links''':
Constantly delivering a usable product... e.g. A skateboard is more usable than a car part.
https://pyviz.github.io/EarthSim/
 
  
'''Recording'''
Key roles for government staff around operations are too often overlooked.
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/etf8M_uW39E" frameborder="0" allowfullscreen></iframe></html>
 
  
Product team needs to include an Agency Product Owner. Allows government representation in software development iteration.
  
==14 June 2018: "Analysis of Massive Underwater Video Data in the Cloud using Pangeo": Tim Crone (Lamont)==
Build out of loosely coupled / interchangeable components. Allows you to do smaller things and form big coherent systems that can evolve.
  
'''Summary''': An open-source environment for parallel analysis of massive (100TB) image data in the Cloud is now available via the Pangeo environment, which allows you to apply the power of the Python ecosystem from your browser.  Technologies include JupyterHub, Kubernetes, Docker, and Dask distributed.
Modular contracts allow big projects to be delivered through many small task orders or contracts. The contract document is kind of a fill-in-the-blank template and doesn't have to be hard.
  
'''Time''': Thursday, June 14, 2018, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
The Westrum typology of organizational cultures is relevant: http://dx.doi.org/10.1136/qshc.2003.009522
  
'''Join meeting''':  
==13 April 2023: "Evolution of open source geospatial python."==
  
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
[[File:Iti april geospatialpython 720.png|thumb|IT&I Python Open Source April 13th]]
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
  
'''Speaker(s)''':  
[https://github.com/tomkralidis Tom Kralidis]
Tim Crone is a marine geophysicist at Lamont-Doherty Earth Observatory studying spatial variations in the tidal triggering of microearthquakes within ridge systems, and problems in acoustics associated with high-temperature hydrothermal vents and seafloor seismic networks.  He has recently deployed the Pangeo framework on the Microsoft Azure Cloud.
 
  
'''Links''':
Free and Open Source Software in the geospatial ecosystem (i.e. FOSS4G) plays a key role in geospatial systems and services. Python has become the lingua franca for scientific and geospatial software and tooling. This rant and rave will provide an overview of the evolution of FOSS4G and Python, focusing on popular projects in support of Open Standards.
  
*https://pangeo-data.github.io
Slides: https://geopython.github.io/presentation
  
'''Recording'''
'''<u>Recording</u>''':<br />
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/_6eeymc4c7g" frameborder="0" allowfullscreen></iframe></html>
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/HTouLSzKGto" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>
  
==10 May 2018: "NetCDF-CF Advances - Simple Geometries, Swaths, and Groups"==
'''<u>Minutes:</u>'''
  
'''Speakers'''
Mapserver has been around for 23 years!
Dave Blodgett (USGS), Tim Whiteaker (UT Austin), Aleksandar Jelenak (HDF Group) and Daniel Lee (EUMETSAT)
 
  
'''Summary'''
Why Python for Geospatial?
Simple geometry (points, lines, and polygons) has now been accepted as part of the Open Geospatial Consortium’s NetCDF-CF specification. This is a major enhancement to a widely used standard whose utility has previously been limited to time-series of point or (raster) coverage data only. Advances on Groups and Swaths will also be presented.
*Ubiquity
*Cross-OS compatible
*Legible and easy to understand what it's doing
*Support ecosystem is strong (PyPI, etc.)
*Balance of performance and ease of implementation
*Python: fast enough, and fast in human time -- more intensive workloads can glue to C/C++
  
'''Links'''
The new generation of OGC services – based on JSON, so the API interoperates with client environments / objects at a much more direct level.
  
*https://github.com/cf-convention/cf-conventions/pull/115
The geopython ecosystem has a number of low level components that are used across multiple projects.
  
'''Recording'''
pygeoapi is an OGC API reference implementation and an OSGeo project.
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/79e3GC74y_w" frameborder="0" allowfullscreen></iframe></html>
E.g. https://github.com/developmentseed/geojson-pydantic
  
pygeoapi implements OGC API - Environmental Data Retrieval (EDR) https://ogcapi.ogc.org/edr/overview.html
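The EDR query pattern is compact enough to sketch directly (the server and collection below are hypothetical; the /position query form comes from the EDR specification):

<pre>
# Minimal sketch of an OGC API - EDR position query.
# Base URL and collection id are hypothetical.
import requests

base = "https://example.org/edr"

resp = requests.get(
    f"{base}/collections/example-model/position",
    params={
        "coords": "POINT(-105.0 40.0)",        # WKT point of interest
        "parameter-name": "air_temperature",   # variable(s) to return
        "f": "CoverageJSON",
    },
    timeout=30,
)
print(resp.json()["type"])  # typically "Coverage"
</pre>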
  
==12 April, 2018: "Jetstream: A free national science and engineering cloud environment on XSEDE": Jeremy Fischer, Indiana University==
pygeoapi has a plugin architecture.
https://pygeoapi.io/
https://code.usgs.gov/wma/nhgf/pygeoapi-plugin-cookiecutter
  
'''Summary''': Jetstream, a national science and engineering cloud, adds cloud-based, on-demand computing and data analysis resources to the national XSEDE cyberinfrastructure. A description of Jetstream current and planned capabilities, and how to gain access, will be presented.  
pycsw is an OGC CSW and OGC API - Records implementation.
Works with pygeometa for metadata creation and maintenance.
https://geopython.github.io/pygeometa/
  
'''Speaker(s)''':
There's a real trade-off between "the shiny object" and the long-term sustainability of an approach. Geopython has generally erred on the side of "does it work in a virtualenv out of the box".
Jeremy Fischer is a Senior Technical Advisor at Indiana University.  He works primarily on the Jetstream project, as the technical evangelist getting researchers and educators on the system. In this role, he is  the jack of all trades doing unix sys admin work, cloud image maintenance, support, training, documentation, and anything else that needs to happen.
 
  
'''Links''':
How does pycsw work with STAC and other catalog APIs?
pycsw can convert between various representations of the same basic metadata resource.
  
*https://jetstream-cloud.org/about/index.php
+
"That's a pattern… People can implement things the way they want."
  
'''Recording'''
+
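To make the "JSON at a much more direct level" point concrete, here is a minimal, copy/paste-able sketch of querying an OGC API - Features service from Python. The server URL and collection id are illustrative (pygeoapi's public demo instance), not something discussed in the session:

<pre>
# Query an OGC API - Features endpoint with plain HTTP + JSON.
# Server URL and collection id are illustrative (pygeoapi's public demo).
import requests

base = "https://demo.pygeoapi.io/master"

# The collection list is a plain JSON document.
collections = requests.get(f"{base}/collections", params={"f": "json"}).json()
print([c["id"] for c in collections["collections"]])

# Features come back as GeoJSON, mapping directly onto client-side objects.
items = requests.get(
    f"{base}/collections/obs/items", params={"f": "json", "limit": 10}
).json()
for feature in items["features"]:
    print(feature["id"], feature["properties"])
</pre>

Because every response is plain JSON or GeoJSON, there is no format-specific client stack between the service and ordinary language objects.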
'''<u>Chat Highlights:</u>'''

*You can also write a C program that is slower than Python if you aren't careful =).
*https://www.ogc.org/standards/ has lots of useful details
*For anyone interested in geojson API development in Python, I just recently came across this https://github.com/developmentseed/geojson-pydantic
*OGC API - Environmental Data Retrieval (EDR) https://ogcapi.ogc.org/edr/overview.html
*Our team has a pygeoapi plugin cookiecutter that we are hopeful others can get some mileage out of. https://code.usgs.gov/wma/nhgf/pygeoapi-plugin-cookiecutter
*I'm going to post this here and run: https://twitter.com/GdalOrg/status/1613589544737148944
**''100% agreed. That's unfortunate, but PyPI is not designed to deal with binary wheels of beasts like me which depend of ~ 80 direct or indirect other native libraries. Best solution or least worst solution depending on each one's view is "conda install -c conda-forge gdal"''
*General question here - you mentioned getting away from GDAL in a previous project. What are your thoughts on GDAL's role in geospatial python moving forward, and how will pygeoapi accommodate that?
*Never, ever works with the wheels!
*Kitware has some pre-compiled wheels as well: https://github.com/girder/large_image
*In the pangeo.io project, our go-to tools are geopandas for tabular geospatial data, xarray/rioxarray for n-dimensional array data, dask for parallelization, and holoviz for interactive visualization. We use the conda-forge channel pretty much exclusively to build out environments.
*If you work on Windows, good luck getting the Python gdal/geos-based tools installed without Conda.
*Data formats and standards are what make it difficult to get away from GDAL -- it just supports so many different backends! Picking those apart and cutting legacy formats or developing more modular tools to deal with each of those things "natively" in Python would be required to get away from the large dependency on something like GDAL.
*For sustainability and maintainability, it's always good to ask yourself "how easy will it be to replace this dependency when it no longer works?"
*No one should build gdal alone (unless it is winter and you need a source of heat). Join us at https://github.com/conda-forge/gdal-feedstock
  
==9 Mar 2023: "Meeting Data Where it Lives: the power of virtual access patterns"==

[https://github.com/mikejohnson51 Mike Johnson] (Lynker, NOAA-affiliate) will rant and rave about the VRT and VSI (curl and S3) virtual data access patterns and how he's used them to work with LCMAP and 3DEP data in integrated climate and data analysis workflows.

'''<u>Recording</u>''':<br />
<html>
<iframe width="560" height="315" src="https://www.youtube.com/embed/auK_gPR-e7M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</html>
  
'''<u>Minutes:</u>'''

*VRT stands for "ViRTual"
*VSI stands for "Virtual System Interface"
*Framed by FAIR

LCMAP – requires fairly complex URLs to access specific data elements.

3DEP – need to understand the tiling scheme to access data across domains.

Note some large packages (zip files) where only one small file is actually desired.

NWM datasets are in NetCDF files that change name (with time step) daily as they are archived.

Implications for Findability, Availability, and Reuse – note that interoperability is actually pretty good once you have the data.

VRT – an XML "metadata" wrapper around one or more tif files.

Use case 1: download all of the 3DEP tiles and wrap them in a VRT xml file.

*The VRT has an overall aggregated grid "shape"
*Includes references to all the individual files.
*Can access the dataset through the vrt wrapper to work across all the tiles.
*Creates a seamless collection of subdatasets.
*A major improvement to accessibility.

If you have to download the data, is that "reuse" of the data??

VSI – allows virtualization of data from remote resources available via a few protocols (S3/http/compressed).

There is a wide variety of GDAL utilities to access VSI files – zip, tar, 7zip.

Use case 2: Access a tif file remotely without downloading all the data in the file.

*Uses vsi to access a single tif file.
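A minimal sketch of use case 2 in Python with the GDAL bindings (this and the following sketches are editor-added illustrations of the pattern being described; the URLs are placeholders, not datasets from the talk):

<pre>
# Use case 2: read a window from a remote GeoTIFF via /vsicurl/ without
# downloading the whole file. The URL is a hypothetical placeholder.
from osgeo import gdal

ds = gdal.Open("/vsicurl/https://example.com/data/dem_tile.tif")
band = ds.GetRasterBand(1)

# Only the byte ranges needed for this 256x256 window are fetched over HTTP.
window = band.ReadAsArray(xoff=0, yoff=0, win_xsize=256, win_ysize=256)
print(ds.RasterXSize, ds.RasterYSize, window.shape)
</pre>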
  
Use case 3: Use vsi within a vrt to remotely access the contents of remote tif files.

*Note that the vrt file doesn't actually have to be local itself.
*If the tiles that the vrt points to update, the vrt will update by default.
*Can easily access and reuse data without actually copying it around.

Use case 4: OGR using vsi to access a shapefile in a tar.gz file remotely.

*Can create a nested url pattern to access the contents of the tar.gz remotely.
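A sketch of that nested pattern, chaining /vsitar/ over /vsicurl/ (placeholder paths):

<pre>
# Use case 4: open a shapefile inside a remote tar.gz by nesting VSI handlers.
# /vsitar/ wraps /vsicurl/, so nothing is downloaded or unpacked locally.
from osgeo import ogr

path = "/vsitar//vsicurl/https://example.com/archive/shapes.tar.gz/shapes/rivers.shp"
ds = ogr.Open(path)
layer = ds.GetLayer(0)
print(layer.GetName(), layer.GetFeatureCount())
</pre>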
  
Use case 5: NWM short-range forecast of streamflow in a netcdf file.

*Appending "HDF5:" to the front of a vsicurl url allows access to a netcdf file directly.
*The access url pattern is SUPER tricky to get right.
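A sketch of that tricky pattern, following the GDAL HDF5 driver's subdataset syntax; the file URL and variable name are placeholders, and the quoting really is easy to get wrong:

<pre>
# Use case 5: open one variable of a remote NetCDF file by combining the
# GDAL HDF5 driver prefix with /vsicurl/. URL and variable are placeholders.
from osgeo import gdal

url = ('HDF5:"/vsicurl/https://example.com/nwm/'
       'nwm.t00z.short_range.channel_rt.f001.conus.nc"://streamflow')
ds = gdal.Open(url)
print(ds.RasterXSize, ds.RasterYSize)
</pre>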
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/mDrjGxaXQT4" frameborder="0" allowfullscreen></iframe></html>
 
  
==14 December  2017: "Mini-Hack-Session: Developing and extending Jupyter Widgets": Jason Grout, Bloomberg==
+
Use case 5: "flat catalogs"  
  
'''Summary''': [https://github.com/jupyter-widgets/ipywidgets Jupyter widgets] (aka ipywidgets) enables building interactive GUIs for Python code using standard form controls (sliders, dropdowns, textboxes, etc.), as well providing a framework for building complex interactive controls such as interactive [https://github.com/bloomberg/bqplot 2d graphs], [https://github.com/maartenbreddels/ipyvolume 3d graphics], [https://github.com/ellisonbg/ipyleaflet maps], and [http://jupyter.org/widgets more]. Jason will walk through the thought and technical processes involved with developing new widget capability.  
+
*Stores a flat (denormalized) table of data variables with the information required to construct URLs.
 +
*Can search based on rudimentary metadata within the catalog.
 +
* Can access and reuse data from any host in the same workflow.
  
'''Time''': Thursday, Dec 14, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
+
Use case 6: access NWM current and archived data from a variety of cloud data stores.
  
'''Join meeting''':
+
*Leveraging the flat catalog content to fix up urls and data access nuances.
  
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
+
Flat catalog improves findability down at the level of individual data variables.
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
  
'''Speaker(s)''':  
+
Take Aways / discussion:
Jason Grout is scientific software developer at Bloomberg. He has been a member of the Project Jupyter team since it's inception in 2014 and a core developer for the Jupyter widgets project. He has a PhD in mathematics from Brigham Young University.
 
  
'''Recording'''
+
Question about the flat catalog:
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/msJig1fr7Lw" frameborder="0" allowfullscreen></iframe></html>
 
  
 +
"Minimal set of shortcuts" to get at this fast access mechanism.
  
==9 November 2017:  "Jupyter Widgets": Jason Grout, Bloomberg==
+
Is the flat catalog manually curated?
  
'''Summary''': [https://github.com/jupyter-widgets/ipywidgets Jupyter widgets] (aka ipywidgets) enables building interactive GUIs for Python code using standard form controls (sliders, dropdowns, textboxes, etc.), as well providing a framework for building complex interactive controls such as interactive [https://github.com/bloomberg/bqplot 2d graphs], [https://github.com/maartenbreddels/ipyvolume 3d graphics], [https://github.com/ellisonbg/ipyleaflet maps], and [http://jupyter.org/widgets more]. The latest developments in Jupyter widgets will be discussed as well as plans for the future.
+
More or less – all are automated but some custom logic is required to add additional content.
  
'''Time''': Thursday, November 9, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
+
Would be great to systematize creation of this flat catalog more broadly.
  
'''Join meeting''':  
+
Question: Could some “examples” be posted either in this doc or elsewhere (or links to examples), for a beginner to copy/paste some code and see for themselves, begin to think about how we’d use this? Something super basic please.
  
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
+
GDAL documentation is good but doesn't have many examples.
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
  
'''Speaker(s)''':
+
climateR has a workflow that shows how the catalog was built.
Jason Grout is scientific software developer at Bloomberg. He has been a member of the Project Jupyter team since it's inception in 2014 and a core developer for the Jupyter widgets project. He has a PhD in mathematics from Brigham Young University.
 
  
'''Recording'''
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/CVcrTRQkTxo" frameborder="0" allowfullscreen></iframe></html>
 
  
 +
What about authentication issues?
  
==12 October 2017:  "Research Workspace: A web-based tool for data sharing, documentation, analysis, and publication"==
+
*S3 is handled at a session level.
 +
*Earthengine can be handled similarly.
 +
How much word of mouth or human-to-human interaction is required for the catalog.
  
'''Speakers'''
+
* If there is a stable entrypoint (S3 bucket for example) some automation is possible.
Rob Bochenek, Axiom Data Science
+
*If entrypoints change, configuration needs to be changed based on human intervention.
  
'''Summary''': The Research Workspace (RW) is a web-based tool designed to support collaborative science and data management tasks throughout the data lifecycle. The RW provides a secure environment for organizing, sharing, documenting, and analyzing scientific datasets, and for publishing datasets through a DataONE member node. As a shared, cloud-based storage environment, the RW is designed for collaborative organization and management of project content. Multiple levels of access and read/write permissions provide transparent, controlled access and oversight to collaborators, funders, and project managers. A custom metadata editor exports standards-compliant metadata (ISO 19115-2 and 19110) and includes tools for easily adding keywords, taxonomic information, keywords, spatial boundaries, and contact information to metadata records. An integrated Jupyter notebooks environment allows R- and Python-based analysis scripts to be written in and run on the RW and to access data in the RW or any data set resource that is hosted within the Axiom Data Science cyber-infrastructure stack.  These notebooks serve as transparent, reproducible, and easily-shareable computational analysis and processing tools. Finally, datasets in the RW that have undergone sufficient curation and documentation can be exported to the Research Workspace DataONE Member Node for long-term preservation and broader discoverability and accessibility.  More information about the RW and it's capabilities can be found in the help documents -  
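As a taste of the basic functionality the talk demos, a minimal zarr-python sketch (v2-style API; names and sizes are arbitrary):

<pre>
# Create a chunked, compressed 2-D Zarr array on disk (zarr-python v2 API).
import numpy as np
import zarr

z = zarr.open("example.zarr", mode="w", shape=(10000, 10000),
              chunks=(1000, 1000), dtype="f4")
z[0:1000, 0:1000] = np.random.random((1000, 1000))

# Each 1000x1000 chunk is stored as its own compressed object/file, which is
# what makes parallel and cloud (object store) access practical.
print(z.shape, z.chunks)
</pre>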
'''Time''': Thursday, March 8, 2018, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Join meeting''':

*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693

'''Speaker(s)''':

Alistair Miles is the Head of Epidemiological Informatics for the Kwiatkowski group at the University of Oxford. Before joining the University of Oxford, Alistair was a research scientist at the e-Science Centre at the UK Science and Technology Facilities Council, where he was involved in a range of computing research projects, primarily in the areas of Web and semantic technology, and also in the engineering of production software systems. He is the lead developer for Zarr.

'''Links''':

*https://github.com/zarr-developers/zarr
*[https://www.youtube.com/watch?v=8WtaYvqhxHc 5 minute demo of Dask + Zarr + S3]

'''Recording'''

<html><iframe width="560" height="315" src="https://www.youtube.com/embed/np_p4JBAIYI" frameborder="0" allowfullscreen></iframe></html>

==8 February, 2018: "The National Data Service Labs Workbench": Craig Willis, NCSA==

'''Summary''': The National Data Service Labs Workbench is a platform designed to share, discover, evaluate, develop, and test research data management and analysis tools. Community members can recommend or contribute tools as well as drive the direction, and the Workbench is evolving into a platform for data access, education and training.

'''Time''': Thursday, February 8, 2018, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Join meeting''':

*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693

'''Speaker(s)''':

Craig Willis is the Technical Coordinator for the National Data Service and a senior research programmer at the National Center for Supercomputing Applications (NCSA) at the University of Illinois.

'''Links''':

*http://www.nationaldataservice.org/
*https://www.workbench.nationaldataservice.org

'''Recording'''

<html><iframe width="560" height="315" src="https://www.youtube.com/embed/WzZKpbwth_g" frameborder="0" allowfullscreen></iframe></html>

==11 January 2018: "The Pangeo Project": Ryan Abernathey (Lamont) and Matthew Rocklin (Anaconda)==

'''Summary''': Pangeo is a scalable, low-barrier-for-entry science platform, with cloud-optimized storage for large multidimensional datasets, such as simulation (met, ocean, hydrologic, climate) model output. Technologies include JupyterHub, Kubernetes, Xarray, Dask, and Zarr. The Pangeo environment has been deployed on the NCAR Cheyenne supercomputer, on Google Cloud and on AWS.
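A minimal sketch of the Pangeo access pattern with xarray + dask; the store path is a placeholder (reading s3:// also assumes s3fs is installed):

<pre>
# Lazy, chunked analysis of a large multidimensional dataset (Pangeo pattern).
import xarray as xr

ds = xr.open_zarr("s3://example-bucket/model-output.zarr")  # hypothetical store

# No data is loaded yet; dask just builds a task graph over the Zarr chunks.
monthly_mean = ds["sst"].groupby("time.month").mean("time")

# The parallel, chunk-by-chunk computation happens only on .compute().
result = monthly_mean.compute()
print(result)
</pre>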
'''Speaker(s)''':

Ryan Abernathey is a physical oceanographer at Lamont/Columbia, and Matthew Rocklin is an open-source developer at Anaconda.

'''Links''':

*[https://github.com/pangeo-data/pangeo/issues pangeo discussion on github issues]
*[https://www.youtube.com/watch?v=rSOJKbfNBNk 3 minute demo of pangeo on Google Cloud]
*[http://matthewrocklin.com/blog/work/2018/01/22/pangeo-2 Blog post on pangeo]
*[http://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud Blog post on big multidimensional data on the Cloud]

'''Recording'''

Note: This talk was given live at the ESIP winter meeting in North Bethesda, MD, USA.

<html><iframe width="560" height="315" src="https://www.youtube.com/embed/mDrjGxaXQT4" frameborder="0" allowfullscreen></iframe></html>

==14 December 2017: "Mini-Hack-Session: Developing and extending Jupyter Widgets": Jason Grout, Bloomberg==

'''Summary''': [https://github.com/jupyter-widgets/ipywidgets Jupyter widgets] (aka ipywidgets) enables building interactive GUIs for Python code using standard form controls (sliders, dropdowns, textboxes, etc.), as well as providing a framework for building complex interactive controls such as interactive [https://github.com/bloomberg/bqplot 2d graphs], [https://github.com/maartenbreddels/ipyvolume 3d graphics], [https://github.com/ellisonbg/ipyleaflet maps], and [http://jupyter.org/widgets more]. Jason will walk through the thought and technical processes involved with developing new widget capability.
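For readers new to ipywidgets, a minimal sketch of the core pattern (run inside a Jupyter notebook):

<pre>
# Wire a slider to a Python callback with ipywidgets (run inside Jupyter).
import ipywidgets as widgets
from IPython.display import display

slider = widgets.IntSlider(value=5, min=0, max=10, description="n:")
out = widgets.Output()

def on_change(change):
    with out:
        out.clear_output()
        print(f"n squared is {change['new'] ** 2}")

slider.observe(on_change, names="value")  # fire whenever the value changes
display(slider, out)
</pre>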
  
'''Time''': Thursday, Dec 14, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Join meeting''':

*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693

'''Speaker(s)''':

Jason Grout is a scientific software developer at Bloomberg. He has been a member of the Project Jupyter team since its inception in 2014 and a core developer for the Jupyter widgets project. He has a PhD in mathematics from Brigham Young University.

'''Recording'''

<html><iframe width="560" height="315" src="https://www.youtube.com/embed/msJig1fr7Lw" frameborder="0" allowfullscreen></iframe></html>

==9 November 2017: "Jupyter Widgets": Jason Grout, Bloomberg==

'''Summary''': [https://github.com/jupyter-widgets/ipywidgets Jupyter widgets] (aka ipywidgets) enables building interactive GUIs for Python code using standard form controls (sliders, dropdowns, textboxes, etc.), as well as providing a framework for building complex interactive controls such as interactive [https://github.com/bloomberg/bqplot 2d graphs], [https://github.com/maartenbreddels/ipyvolume 3d graphics], [https://github.com/ellisonbg/ipyleaflet maps], and [http://jupyter.org/widgets more]. The latest developments in Jupyter widgets will be discussed as well as plans for the future.

'''Time''': Thursday, November 9, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Join meeting''':

*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693

'''Speaker(s)''':

Jason Grout is a scientific software developer at Bloomberg. He has been a member of the Project Jupyter team since its inception in 2014 and a core developer for the Jupyter widgets project. He has a PhD in mathematics from Brigham Young University.

'''Recording'''

<html><iframe width="560" height="315" src="https://www.youtube.com/embed/CVcrTRQkTxo" frameborder="0" allowfullscreen></iframe></html>

==12 October 2017: "Research Workspace: A web-based tool for data sharing, documentation, analysis, and publication"==

'''Speakers'''

Rob Bochenek, Axiom Data Science

'''Summary''': The Research Workspace (RW) is a web-based tool designed to support collaborative science and data management tasks throughout the data lifecycle. The RW provides a secure environment for organizing, sharing, documenting, and analyzing scientific datasets, and for publishing datasets through a DataONE member node. As a shared, cloud-based storage environment, the RW is designed for collaborative organization and management of project content. Multiple levels of access and read/write permissions provide transparent, controlled access and oversight to collaborators, funders, and project managers. A custom metadata editor exports standards-compliant metadata (ISO 19115-2 and 19110) and includes tools for easily adding keywords, taxonomic information, spatial boundaries, and contact information to metadata records. An integrated Jupyter notebooks environment allows R- and Python-based analysis scripts to be written in and run on the RW and to access data in the RW or any dataset resource that is hosted within the Axiom Data Science cyberinfrastructure stack. These notebooks serve as transparent, reproducible, and easily shareable computational analysis and processing tools. Finally, datasets in the RW that have undergone sufficient curation and documentation can be exported to the Research Workspace DataONE Member Node for long-term preservation and broader discoverability and accessibility. More information about the RW and its capabilities can be found in the help documents.

'''Time''': Thursday, October 12, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)

'''Join meeting''':

*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693

'''Speaker(s)''':

Rob Bochenek is an information architect at Axiom Data Science. Rob has been developing data management and cyber infrastructure solutions for research programs and organizations for the past fifteen years. He is a graduate of the University of Michigan with degrees in aerospace engineering and mathematics. Early in his career Rob spent five years at the Exxon Valdez Oil Spill Trustee Council leading the data management team in processing, documenting and organizing the informational products produced from the scientific research funded to understand and monitor the ecological effects of the oil spill. Based upon that experience, Rob founded Axiom in 2006 to develop more generalized and holistic solutions for data management. He specializes in scientific geospatial information management with applications to physical/biological modeling and decision support data warehouse knowledge systems.

'''Links''':

*http://www.axiomdatascience.com/about/
*https://researchworkspace.com/help/

'''Recording'''

<html><iframe width="560" height="315" src="https://www.youtube.com/embed/YOQgAMo0pYc" frameborder="0" allowfullscreen></iframe></html>
 
 
 
==14 September 2017:  "JupyterHub and JupyterLab Developments": Brian Granger, Cal Poly==
 
 
 
'''Summary''': The latest developments in JupyterHub and JupyterLab will be discussed as well as the roadmap for the future.
 
 
 
'''Time''': Thursday, September 14, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Speaker(s)''':
 
Brian Granger is an Associate Professor of Physics at Cal Poly State University in San Luis Obispo, CA. He has a background in theoretical atomic, molecular and optical physics, with a PhD from the University of Colorado. His current research interests include quantum computing, parallel and distributed computing and interactive computing environments for scientific and technical computing. He is a core developer of the Jupyter project and is an active contributor to a number of other open source projects focused on scientific computing in Python.
 
 
 
 
 
'''Links''':
 
 
 
*https://jupyterhub.readthedocs.io/en/latest/
 
*https://github.com/jupyterlab/jupyterlab
 
*https://jupyter.org
 
 
 
'''Recording'''
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/K1AsGeak51A" frameborder="0" allowfullscreen></iframe></html>
 
 
 
==31 August 2017:  "ERDDAP 5 min Lightning Talks"==
 
 
 
'''Speakers'''
 
Jenn Sevadjian, Jim Potemra, Conor Delaney, Kevin O'Brien, John Kerfoot, Stephanie Petillo, Charles Carleton, Eli Hunter
 
 
 
'''Summary''': A series of 5 minute lightning talks on how people are using ERDDAP to solve environmental data problems.
 
 
 
'''Time''': Thursday, August 31, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
 
 
 
 
 
'''Recording'''
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/2-ydBByYB0M" frameborder="0" allowfullscreen></iframe></html>
 
 
 
 
 
==10 August 2017:  "ERDDAP: Easier access to scientific data": Bob Simons, NOAA==
 
 
 
'''Summary''': ERDDAP is a free, open source data server that gives you a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps.
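That "simple, consistent way" is a RESTful URL: the dataset id, output format, and subsetting constraints are all expressed in the request itself. A small sketch (the dataset id, variable and bounds are illustrative choices, not ones from the talk):

<pre>
# Request a CSV subset of a gridded dataset from ERDDAP's griddap service.
# Dataset id, variable and bounds are illustrative.
import requests

base = "https://coastwatch.pfeg.noaa.gov/erddap"
dataset = "jplMURSST41"
query = "analysed_sst[(2020-01-01T09:00:00Z)][(33.0):(34.0)][(-121.0):(-120.0)]"

resp = requests.get(f"{base}/griddap/{dataset}.csv?{query}")
print(resp.text.splitlines()[:4])  # header rows plus the first data values
</pre>

Swapping ".csv" for ".nc", ".json", or ".png" returns the same subset in a different format, which is what makes the pattern consistent across clients.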
 
 
 
'''Time''': Thursday, August 10, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
 
 
 
'''Speaker(s)''':
 
Bob Simons is an IT Specialist with NOAA's Environmental Research Division.
 
 
 
'''Links''':
 
 
 
*https://coastwatch.pfeg.noaa.gov/erddap/index.html
 
 
 
'''Recording'''
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/H541G1XXZrU" frameborder="0" allowfullscreen></iframe></html>
 
 
 
==13 July 2017:  "GeoServer Developments": Jody Garnett and Kevin Smith, Boundless==
 
 
 
'''Summary''': The latest developments in GeoServer will be discussed as well as plans for the future.
 
 
 
'''Time''': Thursday, July 13, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Speaker(s)''':
 
Jody Garnett is the Community Lead and Kevin Smith is the GeoWebCache Lead at Boundless.
 
 
 
'''Links''':
 
 
 
*http://geoserver.org/
 
*https://boundlessgeo.com/geoserver/
 
 
 
'''Recording'''
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/q4g0RXnadAU" frameborder="0" allowfullscreen></iframe></html>
 
 
 
==6 June 2017:  "Installing JupyterHub in the Cloud using Kubernetes Helm": Yuvi Panda==
 
 
 
'''Summary''': Yuvi Panda will show how to deploy JupyterHub in the Cloud using Kubernetes Helm.
 
 
 
'''Time''': Tuesday, June 6, 2017, (Time: 3PM Eastern, 2PM Central, 1PM Mountain, 12PM Pacific)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Speaker(s)''':
 
Yuvi Panda is a developer with 15 years of experience and 400+ followers on GitHub. He worked formerly with Wikimedia, and is currently working with the Data Science Education Program at UC Berkeley to make it easier for people who don't consider themselves programmers to write code. He has been very involved with creating the Helm Chart for JupyterHub.
 
 
 
'''Recording'''
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/aUwMlSIjtdg" frameborder="0" allowfullscreen></iframe></html>
 
 
 
 
 
 
 
'''Links''':
 
 
 
*https://github.com/yuvipanda
 
*https://jupyterhub.readthedocs.io/en/latest/
 
*https://gitter.im/jupyterhub/jupyterhub
 
*https://daemonza.github.io/2017/02/20/using-helm-to-deploy-to-kubernetes/
 
*https://github.com/kubernetes/helm
 
 
 
==11 May 2017:  "TerriaJS: A Free, Open-Source Library for Building Web-based Geospatial Data Explorers": Kevin Ring, CSIRO/Data61, Australia==
 
 
 
'''Summary''': The library behind the Australian National Map. 3D and 2D geospatial visualization based on Cesium and Leaflet. Visualise WMS, WMTS, WFS, KML, GeoJSON, CSV, CZML, GPX, and many more spatial formats out of the box, or easily add your own. Present a dynamic catalog from your existing WMS, ArcGIS, CKAN, CSW, Socrata, WMTS or WFS server, curate your catalog by hand, or use any combination thereof. Explore time-varying WMS layers, watch vehicles move smoothly across the map, and observe your CSV data change over time.
 
 
 
'''Time''': Thursday, May 11, 2017, (5:00pm ET | 4:00pm CT | 3:00pm MT | 2:00pm PT | 07:00am Sydney Time)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Speaker(s)''':
 
 
 
Kevin Ring is a Principal Software Engineer at CSIRO's Data61, and is the lead developer for TerriaJS. Previously, he helped found the Cesium project while working at Analytical Graphics, Inc. (AGI) and developed its streaming terrain and imagery engine.
 
 
 
'''Links''':
 
 
 
*http://terria.io/
 
*https://github.com/TerriaJS/terriajs
 
*http://nationalmap.gov.au/
 
 
 
'''Recording'''
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/videoseries?list=PL8X9E6I5_i8gmLI7r6huyQNr0oC1mCadA" frameborder="0" allowfullscreen></iframe></html>
 
 
 
==13 April 2017:  "Processing Planetary-Scale Data in the Cloud": Drew Bollinger, Development Seed==
 
 
 
'''Summary''': Modern cloud-based infrastructure has had a huge effect on our ability to process, manipulate, and publish satellite imagery at scale. We'll discuss current methods of making imagery available across different platforms and how this is supported by the efforts of groups like AWS to publish open satellite data including MODIS, Landsat and more.
 
 
 
'''Time''': Thursday, April 13, 2017, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Presenter''':
 
 
 
Drew Bollinger is a data analyst and software developer, with experience running advanced statistical and spatial analysis on large and small data sets, as well as building visualizations for data storytelling.
 
 
 
'''Links''':
 
 
 
*https://github.com/sat-utils/sat-api
 
*https://github.com/sat-utils
 
*https://github.com/developmentseed/landsat-util
 
*https://libra.developmentseed.org/
 
 
 
'''Slides'''
 
http://drewbo.com/talks/esip-2017/#0
 
 
 
'''Recording'''
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/PO2z37XX1Gg" frameborder="0" allowfullscreen></iframe></html>
 
 
 
==9 March 2017:  "Introduction to Esri Story Maps": Christine White, Esri==
 
 
 
'''Summary''': Today, multi-media communication plays a pivotal role in how an audience experiences, understands, and shares your message. Story Maps bring a narrative to life by weaving maps,  text, images, video, and other content into a creative and memorable story. Christine will share several examples of effective Story Maps and then walk through how you can create and configure your own.
 
 
 
'''Time''': Thursday, March 9, 2017, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Speaker(s)''':
 
 
 
Christine is a Technical Advisor and science team member at Esri. She loves using art and technology to communicate about the challenges and opportunities for our future. Christine also serves as the Vice President of ESIP. One of her favorite things about ESIP is how its members offer their unique perspectives (stories) and shared knowledge to collaborate.
 
 
 
'''Recording'''
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/wqQW2xVw0hA" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
Christine gave her presentation as a live StoryMap, available here:
 
https://www.arcgis.com/apps/MapSeries/index.html?appid=5a99a82a19c84dbab641a22ddd3d329b
 
 
 
==9 February 2017:  "Web AppBuilder for ArcGIS": Derek Law, ESRI==
 
 
 
'''Summary''': Web AppBuilder for ArcGIS is a pure HTML5/JavaScript-based application that allows you to create your own intuitive, fast, and beautiful web apps without writing a single line of code. The app uses new ArcGIS platform features and modern browser technology to provide both flexible and powerful capabilities such as 3D visualization of data. In addition, developers have an opportunity to create custom tools and themes through the extensibility framework.
 
 
 
'''Time''': Thursday, February 9, 2017, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Speaker(s)''':
 
 
 
Derek Law is a Product Manager at ESRI. He has over 15 years of experience with geospatial software and web application development.
 
 
 
'''Recording'''
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/7uKbfMSX6Sw" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed/esri-webapp-builder-derek-law-esri
 
 
 
==19 January 2017:  "Introduction to Google Earth Engine": Jess Walker, USGS==
 
 
 
'''Summary''': Google Earth Engine is a cloud-based geospatial processing platform that unites multiple petabytes of publicly accessible imagery and a massive computational infrastructure with a web-based integrated development environment (IDE).  Users can harness the unprecedented combination of data and computing resources to conduct complex geospatial analyses on planetary scales.
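A minimal sketch of that combination from the Python client (assumes the earthengine-api package and an authenticated Earth Engine account; the collection and point are illustrative):

<pre>
# Count Landsat 8 scenes over a point for one year with the Earth Engine API.
import ee

ee.Initialize()  # assumes prior authentication (ee.Authenticate())

point = ee.Geometry.Point([-110.9, 32.2])
collection = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
              .filterBounds(point)
              .filterDate("2020-01-01", "2020-12-31"))

# The filtering runs server-side; only the final count comes back.
print(collection.size().getInfo())
</pre>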
 
 
 
'''Time''': Thursday, January 19, 2017, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Speaker(s)''':
 
 
 
Jessica Walker is a postdoctoral researcher with the USGS Western Geographic Science Center in Tucson, AZ.  Her research investigates the recovery of post-wildfire landscapes in Alaska and across the southwestern US using time series of remote sensing imagery.
 
 
 
 
 
'''Recording'''
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/m47eHiOL0ZI" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed/introduction-to-google-earth-engine-jessica-walker-usgs
 
 
 
==8 December 2016:  "Vector Tile Maps": Sam Matthews, Mapbox==
 
 
 
'''Summary''': Vector tiles make huge maps fast while offering full design flexibility. They are the vector data equivalent of image tiles for web mapping, applying the strengths of tiling – developed for caching, scaling and serving map imagery rapidly – to vector data.  A general overview of vector tiles will be presented.
 
 
 
'''Speaker(s)''':
 
 
 
Sam Matthews is a Mapbox engineer  focused on improving the speed and reliability of maps. He works with the Mapnik team to generate vector tiles and maintains the upload pipeline behind Mapbox Studio. He is passionate about making open source tools as welcoming as possible through clear docs and zero assumptions.
 
 
 
'''Time''': Thursday, December 8, 2016, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Recording'''
 
https://www.youtube.com/watch?v=wN2-ms2PwBs
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/wN2-ms2PwBs" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed/vector-tile-maps-sam-matthews-mapbox
 
 
 
==10 November 2016:  "Introducing 3D Tiles": Todd Smith, AGI==
 
 
 
'''Summary''': 3D Tiles are an open specification for streaming massive heterogeneous 3D geospatial datasets. To expand on Cesium’s terrain and imagery streaming, 3D Tiles will be used to stream 3D content, including buildings, trees, point clouds, and vector data.
 
 
 
'''Speaker(s)''':
 
 
 
Todd Smith is the Cesium Product Manager, and helps define and manage the Cesium product line. Todd has been with the AGI team from the beginning and has been in the web mapping world for over 15 years.  He is a Penn State GIS graduate.
 
 
 
 
 
'''Time''': Thursday, November 10, 2016, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Recording'''
 
https://www.youtube.com/watch?v=0upb4E12CPE
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/0upb4E12CPE" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed
 
 
 
==13 October 2016:  "EarthCube Integration and Test Environment (ECITE)": Phil Yang, GMU==
 
 
 
'''Summary''': An outgrowth of activities of the EarthCube Technology Architecture Committee (TAC)'s Testbed Working Group (TWG), ECITE provides an integration test-bed for technology and science projects for both EarthCube funded projects and community technology demonstrations.  ECITE consists of a seamless federated system of scalable and location independent distributed computational resources (nodes) across the US. The hybrid federated system provides a robust set of distributed resources utilizing both public and private cloud capabilities.
 
 
 
'''Speaker(s)''': Chaowei Phil Yang is a Professor at George Mason University where he founded the NSF Spatiotemporal Innovation Center with colleagues from Harvard and UC-Santa Barbara. He has advised over 30 graduate students and has placed over 20 geoinformatics professors around the world. His research interests are in utilizing spatiotemporal principles to optimize computing infrastructure for geospatial science applications of national and international significance. (http://cpgis.gmu.edu/homepage/)
 
 
 
'''Time''': Thursday, October 13, 2016, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Recording'''
 
https://www.youtube.com/watch?v=kYi-22hXY6k
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/kYi-22hXY6k" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed
 
 
 
 
 
==8 September 2016:  "Apache Open Climate Workbench": Lewis McGibbney and Kyo Lee, NASA JPL/Apache OCW==
 
 
 
'''Summary''': Apache [http://climate.apache.org Open Climate Workbench] (OCW) is an effort to develop software that performs climate model evaluation using model outputs from a variety of different sources (the [http://esgf.llnl.gov/ Earth System Grid Federation], the [http://www.cordex.org/ Coordinated Regional Climate Downscaling Experiment], the [http://nca2014.globalchange.gov/ U.S. National Climate Assessment] and the [http://www.narccap.ucar.edu/ North American Regional Climate Change Assessment Program]) and across a range of temporal/spatial scales, together with remote sensing data from [http://www.nasa.gov NASA], [http://www.noaa.gov NOAA] and other agencies. The toolkit includes capabilities for rebinning, metrics computation and visualization.
 
 
 
'''Speaker(s)''': Lewis McGibbney, NASA JPL/Apache OCW; currently a Data Scientist at the NASA Jet Propulsion Laboratory in Pasadena, California, Lewis works in the Computer Science and Data Intensive Applications Group (398M). He enjoys floating up and down the tide of technologies at the Apache Software Foundation having a real enthusiasm for Web Search and Information Retrieval in particular. You'll find him on community mailing lists including Nutch, Gora, Any23, OODT, Open Climate Workbench, Tika, Usergrid and a number of incubating mailing lists including CommonsRDF, HTrace and Joshua. Lewis is currently a Project Management Committee member and Committer on OCW.
 
 
 
'''Speaker(s)''': Huikyo Lee, NASA JPL/Apache OCW; currently a Climate Data Scientist at the NASA Jet Propulsion Laboratory in Pasadena, California, Huikyo has led development of the Regional Climate Model Evaluation System (http://rcmes.jpl.nasa.gov), an open-source software toolkit based on Open Climate Workbench that facilitates systematic evaluation of climate models using observational datasets from a variety of sources.
 
 
 
'''Time''': Thursday, September 8, 2016, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Recording'''
 
https://www.youtube.com/watch?v=YA8SZiG9JZk
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/YA8SZiG9JZk" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed/apache-ocw
 
 
 
==11 August 2016:  "Community Data Analysis Tools (CDAT)": Charles Doutriaux, LLNL==
 
 
 
'''Summary''': CDAT is a rich set of visual-data exploration and analysis capabilities well-suited for earth science data analysis problems. It integrates many tools and technologies to offer scientists a start-to-finish environment for their work, from reading in various data formats to publication-quality output of their analysis.
 
 
 
'''Speaker''': Charles Doutriaux is a senior research computer scientist at Lawrence Livermore National Laboratory, where he is known for his work in climate analytics, informatics, and management systems supporting model intercomparison projects. He works closely with many international climate scientists and shares in the recognition of the Intergovernmental Panel on Climate Change 2007 Nobel Peace Prize. He has co-authored over 30 peer-reviewed articles and has presented his work at many scientific conferences. Aside from everything Python-related, his research interests include climate attribution and detection, visualization, and data analysis. Doutriaux has a master's degree in "Climate and Physico-Chemistry of the Atmosphere" from the University Joseph Fourier in Grenoble. He's a member of the AGU and AMS. You can contact him at doutriaux1@llnl.gov.
 
 
 
'''Time''': Thursday, August 11, 2016, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Recording'''
 
https://www.youtube.com/watch?v=nh2dqAHt5jY
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/nh2dqAHt5jY" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
 
 
==13 July 2016:  "The NOAA OneStop Data Discovery and Access Framework Project": Ken Casey, NOAA/NCEI==
 
 
 
'''Summary''': The OneStop Project is designed to improve NOAA's data discovery and access framework.  Focusing on all layers of the framework and not just the user interface, OneStop is addressing data format and metadata best practices, ensuring more data are available through modern web services, working to improve the relevance of dataset searches, and improving both collection-level metadata management and granule level metadata systems to accommodate the wide variety and vast scale of NOAA's data. 
 
 
 
'''Speaker''': Ken Casey is the Deputy Director of the Data Stewardship Division in the NOAA National Centers for Environmental Information (NCEI). He leads the OneStop project and is active within NOAA's Big Earth Data Initiative and Big Data Project. Ken serves on a variety of national and international science and data management panels, including the US Group on Earth Observations Data Management Working Group and the Group for High Resolution Sea Surface Temperature (GHRSST) Science Team. He co-chairs the Committee on Earth Observing Satellites SST Virtual Constellation and represents NCEI in the Federation of Earth Science Information Partners (ESIP). He holds a PhD in Physical Oceanography from the University of Rhode Island.
 
 
 
'''Time''': Wednesday, July 13, 2016, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Recording'''
 
https://youtu.be/wp7trIRFDOs
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/wp7trIRFDOs" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed/noaa-one-stop-ken-casey-ncei
 
 
 
==9 June 2016:  "Dive into Docker":  Kyle Wilcox, Dave Foster and Shane StClair: Axiom Data Science==
 
 
 
'''Summary''': Docker is an open platform for distributed applications that has taken the world by storm, making it easy to deploy services with complicated dependencies.  In this presentation you will learn what Docker is, why it will make your life easier, how to build a container, and how to install containers.
 
 
 
'''Speaker''': Kyle Wilcox, Dave Foster and Shane StClair are developers at Axiom Data Science.  Axiom Data Science works with organizations to improve the long term management, reuse and impact of their scientific data resources.  They have built Docker containers for many of the key services used by the U.S. Integrated Ocean Observing System (US-IOOS).
 
 
 
'''Time''': June 9, 2016, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Links''':
 
 
 
*http://www.docker.com/
 
 
 
'''Recording'''
 
https://youtu.be/mDR_x0E5az0
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/mDR_x0E5az0" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed/dive-into-docker-kyle-wilcox-shane-stclair-dave-foster-axiom-data-science
 
 
 
==12 May 2016:  "Leaflet Time Dimension":  Biel Frontera, SOCIB==
 
 
 
'''Summary''': Leaflet.TimeDimension is a free, open-source Leaflet.js plugin that enables visualization of spatial data with a temporal dimension. It can manage different types of layers (WMS, GeoJSON, Overlay) and can be easily extended. It meets some common needs, enabling web maps that use observational and forecasting layers generated by a THREDDS server (via ncWMS), animate trajectories of drifters and gliders, follow a simulated oil spill, and support other time-dependent mapping applications.
 
 
 
'''Speaker''': Biel Frontera was trained as a mathematician, and has spent most of his career developing software. He is a free software enthusiast and has worked for the last 3 years on data visualization and geospatial software issues for SOCIB, the Balearic Islands Coastal Observing and Forecasting System.
 
 
 
'''Time''': May 12, 2016, (3:00pm ET | 2:00pm CT | 1:00pm MT | 12:00pm PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/533510693
 
*regular phone: United States: +1 (408) 650-3123, Access Code: 533-510-693
 
 
 
'''Links''':
 
 
 
*https://github.com/socib/Leaflet.TimeDimension
 
*http://apps.socib.es/Leaflet.TimeDimension/examples/
 
*http://www.socib.eu/
 
 
 
'''Recording'''
 
https://www.youtube.com/watch?v=US5FUUPqlww
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/US5FUUPqlww" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed/leatlet-time-dimension-biel-frontera-socib
 
 
 
==21 Apr 2016:  "The New Geoplatform.gov":  Tod Dabolt, DOI==
 
 
 
'''Summary''': Geoplatform.gov was recently rebuilt from the ground up. Tod will talk about new features of the platform and plans for the future.
 
 
 
'''Speaker''': Tod Dabolt is the acting Geographic Information Officer for the Department of the Interior, and the technical lead on Geoplatform.gov.
 
 
 
'''Time''': April 21, 2016, (2:00pm ET | 1:00pm CT | 12:00pm MT | 11:00am PT)
 
 
 
'''Join meeting''':
 
 
 
*computer, tablet or smartphone: https://www.gotomeeting.com/join/271218861
 
*regular phone: United States: +1 (872) 240-3212, Access Code: 271-218-861
 
 
 
'''Links''':
 
 
 
*http://www.geoplatform.gov
 
 
 
'''Recording'''
 
https://www.youtube.com/watch?v=f-ABUpy4Qvk
 
 
 
<html><iframe width="560" height="315" src="https://www.youtube.com/embed/f-ABUpy4Qvk" frameborder="0" allowfullscreen></iframe></html>
 
 
 
'''Slides'''
 
https://speakerdeck.com/esipfed/the-new-geoplatform-tod-dabolt-doi
 
 
 
==13 Oct 2015: Raj Pandya on AGU's Thriving Earth Exchange and Sharing Solutions==
 
 
 
The Thriving Earth Exchange is a network and platform that connects community leaders, sponsors, and scientists and helps them combine science and local knowledge to solve on-the-ground challenges related to natural hazards, natural resources, and climate change. I'll talk about the general principles on which we are building TEX and describe the basic modules that are part of TEX. Drawing on the lessons learned from our pilots, I'll talk about how we are developing modules and launching new projects with several partners. I'll describe a range of projects, from a community monitoring effort in Denver to a Pamiri Mountain project to integrate climate projections into traditional calendars. I'll introduce our nascent "share" module, and describe our partnership with Amazon Web Services to move prototype community-based solutions to the cloud to enhance their adaptability. And, just to live up to the name, I'll frame it all around a small rant about the loading-dock model of science and a rave about more participatory approaches.
 
 
 
'''Slides'''
 
 
 
[[Media:2015-10-13 ESIP RantRave RajPandya.pdf| PDF]]
 
 
 
==13 Aug 2015: Rich Signell on Catalog-driven Workflows for Science==
 
 
 
"Catalog-driven, reproducible workflows for ocean science: Comparing
 
sea level forecasts along the US Coastline"
 
 
 
Rich Signell
 
 
 
Filipe Fernandes
 
 
 
The U.S. Integrated Ocean Observing System (US-IOOS) requires that data providers use standard web services (OPeNDAP+CF, OGC WMS, OGC SOS) for distributing model products and in-situ observations. The services are captured in ISO metadata records and searchable via standard catalog services (OGC CSW).

This presentation will demonstrate how to use this system in a reproducible Jupyter Notebook, discovering, accessing and using model and observed water levels along the US coastline, using a free Python environment that can be installed on Mac, Windows and Linux in less than 10 minutes.
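A minimal sketch of the catalog-driven discovery step with OWSLib (the catalog URL and search term are illustrative, not the notebook's actual endpoints):

<pre>
# Discover water-level datasets from an OGC CSW catalog with OWSLib.
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike

csw = CatalogueServiceWeb("https://example.org/csw")  # hypothetical endpoint
query = PropertyIsLike("csw:AnyText", "%water level%")
csw.getrecords2(constraints=[query], maxrecords=10)

for rec in csw.records.values():
    print(rec.title)
    # Each record carries the service endpoints (OPeNDAP, WMS, SOS, ...).
    print([ref["url"] for ref in rec.references])
</pre>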
 
 
 
'''Slides'''
 
 
 
[https://speakerdeck.com/rsignell/catalog-driven-reproducible-workflows-for-ocean-science Speaker Deck] | [[Media:2015-08-13 ESIP RantRave.pdf| PDF]]
 
 
 
==11 June 2015: [http://www.nationaldataservice.org/projects/labs.html NDS Labs], Matt Turk==
 
 
 
Matt is a member of the NDS Labs technical advisory committee and will present NDS Labs as a platform for exploring data services -- enabling the separation of data and its representation, and how NDS Labs is functioning as an emerging platform for such separation.
 
  
'''Slides'''

[[Media:2015-06-11 ESIP RantRave NDSLabs.pdf| PDF]]
 +
__FORCETOC__


June 13th: "Evaluation and recommendation of practices for publication of reproducible data and software releases in the USGS"

Alicia Rhoades, Dave Blodgett, Ellen Brown, Jesse Ross.

USGS Fundamental Science Practices recognize data and software as separate information product types. In practice, (e.g., in model application) data are rarely complete without workflow code and workflows are often treated as software that include data. This project assembled a cross mission area team to build an understanding of current practices and develop a recommended path. The project conducted 27 interviews with USGS employees with a wide range of staff roles from across the bureau. The project also analyzed existing data and software releases to establish an evidence base of current practices for implemented information products. The project team recommends that a workshop be held at the next Community for Data Integration face to face or other venue. The workshop should consider the sum total of the findings of this project and plan specific actions that the Community can take or recommendations that the Community can advocate to the Fundamental Science Practices Advisory Council or others.

IT&I June 2024

Recording:

May 9th: "Achieving FAIR water quality data exchange thanks to international OGC water standards"

(Sylvain Grellet - BRGM)

Leveraging on international standards (OGC, ISO), the OGC, WMO Water Quality Interoperabily Experiment aims at bridging the gap regarding Water Quality data exchange (surface, ground water). This presentation will also give a feedback on the methodology applied on this journey. How to build on existing international standards (OGC/ISO 19156 Observations, measurements and samples ; OGC SensorThings API) while answering domain needs and maximize community effect.

IT&I FAIR water quality data

Recording:

Slides

File:FAIR water quality data OGC Grellet-compressed.pdf

'''<u>Minutes:</u>'''

*Emphasis on international water data standards.
*Introduced OGC – international standards with contributions from public, private, and academic stakeholders.
*The Hydrology Domain Working Group has been around since circa 2007.
**This presentation is about its latest activity, the Water Quality Interoperability Experiment.
*Relying on a baseline of conceptual and implementation modeling from the Hydro Domain Working Group and more general community works like Observations, Measurements and Samples.
*Considering both in-situ (sample observations) and ex-situ (laboratory) observations.
*Core data models support everything the IE has needed with some key extensions; the models are designed to support extensions.
*In terms of FAIR access, SensorThings is very capable for observational data and OGC API - Features supports geospatial needs well.
*Introduced a separation between "sensor" and "procedure" – the sensor is the thing you use, the procedure is the thing you do.
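A minimal sketch of what FAIR access through SensorThings looks like from a client; the service URL is a hypothetical placeholder, not one of the Interoperability Experiment's endpoints:

<pre>
# Read Things and Observations from an OGC SensorThings API service.
import requests

base = "https://example.org/FROST-Server/v1.1"  # hypothetical endpoint

# Entities are plain JSON collections with OData-style query parameters.
things = requests.get(f"{base}/Things", params={"$top": 5}).json()["value"]
for thing in things:
    print(thing["@iot.id"], thing["name"])

# Follow one Datastream to its most recent observations.
obs = requests.get(
    f"{base}/Datastreams(1)/Observations",
    params={"$orderby": "phenomenonTime desc", "$top": 3},
).json()["value"]
print([o["result"] for o in obs])
</pre>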

April 11th: "A Home for Earth Science Data Professionals - ESIP Communities of Practice"

(Allison Mills)

Earth Science Information Partners (ESIP) is a nonprofit funded by cooperative agreements with NASA, NOAA, and USGS. To empower the use and stewardship of Earth science data, we support twice-annual meetings, virtual collaborations, microfunding grants, graduate fellowships, and partnerships with 170+ organizations. Our model is built on an ever-evolving quilt of collaborative tools: Guest speaker Allison Mills will share insights on the behind-the-scenes IT structures that support our communities of practice.

IT&I ESIP Communities of Practice

Recording:

Minutes:

  • Going to to talk about the IT infrastructure behind the ESIP cyber presence.
  • Shared ESIP Vision and Mission – BIG goals!!
  • Played a video about what ESIP is as a community.
  • But how do we actually "build a community"?
  • Virtual collaborations need digital tools.
  • https://esipfed.org/collaborate
    • Needs a front door and a welcome mat!
    • "It doesn't matter how nice your doormat is if your porch is rotten."
    • Tools: Homepage, Slack, Update email, and people Directory.
    • "We take easy collaboration for granted."
  • https://esipfed.org/lab
    • Microfunding – build in time for learning objectives.
    • RFP system, github, figshare, people directory.
    • "Learning objectives are a key component of an ESIP lab project."
  • https://esipfed.org/meetings
    • Web site, agendas, eventbrite, QigoChat + Zoom, Google Docs.

Problem: our emails bounce! Needed to get into the weeds of DNS and DMARC (Domain-based Message Authentication, Reporting, and Conformance) policies.

Problem: Twitter is now X. Decided to focus on platforms where engagement is higher.

Problem: old MediaWiki pages are way, way outdated. Focus on creating new web pages that replace, update, and maintain community content.

Problem: "I can't use platform XYZ." Try to go the extra mile to adapt so that these issues are overcome.

March 15th: "Creating operational decision ready data with remote sensing and machine learning."

(Brian Goldin)

IT&I Operational Remote Sensing 2024

As organizations grapple with information overload, timely and reliable insights remain elusive, particularly in disaster scenarios. Voyager's participation in the OGC Disaster Pilot 2023 aimed to address these challenges by streamlining data integration and discovery processes. Leveraging innovative data conditioning and enrichment techniques, alongside machine learning models, Voyager transformed raw data into actionable intelligence. Through operational pipelines, we linked diverse datasets with machine learning models, automating the generation of new observations to provide decision-makers with timely insights during critical moments. This presentation will explore Voyager's role in enhancing disaster response capabilities, showcasing how innovative integration of technology along with open standards can improve decision-making processes on a global scale.

'''<u>Recording</u>''':

'''<u>Minutes:</u>'''

Providing insights from the OGC Disaster Pilot 2023.

The goal of this work is to provide timely and reliable insights based on huge volumes of data: "overcome information overload in critical moments".

Example: the 2022 Callao oil spill in Peru, where a tsunami hit an oil tanker transferring oil to land. There was possibly useful data from many remote sensing products, but it was hard to combine them all in the moment of responding to an oil spill (the slide shows dozens of data sources).

Goal: build a centralized and actionable inventory of data resources.

#Connect and read data,
#build pipelines to enrich data sources,
#populate a registry of data sources,
#construct a processing framework that can operate over the registry,
#build a user experience framework that can execute the framework.

The focus is on an adaptable processing framework for model execution.

At this scale and for this purpose, it's critical to have a receipt of what was completed, with basic results in a registry that is searchable. This allows model results to trigger notifications, or to be searched based on a record of model runs that have been run previously.

For the pilot, the focus was on wildfire, drought, oil spill, and climate. "What indicators do decision makers need to make the best decisions?" What remote sensing processing models can be run in operations to provide these indicators?

Fire damage assessment: building footprints were detected using a remote sensing building detection model; a fire detection model can then run in real time, cross-referenced with the building footprints.

Needs identified:

*Stronger / more consistent "model metadata".
*Data governance / fitness-for-use metadata.
*Better standards that provide linkages between systems.
*Better public-private partnerships.
*Better data licensing and sharing frameworks.

"This is not rocket science, it's really just building a good metadata registry."

February 15th: "Creating Great Data Products in the Cloud"

(Jed Sundwall)

IT&I Cloud Data Products 2024

Competition within the public cloud sector has reliably led to reduction in object storage costs, continual improvement in performance, and a commodification of services that have made cloud-based object storage a viable solution to share almost any volume of data. Assuming that this is true, what are the best ways to create data products in a cloud environment? This presentation will include an overview of lessons learned from Radiant Earth as they’ve advocated for adoption of cloud-native geospatial data formats and best practices.

Recording:

Minutes:

Jed is executive director of Radiant Earth – Focus is on human cooperation on a global scale.

Two major initiatives – Cloud Native Geospatial foundation and Source Cooperative

Cloud Native Geospatial is about adoption of efficient approaches; Source is about providing easy and accessible infrastructure.

What does "Cloud Native" mean? (https://guide.cloudnativegeo.org/) Partial reads, parallel reads, easy access to metadata.

Leveraging market pressure to make object stores cheaper and more scalable.

"Pace Layering" – https://jods.mitpress.mit.edu/pub/issue3-brand/release/2

Observation: Software is getting cheaper and cheaper to build – it gets harder to create software monopolies in the way Microsoft or ESRI have.

This leads to a lot of diversity and a proliferation of "primitive" standards and de facto interoperability arrangements.

Source Cooperative

Borrowed a lot from github architecturally.

Repository with a README

Browse repository contents in the browser.

Within this, what makes a great data product?

"Our data model is the Web"

People will deal with messy data if it's super valuable.

Case in point, IRS 990 data on non-profits was shared in a TON of XML schemas. People came together to organize it and work with it.

Story about a building footprint dataset released in the morning – it had been matched up into at least four products by the end of the day.

Shout out to: https://www.geoffmulgan.com/ and https://jscaseddon.co/

https://jscaseddon.co/2024/02/science-for-steering-vs-for-decision-making/

"We don't have institutions that are tasked with producing great data products and making them available to the world!"

https://radiant.earth/blog/2023/05/we-dont-talk-about-open-data/

(Slide image: Meme hackathons.png)

"There's a server somewhere where there's some stuff" – This is very different from a local hard drive where everything is indexed.

A cloud native approach puts the index (metadata) up front in a way that you can figure out what you need.

A file's metadata gives you the information you need to ask for just the part of a file that you actually need.

But there are other files where you don't need to do range requests. Instead, the file is broken up into many many objects that are indexed.

In both cases, the metadata is a map to the content. Figuring out the right size of the content's bits is kind of an art form.
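
A minimal sketch of the "metadata up front" idea using a plain HTTP range request (the URL is hypothetical); cloud-optimized formats put their index in a predictable spot so a client can fetch it first and then ask for only the bits it needs:

  import requests

  url = "https://example-bucket.s3.amazonaws.com/big-raster.tif"
  # Fetch only the first 16 KiB, which in a cloud-optimized file typically
  # holds the header/metadata that maps out the rest of the content.
  resp = requests.get(url, headers={"Range": "bytes=0-16383"})
  assert resp.status_code == 206   # "206 Partial Content"
  header_bytes = resp.content      # parse this to learn what else to request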

https://www.goodreads.com/en/book/show/172366

Q: > I was thinking of your example of Warren Buffett's daily spreadsheet (gedanken experiment)... How do you see data quality or data importance (incl. data provider trustworthiness) being effectively conveyed to users?

A: We want to focus on verification of who people are and relying on reputational considerations to establish importance.

Q: > I agree with you about the importance of social factors in how people make decisions. What do you think the implications are of this for metadata for open data on the cloud?

A: Tracking data's impact and use is an important thing to keep track of. Using metadata as concrete records of observations and how it has been used is where this becomes important.

Q: > What about the really important kernels of information that we use to, say, calibrate remote sensing products, that are really small but super important? How do we make sure those don't get drowned?

A: We need to be careful not to overemphasize "everything is open" if we can't keep really important datasets in the spotlight.

January 11th: "Using Earth Observations for Sustainable Development"

"Using Earth Observation Technologies when Assessing Environmental, Social, Policy and Technical factors to Support Sustainable Development in Developing Countries"

Sharif Islam

Earth Observation (EO) technologies, such as satellites and remote sensing, provide a comprehensive view of the Earth's surface, enabling real-time monitoring and data acquisition. Within the environmental domain, EO facilitates tracking land use changes, deforestation, and biodiversity, thereby supporting evidence-based conservation efforts. Social factors, encompassing population dynamics and urbanization trends, can be analyzed to inform inclusive and resilient development strategies. EO also assumes a crucial role in policy formulation by furnishing accurate and up-to-date information on environmental conditions, thereby supporting informed decision-making. Furthermore, technical aspects, like infrastructure development and resource management, benefit from EO's ability to provide detailed insights into terrain characteristics and natural resource distribution. The integration of Earth Observation across these domains yields a comprehensive understanding of the intricate interplay between environmental, social, policy, and technical factors, fostering a more sustainable and informed approach to development initiatives. In this presentation, I will discuss our lab's work in Bangladesh, Angola, and other countries, covering topics such as coastal erosion, drought, and air pollution.

Recording:

Minutes:

Plan to share data from NASA and USGS that was used in his PhD work.

Applied the EVDT (Environment, Vulnerability, Decision, Technology) framework.

Studied a variety of hazards – coastal erosion, air pollution, drought, deforestation, etc.

Coastal Erosion in Bangladesh:

  • Displacement, loss of land, major economic drain
  • Studied the situation in the Bay of Bengal
  • Used Landsat to study coastal erosion from the 80s to the present
  • Coastal erosion rates upwards of 300m/yr!
  • Combined survey data and Landsat observations

Air Pollution and mortality in South Asia

  • Able to show change in air pollution over time using remote sensing

Drought in Angola and Brazil

Used SMAP (Soil Moisture Active Passive)

Developed the same index as the US Drought Monitor

Able to apply SMAP observations over time

Applied a social vulnerability model using these data to identify vulnerable populations.

Deforestation in Ghana

Used Landsat to identify land converted from forest to mining and urban use.

Significant amounts of land converted to mining (gold mining and others)

Water hyacinth in a major fishery lake in Benin.

Impact on fishery and transportation

Rotting hyacinth is a big issue

Helped develop a decision support system (DSS) to guide management practices

Mangrove loss in Brazil

Combined information from economic impacts, urban plans, and remote sensing to help build a decision support tool.

November 9th: "Persistent Unique Well Identifiers: Why does California need well IDs?"

IT&I CA Wells November 2023

Hannah Ake

Groundwater is a critical resource for farms, urban and rural communities, and ecosystems in California, supplying approximately 40 percent of California's total water supply in average water years, and in some regions of the state, up to 60 percent in dry years. Regardless of water year type, some communities rely entirely on groundwater for drinking water supplies year-round. However, California lacks a uniform well identification system, which has real impacts on those who manage and depend upon groundwater. Clearly identifying wells, both existing and newly constructed, is vital to maintaining a statewide well inventory that can be more easily monitored to ensure the wellbeing of people, the environment, and the economy, while supporting the sustainable use of groundwater. A uniform well ID program has not yet been accomplished at a scale like California's, but it is achievable, as evidenced by great successes in other states. Learn more about why a well ID program will be so important to tackle in California and offer your thoughts about how to untangle some of the particularly thorny technical challenges.

Recording:

Minutes:

  • Groundwater is 40-60% of California's Water supply
  • ~2 Million groundwater wells!
  • As many as 15k new wells are constructed each year

The Sustainable Groundwater Management Act (SGMA) frames groundwater sustainability agencies that develop groundwater sustainability plans.

There is a need to account for groundwater use to ensure the plans are achieved.

Problem: There is no dedicated funding (or central coordinator) to create and maintain a statewide well inventory.

  • Department of Water Resources develops standards
  • State Water Resources Control Board has statewide ordinance
  • Cities and local districts adopt local ordinance
  • Local enforcement agency administers and enforces ordinance

There are a lot of IDs in use. 5 different identifiers can be used for the same well.

Solution: create a statewide well inventory built on a compound ID (a single ID that stands in for many others) drawn from multiple ID systems – a semantically meaningless identifier that links the multiple existing identifiers for a well to each other.
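
A minimal sketch of that crosswalk idea; all identifiers and field names below are made up for illustration, not the actual California design:

  import uuid

  well_registry = {}

  def mint_well_id(local_id=None, dwr_id=None, usgs_site=None, swrcb_id=None):
      """Mint an opaque statewide ID linking a well's existing identifiers."""
      statewide_id = uuid.uuid4().hex[:12]  # no embedded meaning, by design
      well_registry[statewide_id] = {
          "local_id": local_id, "dwr_id": dwr_id,
          "usgs_site": usgs_site, "swrcb_id": swrcb_id,
      }
      return statewide_id

  wid = mint_well_id(local_id="KERN-00123", usgs_site="351234118561701")
  # Any of the linked IDs can now be resolved through the statewide key.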

There are a number of states with well id programs.

  • Trying to learn from what other states have done.

Going forward with some kind of identifier system that spans all local and federal identifier systems.

  • Q: Will this include federal wells? – Yes!
  • Q: Will this actually be a new well identifier minted by someone? – Yes.
  • Q: If someone drills a well do they have to register it? – Yes, but it's the local enforcing agency that collects the information.
  • Q: What if a well is deepened? Do we update the ID? – This has caused real problems in the past. We end up with multiple IDs for the same hole that go through time.
    • Seems to make sense to make a new one to keep things simple.

Link mentioned early in the talk:

https://groundwateraccounting.org/

Reference during Q&A

https://docs.ogc.org/per/20-067.html#_cerdi_vvg_selfie_demonstration

October 26th: "Improving standards and documentation publishing methods: Why can’t we cross the finish line?"

IT&I OGC October 2023

Scott Simmons

OGC and the rest of the Standards community have been promising for YEARS that our Standards and supporting documentation will be more friendly to the users that need this material the most. Progress has been made on many fronts, but why are we still not finished with a promise made in 2015 that all OGC Standards will be available in implementer-friendly views first, ugly virtual printed paper second? This topic bugs me as much as it bugs our whole community. Some of the problems are institutional (often from our Government members across the globe), others are due to lack of resources, but I think that most are due to a lack of clear reward to motivate people to do things differently. Major progress is being made in some areas. The OGC APIs have landing pages that include focused and relevant content for users/implementers and it takes some effort to find the owning Standard. OGC Developer Resources are growing quickly with sample code, running examples, and multiple views of API resources in OpenAPI, Swagger, and ReDoc.

Slides


Recording:

Minutes:

(missing first ~15 minutes of recording -- apologies)

Circa 2015 OGC GeoRabble

  • Took a critical look at the status of publishing standards.
  • Couldn't we format these specs in a kind of tutorial form?

9 years later

  • What makes this hard?
    • Standards must be unambiguous AND procurable.
    • The modular specification is a model for this balance.

Standards are based around testable requirements that relate to conformance classes.
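
For example, every OGC API publishes a machine-readable /conformance document listing the conformance classes it implements. A minimal sketch, with a hypothetical server URL:

  import requests

  base = "https://example.org/ogcapi"
  conf = requests.get(f"{base}/conformance", params={"f": "json"}).json()
  for uri in conf["conformsTo"]:
      print(uri)  # e.g. ".../ogcapi-features-1/1.0/conf/core"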

Swaggerhub and ReDoc as a way to show a richer collection of information for multiple users.

Specifications are much more modular (core and extensions)

Developer website: https://developer.ogc.org/

Going to be including persistent demonstrators (example implementations) that are "in the wild".

https://www.ogc.org/initiatives/open-science/

Moving to an "OGC Building Blocks" model in which blocks are registered across multiple platforms and linked to lots of examples.

Building blocks are richly described and nuanced but linked back to specific requirements in a specification.

https://blocks.ogc.org/

https://sn80uo0zmbg.typeform.com/to/gcwDDNB6?typeform-source=blocks.ogc.org

A lot of this focused on APIs – what about data models?

  • Worked on APIs first because it was current. Also thinking about how to apply similar concepts to data models.

September 14th: "Water data standardization: Navigating the AntiCommons"

IT&I IoW September 2023

Kyle Onda

We all know interoperability rests on data standards and API standards. Many open standards are less prominent in the open water data space than proprietary solutions. This is because proprietary data management solutions are often bundled with very easy to use implementing software and more importantly—client software that address basic use cases. We’re giving people blueprints when they need houses. Community standards making processes should invest in end-user tools if they want to gain traction. The good news is that some of the newest generation of standards is much easier to develop around which has led to some reference implementations that are much easier to create end-user tools around than previously.

Recording:

Minutes:

AntiCommons – name comes from social science background

Tragedy of the commons - two solutions, enclose (privatize) or regulate

Tragedy of the anticommons - as opposed to common resources, these are resources that don't get used up – as in open data. Inefficiency and underutilization are common.

Two solutions: expropriation (like eminent domain or public data) or incentivization.

Example – consolidate urban sprawl into higher density housing to get more open space and room for business.

Introducing the Internet of Water.

Noting that in the PNW, there are >800 USGS stream gages and >400 from other organizations. Only the USGS gages are broadly known.

Thinking about open data as an anticommons – environmental data is normally publicly available, but only in ways that are convenient to data providers and the software that they use.

Discussion of the variety of standardized vs bespoke modes of data dissemination.

Example of Nebraska – GUI with download and a separate custom API. USGS has the same basic scheme, where an ETL goes from data management software to a custom web service system.

What's going on here? Limited resources lead to focus on existing users and needs and administration ease.

Tools that meet this need tend not to focus on the needs of new users or on standardization.

Most organizations don't need standards – they need software. Both server and CLIENT software.

New specs and efforts ARE heading in this direction: OGC API, SensorThings, etc.

Promising developments around proxying non standard APIs and in use of structured data "decoration" to make documentation more standard.
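
A minimal sketch of why this newer generation of standards is easier to build end-user tools on: an OGC SensorThings query is plain JSON over a predictable URL structure. The endpoint below is hypothetical:

  import requests

  root = "https://example.org/FROST-Server/v1.1"
  things = requests.get(f"{root}/Things", params={"$top": 5}).json()
  for t in things["value"]:
      print(t["@iot.id"], t["name"])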

August 10th: "Learning to love the upside down: Quarto and the two data science worlds"

IT&I Quarto August 10th

Carlos Scheidegger

There are two wonderful data science worlds. You can be a jupyter expert: you work on jupyter notebooks, with access to myriad Julia, Python, and R packages, and excellent technical documentation systems. You can also be a knitr and rmarkdown expert: you work on rmarkdown notebooks, with access to myriad Julia, Python, and R packages, and excellent technical documentation systems.

But what if your colleague works on the wrong side of the fence? What if you spent years learning one of them, only to find that the job you love is in an organization that uses the other? In this talk, I’m going to tell you about quarto, a system for technical communication (software documentation, academic papers, websites, etc) that aspires to let you choose any of these worlds.

If you’re one to worry about Conway’s law and what this two-worlds situation does to an organization’s talent pool, or if you live in one side of the world and want to be able to collaborate with folks on the other side, I think you’ll find something of value in what I have to say.

I’m also going to complain about software, mostly the one I write. Mostly.

Slides: https://cscheid.net/static/2023-esip-quarto-talk/

Recording:

Minutes:

Carlos was in a tenured computer science position at the University of Arizona.

Hating bad software makes a software developer a good developer.

Two data science worlds:

tidyverse (with R and markdown)

  • Cohesive, hard to run things out of order.
  • Doesn't store output.

Jupyter (python and notebooks)

  • Notebook saves intermediate outputs.
  • State can be messed up easily – cells aren't linear steps.

Quarto:

  • Acts as a compatibility layer for tidyverse and jupyter ecosystems.
  • Emulates RMarkdown with multi-language support (see the sketch below).
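
A minimal sketch of a Quarto document (.qmd); the YAML header picks the computation engine, so the same markdown-plus-cells source can be built from either world:

  ---
  title: "Minimal Quarto example"
  format: html
  engine: jupyter   # or `engine: knitr` to build from the R/knitr world
  ---

  Narrative text in markdown.

  ```{python}
  # a Python code cell; with the knitr engine this would be an {r} cell
  print("hello from the jupyter engine")
  ```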

Rant:

Quarto gets you a webpage and PDF output.

– note that the PDF requirement is not great.

Quarto is kind of just a huge wrapper around pandoc.

Quarto documentation is intractably hard to build out.

Consider Conway's Law – that an organization that creates a large system will create a system that is a copy of the organization's communication structure.

– Quarto is meant to allow whole organizations with different technical tools exist in the same communication structure (same system).

Quarto tries to make kinda hard things easy while not making really hard things impossible.

Quarto can convert jupyter notebooks (with cached outputs) into markdown and vice versa.

Issue is, you need to know a variety of other languages (YAML, CSS, JavaScript, LaTeX, etc.)

– "unavoidable but kinda gross"

You can edit Quarto in RStudio or VS Code, or any text editor.

For collaboration, Quarto projects can use jupyter or knitr engines. E.g. in a single web page, you can build one page with jupyter and another page with knitr.

– you can embed an ipynb cell in a notebook.

Orchestrating computation is hard – quarto has to take input from existing computation – which can be awkward / complex.

Quarto is extensible – CSS themes, OJS for interactive webpages, Pandoc extensions.

Can also write your own shortcodes.

July 13th 2023: "Tools to Assist Simulation Based Researchers in Deciding What Project Outputs to Preserve and Share"

IT&I EarthCube Model RCN July 13th

Doug Schuster

This presentation will highlight findings from the NSF EarthCube Research Coordination Network project titled “What About Model Data? - Best Practices for Preservation and Replicability” (https://modeldatarcn.github.io/), which suggest that most simulation based research projects only need to preserve and share selected model outputs, along with the full simulation experiment workflow to communicate knowledge. Challenges related to meeting community open science expectations will also be highlighted.

Slides available here: File:ModelDataRCN-2023-07-13-ESIP-IT&I v2.pdf

https://modeldatarcn.github.io/

Rubric: https://modeldatarcn.github.io/rubrics-worksheets/Descriptor-classifications-worksheet-v2.0.pdf

Open science expectations for simulation based research. Frontiers in Climate, 2021. https://doi.org/10.3389/fclim.2021.763420

Recording:

Minutes:

Primary motivation: What are data management requirements for simulation projects?

Project ran May 2020 to Jul 2022

We clearly shouldn't preserve ALL data / output from projects. It's just too expensive.

Project broke down the components of data associated with a given project: forcings, code/documentation, selected outputs.

But what outputs to share?!?

Project developed a rubric of what to preserve / share.

"Is your project a data production project or a knowledge production project"

"How hard is it to rerun your workflow?"

"How much will it cost to store and serve the data?"

Rubric gives guidance on how much of a project's outputs should be preserved.

So this is all well and good, but it falls onto PIs and funding agencies.

What are the ethical and professional considerations of these trade offs?

What are the incentives in place currently? Sharing is not necessarily seen as a benefit to the author.

June 8 2023: "Reproducible Data Pipelines in Modern Data Science: what they are, how to use them, and examples you can use!"

IT&I Reproducible Pipelines June 8th

Julie Padilla

Modern scientific workflows face common challenges including accommodating growing volumes and complexity of data and the need to update analyses as new data becomes available or project needs change. The use of better practices around reproducible workflows and the use of automated data analysis pipelines can help overcome these challenges and more efficiently translate open data to actionable scientific insights. These data pipelines are transparent, reproducible, and robust to changes in the data or analysis, and therefore promote efficient, open science. In this presentation, participants will learn what makes a reproducible data pipeline and what differentiates it from a workflow as well as the key organizational concepts for effective pipeline development.

Recording:

Minutes:

Motivation – what if we find bad data in an input, what if we need to rerun something with new data, can we reproduce findings from previous work?

Need to be able to "trace" what we did and the way we do it needs to be reliable.

A "workflow" is a sequence of steps going from start to finish of some activity or process.

A "pipeline" is a programmatic implementation of a workflow that requires little to no interaction.

In a pipeline, if one workflow step or input gets changed, we can track what is "downstream" of it.

Note that different steps of the workflow may be influenced by different people. So a given step of a pipeline could be contributed by different programmers. But each person would be contributing a component of a consistent pipeline.

There is a difference between writing scripts and building a reproducible pipeline. Better to break it into steps: script -> organize -> encapsulate into functions -> assemble pipeline.

Focus is on R targets – snakemake is equivalent in python.

Key concepts for going from script to workflow: functions stored separately from the workflow script; steps clearly organized in the script; steps wrapped as pipeline targets so they can be tracked.

Pipeline software keeps track of whether things have changed and what needs to be rerun. Allows visualization of the workflow inputs, functions, and steps.
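
A minimal sketch in snakemake (the Python-world analogue of R targets mentioned above); file and script names are hypothetical. Each rule declares its inputs and outputs, which is what lets the tool track what is "downstream" of a change and rebuild only that:

  # Snakefile
  rule all:
      input: "out/report.html"

  rule clean_data:
      input: "data/raw.csv"
      output: "out/clean.csv"
      shell: "python scripts/clean.py {input} {output}"

  rule report:
      input: "out/clean.csv"
      output: "out/report.html"
      shell: "python scripts/report.py {input} {output}"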

How do steps of the pipeline get related to each other? They are named, and the target names get passed to downstream targets.

Chat questions about branching. Dynamic branching lets you run the same target for a list of inputs in a map/reduce pattern.

Pipelines can have outputs that are reports that render pipeline results in a nice form.

Pipeline templates: A pipeline can adopt from a standard template that is pre-determined. Helps enforce best practices and have a quick and easy starting point.

Note that USGS data science has a template for a common pattern.

What's a best practice for tracking container function and reproducibility? Versioned Git / Docker for code and environment. For data, it is context dependent. Generally, try to pull down citeable / persistent sources. If sources are not persistent, you can cache inputs for later reuse / reproducibility.

Data change detection / caching is a really tricky thing, but many people are working on the problem. https://cboettig.github.io/contentid/, https://dvc.org/

https://learning.nceas.ucsb.edu/2021-11-delta/session-3-programmatic-metadata-and-data-access.html#reproducible-data-access


11 May 2023: "Software Procurement Has Failed Us Completely, But No More!"

IT&I Software Procurement May 11th

Waldo Jaquith

The way we buy custom software is terrible for everybody involved, and has become a major obstacle to agencies achieving their missions. There are solutions, if we would just use them! By combining the standard practices of user research, Agile software development, open source, modular procurement, and time & materials contracts, we can make procurement once again serve the needs of government.

Slides available here: File:2023-05-Jaquith.pdf

Recording:

Minutes:

Recognizing that software procurement is one of the primary ways that Software / IT systems advance, Waldo went into trying to understand that space as a software developer.

Healthcare.gov

Contract given to CGI Federal for $93M – cost ~1.7B by launch.
Low single digits numbers of people actually made it through the system.
Senior leaders were given the impression that things were in good shape.
The developers working on the site knew it wasn't going to work out (per IG report).

  • strategic misrepresentation – things are represented as rosier as you go up the chain of command

On launch, things went very badly and the recovery was actually quite quick and positive.

Waldo recommends reading the IG report on healthcare.gov. This article:
("The OIG Report Analyzing Healthcare.gov's Launch: What's There And What's Not", Health Affairs Blog, February 24, 2016. https://dx.doi.org/10.1377/hblog20160224.053370) provides a path to the IG report:
(HealthCare.gov - CMS Management of the Federal Marketplace: An OIG Case Study (OEI-06-14-00350), https://oig.hhs.gov/oei/reports/oei-06-14-00350.pdf) and additional perspective.

Rhode Island Unified Health Infrastructure

($364M to Deloitte) "Big Bang" deployment – they let people running old systems go on the day of the new system launch. They "outsourced" a mission critical function to a contractor.

We don't tend to hear about relatively smaller projects because they are less likely to fail and garner less attention.

Outsourcing as it started in the ~90s was one thing when the outsourcing was for internal agency software. It's different when the systems are actually public interfaces to stakeholders or are otherwise mission critical.

It's common for software to meet contract requirements but NOT meet end user needs.

Requirements complexity is fractal. There is no complete / comprehensive set of requirements.

… federal contractors interpreting requirements like children trying to resist getting out the door ...

There is little to no potential to update or improve requirements due to contract structure.

Demos not memos!

Memorable statements:

  • Outsourced ability to accomplish agency’s mission
  • Load bearing software systems on which the agency depends to complete their mission.
  • Mission of many agencies is mediated by technology.

But no more! – approach developed by 18F

System of six parts –

1. User-centered design
2. Agile software development
3. Product ownership
4. DevOps
5. Building out of loosely coupled parts
6. Modular contracting

Roles for government and vendors in agile contracting

"You don't know what people need till you talk to them."

Basic premise of agile is good. Focus is on finished software being developed every two weeks.

Constantly delivering a usable product... e.g. A skateboard is more usable than a car part.

Key roles for government staff around operations are too often overlooked.

Product team needs to include an Agency Product Owner. Allows government representation in software development iteration.

Build out of loosely coupled / interchangeable components. Allows you to do smaller things and form big coherent systems that can evolve.

Modular contracts allow big projects that are delivered through many small task orders or contracts. The contract document is kind of a fill in the blank template and doesn't have to be hard.

The Westrum typology of cultures article is relevant: http://dx.doi.org/10.1136/qshc.2003.009522

13 April 2023: "Evolution of open source geospatial python."

IT&I Python Open Source April 13th

Tom Kralidis

Free and Open Source Software in the Geospatial ecosystems (i.e. FOSS4G) play a key role in geospatial systems and services. Python has become the lingua franca for scientific and geospatial software and tooling. This rant and rave will provide an overview of the evolution of FOSS4G and Python, focusing on popular projects in support of Open Standards.

Slides: https://geopython.github.io/presentation

Recording:

Minutes:

MapServer has been around for 23 years!

Why Python for geospatial?

  • Ubiquity; cross-OS compatible.
  • Legible and easy to understand what it's doing.
  • Support ecosystem is strong (PyPI, etc.).
  • Balance of performance and ease of implementation.
  • Python: fast enough, and fast in human time -- more intensive workloads can glue to C/C++.

The new generation of OGC services – based on JSON, so the API interoperates with client environments / objects at a much more direct level.

The geopython ecosystem has a number of low level components that are used across multiple projects.

pygeoapi is an OGC API reference implementation and an OSGeo project.
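
A minimal sketch using OWSLib (one of those low-level geopython components) as a client against an OGC API - Features server; the demo URL and the "obs" collection id are assumptions that may differ by server:

  from owslib.ogcapi.features import Features

  api = Features("https://demo.pygeoapi.io/master")
  print(api.feature_collections())       # list of collection ids
  items = api.collection_items("obs")    # GeoJSON features from one collection
  print(len(items["features"]))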

pygeoapi implements OGC API - Environmental Data Retrieval (EDR) https://ogcapi.ogc.org/edr/overview.html

pygeoapi has a plugin architecture. https://pygeoapi.io/ https://code.usgs.gov/wma/nhgf/pygeoapi-plugin-cookiecutter

pycsw is an OGC CSW and OGC API - Records implementation. Works with pygeometa for metadata creation and maintenance. https://geopython.github.io/pygeometa/

There's a real trade off to "the shiny object" vs the long term sustainability of an approach. Geopython has generally erred on the side of "does it work in a virtualenv out of the box".

How does pycsw work with STAC and other catalog APIs? pycsw can convert between various representations of the same basic metadata resource.

"That's a pattern… People can implement things the way they want."

Chat Highlights:

  • You can also write a C program that is slower than Python if you aren't careful =).
  • https://www.ogc.org/standards/ has lots of useful details
  • For anyone interested in geojson API development in Python, I just recently came across this https://github.com/developmentseed/geojson-pydantic
  • OGC API - Environmental Data Retrieval (EDR) https://ogcapi.ogc.org/edr/overview.html
  • Our team has a pygeoapi plugin cookiecutter that we are hopeful others can get some mileage out of. https://code.usgs.gov/wma/nhgf/pygeoapi-plugin-cookiecutter
  • I'm going to post this here and run: https://twitter.com/GdalOrg/status/1613589544737148944
    • 100% agreed. That's unfortunate, but PyPI is not designed to deal with binary wheels of beasts like me which depend of ~ 80 direct or indirect other native libraries. Best solution or least worst solution depending on each one's view is "conda install -c conda-forge gdal"
  • General question here - you mentioned getting away from GDAL in a previous project. What are your thoughts on GDAL's role in geospatial python moving forward, and how will pygeoapi accommodate that?
  • Never, ever works with the wheels!
  • Kitware has some pre-compiled wheels as well: https://github.com/girder/large_image
  • In the pangeo.io project, our go to tools are geopandas for tabular geospatial data, xarray/rioxarray for n-dimensional array data, dask for parallelization, and holoviz for interactive visualization. We use the conda-forge channel pretty much exclusively to build out environments
  • If you work on Windows, good luck getting the Python gdal/geos-based tools installed without Conda
  • data formats and standards are what make it difficult to get away from GDAL -- it just supports so many different backends! Picking those apart and cutting legacy formats or developing more modular tools to deal with each of those things "natively" in python would be required to get away from the large dependency on something like GDAL.
  • Sustainability and maintainability is always good to ask yourself "how easy will it be to replace this dependency when it no longer works?"
  • No one should build gdal alone (unless it is winter and you need a source of heat). Join us at https://github.com/conda-forge/gdal-feedstock


9 Mar 2023: "Meeting Data Where it Lives: the power of virtual access patterns"

Mike Johnson (Lynker, NOAA-affiliate) will rant and rave about the VRT and VSI (curl and S3) virtual data access patterns and how he's used them to work with LCMAP and 3DEP data in integrated climate and data analysis workflows.

Recording:

Minutes:

  • VRT stands for "ViRTual"
  • VSI stands for "Virtual System Interface"
  • Framed by FAIR

LCMAP – requires fairly complex URLs to access specific data elements.

3DEP - need to understand tiling scheme to access data across domains.

Note some large packages (zip files) where only one small file is actually desired.

NWM datasets in NetCDF files that change name (with time step) daily as they are archived.


Implications for findability, accessibility, and reuse – note that interoperability is actually pretty good once you have the data.

VRT: – an XML "metadata" wrapper around one or more tif files.

Use case 1: download all of the 3DEP tiles and wrap them in a VRT XML file.

  • VRT has an overall aggregated grid "shape"
  • Includes references to all the individual files.
  • Can access the dataset through the VRT wrapper to work across all the tiles.
  • Creates a seamless collection of subdatasets
  • Major improvement to accessibility.
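
A minimal sketch of use case 1 with GDAL's Python bindings (paths are hypothetical):

  import glob
  from osgeo import gdal

  tiles = glob.glob("3dep_tiles/*.tif")
  vrt = gdal.BuildVRT("3dep_mosaic.vrt", tiles)  # writes the XML wrapper
  vrt = None                                     # flush and close
  dem = gdal.Open("3dep_mosaic.vrt")             # reads across all tiles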

If you have to download the data is that "reuse" of the data??

VSI: – allows virtualization of data from remote resources available via a few protocols (S3/HTTP/compressed archives)

Wide variety of GDAL utilities to access VSI files – zip, tar, 7zip

Use case 2: Access a tif file remotely without downloading all the data in the file.

  • Uses vsi to access a single tif file
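
A minimal sketch of use case 2 (hypothetical URL); GDAL issues range requests under the hood, so only the blocks you read are transferred:

  from osgeo import gdal

  ds = gdal.Open("/vsicurl/https://example.com/elevation.tif")
  band = ds.GetRasterBand(1)
  # Only the blocks overlapping this 256x256 window are fetched:
  window = band.ReadAsArray(xoff=0, yoff=0, win_xsize=256, win_ysize=256)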

Use case 3: Use vsi within a vrt to remotely access contents of remote tif files.

  • Note that the vrt file doesn't actually have to be local itself.
  • If the tiles that the vrt points to update, the vrt will update by default.
  • Can easily access and reuse data without actually copying it around.

Use case 4: OGR using vsi to access a shapefile in a tar.gz file remotely.

  • Can create a nested url pattern to access contents of the tar.gz remotely.
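
A minimal sketch of use case 4's nested pattern (URL and layer name are hypothetical):

  from osgeo import ogr

  path = "/vsitar//vsicurl/https://example.com/archive.tar.gz/watersheds.shp"
  ds = ogr.Open(path)
  layer = ds.GetLayer(0)
  print(layer.GetFeatureCount())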

Use case 5: NWM shortrange forecast of streamflow in a netcdf file.

  • Prefixing a vsicurl URL with "HDF5:" allows access to a netcdf file directly.
  • The access url pattern is SUPER tricky to get right.

Use case 6: "flat catalogs"

  • Stores a flat (denormalized) table of data variables with the information required to construct URLs.
  • Can search based on rudimentary metadata within the catalog.
  • Can access and reuse data from any host in the same workflow.
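
A minimal sketch of what a flat catalog row might hold; the fields are illustrative, not the actual climateR catalog schema:

  # One denormalized row per variable, carrying what's needed to build a URL.
  catalog = [
      {"id": "nwm", "variable": "streamflow",
       "url": "/vsicurl/https://example-bucket/nwm/{date}/channel_rt.nc"},
      {"id": "3dep", "variable": "elevation",
       "url": "/vsicurl/https://example.com/3dep_mosaic.vrt"},
  ]

  # Findability is a filter over rudimentary metadata...
  hits = [row for row in catalog if row["variable"] == "streamflow"]
  # ...and access is filling in the URL template:
  url = hits[0]["url"].format(date="20230309")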

Use case 7: access NWM current and archived data from a variety of cloud data stores.

  • Leveraging the flat catalog content to fix up urls and data access nuances.

Flat catalog improves findability down at the level of individual data variables.

Take Aways / discussion:

Question about the flat catalog:

"Minimal set of shortcuts" to get at this fast access mechanism.

Is the flat catalog manually curated?

More or less – all are automated but some custom logic is required to add additional content.

Would be great to systematize creation of this flat catalog more broadly.

Question: Could some “examples” be posted either in this doc or elsewhere (or links to examples), for a beginner to copy/paste some code and see for themselves, begin to think about how we’d use this? Something super basic please.

GDAL documentation is good but doesn't have many examples.

climateR has a workflow that shows how the catalog was built.


What about authentication issues?

  • S3 is handled at a session level.
  • Earth Engine can be handled similarly.

How much word of mouth or human-to-human interaction is required for the catalog?

  • If there is a stable entrypoint (S3 bucket for example) some automation is possible.
  • If entrypoints change, configuration needs to be changed based on human intervention.

9 Feb 2023: "February 2023 - Rants & Raves"

The conversation built on the "rants and raves" session from the 2023 January ESIP Meeting, starting with very short presentations and an in-depth discussion on interoperability and the Committee's next steps.

Recording:

Minutes:

  • Mike Mahoney: Make Reproducibility Easy
  • Dave Blodgett: FAIR data and Science Data Gateways
  • Doug Fils: Web architecture and Semantic Web
  • Megan Carter: Opening Doors for Collaboration
  • Yuhan (Douglas) Rao: Where are we for AI-ready data?

I had a couple of major takeaways from the Winter Meeting:

  • We have come a long way in IT interoperability but most of our tools are based on tried and true fundamentals. We should all know more about those fundamentals.
  • There are a TON of unique entry points to things that, at the end of the day, do more or less the same thing. These are opportunities to work together and share tools.
  • The “shiny object” is a great way to build enthusiasm and trigger ideas and we need to better capture that enthusiasm and grow some shared knowledge base.

So with that, I want to suggest three core activities:

  1. We seek out presentations that explore foundational aspects of interoperability. I want to help build an awareness of the basics that we all kind of know but either take for granted, haven’t learned yet, or straight up forgot.
  2. We ask for speakers to explore how a given solution fits into multiple domains' information systems and to discuss the tension between the diversity of use cases that are accommodated by an IT solution targeted at interoperability. We are especially interested to learn about the expense / risk of adopting dependencies vs the efficiency that can be gained from adopting pre-built dependencies.
  3. We look for opportunities to take small but meaningful steps to record the core aspects of these sessions in the form of web resources like the ESIP wiki or even Wikipedia. On this front, we will aim to construct a summary wiki page from each meeting assembled from a working notes document and the presenting author's contribution.