Cloud Computing Cluster Plan 2021

From Earth Science Information Partners (ESIP)

ESIP Cloud Computing Cluster 2021 Plan

  • Co-Chair(s): Aimee Barciauskas, Sudhir Shrestha
  • Monthly Meeting Day/Time: 4th Monday of every month at 1pm ET

The Cloud Computing Cluster 2021 Objective is to create more cloud experts.

The ESIP Cloud Computing Cluster aims to create more “cloud experts”: Earth data science users who use cloud resources to get science done either faster or at greater scale. The Cloud Computing Cluster will match cloud technologies with Earth data users and applications, specifically in the Earth sciences but also with decision makers and the general public in mind.

Things we will do to fulfill our objectives:

One of the challenges for this cluster is its broad, cross-cutting mandate to create more cloud experts. Topics range from science on the cloud to infrastructure as code. Stakeholders range from scientists to decision makers to software developers.

We can address these challenges through our general and targeted approaches:

  • This cluster is a home to ambiguous cross-over and cross-cutting topics.
  • This cluster acknowledges that there is not one set of best practices or tools that will address all applications and experience levels.
  • This cluster will meet 1x / month to participate in knowledge sharing and conversation.
  • This cluster will (optionally) meet 1x / month to have a working session where we work on disseminating and formalizing the outputs of the cluster.
  • This cluster will have quarterly themes where we will lead and engage in discussions about active topics of interest.
    • The first quarter will be May 1, 2021 - July 31, 2021 and will focus on cloud-friendly data formats and tools.
    • This cluster will lead #cloudclinics 1x / quarter. These “live” events will focus on the theme of the quarter and include volunteer experts to field questions.
      • When will the first cloud clinic be? Perhaps during or right after the ESIP summer session.
      • Ongoing #cloudclinics Twitter and Slack channels for posting and answering questions
    • Themes for later quarters may be:
      • Infrastructure as code and devops best practices
      • Metadata standards, e.g. guidance on when to use STAC and when to use CMR
      • Open science platforms, such as Pangeo
  • This cluster will create documentation on common and current cloud science topics such as:
    • Which metadata standard is right for my use case? When to use STAC, CMR, OGC
    • What is the right cloud-optimized data format for my data and my science use case?
    • How do I find the right publicly available data on the cloud for my use case?
  • ESIP Summer session “The saga continues: new advances in cloud-friendly data formats”
    • MRF+CRF, Zarr, EPT, COG, NetCDF4/HDF5, GRIB2, TileDB, Ohmy
    • Zarr metadata for NetCDF4/HDF5
    • Pangeo-forge - build your own cloud-friendly dataset
    • What is the metadata required?
    • How durable or ephemeral should cloud-native formats be?
    • How can data formats help solve the multi-cloud multi-region challenge?
    • The format for one need (visualization) may differ from the format for another (analysis)
      • Xarray can be used for multi-dimensional analysis, but the data can be challenging to visualize
      • What are some cool solutions to this challenge?

Things our collaboration area needs to deliver our objectives?

(e.g. Partnerships, in-kind support, staff support)

How will we know we are on the right track?

  • Engineers and scientists attend and are interested in our webinar series
  • Participation in live #cloudclinics
  • Activity on social channels, Twitter and Slack, for #cloudclinics

How will others know what we are doing in & out of ESIP?

  • Post webinars on the ESIP YouTube channel
  • Attend other sessions and advertise webinars and #cloudclinics

Existing or Desired Cross-collaboration area connections

  • Pangeo

Prior/Existing ESIP and Cloud Computing Cluster Artifacts