Enabling the Geoverse: Introducing a NOAA Vision for Next Generation Interoperability - Ryan Berkheimer, NOAA, NCEI

From Earth Science Information Partners (ESIP)

June 15, 2023 Discovery Cluster Meeting:

Meeting Presentation:

Enabling the Geoverse: Introducing a NOAA Vision for Next Generation Interoperability; Ryan Berkheimer, NOAA NCEI.

NOAA is using a knowledge graph-of-graphs approach as the architectural core of its next-generation Archive and Access System within the NESDIS Common Cloud Framework (NCCF). This service is being implemented using an RDF-based, Web 3.0-focused, democratized data mesh and self-service model that allows data-provider and access-oriented teams to self-manage the definition and deployment of their workflows and concepts in a holistic, shared, and interoperable knowledge graph.

Domain teams can use a small core API to define their unique concepts as patterns against a common OAIS-based reference model, reusing patterns as needed. They tie those concepts to any available or provided business logic (models, procedures, API calls, etc.) within classified Tasks (identity, context transformation, and dissemination) that 'fill in' those concepts to create complete, contextualized, and quality-assured records. They then compose those Tasks as Directed Acyclic Graphs (Processes) and deploy each Process to the system for immediate use via event trigger.
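
The Task and Process composition described above can be sketched roughly as follows. This is a minimal illustration only, not the NOAA core API: the `Task` and `Process` classes, the `build_example_process` helper, the task names, and the record fields are all hypothetical, and the standard-library `graphlib` module stands in for whatever DAG machinery the real system uses.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List, Set

# The three Task classes named in the abstract (spelling here is an assumption).
TASK_KINDS = {"identity", "context_transformation", "dissemination"}


@dataclass
class Task:
    """Hypothetical Task: a classified wrapper around user-provided logic."""
    name: str
    kind: str
    logic: Callable[[dict], dict]  # returns the fields it 'fills in'

    def __post_init__(self) -> None:
        if self.kind not in TASK_KINDS:
            raise ValueError(f"unknown task kind: {self.kind}")


@dataclass
class Process:
    """Hypothetical Process: Tasks composed as a directed acyclic graph."""
    tasks: Dict[str, Task] = field(default_factory=dict)
    upstream: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, task: Task, after: Iterable[str] = ()) -> None:
        deps = set(after)
        missing = deps - self.tasks.keys()
        if missing:
            raise KeyError(f"unknown upstream tasks: {sorted(missing)}")
        self.tasks[task.name] = task
        self.upstream[task.name] = deps

    def order(self) -> List[str]:
        # static_order() raises graphlib.CycleError if the graph is not acyclic.
        return list(TopologicalSorter(self.upstream).static_order())

    def run(self, event: dict) -> dict:
        # An incoming event triggers the Process; each Task adds to the record.
        record = dict(event)
        for name in self.order():
            record.update(self.tasks[name].logic(record))
        return record


def build_example_process() -> Process:
    """Build a three-step chain with made-up logic for each Task class."""
    process = Process()
    process.add(Task("assign_id", "identity",
                     lambda r: {"id": f"urn:example:{r['filename']}"}))
    process.add(Task("add_context", "context_transformation",
                     lambda r: {"platform": "example-platform"}),
                after=["assign_id"])
    process.add(Task("publish", "dissemination",
                     lambda r: {"published": True}),
                after=["add_context"])
    return process
```

Calling `build_example_process().run({"filename": "a.nc"})` walks the three Tasks in dependency order and returns the accumulated record. Keeping the dependency map as plain data is what would let a Process itself be stored and queried alongside the records it produces.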

As data flows through each Task within the triggered Process, data produced by the user-provided logic is converted into defined graph shapes and stored as denormalized JSON-LD files on a hierarchical object store for immediate use via API or S3 web crawler. AWS Neptune is leveraged as a fast inference and access layer for all patterns and templates to facilitate discovery. Records can be pulled in on an experiment-specific basis (up to the entire set of known records) for targeted training of various models, including domain-specific prediction or classification models, LLMs, hybrid models, and Digital Twin models.
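
As a rough illustration of the storage step, the sketch below turns a Task's output into a denormalized JSON-LD document and derives a hierarchical object key for it. Everything here is assumed for the example: the `https://example.org/vocab#` vocabulary, the `records/<type>/<id>.jsonld` key layout, and the function names are placeholders, not the system's actual graph shapes or bucket structure.

```python
import json

# Hypothetical JSON-LD context; the real system's vocabulary is not given in the abstract.
CONTEXT = {"@vocab": "https://example.org/vocab#"}


def to_jsonld(record_id: str, record_type: str, fields: dict) -> dict:
    """Wrap task output in a JSON-LD node with an identifier and a type.

    'Denormalized' is modeled here by embedding related nodes inline in
    `fields` rather than linking to them by reference only.
    """
    doc = {"@context": CONTEXT, "@id": record_id, "@type": record_type}
    doc.update(fields)
    return doc


def object_key(record_type: str, record_id: str) -> str:
    """Derive a hierarchical object-store key (assumed layout)."""
    local_name = record_id.rsplit(":", 1)[-1]
    return f"records/{record_type}/{local_name}.jsonld"


def serialize(doc: dict) -> str:
    """Serialize deterministically so repeated runs yield identical files."""
    return json.dumps(doc, sort_keys=True, indent=2)


example_doc = to_jsonld(
    "urn:example:granule-001",
    "Granule",
    {
        "title": "Example granule",
        # Related node embedded inline instead of referenced by @id alone.
        "platform": {"@id": "urn:example:platform-1",
                     "@type": "Platform",
                     "name": "example-platform"},
    },
)
example_key = object_key("Granule", "urn:example:granule-001")
```

The serialized document could then be written under `example_key` with any object-store client. A key prefix per record type is one simple way to make the files easy to list and crawl.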

The system API allows users to construct any type of process, and users can define both archive-side and access-side processes. Processes support a wide range of workflow patterns, including online, streaming, aggregation, accumulation, and human-in-the-middle. This allows users to create and share processes that can be composed into tailored user interfaces for specific user communities. Because processes are encoded as interoperable data, they may be used as digital threads to contextualize processing and serve as a searchable dimension of the system itself. Eventually, the goal is to enable NOAA to share models and techniques and to support dynamic process creation by machines. The overall intent is to support a NOAA node in a global step change to the geoverse, eliminating data silos while enabling NOAA to meet users where they want to be met.