Preservation Use Cases

From Earth Science Information Partners (ESIP)

Introduction

There appears to be some confusion over what use cases should do. This page is intended to reduce the confusion.

There are probably three different definitions of what a use case is:

  • A short (one or two paragraph) "story" that motivates a content discussion.
  • A UML object that shows how an external actor interacts with a system. The intent of this kind of use case is to determine the functional specifications of the system, including what happens in exception handling.
  • An approach to developing a system. In this approach, use case scenarios drive the description of the system in UML development. The use cases help identify the objects and actions the system must have to carry out the scenario. In the formal UML documentation, this is known as "use-case driven development."

Based on discussion at the GEODATA 2011 meeting in early March, the use cases should probably attempt to cover the "full life cycle" of data, starting with data collection and going all the way through the curation activities that follow once an archive (or archives) has assumed responsibility for taking care of the collection.

Use Case Purpose 1. Ensuring Coverage

The first definition of use cases may be intended to ensure that a potential system will have coverage of a particular area. For Earth science data collections, we probably need coverage of the following topics:

  • Kind of atomic objects in the collection

[Physical Objects, Digital Files, Databases]

  • Time Range

[Short (e.g. field experimment), Moderate (e.g. 1 year or life of a single satellite), Long (e.g. multi-decadae), Climate (centuries)]

  • Time Sampling

[Irregular-Sporadic, Intermittent, Periodic, Continuous]

  • Horizontal Coverage

[Point, Region, Global]

  • Horizontal Sampling

[Point, Regular Network or Grid, Irregular Network]

  • Vertical Coverage

[Surface, Layer (Atmosphere or Ocean), Troposphere, Stratosphere, Entire Vertical Range]

  • Parameters

[Key Parameters, such as "Solar Irradiance", "Ozone Concentration", etc.]

  • Sources

[Single Source (e.g. 1 satellite instrument, 1 surface station), Multi-source (e.g. several satellite instruments, surface network)]

  • Production Pattern

[One-off, Ad hoc Exploratory (e.g. Graphical workflow engine driven by human interaction), Small Scale - no versioning, Moderate Scale - small number of processes, Industrial - highly automated with versioning allowed, Operational - highly automated with no versioning]

The intent of this classification is NOT to replicate a complete metadata specification, but to give a small number of choices for each category that will characterize the use case.
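
As a rough illustration of how such a characterization could be recorded and checked, the sketch below stores the choices listed above in a lookup table and validates one profile against it. The names `CHOICES`, `UseCaseProfile`, and `problems` are hypothetical and not part of any ESIP tool. Two simplifications are assumptions: "Layer (Atmosphere or Ocean)" is shortened to "Layer", and the production patterns are reduced to their leading labels. Parameters are left free-form because the list above gives only examples.

```python
from dataclasses import dataclass, replace

# Allowed choices per category, copied from the list above.
# The labels are shortened for illustration (an assumption of this sketch).
CHOICES = {
    "atomic_objects": {"Physical Objects", "Digital Files", "Databases"},
    "time_range": {"Short", "Moderate", "Long", "Climate"},
    "time_sampling": {"Irregular-Sporadic", "Intermittent", "Periodic", "Continuous"},
    "horizontal_coverage": {"Point", "Region", "Global"},
    "horizontal_sampling": {"Point", "Regular Network or Grid", "Irregular Network"},
    "vertical_coverage": {"Surface", "Layer", "Troposphere", "Stratosphere",
                          "Entire Vertical Range"},
    "sources": {"Single Source", "Multi-source"},
    "production_pattern": {"One-off", "Ad hoc Exploratory", "Small Scale",
                           "Moderate Scale", "Industrial", "Operational"},
}


@dataclass(frozen=True)
class UseCaseProfile:
    """Hypothetical record holding one choice per category."""
    name: str
    atomic_objects: str
    time_range: str
    time_sampling: str
    horizontal_coverage: str
    horizontal_sampling: str
    vertical_coverage: str
    parameters: tuple  # free-form key parameters, not validated
    sources: str
    production_pattern: str

    def problems(self):
        """Return the categories whose value is not one of the listed choices."""
        return [field for field, allowed in CHOICES.items()
                if getattr(self, field) not in allowed]


# A profile for a collection of glacier photo negatives.
glacier_photos = UseCaseProfile(
    name="Photographs of Glaciers",
    atomic_objects="Physical Objects",
    time_range="Climate",
    time_sampling="Irregular-Sporadic",
    horizontal_coverage="Region",
    horizontal_sampling="Irregular Network",
    vertical_coverage="Surface",
    parameters=("Reflected radiance from glaciers and rocks",),
    sources="Multi-source",
    production_pattern="Small Scale",
)

# A deliberately invalid copy, to show what the check reports.
bad_profile = replace(glacier_photos, time_range="Forever")

print(glacier_photos.problems())
print(bad_profile.problems())
```

Keeping the choices in a plain table rather than in code makes it easy to adjust the categories as the classification evolves.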

Example 1: Photographs of Glaciers

This collection has the following characteristics:

  • Kind of atomic objects in the collection: Physical Objects - negatives on glass plates or film
  • Time Range: Climate - negatives come from the time period 1880 through 2008
  • Time Sampling: Irregular-Sporadic - depends on schedule of expeditions
  • Horizontal Coverage: Region - glaciers in NW North America or NE North America
  • Horizontal Sampling: Irregular Network - glaciers are irregular features
  • Vertical Coverage: Surface - primarily, although the pictures may include a bit of sky
  • Parameters: Reflected radiance from glaciers and rocks
  • Sources: Multi-source - Many cameras set up by many photographers
  • Production Pattern: Small scale job shop - each negative developed by hand

Example 2: Digital Photographs of Glaciers

This collection has the following characteristics:

  • Kind of atomic objects in the collection: Digital Files - one file for each negative or print
  • Time Range: Climate - negatives come from the time period 1880 through 2008
  • Time Sampling: Irregular-Sporadic - depends on schedule of expeditions
  • Horizontal Coverage: Region - glaciers in NW North America or NE North America
  • Horizontal Sampling: Irregular Network - glaciers are irregular features
  • Vertical Coverage: Surface - primarily, although the pictures may include a bit of sky
  • Parameters: Reflected radiance from glaciers and rocks
  • Sources: Multi-source - Many cameras set up by many photographers
  • Production Pattern: Small scale job shop - each negative scanned by contractor
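
The physical and digital glacier photo collections share most of their characterization. A minimal sketch, assuming each characterization is held as a plain dictionary keyed by category name, makes the differences explicit. The function name `changed_categories` is hypothetical.

```python
# Characterizations copied from the two glacier photo examples above.
physical = {
    "Kind of atomic objects in the collection":
        "Physical Objects - negatives on glass plates or film",
    "Time Range": "Climate - negatives come from the time period 1880 through 2008",
    "Time Sampling": "Irregular-Sporadic - depends on schedule of expeditions",
    "Horizontal Coverage": "Region - glaciers in NW North America or NE North America",
    "Horizontal Sampling": "Irregular Network - glaciers are irregular features",
    "Vertical Coverage":
        "Surface - primarily, although the pictures may include a bit of sky",
    "Parameters": "Reflected radiance from glaciers and rocks",
    "Sources": "Multi-source - Many cameras set up by many photographers",
    "Production Pattern": "Small scale job shop - each negative developed by hand",
}

digital = {
    "Kind of atomic objects in the collection":
        "Digital Files - one file for each negative or print",
    "Time Range": "Climate - negatives come from the time period 1880 through 2008",
    "Time Sampling": "Irregular-Sporadic - depends on schedule of expeditions",
    "Horizontal Coverage": "Region - glaciers in NW North America or NE North America",
    "Horizontal Sampling": "Irregular Network - glaciers are irregular features",
    "Vertical Coverage":
        "Surface - primarily, although the pictures may include a bit of sky",
    "Parameters": "Reflected radiance from glaciers and rocks",
    "Sources": "Multi-source - Many cameras set up by many photographers",
    "Production Pattern": "Small scale job shop - each negative scanned by contractor",
}


def changed_categories(before, after):
    """List the categories whose description differs between two collections."""
    return [category for category in before if before[category] != after[category]]


changed = changed_categories(physical, digital)
print(changed)
```

Only the kind of atomic object and the production pattern change between the two collections; the scientific coverage is the same.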

Example 3: Multi-satellite Record of Solar Constant

This collection has the following characteristics:

  • Kind of atomic objects in the collection: Digital Files
  • Time Range: Long - 30 years or more
  • Time Sampling: Continuous (individual measurements cover about 1 minute)
  • Horizontal Coverage: Point - above top of the atmosphere
  • Horizontal Sampling: Point - don't need more than one point
  • Vertical Coverage: Top of Atmosphere
  • Parameters: Solar Irradiance
  • Sources: Multi-source - six or more satellites (e.g. ACRIM, Hickey-Frieden, ERBS, SORCE)
  • Production Pattern: Single automated production that subsets files and produces a single time series with versioning

Use cases for this purpose should probably identify the number of objects covered by a citation and should show how the citation would appear.

Use Case Purpose 2. Identifying Objects

In the second kind of use case, the intent is to create a UML model of a system to ensure that the objects in a system model are complete and that the system will have a proper set of behaviors. This kind of use case is typically done by producing a Unified Modeling Language (UML) Web site. Such a Web site has been provided for a UML model of the Digital Glacier Photo Collection by B. R. Barkstrom. The Web site includes four use case scenarios:

  • Development of a collection of glacier photo negatives and prints, together with transfer of these items and metadata to a collection of photographic archives
  • Transformational Migration of the physical collection to a digital collection of images that are made available to users on a Web site (similar to NSIDC's Glacier Photo Collection, although probably not identical to it)
  • Use of the Digital Photo Collection by a high school student to create a report that includes two glacier photo images and paraphrased text scraped from the Web site documentation
  • Use of the Digital Photo Collection by a scientific research team to determine quantitative changes in glacier area (and perhaps ice volume)

The UML model includes a great deal more detail than the use case "short stories" in the first kind of use case description.

Use Case Purpose 3. Testing Conformance of a System Design

The second kind of use case, which produces a UML model, provides a fairly detailed description of the objects and object behavior that a production, distribution, and preservation system should include. In the third kind of use case, the UML model is carried through to the design and implementation of a system or a simulation. Here the detail has to be realistic enough to ensure that the system's data structures and operations work according to specification. Thus, this kind of UML model allows conformance testing of a system design.

Note that since a UML model can treat both data producers and data users (as well as archivists and administrators) as multi-threaded agents in a simulation, it is possible to determine a number of production behaviors and other behaviors of the system.
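
To make the idea of multi-threaded agents concrete, here is a toy sketch in which data producers and data users run as separate threads and exchange files through a shared queue. It is not derived from the UML model mentioned above. The function names, the queue standing in for the system, and the agent counts are all assumptions made for illustration.

```python
import queue
import threading


def producer(deliveries, producer_id, n_files):
    """A data producer agent: delivers n_files items to the system."""
    for i in range(n_files):
        deliveries.put(f"producer{producer_id}-file{i}")


def user(deliveries, retrieved, lock):
    """A data user agent: retrieves items until it sees the None sentinel."""
    while True:
        item = deliveries.get()
        if item is None:
            break
        with lock:
            retrieved.append(item)


def simulate(n_producers=3, n_users=2, files_per_producer=5):
    """Run producers and users concurrently and return what the users retrieved."""
    deliveries = queue.Queue()  # hypothetical stand-in for the system
    retrieved = []
    lock = threading.Lock()

    users = [threading.Thread(target=user, args=(deliveries, retrieved, lock))
             for _ in range(n_users)]
    producers = [threading.Thread(target=producer,
                                  args=(deliveries, p, files_per_producer))
                 for p in range(n_producers)]

    for thread in users + producers:
        thread.start()
    for thread in producers:
        thread.join()
    # One sentinel per user tells each user thread to stop.
    for _ in users:
        deliveries.put(None)
    for thread in users:
        thread.join()
    return retrieved


if __name__ == "__main__":
    files = simulate()
    print(len(files), "files retrieved")
```

A fuller simulation would add archivist and administrator agents and record timing, but even this small version lets one check a basic conformance property: every file delivered by a producer is retrieved exactly once.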