Preservation Use Cases

From Earth Science Information Partners (ESIP)
Revision as of 07:53, May 15, 2011 by Brbarkstrom (talk | contribs)

Introduction

There appears to be some confusion over what use cases should do. This page is intended to reduce the confusion.

There are probably three different definitions of what a use case is:

  • A short (one or two paragraph) "story" that motivates a content discussion.
  • A UML object that shows how an external actor interacts with a system.

The intent of this kind of use case is to determine the functional specifications of the system, including what happens in exception handling.

  • An approach to developing a system.

In this approach, use case scenarios drive the description of the system in UML development. The use cases are intended to help identify objects and actions the system must have to fulfill the intended actions in the scenario. In the formal UML documentation, this is known as "use-case driven development."

Based on discussion at the GEODATA 2011 meeting in early March, the use case coverage should probably attempt to include the "full life cycle" of data, starting with data collection and going all the way through curation activities after an archive (or archives) has assumed responsibility for taking care of the collection.

Use Case Purpose 1. Ensuring Coverage

The first definition of use cases may be intended to ensure that a potential system will have coverage of a particular area. For Earth science data collections, we probably need to cover the following topics:

  • Kind of atomic objects in the collection

[Physical Objects, Digital Files, Databases]

  • Time Range

[Short (e.g. field experimment), Moderate (e.g. 1 year or life of a single satellite), Long (e.g. multi-decadae), Climate (centuries)]

  • Time Sampling

[Irregular-Sporadic, Intermittent, Periodic, Continuous]

  • Horizontal Coverage

[Point, Region, Global]

  • Horizontal Sampling

[Point, Regular Network or Grid, Irregular Network]

  • Vertical Coverage

[Surface, Layer (Atmosphere or Ocean), Troposphere, Stratosphere, Entire Vertical Range]

  • Parameters

[Key Parameters, such as "Solar Irradiance", "Ozone Concentration", etc.]

  • Sources

[Single Source (e.g. 1 satellite instrument, 1 surface station), Multi-source (e.g. several satellite instruments, surface network)]

  • Production Pattern

[One-off, Ad hoc Exploratory (e.g. Graphical workflow engine driven by human interaction), Small Scale - no versioning, Moderate Scale - small number of processes, Industrial - highly automated with versioning allowed, Operational - highly automated with no versioning]

In addition, we probably need four other fields:

  • Collection Name

[A variable length character string]

  • Producer

[An individual, a name for a collection of individuals, or a name for an organization]

  • Owner

[An individual, a name for a collection of individuals, a name for an organization, or the name of a government agency - whichever party has been delegated responsibility for long-term curation or permanent archival custody]

  • Number of Atomic Entities in Collection

[A number]

The intent of this classification is NOT to replicate a complete metadata specification, but to give a small number of choices for each category that will characterize the use case in order to ensure reasonable coverage of possible kinds of Earth science data collections.

Example 1: Photographs of Glaciers

This collection is characterized as follows:

  • Collection Name: Glacier Photograph Collection
  • Producer: Various Photographers
  • Owner: Photographic Archive
  • Number of atomic entities in the collection: 20,000
  • Kind of atomic objects in the collection: Physical Objects - negatives on glass plates or film
  • Time Range: Climate - negatives come from the time period 1880 through 2008
  • Time Sampling: Irregular-Sporadic - depends on schedule of expeditions
  • Horizontal Coverage: Region - glaciers in NW North America or NE North America
  • Horizontal Sampling: Irregular Network - glaciers are irregular features
  • Vertical Coverage: Surface - primarily, although the pictures may include a bit of sky
  • Parameters: Reflected radiance from glaciers and rocks
  • Sources: Multi-source - Many cameras set up by many photographers
  • Production Pattern: Small scale job shop - each negative developed by hand

Example 2: Digital Photographs of Glaciers

This collection is characterized as follows:

  • Collection Name: Glacier Digital Photograph Collection
  • Producer: NSIDC
  • Owner: U.S. Government
  • Number of atomic entities in the collection: 20,000
  • Kind of atomic objects in the collection: Digital Files - one file for each negative or print
  • Time Range: Climate - negatives come from the time period 1880 through 2008
  • Time Sampling: Irregular-Sporadic - depends on schedule of expeditions
  • Horizontal Coverage: Region - glaciers in NW North America or NE North America
  • Horizontal Sampling: Irregular Network - glaciers are irregular features
  • Vertical Coverage: Surface - primarily, although the pictures may include a bit of sky
  • Parameters: Reflected radiance from glaciers and rocks
  • Sources: Multi-source - Many cameras set up by many photographers
  • Production Pattern: Small scale job shop - each negative scanned by contractor

Example 3: Multi-satellite Record of Solar Constant

This collection is characterized as follows:

  • Collection Name: Multi-satellite Solar Constant Record
  • Producer: Fröhlich & Lean
  • Owner: PMOD (Physikalisch-Meteorologisches Observatorium Davos)
  • Number of atomic entities in the collection: 3
  • Kind of atomic objects in the collection: Digital Files
  • Time Range: Long - 30 years or more
  • Time Sampling: Continuous (individual measurements cover about 1 minute)
  • Horizontal Coverage: Point - above top of the atmosphere
  • Horizontal Sampling: Point - don't need more than one point
  • Vertical Coverage: Top of Atmosphere
  • Parameters: Solar Irradiance
  • Sources: Six or more satellites (ACRIM, Hickey-Frieden, ERBS, SORCE)
  • Production Pattern: Single automated production that subsets files and produces a single time series with versioning
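As an illustration of how this classification supports coverage checking, the following is a minimal Python sketch that compares the three examples above against the allowed choices for three of the categories and reports the choices that no example uses yet. The names CATEGORY_CHOICES, EXAMPLES, and uncovered_choices are hypothetical, and only the leading category value of each example entry is recorded (the explanatory text after the " - " is dropped).

```python
# Allowed choices, copied from the category lists above (three categories only).
CATEGORY_CHOICES = {
    "Kind of atomic objects": ["Physical Objects", "Digital Files", "Databases"],
    "Time Range": ["Short", "Moderate", "Long", "Climate"],
    "Horizontal Coverage": ["Point", "Region", "Global"],
}

# Leading category value of each example above, keyed by collection name.
EXAMPLES = {
    "Glacier Photograph Collection": {
        "Kind of atomic objects": "Physical Objects",
        "Time Range": "Climate",
        "Horizontal Coverage": "Region",
    },
    "Glacier Digital Photograph Collection": {
        "Kind of atomic objects": "Digital Files",
        "Time Range": "Climate",
        "Horizontal Coverage": "Region",
    },
    "Multi-satellite Solar Constant Record": {
        "Kind of atomic objects": "Digital Files",
        "Time Range": "Long",
        "Horizontal Coverage": "Point",
    },
}


def uncovered_choices(choices, examples):
    """Return, per category, the allowed values that no example uses."""
    gaps = {}
    for category, allowed in choices.items():
        used = {record[category] for record in examples.values()}
        gaps[category] = [value for value in allowed if value not in used]
    return gaps


if __name__ == "__main__":
    for category, missing in uncovered_choices(CATEGORY_CHOICES, EXAMPLES).items():
        print(f"{category}: not yet covered -> {missing}")
```

Under these assumptions, the check suggests that additional use cases would be needed for database collections, for short and moderate time ranges, and for global coverage.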

Use cases for this purpose should probably identify the number of objects covered by a citation and should show how the citation would appear.

Note that these categories can be kept in a tab-delimited text file for purposes of reference.
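A tab-delimited file of this kind might be written and read back as in the following sketch, which uses Python's standard csv module. The column selection, the names FIELDS, ROWS, to_tsv, and from_tsv, and the choice to hold the text in memory rather than in a named file are all assumptions made for illustration; the values come from the three examples above.

```python
import csv
import io

# A subset of the category fields, used as the header row.
FIELDS = [
    "Collection Name",
    "Number of Atomic Entities",
    "Kind of Atomic Objects",
    "Time Range",
    "Time Sampling",
    "Horizontal Coverage",
]

# One row per example collection, in the same order as FIELDS.
ROWS = [
    ["Glacier Photograph Collection", 20000, "Physical Objects",
     "Climate", "Irregular-Sporadic", "Region"],
    ["Glacier Digital Photograph Collection", 20000, "Digital Files",
     "Climate", "Irregular-Sporadic", "Region"],
    ["Multi-satellite Solar Constant Record", 3, "Digital Files",
     "Long", "Continuous", "Point"],
]


def to_tsv(fields, rows):
    """Serialize a header and rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buffer.getvalue()


def from_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    text = to_tsv(FIELDS, ROWS)
    print(text)
    for record in from_tsv(text):
        print(record["Collection Name"], "->", record["Time Range"])
```

Note that values read back from the file are character strings, so a field such as the number of atomic entities would need to be converted before any arithmetic.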

Use Case Purpose 2. Identifying Objects

In the second kind of use case, the intent is to create a UML model of a system to ensure that the objects in a system model are complete and that the system will have a proper set of behaviors. This kind of use case is typically done by producing a Unified Modeling Language (UML) Web site. Such a Web site has been provided for a UML model of the Digital Glacier Photo Collection by B. R. Barkstrom. The Web site includes four use case scenarios:

  • Development of a collection of glacier photo negatives and prints, together with transfer of these items and metadata to a collection of photographic archives
  • Transformational Migration of the physical collection to a digital collection of images that are made available to users in a Web site (similar to NSIDC's Glacier Photo Collection - although probably not identical with it)
  • Use of the Digital Photo Collection by a high school student to create a report that includes two glacier photo images and paraphrased text scraped from the Web site documentation
  • Use of the Digital Photo Collection by a scientific research team to determine quantitative changes in glacier area (and perhaps ice volume)

The UML model includes a great deal more detail than the use case "short stories" in the first kind of use case description.

Use Case Purpose 3. Testing Conformance of a System Design

The second kind of use case, which produces a UML model, provides a fairly detailed description of the objects and object behavior that a production, distribution, and preservation system should include. In the third use case description, the UML model is carried through to design and implement a system or a simulation. In this case, the detail has to be realistic enough to ensure that the system data structures and operations work according to specification. Thus, this kind of UML model would allow conformance testing of a system design.

Note that since a UML model can treat both data producers and data users (as well as archivists and administrators) as multi-threaded agents in a simulation, it is possible to use such a simulation to determine a number of production and system behaviors.

Use Cases for Data Citations

One important kind of use case involves data citations. There are three generic purposes for data citation:

  • Giving credit to producers
  • Providing an auditable reference that will allow statistical verification of provenance procedures and due diligence in maintaining a collection
  • Providing references to material that would be required to replicate data

Note that these three purposes have different levels of precision. Giving credit can be quite generic and unspecific. Auditable reference will require information about the number of entities in a collection and about the preservation context, with enough detail to pull out a sample of entities. Replication requires an exhaustive list of objects and the ability to obtain such artifacts as source code, ancillary data (such as calibration data), and validation data (such as field experiment data used in intercomparisons with a particular data set).
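To make the auditable-reference case concrete, the following Python sketch shows one way a sample of entities might be drawn when only the number of entities in the collection is known. It assumes that entities can be addressed by a 1-based sequence number, which the text above does not specify; the function name audit_sample, the sample size, and the seed are hypothetical.

```python
import random


def audit_sample(entity_count, sample_size, seed):
    """Pick a reproducible sample of 1-based entity numbers for an audit."""
    if not 0 <= sample_size <= entity_count:
        raise ValueError("sample_size must be between 0 and entity_count")
    # A fixed seed lets a second auditor re-draw exactly the same sample.
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, entity_count + 1), sample_size))


if __name__ == "__main__":
    # 20,000 is the entity count of the glacier photograph examples;
    # the sample size of 25 and the seed are illustrative only.
    print(audit_sample(20000, 25, seed=2011))
```

Recording the seed and the sample size alongside the citation is one possible way to let a later audit repeat the same draw.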

It would probably be helpful to produce a rough categorization of the one- or two-paragraph kind of use case story with the following fields:

  • Use Case Name: [a variable length character string]
  • Collection Name: [a variable length character string that references a collection name in the previous catalog]
  • Number of Objects Cited: [a number]
  • Purpose: [Giving Credit, Auditable Verification, Replication]
  • Story: [a pointer to the story]

For example, the high school report scenario might receive the following categorization:

  • Use Case Name: High school report on glacier changes
  • Collection Name: Glacier Digital Photograph Collection
  • Number of Objects Cited: 2
  • Purpose: Giving Credit
  • Story: http:...

The story in this example would be something like:

The high school report on glacier changes refers to the activity in which a high school student accesses the Glacier Digital Photograph Collection to download two digital images that will be included in a report on visible changes in glaciers over the last century. The report is a class assignment from a class in environmental science.
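The citation use case fields listed above could be held in a small record type, as in the following Python sketch. The names Purpose, CitationUseCase, KNOWN_COLLECTIONS, and check are hypothetical, and the two consistency rules (the collection name must appear in the catalog, and at least one object must be cited) are assumptions rather than requirements stated on this page. The story pointer is left elided, as it is in the example categorization.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    """The three generic purposes for data citation."""
    GIVING_CREDIT = "Giving Credit"
    AUDITABLE_VERIFICATION = "Auditable Verification"
    REPLICATION = "Replication"


@dataclass(frozen=True)
class CitationUseCase:
    use_case_name: str
    collection_name: str
    objects_cited: int
    purpose: Purpose
    story: str  # a pointer to the story, not the story text itself


# Collection names from the three coverage examples.
KNOWN_COLLECTIONS = {
    "Glacier Photograph Collection",
    "Glacier Digital Photograph Collection",
    "Multi-satellite Solar Constant Record",
}


def check(use_case, known_collections):
    """Return a list of problems; an empty list means the record is consistent."""
    problems = []
    if use_case.collection_name not in known_collections:
        problems.append(f"unknown collection: {use_case.collection_name}")
    if use_case.objects_cited < 1:
        problems.append("a citation should cover at least one object")
    return problems


HIGH_SCHOOL_REPORT = CitationUseCase(
    use_case_name="High school report on glacier changes",
    collection_name="Glacier Digital Photograph Collection",
    objects_cited=2,
    purpose=Purpose.GIVING_CREDIT,
    story="http:...",  # elided in the source; not filled in here
)

if __name__ == "__main__":
    print(check(HIGH_SCHOOL_REPORT, KNOWN_COLLECTIONS))
```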