US Federal RORs

From Earth Science Information Partners (ESIP)

The U.S. Federal Government is a major participant in the global research community, and keeping track of the research it supports is an important task. The Wikipedia List of United States research and development agencies includes five independent agencies, thirteen cabinet level agencies, and four multi-agency initiatives. Most of these include multiple layers of centers, divisions, services, institutes, directorates, and other organizations, each of which carries out, funds, and oversees research in almost every conceivable discipline.

The Research Organization Registry (ROR) is a community-led project that aims to develop an open, sustainable, usable, and unique identifier for every research organization in the world. It is supported by several large identifier infrastructure operators (Crossref and DataCite) as well as the California Digital Library of the University of California. ROR recently released the first version of a registry based on GRID identifiers donated to the effort by Digital Science. Until the ROR community takes over curation and management of the Registry, its contents will continue to be the GRID data. The GRID database is updated several times a year according to open policies and is publicly available at no cost. Dumps of the ROR data are also available.

Adopting any new identifier system, even when its benefits are well known, is a significant challenge. The ROR community roadmap identifies a number of product, policy, and community development tasks, including raising awareness among community members. The goal of this blog post is to raise ROR awareness among researchers affiliated with the U.S. Federal Government. How can these researchers use ROR, and what future work might help facilitate adoption in the U.S. Federal research community?

Finding U.S. Federal RORs

Characterizing the granularity of ROR identifiers is an important step towards using these identifiers effectively in the U.S. Federal context. In the academic setting, the goal of ROR is generally to identify organizations at the university or college level, i.e., not individual academic departments. It is not immediately clear how this translates to the public sector. Given the deep hierarchy of Federal agencies and departments, finding Federal organizations can be a challenge. As mentioned above, there are roughly 20 “department level” entities in the U.S. Government that do research and development. Does this mean that 20 RORs will cover the whole U.S. Government? That seems like a very small number, perhaps too small to be very useful. Rather than try to answer this question theoretically, we start with an empirical characterization of the current ROR data as a baseline.

The ROR data includes two fields that may provide a starting point: organization country and type. The United States makes up 31% of the data (29,686 RORs), and 1,247 of those are identified as Government types. Many of these organizations are state or regional entities, which are out of scope for this work, so we need a better search strategy. The ROR data also includes links to organization homepages, and many U.S. Federal entities are in the .gov or .mil domains. As a second step, we therefore examined the homepage links to identify organizations with .gov or .mil domains. The .gov domain still includes many state and regional entities, so some manual selection is still required. In addition, the domain name does not always reveal the parent agency. For example, many U.S. National Labs have specific domain names that do not include agency abbreviations, e.g. Lawrence Berkeley National Laboratory (http://www.lbl.gov/), which is part of the Department of Energy Office of Science.
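The search strategy described above can be sketched in a few lines of Python. This is only an illustration, not the script used for this post. It assumes each record in a ROR data dump is a dictionary with "name", "types", "links", and "country" (containing "country_code") keys; check these field names against the dump you actually download. The function names and the sample records are hypothetical, and the "types" values in the samples are made up for the example rather than copied from the registry.

```python
from urllib.parse import urlparse

# Top-level domains used by many U.S. Federal organizations.
FEDERAL_SUFFIXES = (".gov", ".mil")


def homepage_host(url):
    """Return the lowercase host name of a homepage URL, or '' if there is none."""
    return (urlparse(url).hostname or "").lower()


def federal_candidates(records, require_government=False):
    """Return names of U.S. records whose homepage is in .gov or .mil.

    The field names used here are assumptions about the dump layout.
    The result is a list of candidates only: state and regional .gov
    sites pass this filter and still have to be removed by hand.
    """
    names = []
    for record in records:
        if record.get("country", {}).get("country_code") != "US":
            continue
        if require_government and "Government" not in record.get("types", []):
            continue
        hosts = [homepage_host(link) for link in record.get("links", [])]
        if any(host.endswith(FEDERAL_SUFFIXES) for host in hosts):
            names.append(record["name"])
    return names


# Hypothetical sample records for illustration only.
SAMPLE_RECORDS = [
    {"name": "Lawrence Berkeley National Laboratory", "types": ["Facility"],
     "links": ["http://www.lbl.gov/"], "country": {"country_code": "US"}},
    {"name": "Example State Health Department", "types": ["Government"],
     "links": ["https://health.example.gov/"], "country": {"country_code": "US"}},
    {"name": "Example University", "types": ["Education"],
     "links": ["https://www.example.edu/"], "country": {"country_code": "US"}},
    {"name": "Example Ministry", "types": ["Government"],
     "links": ["https://www.example.gov.uk/"], "country": {"country_code": "GB"}},
]

candidates = federal_candidates(SAMPLE_RECORDS)
government_only = federal_candidates(SAMPLE_RECORDS, require_government=True)
```

In this sketch the type filter is optional because an organization with a .gov homepage is not necessarily recorded with the Government type. The hypothetical state agency passes both filters, which is why the manual selection step is still needed.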

Table 1 shows the number of RORs for U.S. Federal organizations discovered using the domains in the organizations' homepage links together with manual exploration. The largest number of RORs identify regional and state Veterans Affairs Medical Centers (e.g. VA Eastern Colorado Health Care System). Others include five cabinet level departments, ~10 National Labs, over 100 Centers, ~20 Services, and many other organizations. The table also includes the highest-level RORs for each organization in parentheses.