Preprints of the Metadiversity Conference Proceedings

  Session 4: Building the Infrastructure

The Metadata Challenge for NBII

ANNE FRONDORF, Program Manager, National Biological Information Infrastructure, U.S. Geological Survey’s Biological Resources Division

ABSTRACT

This presentation will briefly review the development of the National Biological Information Infrastructure (NBII) to date and describe key recommendations of the PCAST Teaming with Life report relating to development of a "Next Generation" NBII. Some of the challenges involved in developing and implementing metadata standards and tools for the NBII include: the need to address a very wide variety of data types, information products, and analysis tools within the NBII infrastructure; the need to provide metadata approaches that help link the biological sciences/biodiversity conservation community with other related communities (e.g., geospatial data and library communities); and the need to develop approaches and tools that not only lead to production of high-quality metadata but that also are understood and accepted by those who will need to enter the metadata.

Today I want to summarize briefly the work to date in building what is currently the National Biological Information Infrastructure (NBII). Then I will look at some of the major infrastructure-related challenges facing us, particularly as we work toward the goal of the next-generation NBII described in the PCAST report.

Part of the charge given to the original National Biological Survey–when it was first created by Secretary Babbitt, before our merger with the U.S. Geological Survey–was to help create a national partnership for sharing biological information. That idea of information sharing has been the basis of what we have tried to do to date in NBII.

The philosophy behind NBII is to help build a distributed federation of biological data and information. The standards, the policies, the rules by which we would all agree to work together–these are the underpinnings that make it possible to collaborate in a distributed way and allow us to have discovery, retrieval, and integration of data across different sources. Perhaps most importantly, they also allow for the application of biological data and information to the real questions on which we are focusing.

In this talk, I will focus on two areas of the NBII program that are most pertinent to the reason we are meeting here this week. The first area is the diversity of the content, because obviously that diversity of content has significant implications for how we put the infrastructure together–particularly on the metadata side. It also has implications for how we look at metadata standards, tools, techniques, and approaches.

The second area is the ongoing NBII commitment to link with and build on existing, parallel, infrastructure efforts in other communities. You have heard about several of these efforts–both at the national and international levels–at the meeting this week. The NBII focus in this regard has always been to help create a bridge between other infrastructure efforts (which represent other views and other communities) and the biological sciences community. That has been the philosophical commitment that we have had in working on the NBII.

Building the NBII

Our approach has been to be very inclusive in terms of looking at the content of NBII, which, of course, means looking at databases, data sets, a variety of information products, and analysis tools to use on data. Some of this NBII content comes from USGS biologists, who are out on the ground doing biology. Some comes from our many partner agencies and organizations outside the USGS.

The USGS North American Breeding Bird Survey is an example of this diversity. This is a program run by our Patuxent Wildlife Research Center. This program has very valuable long-term data sets. It includes over 35 years’ worth of data on over 400 different species of North American birds. The center obviously has very valuable data that it is trying to make more accessible through the NBII. The center also provides information products derived from the data sets, such as maps of bird distributions for individual species.

Involving State and Federal Agencies

Other key biodiversity data producers that we want very much to engage in this effort include state agencies–most importantly, the state fish and wildlife agencies and the state natural heritage programs, because those two groups together collect, maintain, and provide a very large amount of very valuable biodiversity data. These data are maintained in diverse formats that may vary from agency to agency and state to state. All this is a big challenge for us. How can we work collectively with those groups at the state level, which really are great developers and repositories of biodiversity data, and link them into this bigger effort?

Collections and Museums

Obviously natural history collections and museums are tremendous producers and maintainers of biodiversity data. What we have tried to do at NBII is work–wherever we have the opportunity to work–individually with particular museums or collections to help them be in a position to make more of their biological specimen data accessible, as well as help them look at these issues strategically. We are working with our partner agencies–including federal, state, and non-government agencies and organizations–to see what we collectively can do with the collections and museums to help put those institutions in a position where their data are more accessible, more interoperable, and more applicable for resource management decisions.

Directories

Directories of biodiversity or biological science experts can be another valuable information product if you think about federating them and making them accessible for different people to use. The Taxonomic Resources and Expertise Directory is one example. This is a cooperative project among the federal agencies that work together on the Integrated Taxonomic Information System (ITIS) and the Association of Systematics Collections (ASC). Basically, we have created an online directory of taxonomic specialists for North America that includes information on their areas of taxonomic and geographic specialty and that is available for people to find and use as a resource. We now have about 1,000 different specialists listed there. Experts can both enter and update their own entries online.

Analysis Tools

Another important part of the NBII content is tools for biological analysis. The idea here is that we want to be in a position not just to let people find data and information more easily but also to let them find and share analytical tools–such as ecological models or GIS applications. These are tools that people can use to get to the point at which they are actually answering a question or producing a result. We can use the federation to share the tools just as we use it to share the data. We have a component of NBII where we are working to make biological analysis tools available for people to find, share, and use. And we are working to populate this component with more tools and make it an important part of the NBII.

Partnerships to Build the Infrastructure

I have gone through these areas of content very quickly. But my goal was to emphasize that what we are talking about are communities–communities either of producers and suppliers of data and information and tools or communities of customers or users. Whatever we are talking about–state agencies or technical-report writers or modelers or some other group–these are all communities that we have to involve in building the infrastructure.

With regard to metadata, for example: We need to have approaches to metadata–whether you are talking about metadata for technical reports, metadata for analytical tools, metadata for data sets, metadata for information products, metadata for directories of experts–that cover all those aspects of the content. And we want to try to do this in a way that engages all the various communities and makes them want to be part of the broader endeavor. Only with the involvement of all communities will we be able to provide the common framework that knits all information together and makes it possible for someone to find museum specimen data, satellite imagery data, an ecological model, and a technical report that all relate to the very specific question that a person has to answer. That is the goal that we are trying to reach as we work with our partners to help build the NBII.

Linking NBII with Other Infrastructure Efforts

Now I want to discuss the importance to NBII of linking with and leveraging existing, parallel infrastructure efforts in other communities. I am going to use as an example just one infrastructure effort, the National Spatial Data Infrastructure (NSDI). Part of the content I just described consists of biological data sets that are spatially referenced (e.g., bird distributions). That is why making a linkage with spatial data initiatives has been such an important part of our focus. Again, our goal has always been to try to be a bridge between the biological sciences community and other communities (such as the geospatial data community). By supporting collaboration between the NSDI and the NBII, we can help make that bridge and, by doing so, hopefully build support for both the NSDI and the NBII.

The Federal Geographic Data Committee

Within the Federal Geographic Data Committee (FGDC), which coordinates the NSDI, we have established a Biological Data Working Group. This Working Group has members from several federal agencies, as well as some non-federal partners, who are working together under the structure of the FGDC to look at ways to increase the sharing of, and access to, biological spatial data.

We are also working through the FGDC standards process to try to build some federal data and metadata standards that we can use in the NBII. The first thing we have done in that regard is a metadata standard for NBII. We did this by developing a biological profile of the existing FGDC geospatial metadata content standard. This profile includes the entire FGDC geospatial metadata standard and adds some elements to it, so that hopefully it is more pertinent or meaningful to the biological sciences community. For example, we added some elements about nomenclature and taxonomy, which the spatial standard doesn't really cover, since that is not what it is set up to do. This is a good example of the kind of bridge we are trying to build to link biology back to the spatial data community.
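As a rough illustration of the profile idea described above, the following Python sketch models a base geospatial record and a biological profile that keeps every base element and adds taxonomic ones. All class and field names here are hypothetical and chosen only for illustration; they are not the element names of the FGDC content standard or of the NBII biological profile.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class GeospatialMetadata:
    """Stand-in for a base geospatial record (hypothetical field names)."""
    title: str
    abstract: str
    west: float
    east: float
    north: float
    south: float


@dataclass
class TaxonomicCoverage:
    """Hypothetical added element describing one taxon covered by a data set."""
    scientific_name: str
    rank: str
    common_names: List[str] = field(default_factory=list)


@dataclass
class BiologicalProfileMetadata(GeospatialMetadata):
    """The profile: every base element is inherited, taxonomy is added on top."""
    taxa: List[TaxonomicCoverage] = field(default_factory=list)


def base_view(record: GeospatialMetadata) -> dict:
    """Return only the base elements, as a tool unaware of the profile would see them."""
    base_fields = GeospatialMetadata.__dataclass_fields__
    return {k: v for k, v in asdict(record).items() if k in base_fields}


# Illustrative record; the values are made up for the example.
record = BiologicalProfileMetadata(
    title="Example bird survey routes",
    abstract="Illustrative record only.",
    west=-125.0, east=-66.0, north=49.0, south=24.0,
    taxa=[TaxonomicCoverage("Turdus migratorius", "species", ["American Robin"])],
)
```

Because the profile only adds elements, a record written against it can still be read by a tool that understands only the base elements, which is the bridging property the paragraph above describes.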

Clearinghouse

Another way we are working with the NSDI is through the Clearinghouse. We have an online NBII Metadata Clearinghouse, created along the same lines as the NSDI Clearinghouse and following the same procedures. We operate as a node of the NSDI. We have extended our Clearinghouse function a little to allow people who are looking for metadata and data sets to search on the additional biological metadata fields that we added in our biological profile. So, in a way, we have taken the NSDI and biologically enhanced it. To me, this is a very visible example of bridge-building between biology and the spatial data community.
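To make the idea of searching on the added biological fields concrete, here is a minimal Python sketch of a catalog search that filters on a spatial bounding box, on a taxon keyword, or on both. The record layout, the `search` function, and the sample records are all assumptions made for this example; they do not represent the actual clearinghouse software or its query protocol.

```python
def bbox_overlaps(a, b):
    """True if two boxes given as (west, south, east, north) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, bbox=None, taxon=None):
    """Return titles of records matching the spatial and/or taxonomic criteria."""
    hits = []
    for rec in records:
        # Spatial filter: available for every record, biological or not.
        if bbox is not None and not bbox_overlaps(rec["bbox"], bbox):
            continue
        # Taxonomic filter: uses the extra field; records without it never match.
        if taxon is not None:
            names = {t.lower() for t in rec.get("taxa", [])}
            if taxon.lower() not in names:
                continue
        hits.append(rec["title"])
    return hits


# Made-up catalog entries for illustration only.
catalog = [
    {"title": "Bird survey routes", "bbox": (-125, 24, -66, 49), "taxa": ["Aves"]},
    {"title": "State wetlands layer", "bbox": (-80, 37, -75, 40)},
    {"title": "Fish collection records", "bbox": (-90, 29, -80, 35),
     "taxa": ["Actinopterygii"]},
]

area = (-78, 38, -76, 39)
spatial_only = search(catalog, bbox=area)
spatial_and_taxon = search(catalog, bbox=area, taxon="aves")
```

A purely spatial query still works against every record, while the taxon filter narrows the result to records that carry the biological fields.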

Cooperative Funding Efforts

We also cooperate with the NSDI to help fund non-federal projects (with state agencies or universities) that are, again, helping to build the NSDI and NBII. This has been a very successful partnership–one that allows you to actually "see" some of those bridges being built. For example, I have seen instances in which a state fish and wildlife agency or a state heritage program, in order to make this project work, will join forces with an organization like a state’s Geographic Information Systems Council–two kinds of groups that might not normally have a lot of interaction with each other at the state level. By helping provide money and by looking for projects that link biology and spatial data, we have hopefully encouraged some groups at the state level to start making some connections.

Again, I want to emphasize that although I have used NSDI as an example of linking to other infrastructures, that is only one example. In fact, a huge part of the NBII effort involves looking for ways we can link with other infrastructure efforts, both nationally and internationally. One other example is NASA’s Global Change Master Directory, with which the NBII Program has a cooperative relationship. Again, we pool resources, and that has allowed us to find and document biodiversity data sets and then make those data sets accessible through both the Global Change Master Directory and the NBII.

Essential NBII Infrastructure Components

Now I would like to identify a couple of key NBII infrastructure elements. The first is the development of a controlled biological vocabulary. This means having a consistent, standard reference of biological terms that is available for people to use–both on the supply side, to use in describing data and information products, and on the demand (or the customer) side, to use when one is searching for information. This is the kind of key contribution on which we can all work together.

The second important component of the NBII infrastructure is a standard reference for biological nomenclature and taxonomy. For the NBII, this reference is the Integrated Taxonomic Information System (ITIS). I know you heard about ITIS yesterday from Bruce Collette. But I will just add that since we started work, the NBII program has been a very strong advocate and supporter of ITIS, because as far as we are concerned, having a common frame of reference that is scientifically credible for species names is a linchpin concept to make all of this work. The species names are what locate us in a biological data world.
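The role of a shared name reference on both the supply side and the demand side can be sketched in a few lines of Python. The lookup tables below are hand-written for illustration and are not drawn from ITIS; the `resolve` function is likewise a hypothetical helper, not part of any ITIS interface.

```python
# Hand-written illustrative tables; a real reference would be far larger.
ACCEPTED = {
    "setophaga coronata": "Setophaga coronata",
}
SYNONYMS = {
    "dendroica coronata": "Setophaga coronata",
}


def resolve(name):
    """Map a name as typed to its accepted form, or None if it is unknown."""
    key = " ".join(name.split()).lower()  # collapse whitespace, ignore case
    if key in ACCEPTED:
        return ACCEPTED[key]
    return SYNONYMS.get(key)


# A data provider and a searcher may use different names for the same species.
described_as = resolve("Dendroica coronata")
searched_for = resolve("setophaga  coronata")
same_species = described_as is not None and described_as == searched_for
```

When both the person describing a data set and the person searching for it pass their terms through the same reference, the two sides meet on one accepted name even if they started from different ones.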

The Effect of the PCAST Report on NBII

The PCAST report has definitely laid out some challenges for the advancement of NBII. I just want to touch on two major ideas that I think have implications for the kinds of things we are discussing here this week. The first is the PCAST recommendation to significantly increase the biodiversity and ecosystem data and information content of the NBII. We have really only scratched the surface of the diversity and the extent of content that we need to include. To advance further, we really must increase our investment in and funding for all those different kinds of communities that I identified earlier as the producers and maintainers of biodiversity data. To make that happen, we must continue to focus our efforts on involving those communities in both the design and the building of the infrastructure. Again, we need to help them see themselves within that broader picture.

The second big focus of PCAST was the idea of a next-generation NBII and the need to increase the amount of research and development funding that is focused on biodiversity information science and biodiversity computer science. This will support the goal of true interoperability across all this distributed content: eventually, we want all these data, all these information products, and all these tools to interoperate.

Challenges Faced by NBII

The greatest challenges faced by NBII are encountered as we look for ways to link together all interested stakeholder communities–communities that represent aspects of the totality of biodiversity data and information and analysis–communities such as state agencies, museums, collections, library communities, and spatial data communities. We need to try to engage all these communities, even if they do not totally agree on all the aspects of our work in terms of building the infrastructure.

In terms of metadata, we need to be thinking of ways to make metadata standards that are modular, so that we can link things together across communities. The metadata standards and approaches also should allow people to prepare high-quality metadata. Yesterday, Jeff Frithsen spoke of the "20-year rule." I have a slightly different–and personal–take on what quality metadata are under the 20-year rule. My idea is that you want to have metadata that are good enough that someone with whom you may never come in contact can use your data for some application that you yourself would never have imagined. I think that when we are in a position to let people really create quality metadata and when we give people usable metadata tools that make sense from their perspective, we will be well on the way to reaching our goal.

Copyright © 2003 NFAIS. All rights reserved. No part of this product or service may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written consent.
