Preprints of the
Metadiversity
Conference
Proceedings
Session 4: Building the
Infrastructure
The Metadata Challenge for
NBII
ANNE FRONDORF, Program
Manager, National Biological Information Infrastructure,
U.S. Geological Survey’s Biological Resources Division
ABSTRACT
This presentation will
briefly review the development of the National
Biological Information Infrastructure (NBII) to
date and describe key recommendations of the PCAST
Teaming with Life report relating to development of
a "Next Generation" NBII. Some of the challenges
involved in developing and implementing metadata
standards and tools for the NBII include: the need
to address a very wide variety of data types,
information products, and analysis tools within the
NBII infrastructure; the need to provide metadata
approaches that help link the biological
sciences/biodiversity conservation community with
other related communities (e.g., geospatial data
and library communities); and the need to develop
approaches and tools that not only lead to
production of high-quality metadata but that also
are understood and accepted by those who will need
to enter the metadata.
Today I want to summarize
briefly the work to date in building what is currently the
National Biological Information Infrastructure (NBII). Then
I will look at some of the major infrastructure-related
challenges that are facing us, particularly as we work
toward the goal of the next generation NBII as described in
the PCAST report.
Part of the charge that was
given to the original National Biological Survey when it was
first created by Secretary Babbitt, before our merger with
the U.S. Geological Survey, was to help create a
national partnership for sharing biological information.
That idea of information sharing was the basis of what we
have tried to do to date in NBII.
The philosophy behind NBII is to
help build a distributed federation of biological data and
information. The standards, the policies, the rules by which
we would all agree to work together–these are the
underpinnings that make it possible to collaborate in a
distributed way and allow us to have discovery, retrieval,
and integration of data across different sources. Perhaps
most importantly, they also allow for the application of
biological data and information to the real questions on
which we are focusing.
In this talk, I will focus on
two areas of the NBII program that are most pertinent to the
reason we are meeting here this week. The first area is the
diversity of the content, because obviously that diversity
of content has significant implications for how we put the
infrastructure together–particularly on the metadata side.
It also has implications for how we look at metadata
standards, tools, techniques, and approaches.
The second area is the ongoing
NBII commitment to link with and build on existing,
parallel, infrastructure efforts in other communities. You
have heard about several of these efforts–both at the
national and international levels–at the meeting this week.
The NBII focus in this regard has always been to help create
a bridge between other infrastructure efforts (which
represent other views and other communities) and the
biological sciences community. That has been the
philosophical commitment that we have had in working on the
NBII.
Building the NBII
Our approach has been to be very
inclusive in terms of looking at the content of NBII, which,
of course, means looking at databases, data sets, a variety
of information products, and analysis tools to use on data.
Some of this NBII content comes from USGS biologists, who
are out on the ground doing biology. Some comes from our
many partner agencies and organizations outside the USGS.
The USGS North American Breeding
Bird Survey is an example of this diversity. The survey is
run by our Patuxent Wildlife Research Center and holds very
valuable long-term data sets: over 35 years’ worth of data
on over 400 species of North American birds. The center is
trying to make these data more accessible through the NBII.
It also provides information products derived from the data
sets, such as maps of bird distributions for individual
species.
Involving State and Federal
Agencies
Other key biodiversity data
producers that we want very much to engage in this effort
include state agencies–most importantly, the state fish and
wildlife agencies and the state natural heritage programs,
because those two groups together collect, maintain, and
provide a very large amount of very valuable biodiversity
data. These data are maintained in diverse formats that may
vary from agency to agency and state to state. All this is a
big challenge for us. How can we work collectively with
those groups at the state level that really are great
developers and repositories of biodiversity data and link them
into this bigger effort?
Collections and Museums
Obviously natural history
collections and museums are tremendous producers and
maintainers of biodiversity data. What we have tried to do
at NBII is work–wherever we have the opportunity to
work–individually with particular museums or collections to
help them be in a position to make more of their biological
specimen data accessible, as well as help them look at these
issues strategically. We are working with our partner
agencies–including federal, state, and non-government
agencies and organizations–to see what we collectively can
do with the collections and museums to help put those
institutions in a position where their data are more
accessible, more interoperable, and more applicable for
resource management decisions.
Directories
Directories of biodiversity or
biological science experts can be another valuable
information product once they are federated and made
accessible to a wide range of users. The Taxonomic
Resources and Expertise Directory
is one example. This is a cooperative project among the
federal agencies that work together on the Integrated
Taxonomic Information System (ITIS) and the Association of
Systematics Collections (ASC). Basically, we have created an
online directory of taxonomic specialists for North America
that includes information on their areas of taxonomic and
geographic specialty and that is available for people to
find and use as a resource. We now have about 1,000
specialists listed there. Experts can enter and update
their own entries online.
Analysis Tools
Another important part of the
NBII content is tools for biological analysis. The idea here
is that we want to be in a position not just to let people
find data and information more easily but also to find and
share analytical tools–such as ecological models or GIS
applications. These are tools that people can use to get to
the point at which they are actually answering a question or
producing a result. We can use the federation to share the
tools just as we can use the federation to share the data.
We have a component of NBII where we are working to make
biological analysis tools available for people to find,
share, and use. And we are working to populate this
component with more tools and make it an important part of
the NBII.
Partnerships to Build the
Infrastructure
I have gone through these areas
of content very quickly. But my goal was to emphasize that
what we are talking about are communities–communities either
of producers and suppliers of data and information and tools
or communities of customers or users. Whatever we are
talking about–state agencies or technical-report writers or
modelers or some other group–these are all communities that
we have to involve in building the infrastructure.
With regard to metadata, for
example: We need to have approaches to metadata–whether you
are talking about metadata for technical reports, metadata
for analytical tools, metadata for data sets, metadata for
information products, metadata for directories of
experts–that cover all those aspects of the content. And we
want to try to do this in a way that engages all the various
communities and makes them want to be part of the broader
endeavor. Only with the involvement of all communities will
we be able to provide the common framework that knits all
information together and makes it possible for someone to
find museum specimen data, satellite imagery data, an
ecological model, and a technical report that all relate to
the very specific question that a person has to answer. That
is the goal that we are trying to reach as we work with our
partners to help build the NBII.
Linking NBII with Other
Infrastructure Efforts
Now I want to discuss the
importance to NBII of linking with and leveraging existing,
parallel infrastructure efforts in other communities. I am
going to use as an example just one infrastructure effort,
the National Spatial Data Infrastructure (NSDI). Obviously,
part of the content I just described consists of biological
data sets that are spatially referenced (e.g., bird
distributions). That is why it has been such an important
part of our focus to help make a linkage with spatial data
initiatives. Again, our goal has always been to try to be a
bridge between the biological sciences community and other
communities (such as the geospatial data community). By
supporting collaboration between the NSDI and the NBII, we
can help make that bridge and, by doing so, hopefully build
support for both the NSDI and the NBII.
The Federal Geographic Data
Committee
Within the Federal Geographic
Data Committee, which coordinates the NSDI, we have
established a Biological Data Working Group. This Working
Group has members from several federal agencies, as well as
some non-federal partners, that are working together under
the structure of the FGDC to look at ways to help ensure
that we are doing our utmost to increase sharing and access
of biological spatial data.
We are also working through the
FGDC standards process to try to build some federal data and
metadata standards that we can use in the NBII. Our first
product in that regard is a metadata standard for NBII,
which we created as a biological profile of the
existing FGDC geospatial metadata content standard. This
profile includes the entire FGDC geospatial metadata
standard and adds some elements to it, so that hopefully it
is more pertinent or meaningful to the biological sciences
community. For example, we added some elements about
nomenclature and taxonomy, which the spatial standard
doesn't really cover, since that is not what it is set up to
do. This is a good example of the kind of bridge we are
trying to build to link biology back to the spatial data
community.
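
One way to picture the profile idea is as a base record plus an optional biological extension. The Python sketch below is only an illustration under stated assumptions: the class and field names (`GeospatialRecord`, `TaxonomyInfo`, `BiologicalRecord`) are hypothetical and are not the element names used by the FGDC standard or by the biological profile.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GeospatialRecord:
    """Hypothetical stand-in for a record under the base geospatial standard."""
    title: str
    originator: str
    bounding_box: Tuple[float, float, float, float]  # west, east, north, south


@dataclass
class TaxonomyInfo:
    """Hypothetical biological extension: names and classification."""
    scientific_names: List[str] = field(default_factory=list)
    common_names: List[str] = field(default_factory=list)
    classification_authority: Optional[str] = None


@dataclass
class BiologicalRecord(GeospatialRecord):
    """A profile record: every base field, plus optional taxonomy."""
    taxonomy: Optional[TaxonomyInfo] = None


# Illustrative values only; this is not a real metadata record.
record = BiologicalRecord(
    title="Example bird distribution data set",
    originator="Example agency",
    bounding_box=(-125.0, -66.0, 49.0, 24.0),
    taxonomy=TaxonomyInfo(
        scientific_names=["Turdus migratorius"],
        common_names=["American Robin"],
    ),
)
```

Because the profile record is a subtype of the base record in this sketch, any tool written against the base fields can still read it, which mirrors the point that the profile includes the entire geospatial standard and only adds to it.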
Clearinghouse
Another way we are working with
the NSDI is through the Clearinghouse. We have an online
NBII Metadata Clearinghouse, built along the same lines and
following the same procedures as the NSDI Clearinghouse. We
operate as a node of the NSDI. Again, we have extended our
Clearinghouse function a little bit to allow people who are
looking for metadata and data sets to search on those
additional biological metadata fields that we have added in
our biological profile. So, in a way we have taken the NSDI
and biologically enhanced it. To me, this is a very visible
example of bridge-building between biology and the spatial
data community.
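
Searching on the added biological fields can be sketched in a few lines of Python. This is a minimal illustration, not the Clearinghouse's actual search protocol or schema: the record keys (`title`, `place`, `taxa`) and the `search` function are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical metadata records; the keys are illustrative, not real field names.
RECORDS: List[Dict[str, object]] = [
    {"title": "Breeding bird counts", "place": "Maryland", "taxa": ["Aves"]},
    {"title": "Stream fish survey", "place": "Maryland", "taxa": ["Actinopterygii"]},
    {"title": "Land cover map", "place": "Maryland"},  # purely geospatial, no taxa
]


def search(records: List[Dict[str, object]],
           place: Optional[str] = None,
           taxon: Optional[str] = None) -> List[str]:
    """Return titles of records matching the given place and/or taxon."""
    hits = []
    for rec in records:
        if place is not None and rec.get("place") != place:
            continue
        # Records without the biological field never match a taxon query.
        if taxon is not None and taxon not in rec.get("taxa", []):
            continue
        hits.append(rec["title"])
    return hits
```

A place-only query such as `search(RECORDS, place="Maryland")` behaves like a plain geospatial search and returns all three titles, while adding `taxon="Aves"` narrows the result to the bird data set.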
Cooperative Funding Efforts
We also cooperate with the NSDI
to help fund non-federal projects (with state agencies or
universities) that are, again, helping to build the NSDI and
NBII. This has been a very successful partnership–one that
allows you to actually "see" some of those bridges being
built. For example, I have seen instances in which a state
fish and wildlife agency or a state heritage program, in
order to make this project work, will join forces with an
organization like a state’s Geographic Information Systems
Council–two kinds of groups that might not normally have a
lot of interaction with each other at the state level. By
helping provide money and by looking for projects that link
biology and spatial data, we have hopefully encouraged some
groups at the state level to start making some connections.
Again, I want to emphasize that
although I have used NSDI as an example of linking to other
infrastructures, that is only one example. In fact, a huge
part of the NBII effort involves looking for ways we can
link with other infrastructure efforts, both nationally and
internationally. One other example is NASA’s Global Change
Master Directory, with which the NBII Program has a
cooperative relationship. Again, we pool resources, and
that has allowed us to find and document biodiversity data
sets and then make those data sets accessible through both
the Global Change Master Directory and the NBII.
Essential NBII Infrastructure
Components
Now I would like to identify a
couple of key NBII infrastructure elements. The first is the
development of a controlled biological vocabulary. This
means having a consistent, standard reference of biological
terms that is available for people to use–both on the supply
side, to use in describing data and information products,
and on the demand (or the customer) side, to use when one is
searching for information. This is the kind of key
contribution on which we can all work together.
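
The two-sided use of a controlled vocabulary can be sketched as follows. This is a hedged illustration only: the term mappings in `PREFERRED` and the helper names (`normalize`, `index_record`, `find`) are hypothetical and do not come from any actual NBII vocabulary.

```python
# Hypothetical vocabulary: each variant term maps to one preferred term.
PREFERRED = {
    "wetland": "wetlands",
    "wetlands": "wetlands",
    "marsh": "wetlands",
    "bird": "birds",
    "birds": "birds",
    "avian": "birds",
}


def normalize(term):
    """Map a free-text term to its preferred form, or None if it is not in the vocabulary."""
    return PREFERRED.get(term.strip().lower())


def index_record(title, keywords):
    """Supply side: store only preferred terms with the record."""
    terms = {normalize(k) for k in keywords}
    terms.discard(None)  # drop keywords the vocabulary does not cover
    return {"title": title, "terms": terms}


def find(records, query):
    """Demand side: normalize the query the same way before matching."""
    term = normalize(query)
    if term is None:
        return []
    return [r["title"] for r in records if term in r["terms"]]
```

Because the describer and the searcher pass through the same `normalize` step, a record tagged "Marsh" and "Avian" is found by a query for "wetland" or "birds", even though neither side used the other's wording.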
The second important component
of the NBII infrastructure is a standard reference for
biological nomenclature and taxonomy. For the NBII, this
reference is the Integrated Taxonomic Information System
(ITIS). I know you heard about ITIS yesterday from Bruce
Collette. But I will just add that since we started work,
the NBII program has been a very strong advocate and
supporter of ITIS, because as far as we are concerned,
having a common frame of reference that is scientifically
credible for species names is a linchpin concept to make all
of this work. The species names are what locate us in a
biological data world.
The Effect of the PCAST
Report on NBII
The PCAST report has definitely
laid out some challenges for the advancement of NBII. I just
want to touch on two major ideas that I think have
implications for the kinds of things we are discussing here
this week. The first is the PCAST recommendation to
significantly increase the biodiversity and the ecosystem
data and information content of the NBII. We have really
only scratched the surface of the diversity and the extent
of content that we need to include. To advance further, we
really must increase our investment in and funding for all
those different kinds of communities that I identified
earlier as the producers and maintainers of biodiversity
data. To make that happen, we must continue to focus our
efforts on involving those communities in both the design
and the building of the infrastructure. Again, we need to
help them see themselves within that broader picture.
The second big focus of PCAST
was the idea of a next-generation NBII and the fact that we
want to increase the amount of research and development
funding that is focused on biodiversity information science
and biodiversity computer science. This will support the
idea of true interoperability of all this distributed
content: eventually we want all these data, information
products, and tools to interoperate fully.
Challenges Faced by NBII
The greatest challenges faced by
NBII are encountered as we look for ways to link together
all interested stakeholder communities–communities that
represent aspects of the totality of biodiversity data and
information and analysis–communities such as state agencies,
museums, collections, library communities, and spatial data
communities. We need to try to engage all these communities,
even if they do not totally agree on all the aspects of our
work in terms of building the infrastructure.
In terms of metadata, we need to
be thinking of ways to make metadata standards that are
modular, so that we can link things together across
communities. The metadata standards and approaches also
should allow people to prepare high-quality metadata.
Yesterday, Jeff Frithsen spoke of the "20-year rule." I have
a slightly different, and personal, take on what quality
metadata are under the 20-year rule. My idea is that you
want to have metadata that are good enough that someone with
whom you may never come in contact can use your data for
some application that you yourself would have never
imagined. I think that when we are in a position to let
people really create quality metadata and when we give
people usable metadata tools that make sense from their
perspective, we will be well on the way to reaching our
goal.
Copyright © 2003 NFAIS. All rights
reserved. No part of this product or service may be
reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior written consent.