Preprints of the
Metadiversity
Conference
Proceedings
Session 1: The Nation’s Call to Action
The National Biological
Information Infrastructure (NBII) Framework Plan: A Roadmap
for Interoperable Sharing of Biodiversity Information
JAMES L. EDWARDS,
Deputy Assistant Director, Directorate for Biological
Sciences, National Science Foundation
ABSTRACT
The NBII is a growing
network of collaborating organizations that make
available a wide range of biodiversity and other
biological data. The PCAST report envisions the
next generation network (NBII-2) in which both
technological innovation and institutional
cooperation take a quantum leap. An essential step
in developing the NBII-2 is a framework, similar to
the Strategy for the National Spatial Data
Infrastructure, published by the Federal Geographic
Data Committee, that lays out strategic goals and
sets a process for fully involving the wider
biodiversity community. The Biodiversity and
Ecosystems Informatics Working Group (BioEco), a
subgroup of the National Science and Technology
Council’s (NSTC’s) Committee on Environment and
Natural Resources (CENR), has developed a draft
framework for NBII-2 that includes five major
goals: 1) getting the broadest possible
participation of both public and private sectors;
2) encouraging greater coordination of and support
for research and development on advanced systems
and technologies; 3) promoting the use of
collaboratively developed standards; 4) increasing
federal R&D to support biodiversity and ecosystems
informatics; and 5) cooperatively developing the
long-range implementation plan for the next
generation National Biological Information
Infrastructure. These goals will be presented in
the context of where the NBII is today; the vision
for the future; and how the NBII-2 intends to work
synergistically with other biodiversity
data-sharing efforts at the national, regional, and
global levels.
As noted in the previous paper, the President's Committee of Advisors on Science
and Technology (PCAST) has an extremely ambitious vision for what our next generation
of the National Biological Information Infrastructure should be. It is
going to be a magical place that will let us do everything we want to do in
the area of biological information. What I will present is where we are
in realizing that goal and what our road map is for making it happen.
One of our problems in implementing the National Biological Information Infrastructure is dealing with biologists.
When we went to undergraduate or graduate school, we biologists did not get much training in quantitative
methods. Also, as Bill Brown pointed out in his opening remarks last night, when he was in graduate school, computers
did not exist; students had to use mechanical calculators. A few years ago at a conference on computing and biology,
Michael Levitt made what I think was a very prescient remark: "Computers have changed biology forever even if most biologists
don't yet realize it." I think most of you in this room have realized it, and this is what this conference is all
about. But one of the things we are going to have to do as a group is to find out how to get biologists to use
computers. Most importantly, we will have to figure out how to get the information that we need in our daily
lives (as biologists, as computer scientists, as whatever) digitized and used. Part of the way we get there is
through the National Biological Information Infrastructure (NBII).
The Purpose of the NBII
The NBII was first conceived in the 1993 report put out by the National Academy of Sciences, a report chaired, not coincidentally,
by Peter Raven, who also chaired the PCAST subgroup that produced Teaming with Life.
So, it is not surprising that many of the recommendations in the Biological Survey for the Nation report and in Teaming with Life
are quite congruent with each other. As has been pointed out, the NBII is a federated activity, in which the databases,
the control of the data, and the feeding and care of the data reside out in the sites that collect and own those data.
The NBII acts as a pointer, as a metadata source for information about these databases. It is managed by the Biological
Resources Division of the United States Geological Survey (USGS/BRD).
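The pointer role described above can be illustrated with a minimal sketch. It assumes a catalog that stores only descriptions of data sets, while the data stay with their owners. The class names, fields, owners, and URLs are all hypothetical and are not taken from the NBII or from the FGDC metadata standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Describes a data set that stays with its owner (all fields hypothetical)."""
    title: str
    owner: str
    access_url: str  # where the owning site serves the data
    keywords: frozenset = field(default_factory=frozenset)


class MetadataCatalog:
    """Holds descriptions only; the data themselves never leave the owning site."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def find(self, keyword):
        """Return the records whose keywords include the search term."""
        wanted = keyword.lower()
        return [r for r in self._records if wanted in r.keywords]


catalog = MetadataCatalog()
catalog.register(MetadataRecord(
    title="Herbarium specimen records",
    owner="Example University Herbarium",
    access_url="https://herbarium.example.org/specimens",
    keywords=frozenset({"plants", "specimens"}),
))
catalog.register(MetadataRecord(
    title="Breeding bird counts",
    owner="Example State Wildlife Agency",
    access_url="https://wildlife.example.gov/birds",
    keywords=frozenset({"birds", "surveys"}),
))

# A search returns pointers to the owning sites, not the data.
hits = catalog.find("Birds")
for record in hits:
    print(record.owner, "->", record.access_url)
```

The design choice to notice is that the catalog never copies any data: a user who finds a record still goes to the owning site to retrieve the data set.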
The USGS does not do this work alone. It works in cooperation with a growing number of partners. Many
collaborators (including museums, universities, other federal agencies, libraries, commercial organizations, and
nongovernmental organizations) are involved in the partnership that leads to the NBII. Currently there are
about 300 active data sets. The Integrated Taxonomic Information System (ITIS), a system that provides access to
accredited names of organisms in North America, is being used to pull together the taxonomic coordination of these
various data sets.
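As a rough illustration of how a shared list of accepted names can tie data sets together, the sketch below maps each name found in a data set to one accepted name and then groups records by that name. The lookup table is a hypothetical placeholder, not ITIS content, and the function names are assumptions made for this example.

```python
# Hypothetical authority table: every known name maps to one accepted name.
# The entries are illustrative placeholders, not ITIS records.
AUTHORITY = {
    "turdus migratorius": "Turdus migratorius",
    "merula migratoria": "Turdus migratorius",  # treated here as a synonym
    "american robin": "Turdus migratorius",     # common name
}


def accepted_name(raw_name):
    """Return the accepted name for raw_name, or None if it is unknown."""
    key = " ".join(raw_name.lower().split())  # normalize case and spacing
    return AUTHORITY.get(key)


def merge_by_taxon(*data_sets):
    """Group records from several data sets under their accepted names."""
    merged, unmatched = {}, []
    for records in data_sets:
        for name, observation in records:
            accepted = accepted_name(name)
            if accepted is None:
                unmatched.append((name, observation))
            else:
                merged.setdefault(accepted, []).append(observation)
    return merged, unmatched


museum = [("Merula migratoria", "specimen 1901"), ("Corvus sp.", "specimen 1950")]
survey = [("American  Robin", "count 1998")]
merged, unmatched = merge_by_taxon(museum, survey)
print(merged)
print(unmatched)
```

Names that the table does not recognize are kept in a separate list rather than dropped, so that they can be reviewed by a taxonomist.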
The NBII is working on vocabulary and metathesaurus development. In addition, one of the most important
aspects of the current NBII is the Metadata Clearing-House Gateway. The metadata descriptions in the NBII
are developed using a biological profile from the Federal Geographic Data Committee's (FGDC's) Content
Standard for Geospatial Metadata. The Clearing House also is a participating node in the National Spatial
Data Clearing House. The NBII must work, and is working, with other relevant organizations and
standard setters in order to develop its activities.
Where do we go from these 300 data sets that currently exist within the NBII? How do we implement the
grand vision that the President's Committee of Advisors on Science and Technology has laid out for us?
The answer lies with the next generation: the NBII-2.
Challenges Facing the NBII-2
The NBII-2 will not simply be a data center, a traditional library, or a research
institute. Rather, it will share attributes of all of these.
It will be a distributed facility with the capability to access
these various data sets interoperably; to query them simultaneously; to synthesize,
correlate, and analyze the information; and to produce and present
the information in a way that is visually appealing and useful to a wide variety
of users.
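The idea of querying several distributed data sets at once can be sketched as follows. This is only an illustration under simple assumptions: each node is modeled as a local function rather than a remote service, and the node names, holdings, and the `query_all` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_node(name, holdings):
    """Build a hypothetical node: a function that searches its own holdings."""
    def search(taxon):
        return [(name, rec) for rec in holdings.get(taxon, [])]
    return search


NODES = [
    make_node("museum", {"Turdus migratorius": ["specimen A"]}),
    make_node("state survey", {"Turdus migratorius": ["count 1997", "count 1998"]}),
    make_node("park inventory", {}),
]


def query_all(taxon, nodes=NODES):
    """Send the same query to every node at once and combine the answers."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # map() returns results in node order, whatever order they finish in.
        per_node = list(pool.map(lambda node: node(taxon), nodes))
    return [hit for hits in per_node for hit in hits]


results = query_all("Turdus migratorius")
for source, record in results:
    print(source, record)
```

Each result keeps the name of the node it came from, so a user can always trace a record back to the site that owns it.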
Obviously this is going to take a lot of research relating to the kinds of data that we are talking about: the
difficult data, the complicated data that individuals dealing in biodiversity are developing. This leads to
the first challenge facing NBII-2: to find a way to get the research done that would allow the NBII-2 dream
to become reality.
The second challenge is to develop an infrastructure and an organization that would allow us to pull this
information together and make the NBII-2 happen. As mentioned in the previous paper, PCAST suggests that
this should happen through a series of nodes, at least five of them, regionally distributed around the United
States. These nodes would act as sites where the appropriate software and the appropriate computing power
would allow users to connect to the node and run the kinds of searches they want to
do. These centers will also be able to act as archiving sites. The NBII does not want to take over
data. Instead, it wants to leave data out in the sites that developed those data, so that the data are locally
owned. We all know that there are many situations occurring where data sets are being orphaned, where
information is being lost. We need some way, when a researcher retires or when an institution goes out of
existence, to archive the information that would otherwise be lost and to make that data available to future users.
The NBII-2, through its national nodes, will be able to act as a place where that kind of archiving and central data
storage can occur.
The third challenge in making NBII-2 happen is semantic or social interoperability. How do we get the
various people, the various organizations, the various institutions necessary to this project to work together?
To help start us on that path, the Biodiversity and Ecosystems Informatics Working Group (BioEco) has prepared a draft framework
for NBII-2, with five goals. The framework is appended to this paper.
The Goals of the NBII-2
The first goal of the framework is to obtain the broadest possible participation of both
public and private sectors in developing the NBII-2. We cannot do it alone.
Even though we appear "all-powerful" in Washington, we still need to have
interactions with and help from the rest of society (unless you want to give us all
your tax dollars!). We need to develop a common vision and understanding of the
mutual benefits and activity the NBII-2 can bring; to define what the fundamental data
and information components of the NBII-2 should be; to develop a long-term plan; to
encourage broad participation; to coordinate with other national and international
initiatives like the Clearing-House Mechanism; and to promote policies and programs
to fulfill this vision. These are the broad kinds of goals that any organization
has, but we really need to pursue them by interacting with and pulling together as many different
kinds of public and private organizations and individuals as we possibly can.
Second, we need to encourage better coordination and support for research and
development on advanced systems and technologies. This is in response to a
challenge I have already mentioned: to do the right kinds of research to
develop the tools, technologies, and architectures that will allow us to implement
the vision of PCAST; to find ways to define the respective interests and the
complementary roles that will be necessary to do this research and development; and
to identify and overcome the barriers that exist to developing software, hardware,
and other interoperable tools.
Third, and very germane to this particular meeting, we need to promote the use of
collaboratively developed standards. We all know that there are hundreds, even thousands, of
different biodiversity databases out there. How do we develop the means for them to
talk to each other interoperably? Subsequent papers will refer to these problems and
present suggestions about how we can get around them. In this framework, we suggest
that we need to work in public and private partnerships to identify and prioritize the
kinds of standards that we need through activities like this particular conference, to
promote standards development, and to encourage linkages with other groups doing similar
kinds of things, groups like the Federal Geographic Data Committee and the national and
international standards organizations.
The fourth goal in the framework is to increase federal support for R&D on
biodiversity and ecosystems information. Federal support will be used both
for funding within the government and for funding outside of the government: in
universities, in museums, and in other research venues. We need to identify
existing research and development activities such as the digital libraries and the
Knowledge and Distributed Intelligence thrust at the National Science Foundation.
We need to work through things like the National Science and Technology Council's
high performance computing thrust, where part of the goal is to promote biological and
ecological information as an important application area within the National Biological
Information Infrastructure.
Finally, the fifth goal is to cooperatively develop the long-range implementation
plan for the next generation of the NBII. How do we propose to do that?
First we propose to put together an interagency and public/private task force that
will work to construct this framework, work to develop the next generation, and work
to implement the vision of PCAST. It will identify funding and other means for
bringing information systems R&D to the next generation, will help develop an
out-year budget for implementing this within the federal sector, and will figure out how to
develop partnerships with the industrial sector, the private sector, and
nongovernmental organizations (NGOs).
When we are able to fulfill the grand dream for the next generation of the NBII,
it will truly be a gateway to a wide diversity of different kinds of biological data,
not just biodiversity data. We need to be able to link biodiversity data to
genetic data. For example, we need to find ways to link the large sequence
databases and the Protein Data Bank (PDB) to biodiversity data, in order to
answer lots of different kinds of questions that we all have.
We see the NBII-2, then, as being a "one-stop shopping" node,
a place where you can go to get access to not only biodiversity data,
but also other kinds of data, and where competent technologies make the
information available to everybody.
Connecting to Other Entry Points
Now, I said we wanted it to be this one-stop shopping node, this place
where you can go to get access to other kinds of information and data
developed at other places. We see it as providing access to information
from local-level sites: county parks, the nature conservatories, state
heritage sites, museum collections, etc. The NBII would also be the
U.S. entry point that would work with similar entry points
in other countries. The Canadian Biodiversity Information Infrastructure,
CONABIO in Mexico, ERIN in Australia, and INBio in Costa Rica are examples
of the kind of national-level node that the NBII-2 intends to become.
We also see it as providing information to allow integration with
activities at the regional and global levels: the North American
Biodiversity Information Network, the InterAmerican Biodiversity Information
Network, the Clearing-House Mechanism, and the Global Biodiversity Information
Facility. I will say just a few words about some of these
regional- and international-level activities.
The North American Biodiversity Information Network
The North American Biodiversity Information Network (NABIN), as the name implies,
focuses on North America. It is sponsored by the Commission for
Environmental Cooperation under the North American Free Trade Agreement
(NAFTA). NABIN intends to develop standards and protocols for the
exchange of biodiversity information (focusing, of course, on the North
American perspective) and to develop Internet-based query systems.
NABIN has a pilot project up and running. The pilot project is looking
at the birds of North America, building on the quite successful activity begun
in Mexico to determine which birds of Mexico are catalogued in data sets and
museums in the rest of the world.
The InterAmerican Biodiversity Information Network
The InterAmerican Biodiversity Information Network (INABIN) covers North
America, South America, and the Caribbean. As with all these activities
that are Internet-based, it is intended to be a site that will be especially
useful for decision-making and education activities. It builds on other
initiatives: the Clearing-House Mechanism, the Man and the Biosphere,
MABNET, and the Biodiversity Conservation Information System. Its
implementation support is through the Organization of American States (OAS),
and Brazil is going to host an INABIN meeting in the spring
of 1999.
Global Biodiversity Information Facility
Finally, I would like to say a few words about the Global Biodiversity
Information Facility (GBIF). Not least of the reasons for doing this
is that I chair the group that is recommending it. The
Organization for Economic Cooperation and Development (OECD) has something
called the Megascience Forum. The Megascience Forum is intended to
focus on big-scale science. Up until now, big-scale science has meant
big, fixed physics facilities, primarily telescopes and synchrotrons.
A few years ago, we in the U.S. made the argument that if you look at
information, information is not, in and of itself, something that
requires one big, fixed facility. But if you look at the need for
information, the need for developing databases, and the need for pulling
those databases together, then information is indeed a megascience: one
that is a distributed megascience, but nevertheless requires thinking about
and development of the same kinds of things needed for big, fixed physics
facilities. We were able to convince the other Megascience Forum members
of the OECD that this was an important and appropriate area of consideration.
OECD thus formed a Biological Informatics Working Group that has focused on
two major kinds of biological information: neuroinformatics (how do we
develop databases and tools?) and biodiversity informatics. The
Biodiversity Informatics subgroup is recommending the formation of a Global
Biodiversity Information Facility, or GBIF, as we like to refer to it.
GBIF, like all the other things we talked about today, is going to be a distributed
activity, one that will focus on biodiversity information and one that will
pull together the large amount of biodiversity information that resides in OECD
countries and, we hope, elsewhere.
Part of the rationale for making this recommendation is to note that information
about the world's biodiversity (not the biodiversity itself, but information
about that biodiversity) largely resides in OECD countries. Mobilizing
that information, digitizing it, and making it available is
something that the OECD countries can do on behalf of the world at large.
So, we see providing that information as something that the OECD
can do to aid the Clearing-House Mechanism of the Convention on Biological Diversity,
even though not all OECD countries are signatories of that convention.
In order to make that happen, we recognize that one needs to have, in essence,
an authority file of the names of the species of the world, something that
currently does not exist. So, one of the major activities of GBIF will be
to help pull together a list of the names of the world's species. Of course,
there are other international activities that are already ongoing to make that
happen, most especially Species 2000. We are working with Species
2000 in order to help develop this worldwide authority file of the names of
existing species. This is another big idea of PCAST for NBII-2.
When and how is GBIF going to happen? We have made the recommendation that
GBIF will start when five countries have agreed to form a secretariat to pull together
the activities within those countries that will be the nucleus of the formation of
GBIF. That is likely to happen sometime in the middle of 2000. There
will be a ministerial meeting of the OECD countries held in June 1999, and I am
quite hopeful that we will have five countries at about that time that will agree to
form GBIF. If that happens, we will then put out a request for proposals from
countries to host the secretariat.
I want to stress that although the idea for and planning of the GBIF has happened
within the OECD, it is not simply intended to be an OECD activity. Rather, it is
one where we want any country to be able to join in. Certainly the data, once
they are up and available, will be open to anybody to retrieve, but we would also
like to try to get everybody to help provide data that they have available.
Conclusion
Finally, I would like to go back to why we need something like the NBII-2: what
it is going to accomplish and what we want to do with it. Here are
some lofty words that come from the NBII framework. They say that "The
basis for all efforts to effectively conserve biodiversity and natural ecosystems,
while at the same time supporting economic development, lies in our ability
to get at the widest possible access to the existing body of knowledge on biodiversity
and ecosystem resources and processes." Another quote I really like
is by someone who does not work in biodiversity, Alan Bleasby. In
The Biochemist, he said, "Two months in the lab can easily save an afternoon
on the computer." This is a situation that already pertains for people
working in many parts of biochemistry. The databases exist for people
doing genomics, for people who want to look at sequence activities, for people
who want to look at protein structure. The data already exist for them
to be able to not have to spend two months in the lab. But we don't yet
have that situation for people working in the area of biodiversity informatics.
However, I am quite hopeful that as a result of this conference and others like
it, we will at least be on the road to developing that capability.
Copyright © 2003 NFAIS. All rights
reserved. No part of this product or service may be
reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior written consent.