Preprints of the Metadiversity Conference Proceedings

  Session 1: The Nation’s Call to Action

The National Biological Information Infrastructure (NBII) Framework Plan: A Roadmap for Interoperable Sharing of Biodiversity Information

JAMES L. EDWARDS, Deputy Assistant Director, Directorate for Biological Sciences, National Science Foundation

ABSTRACT

The NBII is a growing network of collaborating organizations that make available a wide range of biodiversity and other biological data. The PCAST report envisions the next generation network (NBII-2) in which both technological innovation and institutional cooperation take a quantum leap. An essential step in developing the NBII-2 is a framework, similar to the Strategy for the National Spatial Data Infrastructure, published by the Federal Geographic Data Committee, that lays out strategic goals and sets a process for fully involving the wider biodiversity community. The Biodiversity and Ecosystems Informatics Working Group (BioEco), a subgroup of the National Science and Technology Council’s (NSTC’s) Committee on Environment and Natural Resources (CENR), has developed a draft framework for NBII-2 that includes five major goals: 1) getting the broadest possible participation of both public and private sectors; 2) encouraging greater coordination of and support for research and development on advanced systems and technologies; 3) promoting the use of collaboratively developed standards; 4) increasing federal R&D to support biodiversity and ecosystems informatics; and 5) cooperatively developing the long-range implementation plan for the next generation National Biological Information Infrastructure. These goals will be presented in the context of where the NBII is today; the vision for the future; and how the NBII-2 intends to work synergistically with other biodiversity data-sharing efforts at the national, regional, and global levels.

As noted in the previous paper, the President's Committee of Advisors on Science and Technology (PCAST) has an extremely ambitious vision for what our next generation of the National Biological Information Infrastructure should be.  It is going to be a magical place that will let us do everything we want to do in the area of biological information.  What I will present is where we are in realizing that goal and what our road map is for trying to make it happen.

One of our problems in implementing the National Biological Information Infrastructure is dealing with biologists.  When we were undergraduate or graduate students, biologists didn't really get much training in quantitative methods.  Also, as Bill Brown pointed out in his opening remarks last night, when he was in graduate school, computers did not exist–students had to use mechanical calculators.  A few years ago at a conference on computing and biology, Michael Levitt made what I think was a very prescient remark:  "Computers have changed biology forever even if most biologists don't yet realize it."  I think most of you in this room have realized it–and this is what this conference is all about.  But one of the things we are going to have to do as a group is to find out how to get biologists to use computers.  Most importantly, we will have to figure out how to get the information that we need in our daily lives–as biologists, as computer scientists, as whatever–digitized and put to use.  Part of the way we get there is through the National Biological Information Infrastructure (NBII).

The Purpose of the NBII

The NBII was first conceived in a 1993 report from the National Academy of Sciences.  Not coincidentally, the committee that produced that report was chaired by Peter Raven, who also chaired the PCAST subgroup that produced Teaming with Life.  So, it is not surprising that many of the recommendations in the Biological Survey for the Nation report and in Teaming with Life are quite congruent with each other.  As has been pointed out, the NBII is a federated activity, in which the databases, the control of the data, and the feeding and care of the data reside out in the sites that collect and own those data.  The NBII acts as a pointer, as a metadata source for information about these databases.  It is managed by the Biological Resources Division of the United States Geological Survey (USGS/BRD).
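The pointer arrangement described above can be pictured with a small sketch.  It is only an illustration under stated assumptions:  the record fields, the class names, and the example.org addresses are all hypothetical and do not describe the actual NBII software.  The point is simply that the clearinghouse stores descriptions of data sets, while the data stay with the provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Describes a data set; the data themselves stay with the provider."""
    title: str
    provider: str
    keywords: tuple
    access_url: str  # hypothetical pointer to the provider-held data


class Clearinghouse:
    """Holds only metadata records, never the underlying data."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def search(self, keyword):
        """Return records whose keywords include the term (case-insensitive)."""
        term = keyword.lower()
        return [r for r in self._records
                if term in (k.lower() for k in r.keywords)]


# Illustrative entries only; providers and URLs are made up.
catalog = Clearinghouse()
catalog.register(MetadataRecord("Breeding bird counts", "Example University",
                                ("birds", "census"),
                                "https://example.org/birds"))
catalog.register(MetadataRecord("Herbarium specimens", "Example Museum",
                                ("plants", "specimens"),
                                "https://example.org/herbarium"))

hits = catalog.search("Birds")
```

A search returns pointers to the owning sites, so a user would still retrieve the data from the provider, which keeps control of them.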

The USGS does not do this work alone.  It works in cooperation with a growing number of partners.  Many collaborators–including museums, universities, other federal agencies, libraries, commercial organizations, and nongovernmental organizations–are involved in the partnership that leads to the NBII.  Currently there are about 300 active data sets.  The Integrated Taxonomic Information System (ITIS), a system that provides access to accredited names of organisms in North America, is being used to pull together the taxonomic coordination of these various data sets.
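To make the idea of taxonomic coordination concrete, here is a minimal sketch of how a shared list of accepted names could let two independently kept data sets be combined.  Everything in it is an assumption for illustration:  the names "Aus bus" and "Xus bus" are placeholders, the lookup table is hypothetical, and nothing here reflects how ITIS itself is implemented or queried.

```python
# Hypothetical authority table: any known spelling or synonym (lowercased)
# maps to a single accepted name. The names are placeholders, not real taxa.
AUTHORITY = {
    "aus bus": "Aus bus",
    "xus bus": "Aus bus",   # treated here as a synonym of "Aus bus"
    "cus dus": "Cus dus",
}


def accepted_name(name):
    """Normalize case and spacing, then look up the accepted name (or None)."""
    key = " ".join(name.lower().split())
    return AUTHORITY.get(key)


def merge_by_accepted_name(*datasets):
    """Sum record counts per accepted name; collect names that do not resolve."""
    merged = {}
    unresolved = []
    for dataset in datasets:
        for name, count in dataset:
            resolved = accepted_name(name)
            if resolved is None:
                unresolved.append(name)
            else:
                merged[resolved] = merged.get(resolved, 0) + count
    return merged, unresolved


# Two data sets that record the same organism under different names.
museum = [("Aus  bus", 12), ("Cus dus", 3)]
survey = [("XUS BUS", 5), ("Zus zus", 1)]

merged, unresolved = merge_by_accepted_name(museum, survey)
```

Names that cannot be resolved are reported rather than guessed at, since deciding what they refer to is a job for taxonomists, not for the matching code.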

The NBII is working on vocabulary and metathesaurus development.  In addition, one of the most important aspects of the current NBII is the Metadata Clearinghouse Gateway.  The metadata descriptions in the NBII are developed using a biological profile from the Federal Geographic Data Committee's (FGDC’s) Content Standard for Digital Geospatial Metadata.  The Clearinghouse also is a participating node in the National Spatial Data Clearinghouse.  The NBII must work, and is working, with other relevant organizations and standard setters in order to develop its activities.

Where do we go from these 300 data sets that currently exist within the NBII?  How do we implement the grand vision that the President’s Committee of Advisors on Science and Technology has laid out for us?  The answer lies with the next generation–the NBII-2.

Challenges Facing the NBII-2

The NBII-2 will not simply be a data center, a traditional library, or a research institute.  Rather, it will combine attributes of all of these.  It will be a distributed facility able to access these various data sets interoperably and query them simultaneously; to synthesize, correlate, and analyze the information; and to produce and present the information in a way that is visually appealing and useful to a wide variety of users.
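As a rough illustration of querying several data sets at once and then synthesizing the answers, consider the sketch below.  It is not a design for NBII-2:  the node names, record fields, and placeholder species names are hypothetical, and in-memory lists stand in for remote, independently held databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for independently held data sets.
NODES = {
    "node_a": [
        {"species": "Aus bus", "locality": "Site 1"},
        {"species": "Cus dus", "locality": "Site 2"},
    ],
    "node_b": [
        {"species": "Aus bus", "locality": "Site 3"},
    ],
}


def query_node(node_name, species):
    """Ask one node for matching records and tag each with its source."""
    return [dict(record, source=node_name)
            for record in NODES[node_name]
            if record["species"] == species]


def federated_query(species):
    """Send the same query to every node concurrently and pool the answers."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_node, name, species)
                   for name in sorted(NODES)]
        combined = []
        for future in futures:  # collected in submission order
            combined.extend(future.result())
    return combined


def count_by_source(records):
    """A very small 'synthesis' step: how many records each node supplied."""
    counts = {}
    for record in records:
        counts[record["source"]] = counts.get(record["source"], 0) + 1
    return counts


records = federated_query("Aus bus")
summary = count_by_source(records)
```

Each record keeps a note of the node it came from, so the combined answer can still be traced back to the site that owns the data.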

Obviously this is going to take a lot of research relating to the kinds of data that we are talking about–the difficult data, the complicated data that individuals dealing in biodiversity are developing.  This leads to the first challenge facing NBII-2:  to find a way to get the research done that would allow the NBII-2 dream to become reality.

The second challenge is to develop an infrastructure and an organization that would allow us to pull this information together and make the NBII-2 happen.  As mentioned in the previous paper, PCAST suggests that this should happen through a series of nodes–at least five nodes regionally distributed around the United States.  These nodes would act as sites where the appropriate software and the appropriate computing power would allow users to interoperably dial into the node to get information to do the kind of searches they want to do.  These centers will also be able to act as archiving sites.  The NBII does not want to take over data–instead, it wants to leave data out in the sites that developed those data, so that the data are locally owned.  We all know that there are many situations occurring where data sets are being orphaned, where information is being lost.  We need some way, when a researcher retires or when an institution goes out of existence, to archive the information that would otherwise be lost and to make that data available to future users.  The NBII-2, through its national nodes, will be able to act as a place where that kind of archiving and central data storage can occur.

The third challenge in making NBII-2 happen is semantic or social interoperability.  How do we get the various people, organizations, and institutions necessary to this project to work together?  To help start us on that path, the Biodiversity and Ecosystems Informatics Working Group (BioEco) has prepared a draft framework for NBII-2, with five goals.  The framework is appended to this paper.

The Goals of the NBII-2

The first goal of the framework is to obtain the broadest possible participation of both public and private sectors in developing the NBII-2.  We cannot do it alone.  Even though we appear "all-powerful" in Washington, we still need to have interactions with and help from the rest of society (unless you want to give us all your tax dollars!).  We need to develop a common vision and understanding of the mutual benefits the NBII-2 can bring; to define what the fundamental data and information components of the NBII-2 should be; to develop a long-term plan; to encourage broad participation; to coordinate with other national and international initiatives like the Clearing-House Mechanism; and to promote policies and programs to fulfill this vision.  These are the broad kinds of goals that any organization has, but we really need to do this by interacting with and pulling together as many different kinds of public and private organizations and individuals as we possibly can.

Second, we need to encourage better coordination and support for research and development on advanced systems and technologies.  This is in response to a challenge I have already mentioned:  to do the right kinds of research to develop the tools, technologies, and architectures that will allow us to implement the vision of PCAST; to find ways to define the respective interests and the complementary roles that will be necessary to do this research and development; and to identify and overcome the barriers that exist to developing software, hardware, and other interoperable tools.

Third, and very germane to this particular meeting, we need to promote the use of collaboratively developed standards.  We all know that there are hundreds–thousands–of different biodiversity databases out there.  How do we develop the means for them to talk to each other interoperably?  Subsequent papers will refer to these problems and present suggestions about how we can get around them.  In this framework, we suggest that we need to work in public and private partnerships to identify and prioritize the kinds of standards that we need through activities like this particular conference, to promote standards development, and to encourage linkages with other groups doing similar kinds of things–groups like the Federal Geographic Data Committee and the national and international standards organizations.

The fourth goal in the framework is to increase federal support for R&D on biodiversity and ecosystems information.  Federal support will be used both for funding within the government and for funding outside of the government–in universities, in museums, and in other research venues.  We need to identify existing research and development activities like the digital libraries, like the Knowledge and Distributed Intelligence thrust at the National Science Foundation.  We need to work through things like the National Science and Technology Council’s high performance computing thrust, where part of the goal is to promote biological and ecological information as an important application area within the National Biological Information Infrastructure.

Finally, the fifth goal is to cooperatively develop the long-range implementation plan for the next generation of the NBII.  How do we propose to do that?  First we propose to put together an interagency and public/private task force that will work to construct this framework, work to develop the next generation, and work to implement the vision of PCAST.  It will identify funding and other means for bringing information systems R&D to the next generation, will help develop an out-year budget for implementing this within the federal sector, and will figure out how to develop partnerships with the industrial sector, the private sector, and nongovernmental organizations (NGOs).

When we are able to fulfill the grand dream for the next generation of the NBII, it will truly be a gateway to a wide diversity of different kinds of biological data, not just biodiversity data.  We need to be able to link biodiversity data to genetic data.  For example, we need to find ways to link the large sequence databases and the Protein Data Bank (PDB) to biodiversity data, in order to answer lots of different kinds of questions that we all have.

We see the NBII-2, then, as being a "one-stop shopping" node, a place where you can go to get access to not only biodiversity data, but also other kinds of data, and where competent technologies make the information available to everybody.

Connecting to Other Entry Points

Now, I said we wanted it to be this one-stop shopping node, this place where you can go to get access to other kinds of information and data developed at other places.  We see it as providing access to information from local-level sites–county parks, the nature conservatories, state heritage sites, museum collections, etc.  The NBII would also be the U.S. entry point that would work with similar entry points in other countries.  The Canadian Biodiversity Information Infrastructure, CONABIO in Mexico, ERIN in Australia, INBio in Costa Rica–these are examples of the kind of national-level node that the NBII-2 intends to become.  We see it then as also providing information to allow integration with activities at the regional and global level–to the North American Biodiversity Information Network, the InterAmerican Biodiversity Information Network, the Clearing-House Mechanism, and the Global Biodiversity Information Facility.  I will say just a couple of words about a few of these regional- and international-level activities.

The North American Biodiversity Information Network

The North American Biodiversity Information Network (NABIN), as the name implies, focuses on North America.  It is sponsored by the Commission for Environmental Cooperation under the North American Free Trade Agreement (NAFTA).  NABIN intends to develop standards and protocols for the exchange of biodiversity information–focusing, of course, on the North American perspective–and to develop Internet-based query systems.  NABIN has a pilot project up and running.  The pilot project is looking at the birds of North America, building on the quite successful activity begun in Mexico to determine which birds of Mexico are catalogued in data sets and museums in the rest of the world.

The InterAmerican Biodiversity Information Network

The InterAmerican Biodiversity Information Network (IABIN) covers North America, South America, and the Caribbean.  Like all of these Internet-based activities, it is intended to be a site that will be especially useful for decision-making and education activities.  It builds on other initiatives–the Clearing-House Mechanism, the Man and the Biosphere, MABNET, and the Biodiversity Conservation Information System.  Its implementation support is through the Organization of American States (OAS), and Brazil is going to host an IABIN meeting in the spring of 1999.

Global Biodiversity Information Facility

Finally I would like to say a couple words about the Global Biodiversity Information Facility (GBIF).  Not least of the reasons for doing this is that I chair the group that is making this recommendation.  The Organization for Economic Cooperation and Development (OECD) has something called the Megascience Forum.  The Megascience Forum is intended to focus on big-scale science.  Up until now, big-scale science has meant big, fixed physics facilities, primarily telescopes and synchrotrons.  A few years ago, we in the U.S. made the argument that if you look at information, information is not–in and of itself–something that requires one big, fixed facility.  But if you look at the need for information, the need for developing databases, and the need for pulling those databases together, then information is indeed a megascience–one that is a distributed megascience, but nevertheless requires thinking about and development of the same kinds of things needed for big, fixed physics facilities.  We were able to convince the other Megascience Forum members of the OECD that this was an important and appropriate area of consideration.

OECD thus formed a Biological Informatics Working Group that has focused on two major kinds of biological information–neuroinformatics (how do we develop databases and tools?) and biodiversity informatics.  The Biodiversity Informatics subgroup is recommending the formation of a Global Biodiversity Information Facility–GBIF, as we like to refer to it.  GBIF, like all the other things we talked about today, is going to be a distributed activity–one that will focus on biodiversity information and one that will pull together the large amount of biodiversity information that resides in OECD countries and, we hope, elsewhere.

Part of the rationale for making this recommendation is to note that information about the world's biodiversity–not the biodiversity itself, but information about that biodiversity–largely resides in OECD countries.  Mobilizing that information, digitizing it, and making it available is something that the OECD countries can do on behalf of the world at large.  So, we see providing that information as something that the OECD can do to aid the Clearing-House Mechanism of the Convention on Biological Diversity, even though not all OECD countries are signatories of that convention.  In order to make that happen, we recognize that one needs to have, in essence, an authority file of the names of the species of the world–something that currently does not exist.  So, one of the major activities of GBIF will be to help pull together a list of the names of the world's species.  Of course, there are other international activities that are already ongoing to make that happen–most especially, Species 2000.  We are working with Species 2000 in order to help develop this worldwide authority file of the names of existing species in the world.  This is another big idea of PCAST for NBII-2.

When and how is GBIF going to happen?  We have made the recommendation that GBIF will start when five countries have agreed to form a secretariat to pull together the activities within those countries that will be the nucleus of the formation of GBIF.  That is likely to happen sometime in the middle of 2000.  There will be a ministerial meeting of the OECD countries held in June 1999, and I am quite hopeful that we will have five countries at about that time that will agree to form GBIF.  If that happens, we will then put out a request for proposals from countries to host the secretariat.

I want to stress that although the idea for and planning of the GBIF has happened within the OECD, it is not simply intended to be an OECD activity.  Rather, it is one where we want any country to be able to join in.  Certainly the data, once they are up and available, will be open to anybody to retrieve, but we would also like to try to get everybody to help provide data that they have available.

Conclusion

Finally, I would like to go back to why we need something like the NBII-2–what it is going to accomplish and what we want to do with it.  Here are some lofty words that come from the NBII framework.  They say that "The basis for all efforts to effectively conserve biodiversity and natural ecosystems, while at the same time supporting economic development, lies in our ability to get at the widest possible access to the existing body of knowledge on biodiversity and ecosystem resources and processes."  Another quote I really like is by someone who does not work in biodiversity–Alan Bleasby.  In The Biochemist, he said, "Two months in the lab can easily save an afternoon on the computer."  This is a situation that already pertains for people working in many parts of biochemistry.  The databases exist for people doing genomics, for people who want to look at sequence activities, for people who want to look at protein structure.  The data already exist so that they do not have to spend two months in the lab.  But we don't yet have that situation for people working in the area of biodiversity informatics.  However, I am quite hopeful that as a result of this conference and others like it, we will at least be on the road to developing that capability.


Copyright © 2003 NFAIS. All rights reserved. No part of this product or service may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written consent.
