Preprints of the Metadiversity Conference Proceedings

  Session 2: The Challenge in Species Discovery and Taxonomic Information

Doing the Impossible: Creating a Stable Species Index and Operating a Common Access System on the Internet

FRANK BISBY, Director, Centre for Plant Diversity and Systematics,
University of Reading, Species 2000

ABSTRACT

The Species 2000 Project is working with some novel techniques in its ambitious mission to create an index of the world’s known species. One is the creation of stable taxonomic indexes for individual groups of organisms by the member organizations: the Global Species Databases. How may this be done? How may the taxonomy be stabilized yet fluid enough to accommodate change? Another is the creation of a common access system to address an array of such databases so that they can operate as a single virtual index covering all groups. If existing Global Species Databases are to be used, this becomes a demanding specification at the computer science level, quite apart from the challenge of forming a seamless index from the components that are compiled independently. The task may indeed be severe, but it is not impossible. Species 2000 can report progress in both areas.

I must admit, I was not initially delighted to receive this request to speak on "Doing the Impossible–Creating a Stable Species Index and Operating a Common Access System on the Internet." Apart from being the longest title I have ever had for a paper given in a symposium, I also felt that this might be a poisoned chalice. But I decided that, in fact, it provided a nice challenge, and I shall try to face that challenge.

Creating Stable Taxonomic Indexes

The first thing you need to know is that I have been working with a team of people around the world who call themselves Species 2000. Our motive is to create an index to the world's species–not by creating one database with a list of all the species in it, but, in fact, by setting up an interoperable system that, using a common access system, will address a central array of taxonomic databases. The second thing you need to know is that we have already made some structural plans regarding how to do that. We are trying to make a stable index–or what we call a Global Species Database (GSD)–for each taxon of organisms.

Clearly, there are many taxonomic databases around the world that have species checklists and taxonomic opinions at their call. Many of those databases are very good databases with excellent data in them. But if you think about taking those databases and using them as the source of information to make a universal list of all organisms, you run into two problems. One is that the data sets overlap. But worse than that, each of them has been internally optimized for one region of the world but not globalized among the different systems. So, if you were to put them together you not only would have to deal with the overlapping species, which might be classified differently in the different databases, but also with the fact that the species may be categorized in the taxonomic structure of families and orders and so on differently. Therefore, you cannot put these databases together end to end. You have to look inside them and rework them.

If you can persuade different communities around the world to create a global index, a global checklist of species for each taxon, then the organization that tries to compile a universal list does not have to understand how each database is structured. They can be put together end to end. Provided there are no overlaps, no demarcation disputes, then that is satisfactory from the point of view of a global list. So, this is the reason why we set ourselves the two tasks that are addressed in the title for this talk. First, how can different groups of specialists create a stable taxonomic index for each group of organisms? And second, how can we create a common access system on the Internet that will allow us to address those components?

My title begins with the words "Doing the impossible." The question to answer is, is the taxonomist capable of producing a stable species list? If the answer is "yes," the next question is, would that list be a useful thing to have? I am going to respond to this with two examples. The first is one that is very close to home. Those of you who know me know that I am a botanist and I work on legumes. For 11 years now, I have led an enterprise worldwide in which we have been creating a Global Species Database for legumes. It is on the Web as Legume Web. It has many faults, this project. It is far from being an ideal, but I can use it as a vehicle for explaining to you how it is that I believe we can make stable species indexes. I will then move away from being egocentric by discussing some of the other models, which will indicate how this is, in fact, a reasonably achievable goal throughout the community.

Legumes: A Global Species Database

With the legume database, we are talking about creating a list of species recognized to exist by specialists. That means that we must include not only taxa and the Latin names of taxa as some people accept them, but also synonyms and taxonomic opinion.

How have we tried to do this for the 19,000 legumes around the world? We have taken it as a two-stage process. For the first stage, we organized regional centers that have been compiling species lists of legumes for their parts of the world, and those lists are the starting material. In most cases the centers use the same software, and in most cases we had extreme difficulty–and still have extreme difficulty–in merging those databases together into one file.

The second stage is to get panels of experts for different groups of legumes. Legumes are normally thought of as falling into 32 tribes of plants. For each of the small tribes or for each of the large genera, we have anywhere from one to four monographic specialists around the world, thus creating a network approaching 100 people whom we contact to try and bring the taxonomic checklist into a responsible opinion.

One job of this network of people is to globalize, to establish a system of genera of the species that will function on all the continents. Of course, the regional data sets sent to us include some local features in the taxonomy. But we have to make sure that the features of acacias, for example, are treated the same way for African plants as they are for American plants and as they are for Australian plants. For example, one of the Australian acacia experts has decided that acacias should be divided into three genera. That may cause a problem if he has not stated where the African species or the American species would fit in those three genera. We get group panel specialists to say which system will work for the whole world.

Now, of course, the result is an opinion. And there are alternative schemes. The alternative schemes must be cross-indexed in the system through the synonyms, so you can see the data from another scheme if you go in using the preferred or default scheme that we have adopted.
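
One way to picture this cross-indexing is a lookup table in which every name, whether from the preferred scheme or an alternative one, points to the accepted name in the default scheme. The Python sketch below is only an illustration under that assumption and is not the ILDIS implementation; the class, its methods, and the species names are all invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SynonymIndex:
    """Cross-index from every known name to the accepted name (default scheme)."""

    accepted_for: dict[str, str] = field(default_factory=dict)
    synonyms_of: dict[str, set[str]] = field(default_factory=dict)

    def add_accepted(self, name: str) -> None:
        # An accepted name resolves to itself.
        self.accepted_for[name] = name
        self.synonyms_of.setdefault(name, set())

    def add_synonym(self, synonym: str, accepted: str) -> None:
        # A name from an alternative scheme points at the default-scheme name.
        if accepted not in self.synonyms_of:
            raise KeyError(f"{accepted!r} is not an accepted name")
        self.accepted_for[synonym] = accepted
        self.synonyms_of[accepted].add(synonym)

    def resolve(self, name: str) -> str | None:
        # Returns None for names the index has never seen.
        return self.accepted_for.get(name)

    def alternatives(self, name: str) -> set[str]:
        # All other names cross-linked to the same taxon.
        accepted = self.resolve(name)
        if accepted is None:
            return set()
        return ({accepted} | self.synonyms_of[accepted]) - {name}


# Invented names, for illustration only.
index = SynonymIndex()
index.add_accepted("Examplea alba")
index.add_synonym("Altgenus albus", "Examplea alba")

print(index.resolve("Altgenus albus"))
print(index.alternatives("Examplea alba"))
```

A user who enters through an alternative scheme's name reaches the same taxon as a user of the default scheme, and either can list the names used by the other.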

So while we are going for a preferred or usable system, we are also cross-linking the alternatives. This is achieved through panels of experts, who subdivide the various tribes of plants. For example, a friend at the Royal Botanical Gardens is one of four people who work on the Caesalpinia tribe. He, in fact, is the one to whom others defer. He recently did his thesis on Caesalpinia and sorts out that particular genus. So that database has been through two processes. It reflects the local expertise of which species are where, and it captures global taxonomic expertise to bring them together into a coherent system. It is available on the Web, at our Legume Web Service at <www.ILDIS.org>. That is just one example, then, of a team of people from around the world deeply embedded in the taxonomic profession making a Global Species Database for one group of plants.

Going Beyond Legumes

How can that be accomplished for other groups of organisms? Well, there is not a single route to that destination. In fact, there are various routes. For instance, some organizations appear to me to be working in a region-by-region system. For example, we are talking to the producers of a mollusk database in the U.S.A. and a mollusk database in Paris. In some taxon-by-taxon systems, many of the families have been provided by special family experts putting them into the larger system. The International Legume Database and Information Service (ILDIS) used a combination of these two techniques. Some people start with the names from an index or from the Zoological Record. So Marshall Crosby, making the moss list for the world, started from the names–and, of course, some databases didn't inherit the taxonomy from specialists–and then worked with them to create the database. In the Philippines, experts are working on FishBase, but the baseline taxonomy comes from experts in California. Similarly, a database on bacteria takes its base list of species from the International Journal of Systematic Bacteriology (IJSB). So these are different routes by which existing databases or data systems can approach this ideal of becoming part of a world species list.

Now the ideals here are very demanding. Once we thought we would not find one database in the world that met all the demands. We are now talking to 65 such database organizations around the world covering many more than 65 groups. And I have to say that my own project–ILDIS–comes fairly close, but it is certainly not completely there. The one that comes closest in my mind is the world's list of mammals based in the Smithsonian. The only question about it is whether or not the taxonomic expertise put into it is fully global or whether it, in fact, is rather restricted by the set of 20 Americans who developed it. But apart from that, it meets all the demands that I know of for such a system.

So, stable species lists do exist and I would contend that they can be produced and they can be maintained through time. They must be embedded deeply in the taxonomic community so that they can move forward and be fluid. Nobody is talking about their being frozen. Rather, we are talking about their being decoupled. We are talking about their together forming a responsible taxonomic consensus–a practical system decoupled from some of the minutiae of the day-to-day taxonomic debates that move to and fro.

Creating a Dynamic Access System

The second part of my talk is about how we are going to organize these different systems to be available on the Internet through a dynamic access system and what challenges are faced in creating that system. The key phrase here is "federated systems." But federated systems involve challenges at completely different levels of endeavor.

Let us look quickly at the different challenges that make what we were doing seem impossible and that were, to some of us, seemingly insurmountable at the start. I would like to report to you that we are making at least some progress with them.

At the top level, there is great complexity. We–the taxonomic community–are not a multinational organization telling its offices around the world how to use identical software and how to proceed. We are a moving, seething mass of heterogeneous databases around the world operating on various platforms and using different database management systems. Of course, this is the classic heterogeneity problem. We have to have interoperability by cross-mapping onto a very simple model at the center to ensure that we can get minimum data to and from those systems.
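
The cross-mapping idea can be sketched as one small wrapper per database, each translating its own record layout into a deliberately minimal central record. The Python below is an illustration under assumed conditions, not the Species 2000 data model: the field names, the two source layouts, and the names `CommonRecord`, `from_source_a`, and `from_source_b` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CommonRecord:
    """Deliberately small central model (assumed fields, not a real schema)."""

    scientific_name: str
    status: str  # "accepted" or "synonym"
    source: str


def from_source_a(row: dict) -> CommonRecord:
    # Hypothetical source A keeps genus and epithet in separate columns.
    name = f"{row['genus']} {row['epithet']}"
    status = "accepted" if row["is_accepted"] else "synonym"
    return CommonRecord(name, status, "source_a")


def from_source_b(row: dict) -> CommonRecord:
    # Hypothetical source B keeps a full name and a one-letter status code.
    status = {"A": "accepted", "S": "synonym"}[row["STATUS"]]
    return CommonRecord(row["FULLNAME"].strip(), status, "source_b")


# One wrapper per participating database; the hub only ever sees CommonRecord.
ADAPTERS: dict[str, Callable[[dict], CommonRecord]] = {
    "source_a": from_source_a,
    "source_b": from_source_b,
}


def to_common(source: str, row: dict) -> CommonRecord:
    return ADAPTERS[source](row)


a = to_common("source_a", {"genus": "Examplea", "epithet": "alba", "is_accepted": True})
b = to_common("source_b", {"FULLNAME": " Altgenus albus ", "STATUS": "S"})
print(a)
print(b)
```

Keeping the central record this small is the design choice that matters: each database remains free to store whatever it likes internally, and only the wrapper has to know about it.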

The Problem of Scalability

We have prototypes working for five or ten databases. Will this extend or operate nicely with 100 or more other systems?

The Problem of Autonomy

Autonomy is another issue. We need a model that makes it possible and desirable for these different projects to participate. We must accept their heterogeneity and learn to live with their autonomous behavior.

The Problem of Stability

Another issue is stability. You might think this is just a matter of there being an ice storm in Montreal or a tornado in the Philippines that puts all the systems out of commission for a day or two. Actually, the most frequent reason for the databases going down is internal management problems within multilayered institutions. So, I am at the University. I go away for a week to a conference and I get back and find that our server is down or our server is disconnected. Why? Because bureaucrats in the Computer Service Department changed the allocation of machines and a little piece of paper went around telling us about it six months before and that paper went to the head of the department and not to me. So, I get back and find that the system is disconnected, and it takes me two days to get it back up. Now, this happens in all multilayered institutions around the world. If you are confident that this has never happened in your institution, then that is great–but just watch out. That is computer science.

There are other issues to consider with regard to stability, including the issue of interoperability and the question of which standards to use. One of our prototypes, provided by the Japanese, uses CORBA to link the different databases. It also is necessary to decide whether or not everything goes through a server hub and out to the databases. And this is where the question of stability comes in.

Clearly, we can replicate the servers by having mirror sites. But what about the actual taxonomic databases? At present if you go to the American site or the Japanese site and ask about legumes, you still go to the same server that holds the legume database. If that server is down, you will not get your reply about legumes. We have two ways of dealing with this. First, we are going to have a backup in the form of the so-called "annual checklist." So, if you cannot get a live version for any sector, you will fall back to a static version. Second, of course, we could also just duplicate each of the databases at the peripheral sites by having a second site holding each database.
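
The fallback from a live database to a static annual checklist can be sketched in a few lines of Python. This is a minimal illustration, assuming the live lookup signals an outage by raising an exception; the exception class, the function names, and the sample data are hypothetical and do not describe the prototypes themselves.

```python
from __future__ import annotations

from typing import Callable


class SectorUnavailable(Exception):
    """Raised by a live lookup when the database server cannot be reached."""


def lookup_with_fallback(
    name: str,
    live_lookup: Callable[[str], str | None],
    annual_checklist: dict[str, str],
) -> tuple[str | None, str]:
    """Return (accepted name, origin of the answer), preferring the live database."""
    try:
        return live_lookup(name), "live"
    except SectorUnavailable:
        # Static snapshot: possibly out of date, but always available.
        return annual_checklist.get(name), "annual checklist"


# Hypothetical data and servers, for illustration only.
snapshot = {"Altgenus albus": "Examplea alba"}


def healthy_server(name: str) -> str | None:
    return {"Altgenus albus": "Examplea alba"}.get(name)


def disconnected_server(name: str) -> str | None:
    raise SectorUnavailable("legume server is down")


print(lookup_with_fallback("Altgenus albus", healthy_server, snapshot))
print(lookup_with_fallback("Altgenus albus", disconnected_server, snapshot))
```

Returning the origin alongside the answer lets the user see whether a reply came from the live sector or from the static checklist.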

The Problem of Taxonomic Knowledge

How are we going to create a seamless catalogue produced from these bits and pieces from different people’s databases? The answer is that we know a great deal about how different databases vary. We do have a model we are working on, for which the main challenge is getting the name base and the taxon base interfaces to give a harmonious appearance.

The Problem of Demarcation and Overlaps

Of course, some of the databases have duplication. And it is not true that each species that is covered with a Global Species Database is covered only once. So, there are overlaps, and it becomes a question of shading in. For example, we may want, as a Global Species Database, to go to the Missouri Botanical Garden just for mosses and at that particular point in time have the flowering plants shaded out.
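
Shading in and shading out can be pictured as a sector table that records which groups each database is asked to supply, checked so that no group is supplied twice. The Python sketch below is illustrative only; the database names, group labels, and record layout are assumptions.

```python
from __future__ import annotations

# Which groups each database holds (hypothetical names and contents).
HOLDINGS: dict[str, set[str]] = {
    "garden_db": {"mosses", "flowering plants"},
    "flora_db": {"flowering plants"},
}

# Which groups each database is asked to supply to the combined index.
SHADED_IN: dict[str, set[str]] = {
    "garden_db": {"mosses"},  # its flowering plants are shaded out
    "flora_db": {"flowering plants"},
}


def find_overlaps(sectors: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return every group claimed by more than one database."""
    claimed: dict[str, list[str]] = {}
    for database, groups in sectors.items():
        for group in groups:
            claimed.setdefault(group, []).append(database)
    return {g: sorted(dbs) for g, dbs in claimed.items() if len(dbs) > 1}


def shade(records: list[dict], database: str) -> list[dict]:
    """Keep only the records in the groups shaded in for this database."""
    allowed = SHADED_IN[database]
    return [r for r in records if r["group"] in allowed]


records = [
    {"name": "Muscus exemplaris", "group": "mosses"},
    {"name": "Examplea alba", "group": "flowering plants"},
]

print(find_overlaps(HOLDINGS))
print(find_overlaps(SHADED_IN))
print(shade(records, "garden_db"))
```

The raw holdings overlap on flowering plants, while the shaded-in sectors do not, which is the condition under which the components can be put together end to end.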

Pluralism, of course, worries people a great deal. They are a little bit afraid that our Species 2000 project is going to somehow impose on them, and that everybody will have to conduct research on a fish according to a certain person’s system, or legumes according to somebody else’s system. What we need is at least one good Global Species Database for each group of organisms. If we have two or three, then is that not a wonderful excess, for then we can choose among them?

There is more than one world system for mammals, for fishes, and for bacteria. The list does not go a lot further than that for groups that are duplicated. Where we have duplicate groups, then there are at least two different user attitudes: Some people really know which system they want to use, and others do not care. The people who do not care often ask the question: Will you tell us the taxonomy that we can use just to name these organisms (which must be the same as the one you tell the other people down the street)? They need this question answered because if they use the same names, their data will match. So, in areas of taxonomy where there is pluralism, there is pressure for us to get some organization–maybe BIOSIS, which can monitor the uses around the world in the literature–to tell us which taxonomy to use for the default and for the people who do not care. But clearly you want to offer a choice to those people who do care, who want to follow a particular system for fishes or whatever.
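
The two user attitudes map onto a simple rule: honor an explicit choice where one is given, and otherwise apply the nominated default. A minimal Python sketch of that rule follows; the registry contents and the function name `choose_system` are invented for illustration.

```python
from __future__ import annotations

# Hypothetical registry: alternative world systems per group, plus a default
# nominated by some monitoring body. All names are invented.
AVAILABLE: dict[str, list[str]] = {
    "fishes": ["fish_system_1", "fish_system_2"],
    "legumes": ["legume_system_1"],
}
DEFAULT: dict[str, str] = {
    "fishes": "fish_system_1",
    "legumes": "legume_system_1",
}


def choose_system(group: str, preference: str | None = None) -> str:
    """Honor an explicit preference; otherwise fall back to the default."""
    if preference is None:
        return DEFAULT[group]
    if preference not in AVAILABLE[group]:
        raise ValueError(f"{preference!r} is not offered for {group!r}")
    return preference


print(choose_system("fishes"))                   # user who does not care
print(choose_system("fishes", "fish_system_2"))  # user who does care
```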

The Problem of Missing Sectors

Another problem is missing sectors. The databases that we are working with, if they were full, would cover only 40 percent of the world's known organisms. So there is a remaining 60 percent to be done. We are working very carefully with the Organization for Economic Cooperation and Development (OECD) and with the Global Environment Facility (GEF) to try to make proposals as to how new projects might be started or existing projects might be diverted to achieve Global Species Database status.

The Problem of the Human Element

The human element–the sociology–is enormously important. The great institutions–the Smithsonian, the Royal Botanical Garden, the Missouri Botanical Garden–must come alongside network projects like the ILDIS project I described to you earlier, alongside smaller institutions, and alongside individual people whose whole careers have gone into making one database that is, in effect, their personal property. All of these different databases have to be used, and we must figure out how to bring these people alongside each other in a federation.

Then there is the question of nationalism and regionalism. Our plan is a global plan, but there are almost no global resources. So, we have to set up Species 2000 Japan. We have to work with the Integrated Taxonomic Information System (ITIS) program here in the States. We have to work with the European Union to try and mobilize parts of the project with regional or nationalistic names on them even though they are part of the global program.

The Problem of Money

Lastly, of course, there is the question of whether to make things available for free or whether there has to be some cost recovery on the usage of systems. Desires and attitudes vary enormously around the world, and this is very troublesome to all global organizations. We are trying to live with heterogeneity there as well.

So, these are the challenges that we continue to face. On the computer science side, scalability and autonomy are, in my opinion, more likely to trip us up than system heterogeneity or stability, which are our priorities. With taxonomic knowledge, we do know how to handle the heterogeneity. On demarcation, we have some ideas. On pluralism, I think we know how to handle it, but it is a very sticky issue with taxonomists. The question is whether the taxonomists in a particular group of organisms will allow one system to be used or whether they will insist on slugging it out with alternatives.

Lastly, we are working very hard to draw many institutions together. We are also working very hard in Australia, in Europe, and in the U.S. and North America to make sure that nationalism does not pull us apart. We need to use national and regional funds, but we must aspire to a global program.

Copyright © 2003 NFAIS. All rights reserved. No part of this product or service may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written consent.
