Preprints of the Metadiversity Conference Proceedings

  Session 2: The Challenge in Species Discovery and Taxonomic Information

Taxonomic Information Systems–Stability through Diversity

HUGH WILSON, Professor of Biology and Curator, Departmental Herbarium, Texas A&M University

ABSTRACT

Biodiversity data are, by necessity, linked to taxa. This linkage is established at the headwaters of data flow–the scientific literature–and retained to the end point in that most clients will query biodiversity data using taxon names. While taxonomic names provide a symbolic "hook" for information linked to a given entity, the overall classification system expresses a nested array of biological relationships among taxa, i.e., a structure that allows interpretation and extrapolation of biodiversity data. This fundamental importance of classification systems as a source of stability and structure is countered by the simple fact that classification systems are inherently unstable. Continuous modification of classification systems is an ongoing response to new knowledge and scientific progress. This changing superstructure for biodiversity data presents minimal problems beyond the digital arena in that data are presented in a fixed (hardcopy) taxonomic context. However, the increased potential for metadata development via digital technologies will not be realized until procedures are in place that allow data placement within a dynamic or fluid taxonomic information matrix. These procedures will be difficult to develop via a centralized nexus of "standardization" or "authority" that carries a limited base of expertise or representation relative to the relevant scientific community. The digital network provides a new medium through which interactive development could be accomplished among a global community of taxonomic specialists. This path of "distributed" development, possibly mediated by extant professional societies, would incorporate the broadest range of expertise and also, by default, allow progress toward a dynamic biodiversity data system via consensus-based decisions.

I attended an NSF-funded workshop on roughly similar topics in 1996, in San Diego, and the result of that workshop was essentially nil. As you can see if you go to the home page describing the workshop, there is really no product. The "product" represented a mixture of grandiose schemes, competing programs, and a lot of jargon. It sidestepped some of the fundamental problems involved with the expression of biodiversity data on the Internet.

So, what I would like to do today is touch on at least one of these problems in some detail. This perspective that I am presenting has to do with my work as Chair of the Internet Communications Committee for the American Society of Plant Taxonomists and also as a functioning member of the Texas A&M working group that is dealing with reporting biodiversity data to the network. I do not represent either group, but my perspective is based on the work that I have done with these groups.

This symposium is concerned with enhanced data flow from various producers of various sorts to the ultimate consumer. The consumer is of utmost importance in the sense that this is where the data are going to be applied. The "metadiversity" notion certainly relates to levels of complexity in linking complex data resources to other complex data resources in ways that are yet to be defined.

Certainly open access is of primary importance. The producers have to have open access to the system to be able to develop data resources, and the consumers–wherever they may be–have to have open access to the product. So, you cannot operate or proceed with regard to biodiversity expression without assuming that it is the diversity of information itself that is of primary consideration. As scientific specialists, we have our concerns about the array of taxa and complex interlinkages and data structures and so on.

The Mission

Our primary mission must be to put fundamental, basic biodiversity information on the Web and to make that information available for both specialists and nonspecialists to examine. This involves a data "triage." Since there is no way all the information can be put out there right from the beginning, things have to be prioritized. The highest-profile bits of information have to be established as the highest priority. Our discussion of metadata seems to me to be a bit premature when we really have not dealt with putting in the fundamental bits of information that are needed to, for instance, determine the distribution of species X (a fairly simple problem) or to answer the question: What is the distribution of the species in genus X across North America? This is a fundamental question for which we really do not have a good answer at this point in time.

Certainly, another assumption is that whatever goes on with regard to metadiversity–or whatever else is going to go on the Web–is going to happen within an environment that is provided by the browser. And that environment requires that the systems established are fast and can deal with whatever problems there might be on the network, thus allowing the user to access both text and image data as quickly as possible. Again, we have to assume that access to the Web sites that express biodiversity data is going to be multifaceted in terms of the user base. Technicians, scientists, and specialists can certainly use the Web, but it has to be designed in such a way that the broadest user base–including users such as decision-makers in administrative positions–can also grasp the information that is provided.

As has been referenced here many times, the applications for this information may not yet be defined. In relation to this, let’s look at vascular plants. My area is in vascular plants and I am a bit biased. But the fact is that if you look at any system that relates to biodiversity in general, you are dealing primarily with vascular plants. For example, the Texas Parks and Wildlife Department provides a brief summary of the natural areas of Texas, which are defined in terms of the vascular plants occurring in these regions. Certain plants are, in fact, indicator species. Those that are dominant in certain areas can be used to define regions. Vascular plants also are indicators of diversity, because the relative pattern of diversity expressed by vascular plants is very often concordant with the pattern of diversity that we find in other organisms. You can extend this concept to a national–or even global–level, and biotic regions can be defined by the vascular plants present.

On the World Wide Web, you can go to
http://www.csdl.tamu.edu/FLORA/b98/check98.htm
and find a map that provides a graphic view of relative diversity of vascular plant taxa across the United States, Puerto Rico, and the Virgin Islands. The map is color-coded and illustrates the diversity mapping systems that we have developed based on 70,000 records. It includes not only accepted taxa (30,000+) but also geographic checkpoints at the state level for each taxon. These maps are generated "on the fly," and they can be generated for any taxonomic level. You can look at all vascular plants and their subclasses, as well as orders, families, genera, and so on, in this way. It gives you an immediate view of relative diversity of vascular plants across, in this case, the United States. The system can be applied to a single state within the country using a segmented color-coded item. Again, the maps are generated on the fly. There is a counter that racks up the number of taxa, as well as a device that allows the color-coding to be done. California and Texas are pretty diverse states with regard to vascular plants. The point here is that the pattern of diversity among arthropods and whatever else is probably concordant with this pattern. Therefore, vascular plants are fundamental, and you can examine a Web site or an informatics resource and very quickly determine its content by what it has available in terms of vascular plants.

With this particular mapping system, each state is a segment. Not only can it be color-coded depending on how many taxa it carries, but also you can click on a state and get a text listing of the taxa that occur in that state. So, for example, a click on Texas will produce 6,000+ lines–two megabytes of text file–and give you a full listing of the entities in the vascular plant world that occur in Texas. In addition, it is interlinked–that is, these listings are coupled with references to other sources. We have links to fish and wildlife services, vegetation and regional maps for Texas, county-level maps for California, images of varying sorts involved with this other index system, and a map for a given taxon.
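The logic behind such a map can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the code behind the Texas A&M site: the record layout, the handful of sample rows, and the function names `taxa_per_state`, `color_class`, and `taxa_in_state` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical checklist records: (accepted taxon, family, state).
# The sample rows are placeholders, not entries taken from the BONAP checklist.
RECORDS = [
    ("Quercus alba", "Fagaceae", "TX"),
    ("Quercus alba", "Fagaceae", "VA"),
    ("Quercus stellata", "Fagaceae", "TX"),
    ("Helianthus annuus", "Asteraceae", "TX"),
    ("Helianthus annuus", "Asteraceae", "CA"),
    ("Pinus ponderosa", "Pinaceae", "CA"),
]


def taxa_per_state(records, family=None):
    """Count distinct taxa per state, optionally restricted to one family."""
    taxa = defaultdict(set)
    for taxon, fam, state in records:
        if family is None or fam == family:
            taxa[state].add(taxon)
    return {state: len(names) for state, names in taxa.items()}


def color_class(count, max_count, n_classes=5):
    """Map a taxon count to a color class from 0 (lowest) to n_classes - 1."""
    if max_count == 0:
        return 0
    return min(n_classes - 1, count * n_classes // max_count)


def taxa_in_state(records, state):
    """Return the sorted text listing that a click on a state would produce."""
    return sorted({taxon for taxon, _, st in records if st == state})


counts = taxa_per_state(RECORDS)
top = max(counts.values())
classes = {state: color_class(n, top) for state, n in counts.items()}
```

Because the counts are computed from the records at request time, the same function serves any taxonomic level for which the records carry a column; restricting to a family, as in `taxa_per_state(RECORDS, family="Fagaceae")`, is one such case.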

The Application of Names

The fundamental item of importance relative to this discussion is the array of taxa. This array provides a "hook" for biodiversity information, metadata, and so on. These individual entities also provide a view of biological structure. The placement of infra-specific groups, species within genera, and so on, is an interpretation of reality symbolized by the application of names.

As a result, the system of classification as expressed here is of fundamental importance. It serves two functions. It gives a picture of biological reality and relationships. It also provides individual hooks for bits of information–whether they be images or other entries for the taxa involved.

So, with regard to taxonomic structure, we have an array of names. There has been a lot of discussion of names here. Names fundamentally are in the public domain. The scientific names are comparable to musical notes in that anybody can access notes and put them together any way they want. The difference with regard to taxonomic structure or classification systems is that the array–the hierarchy that is established by a classifier–is the result of expertise, a lot of work, and personal investment, and is personally expressed. As a result, it is comparable to a score of music, a compilation of music. And as a result of that, it is owned. It is associated with an individual.

Taxonomic Structure

Taxonomic structure will vary. There are different perspectives on the array of diversity that one finds in vascular plants or other taxa. This variance can occur depending on the geographic range covered by the system. For example, if you combine North America and Mexico, you get a different picture than you would have for each country separately. It relates to interpretations and applications of the international code of botanical nomenclature for vascular plants. It can change daily on the basis of research, different views generated by data that relate to relationships, and–finally and most importantly–taxonomic opinion. We have the USDA Plants Database, which carries at its Web site over 35,000 taxa of plants (many of economic importance). The classification system expressed there has some synonymy involved–that is, decisions made are based at the USDA. Also, the National Institutes of Health (NIH) has access to genetic information through its Biotechnology Information Center. There is less synonymy here, but there is a full-blown classification at the species level. This is another federal-based item that has a structure, a foundation established locally.

Fully synonymized checklists for vascular plants available on the network today have only one source–the Biota of North America Program in North Carolina. The first display of this information on the Web was produced at Berkeley through the Museum Informatics Program there. A digital checklist provides both mapping and listing. The version is concordant with the published version of the BONAP checklist, which was dated 1994. I learned today that the 1996 version of the BONAP, listed both in terms of the taxonomic structure generated by John Kartesz and the distribution information also generated by John Kartesz at BONAP, is now available.

The U.S. Federal Standard established by ITIS has been referred to here many times as the NBII. I am not sure of the connections between these groups. It is directly associated with the USDA Plants Database, evidently a direct transfer of the BONAP checklist to ITIS. As a result it also is the 1996 version. There were 6,000 changes between 1996 and 1998 for this checklist. But the version expressed at the ITIS site today is two years old, so those 6,000 changes are not expressed there.
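Changes between two versions of a synonymized checklist can be tallied mechanically. The sketch below assumes, purely for illustration, that each version is a mapping from every name to its accepted name; the function `diff_checklists` and the placeholder names are hypothetical and are not taken from BONAP or ITIS.

```python
def diff_checklists(old, new):
    """Compare two checklist versions that map each name to its accepted name."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Names present in both versions whose accepted name has changed.
    reassigned = sorted(
        name for name in set(old) & set(new) if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "reassigned": reassigned}


# Placeholder names only; these are not real taxa or real checklist entries.
v1996 = {
    "Aus albus": "Aus albus",
    "Aus niger": "Aus niger",
    "Bus ruber": "Bus ruber",
}
v1998 = {
    "Aus albus": "Aus albus",
    "Aus niger": "Aus albus",  # treated as a synonym in the later version
    "Cus viridis": "Cus viridis",
}

changes = diff_checklists(v1996, v1998)
```

A site that serves the older version is missing every entry in `changes`, which is the situation described above for the 1996 and 1998 checklists.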

The most recent expression of this very critical data set with regard to vascular plant information on the Web is at our site, which does express this 1998 version of the BONAP classification system, including the structure implied by that classification system and the distribution of information. Now a standard is possible.

Agreeing to a Standard

There has been discussion today about following the ITIS nomenclature for the Discover Life in America program. ITIS taxonomy has also been discussed for use in other programs. However, does ITIS represent a valid, legitimate standard that will be followed by those working in the area? There are many difficulties involved. These systems are very complicated, but they are also a moving target. A standard becomes immediately obsolete if it is published in hardcopy. This is because it is under constant revision and constant update, and as data become available it is just going to change. That, fundamentally, is the beauty of digital classification systems–that they can change immediately, that they can be updated immediately online. It is a very diffuse target and there are many variants. A lot of difficulties, including authority, ownership, and general "turf" problems, are involved with the development of the standard, both in terms of the source of the standard and from the perspective of those using the standard.

Finally, with regard just to vascular plants: Putting aside the insects, the viruses, and so on, you are looking at a very large and complex array of biotic entities. You have, for the area covered by the BONAP program (the United States, Puerto Rico, and the Virgin Islands), 30,000+ species organized into 290+ families. And there are variants available within any element of that hierarchy. So, it is a difficult task and not one to be quickly sidestepped if you are considering expression of metadata or any sort of data as a function of vascular plants.

In terms of expressing this information on the network, there are various options. I don't pretend to offer up a solution to the problem, but basically if something is going to be done that is functional, it will have to move beyond talking, jargon, and competing centers of development. It is going to have to be inclusive. You cannot have one person–John Kartesz at BONAP in North Carolina–providing a standard for the global user base. It is not going to work and it is not going to be acceptable. You have to tap in to the full community of active folks in plant taxonomy. There has to be some sort of input involved. The computational system is set up in such a way that movement with regard to classification is a doable item on the network. It certainly is not an impossible task. And certainly any effort to develop a standard should focus on the fact that it is a reference and that the user would benefit if, in areas of controversy or uncertainty, different optional alignments could be expressed, both in terms of the classification and the underlying associated data.

Just looking at distribution, you can imagine what has to be done. But it can be done computationally to show the distribution between a species and a subspecies, depending on the rank of the entity. Certainly, with regard to anything that is produced by the federal government or any publicly supported entity, open access needs to be defined to include anybody who wants to download the information in a form that can be incorporated into whatever task is to be performed.
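As a rough sketch of what showing distribution by rank could involve, the example below reports the same occurrence records either at species rank, with infraspecific records rolled up into the species, or at subspecies rank. The record layout, the placeholder names, and the function `distribution` are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical records: (species, infraspecific epithet or None, state).
# Placeholder names; these are not real taxa.
OCCURRENCES = [
    ("Aus albus", "albus", "TX"),
    ("Aus albus", "minor", "NM"),
    ("Aus albus", "minor", "TX"),
    ("Bus ruber", None, "CA"),
]


def distribution(records, rank="species"):
    """Return {taxon: sorted list of states} at species or subspecies rank."""
    if rank not in ("species", "subspecies"):
        raise ValueError("rank must be 'species' or 'subspecies'")
    states = defaultdict(set)
    for species, infra, state in records:
        if rank == "subspecies" and infra is not None:
            key = f"{species} subsp. {infra}"
        else:
            # At species rank, infraspecific records roll up into the species.
            key = species
        states[key].add(state)
    return {taxon: sorted(found) for taxon, found in states.items()}
```

Calling `distribution(OCCURRENCES)` merges both subspecies into one species-level range, while `distribution(OCCURRENCES, rank="subspecies")` keeps them apart, so the same underlying data can support whichever alignment the user prefers.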

A Role for Professional Societies

You can't have the individuals or groups that are providing this fundamental bit of information be invisible at the site presenting that information. There has to be some sort of remuneration within the commerce of the Web that is relevant to contributors. And certainly the full community, again, has to be involved in the enterprise. Certainly a possible path for development–a path that we have recently proposed to NSF be at least explored through the Web site of the American Society of Plant Taxonomists–is to first of all focus on the Internet as a medium, not only for expression of information but also to use as an "Internet commons." The term "Internet Commons" has been applied here as a common medium through which an active group of specialists can work through the network to develop, maintain, and curate these sorts of classification systems.

Institutional or governmental centers certainly have been tried. And other patterns of funding will continue to be tried. But I don't think they are going to be successful in terms of expressing biodiversity information in a useful and updateable way. If this effort is distributed among specialists, I think it has a much greater chance of success. Examine traditional infrastructures that are available today that allow, facilitate, and are preadapted to this sort of enterprise. Certainly professional societies should be mentioned here–for example, the best expression available is the American Society of Mammalogists. Such professional groups are composed of people who are actively engaged in a specific enterprise, taxonomists with expertise who are familiar with the nuances of these systems and their subparts. So, professional societies are essentially, from a cultural point of view, preadapted. All that is required is whatever modifications are needed to allow them to move into this arena to try to pursue this sort of activity.

So, basically you have these large taxonomic groups. Many professional societies have a direct link to not only individuals, but also facilities, areas that house specimens. They have people accustomed to curating these materials. You see the emergence on the Web of these societies, who are creating a Web presence. These societal Web sites serve as interactive nodes for various things, such as membership directories. But, certainly this enterprise could be expanded into developing content for these sites. You have again the tradition established at the societies for hardcopy publication, submission of articles, peer review, etc. It is part of societal activity at this point in time. You have societies that have been in existence for decades. Therefore, you have the potential for the permanent transfer of responsibilities through generations, through cohorts, and so on, already in place with professional societies. You have, most importantly with regard to the social parameters of science that have been mentioned here, a society ready to distribute responsibility, data resources, and credit among a membership of participating individuals, not centered with an agency or a commercial indexing enterprise.

Copyright © 2003 NFAIS. All rights reserved. No part of this product or service may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written consent.