Preprints of the
Metadiversity
Conference
Proceedings
Session 2: The Challenge in Species Discovery and
Taxonomic Information
Taxonomic Information
Systems–Stability through Diversity
HUGH WILSON,
Professor of Biology and Curator, Departmental Herbarium,
Texas A&M University
ABSTRACT
Biodiversity data are,
by necessity, linked to taxa. This linkage is
established at the headwaters of data flow–the
scientific literature–and retained to the end point
in that most clients will query biodiversity data
using taxon names. While taxonomic names provide a
symbolic "hook" for information linked to a given
entity, the overall classification system expresses
a nested array of biological relationships among
taxa, i.e., a structure that allows interpretation
and extrapolation of biodiversity data. This
fundamental importance of classification systems as
a source of stability and structure is countered by
the simple fact that classification systems are
inherently unstable. Continuous modification of
classification systems is an ongoing response to
new knowledge and scientific progress. This
changing superstructure for biodiversity data
presents minimal problems beyond the digital arena
in that data are presented in a fixed (hardcopy)
taxonomic context. However, the increased potential
for metadata development via digital technologies
will not be realized until procedures are in place
that allow data placement within a dynamic or fluid
taxonomic information matrix. These procedures will
be difficult to develop via a centralized nexus of
"standardization" or "authority" that carries a
limited base of expertise or representation
relative to the relevant scientific community. The
digital network provides a new medium through which
interactive development could be accomplished among
a global community of taxonomic specialists. This
path of "distributed" development, possibly
mediated by extant professional societies, would
incorporate the broadest range of expertise and
also, by default, allow progress toward a dynamic
biodiversity data system via consensus-based
decisions.
I attended an NSF-funded
workshop on roughly similar topics in 1996, in San Diego,
and the result of that workshop was essentially nil. As you
can see if you go to the home page describing the workshop,
there is really no product. The "product" represented a
mixture of grandiose schemes, competing programs, and a lot
of jargon. It sidestepped some of the fundamental problems
involved with the expression of biodiversity data on the
Internet.
So, what I would like to do
today is touch on at least one of these problems in some
detail. This perspective that I am presenting has to do with
my work as Chair of the Internet Communications Committee
for the American Society of Plant Taxonomists and also as a
functioning member of the Texas A&M working group that is
dealing with reporting biodiversity data to the network. I
do not represent either group, but my perspective is based
on the work that I have done with these groups.
This symposium is concerned with
enhanced data flow from various producers of various sorts
to the ultimate consumer. The consumer is of utmost
importance in the sense that this is where the data are
going to be applied. The "metadiversity" notion certainly
relates to levels of complexity in linking complex data
resources to other complex data resources in ways that are
yet to be defined.
Certainly open access is of
primary importance. The producers have to have open access
to the system to be able to develop data resources, and the
consumers–wherever they may be–have to have open access to
the product. So, you cannot operate or proceed with regard
to biodiversity expression without assuming that it is the
diversity of information itself that is of primary
consideration. As scientific specialists, we have our
concerns about the array of taxa and complex interlinkages
and data structures and so on.
The Mission
Our primary mission must be to
put fundamental, basic biodiversity information on the Web
and to make that information available for both specialists
and nonspecialists to examine. This involves a data
"triage." Since there is no way all the information can be
put out there right from the beginning, things have to be
prioritized. The highest-profile bits of information have to
be established as the highest priority. Our discussion of
metadata seems to me to be a bit premature when we really
have not dealt with putting in the fundamental bits of
information that are needed to answer a fairly simple
question: What is the distribution of the species in genus X
across North America? This is a fundamental question for
which we really do not have a good answer at this point in
time.
Certainly, another assumption is
that whatever goes on with regard to metadiversity–or
whatever else is going to go on the Web–is going to happen
within an environment that is provided by the browser. And
that environment requires that the systems established are
fast and can deal with whatever problems there might be on
the network, thus allowing the user to access both text and
image data as quickly as possible. Again, we have to assume
that access to the Web sites that express biodiversity data
is going to be multifaceted in terms of the user base.
Technicians, scientists, and specialists can certainly use
the Web, but it has to be designed in such a way that the
broadest user base–including users such as decision-makers
in administrative positions–can also grasp the information
that is provided.
As has been referenced here many
times, the applications for this information may not yet be
defined. In relation to this, let’s look at vascular plants.
My area is in vascular plants and I am a bit biased. But the
fact is that if you look at any system that relates to
biodiversity in general, you are dealing primarily with
vascular plants. For example, the Texas Parks and Wildlife
Department provides a brief summary of the natural areas of
Texas, which are defined in terms of the vascular plants
occurring in these regions. Certain plants are, in fact,
indicator species. Those that are dominant in certain areas
can be used to define regions. Vascular plants also are
indicators of diversity, because the relative pattern of
diversity expressed by vascular plants is very often
concordant with the pattern of diversity that we find in
other organisms. You can extend this concept to a
national–or even global–level, and biotic regions can be
defined by the vascular plants present.
On the World Wide Web, you can
go to
http://www.csdl.tamu.edu/FLORA/b98/check98.htm
and find a map that provides a graphic view of relative
diversity of vascular plant taxa across the United States,
Puerto Rico, and the Virgin Islands. The map is color-coded
and illustrates the diversity mapping systems that we have
developed based on 70,000 records. It includes not only
accepted taxa (30,000+) but also geographic
checkpoints at the state level for each taxon. These maps
are generated "on the fly," and they can be generated for
any taxonomic level. You can look at all vascular plants and
their subclasses, as well as orders, families, genera, and
so on, in this way. It gives you an immediate view of
relative diversity of vascular plants across, in this case,
the United States. The system can be applied to a single
state within the country using a segmented color-coded item.
Again, the maps are generated on the fly. There is a counter
that racks up the number of taxa, as well as a device that
controls the color-coding. California and Texas, for example,
are pretty diverse states with regard to vascular plants.
The point here is that the pattern of diversity among
arthropods and whatever else is probably concordant with
this pattern. Therefore, vascular plants are fundamental,
and you can examine a Web site or an informatics resource
and very quickly determine its content by what it has
available in terms of vascular plants.
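The counting behind a map of this kind can be sketched in a few lines. This is only an illustration, not the code that runs the site: the record layout, the sample records, and the names `taxa_per_state` and `color_class` are assumptions made for the example. It counts distinct species per state, optionally restricted to one family or genus, and then assigns each count to a color bin.

```python
from collections import defaultdict

# Hypothetical checklist records: (family, genus, species, state).
# Illustrative sample only, not data taken from the actual system.
RECORDS = [
    ("Asteraceae", "Helianthus", "Helianthus annuus", "TX"),
    ("Asteraceae", "Helianthus", "Helianthus annuus", "CA"),
    ("Asteraceae", "Helianthus", "Helianthus maximiliani", "TX"),
    ("Fagaceae", "Quercus", "Quercus alba", "TX"),
    ("Fagaceae", "Quercus", "Quercus agrifolia", "CA"),
    ("Fagaceae", "Quercus", "Quercus virginiana", "TX"),
]

# Position of each higher rank inside a record tuple.
RANK_COLUMN = {"family": 0, "genus": 1}


def taxa_per_state(records, rank=None, name=None):
    """Count distinct species per state, optionally within one higher taxon."""
    species_by_state = defaultdict(set)
    for record in records:
        if rank is not None and record[RANK_COLUMN[rank]] != name:
            continue
        species_by_state[record[3]].add(record[2])
    return {state: len(species) for state, species in species_by_state.items()}


def color_class(count, max_count, n_classes=4):
    """Map a count onto one of n_classes equal-width bins (0 = least diverse)."""
    if max_count == 0:
        return 0
    return min(n_classes - 1, count * n_classes // (max_count + 1))


counts = taxa_per_state(RECORDS)
shading = {state: color_class(n, max(counts.values())) for state, n in counts.items()}
```

Because the counts are recomputed from the records on each request, the same routine serves any taxonomic level: passing `rank="genus", name="Quercus"` shades the map for a single genus.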
With this particular mapping
system, each state is a segment. Not only can it be
color-coded depending on how many taxa it carries, but also
you can click on a state and get a text listing of the taxa
that occur in that state. So, for example, a click on Texas
will produce 6,000+ lines–two megabytes of text file–and
give you a full listing of the entities in the vascular
plant world that occur in Texas. In addition, it is
interlinked–that is, these listings are coupled with
references to other sources. We have links to fish and
wildlife services, vegetation and regional maps for Texas,
county-level maps for California, images of varying sorts
involved with this other index system, and a map for a given
taxon.
The Application of Names
The fundamental item of
importance relative to this discussion is the array of taxa.
This array provides a "hook" for biodiversity information,
metadata, and so on. These individual entities also provide
a view of biological structure. The placement of
infra-specific groups, species within genera, and so on, is
an interpretation of reality symbolized by the application
of names.
As a result, the system of
classification as expressed here is of fundamental
importance. It serves two functions. It gives a picture of
biological reality and relationships. It also provides
individual hooks for bits of information–whether they be
images or other entries for the taxa involved.
So, with regard to taxonomic
structure, we have an array of names. There has been a lot
of discussion of names here. Names fundamentally are in the
public domain. The scientific names are comparable to
musical notes in that anybody can access notes and put them
together any way they want. The difference with regard to
taxonomic structure or classification systems is that the
array–the hierarchy that is established by a classifier–is
the result of expertise, a lot of work, and personal
investment, and is personally expressed. As a result, it is
comparable to a score of music, a compilation of music. And
as a result of that, it is owned. It is associated with an
individual.
Taxonomic Structure
Taxonomic structure will vary.
There are different perspectives on the array of diversity
that one finds in vascular plants or other taxa. This
variance can occur depending on the geographic range covered
by the system. For example, if you combine the United States and
Mexico, you get a different picture than you would have for
each country separately. It relates to interpretations and
applications of the International Code of Botanical
Nomenclature for vascular plants. It can change daily on the
basis of research, different views generated by data that
relate to relationships, and–finally and most
importantly–taxonomic opinion. We have the USDA Plants
Database, which carries at its Web site over 35,000 taxa of
plants (many of economic importance). The classification
system expressed there has some synonymy involved–that is,
decisions made are based at the USDA. Also, the National
Institutes of Health (NIH) provides access to genetic
information through its National Center for Biotechnology
Information (NCBI). There is less
synonymy here, but there is a full-blown classification at
the species level. This is another federal-based item that
has a structure, a foundation established locally.
Fully synonymized checklists for
vascular plants available on the network today have only one
source–the Biota of North America Program in North Carolina.
The first display of this information on the Web was
produced at Berkeley through the Museum Informatics Program
there. A digital checklist provides both mapping and
listing. The version is concordant with the published
version of the BONAP checklist, which was dated 1994. I
learned today that the 1996 version of the BONAP checklist,
with both the taxonomic structure and the distribution
information generated by John Kartesz at BONAP, is now
available.
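What "fully synonymized" buys a user can be shown with a small sketch. The layout below is an assumption for illustration, not the BONAP data format: every name in the list points to its accepted name, and an accepted name points to itself. The names are nomenclatural placeholders, and `accepted_name` is a hypothetical helper.

```python
# Hypothetical synonymized checklist: name -> accepted name.
# An accepted name maps to itself. All names are placeholders.
CHECKLIST = {
    "Aus bus": "Aus bus",
    "Aus cus": "Aus bus",  # synonym of Aus bus
    "Xus cus": "Aus cus",  # synonym that resolves through another synonym
}


def accepted_name(name, checklist):
    """Follow synonym links until an accepted name is reached.

    Returns None when the name (or a link in its chain) is not in the list.
    """
    seen = set()
    current = name
    while current in checklist:
        target = checklist[current]
        if target == current:
            return current
        if current in seen:
            raise ValueError(f"circular synonymy involving {current!r}")
        seen.add(current)
        current = target
    return None
```

With a structure like this, data filed under any synonym can be retrieved through the accepted name, which is the "hook" role described earlier.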
The U.S. Federal Standard
established by ITIS has been referred to here many times as
the NBII. I am not sure of the connections between these
groups. It is directly associated with the USDA Plants
Database, evidently a direct transfer of the BONAP checklist
to ITIS. As a result it also is the 1996 version. There were
6,000 changes between 1996 and 1998 for this checklist. But
the version expressed at the ITIS site today is two years
old, so those 6,000 changes are not expressed there.
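The gap between two checklist versions is something a program can report directly. The sketch below assumes each version is a simple mapping of name to accepted name; the two tiny versions, their placeholder names, and the function `diff_checklists` are hypothetical and stand in for the real 1996 and 1998 data.

```python
# Two hypothetical versions of a checklist: name -> accepted name.
CHECKLIST_1996 = {
    "Aus bus": "Aus bus",
    "Aus cus": "Aus cus",
    "Aus dus": "Aus dus",
}
CHECKLIST_1998 = {
    "Aus bus": "Aus bus",
    "Aus cus": "Aus bus",  # reduced to synonymy in the later version
    "Aus eus": "Aus eus",  # newly listed
}


def diff_checklists(old, new):
    """Report names added, removed, or re-assigned between two versions."""
    old_names, new_names = set(old), set(new)
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "changed": sorted(n for n in old_names & new_names if old[n] != new[n]),
    }


changes = diff_checklists(CHECKLIST_1996, CHECKLIST_1998)
```

A site that publishes such a change report alongside the version it serves lets users see at a glance how far it lags behind the current list.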
The most recent expression of
this very critical data set with regard to vascular plant
information on the Web is at our site, which does express
this 1998 version of the BONAP classification system,
including the structure implied by that classification
system and the distribution information. Now a standard
is possible.
Agreeing to a Standard
There has been discussion today
about following the ITIS nomenclature for the Discover Life
in America program. ITIS taxonomy has also been discussed
for use in other programs. However, does ITIS represent a
valid, legitimate standard that will be followed by those
working in the area? There are many difficulties involved.
These systems are very complicated, but they are also a
moving target. A standard becomes immediately obsolete if it
is published in hardcopy. This is because it is under
constant revision and constant update, and as data become
available it is just going to change. That, fundamentally,
is the beauty of digital classification systems–that they
can change immediately, that they can be updated immediately
online. It is a very diffuse target and there are many
variants. A lot of difficulties, including authority
ownership and general "turf" problems, are involved with the
development of the standard, both in terms of the source of
the standard and from the perspective of those using the
standard.
Finally, with regard just to
vascular plants: Putting aside the insects, the viruses, and
so on, you are looking at a very large and complex array of
biotic entities. You have, for the area covered by the BONAP
program (the United States, Puerto Rico, and the Virgin
Islands), 30,000+ species organized into 290+ families. And
there are variants available within any element of that
hierarchy. So, it is a difficult task and not one to be
quickly sidestepped if you are considering expression of
metadata or any sort of data as a function of vascular
plants.
In terms of expressing this
information on the network, there are various options. I
don't pretend to offer up a solution to the problem, but
basically if something is going to be done that is
functional, it will have to move beyond talking, jargon, and
competing centers of development. It is going to have to be
inclusive. You cannot have one person–John Kartesz at the
BONAP program in North Carolina–providing a standard for the
global user base. It is not going to work and it is not
going to be acceptable. You have to tap in to the full
community of active folks in plant taxonomy. There has to be
some sort of input involved. The computational system is set
up in such a way that movement with regard to classification
is a doable item on the network. It certainly is not an
impossible task. And certainly any effort to develop a
standard should focus on the fact that it is a reference and
that the user would benefit if, in areas of controversy or
uncertainty, different optional alignments could be
expressed, both in terms of the classification and the
underlying associated data.
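One way to picture a reference that carries optional alignments is a record that stores every proposed placement of a name together with its source. This is a sketch under assumptions, not a proposed standard: the class names, the placeholder taxa, and the source labels "Specialist A" and "Specialist B" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alignment:
    parent: str  # taxon under which the name is placed
    source: str  # who proposed this placement (hypothetical label)


@dataclass
class TaxonEntry:
    name: str
    alignments: list = field(default_factory=list)

    def add(self, parent, source):
        """Record one proposed placement without discarding earlier ones."""
        self.alignments.append(Alignment(parent, source))

    def is_contested(self):
        """True when the recorded sources disagree on the placement."""
        return len({a.parent for a in self.alignments}) > 1

    def placement(self, source):
        """Return the parent taxon according to one source, or None."""
        for alignment in self.alignments:
            if alignment.source == source:
                return alignment.parent
        return None


entry = TaxonEntry("Aus bus")
entry.add("Aus", "Specialist A")
entry.add("Xus", "Specialist B")
```

Keeping the source on each placement also keeps the contributor visible, so credit stays attached to the opinion it belongs to.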
Just looking at distribution,
you can imagine what has to be done. But distribution can be
computed at either the species or the subspecies level,
depending on the rank of the entity.
Certainly, with regard to anything that is produced by the
federal government or any publicly supported entity, open
access needs to be defined to include anybody who wants to
download the information in a form that can be incorporated
into whatever task is to be performed.
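One reading of the remark about distribution and rank is that a species-level distribution can be derived from the distributions of its subspecies. The sketch below follows that reading only; the state lists, the placeholder names, and the assumption that every infraspecific name contains " subsp. " are made up for the example.

```python
# Hypothetical state-level distributions recorded at subspecies rank.
SUBSPECIES_STATES = {
    "Aus bus subsp. bus": {"TX", "OK"},
    "Aus bus subsp. cus": {"TX", "NM"},
    "Aus dus subsp. dus": {"CA"},
}


def species_of(infraspecific_name):
    """Strip the subspecies epithet (assumes the ' subsp. ' convention)."""
    return infraspecific_name.split(" subsp. ")[0]


def distribution_at_species_rank(subspecies_states):
    """Union the subspecies distributions to get one set of states per species."""
    rolled_up = {}
    for name, states in subspecies_states.items():
        rolled_up.setdefault(species_of(name), set()).update(states)
    return rolled_up


by_species = distribution_at_species_rank(SUBSPECIES_STATES)
```

If the rolled-up result is offered for download in a plain form such as this, anyone can fold it into whatever task they have in hand.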
A Role for Professional
Societies
You can't have the individuals
or groups that are providing this fundamental bit of
information be invisible at the site presenting that
information. There has to be some sort of remuneration
within the commerce of the Web that is relevant to
contributors. And certainly the full community, again, has
to be involved in the enterprise. Certainly a possible path
for development–a path that we have recently proposed to NSF
be at least explored through the Web site of the American
Society of Plant Taxonomists–is to first of all focus on the
Internet as a medium, not only for expression of information
but also for use as an "Internet commons." The term "Internet
commons" has been applied here as a common medium through
which an active group of specialists can work through the
network to develop, maintain, and curate these sorts of
classification systems.
Institutional or governmental
centers? They certainly have been tried. And other patterns
of funding will continue to be tried. But I don't think they
are going to be successful in terms of expressing
biodiversity information in a useful and updateable way. If
this effort is distributed among specialists, I think it has
a much greater chance of success. Examine traditional
infrastructures that are available today that allow,
facilitate, and are preadapted to this sort of enterprise.
Certainly professional societies should be mentioned
here–for example, the best expression available is the
American Society of Mammalogists. Such professional groups
are composed of people who are actively engaged in a
specific enterprise, taxonomists with expertise who are
familiar with the nuances of these systems and their
subparts. So, professional societies are essentially, from a
cultural point of view, preadapted. All that is required is
whatever modifications are needed to allow them to move into
this arena to try to pursue this sort of activity.
So, basically you have these
large taxonomic groups. Many professional societies have a
direct link to not only individuals, but also facilities,
areas that house specimens. They have people accustomed to
curating these materials. You see the emergence on the Web
of these societies, who are creating a Web presence. These
societal Web sites serve as interactive nodes for various
things, such as membership directories. But, certainly this
enterprise could be expanded into developing content for
these sites. You have again the tradition established at the
societies for hardcopy publication, submission of articles,
peer review, etc. It is part of societal activity at this
point in time. You have societies that have been in
existence for decades. Therefore, you have the potential for
the permanent transfer of responsibilities through
generations, through cohorts, and so on, already in place
with professional societies. You have, most importantly with
regard to the social parameters of science that have been
mentioned here, a society ready to distribute
responsibility, data resources, and credit among a
membership of participating individuals, not centered with
an agency or a commercial indexing enterprise.
Copyright © 2003 NFAIS. All rights
reserved. No part of this product or service may be
reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior written consent.