Preprints of the Metadiversity Conference Proceedings
Session 6: The Metadata Challenge for Museums
Museum Informatics: Where We Have Been and Where We Need to Go
JULIAN HUMPHRIES,
Associate Professor/Research, Department of Biological
Sciences, University of New Orleans
ABSTRACT
Fragments of the museum
community were early adopters of the technology
associated with personal computers and databases.
However, early growth was sporadic and very
un-centralized and, as a consequence, there were
many singular efforts that died quick deaths. Other
institutions or groups worked together but
frequently without significant communication with
peers. During the early and mid-‘90s, the
heterogeneity in system and logical design of
museum information systems resembled a coarse
patchwork of database technology and
sophistication. The National Science Foundation
(NSF) became frustrated with the repeated building
of new systems and started requiring more
thoughtful design and cooperation for institutions
that sought their funds. Elements of the community
started working on collaborative projects, and a
few individuals spearheaded efforts to build some
community infrastructure. To date, these efforts
have achieved a small measure of success, but
large-scale integration of the museum community
(and its data) awaits as-yet-unseen levels of
cooperation and will require a healthy infusion of
technology and personnel into the typical museum
operation.
Today I would like to address
issues about the expectations of a directory service
community—especially the issue of what those expectations
ought to be from the viewpoint of the people who actually
have to provide data and metadata about their collections. I
am also going to be talking about who we are, what we do
(and why what we do is important), the priorities of natural
history museums from the viewpoint of content providers, and
the nature of the problem for biological collections and the
corresponding biodiversity data providers.
The Role of Natural History Collections
The traditional and primary role
of natural history collections is as archivers and
catalogers of specimens and proxies for specimens. The
result is a curated collection of objects and data about
those objects. The product that comes out of natural history
collections is dynamic knowledge about those specimens in a
lot of contexts: taxonomic, ecological, and functional
descriptions of species, higher level taxa, and communities.
The people in museums who make
these products are also frequently the same people who are
responsible for the curation and cataloging of these
collections. As a consequence, their role is twofold. First,
they have the responsibility for seeing that the collections
are well maintained and curated. But they also have the
responsibility for taking the information from those objects
and doing something with it in terms of publication.
Why Biodiversity Data Are Important
The importance of biodiversity
data is probably fairly obvious, but I want to make a few
points about the role these three billion objects play in
this scheme of discovery about the Earth, the history of the
Earth, and what is going to happen to the Earth with time.
Virtually everything we know about the natural world in
some way or another works its way back to the truth that is
represented in natural history collections. So that original
piece of information that someone got about a bird, fish, or
insect—as well as where the specimen was collected—is the
start of a long pathway of knowledge and dissemination of
knowledge about our natural world.
It is also true that those same
data, once used as an indication of the past, are also
valuable for prediction about the future in terms of telling
us about parts of the world with which we are unfamiliar,
about things that we can predict, and about the location of
taxa that we have not yet collected. The data also help
measure the impact of particular human activities on the
natural world, as well as the likely impact of a human
activity on flora and fauna.
Aspects of a Museum Curator’s Job
I am speaking from the viewpoint
of a former curator (and I actually remain curator of a
small collection). If you look at the things that are
important to me as a curator, the first thing is
conservation of the specimens under my charge. It has been
interesting to hear others relate the value of data to its
age as they discuss the 20-year-rule. In contrast, our data
don't even start to get interesting until they are more than
20 years old, and the older they are, the better they are.
We think, literally, in terms of centuries—centuries past
and centuries future. We are interested in data and
specimens from a century or more ago. And we are interested
in having those specimens that are under our charge be just
as useful 100 or 200 years down the road.
That scope of time puts a very
heavy load on us curators in terms of determining what to do
first when we begin working with a new specimen. For the
most part, the first thing we have to ensure is that the
specimen is going to be there tomorrow in the same condition
in which we found it today.
The next aspect of being a
content-provider curator is to understand that the specimens
are to be used. Therefore, mechanisms must be in place to
record the transactions associated with each
specimen. Increasing specimen usage, contacting the
researchers (both internally and externally) who might make
use of the collections’ specimens, and satisfying the
funding requirements of the parent organization or agency
also are part of a curator’s responsibilities (in fact, they
comprise a part for which the curators are not necessarily
trained). At different times, this one position might be a
service function, a training function, or even a research
function.
Because of all this
responsibility, data and data management, despite their
importance, almost always come at the end of the curator’s
priority list. In addition, since data management was simply
paper up until 15 years ago, the idea of devoting additional
resources to data management meant something had to be taken
away from the other responsibilities.
The Internet Impacts Curators
What has happened since the
advent of the Internet has been just amazing and has
signaled a truly revolutionary change for natural history
collections. Before the Internet, paper itself was very,
very important to us. We cared a lot about physical paper.
We made labels out of it, we wrote our catalogs on it. It
was something that we studied and investigated and used for
research.
Then the digital age came upon
us. Now we had to take the records that we had carefully
curated and managed for a century or more and transfer them
to computer technology. This raised many questions about how
we manage our collections and what we were going to do with
those data. These concerns became even more acute when
microcomputers moved into the collections, and it became
possible not only to digitize our data but also to actually
put the data on machines that sat right next to the
specimens themselves!
Originally what we saw was the
creation of a large number of disparate and heterogeneous
information systems that did not do a very good job of
talking to each other. Remember, at the time there really
was no organization that dealt with issues of
standardization in our museums. In fact, one of the first
efforts in that regard took place just 10 years ago, in
1988, at a workshop organized and funded by the Association
of Systematics Collections. At that workshop, one of our
charges was to deal with issues related to standardization
of hardware and software, necessary primarily because the
PC platform was anything but standardized at that time.
Intel was certainly the primary hardware vendor, but there were
several competing operating systems. There also were many
software choices, and these software choices varied
dramatically in their capabilities and in their interchange
with other software choices.
A number of workshops followed.
For example, in 1993, I helped organize a workshop on
standards and the exchange of data. And Berkeley personnel
organized a workshop on interoperability of mechanical
databases. And all the while we talked about federation,
federation, federation.
During this time, the data
management systems that were being created got increasingly
sophisticated. We moved from a written set of applications
to a more sophisticated set of applications. But within the
community, we still lacked any kind of high-level effort
that would make it possible for all of these databases to
speak to each other. So, the challenge was—and remains—to
determine what is necessary for all of these biological data
sets and the data associated with them to be readily
accessible and available, and for the information about
these data sets to be readily accessible and available in
the whole variety of clearinghouses, search engines, and
directory services that exist today.
Curators Consider Collection-Level and Object-Level Metadata
Earlier a distinction was made
between collection-level metadata and object-level metadata.
Collection-level metadata might involve the creation of
50,000 to 100,000 record objects to capture the data about
all the places where museum information is stored. However,
when we are talking about object-level metadata, we are
looking at a number four to five orders of magnitude larger,
given the roughly three billion objects mentioned earlier!
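To make the distinction concrete, here is a minimal sketch of what the two kinds of records might look like. The class names, field names, and example values are all hypothetical illustrations, not a schema proposed in the talk or used by any particular museum system.

```python
from dataclasses import dataclass


@dataclass
class CollectionRecord:
    """Collection-level metadata: one record describing a whole data set."""
    institution: str       # hypothetical field
    collection_name: str   # hypothetical field
    taxonomic_scope: str   # hypothetical field
    specimen_count: int    # how many objects the collection holds


@dataclass
class SpecimenRecord:
    """Object-level metadata: one record for every individual specimen."""
    catalog_number: str    # hypothetical field
    taxon: str             # hypothetical field
    locality: str          # hypothetical field
    year_collected: int    # hypothetical field


def object_records_needed(collections):
    """Count the object-level records implied by a set of collections."""
    return sum(c.specimen_count for c in collections)


# Invented example: a single collection-level record...
fishes = CollectionRecord(
    institution="Example Museum",
    collection_name="Fishes",
    taxonomic_scope="Actinopterygii",
    specimen_count=250_000,
)

# ...stands in for a quarter of a million object-level records.
print(object_records_needed([fishes]))
```

One collection-level record is written once per data set, while object-level records scale with the number of specimens, which is where the difference in effort comes from.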
There is no easy way to move
from good descriptions of our collection data sets to good
descriptions of our collection objects. So if we take the
numbers discussed earlier in terms of the cost per record—up
to $60 for each record—we realize that more detailed
metadata-descriptor creation would cost tens of billions of
dollars. This is not a likely investment in the near future.
On the other hand, when we are talking about
collection-level metadata, the cost falls to the millions of
dollars. I believe that this is a feasible goal for our
community, and I believe that it is one that—with the proper
incentives and resources—is attainable.
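The arithmetic behind these estimates can be sketched as follows. This is a rough illustration only: it assumes the $60 upper bound applies uniformly to every record, and it takes the three billion objects and the 50,000 to 100,000 collection-level records mentioned earlier as the record counts. The variable and function names are invented for the example.

```python
# Figures quoted in the talk; treating them as exact is an assumption.
COST_PER_RECORD_USD = 60                 # upper bound per record
SPECIMEN_COUNT = 3_000_000_000           # "three billion objects"
COLLECTION_RECORDS = (50_000, 100_000)   # low and high estimates


def metadata_cost(record_count, cost_per_record=COST_PER_RECORD_USD):
    """Total cost of describing record_count records at a flat rate."""
    return record_count * cost_per_record


# Object-level: every specimen gets its own record.
object_level_cost = metadata_cost(SPECIMEN_COUNT)

# Collection-level: one record per data set, low and high estimates.
collection_level_costs = tuple(metadata_cost(n) for n in COLLECTION_RECORDS)

print(f"Object-level, upper bound: ${object_level_cost:,}")
print(f"Collection-level: ${collection_level_costs[0]:,} "
      f"to ${collection_level_costs[1]:,}")
```

At the full $60 per record the object-level figure is an upper bound; lower per-record costs bring it down toward the tens of billions, while the collection-level total stays in the millions either way.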
Copyright © 2003 NFAIS. All rights
reserved. No part of this product or service may be
reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior written consent.