Preprints of the Metadiversity Conference Proceedings
Session 3: The Challenge in Earth Observation, Ecosystem
Monitoring, and Environmental Information
Locating Biodiversity Data Through the Global Change Master Directory
LOLA OLSEN, Project Manager, NASA Global Change Master Directory
ABSTRACT
The Global Change
Master Directory (GCMD) currently holds
descriptions for approximately 7000 data sets held
worldwide. The directory's primary purpose is
data discovery. The information provided through
the GCMD's Directory Interchange Format (DIF) is
the set of information that a researcher would need
to determine if a particular data set could be of
value. By offering data set descriptions worldwide
in many scientific disciplines - including
meteorology, oceanography, ecology, geology,
hydrology, geophysics, remote sensing, paleoclimate,
solar-terrestrial physics, and human dimensions of
climate change - the GCMD simplifies the discovery
of data sources. Direct linkages to many of the
data sets are also provided. In addition, several
data set registration tools are offered for
populating the directory. To search the directory,
one may choose the Guided Search or Free-Text
Search. Two experimental interfaces were also made
available with the latest software release - one
based on a keyword search and another based on
a graphical interface. The graphical interface was
designed in collaboration with the Human-Computer
Interaction Laboratory at the University of
Maryland. The latest version of the software,
Version 6, was released in April 1998. It features
the implementation of a scheme to handle
hierarchical data set collections (parent-child
relationships); a hierarchical geospatial location
search scheme; a Java-based geographic map for
conducting geospatial searches; a Related URL field
for project-related data set collections, metadata
extensions (such as more detailed inventory
information), etc.; a new implementation of the
Isite software; a new data set language field;
hyperlinked e-mail addresses; and more. The key to
the continued evolution of the GCMD is in the
flexibility of the GCMD database, allowing
modifications and additions to be made relatively
easily to maintain currency, thus providing the
ability to capitalize on current technology while
importing all existing records. Changes are
discussed and approved through an online
"interoperability" forum. The next major release of
the GCMD is scheduled for early 1999 and will
include a new matrix-based interface;
a rapid valid-based query system;
improvement in the operations facility - important
for future distributed options; new streamlined
code for greater performance and maintainability;
improvements in the handling of seven current
fields proposed through the interoperability forum
(at no expense to the data providers); and the
release of DOCmorph, a more robust version of
DIFmorph to translate many "standards"
multidirectionally. Issues and actions will also be
addressed.
The Global Change Master
Directory is a directory of data sets on Earth science,
including broad coverage of the atmosphere, oceans,
biosphere, and land. It connects users to 24 sites that
maintain global data sets; these sites are distributed fairly
evenly around the world.
This is a project that began
at NASA and continues to be supported by NASA. The original
directory focused on remote sensing data. However, it has
grown to be much more than that. It is now a system that
holds data set descriptions for Earth science data.
My comments today will focus
on aspects of the GCMD that might apply to anyone developing
a system of metadata records.
User Working Groups
First, I would like to stress
the importance of having a science user working group. One
needs to focus on those who will use the data system. The
GCMD has two ecology-related advisors involved in our
current user working group, and I want to emphasize just how
important the group’s input has been.
User group members often ask
interesting and fundamental questions. For example, one
member of the user working group asked us to estimate the
number of data sets in the world. Here we were, designing a
system and talking about scalability! Did we know what we
were scaling to? Did we ever think about how many data sets
there really were? I independently asked the science
coordinators on the project to estimate the number of data
sets. At the time we were working on the Environmental Task
Force efforts, so we made an estimate of DoD data sets. The
estimate they determined for biological data sets was 3.5
million! But after listening to this morning’s talks, I
contend that there actually are even more than that. Of
course, the total depends on how you count data sets and how
they are aggregated.
Our user working group
recently asked another interesting question: Do we know how
many of our users are actually getting to the data sets? Our
answer was "no." Now we are tapping into the links so we can
get a better idea about how many users are accessing those
data sets.
Registering Data Sets
Users are not only important
in helping to design (and redesign) the system, but they
also play a direct and fundamental part in database
building. Users register their own data sets and provide the
metadata record on the GCMD using a Web form. The main
questions are: What? Where? When? Who? With this
information, we gather what is basically needed for someone
to begin searching.
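
The four registration questions can be pictured as a minimal record. The sketch below is only an illustration: the class name, the field names, and the completeness check are assumptions made for this example, not the actual DIF field set or the logic of the GCMD Web form.

```python
from dataclasses import dataclass, fields


@dataclass
class RegistrationRecord:
    """Hypothetical minimal record; field names are illustrative, not DIF fields."""
    what: str   # title or short description of the data set
    where: str  # geographic coverage
    when: str   # temporal coverage
    who: str    # contact for the data set


def missing_answers(record: RegistrationRecord) -> list:
    """Return the names of the questions left blank (empty or whitespace only)."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Made-up example values; the contact has been left blank on purpose.
record = RegistrationRecord(
    what="Example bird survey",
    where="Example region",
    when="1990-1995",
    who="",
)
print(missing_answers(record))
```

A form built along these lines could refuse a submission until the returned list is empty, which keeps every registered description searchable on all four questions.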
Data Fields
There are 32 fields of
information that can be stored. Not all are required, and
some are quite new with our latest version. Two new ones are
the Related URL, which links users to data sets or extended
descriptions, and the parent-child aggregation option.
There is nothing about the
way the fielded data are stored that mandates the output
format. The metadata record can be displayed as a GILS, NBII,
or FGDC record.
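
The separation between stored fields and output format can be sketched as follows. This is a minimal illustration under stated assumptions: the record keys are hypothetical, and plain text and JSON stand in for the GILS, NBII, and FGDC layouts, whose real element mappings are not reproduced here.

```python
import json

# One stored record; the keys are hypothetical, not actual DIF field names.
stored = {"title": "Example data set", "contact": "Example Center"}


def as_text(record: dict) -> str:
    """Render the record as 'Label: value' lines."""
    return "\n".join(f"{key.capitalize()}: {value}" for key, value in record.items())


def as_json(record: dict) -> str:
    """Render the same record as a JSON object."""
    return json.dumps(record, sort_keys=True)


# Stand-ins for real output formats; a new format needs only a new renderer.
RENDERERS = {"text": as_text, "json": as_json}


def display(record: dict, output_format: str) -> str:
    """Pick a renderer by name; the stored record itself is never changed."""
    return RENDERERS[output_format](record)


print(display(stored, "text"))
print(display(stored, "json"))
```

The point of the design is that adding an output format means adding one renderer, while the stored fields and existing records stay untouched.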
We believe that the set of 32
fields (known as the Directory Interchange Format, or DIF)
is the necessary set of fields for the user to determine if
a particular data set is a candidate for a desired
application. It currently has multidisciplinary capability,
and it is alive and evolving as there are new needs and new
capabilities. All the modifications are decided through an
interoperability forum, where the participants discuss their
needs.
In addition to the data-set
descriptions, there also are fields for sources, sensors,
campaigns, and other data center information, along with
modification tools for supplementary information.
Getting Scientists to Register
One of the staff members
recently published a paper about why scientists aren't
writing metadata. Of course, there are many reasons.
However, one important reason is that the scientists did not
believe they would be credited for their work. We are hoping
to promote the use of data set citations. The current data
set citation will be modified to comply with the new
standard citation being proposed by the International
Organization for Standardization (ISO).
Searching the Directory–User Interfaces
We have been working to
simplify the directory search. The current search interfaces
have been modified for this particular version of the
directory. One is the Guided Interface, in which valid terms
for all the parameters are listed in pull-down menus.
Whenever the "valids" (the lists of valid terms) are changed,
the changes take effect here.
Another interface is based on
the Isite search engine software, which is Z39.50 compliant.
Since Isite permits numeric searches, you can do temporal
searches for date ranges and spatial searches. You can also
search for data in specific fields. The parent-child
capability implemented through the "Guided Search" using the
database has also now been implemented through Isite.
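
Temporal and spatial searches of this kind reduce to numeric range comparisons. The sketch below shows one common way to test for overlap; it is not Isite's implementation, and the class, function names, and sample records are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Coverage:
    """Hypothetical coverage summary for one data set description."""
    name: str
    start: date
    stop: date
    south: float
    north: float
    west: float
    east: float


def overlaps_time(c: Coverage, start: date, stop: date) -> bool:
    """True when the two date ranges share at least one day."""
    return c.start <= stop and start <= c.stop


def overlaps_box(c: Coverage, south: float, north: float, west: float, east: float) -> bool:
    """True when the two boxes intersect.

    Simplification: boxes are assumed not to cross the 180-degree meridian.
    """
    return c.south <= north and south <= c.north and c.west <= east and west <= c.east


def search(catalog, start, stop, south, north, west, east):
    """Names of data sets overlapping both the date range and the box."""
    return [
        c.name
        for c in catalog
        if overlaps_time(c, start, stop) and overlaps_box(c, south, north, west, east)
    ]


# Made-up sample records.
catalog = [
    Coverage("Data set A", date(1980, 1, 1), date(1989, 12, 31), 0, 30, 10, 40),
    Coverage("Data set B", date(1995, 1, 1), date(1997, 12, 31), -10, 10, -80, -60),
]
print(search(catalog, date(1985, 1, 1), date(1996, 12, 31), 5, 20, 20, 30))
```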
There are two experimental
search interfaces for which we are seeking user feedback.
One is based on science keywords. We have noticed that
several other groups have adopted this particular interface,
because it is a simple way to help users to look for data
sets. The database is searched via topic, term, and
variable–a hierarchical set of Earth science keywords. The
keyword list has remained stable over time. We attribute
this stability to a set of 14 rules for adding or modifying
the keywords.
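
A topic, term, and variable hierarchy can be searched at any level by treating each keyword as a path. The sketch below assumes a small hypothetical keyword tree and made-up records; the entries are illustrative only and are not taken from the GCMD keyword list.

```python
# Hypothetical keyword tree: topic -> term -> list of variables.
KEYWORDS = {
    "Biosphere": {"Vegetation": ["Biomass", "Canopy Characteristics"]},
    "Oceans": {"Ocean Temperature": ["Sea Surface Temperature"]},
}


def is_valid(path: tuple) -> bool:
    """Check a (topic[, term[, variable]]) path against the controlled list."""
    if not 1 <= len(path) <= 3 or path[0] not in KEYWORDS:
        return False
    if len(path) >= 2 and path[1] not in KEYWORDS[path[0]]:
        return False
    if len(path) == 3 and path[2] not in KEYWORDS[path[0]][path[1]]:
        return False
    return True


def find(records: dict, query: tuple) -> list:
    """Names of records with a keyword that starts with the query path."""
    if not is_valid(query):
        raise ValueError(f"not a valid keyword path: {query!r}")
    depth = len(query)
    return [
        name
        for name, keywords in records.items()
        if any(keyword[:depth] == query for keyword in keywords)
    ]


# Made-up records, each tagged with full topic/term/variable paths.
records = {
    "Data set A": [("Biosphere", "Vegetation", "Biomass")],
    "Data set B": [("Oceans", "Ocean Temperature", "Sea Surface Temperature")],
}
print(find(records, ("Biosphere",)))
print(find(records, ("Oceans", "Ocean Temperature")))
```

Rejecting paths that are not in the tree is what keeps a controlled list of this kind stable: a new keyword has to be added to the tree before any record or query can use it.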
The other interface that has
inspired us to do some additional work is one that was
developed with the Human-Computer Interaction Lab at the
University of Maryland. A paper that was written about this
interface was called "The End of Zero-Hit Queries." As you
select your options, this interface shows you the number of
data sets that are left in that particular category. One can
watch the number of remaining data sets change as one
selects and de-selects by temporal and spatial coverage. We
have extended this concept for the next version of the
directory by taking all the fields that have valid values
and putting them in a matrix interface, so that all the
fields can be used. The user always knows how many data sets
are left for any combination of parameters chosen.
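
The idea of showing counts before a query is run can be sketched as follows. This is a minimal illustration, not the code behind either interface; the field names, values, and function names are assumptions.

```python
from collections import Counter

# Made-up records; each field holds one value from a controlled list.
records = [
    {"discipline": "ecology", "region": "Africa"},
    {"discipline": "ecology", "region": "Europe"},
    {"discipline": "oceanography", "region": "Europe"},
]


def remaining(records: list, selected: dict) -> list:
    """Records that match every field/value pair chosen so far."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


def counts_for(records: list, selected: dict, field: str) -> Counter:
    """For each value of `field`, count the records left if that value were chosen.

    Any current choice for `field` itself is ignored, so the user can see
    what switching to another value would return.
    """
    others = {f: v for f, v in selected.items() if f != field}
    return Counter(r[field] for r in remaining(records, others) if field in r)


selected = {"discipline": "oceanography"}
preview = counts_for(records, selected, "region")
print(preview["Europe"], preview["Africa"])
```

Because a `Counter` returns 0 for values that never occur, the preview shows that choosing "Africa" would leave no data sets, so the user can avoid the zero-hit query before running it.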
Who Uses this System?
We can't know "exactly" who
uses this system. However, we do collect metrics that
indicate the "domain" name for unique users–those who have
gone at least beyond the homepage. The "domain" indicates
whether the user is a foreign user, a commercial user, an
educational user, a government user, or a military user. We
also track the number of DIFs (data set descriptions)
accessed. Soon we will have a quick look at who is actually
getting to the data. Usage continues to increase, and we are
recording access by more than 17,000 users per month.
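
A domain-based tally of unique users might look like the sketch below. The mapping, the treatment of two-letter country codes as "foreign," and the hostnames are assumptions made for illustration; they are not the GCMD's actual metrics code.

```python
from collections import Counter

# Assumed mapping from top-level domain to user category.
CATEGORIES = {
    "com": "commercial",
    "edu": "educational",
    "gov": "government",
    "mil": "military",
}


def classify(hostname: str) -> str:
    """Classify a host by its top-level domain.

    Simplification: any two-letter suffix is treated as a country code and
    counted as "foreign"; everything else unrecognized is "other".
    """
    suffix = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if suffix in CATEGORIES:
        return CATEGORIES[suffix]
    if len(suffix) == 2 and suffix.isalpha():
        return "foreign"
    return "other"


def tally(hostnames: list) -> Counter:
    """Count unique hosts per category; repeat visits are counted once."""
    return Counter(classify(host) for host in set(hostnames))


# Made-up access-log hostnames; the first host visits twice.
hosts = ["a.example.edu", "a.example.edu", "b.example.gov", "c.example.de"]
print(tally(hosts))
```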
Homepage Options
In addition to offering
access to the GCMD, our homepage offers other options. One
of the options is the Global Change Calendar, which is a
listing of all the conferences for which we have
information. This is one of the offerings that we at NASA
contribute to the interagency U.S. Global Change Research
Program's Global Change Data and Information System. This
list is also maintained with the help of system users. So if
you are sponsoring a conference and want to advertise it,
you can register it here.
Another link from the
homepage is to the interagency U.S. Global Change Research
Program. One of two choices to note here is the Global
Change Research Information Office (GCRIO). The GCRIO acts
as a user-support arm for the Global Change Data and
Information System. That work is very important and now
includes information on the U.S. National Assessment.
The last link I would like to
mention is to the Committee on Earth Observation Satellites
(CEOS) International Directory of data sets. In this case,
there are mirror sites that also host the data set
information in this directory.
Information on Archiving Workshop
I would like to mention an
important data archival workshop that was just held by the
Data Management Working Group of the U.S. Global Change
Research Program. The information will be out on the Web
within the week.
http://gcmd.nasa.gov/dmwg98/
Copyright © 2003 NFAIS. All rights
reserved. No part of this product or service may be
reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior written consent.