Preprints of the Metadiversity Conference Proceedings

 Session 3: The Challenge in Earth Observation, Ecosystem Monitoring, and Environmental Information

Locating Biodiversity Data Through the Global Change Master Directory

LOLA OLSEN, Project Manager, NASA Global Change Master Directory

ABSTRACT

The Global Change Master Directory (GCMD) currently holds descriptions for approximately 7,000 data sets held worldwide. The directory's primary purpose is data discovery. The information provided through the GCMD's Directory Interchange Format (DIF) is the set of information that a researcher would need to determine whether a particular data set could be of value. By offering data set descriptions worldwide in many scientific disciplines - including meteorology, oceanography, ecology, geology, hydrology, geophysics, remote sensing, paleoclimate, solar-terrestrial physics, and human dimensions of climate change - the GCMD simplifies the discovery of data sources. Direct linkages to many of the data sets are also provided. In addition, several data set registration tools are offered for populating the directory. To search the directory, one may choose the Guided Search or Free-Text Search. Two experimental interfaces were also made available with the latest software release: one based on a keyword search and another based on a graphical interface. The graphical interface was designed in collaboration with the Human-Computer Interaction Laboratory at the University of Maryland. The latest version of the software, Version 6, was released in April 1998. It features the implementation of a scheme to handle hierarchical data set collections (parent-child relationships); a hierarchical geospatial location search scheme; a Java-based geographic map for conducting geospatial searches; a Related URL field for project-related data set collections, metadata extensions (such as more detailed inventory information), etc.; a new implementation of the Isite software; a new data set language field; hyperlinked e-mail addresses; and more.
The key to the continued evolution of the GCMD is in the flexibility of the GCMD database, allowing modifications and additions to be made relatively easily to maintain currency, thus providing the ability to capitalize on current technology while importing all existing records. Changes are discussed and approved through an online "interoperability" forum. The next major release of the GCMD is scheduled for early 1999 and will include the incorporation of a new matrix-based interface, a rapid valid-based query system; improvement in the operations facility - important for future distributed options; new streamlined code for greater performance and maintainability; improvements in the handling of seven current fields proposed through the interoperability forum (at no expense to the data providers); and the release of DOCmorph, a more robust version of DIFmorph to translate many "standards" multidirectionally. Issues and actions will also be addressed.

The Global Change Master Directory is a directory of data sets on Earth science, including broad coverage of the atmosphere, oceans, biosphere, and land. It connects users to 24 sites fairly evenly distributed around the world among those that maintain global data sets.

This project began at NASA and continues to be supported by NASA. The original directory focused on remote sensing data, but it has grown to be much more than that. It is now a system that holds data set descriptions for Earth science data.

My comments today will focus on aspects of the GCMD that might apply to anyone developing a system of metadata records.

User Working Groups

First, I would like to stress the importance of having a science user working group. One needs to focus on those who will use the data system. The GCMD has two ecology-related advisors involved in our current user working group, and I want to emphasize just how important the group’s input has been.

User group members often ask interesting and fundamental questions: For example, one member of the user working group asked us to estimate the number of data sets in the world. Here we were, designing a system and talking about scalability! Did we know what we were scaling to? Did we ever think about how many data sets there really were? I independently asked the science coordinators on the project to estimate the number of data sets. At the time we were working on the Environmental Task Force efforts, so we made an estimate of DoD data sets. The estimate they determined for biological data sets was 3.5 million! But after listening to this morning’s talks, I contend that there actually are even more than that. Of course, the total depends on how you count data sets and how they are aggregated.

Our user working group recently asked another interesting question: Do we know how many of our users are actually getting to the data sets? Our answer was "no." Now we are tapping into the links so we can get a better idea about how many users are accessing those data sets.

Registering Data Sets

Users are not only important in helping to design (and redesign) the system, but they also play a direct and fundamental part in database building. Users register their own data sets and provide the metadata record on the GCMD using a Web form. The main questions are: What? Where? When? Who? With this information, we gather what is basically needed for someone to begin searching.
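The four registration questions can be pictured as a very small record type. The following is a minimal sketch, not the GCMD Web form or the DIF itself: the class name, field names, and the `missing_answers` helper are hypothetical and chosen only to mirror What, Where, When, and Who.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class DataSetRecord:
    """Hypothetical minimal registration record (not actual DIF field names)."""
    title: str                                                # What
    bounding_box: Optional[Tuple[float, float, float, float]] # Where: (south, north, west, east)
    start_date: Optional[date]                                # When (start of coverage)
    end_date: Optional[date]                                  # When (end of coverage)
    contact: str                                              # Who


def missing_answers(record: DataSetRecord) -> List[str]:
    """Return which of the four questions the record leaves unanswered."""
    missing = []
    if not record.title.strip():
        missing.append("what")
    if record.bounding_box is None:
        missing.append("where")
    if record.start_date is None or record.end_date is None:
        missing.append("when")
    if not record.contact.strip():
        missing.append("who")
    return missing


# Illustrative record with the contact left blank.
example = DataSetRecord(
    title="Example forest plot survey",
    bounding_box=(35.0, 40.0, -80.0, -75.0),
    start_date=date(1990, 1, 1),
    end_date=date(1995, 12, 31),
    contact="",
)
print(missing_answers(example))
```

A check of this kind could run when a form is submitted, so that a record is not accepted until the basic questions are answered.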

Data Fields

There are 32 fields of information that can be stored. Not all are required, and some are quite new with our latest version. Two new ones are the Related URL, which links users to data sets or extended descriptions, and the parent-child aggregation option.

There is nothing about the way the fielded data are stored that mandates the output format. The metadata record can be displayed as a GILS, NBII, or FGDC record.
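The separation between stored fields and displayed format can be illustrated with a short sketch. Everything here is an assumption made for illustration: the internal field names and the two label maps are invented placeholders and are not the element names defined by GILS, NBII, or FGDC.

```python
# One stored record, keyed by hypothetical internal field names.
record = {
    "title": "Example data set",
    "summary": "Illustrative summary.",
    "contact": "Example contact",
}

# Placeholder label maps. The labels are invented for this sketch and are
# NOT taken from the real GILS, NBII, or FGDC specifications.
LAYOUTS = {
    "layout_a": {"title": "Title", "summary": "Abstract", "contact": "Point of Contact"},
    "layout_b": {"title": "Name", "summary": "Description", "contact": "Originator"},
}


def render(record, layout_name):
    """Render the same stored fields under the labels of the chosen layout."""
    labels = LAYOUTS[layout_name]
    return "\n".join(f"{labels[field]}: {record[field]}" for field in labels)


print(render(record, "layout_a"))
print(render(record, "layout_b"))
```

Because the stored record never changes, supporting another output format in this sketch only means adding another label map.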

We believe that the set of 32 fields (known as the Directory Interchange Format, or DIF) is the necessary set of fields for the user to determine if a particular data set is a candidate for a desired application. It currently has multidisciplinary capability, and it is alive and evolving as there are new needs and new capabilities. All the modifications are decided through an interoperability forum, where the participants discuss their needs.

In addition to the data-set descriptions, there also are fields for sources, sensors, campaigns, and other data center information, along with modification tools for supplementary information.

Getting Scientists to Register

One of the staff members recently published a paper about why scientists aren't writing metadata. Of course, there are many reasons. However, one important reason is that the scientists did not believe they would be credited for their work. We are hoping to promote the use of data set citations. The current data set citation will be modified to comply with the new standard citation being proposed by the International Organization for Standardization (ISO).

Searching the Directory: User Interfaces

We have been working to simplify the directory search. The current search interfaces have been modified for this particular version of the directory. One is the Guided Interface, in which valid terms for all the parameters are listed in pull-down menus. Anytime changes are made to the "valids," they take effect here.

Another interface is based on the Isite search engine software, which is Z39.50 compliant. Since Isite permits numeric searches, you can do temporal searches for date ranges and spatial searches. You can also search for data in specific fields. The parent-child capability implemented through the "Guided Search" using the database has also now been implemented through Isite.
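Numeric temporal and spatial searches of this kind come down to range-overlap tests. The sketch below is a generic illustration, not Isite's implementation or Z39.50 query syntax; the record layout, function names, and sample values are hypothetical, and bounding boxes that cross the antimeridian are ignored for simplicity.

```python
from datetime import date


def overlaps_period(rec_start, rec_end, query_start, query_end):
    """True if the record's time coverage intersects the query period."""
    return rec_start <= query_end and query_start <= rec_end


def overlaps_box(rec_box, query_box):
    """True if two (south, north, west, east) boxes intersect.

    Simplification: boxes crossing the antimeridian are not handled.
    """
    rec_s, rec_n, rec_w, rec_e = rec_box
    qry_s, qry_n, qry_w, qry_e = query_box
    return rec_s <= qry_n and qry_s <= rec_n and rec_w <= qry_e and qry_w <= rec_e


# Hypothetical data set descriptions.
records = [
    {"id": "A", "start": date(1980, 1, 1), "end": date(1989, 12, 31),
     "box": (30.0, 50.0, -100.0, -70.0)},
    {"id": "B", "start": date(1995, 1, 1), "end": date(1997, 12, 31),
     "box": (-10.0, 10.0, 20.0, 40.0)},
]


def search(records, period, box):
    """Return ids of records matching both the date range and the region."""
    return [
        r["id"] for r in records
        if overlaps_period(r["start"], r["end"], *period)
        and overlaps_box(r["box"], box)
    ]


hits = search(records, (date(1985, 1, 1), date(1996, 1, 1)), (40.0, 60.0, -90.0, -60.0))
print(hits)
```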

There are two experimental search interfaces for which we are seeking user feedback. One is based on science keywords. We have noticed that several other groups have adopted this particular interface, because it is a simple way to help users look for data sets. The database is searched via topic, term, and variable, which together form a hierarchical set of Earth science keywords. The keyword list has remained stable over time. We attribute this stability to a set of 14 rules for adding or modifying the keywords.
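A topic, term, and variable hierarchy lends itself to prefix matching: choosing only a topic matches everything beneath it, and each further level narrows the result. The following is a minimal sketch under that assumption; the keyword values, data set ids, and function names are illustrative and are not drawn from the actual keyword list.

```python
# Each hypothetical data set carries one or more (topic, term, variable) keywords.
records = {
    "ds1": [("Biosphere", "Vegetation", "Biomass")],
    "ds2": [("Biosphere", "Vegetation", "Canopy Cover"),
            ("Atmosphere", "Precipitation", "Rainfall")],
    "ds3": [("Oceans", "Ocean Temperature", "Sea Surface Temperature")],
}


def keyword_prefix(topic, term=None, variable=None):
    """Build the hierarchical prefix; a variable only counts if a term is given."""
    prefix = [topic]
    if term is not None:
        prefix.append(term)
        if variable is not None:
            prefix.append(variable)
    return tuple(prefix)


def find(records, topic, term=None, variable=None):
    """Return sorted ids of data sets with a keyword under the given prefix."""
    prefix = keyword_prefix(topic, term, variable)
    return sorted(
        ds_id for ds_id, keywords in records.items()
        if any(kw[:len(prefix)] == prefix for kw in keywords)
    )


print(find(records, "Biosphere"))
print(find(records, "Biosphere", "Vegetation", "Biomass"))
```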

The other interface, which has inspired us to do some additional work, was developed with the Human-Computer Interaction Lab at the University of Maryland. A paper written about this interface was called "The End of Zero-Hit Queries." As you select your options, this interface shows you the number of data sets that are left in that particular category. You can watch the number of data sets change as you select and de-select temporal and spatial coverage. We have extended this concept for the next version of the directory by taking all the fields that have valid values and putting them in a matrix interface, so that all the fields can be used. The user always knows how many data sets are left for any combination of parameters chosen.
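The idea of always showing how many data sets remain can be sketched as a count per valid value, recomputed after each selection. This is a generic illustration of that idea, not the University of Maryland interface or the planned matrix interface; the field names, values, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical fields with controlled ("valid") values.
FIELDS = ("topic", "region", "decade")

records = [
    {"topic": "Biosphere", "region": "North America", "decade": "1990s"},
    {"topic": "Biosphere", "region": "Africa", "decade": "1980s"},
    {"topic": "Oceans", "region": "North America", "decade": "1990s"},
    {"topic": "Atmosphere", "region": "Africa", "decade": "1990s"},
]


def matches(record, selection):
    """True if the record agrees with every value selected so far."""
    return all(record[field] == value for field, value in selection.items())


def remaining_counts(records, selection):
    """Return the number of matching records and, for each field not yet
    selected, how many of them carry each value."""
    remaining = [r for r in records if matches(r, selection)]
    counts = {
        field: dict(Counter(r[field] for r in remaining))
        for field in FIELDS if field not in selection
    }
    return len(remaining), counts


total, counts = remaining_counts(records, {"decade": "1990s"})
print(total, counts)
```

Values that do not appear in the returned counts would lead to an empty result, so an interface built this way can disable them instead of letting the user run a query with zero hits.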

Who Uses this System?

We can't know "exactly" who uses this system. However, we do collect metrics that indicate the "domain" name for unique users, that is, those who have gone at least beyond the homepage. The "domain" indicates whether the user is a foreign user, a commercial user, an educational user, a government user, or a military user. We also track the number of DIFs (the data set descriptions) accessed. Soon we will have a quick look at who is actually getting to the data. Usage continues to increase, and we are recording access by more than 17,000 users per month.
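One plausible way to derive such categories is from the last label of each visitor's host name. The sketch below assumes that approach; the suffix table, the rule that a two-letter country code other than "us" counts as foreign, and the sample host names are all assumptions made for illustration, not a description of how the GCMD metrics are actually computed.

```python
from collections import Counter

# Assumed mapping from the final host name label to a user category.
CATEGORY_BY_SUFFIX = {
    "com": "commercial",
    "edu": "educational",
    "gov": "government",
    "mil": "military",
}


def classify(hostname):
    """Classify a host by its final label (hypothetical rule set)."""
    suffix = hostname.lower().rstrip(".").rsplit(".", 1)[-1]
    if suffix in CATEGORY_BY_SUFFIX:
        return CATEGORY_BY_SUFFIX[suffix]
    # Assumption: two-letter country codes other than "us" are foreign users.
    if len(suffix) == 2 and suffix.isalpha() and suffix != "us":
        return "foreign"
    return "other"


def tally_unique(hostnames):
    """Count categories over unique hosts, so repeat visits count once."""
    return Counter(classify(host) for host in set(hostnames))


# Made-up log of visiting hosts; the first host appears twice.
visits = ["a.example.edu", "a.example.edu", "b.example.gov", "c.example.de"]
print(tally_unique(visits))
```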

Homepage Options

In addition to offering access to the GCMD, our homepage offers other options. One of the options is the Global Change Calendar, which is a listing of all the conferences for which we have information. This is one of the offerings that we at NASA contribute to the interagency U.S. Global Change Research Program's Global Change Data and Information System. This list is also maintained with the help of system users. So if you are sponsoring a conference and want to advertise it, you can register it here.

Another link from the homepage is to the interagency U.S. Global Change Research Program. One of two choices to note here is the Global Change Research Information Office (GCRIO). The GCRIO acts as a user-support arm for the Global Change Data and Information System. That work is very important and now includes information on the U.S. National Assessment.

The last link I would like to mention is to the Committee on Earth Observation Satellites (CEOS) International Directory of data sets. In this case, there are mirror sites that also host the data set information in this directory.

Information on Archiving Workshop

I would like to mention an important data archival workshop that was just held by the Data Management Working Group of the U.S. Global Change Research Program. The information will be out on the Web within the week at http://gcmd.nasa.gov/dmwg98/


Copyright © 2003 NFAIS. All rights reserved. No part of this product or service may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written consent.
