Preprints of the Metadiversity Conference Proceedings

  Session 3: The Challenge in Earth Observation, Ecosystem Monitoring, and Environmental Information

Beyond Metadata: Scientific Information Management Approaches Supporting Ecosystem Monitoring and Assessment Activities

JEFFREY FRITHSEN, National Center for Environmental Assessment (NCEA) of the U.S. Environmental Protection Agency’s Office of Research and Development

ROBERT F. SHEPANEK, Senior Scientist and Director of the Information Resources Development Staff (IRDS) in the National Center for Environmental Assessment (NCEA)

ABSTRACT

We present an integrated vision for scientific information management approaches supporting long-term monitoring and assessment activities within the USEPA’s Office of Research and Development (ORD). This vision was developed based upon lessons learned from the implementation of several scientific information management systems and from development of the ORD’s strategic and implementation plans for scientific information management. The vision reflects that effective management of scientific information must address technical, cultural, and management challenges. Technical challenges include management and integration of metadata, data, and the modeling, analysis, and visualization tools used as part of assessment activities. Cultural challenges relate mainly to the protection of intellectual capital produced by individual investigators. Management issues include commitment of adequate resources for systems development and operation, support for related policies and procedures, and appropriate incentives for involvement by staff and project participants. Past experience with EPA and other organizations has shown that the management issues are frequently most limiting to successful implementation of integrated information management solutions. USEPA ORD’s vision for information management addresses the following technical challenges: developing directories of environmental resources collected and maintained by multiple organizations; providing access to descriptive information (metadata) sufficient to support secondary use of those resources; integrating data collected at multiple spatial and temporal scales; and integrating data resources with analytical tools and models. Metadata efforts have focused initially on the development of environmental resource directories enabling users to find data of potential interest, and development of detailed catalogs of descriptive information that enable users to evaluate the use of data as part of some assessment activity.
In ORD’s strategy, the concept of a data directory has been extended to include analysis tools, models, documents, and multimedia products to better reflect the complexity of environmental inventory and monitoring activities. Additionally, the strategic vision expands the focus of technical efforts such that various levels of metadata can support integration of data and data systems and integration of data with modeling, analysis, and visualization tools. This type of integration becomes useful for integrated assessments of biodiversity and is exemplified by integration of project-specific systems with a common data dictionary, or a common reference database for taxonomy, such as the Integrated Taxonomic Information System (ITIS). Effective information management approaches supporting monitoring and assessment activities must also recognize that there exist significant cultural challenges that must be met to ensure success of a long-term monitoring project. The cultural challenges relate to the sharing of data, loss of control of the use of the data, and realizing credit for collecting data, or adding value to data. ORD’s vision for information management addresses these challenges by leveraging technology to restrict access to data and information as assessment products are developed, and proposes an incentive-based approach to catalyze sharing of data.

The title of today’s talk is, "Beyond Metadata: Scientific Information Management Approaches Supporting Ecosystem Monitoring and Assessment Activities." What we will be talking about is the need to–as much as possible–leverage the use of information technologies to support all aspects of the environmental assessment process.

Information Diversity

To set the stage and put this topic in context, I would like to describe ever so briefly the scientific information management environment and what we are dealing with when we talk about scientific assessments. First of all, we are dealing with information diversity. We are dealing with a lot of different types of information–not just biodiversity. But even if you are just looking at biodiversity, you still have to bring in a lot of other types of data in order to deal with the subject.

Environmental assessments are becoming much more multidisciplinary. In many of the government agencies, we have to consider environmental assessments in the context of combining ecology with human health. And all of a sudden, we have a whole mess of data that we have to pull together. The big challenge in terms of information management here is to manage many small pieces of information and a few very large pieces of information.

We also have the scale problem when we do an environmental assessment. For example, we can start off with large remote sensing data sets. These are large-scale data sets that may need to be combined with regional monitoring studies in order to conclude something about status and trends in the environment.

But even if we stop there, we still don't have the full picture, because we haven’t yet considered the ecological processes. Therefore we have to go down to some site-specific intensive studies. It is the combination of these three types of studies–large-scale, regional, and site–that makes a complete environmental assessment. This is not just a message from the EPA–this is the Committee on the Environment and Natural Resources Monitoring Framework that came out in 1996. The Framework is a federal monitoring strategy to combine these three levels of data in order to do environmental assessments. It is actually pretty complex.

Systems Diversity

In addition to information diversity, the other thing that we have in the scientific realm is systems diversity. I am not talking about ecological systems here–I am talking about data systems where we have multiple information management systems. These systems are all individually developed for individual organizations and they, by-and-large, don't talk to each other. So the challenge here is to develop and provide interoperability between systems and with reference databases.

If I am developing a database for Project A here and another one for Project B there, then one of the things that I want to bring is some consistency in terms of the way I name data elements. If I refer to water temperature in one way for one database, for example, then I should refer to it in the same way in another database. At the least, we must have some sort of translator in-between the two databases that can interpret what has been stored in each.
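The "translator" between two project databases can be sketched as a simple crosswalk from each project's field names to a shared data dictionary. This is only an illustrative sketch: the project names, field names, and the `to_common_names` helper are all hypothetical and do not describe any actual EPA system.

```python
# Hypothetical crosswalk: each project's own field names mapped to the
# names used in a shared data dictionary. All names here are invented.
CROSSWALK = {
    "project_a": {"wtemp_c": "water_temperature_celsius", "sta": "station_id"},
    "project_b": {"WaterTemp": "water_temperature_celsius", "StationCode": "station_id"},
}


def to_common_names(project, record):
    """Rename a record's fields to the shared names; unknown fields are kept as-is."""
    mapping = CROSSWALK[project]
    return {mapping.get(field, field): value for field, value in record.items()}


# Records from the two projects end up with the same field names.
record_a = to_common_names("project_a", {"wtemp_c": 14.2, "sta": "A-01"})
record_b = to_common_names("project_b", {"WaterTemp": 15.0, "StationCode": "B-07"})
```

Once both records use the shared names, they can be combined without anyone editing column headings by hand.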

We have heard before about the Integrated Taxonomic Information System (ITIS). One of the uses of ITIS is to promote a common way of naming the same taxa or taxon. Well, our databases in Project A and Project B, therefore, ought to refer to this reference database of taxonomy in the same manner so that Project A and Project B are calling the same species the same thing. Similarly, we have the same problem with chemical names.
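The same idea applies to taxonomic names. In the sketch below, two project databases resolve the names they recorded against one shared reference table, so that a synonym and the accepted name lead to the same entry. The table, the `REF-0001` identifier, and the `resolve` helper are hypothetical placeholders; they are not ITIS identifiers and not the ITIS interface.

```python
# Hypothetical reference table. "REF-0001" is a made-up identifier,
# not an ITIS serial number.
REFERENCE_TAXA = {
    "REF-0001": {
        "accepted_name": "Oncorhynchus mykiss",
        "synonyms": {"Salmo gairdneri"},
    },
}


def build_lookup(reference):
    """Index every accepted name and synonym (lower-cased) by reference identifier."""
    lookup = {}
    for taxon_id, entry in reference.items():
        lookup[entry["accepted_name"].lower()] = taxon_id
        for synonym in entry["synonyms"]:
            lookup[synonym.lower()] = taxon_id
    return lookup


LOOKUP = build_lookup(REFERENCE_TAXA)


def resolve(name):
    """Return (taxon_id, accepted_name), or None if the name is not in the reference."""
    taxon_id = LOOKUP.get(name.strip().lower())
    if taxon_id is None:
        return None
    return taxon_id, REFERENCE_TAXA[taxon_id]["accepted_name"]


# Project A recorded the older synonym, Project B the accepted name.
from_project_a = resolve("Salmo gairdneri")
from_project_b = resolve("Oncorhynchus mykiss")
```

Names that are missing from the reference return `None` rather than a guess, so they can be flagged for review.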

The final complexity here is we have a very distributed workforce. Gone are the days of the individual investigator in academia coming up with some grand discovery and publishing it. No. Now we are forming research teams that transcend organizational and geographic boundaries. And the participants in those teams bring to the ball game their own information technology, their own information management environment. This means that there is by necessity another level of integration required. The challenge here is to link heterogeneous environments.

Three Challenges for Scientific Information Management

This situation brings up various challenges for scientific information management, and we categorize them into three big categories. We have the technical challenges–those that are related to the management of metadata and the tools needed to complete assessments. We have the management challenges–those that have to do with providing adequate resources–we are always asking for more money, right?–and also the support for policies and procedures to make the information management systems work. (Remember, a system is comprised of people, software, and hardware. If management is not enforcing the procedures, then people are not part of the equation there.) And, we have the cultural challenges. The cultural challenges relating to scientific information management have to do with the protection of the intellectual property rights of authors. If we don't acknowledge that, then we are going to be developing systems that don't work.

Cultural Challenges

Let me start with the cultural challenges. The cultural challenges basically are to provide protection for the intellectual property rights of authors. If I as an investigator have collected a chunk of data, I usually want first publication rights to those data, because my career depends upon getting the results of my work published in a journal. If we don't acknowledge that, then we are not going to have buy-in at the principal investigator level. At the same time we are going to have to promote data-sharing and, to a certain extent, change the thinking of the scientific community. What we need to do is achieve recognition that the publication of metadata and data is as important as the publication of a journal article. One way to achieve this is to work with the scientific societies, the professional organizations, peer review panels, and so on, to reinforce the fact that there ought to be "brownie points" given out for someone who publishes metadata as well as data. Because until they get that credit, until the principal investigator can say, "Hey! I got something for that," they are not going to do it. Earlier someone mentioned a publication that came out a few months ago that said exactly those things. And to reinforce that, one of NASA's campaigns came up with a few "commandments" for their working group. I will share just a couple of them:

  1. Thou shalt make thy data available even unto thine enemies. (Now that is promoting data-sharing!)
  2. Thou shalt release thy data from bondage. (How many times have we heard about a guy still sitting on the data two years after the research is completed? Just hasn't published yet–and that doesn't help the community.)
  3. Thou shalt not covet thy neighbor's data until they have had a crack at them.

You may laugh and it may sound trite, but you know, we do have those impediments that keep some scientists from using the information management systems that we develop.

Management Challenges

Some of the management challenges involve pleas for more money, commitment of adequate resources, and various publications advocating that 10 percent to 20 percent of the research budget ought to be allocated for information-management activities. Management challenges are probably seen more in the beginning of a program and less as time goes on. We need support for related policies and procedures, and we need appropriate incentives for the involvement of staff and project participants. Again, this aspect relates to the need for management to acknowledge that you published your metadata.

Technical Challenges

I classify the technical challenges into two different types of needs. First, we need tools to help users find relevant data and information in a distributed environment. We need to provide adequate descriptions of data so that a user can judge whether he or she can use those particular data for some particular use (often a use that was not considered by the guys who originally collected the data). And secondly we need to provide access to that metadata and the other resources.

Most of our scientific information management efforts so far have focused on those two needs, but there are some additional technical challenges. We need to develop approaches and standards that facilitate data integration. This will allow us to pull together data from multiple data sets and have information technology help with that process, instead of having to change the headings in your spreadsheet, for example. We need to enhance the interoperability of data systems. We must develop and use some sort of intelligent agents that can bring together information from multiple databases so that data integration is not a lot of laborious work on the part of individual investigators.

We are obviously providing some model and analysis and visualization tools now. However, there has to be an integration of those tools with the data themselves. In other words, choose your data set, choose your tool, and information technology can bring them together. I am not saying that we have all this developed, and I am not saying we have all the answers. But this is the vision of where we want to go. And I think information technologies can be used to support more of these kinds of activities, which are part of the assessment process.

EPA Efforts in Scientific Information Management

Within the EPA we have recently developed an implementation plan that spells out a vision for information management within the Office of Research and Development. The plan encompasses the next three to five years, so not everything is in place yet. But the crux of it is to leverage information management technology to support all aspects of the assessment process. Part of that is to adopt or develop (but hopefully adopt as much as possible) approaches, standards, and procedures to maximize the integration of data, data systems, modules, and other analysis tools. We are using information technology to make this assessment activity more efficient.

We also are trying to integrate as much as possible our efforts with ongoing national and international efforts, because the EPA as an agency realizes that we can't do it all, we certainly haven't done it all, and–to some extent–we are behind organizations like NASA and NOAA in having effective data-management policies and systems in place.

This vision of the EPA’s Office of Research and Development (ORD) attempts to address the technical, management, and cultural challenges that I have already discussed. This vision is developed and guided by the newly formed ORD Science Information Management Coordination Board, so that there is actually an organizational entity within our shop that is trying to pay attention to what information resources management should provide to support the types of activities that EPA has to conduct. If you wish, you can download the strategic plan from the Web page (http://www.epa.gov/ord).

What we are trying to achieve in terms of scientific information management systems is an end result that combines these five elements: a metadata directory (how do I find something and describe it?); a data format wizard (how do I bring together various types of data that are in a distributed environment?); a geographic module (how do I deal with data that has some sort of spatial context in terms of management and reorganization?); a statistical module (how can I pull statistical routines and combine them with data?); and a modeling module (how do I pull together all those various models, such as atmospheric deposition, groundwater infiltration, agricultural runoff, and so on, with the data that I have?). In application, what we envision is that at the start of a project the principal investigator would come along, enter their project description, and then begin to discover the background material needed to start the project using the metadata directory. As they pull together data they would use something like the data format wizard for the collection and integration of data. As they got into the analysis they would use the other modules, such as the geographic module, the statistical module, and the modeling module, to analyze and add value to the data they pull together. Finally they produce the report, putting another entry back into the metadata directory that essentially tracks the project from there. Thus, the metadata directory as we conceive it is fairly robust, representing various types of metadata objects: data sets, databases, projects, modules, documents, and even multimedia material.

Recommendations

I would like to close with a few lessons we have learned from going through the process of trying to understand the scientific information management environment. I will present these in the form of recommendations. First, I would put forward three general recommendations: 1) view information management as more than just storing or capturing data sets and distributing them; 2) use an incremental type of process–start with the metadata, go to the data, add on the tools, and so on; 3) use the best practical technology. (Using state-of-the-art technology usually means someone gets caught on the bleeding edge and it is tough to be there, so opt for practicality.)

The 20-Year-Rule Recommendation

Data are a resource that needs to be protected. A lot of money goes into collecting data. Some experts speak of the idea of "data entropy," where the value of data is very high as the principal investigator collects it. Gradually the value tapers off into nothingness as distance and time come between the data collection effort and later steps. Data entropy doesn't have to happen if there are adequate metadata. We think in terms of a 20-year-rule. The 20-year-rule simply asks: Will someone 20 years from now, not familiar with your data, be able to use and understand the data solely with the metadata that you provided?

I submit that there are not a lot of records out there that could pass the 20-year-rule. But if we could create such records now, we would avoid data entropy in the future. We need robust directories of environmental data, information, and tools–the types of things that represent more than just data sets. And the metadata standards that we use need to be developed based upon the needs of science, which may mean they are not built from the top down. With metadata, we need to start with the basics, beginning with the basic entry: title, abstract, contact, description of themes, spatial and temporal extent, and so on.
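As a rough illustration of that basic entry, the sketch below models a record with the fields named above and reports which ones are still empty. The class name, field names, sample values, and the `missing_fields` helper are assumptions made for this example, not part of any metadata standard. A record with nothing missing has only the basic entry in place; whether it passes the 20-year-rule still depends on the quality of what was written.

```python
from dataclasses import dataclass, fields


@dataclass
class BasicMetadataEntry:
    """Hypothetical basic directory entry; field names are illustrative only."""
    title: str = ""
    abstract: str = ""
    contact: str = ""
    themes: tuple = ()
    spatial_extent: str = ""
    temporal_extent: str = ""


def missing_fields(entry):
    """List the names of basic fields that were left empty."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Invented sample record with the contact and temporal extent left out.
entry = BasicMetadataEntry(
    title="Estuary water quality survey",
    abstract="Monthly temperature and salinity readings at fixed stations.",
    themes=("water quality", "estuaries"),
    spatial_extent="hypothetical bounding box",
)
gaps = missing_fields(entry)
```

Here `gaps` names the fields a data manager would need to chase down before the record enters the directory.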

Network Architecture Recommendations

A few words about network architecture: We have all come to the conclusion in the field that we need to implement some sort of hybrid of centralized and distributed approaches. A purely centralized approach does not work, and a purely distributed approach does not work either. We probably, at a directory level, need to restrict network nodes to summary-level metadata. Detailed metadata and the data themselves are probably best stored close to the originating sources. So there is some need for data archiving facilities.
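One way to picture this hybrid arrangement is sketched below: a central directory holds only summary-level entries plus a pointer to the originating node, and each node keeps its own detailed metadata. The class names, method names, and sample values are hypothetical, and a real system would talk to nodes over a network rather than through in-memory objects.

```python
class SourceNode:
    """Hypothetical originating source that keeps its own detailed metadata."""

    def __init__(self, name):
        self.name = name
        self._detailed = {}

    def publish(self, dataset_id, title, detailed_metadata):
        """Store the detail locally and return a summary for the central directory."""
        self._detailed[dataset_id] = detailed_metadata
        return {"dataset_id": dataset_id, "title": title, "node": self.name}

    def detail(self, dataset_id):
        return self._detailed[dataset_id]


class CentralDirectory:
    """Hypothetical directory that stores summary-level metadata only."""

    def __init__(self):
        self._summaries = []
        self._nodes = {}

    def register(self, node, summary):
        self._nodes[node.name] = node
        self._summaries.append(summary)

    def search(self, keyword):
        """Find summaries whose title contains the keyword (case-insensitive)."""
        return [s for s in self._summaries if keyword.lower() in s["title"].lower()]

    def fetch_detail(self, summary):
        """Follow the summary's pointer back to the originating node."""
        return self._nodes[summary["node"]].detail(summary["dataset_id"])


node = SourceNode("regional-lab")
directory = CentralDirectory()
summary = node.publish("ds-1", "Stream temperature 1995", {"method": "in-situ logger"})
directory.register(node, summary)

hits = directory.search("temperature")
detail = directory.fetch_detail(hits[0])
```

The directory stays small and searchable, while the people closest to the data remain responsible for the detailed descriptions.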

Management Recommendations

Some suggestions for management: Plan for and provide adequate resources. Again, it sounds like I have my hand out, but we have all been involved with projects that were at some point in time inadequately funded. In addition, management needs to provide incentives for data-sharing and publication. It also must get people to use the systems that we develop, share the vision for an integrated information management environment, and promote collaborative efforts.

Within the EPA, for example, we have an Ultraviolet Band monitoring program, as well as other atmospheric monitoring programs. Wouldn't it be neat if they were developing an interoperable type of data system? Well, before we weren’t, but now we are. We need to link administrative management and scientific systems to reduce the burden of preparing data documentation. For it is burdensome. It does take time. If you describe the project once for the budget people because you are about to go out and spend extra dollars, for example, can't you use that description as part of your description in your metadata system?

Cultural Recommendations

Perceived threats of loss of intellectual property can impede the use of IM systems. I think mostly those threats are overstated and overemphasized, but they are real. They keep people from using IM. Data-sharing as an approach needs to be promoted, because data-sharing can lead to mutual career advancement. I am reminded that the most influential or the most interesting scientific advancements often are those that result from merging two fields.

In addition, publication of metadata and data should be recognized as a worthwhile effort by peers. That idea is currently supported by several journals, including those published by The Ecological Society of America, The American Geophysical Union, and The Geological Society of America.

Publishing Good Metadata

Finally, publication of good metadata minimizes inappropriate use, another concern that scientists have about giving up their data into a system. The highest priority, though, in terms of doing environmental assessments, is to develop good directories of environmental data to help us find the information that is already out there.


Copyright © 2003 NFAIS. All rights reserved. No part of this product or service may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written consent.
