Preprints of the
Metadiversity
Conference
Proceedings
Session 3: The Challenge in Earth Observation, Ecosystem
Monitoring, and Environmental Information
Environmental
Metainformation in the Work Program of the European
Environment Agency
STEFAN JENSEN,
Project Leader, European Topic Centre/Catalogue of Data
Sources (ETC/CDS), European Environment Agency
ABSTRACT
In 1996 ETC/CDS started
operation under the task and vision to build a
metainformation system on the environment on the
European scale on behalf of the European
Environment Agency (EEA). Following the guidance of
the G7 metadata initiative, the ETC/CDS advisory
committee, and the EEA representatives involved, and
building on thorough experience in developing
metainformation systems on the environment for
Austria and Germany, the CDS fields and data model
were developed and agreed upon. Technically, the model is
based on the GELOS standard, with some optional
fields added in response to national demand. To meet the need
for a multilingual environmental thesaurus, GEMET
was built by merging existing European thesauri and
adding translations for missing languages. The main
purpose of the thesaurus was defined as indexing
metainformation. The intended users of the system are
the EIONET, the general public, and national
initiatives. On this basis, software development
has resulted in a flexible input tool (WinCDS) and a
state-of-the-art retrieval tool (WebCDS). For
the thesaurus, a maintenance tool and a
simple "thesaurus browser" (ThesShow) are available.
Metainformation collection started in 1997 with
other ETCs and EEA, following a supply-driven
approach (to register available and used sources).
Only after agreeing on the selection criteria early
in 1998 could they be used to follow a
demand-driven collection approach. Interpreting the
vision as a call to supply, sooner or later,
seamless access to all kinds of environmental
information through a catalogue, the GELOS+ fields
are shaped according to recommendations from
American and European standardizing bodies (both
currently merging into the international ISO
15046-15). This enables appropriate description of
spatial environmental data and runs in parallel with
the implementation of spatial visualization and
querying of metadata in the CDS software. Collection
has so far produced a database with 1280 data sources
and 550 addresses. Given that filling
and updating the catalogue will be a core
activity of the ETC/CDS in the years to come,
several crucial decisions need to be made. The
maintenance of these sources is currently hampered
by the lack of binding updating obligations and
some changing policies.
The EEA's strategy of a
"European Reference Centre on Environmental
Information" must be used to overcome these
shortcomings by: clearly defining the role of CDS
as the entry point (catalogue) for retrieval of
quality-assured environmental information;
identifying the integration with the EEA data warehouse
and the EEA GELOS server; and involving the ETCs in a
continuous process that makes intensive use of the selection
criteria for national data collection. The interest
of the majority of the member states in the usage
of the CDS or a similar approach shows the
opportunity to use it as a harmonizing approach to
manage both their own business case in
metainformation and their European reporting
obligations. To further support this, reporting
obligations from EU legislation are currently added
to the database. Beyond this, the CDS system will
develop into a distributed environmental
information system that forms an entry point to
various environmentally related sources, located at
distributed providers, whether the
information comes from space or from earth science, from
mapping authorities or from monitoring networks,
bridging the gap between public science and
administration and between Europe and its regions, as
well as linking to global services.
My name is Stefan Jensen. I am
the project leader of one of the nine European Topic Centres
set up by the European Environment Agency. The European
Environment Agency (EEA) was established in 1994 to
report on the state of the environment.
The Organization
The European Environment Information and
Observation Network (EIONET) was established through the work of the
EEA. As I mentioned, the network includes nine Topic
Centres. Most of them are subject-oriented. For example, one
deals with nature conservation (this is where the
biodiversity aspect would fit in). One deals with air
pollution, one with soil, and so on. All the major topics
are covered. Our Topic Centre, the Catalogue of Data Sources
(CDS) Topic Centre, deals with the information aspects of the
network. One of our principal tasks is to gather
metainformation, or metadata, in order to facilitate access
to information collected by other partners in the network.
This Topic Centre is an organization consisting of nine
active partners from four European countries: Austria,
Germany, Italy, and Sweden.
The work is organized like this:
There are 15 member countries in the European Union, but at
the moment we are working with 18 countries (the 15 plus Iceland,
Liechtenstein, and Norway). These countries have
named 18 National Focal Points. In addition, for each of the
Topic Centres there is in the member states a so-called
National Reference Centre (NRC). The NRCs carry out the work
in individual topic areas. So the core of the network
consists of about 200 contacts involved in the work.
Then there are other
institutions, such as scientific organizations, that are
named by the member countries and that play an important
role in environmental reporting. These other institutions
are also part of the network and, to different degrees, they
are involved in the current work.
The Task
Our task is, speaking on the
meta-level, to create a Europe-wide metainformation system
on the environment.
We began this task in 1996 by
conceptualizing and implementing a common data model, a
common language. The next step was to promote the new data
model to institutions that had not been involved in this
process. This initial effort also had to take account of the
national environmental metainformation systems that already
exist in some member states, although the vast majority of
member states do not have one yet.
Next came the development of
some pieces of software, first for data collection and
second for the retrieval of data. An important
issue in Europe here is that a total of 13
languages need to be addressed, so building a
multilingual environment was made
part of the work.
The development of selection
criteria was another issue we had to address. When we
started data collection, we had not yet set selection
criteria. As a result, we had to define some selection
criteria based on the kinds of sources we were using.
We are continuing to collect
data. We also have to maintain the meta-database, and we
have to supply access to distributed systems, which we are
only now starting to do. So at the moment we do not yet have a
distributed system. Instead, like the Global Change Master
Directory in the U.S., we have a single database that is
currently in use.
The User Groups
Who are the user groups? Some of
the user groups, including the EEA, the Topic Centres, and
the National Focal Points, are pretty obvious as core users.
But other users are not so clear. They include institutions
running national systems (national metainformation
initiatives), other institutions working in the field, and
the "general public." There certainly are various other
institutions that might be interested in these kinds of
data, but we are still in a learning process about them and
other potential users.
The Data Models
What is our data model? What
are we building on? We are building on the Global
Environmental Information Locator System (GELOS) described
earlier in this conference by Eliot Christian. We took the
GELOS element set and had member states add certain fields.
Neither those fields nor the ones we added ourselves are
mandatory. However, we do have certain mandatory
fields, and we encourage our users to fill them in. We also
encourage the use of these mandatory fields
in the design of the software. It is still possible to
register an entry without the mandatory fields, but I have
seen this in only a couple of applications. If the
information is really too thin (if, say, you have only three
or four fields filled in), then it might not be very useful,
and what you get out of the system may not be what you
thought you were going to get.
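As an illustration only, the handling of mandatory fields described above (encouraged but not enforced, with very thin entries flagged as not very useful) could be sketched in Python as follows. The field names and the threshold of five filled fields are assumptions made for this example; the actual CDS/GELOS element set is not listed in this talk.

```python
from dataclasses import dataclass, field

# Hypothetical mandatory field names, chosen for illustration only.
MANDATORY_FIELDS = ("title", "originator", "abstract")


@dataclass
class CatalogueEntry:
    """One catalogue record: a flat mapping from field name to value."""
    values: dict = field(default_factory=dict)

    def missing_mandatory(self):
        """Return the mandatory fields that are absent or blank."""
        return [name for name in MANDATORY_FIELDS
                if not str(self.values.get(name, "")).strip()]

    def filled_count(self):
        """Count the fields that carry a non-blank value."""
        return sum(1 for value in self.values.values() if str(value).strip())

    def is_too_thin(self, minimum=5):
        """Flag entries with fewer than `minimum` filled fields (assumed threshold)."""
        return self.filled_count() < minimum


# The entry is still accepted; the checks only produce warnings.
entry = CatalogueEntry({"title": "Air quality stations", "originator": "", "language": "en"})
print(entry.missing_mandatory())  # ['originator', 'abstract']
print(entry.is_too_thin())        # True
```

The sketch reports gaps rather than rejecting the entry, which mirrors the point that registration without mandatory fields remains possible.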
We are also conducting various
standardization initiatives, which allow us to at least meet
Level I requirements (Level I requirements mean that you
only cover first entries). We use other standards to build a
thesaurus.
As with GELOS, Z39.50 will also
be our protocol for accessing distributed systems. We are
running profiles such as GELOS on it. But at the moment we
are using GELOS not for a distributed system but for a
description of elements we use in one database. SGML is the
current data exchange format. I can see from the discussion
here that XML is probably the next-generation format for
such things, but SGML is a good start in moving toward XML.
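To make the idea of a tagged exchange format concrete, here is a minimal Python sketch that writes a catalogue entry as XML and reads it back. The element and attribute names (`entry`, `field`, `name`) are invented for this example and are not the CDS exchange format, which this talk does not specify.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(values):
    """Serialize a field-name -> value mapping as a small XML document."""
    root = ET.Element("entry")  # hypothetical element name
    for name, value in values.items():
        child = ET.SubElement(root, "field", name=name)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_entry(text):
    """Parse the XML produced by entry_to_xml back into a mapping."""
    root = ET.fromstring(text)
    return {f.get("name"): (f.text or "") for f in root.findall("field")}


xml_text = entry_to_xml({"title": "Air quality stations", "language": "en"})
print(xml_text)
print(xml_to_entry(xml_text))
```

Because the record is a flat list of named fields, the same structure could be expressed in SGML or XML with little change, which is one reason a move from one to the other is plausible.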
Metainformation
I would like to reflect briefly
on our experiences with metadata and why we introduced this
kind of metainformation. I think that metainformation is
relevant to a pyramid full of sources, including databases,
stations, documents, maps, images, tools, and projects. The
top of the pyramid is the locator system, the entry point to this
information and the tip of the iceberg. The base of the
pyramid is formed by access to the data themselves.
CDS-Based Harmonization
Our system is called
CDS, the Catalogue of Data Sources. Some
national metainformation systems in Europe have
a very high level of detail but still do not cover
everything. This is why we concentrate on a subset common to
all of them.
Harmonization is, therefore, an
issue. What we have achieved in this area is that member
states are adopting the data model for the design of their
national systems. You can imagine that each country has its
own specialties and, like some people working with
biodiversity, I imagine each country will have its own
specific ideas. For example, some would like to have
specific fields about plants or specific fields about
beetles, and so on, just to describe the individuality of
the source. Something like this is happening here as member
states build on the CDS, including GELOS.
However, there are some
countries that stick very closely to what we are doing. They
are building their national metainformation systems on our
software, which is based on MS Access. They can easily
change and adapt the software to their needs.
The CDS also is used in some
supranational projects. One example is the Alpine
Convention, which can be described as a biodiversity
convention for the Alpine region.
Tools
We are also building various
tools. One tool I mentioned already is a thesaurus, which
can be used for indexing and retrieving metainformation. The
one we are building is just a general thesaurus, so it is
not a thesaurus on biodiversity. If you look into it you
might find some terms out of your field, but it will
probably not meet all the specific needs of individuals and
scientific domains. However, the general thesaurus is a
starting point and includes quite a number of terms (5,400!)
in, at the moment, 11 languages. (Greek and Icelandic are
currently missing but should be included by April 1999, by
which time we want to finish the thesaurus.)
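A multilingual thesaurus of this kind can be pictured as a set of concepts, each carrying one term per language. The Python sketch below is an assumption-laden illustration: the class names, the concept identifier, and the sample terms are made up for the example and are not taken from the actual thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A thesaurus concept with one preferred term per language."""
    concept_id: int
    terms: dict = field(default_factory=dict)  # language code -> term


class Thesaurus:
    def __init__(self):
        self._concepts = {}
        self._index = {}  # (language, lowercased term) -> concept id

    def add(self, concept):
        """Register a concept and index each of its terms for lookup."""
        self._concepts[concept.concept_id] = concept
        for lang, term in concept.terms.items():
            self._index[(lang, term.lower())] = concept.concept_id

    def translate(self, term, source, target):
        """Return the target-language term, or None if the term or translation is missing."""
        concept_id = self._index.get((source, term.lower()))
        if concept_id is None:
            return None
        return self._concepts[concept_id].terms.get(target)


thesaurus = Thesaurus()
# Illustrative sample entry; "el" (Greek) is deliberately left out.
thesaurus.add(Concept(1, {"en": "air pollution", "de": "Luftverschmutzung"}))
print(thesaurus.translate("air pollution", "en", "de"))  # Luftverschmutzung
print(thesaurus.translate("air pollution", "en", "el"))  # None
```

Indexing an entry against the concept rather than against a single language's term is what lets a record indexed in one language be retrieved in another; a missing language simply yields no term until the translation is added.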
Software developments have
resulted in a flexible input tool, WinCDS. We also have a
state-of-the-art retrieval tool, WebCDS. WebCDS is
based on Java, which many of our clients are not able to use
effectively because of firewall problems. WinCDS allows the
use of Structured Query Language (SQL) databases, and it
has an easy search interface for HTML customers.
Criteria and Priorities for Collection
Now about the data that is in
such a system: It was decided by the EIONET group, by the
member states, and by the European Environment Agency that
we should have at least a small central catalogue with core
information where a certain level of quality control can be
applied. This catalogue will include the following
information:
The Directory of EIONET partners
Items produced by the EEA/EIONET
Data requested by the EEA/EIONET on a regular and scheduled basis
Data deliveries to the EU as a result of legislative reporting
Data requested by several international bodies
Environmental databases operated by international organizations and environmental conventions
National State-of-the-Environment Reports
National Environmental Monitoring Programs
National Environmental Resource Libraries
National meta-databases or reference databases on the environment
WebCDS Content
What is at the moment contained
in this catalogue system? Here are the environmental themes
and their percentages:
environmental policy (20%)
information (8%)
water (8%)
pollution (7%)
general (6%)
legislation (5%)
biology (5%)
air (4%)
administration (4%)
natural areas, landscape, ecosystems (4%)
rest (29%)
The themes listed above,
which are the most frequent ones, are part of the general
thesaurus. The 5,400 thesaurus terms are assigned to 40
themes. The terms are used for indexing, and the indexed
entries can then be assigned to themes through their terms. The
distribution also shows that at this stage the catalogue has a
somewhat administrative focus, since it includes quite a bit of
environmental policy information.
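How theme percentages like these could be derived from indexed entries can be sketched in Python as follows. Both the term-to-theme table and the counting rule (each theme counted once per entry, shares taken over all theme assignments) are assumptions made for this example; the talk does not say how the published figures were calculated.

```python
from collections import Counter

# Hypothetical assignment of thesaurus terms to themes.
TERM_TO_THEME = {
    "emission limit": "environmental policy",
    "groundwater": "water",
    "river": "water",
    "smog": "air",
}


def theme_shares(records):
    """Return theme -> percentage of all theme assignments.

    `records` is an iterable of index-term lists, one list per catalogue entry.
    A theme is counted once per entry even if several of its terms are used.
    """
    counts = Counter()
    for terms in records:
        themes = {TERM_TO_THEME[t] for t in terms if t in TERM_TO_THEME}
        counts.update(themes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {theme: round(100 * n / total, 1) for theme, n in counts.items()}


records = [
    ["groundwater", "river"],
    ["emission limit"],
    ["smog", "groundwater"],
    ["emission limit"],
]
print(theme_shares(records))
```

With these four sample entries, "water" and "environmental policy" each receive 40.0 and "air" receives 20.0.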
Copyright © 2003 NFAIS. All rights
reserved. No part of this product or service may be
reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior written consent.