Journal of Digital Information, Vol 2, No 4 (2002)

Thematic Real-time Environmental Distributed Data Services (THREDDS): Incorporating Interactive Analysis Tools into NSDL
Ben Domenico, John Caron, Ethan Davis, Robb Kambic and Stefano Nativi*
Unidata Program Center, University Corporation for Atmospheric Research
P.O. Box 3000, Boulder, CO 80307, USA
*University of Florence - Polo di Prato, Piazza Ciardi, 25, 59100 Prato, Italy
Email: ben@unidata.ucar.edu

Abstract
The overarching goal of Unidata's Thematic Real-time Environmental Distributed Data Services (THREDDS) is to provide
students, educators and researchers with coherent access to a large collection of real-time and archived datasets from a
variety of environmental data sources at a number of distributed server sites. The datasets will be conveniently accessible
from a collection of THREDDS-enabled data analysis and display tools. THREDDS will provide real-time data delivery via
reliable, event-driven "push" technology as well as transparent access to datasets using "pull" systems that make it possible to
access data on remote servers as if they were on the user's own computer. The system will be built on a set of software
components and data servers that are already in operation or under development. The heart of THREDDS is metadata
contained in publishable inventories and catalogs (PICats). The creation, publication and distribution of PICats will be
facilitated by the discovery system and services provided by DLESE. For example, sites receiving real-time environmental data
can create PICats describing data products automatically as they arrive using decoders and crawlers. On the other hand, since
PICats do not have to reside on the server with the data, researchers will be able to create PICats for online publications that
point to datasets residing on several data servers. Similarly, educators will incorporate PICats of illustrative datasets into
modules that also include tools for data analysis and visualization, and students will be able to use PICats to point to datasets
related to their research projects, just as they now use URLs to point to relevant documents. This paper presents an overview
of THREDDS and an update on the current status.

1 Overview
In "The Absorbent Mind" Maria Montessori described education simply and elegantly: "It is not acquired by listening to words,
but in virtue of experiences in which the child acts on his environment". On a different level with more detail, the National
Science Education Standards describe a learning process based on inquiry: "Inquiry is a multifaceted activity that involves ...
using tools to gather, analyze, and interpret data; proposing answers, explanations, and predictions; and communicating the
results". These quotes capture the essence of the interactive data environment that Thematic Real-time Environmental
Distributed Data Services (THREDDS) will foster.

Each second of each day, observing systems around the globe are gathering data that provide snapshots of almost every
measurable aspect of our environment: satellites monitor cloud movements, atmospheric constituents and the temperature of
the land and ocean surfaces. Lightning strikes are recorded as they occur throughout the country. Global positioning system
and seismic sensors monitor tiny movements as well as major shifts of the planet's tectonic plates. Modeling programs are
being developed that use the current data to forecast future evolution on scales ranging from short-term weather forecasts to
very long-term climatic changes.

The goal of this work is to expand the means by which learners -- including students, educators, scientists and the general
public -- can use these vast resources to perform their own inquiries, i.e. to "act on their environment". Figure 1, a screen
dump from a prototype of one of the THREDDS interactive data analysis and display applications, illustrates a few of the ways
in which users can interact with environmental datasets that are accessed from remote servers as if they were on local disks.
In this particular instance, the display is a 3D rendering of the jet stream as predicted by a supercomputer model dataset on a
server at the National Center for Atmospheric Research (NCAR).

Figure 1. Interactive data analysis and display application (the screen image above was created by software engineer Stuart
Wier of the Unidata Program Center MetApps project)

Data collections are a cornerstone of the scientific research and education environment. While the amount and variety of earth
system data are increasing daily, the systems for making these data readily available and useful to the academic community
have not kept pace. We envision a framework -- a scientific data web -- that will allow faculty and students to search (in the
vocabulary of their particular discipline) for available data and to find them, regardless of where the data reside. Just having
the data is not enough, however. Even the many spectacular pictures generated from datasets available on the Web present
an essentially passive view of what is happening. To interact with the environmental phenomena represented by the data,
users need specialized visualization and analysis tools that enable them to manipulate and examine the datasets themselves.
They need to create their own visual images, and they must be able to manipulate those images in 3D space and perhaps
even "fly" through and around them. It should be possible to move a probe around in the image to see how the temperature
or pressure changes with depth in the ocean or height in the atmosphere at different points on the globe. Moreover, it is
important to overlay images of data from different sources. For example, at the time of a severe thunderstorm, one might ask
how the information about rainfall from a nearby radar site correlates with measurements of stream flows in the local river
basin. If those measurements indicate a problem is arising, it would be valuable to overlay predictions from forecast
(meteorological and hydrological) models. Ultimately it may be important to include demographic information about
populations in threatened areas.

As a two-year project with limited resources, THREDDS clearly will not do all of this. However, our goal is to build key
components that will make such a system possible and to incorporate them into a working prototype that includes a large
number of data providers, a group of interactive tool builders, metadata experts, and representatives of the digital library
community. The broad access to data and analysis tools envisioned in the prototype scientific data web will enable educators
to work with data in classrooms, scientists to examine and incorporate data from other disciplines, and students to explore
and test their ideas using the yardstick of data. Indeed, in the end, anyone with Internet access will be able to incorporate
scientific data into their everyday lives more easily.

2 Strategy: a Variety of Tools and Data Sources Bound by Metadata Catalogs

2.1 Interactive Data Analysis and Display Tools


The strategic goal of THREDDS is to provide students, educators and researchers with coherent access to a large collection of
real-time and archived datasets from a variety of environmental data sources at a number of distributed server sites. The
datasets will be conveniently accessible from a collection of THREDDS-enabled data analysis and display tools. The arsenal of
tools includes Web-based "thin" clients that allow the learner to browse and manipulate data using the processing power on
the servers; interactive data analysis applets that can be embedded directly into HTML educational documents; and full "thick"
client applications that harness the computing power and flexibility of the user's own workstation while accessing data from a
collection of remote servers.

2.1.1 "Thin" Client Browser-based Analysis and Display Systems

On a superficial level, the browser-accessible data analysis and display tools look similar to the more traditional Web sites that
offer a display of images generated from data. There is one important difference: namely, these thin clients enable the user to
interact directly with the data by using a set of analysis tools that run on the server. An example of this powerful server-based
approach resides at the Climate Data Library of the International Research Institute (IRI) for Climate Prediction at the Lamont-
Doherty Earth Observatory (LDEO). The Climate Data Library enables interactive analysis of datasets on the server via the
INGRID system developed by Benno Blumenthal. A second example is the Live Access Server (LAS), which was developed at
the Pacific Marine Environmental Laboratory (PMEL) under the direction of Steve Hankin.

2.1.2 Interactive Data Analysis Applets Embedded in Educational Materials

The screen shot in Figure 2 is part of a Web page from the collection of interactive WeatherWise (WXWise) applets developed
by a team led by Tom Whittaker and Steve Ackerman for use in courses at the University of Wisconsin-Madison. This
particular applet accesses a current infrared satellite image and allows the learner to see how a portion of the image would
change if the temperature were higher or lower than it actually is. The learner is then asked to respond to questions at the
bottom of the page. It is an illustration of an embedded Java applet that allows for direct interaction with real-time
environmental data stored on THREDDS servers. You can activate the WeatherWise applet in a Java-enabled browser by
clicking on the image.

Figure 2. Interactive applet embedded in educational module Web page (click on the image to activate the applet)

2.1.3 Fully Interactive "Thick" Client Applications

The animated loop in Figure 3 is a series of screen dumps from a prototype application of the Unidata MetApps project. The
loop shows how the user can interact with data on a remote server. The panels on the left show the parameters available in
the dataset under investigation -- along with a set of options for viewing the data. The specific data that have been selected
for the 3D rendering are views of the jet stream predicted by a supercomputer forecast model run at the National Centers for
Environmental Prediction and delivered to a THREDDS server at NCAR via Unidata's Internet Data Distribution (IDD) system.
Using the Distributed Oceanographic Data System (DODS) client-server protocol, the application was able to bring across only
the subset of the data needed for the visualization. The loop shows several views of the image, generated as the user
manipulated the 3D rendering with her mouse.

Figure 3. Fully interactive "thick" client application (the image above is another screen dump by Stuart Wier of the Unidata
Program Center MetApps project)
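
The subsetting idea can be illustrated with a short sketch. DODS (known today as OPeNDAP) lets a client append a constraint
expression to a dataset URL, so that the server returns only the requested index ranges of a variable. The Python below only
builds such a URL and does not contact any server. The server address, dataset path, variable name and index ranges are
hypothetical, and the sketch makes no claim about how the MetApps prototype issues its requests.

```python
from typing import Sequence, Tuple

# (start, stride, stop) index triple for one array dimension; stop is inclusive.
Hyperslab = Tuple[int, int, int]


def build_dods_url(dataset_url: str, variable: str,
                   slabs: Sequence[Hyperslab], response: str = "dods") -> str:
    """Return a DODS request URL selecting a hyperslab of one variable."""
    for start, stride, stop in slabs:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid hyperslab: {(start, stride, stop)}")
    # One bracketed [start:stride:stop] group per dimension of the variable.
    constraint = variable + "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slabs
    )
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical server, file and variable names, for illustration only.
url = build_dods_url(
    "http://example.edu/dods/model/forecast.nc",
    "wind_speed",
    [(0, 1, 0), (10, 1, 14), (0, 2, 72), (0, 2, 144)],
)
print(url)
```

Here the request covers one time step, five vertical levels and every second grid point, so only a small fraction of the
full array would cross the network.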

2.1.4 Embedding Interactive Data Analysis Applications into Publications

In the long term, the intention is to develop THREDDS capabilities to the point where one can embed pointers to datasets and
tools into online publications such as this one. In the meantime, it is still necessary to install some client-side software
components on your own computer. If you are interested, this can be done for the current beta test version of at least one of
the client applications. There are two approaches: one is to get the full Java application running on your own computer; the
other is to use a Java application startup facility called WebStart. Both approaches are described by Stuart Wier at
http://www.unidata.ucar.edu/staff/wier/index.html.

2.2 Distributed Data Sources


The schematic in Figure 4 shows how a user running a THREDDS client on a local workstation can access data from a number
of distributed servers, each of which has its own emphasis or "theme". Many of the servers are in turn populated with
environmental data in real time via the IDD system that has been delivering data to nearly 100 universities for the last seven
years. A few of these servers already exist, others are being built, and a couple (the streamflow and demographic data
servers) are still in the formative idea stage.

Figure 4. Client data access from distributed data servers

Figure 5 shows how data from a set of servers can be plotted together in an interactive application. Only the required portions
of the datasets are transmitted over the network and the application can allow for the wide variety of spatial and temporal
resolutions for each data element. This particular screen image is one frame from an animation showing the evolution of the
data over time.

Figure 5. Interactive analysis and visualization of data from distributed servers (The screen image above was created by Don
Murray, lead software engineer on the Unidata Program Center MetApps project. The prototype application that generated the
image was developed by Unidata in collaboration with the Atmospheric Technology Division at the National Center for
Atmospheric Research)

2.3 Metadata Catalogs


At the heart of THREDDS is metadata contained in publishable inventories and catalogs. Based on XML, these inventories and
catalogs can be created in many different ways. Data providers receiving real-time environmental data are instrumenting
decoders to create entries describing data products as they arrive and become part of the data server inventory. Crawlers are
being implemented to create inventories by traversing existing retrospective data collections. Since catalogs do not have to
reside on the data servers, researchers will be able to create specialized or personal catalogs for research publications that
point to datasets residing on several data servers. Educators will incorporate catalogs of illustrative datasets into educational
modules that also include tools for data analysis and visualization. Just as they now use URLs to point to relevant documents,
students will eventually be able to reference datasets and analysis tools related to their research projects. Since the
inventories and catalogs are text-based, they can be "harvested" and indexed into Digital Library for Earth System Education
(DLESE) and other digital libraries.
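
As a rough illustration of why a text-based XML catalog is easy to create, harvest and index, the Python sketch below parses a
small personal catalog that points to datasets on two different servers. The element names, attribute names and URLs are
assumptions made up for this example; they are not the THREDDS catalog schema.

```python
import xml.etree.ElementTree as ET

# Illustrative catalog only: element and attribute names are assumptions,
# not the THREDDS catalog schema, and the URLs are placeholders.
CATALOG_XML = """\
<catalog name="Severe storm case study">
  <collection name="Radar">
    <dataset name="Rainfall estimate" url="http://radar.example.edu/dods/rain.nc"/>
  </collection>
  <collection name="Hydrology">
    <dataset name="Stream flow" url="http://hydro.example.org/dods/flow.nc"/>
    <dataset name="River stage" url="http://hydro.example.org/dods/stage.nc"/>
  </collection>
</catalog>
"""


def list_datasets(xml_text):
    """Return (collection, dataset name, url) for every dataset entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for collection in root.findall("collection"):
        for dataset in collection.findall("dataset"):
            entries.append(
                (collection.get("name"), dataset.get("name"), dataset.get("url"))
            )
    return entries


for collection, name, url in list_datasets(CATALOG_XML):
    print(f"{collection}: {name} -> {url}")
```

Because the catalog carries only names and pointers, it can live anywhere, for instance alongside a publication or an
educational module, while the data stay on their home servers.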

The screen shot in Figure 6 is also from a prototype client data analysis application, part of the Unidata MetApps development
project. The screen illustrates key aspects of THREDDS data catalog access from within a client application. First, the pop-up
"Choose DODS Dataset" window enables access to several catalog servers on different machines on the Internet. The lower
part of the pop-up window shows a menu of data items available on one of the servers. This particular catalog has dataset
entries arranged three different ways: by variable, by model, and by experiment. The details of the individual catalog entries
are not important, but one should note that the words associated with each dataset or collection of datasets can be chosen by
the creator of the catalog and that the catalog itself can refer to datasets and collections of datasets on a variety of data
servers.

Figure 6. Searching distributed data catalogs from within application programs

Figure 7 is a screen shot from another MetApps client which depicts a catalog that is automatically generated as real-time
weather forecast model data arrives at the motherlode server at NCAR. In this case, the main menu items are the names of
the various models and one of the model collections, SST-A, has been opened to show the individual datasets available on the
server. In essence, the hierarchical list in this case comprises an inventory of the model output datasets available on the
server at the time.

Figure 7. Data server inventory listing as seen in analysis and display tool (click on the image to see the current version of the
catalog - needs an up-to-date version of Internet Explorer)
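
The automatically generated inventory described above can be approximated with a very small crawler. The Python sketch
below walks a directory tree and emits a hierarchical XML listing, with one collection per subdirectory and one dataset
entry per file. It is a minimal sketch under stated assumptions: the element names, the attribute names and the model
and file names are hypothetical, and the real server-side crawlers and decoders may work quite differently.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def crawl_inventory(root_dir, name):
    """Build an XML inventory mirroring the directory tree under root_dir."""
    def add_entries(parent, directory):
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            if os.path.isdir(path):
                # Each subdirectory becomes a nested collection.
                child = ET.SubElement(parent, "collection", name=entry)
                add_entries(child, path)
            else:
                # Each file becomes a dataset entry, addressed relative to the root.
                rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
                ET.SubElement(parent, "dataset", name=entry, path=rel)

    catalog = ET.Element("catalog", name=name)
    add_entries(catalog, root_dir)
    return catalog


# Demonstration on a throwaway directory tree with made-up model names.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("modelA/run_00.nc", "modelA/run_12.nc", "modelB/run_00.nc"):
        full = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    inventory = crawl_inventory(tmp, "Model output inventory")
    xml_text = ET.tostring(inventory, encoding="unicode")
    print(xml_text)
```

Rerunning such a crawler whenever new files arrive would keep the listing in step with the server's holdings.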

Figure 8 is a different view of the catalog shown in Figure 7. Whereas Figure 7 shows the catalog as seen from within an
application, Figure 8 shows the actual XML code for the catalog as seen from within the Internet Explorer browser. If you are
viewing this page with a recent version of Internet Explorer, you should be able to look at the current version of the catalog
by clicking on either Figure 7 or Figure 8.

Figure 8. Data server catalog in native XML form (click on the image to see the current version of the catalog - needs an
up-to-date version of Internet Explorer)

3 Teams
THREDDS is a highly collaborative project, and this section lists the partners working on the three main areas of THREDDS
development: a set of data provider sites; a group of software developers working on systems for data analysis and display;
and a set of metadata experts relating to Earth system data collections.

3.1 Data Providers


The following institutions have agreed to be data-server partners:
The National Climatic Data Center, NCDC, including the NOAA Operational Model Archive and Distribution System, NOMADS
The National Geophysical Data Center, NGDC
The Space Science and Engineering Center, SSEC, at the University of Wisconsin-Madison for GOES satellite data
The International Research Institute/Lamont-Doherty Earth Observatory, IRI/LDEO
The Pacific Marine Environmental Laboratory, PMEL
The National Center for Atmospheric Research, NCAR
The Climate Diagnostics Center, CDC
Fleet Numerical Meteorology and Oceanography Center, FNMOC
George Mason University/Center for Ocean-Land-Atmosphere Studies, GMU/COLA
University of Alabama Huntsville for satellite and hydrology data
The Unidata community of 90 universities via their Abstract Data Distribution Environment (ADDE) servers.

Note that NCAR and SSEC will serve as testbed sites for server-side software. As the project progresses and the common
underpinnings are tested at the initial sites, additional sites will be added. Sites under consideration are:
Incorporated Research Institutions for Seismology Data Management Center, IRIS DMC
University of Oklahoma for radar data
Atmospheric Radiation Measurement, ARM
University of Florence Interoperability System for supporting the Italian Scientific Community working in the Earth
Observation from the Space (SINOTS) for European satellite data.

It is not possible in this article to provide a detailed description of the content of each of these sites. Some are large national
data centers. To give a sense of the magnitude and breadth of a typical THREDDS server, the prototype systems at NCAR are
initially targeted to handle about 1 terabyte of data online. This will hold several months of data arriving at the site at a rate of
about 10 gigabytes each day. During busy hours, more than 1 gigabyte of data arrives at the server, with several products
each second. The products range from satellite images and the output of numerical weather prediction models that are
hundreds of megabytes to 80-character reports from individual weather reporting stations from around the world. In between,
the product list includes lightning strike data; images and four-dimensional volume scans from NEXRAD radar sites;
atmospheric data recorded by commercial aircraft in flight; and vertical profiles taken by weather balloons. By the end of the
project, we hope to find resources to be able to store a full year of data on the prototype server. The reader is encouraged to
visit the sites to get a more detailed understanding of the holdings.
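
A back-of-envelope check of these storage figures can be sketched in Python. Keeping several months of data on roughly 1
terabyte corresponds to an average arrival rate of about 10 gigabytes per day, and at that rate a full year would need
several terabytes. Reading "several months" as 100 days and using decimal units (1 terabyte = 1000 gigabytes) are both
assumptions made for this example.

```python
GB_PER_TB = 1000  # decimal units: an assumption, the article does not state a convention


def implied_daily_rate_gb(capacity_tb, retention_days):
    """Average arrival rate (GB/day) that fills capacity_tb in retention_days."""
    return capacity_tb * GB_PER_TB / retention_days


def capacity_needed_tb(rate_gb_per_day, retention_days):
    """Storage (TB) needed to keep retention_days of data at a steady rate."""
    return rate_gb_per_day * retention_days / GB_PER_TB


# "Several months" is taken here as roughly 100 days, an assumption.
rate = implied_daily_rate_gb(capacity_tb=1, retention_days=100)
year = capacity_needed_tb(rate, retention_days=365)
print(f"implied rate: {rate:.0f} GB/day; a full year needs about {year:.2f} TB")
```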

3.2 Client Analysis and Display Tools


The THREDDS prototype will provide examples of a wide variety of working applications that use our metadata framework to
find, analyze and display data from server sites. This will demonstrate an end-to-end system for data access and visualization.
The following developers will incorporate our client-side data-access components (class libraries and metadata access) into
their own data manipulation tools:
Live Access Server (LAS, PMEL, Steve Hankin). LAS illustrates the use of a Web-based (thin) client with the bulk of the
analysis and display generation done on the server side.
Ingrid (IRI/LDEO, Benno Blumenthal). This is another example of a system enabling analysis and display of data via a
Web browser.
WXWise applets (the University of Wisconsin-Madison, Tom Whittaker). These applets illustrate the use of Java to embed
data-analysis and display tools directly into educational modules on a Web site.
Virtual Geophysical Exploration Environment (VGEE, formerly The Virtual Exploratorium, the University of Illinois, West
Chester State, DLESE, and NCAR, Don Middleton). This application incorporates the educational functions directly into the
data analysis and display tool itself.
Data Discovery Toolkit and Foundry based on EDMI (Earth Data Multimedia Instrument, New Media Studio, Bruce Caron).
These are a set of data-analysis and display tools based on IDL and Macromedia Director. They can be used to generate
elaborate educational modules.
Meteorological Applications (MetApps) (Unidata Program Center, Don Murray). A set of pure Java, platform-independent,
two- and three-dimensional data-analysis and display tools based on the VisAD infrastructure.
Visualization for Algorithm Development (VisAD) infrastructure from SSEC (Bill Hibbard of the University of Wisconsin-
Madison in conjunction with the Unidata Program Center).
Others: Some software packages (MATLAB, Interactive Data Language (IDL), Man-computer Interactive Data Access
System (McIDAS)) have already been adapted to acquire remote data via DODS or ADDE. Even if these systems are not
adapted to take direct advantage of catalogs or other THREDDS advances, their users will benefit from data available on
THREDDS servers.

3.3 Metadata Expertise


As noted earlier, the technological core of this initiative, the crucial component now under development, is a system for adding
the semantic description of scientific datasets necessary for data manipulation and discovery. It must interoperate with data
providers, data servers, data clients, catalog servers, discovery systems and other middleware components. Investigators will
select key scientific datasets and semantic descriptions developed for an end-to-end demonstration of the utility of this
approach. Unidata staff will work closely with DLESE to ensure that the resulting metadata system will interoperate effectively
with the National STEM (Science Technology Engineering Math) Digital Library (NSDL).

Partners with whom we will consult on matters of metadata and interoperability are:
The Earth System Markup Language (ESML, University of Alabama-Huntsville);
The DIstributed MEtadata System (DIMES, George Mason University);
The aggregation data catalog that is part of DODS (University of Rhode Island, Unidata);
Digital Library for Earth System Education, DLESE;
The University of Florence (Italy). Prof. Stefano Nativi is acting as a liaison with the international metadata standards
community.

4 Conclusions
In perhaps a different way than Maria Montessori originally envisioned, THREDDS will provide a way in which we can learn by
"acting on our environment". Much work remains to be done to achieve the long-range THREDDS mission of developing an
environmental data web that allows learners of all ages to find and interact with datasets that illustrate the current state of the
global environment, but we have designed the system and have begun construction. This article provides a glimpse of the
interactivity that will be possible, a sense of the range of data types and partners involved in the effort, and a basic
understanding of the architecture of the system and the approach being taken to make it a reality.

Acknowledgements
The authors wish to thank the National Science Foundation Division of Undergraduate Education for making this work possible
as part of the NSDL initiative under the direction of Lee Zia. THREDDS is a highly collaborative project, so thanks are in order
to all the individuals and organizations who are working with us as collaborative partners. These partners have been cited
individually in the article.
