Choosing an appropriate DAP server

Title

Choosing an Appropriate DAP Server

Names and Contact Information

James Gallagher <jgallagher@opendap.org>

Date

4 April 2006

Type of Recommendation

Best Practice

Recommendation

Choose a DAP2-compliant server based on the type of data you intend to serve and the server infrastructure you want to use.

Description of standard or best practice

Several general-purpose data servers currently support DAP2:

  1. The DAP server from OPeNDAP
  2. The DRDS from OPeNDAP
  3. pyDAP developed by Roberto De Almeida
  4. GrADS Data Server (GDS) from COLA
  5. Ferret Data Server (FDS) from PMEL
  6. Ingrid from LDEO, Columbia University
  7. THREDDS Data Server (TDS) from Unidata

Each of these servers has characteristics that make it best suited to particular kinds of data. The following table summarizes the types of data each server handles and the server technology it uses. A more detailed description of each server's capabilities follows the table.

Summary of DAP-compliant Server Characteristics

Name | Type(s) of data | Server technology | Notes
OPeNDAP's server | NetCDF3; HDF4; FreeForm; DSP; JGOFS; Others | CGI (Perl and C++) | Extensible using handlers written using OPeNDAP's libdap. The server is available as RPM packages from RPMFIND and other sites.
DRDS | SQL | Servlet (Tomcat) | -
pyDAP | NetCDF3; Matlab (version?); CSV files; SQL | CGI; Twisted; mod_python; ISAPI extension under IIS | The pyDAP distribution also includes a client library. Written completely in Python using the Draft NASA/ESE RFC and funded by Google's Summer of Code program.
GDS | GRIB; Binary; NetCDF; HDF4; BUFR and GrADS station data | Servlet using the Anagram framework | Provides server-side analysis which allows complex operations to be performed, results stored and then queried. Uses the GrADS analysis tool to perform those operations.
FDS | GRIB; Binary; TMAP-formatted data; ASCII data; NetCDF; remote data provided by other DAP servers; virtual datasets generated by Ferret journal script files | Servlet using the Anagram framework | Provides server-side analysis similar to the GDS but uses Ferret as the analysis tool. This server is specialized to perform regridding operations so that discordant variables from several data sets can be easily compared.
Ingrid | Local files (binary, GRIB, NetCDF, GeoTIFF, Postgres/PostGIS ... usually aggregated); any DAP server | C-based server cached with Squid | Provides analysis capabilities in a partial-execution dataflow framework. Ingrid is part of the IRI/LDEO Data Library and it can provide a number of services beyond data access, including data file, image, and animation generation.
TDS | NetCDF; HDF5; GRIB; NEXRAD; any DAP server | Servlet | The TDS can be configured to aggregate gridded data. Other transport protocols supported: Bulk HTTP and WCS (gridded data sets only).
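As a rough illustration of using the summary table above to shortlist servers, the sketch below transcribes the file-format column into a Python dictionary and looks up which servers list a given format. The names `SERVER_FORMATS` and `candidate_servers` are hypothetical, the format labels are simplified from the table, and remote DAP sources and the "Others" entry are left out.

```python
# Formats per server, transcribed (and lower-cased) from the summary table.
# Labels are matched exactly, so "netcdf3" and "netcdf" are distinct entries,
# just as the table lists them.
SERVER_FORMATS = {
    "OPeNDAP's server": {"netcdf3", "hdf4", "freeform", "dsp", "jgofs"},
    "DRDS": {"sql"},
    "pyDAP": {"netcdf3", "matlab", "csv", "sql"},
    "GDS": {"grib", "binary", "netcdf", "hdf4", "bufr", "grads station data"},
    "FDS": {"grib", "binary", "tmap", "ascii", "netcdf"},
    "Ingrid": {"binary", "grib", "netcdf", "geotiff", "postgres/postgis"},
    "TDS": {"netcdf", "hdf5", "grib", "nexrad"},
}


def candidate_servers(data_format):
    """Return the servers whose table entry lists the given format label."""
    wanted = data_format.strip().lower()
    return sorted(
        name for name, formats in SERVER_FORMATS.items() if wanted in formats
    )


if __name__ == "__main__":
    for fmt in ("SQL", "GRIB", "HDF5"):
        print(fmt, "->", candidate_servers(fmt))
```

A lookup like this only narrows the field by format. The server infrastructure already in place (CGI, servlet container, Python) and the need for server-side analysis or aggregation still have to be weighed using the descriptions that follow.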

Description of each Server

OPeNDAP's DAP2 Server
This server serves netCDF3, HDF4 and FreeForm data using 'handlers' provided by OPeNDAP. Handlers for a number of other formats are also available, either in source form only or with limited binary support, from both OPeNDAP and other sources. These include: U of Miami's DSP satellite image processing system; JGOFS; and HDF5. Other groups provide handlers for CEDAR, FITS, et cetera. This server is implemented as a CGI program which runs with a web daemon such as Apache. The CGI is configured to pass specific requests to the appropriate handler program. The server supports HTTP/1.1 caching, although it does not support HTTP/1.1 keep-alives (so each request to the server requires a separate connection).
While the JGOFS and FreeForm handlers can be used to serve table or relational data, they cannot be interfaced to a relational database server. The primary focus of this server is on raster data, even though it is capable of serving in situ data as well.
OPeNDAP's DRDS
The DRDS (DODS Relational Database Server) is implemented as a Java Servlet and uses JDBC to communicate with a relational database. Tables and/or views in the database are selectively made visible as individual data sets (DAP2 DDS objects). In many cases a single (logical) data set is made up of many tables that are usually combined using a SQL JOIN operation. In the DRDS there is no way to specify a JOIN. Instead, the DRDS depends on data providers adding a view to the database, and that view is then configured in the DRDS and served as a data set. This is good in the sense that the database administrator has complete control over how the database is seen from the outside, but it is less than desirable because the DBA must be involved in configuring the DRDS. Note also that older versions of MySQL lacked views; they are supported as of MySQL version 5.
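To make the view-based approach concrete, here is a minimal sketch of the kind of view a database administrator might add so that a multi-table data set appears as a single table. It uses Python's built-in sqlite3 module purely as a stand-in database; the table, column, and view names are hypothetical, and nothing here reflects the DRDS's own configuration.

```python
import sqlite3

# In-memory database standing in for the provider's relational database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two tables that together form one logical data set.
    CREATE TABLE stations (
        station_id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL
    );
    CREATE TABLE readings (
        station_id INTEGER, obs_time TEXT, temperature REAL
    );
    INSERT INTO stations VALUES (1, 'Buoy A', 41.5, -71.4);
    INSERT INTO readings VALUES (1, '2006-04-04T00:00', 7.2);
    INSERT INTO readings VALUES (1, '2006-04-04T06:00', 7.9);

    -- The JOIN lives in the database as a view, so a server that cannot
    -- express a JOIN itself only ever sees one flat table.
    CREATE VIEW station_readings AS
        SELECT s.name, s.lat, s.lon, r.obs_time, r.temperature
        FROM stations AS s
        JOIN readings AS r ON r.station_id = s.station_id;
    """
)

# A client of the view queries it like any ordinary table.
rows = conn.execute(
    "SELECT name, obs_time, temperature FROM station_readings ORDER BY obs_time"
).fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```

The trade-off described above is visible here: the JOIN logic sits entirely in the database, so changing what is exposed means changing the view rather than the server configuration.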
pyDAP
The server comes with four plugins. Three of them serve CSV (Comma Separated Values) files, NetCDF files and Matlab files. The fourth serves data from a relational database using the Python database connection module. The pyDAP SQL plugin can serve data spread across several tables by joining them in the pyDAP configuration (unlike the DRDS, pyDAP does not require anything to be added to the RDB itself). That configuration is described in: Joins using pyDAP's SQL plugin.
From Roberto's documentation: The server is implemented as a WSGI application, and can be run on a variety of servers: as a CGI script; with Twisted and mod_python; or even as an ISAPI extension under IIS.
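For readers unfamiliar with WSGI, the sketch below shows what "implemented as a WSGI application" means in general: a single callable that takes the request environment and a `start_response` function. This is a generic illustration using only the standard library, not pyDAP's actual code; `application` and `call_app` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable: read the request path, return a text body."""
    path = environ.get("PATH_INFO", "/")
    body = ("requested: " + path + "\n").encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def call_app(app, path):
    """Invoke a WSGI app in-process, the way a hosting server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(application, "/data/sst.nc.dds"))
```

Because the hosting server only needs the callable, the same `application` object could be handed to `wsgiref.simple_server.make_server` for local testing or to `wsgiref.handlers.CGIHandler().run(application)` to run as a CGI script, which is why a WSGI-based server can be deployed in so many ways.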
GDS
From the GDS home page: The GrADS Data Server (GDS, formerly known as GrADS-DODS Server) is a stable, secure data server that provides subsetting and analysis services across the internet. The core of the GDS is OPeNDAP (also known as DODS), a software framework used for data networking that makes local data accessible to remote locations. GDS services can be provided for any GrADS-readable dataset: GRIB, Binary, NetCDF, HDF, BUFR, and GrADS station data format. The GDS unifies all these data formats into a NetCDF framework. The GDS subsetting capability allows users to retrieve a specified temporal and/or spatial subdomain from a large dataset, eliminating the need to download everything simply to access a small relevant portion of a dataset. The GDS analysis capability allows users to retrieve the results of an operation applied to one or more datasets on the server. Examples of analysis operations include basic math functions, averages, smoothing, differencing, correlation, and regression; the GDS supports any operation that can be expressed in a single GrADS expression. The GDS is based on Anagram, a modular framework for high-performance scientific data servers. For more information, please see the Anagram home page.
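The subsetting described above is typically requested through a DAP2 constraint expression appended to the data set URL, where each array dimension gets a `[start:stride:stop]` hyperslab. The sketch below builds such a URL; the host, path, and variable name are placeholders, and `hyperslab` and `subset_url` are hypothetical helpers rather than part of any server or client library.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2 array index range as [start:stride:stop]."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("expected 0 <= start <= stop and stride >= 1")
    return "[%d:%d:%d]" % (start, stride, stop)


def subset_url(dataset_url, variable, slabs, response="dods"):
    """Build a DAP2 request URL for a subset of one variable.

    dataset_url -- base URL of the data set (placeholder in the demo below)
    variable    -- name of the array variable to project
    slabs       -- one (start, stop) or (start, stop, stride) tuple per dimension
    response    -- DAP2 response suffix, e.g. "dods" (binary) or "ascii"
    """
    projection = variable + "".join(hyperslab(*slab) for slab in slabs)
    return "%s.%s?%s" % (dataset_url, response, projection)


if __name__ == "__main__":
    # First time step, a 11-row latitude band, every second longitude point.
    print(
        subset_url(
            "http://example.org/dap/sst.nc",
            "sst",
            [(0, 0), (10, 20), (30, 60, 2)],
            response="ascii",
        )
    )
```

Only the requested indices are transferred, which is the point of the subsetting capability: a client never has to download the whole data set to read a small region of it.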
FDS
From the FDS home page: The Ferret Data Server (FDS) is a data server that provides data sharing, subsetting and analysis services across the internet. FDS is based on the Anagram framework and adopts a structure similar to GDS which is developed by IGES.
The FDS provides access control, including an 'abuse' filter and differing privileges for different IP address ranges (but no username/password logins). The FDS is designed to work with the Live Access Server, a web portal for Earth Science data that supports data fusion operations. These operations allow data from several sources to be regridded 'on the fly' so their variables can be compared. The FDS uses various catalogs to configure which data sets can be manipulated, and it can return a THREDDS catalog of those data sets.
Ingrid
From the Ingrid/LDEO home page: The IRI/LDEO Climate Data Library contains over 300 datasets from a variety of earth science disciplines and climate-related topics. It is a powerful tool that offers the following capabilities at no cost to the user:
  1. access any number of datasets;
  2. create analyses of data ranging from simple averaging to more advanced EOF analyses;
  3. monitor present climate conditions with maps and analyses in the Maproom;
  4. create visual representations of data, including animations;
  5. download data in a variety of commonly-used formats, including GIS-compatible formats.
TDS
From the THREDDS home page: The THREDDS Data Server (TDS) is a web server that provides metadata and data access for scientific datasets, building on and extending a number of existing technologies:
  1. THREDDS Dataset Inventory Catalogs are used to provide virtual directories of available data and their associated metadata. These catalogs can be generated dynamically or statically.
  2. The Netcdf-Java library reads NetCDF, OpenDAP, and HDF5 datasets, as well as other binary formats such as GRIB and NEXRAD into a "Common Data Model" (CDM). This is an abstract data model that the netCDF (Unidata), HDF5 (NCSA) and OPeNDAP (University of Rhode Island) developers are using to converge their respective data models towards. The CDM also adds "Georeferencing Coordinate Systems" and specialized "Scientific Data Type" layers, which provides the semantics needed to convert datasets to other protocols and formats such as those required by GIS systems. The library adds this information by parsing well known "attribute conventions", and by using THREDDS metadata to add missing coordinate system information and other metadata.
  3. An integrated server provides OpenDAP access to any datasets that can be read through the Netcdf-Java library. OpenDAP is a widely used, subsetting data access method built on the HTTP (web) protocol.
  4. An integrated server provides bulk file access through the HTTP protocol.
  5. An integrated server provides data access through the OpenGIS Consortium (OGC) Web Coverage Service (WCS) protocol for any "gridded" dataset whose coordinate system information is complete. Users can add missing information to a dataset where needed, in order to make this work.
The TDS can also aggregate gridded data using one of three schemes. These aggregations are configured using THREDDS catalogs.

Rationale/Justification

N/A

References

Current Usage

N/A