THREDDS Workshop Technical Notes (6-8 May 2002)
last update: May 23, 2002
Comments to John Caron or the THREDDS mailgroup
Intro
-
Logistics (Linda)
-
Introductions
-
Goals (Ben)
-
Digital Library overview (Ben)
THREDDS Tech Overview (John) slides
Discussion:
-
Real-time vs Archived
-
What does real-time mean?
-
Real-time data can move into the archive category and appear multiple times.
-
QC issues
-
URL Permanence
-
What is a dataset?
-
Same datasets from real-time server and in archive
-
Same dataset, multiple locations, multiple representations, etc
-
Granularity
-
Descriptive metadata or Data Model?
Data Provider Tools
-
Overview (John)
-
ADDE Cataloger (John)
-
Catalog Generator (Ethan) slides
-
Dynamic Catalog Gen (Robb) slides
Discussion:
-
Identity of datasets
-
Catalog Server
-
What should the query language be?
-
What subsetting capabilities are needed?
-
Structure of catalog: data provider vs data user model
-
multiple models for users: for scientists, for educators, etc. (depending
on the user's role)
-
encapsulate multiple models in metadata or in client
-
hierarchy vs ontology (e.g., mass store needs one view, data provider needs
another, data user needs another, ...)
-
Annotation
-
Ontology building/Google indexing
-
XML is the representation for communication, separate from backend data storage
Data Client
-
IDV (Don)
-
THREDDS java library (John)
Discussion:
-
Suggestion: Command line client for THREDDS catalog access
-
Convention discussion:
-
COARDS, CF, ...
-
3rd Party Ancillary info: correct metadata in data file
-
Is the THREDDS protocol (XML, etc) a metadata representation of the data?
For discovery only or part of the data model? How does it fit in with correcting
the metadata in a data file?
Discovery Centers
-
Overview (Ben)
-
DLESE and NSDL (John Weatherley)
-
GCMD Connection (Robb)
Participant Presentations
-
Gene Major (GCMD)
-
GCMD overview
-
GCMD/NASA: 11,000 datasets indexed, uses DIF
-
4-tier parameter hierarchy (category - topic - term - variable) with a standardized
vocabulary
-
has HTTP and RMI API for programmatic access.
-
Benno Blumenthal (IRI/LDEO) - Ingrid Overview
-
Climate data Library
-
Both a web-based client and a DODS server
-
Data semantics at a “level above DODS”
-
Need: standard catalogs with variable granularity
-
Can use nested catalogs (issue: select collection as dataset?)
-
Also needs attributes in collection and dataset elements
-
Dan Holloway (DODS)
-
generate catalogs on the fly, need to be standardized
-
AIS: 3rd party metadata
-
Aggregation needed for InSitu / Ocean Station Data
-
scanner/crawler needs to be able to show other collection/dataset organizations
than local or DODS-dir directory
-
DODS "wizard" for selecting dataset - integrate THREDDS catalog.
-
WMS interface
-
John Weatherley (DLESE)
-
resources that go into their catalog should be documents.
-
keyword search available on metadata, site pages
-
focus on educational resource documents now; ready for direct data cataloging
in 2-4 years
-
uses OAI transfer protocol
-
Roland Schweitzer (CDC)
-
Climate data archives, DODS/GrADS, COARDS netCDF
-
LAS web client
-
MySQL database has metadata extracted from netCDF
-
Wants to Aggregate (AggServer or GDS) most data
-
Doesn't need CatGen tool, will spit it out from DB
-
How to organize?
-
Wants the ability to create a variety of standard metadata representations
(LAS XML, FGDC, THREDDS XML, etc) from his DB (catalog?) and the existing
metadata in datasets
-
Tie in with Don Denbo's MetaArchitect: reads netCDF, outputs FGDC
-
Luca Cinquini (ESG & UCAR DMWG)
-
Earth System Grid / GLOBUS = Grid computing, provides reliability, security,
resource management
-
DMWG will collaborate on catalog format, metadata standards
-
want to use same cataloging structure or at least be interoperable with
metadata
-
Joe Wielgosz (COLA/GrADS/GDS [& GMU?])
-
GDS is a DODS server implemented as a Java Servlet, with GrADS as back
end. Can handle netcdf, GRIB, HDF, other binary; working on BUFR, in-situ
data.
-
Has an internal Catalog class.
-
Has created a servlet framework, factoring out the GDS specific stuff.
-
concerned about profusion of THR datatypes.
-
GDS as THR client: use command line tool
-
ability to have dataset aliases, so you don't have to maintain info in multiple
places in the catalog
-
ability for one dataset to have multiple access methods (eg DODS &
FTP)
-
Phil Sharfstein (FNMOC & GODAE)
-
Realtime in-situ oceanographic data, ocean model products, etc
-
FNMOC, NoGAPS, COAMPS, NAVO satellite
-
Many diverse formats: GRIB, HDF-EOS
-
Access: DODS, FTP, HTTP
-
Uses LAS, GrADS
-
Need: standardized catalogs, browsing interface
-
Want consistent interface: browse and retrieve
-
Stefano Nativi (SINOTS)
-
Add THREDDS catalog as a data source
-
ISO 19115, FGDC metadata standards
-
Ted Habermann (NGDC)
-
global topography, ecosystems
-
have FGDC metadata in RDB
-
will use LAS/DODS to serve with DC metadata
-
how to get DCed (= Dublin Core with educ. extensions) from FGDC?
-
Have DODS Grids -> ArcIMS working (ArcGMS)
-
Advocates RDB technology
-
Blend data and metadata
-
ArcIMS - LAS connection
-
Chris Klaus (ARM)
-
Atmospheric data from the Atmospheric Radiation Measurement (ARM) Program
(DOE)
-
real-time visualization: Web client with IDL backend
-
DODS (AggServer?)
-
NSDL partner
-
Mark Laufersweller (CAPS/CRAFT, UofOK)
-
Level II data from 32 WSR-88D radars served via LDM
-
just starting to think about providing pull, what should we use?
-
Compression reduces data volume by a factor of 20 or more
-
What is best way to organize and catalog?
-
Tom Whittaker (SSEC)
-
GOES East and West Satellite data, using ADDE
-
Has a MODIS direct receiver; serves the data via DODS
-
Will put THREDDS Server at SSEC, close to ingest, using both DODS and ADDE
-
also author of educational applets, with Steve Ackerman
-
Harry Edmon (UofW)
-
MM5 data, I-90 Snoqualmie Pass data, ferry data
-
Lots of unique datasets, including MM5 regional model runs
-
No data servers yet, how do we get started?
-
Marty Landsfeld (Planet Earth Sci/New Media Studio)
-
Using Macromedia's Director (multimedia development tool) to develop educational
materials.
-
Using IDL for the data display backend.
-
The IDL/Director combo is available to NSDL
-
develop and share IDL code and educational modules.
-
Steve Hankin (PMEL, LAS/Ferret)
-
Described LAS as "traffic cop"; can use GrADS or Ferret as back end.
-
LAS has a metadata DB using XML for editing and interchange. Several sections
of metadata: dataset, UI, product, visualization.
-
Possibilities for THREDDS/LAS: Convert LAS XML config files into THREDDS
catalogs; LAS use THREDDS for config; return products pointed to by THREDDS
catalogs (i.e., use THREDDS to catalog LAS products)
-
Rob Raskin (NASA/JPL)
-
Involved in ESIP Federation: OpenGIS WMS/WCS Viewer; WSDL, UDDI [NOTE:
WMS for images; WCS for data]
-
Semantic Web for Earth & Environment Terminology (SWEET)
-
Prototyping effort to build ontologies to demonstrate software agents
-
Ontologies:
-
Earth Sci Topics (based on GCMD)
-
Data Description (based on ESML, XDF, and Udunits)
-
Events
-
Data Services
-
Using RDF and DAML for ontology [knowledge space] definition
-
Linus Kamb (IRIS)
-
seismology data, special binary file format “SEED”
-
Not sure about serving SEED files – DODS?
-
Not sure how to organize/present datasets
-
Should add FTP as access protocol
-
Ken Keiser (UAH, ESML)
-
Providing passive microwave datasets, using OpenGIS WMS/WCS, WSDL, UDDI,
and ESML
-
FTP, may need new data type
-
use ESML to allow readers to access files.
-
ESML describes data formats, it is not a data format.
-
Q: could you write a DODS server for ESML files?
-
Danny Briengar (NCDC, NOMADS)
-
Realtime and archived model data access
-
NCDC, NCEP, GFDL, et al
-
Archive vs Realtime: How to expose the difference between the two in a catalog?
-
Expose the transition of data moving from realtime access to an archive.
-
Expose the different access speeds (speed will also depend on system/network
load)
-
Some data doesn't get archived
-
Need:
-
GDS catalog convert to THREDDS catalog
-
Command line tools, for use by Ferret, GrADS
-
Extend metadata standards to add educational access
-
Transition from realtime to archive
-
Tape robot 20-minute delay problem
Metadata Overview (Stefano)
THREDDS is trying to integrate the workings of two communities: digital
libraries and systems for geographic information (not just GIS). The two
communities require different sets of metadata. How can THREDDS wed these
two metadata requirements:
-
Contain both sets of metadata
-
Extend one set to include the other
-
Develop higher level metadata model that captures both DL and GI metadata
Can we convert between OpenGIS Web Catalog/Services and THREDDS catalogs?
Breakout Groups
Pre-breakout discussion:
-
Review issues list
-
How to break up: along the client, provider, discovery boundaries; by the
connections between these three groups; randomly? Decision: random groups.
Breakout Group 5 report:
-
different clients have different requirements
-
requirements vs optional information
-
optional means possibility of missing info
Breakout Group 4 report:
-
URL permanence - research is required
-
SourceForge facilities on NSDL
-
Access method type: realtime vs archived; online vs delayed, etc. Should
this be represented in THREDDS or not?
Breakout Group 3 report:
-
More rather than less metadata
-
DC with edu extensions (discovery, not use, metadata) is not enough
-
FGDC use metadata not very well tested
-
3rd party metadata more trouble than it is worth
-
Registry
-
Edu target audience very desirable
-
dataType: What does it mean?
Breakout Group 2 report:
-
OpenGIS
-
Unified data model not first priority
-
Data provider tools top priority
-
Dataset Identification (unique) should be primary focus
-
Use metadata
-
Data discovery primary focus of THREDDS metadata
Breakout Group 1 report:
-
Current data access methods and their data models
-
netCDF/DODS - very abstract
-
ADDE/HDF-EOS - more specific (grid, point, swath, etc)
-
abstract can be used for more datasets but doesn't provide as much information
(need conventions); more specific requires more work to fit datasets in
(even extension of model) but once client gets data it knows how to do
more things with it
-
What does THREDDS "dataType" attribute mean?
-
Is it the beginnings of a data model?
-
An abstract representation of the data independent of the data representation
-
Just a way of typing the data
-
Should the dataType values be an ontology?
-
Should THREDDS metadata be restricted to discovery and description or should
it be a representation of a data model?
-
If discovery only, the client will lose a lot of information when crossing
the boundary between discovery and data access.
-
Multiple existing representations on the discovery side (DC, FGDC, etc), multiple
representations on the access side (DODS, ADDE, etc). Encompassing all of this
is a big task.
-
Developing a general data model would require lots of work
-
A THREDDS data model, especially if required(?), would cause difficulty
for clients that already build a data model from the information available
through the data access protocol.
-
Conclusion: Probably best to restrict THREDDS to the discovery and description
end of things. Hold off on any mapping between these two realms and the
data access realm.
NSDL Community portal demo - Chris Klaus
Next steps: participant 5-minute reports
-
Danny (NCDC, NOMADS):
-
Will use CatalogGen tool now
-
future, will work on serving archived data from a tape backend
-
future, desires a command line UI to catalog (GUI, batch)
-
Interested in adding edu metadata
-
THREDDS: keep focused
-
Ken (UAH, ESML):
-
Will catalog passive microwave datasets
-
Will investigate use metadata
-
Wants web service interface to THREDDS, e.g., query catalog
-
Linus Kamb (IRIS)
-
Will work on getting data into DODS (might be lossy)
-
What is data and what is metadata, where is the boundary?
-
Desired: Server type extensions, multiple access methods for same dataset
-
Rob Casey (IRIS)
-
Will work on?: tools for mapping data into THREDDS
-
Desired: clear description of THREDDS, examples
-
THREDDS should: stay focused for 1st stage: cataloging; discovery at many
different levels
-
Rob Raskin (NASA/JPL)
-
WCS/WMS, etc
-
Data model should be expandable: not tied to DODS and ADDE but capable
of capturing others, e.g., WCS
-
Steve Hankin (PMEL, LAS/Ferret)
-
PMEL will work on LAS/Ferret access to THREDDS
-
PMEL will look into ADDE to DODS gateway
-
THREDDS should, given LAS config, build THREDDS catalog
-
THREDDS should extend CatalogGen tool to crawl DODS servers
-
THREDDS and GCMD should explore dataset uniqueness question
-
THREDDS should develop command-line tools
-
Roland (NOAA/CDC)
-
Will work on building THREDDS catalog
-
Desired: THREDDS validator that checks if DODS link is working
-
Command line tool
-
Dan H. (DODS)
-
Working on DODS directory services, would like them to be interoperable
with THREDDS
-
Moving towards web services
-
THREDDS needs version control for catalog development
-
Need more discussion on Aggregation and granularity issues
-
Marty
-
Need use metadata: discovery center, THREDDS, DAP
-
Need ability to understand how to use any data
-
Integration of all data, including WMS, WFS, WCS, etc
-
Catalog of datasets, each dataset has multiple services/access methods
-
Tom W (SSEC)
-
Need catalog creator for ADDE
-
Will work on : ADDE access software able to use catalog rather than asking
ADDE server
-
Will work on: Edu applet access to THREDDS catalogs
-
Will work on: Viewer tool that integrates different data and generates products
-
Point data is lowest level data, all other data types can be devolved to
points
-
Mark (UofOK, CAPS/CRAFT)
-
Will work on serving and cataloging CRAFT data
-
Needs help (and docs) with serving and cataloging
-
Chris Klaus (Argonne, ARM)
-
Will help client developers with ARM data
-
Ted (NGDC)
-
THREDDS: careful not to build tools that aren't needed
-
Simple catalog vs lots of info in catalog
-
Science vs digital library
-
Wants: ability to produce FGDC, DC, OGC, THREDDS from each other and from
DB backend
-
Stefano
-
Will work on SINOTS integration with THREDDS
-
Wants: access to dataset w/o DAP
-
THREDDS should: aim for flexibility, openness, and extensibility; evaluate
standards for extensibility and openness
-
Benno
-
THREDDS server
-
Will work on moving Ingrid metadata into THREDDS
-
Wants: XML verifier that supports hierarchy of THREDDS catalogs
-
THREDDS client: wants clear standard and good set of examples
-
THREDDS catalog needs some flexibility in terms of adding attributes and
then elevating them into THREDDS standard
-
Gene (GCMD)
-
What level goes into GCMD?
-
Will work on THREDDS portal in GCMD
-
Dataset uniqueness problem
-
Harry Edmon
-
Need help: new to data provider role
-
Portal access: recommendations like Amazon ("Other people who bought this
book, also bought ...", reviews, etc)
-
John Weatherley (DLESE)
-
DL users: teacher(end-user)/material developers
-
What level is cataloged? What tools/clients/services can be used?
-
Discussion - Should THREDDS:
-
Focus on GCMD for direct data cataloging?
-
Work w/ edu material developers to get stuff into DLESE?
-
Future: allow for data directly in DLs?
-
Phil (FNMOC & GODAE)
-
Wants: Good cataloging; thin client; Non-DODS/ADDE DAP
-
Work with NVODS as cataloging partner ($)
-
Joe W (COLA/GrADS)
-
Can offer:
-
Support THREDDS catalogs from GDS
-
Think about how to use CatalogServer (GCMD?)
-
Want:
-
Command line UI tool
-
Way to represent that multiple datasets are actually same dataset (i.e.,
dataset link in catalog)
-
THREDDS focus:
-
Catalog model (not data model)
-
Should collection be a dataset
-
Wait on structured metadata: leave as unstructured, have metadata type?
[NOTE: What does this mean? That we should drop datasetDesc efforts?]
-
one dataset link to another, i.e. multiple references to same dataset
-
THREDDS Long term:
-
Structured metadata
-
namespace: FGDC, DC, etc
-
Other catalog services
-
Web services
Issues
-
Real-time vs Archived data (Danny)
-
What is a dataset?
-
Granularity.
-
Difference between dataset and collection.
-
Identity. Uniqueness. THREDDS and GCMD should explore dataset uniqueness
question
-
Catalog model
-
Should a collection be a dataset?
-
Allow a catalog to represent multiple models (views) of a single collection
-
Need ability to represent a single dataset that has multiple access methods
(e.g., DODS & FTP)
-
Need dataset links/aliases (datasetRef)
-
Allow attributes in more places
-
Data model vs Discovery/Description metadata only
-
Catalog Services
-
Query language
-
Subset
-
Tool: GUI that spits out query language representation of query just performed
(similar to a DODS URL builder)
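The query language was left open at the workshop. For comparison only, the DODS URL builder mentioned here turns a browse selection into a DODS constraint expression of the form var[start:stride:stop] (stop inclusive). A minimal Python sketch, assuming a single array variable; the function names are hypothetical:

```python
def hyperslab(start: int, stride: int, stop: int) -> str:
    """Format one dimension of a DODS constraint; stop is inclusive."""
    if start < 0 or stride < 1 or stop < start:
        raise ValueError(f"bad range {start}:{stride}:{stop}")
    return f"[{start}:{stride}:{stop}]"


def build_dods_url(base_url: str, variable: str,
                   ranges: list[tuple[int, int, int]]) -> str:
    """Turn a browse selection (one range per dimension) into a DODS data URL.

    The ".dods" suffix requests the binary data response for the constraint.
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{base_url}.dods?{constraint}"


# What a browse GUI might "spit out" after the user picks a subset:
print(build_dods_url("http://example.org/dods/sst.nc", "sst", [(0, 1, 11), (10, 2, 20)]))
# prints http://example.org/dods/sst.nc.dods?sst[0:1:11][10:2:20]
```

A GUI built this way leaves behind a plain string that a batch script can replay, which is the point of the proposed tool.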
Summary of Topics
Catalogs
-
Types: 1) Dynamically generated, 2) real-time (need poll/notify), 3) static
-
Allow multiple server/access types per dataset (then the dataset URL will need
to be associated with the server access element).
-
Add “alias” element, so datasets can appear multiple times without maintaining
other info in 2 places
-
Add FTP, WSDL, OpenGIS server types
-
Additional information about datasets: time to expire, when new data will
be available, when dataset will be archived
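The access-type and alias items above amount to two rules: a URL belongs to an access element rather than to the dataset, and an alias points at a dataset instead of copying its metadata. A minimal in-memory Python model of those rules; the class and field names are hypothetical and this is not the THREDDS catalog schema:

```python
from dataclasses import dataclass, field


@dataclass
class Access:
    """One way to reach a dataset; the URL lives here, not on the dataset."""
    service_type: str  # e.g. "DODS", "ADDE", "FTP"
    url: str


@dataclass
class Dataset:
    dataset_id: str
    name: str
    accesses: list[Access] = field(default_factory=list)


@dataclass
class Catalog:
    datasets: dict[str, Dataset] = field(default_factory=dict)
    # alias -> dataset_id; an alias carries no metadata of its own
    aliases: dict[str, str] = field(default_factory=dict)

    def add(self, ds: Dataset) -> None:
        self.datasets[ds.dataset_id] = ds

    def add_alias(self, alias: str, dataset_id: str) -> None:
        if dataset_id not in self.datasets:
            raise KeyError(f"alias {alias!r} points at unknown dataset {dataset_id!r}")
        self.aliases[alias] = dataset_id

    def resolve(self, name: str) -> Dataset:
        """Accept either a dataset id or an alias."""
        return self.datasets[self.aliases.get(name, name)]

    def urls_for(self, name: str, service_type: str) -> list[str]:
        return [a.url for a in self.resolve(name).accesses if a.service_type == service_type]


cat = Catalog()
cat.add(Dataset("eta-2002050612", "Eta 12Z run", [
    Access("DODS", "http://example.org/dods/eta/2002050612.nc"),
    Access("FTP", "ftp://example.org/pub/eta/2002050612.nc"),
]))
# The same run listed again under a "latest" collection, without duplicating its metadata.
cat.add_alias("realtime/eta-latest", "eta-2002050612")
print(cat.urls_for("realtime/eta-latest", "FTP"))
# prints ['ftp://example.org/pub/eta/2002050612.nc']
```

Because metadata lives only on the dataset, a real-time entry that later moves into the archive can be re-pointed by changing one alias.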
Catalog Server
-
Provide search and gateways to Discovery Centers
-
Needs a database
-
Use SOAP/WSDL?
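"Needs a database" and, later, "metadata subsetting == db query" suggest the catalog server answers searches with SQL. A minimal sketch using Python's built-in sqlite3 with an in-memory database; the table layout and column names are hypothetical:

```python
import sqlite3

SCHEMA = """
CREATE TABLE datasets (
    dataset_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    keyword    TEXT NOT NULL,  -- e.g. a GCMD-style science keyword
    start_time TEXT NOT NULL,  -- ISO 8601, so string order matches time order
    end_time   TEXT NOT NULL,
    url        TEXT NOT NULL   -- where the catalog entry can be fetched
)
"""


def open_catalog_db(rows):
    """Create an in-memory metadata table and load (id, name, keyword, start, end, url) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def search(conn, keyword, start, end):
    """Datasets with the keyword whose time range overlaps [start, end]."""
    cur = conn.execute(
        "SELECT dataset_id, url FROM datasets "
        "WHERE keyword = ? AND start_time <= ? AND end_time >= ? "
        "ORDER BY start_time",
        (keyword, end, start),
    )
    return cur.fetchall()
```

Returning catalog URLs rather than data URLs keeps the server on the discovery side, in line with the Breakout Group 1 conclusion.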
Client development
-
EDMI/Planet Earth uses IDL, DODS enabled
-
Need C library eventually (could be just catalog access for now)
-
Command line tool: GrADS, Ferret, DODS import wizard
-
Clients want more metadata
-
IDV/Metapps high-functioning THREDDS client
-
THREDDS Data Viewer prototype THREDDS client.
Command Line interface tool
-
GrADS, Ferret, DODS import wizard
-
Batch or GUI, lots of options
-
Plain text or XML output
-
Single or multiple URL list
-
Capture script after finding with a GUI, run in batch
-
Audit the network usage, so script runners know if they are hammering a server
-
Use for auto debugging and testing
-
“find most recent data”
-
batch process multiple datasets, without having to hand specify each one.
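Several of the items above (plain-text output, URL lists, “find most recent data”, batch use) fit in a small command line tool. A minimal Python sketch that reads a local catalog file and prints URLs one per line; the element and attribute names (dataset, url, time) are hypothetical, and XML output and network auditing are left out:

```python
import argparse
import xml.etree.ElementTree as ET


def read_datasets(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (name, url, time) for every <dataset> that carries a URL.

    Element/attribute names are hypothetical. Collections without a URL are
    walked into (iter() is recursive) but not listed themselves.
    """
    root = ET.fromstring(xml_text)
    found = []
    for ds in root.iter("dataset"):
        url = ds.get("url")
        if url:
            found.append((ds.get("name", ""), url, ds.get("time", "")))
    return found


def most_recent(datasets):
    """'Find most recent data': the latest time stamp wins.

    Assumes all stamps are ISO 8601 in one format and time zone, so plain
    string comparison orders them correctly.
    """
    dated = [d for d in datasets if d[2]]
    return max(dated, key=lambda d: d[2]) if dated else None


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="List dataset URLs from a catalog file.")
    parser.add_argument("catalog", help="path to a catalog XML file")
    parser.add_argument("--latest", action="store_true",
                        help="print only the most recent dataset")
    args = parser.parse_args(argv)
    with open(args.catalog, encoding="utf-8") as f:
        datasets = read_datasets(f.read())
    if args.latest:
        newest = most_recent(datasets)
        datasets = [newest] if newest else []
    for _name, url, _time in datasets:
        print(url)  # plain text, one URL per line, easy to feed to GrADS/Ferret scripts
    return 0
```

Wired to a script entry point, `main(["eta.xml", "--latest"])` would print a single URL that a GrADS or Ferret batch job could open; a GUI chooser could emit the same command line so a browse can be rerun in batch.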
Dataset Identity / Definition / Granularity
-
URL permanence
-
Dataset in multiple locations. What if it's modified?
-
Can URL change? Can data that URL refers to change?
-
Real-time data moving into the archive needs a seamless transition.
-
Expiration date: “time-to-live”
-
Use LDM headers to create unique dataset ID.
-
Granularity most important for client, not server.
-
Tape robot has 20 min delay – should this be exposed to clients?
-
Response time: online, jukebox, tape archive, offline
-
Unique IDs are a nightmare
-
Could same URN have different DatasetDesc metadata?
-
A THR-assigned ID implies the dataset info was reviewed or approved
-
URN related to FGDC lineage field
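Two of the items above, LDM headers as a source of unique IDs and a “time-to-live” expiration date, can be sketched together. A minimal Python sketch under a hypothetical scheme: hash the LDM feed type plus product identifier so the same product keeps the same ID on a real-time server and later in the archive. The feed type and identifier strings in the example are for illustration only, and the scheme is only as unique as the identifiers (they must, for instance, include a time stamp):

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Optional


def dataset_id(feedtype: str, product_ident: str) -> str:
    """Hypothetical ID scheme: hash of normalized feed type + product identifier.

    Identical headers (up to case of the feed type and whitespace) give
    identical IDs, so the ID survives a move from real-time to archive.
    """
    normalized = f"{feedtype.strip().upper()}|{' '.join(product_ident.split())}"
    return "thr:" + hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:16]


def is_expired(created: datetime, ttl: timedelta, now: Optional[datetime] = None) -> bool:
    """True once the dataset's time-to-live has run out."""
    now = now or datetime.now(timezone.utc)
    return now >= created + ttl


ident = "L2-BZIP2/KTLX/20020506120000"  # illustrative identifier, not a real LDM header
print(dataset_id("NEXRAD2", ident))
```

A hash gives stable, fixed-length IDs without a central registry; whether THREDDS or GCMD should instead assign reviewed IDs is the open question noted above.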
Data Model
-
OPeNDAP (DODS) could be THR data model
-
Concern that THR is duplicating dataset subsetting capabilities of protocol.
-
THR should be able to provide different models for different users; data
provider vs. client, scientist vs DL user.
-
Concern if there is a profusion of data types.
-
IRIS may need special type: station time series
Discovery Centers
-
GCMD/NASA: 11,000 datasets indexed, uses DIF; 4 tier parameter hierarchy;
has HTTP/RMI API for programmatic access
-
DLESE: focus on educational resource documents now; ready for direct data
cataloging in 2-4 years; uses OAI transfer protocol.
-
ESIP Federation has search tool based on Z39.??
-
THR should use the search capabilities of Discovery Centers, which return XML catalogs.
-
NSDL using ADEPT
-
Idea: Use search capabilities of Discovery Centers, return XML catalog,
app uses to populate widgets.
DODS
-
Ingrid creates a data abstraction a level above DODS
-
Dodsdir, Sitedir can use THR Catalog.
-
AIS – same idea as 3rd-party metadata
-
AggServer needs extensions to handle InSitu Ocean data.
-
Can you configure LAS or GrADS from THR Catalog?
-
Don Denbo’s “meta architect” is an automatic metadata harvester.
Educational resources
-
Embed data in documents
-
Data mining for “events”
-
NSDL annotation service
GIS
-
OpenGIS WMS – maps/gifs
-
OpenGIS WCS – data : image, grids
-
OpenGIS WFS – features (polygons)
-
“THR should get involved in OpenGIS standards”
-
they are creating specific data types (e.g. adding time dimension), but
not in a general way
-
we need a gazetteer service that translates place names to locations (ADL has this)
-
AMS has added a GIS track next meeting
-
Many scientific communities (e.g. data assimilation) rarely use GIS, but
it is used in economic impact work and is useful for thin clients
-
PostGres has a spatial DB
GrADS
-
Works on gridded data: GRIB, HDF, netCDF, other binary. Working on BUFR,
in-situ; controlled access via IP
-
Working on factoring out GDS-specific code from the servlet framework
-
In Situ model: seq, seq, levels, reports
-
Need a command line interface to THR, batch and GUI
LAS
-
Is a “product server” = service, ferret back end
-
Could be generic thin client interface to THR data
-
LAS protocol
-
“Virtual site” : share metadata between sites
-
Use a THR catalog to create a LAS configuration.
-
Use a LAS config, build a THR catalog.
-
Should LAS be a server protocol?
Metadata
Metadata Standards
-
Extend Standards (like FGDC) to add educational access.
-
Provide uniform metadata extraction to be sent to DL
-
Advantages of pulling “all” metadata into THREDDS: 1) some clients just want
metadata and don’t need to know the protocol; 2) it can be amended
-
ESML describes data, is not a data format itself: ascii, binary, HDF-EOS
-
SWEET/NASA – semantic web for earth and environment terminology
-
GCMD keywords, services
-
Events (from where?)
-
Data desc: ESML, XDF, udunits
-
RDF, DAML
-
GCMD DIF will use ISO 19115/TC211 when ready
-
What is the relationship between THREDDS standard quantities and existing
ones?
-
Investigate SQL extensions in Xalan for dbms metadata backends.
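“Provide uniform metadata extraction” and producing several standard representations from one source both come down to crosswalk tables. A minimal Python sketch of one crosswalk, from a flat record to Dublin Core elements in the DC 1.1 namespace; the source field names and the wrapper element are hypothetical, and FGDC or DIF output would each need their own table and nesting:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical internal field names -> Dublin Core element names.
CROSSWALK = {
    "title": "title",
    "originator": "creator",
    "abstract": "description",
    "keywords": "subject",
    "pub_date": "date",
    "access_url": "identifier",
}


def to_dublin_core(record: dict) -> str:
    """Serialize the mapped fields of record as DC elements; unmapped fields are dropped."""
    root = ET.Element("metadata")
    for field, dc_name in CROSSWALK.items():
        value = record.get(field)
        if value is None:
            continue
        # Repeatable fields (e.g. keywords) become repeated DC elements.
        for v in value if isinstance(value, list) else [value]:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = str(v)
    return ET.tostring(root, encoding="unicode")


print(to_dublin_core({"title": "Global SST", "originator": "NOAA/CDC",
                      "keywords": ["OCEANS", "SEA SURFACE TEMPERATURE"]}))
```

Keeping the mapping as data rather than code makes it easy to add a second table for another target standard; the educational extensions discussed above would be further entries.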
Search and Discovery Features
-
Knowledge-directed searching
-
Google-like indexing
-
Metadata subsetting == db query
-
Capabilities for dbms mining for related queries
-
Include client programs with metadata
-
Crosswalks to THREDDS metadata
-
Discovery Centers with multiple entry points
-
Use cases needed for user methods
-
Investigate GCMD API
-
Consider Thesaurus directory structure
Server Types
-
Add FTP, WSDL, WMS, WCS, WFS.
If used only for discovery, the list of server types doesn’t have to be controlled.
Desired by Participants
-
The ability to create a variety of standard metadata representations
(LAS XML, FGDC, THREDDS XML, etc) from existing DB and the existing metadata
in datasets
-
Catalog Server Query tool: a GUI that spits out the query that a browse
just performed
-
Consistent interface: browse and retrieve
-
Web service interface to THREDDS, e.g., query catalog
-
THREDDS validator that checks if DODS link is working
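A link check for DODS URLs in a catalog can be sketched by requesting each URL with the .dds suffix, which a DODS server answers with the dataset's Dataset Descriptor Structure. A minimal Python sketch using only the standard library; the function names are hypothetical, and the fetch function is passed in so the checker can be tested without a network:

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET url and return the HTTP status; urllib raises HTTPError for 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def check_links(urls, fetch_status=http_status):
    """Map each DODS dataset URL to 'ok' or a short error description."""
    results = {}
    for url in urls:
        try:
            status = fetch_status(url + ".dds")  # the DDS is small and cheap to fetch
            results[url] = "ok" if 200 <= status < 300 else f"HTTP {status}"
        except urllib.error.HTTPError as e:  # server answered with an error code
            results[url] = f"HTTP {e.code}"
        except (urllib.error.URLError, OSError) as e:  # DNS failure, refused, timeout
            results[url] = f"error: {e}"
    return results
```

In a validator the URL list would come from walking the catalog; running the check periodically would also flag the URL-permanence problems raised earlier.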
Action Items
-
Review catalog format (all)
-
xlinks to more metadata: GCMD reference, FGDC, DC, Netcdf/DODS/ ADDE, DatasetDesc
-
add attributes
-
collection can be selected?
-
Multiple views of the same dataset
-
Synch with Dan, Benno, Steve, Joe, Luca
-
GCMD (robb)
-
Investigate HTTP API
-
Add URLtype = “XML Catalog”
-
Get GCMD keywords into DD as SQ
-
Put URL into the “related URL” field, typed as “THR Catalog”
-
ADEPT/ADL (john)
-
MetaArchitect from Don Denbo (ethan)
-
DODS import wizard – make THR choosing component. (Dan H/john)
-
Extend Validator tool, make into servlet using Client API (robb).
-
Add converter (robb)
-
THREDDS catalog link checker in client library (robb/john)
-
Simple version of crawler (ethan)
-
Other tools TBD
-
Command line tool.
-
stand alone catalog browser and chooser, send URL to stdout.
-
Could also provide HTTP server.
-
Make work with Ferret and GrADS as first clients.
-
AggServer (john)
-
Put config into catalog
-
Synch with JoeW
-
Use catalog name in CatalogDatasetChooser instead of URL; put at top of
tree (john)
-
JDOM vs JAXB, timing & flexibility (john)
-
Catalog Generator/Scanner (ethan)
-
Add <netcdf attribute> directive in XSLT
-
Test XSLT with large dataset generation
-
To transform LAS config into THREDDS catalog
-
To crawl DODS servers and DODS file servers
-
To access DODS datasets and netCDF datasets to extract metadata
-
Ethan - Look at Wiki version control
-
Ethan - Look at Xalan SQL extensions for CatGen with DB backend
-
Robb - Validator and link checker, support catalogRef
-
Robb - Investigate SQL extensions in Xalan for dbms metadata backends
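The “simple version of crawler” item can be sketched as a directory walk that emits a nested catalog: one collection per directory and one dataset with a URL per data file. A minimal Python sketch; the element names, the url attribute, and the mapping from file path to URL under a base URL are hypothetical, not the CatalogGen design:

```python
import os
import xml.etree.ElementTree as ET


def build_catalog(root_dir, base_url, suffixes=(".nc",)):
    """Walk root_dir and return a catalog Element mirroring its directory tree.

    Directories become nested <dataset name=...> collections; files ending in
    one of suffixes become <dataset name=... url=...> entries.
    """
    top = os.path.abspath(root_dir)
    catalog = ET.Element("catalog", name=os.path.basename(top))
    nodes = {top: catalog}
    for dirpath, dirnames, filenames in os.walk(top):  # top-down: parents first
        dirnames.sort()  # deterministic order; also controls walk order
        parent = nodes[dirpath]
        for d in dirnames:
            nodes[os.path.join(dirpath, d)] = ET.SubElement(parent, "dataset", name=d)
        for f in sorted(filenames):
            if f.endswith(suffixes):
                rel = os.path.relpath(os.path.join(dirpath, f), top)
                url = base_url.rstrip("/") + "/" + rel.replace(os.sep, "/")
                ET.SubElement(parent, "dataset", name=f, url=url)
    return catalog
```

`ET.tostring(build_catalog("/data/eta", "http://example.org/dods/eta"), encoding="unicode")` would give the XML text; crawling a remote DODS file server would replace os.walk with parsing its directory pages.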