With regard to the WODB, users want to be able to search not just by physical coordinates, such as a lat-lon bounding box, but also by other attributes such as institution, project, and platform, which are usually not represented as coordinate variables. Grouping the data by physical coordinates might therefore cause performance problems when searching on these other criteria. Would it be possible to have some additional structure, something like a database index, but without a full-fledged DBMS? Does TDS allow such structures?
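One way to picture "a database index without a DBMS" is an inverted (secondary) index. For each attribute, it maps every distinct value to the obs indices that carry that value, and it can be kept in a small sidecar file next to the data. The following minimal Python sketch assumes the attributes are stored per observation, parallel to the obs dimension. The function names and sample values are hypothetical, and nothing here is a TDS feature.

```python
import json
from collections import defaultdict

def build_index(values):
    """Map each distinct attribute value to the sorted obs indices that carry it."""
    index = defaultdict(list)
    for i, v in enumerate(values):
        index[v].append(i)  # indices are appended in increasing order
    return dict(index)

def query(indexes, **criteria):
    """Return obs indices matching every field=value criterion (logical AND)."""
    result = None
    for field, value in criteria.items():
        hits = set(indexes[field].get(value, []))
        result = hits if result is None else result & hits
    return sorted(result) if result is not None else []

# Hypothetical per-observation attributes, parallel to the obs dimension.
institution = ["NOAA", "JMA", "NOAA", "NODC", "NOAA"]
platform = ["ship", "buoy", "buoy", "ship", "ship"]

indexes = {
    "institution": build_index(institution),
    "platform": build_index(platform),
}

# The index is small and can be persisted as a sidecar file next to the data.
sidecar = json.dumps(indexes)

print(query(indexes, institution="NOAA", platform="ship"))  # [0, 4]
```

A query never touches the observation data itself. It only intersects short lists of indices, and the matching observations can then be read directly by index.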


Since the UAF meeting in Seattle I have been giving some thought to how to serve some large, important datasets, such as the raw ICOADS observations or the WODB observations. Reading over the PointObservation Conventions proposal on the CF site, I found that while it makes clear how I might put data into a netCDF file, it doesn't make clear what the interplay would be with a service in TDS, or how such a service might be affected by a very large dataset with no further structure.

So it seems pretty clear that ICOADS would be represented as points. From the example in the proposal:

netcdf point_example {
dimensions:
  obs = 1234 ;

variables:
  double time(obs) ;
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
  float lon(obs) ;
    lon:long_name = "longitude of the observation" ;
    lon:units = "degrees_east" ;
  float lat(obs) ;
    lat:long_name = "latitude of the observation" ;
    lat:units = "degrees_north" ;
  float alt(obs) ;
    alt:long_name = "vertical distance above the surface" ;
    alt:standard_name = "height" ;
    alt:units = "m" ;
    alt:positive = "up" ;
    alt:axis = "Z" ;

  float humidity(obs) ;
    humidity:long_name = "specific humidity" ;
    humidity:coordinates = "time lat lon alt" ;
  float temp(obs) ;
    temp:long_name = "temperature" ;
    temp:units = "Celsius" ;
    temp:coordinates = "time lat lon alt" ;

// global attributes:
  :CF\:featureType = "point" ;
}

Now, I am assuming that a TDS implementation of the service will let me select on the coordinate variables; is that correct? Even so, for something like ICOADS the obs dimension is very large, and such an extraction could be quite slow unless either there is additional structure or TDS pre-fetches the coordinate variables, much as the present Dapper server does.
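Without extra structure, selecting on the coordinate variables amounts to a linear scan. The following minimal Python sketch assumes the coordinate variables have already been read ("pre-fetched") into memory as plain lists. Every query tests every observation, so the cost grows with the size of obs no matter how small the requested region is. The `BBox` type and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BBox:
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    t_min: float  # days since 1970-01-01, matching the time units in the CDL
    t_max: float

def select(time, lat, lon, box):
    """Linear scan: test every observation against the box; O(len(obs)) per query."""
    return [
        i
        for i, (t, y, x) in enumerate(zip(time, lat, lon))
        if box.t_min <= t <= box.t_max
        and box.lat_min <= y <= box.lat_max
        and box.lon_min <= x <= box.lon_max
    ]

# Coordinate variables "pre-fetched" into memory (tiny hypothetical sample).
time = [0.0, 1.5, 3.0, 4.5]
lat = [10.0, 45.0, 47.0, -20.0]
lon = [-150.0, -125.0, -124.0, 30.0]

box = BBox(40.0, 50.0, -130.0, -120.0, 0.0, 5.0)
print(select(time, lat, lon, box))  # [1, 2]
```

Pre-fetching avoids rereading the coordinates from disk for each request, but it does not change the per-query scan cost. The sketch also ignores boxes that cross the dateline.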

Another option would be to have, say, a file for each 10-degree block, and then have TDS aggregate over the files; would this be possible? The search would then be a lot faster when people want time series in a region, as opposed to more synoptic extractions. Would the TDS service support such an option? Alternatively, since netCDF-4 supports groups, the data could be organized as 10-degree groups with 2-degree subgroups. That would work as far as netCDF-4 is concerned, but it is not the same as TDS knowing what to do with the hierarchy or taking advantage of the structure.
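The blocking idea comes down to deriving a file (or group) name from each observation's position. A bounding-box query then only needs to open the blocks it overlaps. The following minimal Python sketch uses a hypothetical `latNN_lonMMM` naming scheme and hypothetical function names. Whether TDS could aggregate or exploit such a layout is exactly the open question above.

```python
import math

def block_origin(lat, lon, size):
    """South-west corner (integer degrees) of the size-degree block containing (lat, lon)."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)

def block_name(lat, lon, size=10):
    """Hypothetical file or group name for a block, e.g. 'lat40_lon-130'."""
    la, lo = block_origin(lat, lon, size)
    return f"lat{la}_lon{lo}"

def group_path(lat, lon):
    """Hypothetical netCDF-4 group path: 10-degree group / 2-degree subgroup."""
    return f"{block_name(lat, lon, 10)}/{block_name(lat, lon, 2)}"

def blocks_overlapping(lat_min, lat_max, lon_min, lon_max, size=10):
    """Names of every block a bounding box touches; only these need to be opened."""
    la0, lo0 = block_origin(lat_min, lon_min, size)
    la1, lo1 = block_origin(lat_max, lon_max, size)
    return [
        f"lat{la}_lon{lo}"
        for la in range(la0, la1 + 1, size)
        for lo in range(lo0, lo1 + 1, size)
    ]

print(block_name(45.0, -125.0))                # lat40_lon-130
print(group_path(45.0, -125.0))                # lat40_lon-130/lat44_lon-126
print(blocks_overlapping(42, 48, -128, -122))  # ['lat40_lon-130']
```

The overlap test is conservative. A box whose edge lies exactly on a block boundary also pulls in the neighbouring block, and boxes crossing the dateline are not handled.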

My questions for profiles (that is, for the WODB) are pretty much the same. I assume that the TDS service will be able to search on the coordinate variables; is that correct? I also have the same concern: the profile dimension will get quite large, and without further structure the search could be very slow. Adding the same types of structure mentioned above would provide possible solutions, but only if TDS, as opposed to netCDF-4, supported them.
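For profiles, one possible structure keeps the profile-level coordinates once per profile. The observations are stored contiguously, together with a count of observations per profile. This layout is an assumption here, not something the proposal or TDS is confirmed to provide. A search then scans only the much smaller per-profile arrays and expands each match into a slice of the obs dimension. The following is a minimal Python sketch; `row_size` and the sample values are hypothetical.

```python
from itertools import accumulate

def profile_slices(row_size):
    """(start, stop) indices into the obs dimension for each contiguously stored profile."""
    stops = list(accumulate(row_size))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))

def find_profiles(lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Scan only the per-profile coordinates, which are far fewer than obs."""
    return [
        p
        for p, (y, x) in enumerate(zip(lat, lon))
        if lat_min <= y <= lat_max and lon_min <= x <= lon_max
    ]

# Hypothetical sample: 3 profiles with 4, 2 and 3 levels (9 obs in total).
profile_lat = [45.0, 10.0, 47.5]
profile_lon = [-125.0, 160.0, -128.0]
row_size = [4, 2, 3]
temp = [12.1, 11.0, 9.8, 8.7, 28.0, 27.5, 13.0, 11.9, 10.2]  # obs dimension

slices = profile_slices(row_size)
for p in find_profiles(profile_lat, profile_lon, 40, 50, -130, -120):
    start, stop = slices[p]
    print(p, temp[start:stop])
```

This still scans linearly, but over profiles rather than individual observations. The block or index structures sketched for points could be layered on top of the profile-level arrays in the same way.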

As you may have guessed, these are not theoretical questions: I would really like to see ICOADS and WODB served as part of the year 2 UAF effort. So now is a good time to start thinking about how to do it correctly and what the service will be able to do.




