Re: [thredds] WMS and F-TDS virtual datasets [was: Re: aggregation / arithmetic between variables]

Hi Roland,

Yes, I agree, semantics in the ID is not a great solution. An API would
be nice. Though I'm a bit worried that the set of relevant questions
will be difficult to decide on, difficult for clients to make use of,
and may be difficult for servers to answer.

Given that, I wonder about a more abstract API that gives hints on the
relationship between performance and data access patterns. For instance,
it would be great if we could abstract and extend the semantics that
netCDF-3 captures in the order of a variable's dimensions (and whether
one of the dimensions is unlimited), extending it to cover both index
space and coordinate space and to encapsulate aggregation, over-the-wire
protocols, compression and chunking.
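
To make the netCDF-3 part of that concrete, here is a minimal Java
sketch (class and method names are hypothetical, not netCDF-Java API).
It assumes a plain row-major layout, where the last dimension varies
fastest, and reports the element stride of each dimension; a larger
stride suggests slower access along that dimension. It deliberately
ignores record-variable interleaving, chunking, compression and remote
access, which are exactly the parts a real hints API would have to
cover.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of netCDF-Java or the TDS.
class AccessHints {

    /**
     * For a row-major variable (last dimension varies fastest), return
     * the element stride of each dimension: how many elements apart two
     * values are when only that dimension's index changes by one.
     */
    public static Map<String, Long> strides(String[] dimNames, int[] shape) {
        long[] s = new long[shape.length];
        long stride = 1;
        // Walk from the fastest-varying (last) dimension to the slowest.
        for (int i = shape.length - 1; i >= 0; i--) {
            s[i] = stride;
            stride *= shape[i];
        }
        // Keep the declared dimension order in the result.
        Map<String, Long> out = new LinkedHashMap<>();
        for (int i = 0; i < shape.length; i++) {
            out.put(dimNames[i], s[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> s = strides(
                new String[] {"time", "lat", "lon"},
                new int[] {10, 180, 360});
        System.out.println(s);
    }
}
```

For a (time, lat, lon) variable of shape 10 x 180 x 360 this prints
{time=64800, lat=360, lon=1}: reading along lon is contiguous, while
stepping through time jumps a whole lat/lon grid each step.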

OK. Now that I've written it down, that sounds much harder than starting
to add some relevant questions to the API. Also, though a more abstract
API might be easier for a client to interpret, I'm not sure it would be
any easier for a server to implement.
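
For comparison, the question-style API could start out as small as the
sketch below. Everything in it is hypothetical (the interface, the
Answer and Strategy enums, and the two strategy names are placeholders,
not ncWMS or netCDF-Java code), and it models only two of Roland's
questions. The UNKNOWN answer is there because a server may not be able
to answer; in that case the client stays conservative, in the spirit of
what Jon describes ncWMS doing for an unrecognized fileTypeId.

```java
// Hypothetical illustration only; not part of netCDF-Java, TDS or ncWMS.
class SourceQuestions {

    /** A server may not know the answer, so allow UNKNOWN. */
    enum Answer { YES, NO, UNKNOWN }

    /** Placeholder names for two client-side read strategies. */
    enum Strategy { FEW_LARGE_READS, MANY_SMALL_READS }

    /** The "relevant questions" a client might ask of a data source. */
    interface DataSourceInfo {
        Answer isCompressed();
        Answer isRemote();   // e.g. arriving over the wire via OPeNDAP
    }

    /**
     * Treat small reads as cheap only when the source positively says
     * it is both uncompressed and local; otherwise be conservative.
     */
    public static Strategy choose(DataSourceInfo info) {
        boolean cheap = info.isCompressed() == Answer.NO
                && info.isRemote() == Answer.NO;
        return cheap ? Strategy.MANY_SMALL_READS : Strategy.FEW_LARGE_READS;
    }

    /** Convenience factory for fixed answers. */
    public static DataSourceInfo of(final Answer compressed, final Answer remote) {
        return new DataSourceInfo() {
            @Override public Answer isCompressed() { return compressed; }
            @Override public Answer isRemote() { return remote; }
        };
    }

    public static void main(String[] args) {
        System.out.println(choose(of(Answer.NO, Answer.NO)));
        System.out.println(choose(of(Answer.UNKNOWN, Answer.NO)));
    }
}
```

The first call prints MANY_SMALL_READS (local, uncompressed); the second
prints FEW_LARGE_READS because the compression answer is unknown.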

Ethan

On 4/29/2011 1:01 PM, Roland Schweitzer wrote:
> Ethan,
> 
> I guess we're thinking along the same lines, because after reading Jon's
> email my first thought was I would return a string like "F-TDS - OPeNDAP
> - netCDF".  This would say a lot to me, in that I would know that the
> data flowing out might be the result of some sort of server side
> calculation, delivered via OPeNDAP and originally read from a netCDF
> file.  But, without some sort of agreed upon convention then it's just a
> String.
> 
> I think loading up these IDs with a bunch of semantics seems kind of
> like a bad idea.  An invitation for trouble and confusion.
> 
> Don't we really want an API where we can ask the relevant questions like
> Are you compressed? Are you coming over the wire as OPeNDAP?  What is
> your underlying storage?
> 
> Roland
> 
> On 04/29/2011 11:20 AM, Ethan Davis wrote:
>> Hi Roland,
>>
>> The fileTypeId wasn't really designed with virtual datasets in mind or
>> with any particular semantics in mind. They are similar to using
>> software version numbers to indicate capabilities: you just have to know
>> what they mean.
>>
>> Since virtual datasets may change the characteristics expected from a
>> dataset with a given fileTypeId, perhaps we should extend fileTypeIds to
>> allow for multi-layer names. Maybe something like "ncAggregation -
>> GRIB2".
>>
>> Ethan
>>
>> On 4/29/2011 7:41 AM, Roland Schweitzer wrote:
>>> Hi Jon,
>>>
>>> Thanks for the code link and information.  Unfortunately I'm still
>>> confused.
>>>
>>> It seems that you want to distinguish between reading local data files
>>> and OPeNDAP data sources, and between uncompressed and compressed
>>> local files.  Makes sense to want to know this and to code
>>> accordingly.   However, Ethan said that TDS promotes the underlying
>>> FileTypeId from the data files when returning the FileTypeId for a TDS
>>> aggregation. And you suggest I do the same for my virtual data sets.  It
>>> seems to me that this will give you exactly the wrong information for
>>> how you want to classify the data.  An F-TDS data source is by
>>> definition an OPeNDAP data source, but the data type of the underlying
>>> data will most of the time be a local netCDF file or type netCDF.
>>>
>>> So if I follow the suggestion and return the underlying FileTypeId,
>>> won't you get the wrong optimization by looking at the FileTypeId?
>>>
>>> Roland
>>>
>>> On 04/29/2011 03:45 AM, Jon Blower wrote:
>>>> Hi Roland,
>>>>
>>>> The fileTypeId is used by ncWMS to decide on what algorithm to use to
>>>> extract data.  Compressed data (e.g. NetCDF4) and data read over
>>>> OPeNDAP have very different performance characteristics to
>>>> uncompressed, local data (e.g. NetCDF3, HDF4).
>>>>
>>>> See the code here:
>>>> http://www.resc.rdg.ac.uk/trac/ncWMS/browser/trunk/src/java/uk/ac/rdg/resc/edal/cdm/CdmUtils.java#L269
>>>>
>>>>
>>>>
>>>> So I guess the fileTypeId of your virtual dataset should match the
>>>> underlying file type.  If this isn't easy, then from the WMS point of
>>>> view you can put any old string as the fileTypeId and the WMS will be
>>>> conservative and won't assume that data-reading is "cheap".
>>>>
>>>> This is an example of the adage "all abstractions are leaky"...
>>>> performance concerns are notorious for messing up nice clean
>>>> abstractions.
>>>>
>>>> HTH,
>>>> Jon
>>>>
>>>> ----------------------------------------------------------------------
>>>>
>>>> Message: 1
>>>> Date: Wed, 27 Apr 2011 17:09:13 -0500
>>>> From: Roland Schweitzer<Roland.Schweitzer@xxxxxxxx>
>>>> To: thredds@xxxxxxxxxxxxxxxx
>>>> Subject: Re: [thredds] WMS and F-TDS virtual datasets [was: Re:
>>>>      aggregation / arithmetic between variables]
>>>> Message-ID:<4DB89409.5000701@xxxxxxxx>
>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>
>>>> Jon and Ethan,
>>>>
>>>> Help me understand the best way forward with these FileTypeId in the
>>>> IOSP.  Questions below...
>>>>
>>>> On 04/15/2011 03:37 PM, Ethan Davis wrote:
>>>>> Hi Roland,
>>>>>
>>>>>> What should these [getFileTypeId] values be, Ethan?  Is there an
>>>>>> official enumeration I can reference for known values for these?
>>>>>> I just grabbed them off the Web page:
>>>>>> http://www.unidata.ucar.edu/software/netcdf-java/formats/FileTypes.html.
>>>>>>
>>>>>>
>>>>>>
>>>>>> It makes sense to me to use netCDF since it is the intent of the IOSP
>>>>>> to act like netCDF OPeNDAP in every case.
>>>>> The ID values should uniquely identify the "file type". The web page
>>>>> enumerates the values we know. We encourage everyone that implements
>>>>> an IOSP to select an ID not on the list and let us know so we can
>>>>> update the list.
>>>>> So, I think rather than use "netCDF" you should decide on an ID unique
>>>>> to your IOSP. Or, if all the datasets behind one of your virtual
>>>>> datasets are always going to be the same type, you could use the type
>>>>> of the backing datasets. (That is what the CDM Aggregation class does:
>>>>> it uses the "file type" of the aggregation's "typical dataset").
>>>> Jon,  what does the ncWMS do with the FileTypeId?   The best decision
>>>> from my point of view for what to return seems to depend on how the
>>>> value is being used by clients.  I thought the point of the the CDM
>>>> was everything looks like netCDF.  If folks are making optimizations
>>>> based on the FileTypeId then for some cases like F-TDS it seems like
>>>> they might miss out if the FileTypeId was something other than netCDF.
>>>>
>>>> Roland
>>>>> Let us know if you decide on a new unique ID and we'll add it to the
>>>>> list.
>>>>>
>>>>
>>>> _______________________________________________
>>>> thredds mailing list
>>>> thredds@xxxxxxxxxxxxxxxx
>>>> For list information or to unsubscribe,  visit:
>>>> http://www.unidata.ucar.edu/mailing_lists/