
Re: NaN, _FillValue and the FMRC



On 4/27/2010 2:50 PM, Steve Hankin wrote:
Hi John,

No surprises here.  The "gotta have" that comes to mind is simply:
  • if the aggregation server substitutes a new _FillValue, then a new _FillValue attribute should be created that documents the new value
Yes, getting the _FillValue or missing_value documented correctly would be good. I also have a bug from Kevin that I haven't fixed yet.


The "it would be nice" (but not essential) is
It would be nice if FMRC were (to the degree feasible) "just another aggregation".  Stated another way, an FMRC generates a number of time aggregations in a single ncML configuration step.  Each of the time aggregations that are created should ideally behave very much like that single time aggregation would have behaved if it had been hand-configured in ncML.  This would imply that the rules governing scale and offset would be handled the same way for FMRC as for other aggregations.

It should be possible to set the global behavior for scale/offset/missing. Per-dataset behavior is still a maybe.

Because FMRC needs grid info, it has to find coord systems using CoordSysBuilder, which sometimes modifies the file (constructs axes, etc). Generally this is a good thing, but it means that in some cases an FMRC isn't "just another aggregation". Hopefully this is acceptable.

Note that these issues are NOT blockers.  Presumably we can fill in the missing _FillValue attribute for the FMRC with hand-edited ncML.
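As a rough illustration of the hand-edited ncML workaround mentioned above, the following Python sketch generates an NcML wrapper that adds a _FillValue attribute to a single variable. The function name, the location "example_fmrc_best.nc" and the choice of type="float" with value="NaN" are assumptions made for illustration; whether the server accepts such a wrapper around an FMRC dataset is not confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace used by NcML 2.2 documents.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def fill_value_ncml(location, variable, fill_value="NaN", dtype="float"):
    """Build an NcML wrapper that adds a _FillValue attribute to one variable.

    'location' is a placeholder for the dataset being wrapped (hypothetical).
    """
    ET.register_namespace("", NCML_NS)  # emit NcML as the default namespace
    root = ET.Element(f"{{{NCML_NS}}}netcdf", {"location": location})
    var = ET.SubElement(root, f"{{{NCML_NS}}}variable", {"name": variable})
    ET.SubElement(
        var,
        f"{{{NCML_NS}}}attribute",
        {"name": "_FillValue", "type": dtype, "value": fill_value},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical dataset location; "temp" is the variable discussed in the thread.
    print(fill_value_ncml("example_fmrc_best.nc", "temp"))
```

The generated text could be pasted into a catalog or saved as a standalone .ncml file and adjusted by hand.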

    thanks for the quick answer - Steve

====================

John Caron wrote:

All:

FMRC is different from other aggregations in that it operates on GridDatasets rather than NetcdfFiles. The default enhance mode of GridDataset is to apply scale/offset and convert missing values to NaNs. Currently that setting can only be changed globally. Also, there's a bug in the code that allows non-default enhancements in 4.1, which I may not be able to fix until 4.2.

Non-FMRC aggregations don't do any enhancements unless you ask for them in the NcML.

Whenever scale/offset is applied, the attributes scale_factor and add_offset are removed, so there shouldn't be any danger of applying them twice. Since other (non-CDM) client libraries don't apply scale/offset, it seems like the best default is to convert on the server. Performance is typically (e.g. in IDV) limited by latency, not bandwidth, so the 2X increase in size hasn't been a problem. You may have use cases where it's more important.
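The unpacking behavior described above can be sketched in plain Python. This is not the CDM code; the function name and the dictionary-based attribute handling are hypothetical, and recording a NaN _FillValue on the result reflects the request made in this thread rather than confirmed server behavior.

```python
import math

PACKING_ATTRS = ("scale_factor", "add_offset", "missing_value", "_FillValue")


def unpack_variable(raw, attrs):
    """Apply scale/offset once and map the missing marker to NaN.

    Returns (values, new_attrs). The packing attributes are dropped from
    new_attrs, so passing the result through this function again leaves
    the values unchanged.
    """
    scale = attrs.get("scale_factor")
    offset = attrs.get("add_offset")
    if scale is None and offset is None:
        # Nothing to unpack: either never packed or already unpacked.
        return list(raw), dict(attrs)

    scale = 1.0 if scale is None else scale
    offset = 0.0 if offset is None else offset
    missing = attrs.get("missing_value", attrs.get("_FillValue"))

    values = [math.nan if v == missing else v * scale + offset for v in raw]

    # Remove the packing attributes so a client cannot apply them twice.
    new_attrs = {k: v for k, v in attrs.items() if k not in PACKING_ATTRS}
    # Assumption: document the substituted missing value, as requested in the thread.
    new_attrs["_FillValue"] = math.nan
    return values, new_attrs


if __name__ == "__main__":
    raw = [100, 200, -9999]  # made-up packed shorts
    attrs = {"scale_factor": 0.01, "add_offset": 10.0, "missing_value": -9999}
    values, new_attrs = unpack_variable(raw, attrs)
    print(values, sorted(new_attrs))
```

Because the result no longer carries scale_factor or add_offset, a second call returns the same values, which is the "no danger of applying twice" property.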

There's a lot of complexity around the enhancement code, and I will have to look carefully at what I can support. I'd like to hear your "gotta have" needs, plus your "would be nice if it doesn't make the code unstable" wishes.

John

On 4/27/2010 10:33 AM, Rich Signell wrote:
Steve,

  
Will wait to hear from John.  My inclination would be that the aggregation
process ought not to cause _FillValue, scale_factor, add_offset and data
type to be presented differently than they would have been in the original
unaggregated files.
    
And I would agree.  I've been burned when people applied the wrong
scale_factor and add_offsets, and all I knew was that the values
looked funny (I didn't know they even had scale_factor and
add_offsets, but it eventually came out on investigation).   And
doesn't it take twice as long to deliver data over opendap if the data
is float instead of short?

-Rich
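The size side of the short-versus-float question can be estimated with a small sketch. It counts raw element sizes only; the number of bytes actually sent over OPeNDAP depends on the protocol encoding, which this sketch does not model. The function name and the element count are made up for illustration.

```python
from array import array


def payload_bytes(n_values, typecode):
    """Bytes needed to hold n_values elements of the given array typecode."""
    return n_values * array(typecode).itemsize


n = 1_000_000                      # hypothetical number of grid values
packed = payload_bytes(n, "h")     # signed short, as in the original files
unpacked = payload_bytes(n, "f")   # single-precision float, as served by the FMRC
ratio = unpacked / packed

if __name__ == "__main__":
    print(packed, unpacked, ratio)
```

Whether a larger payload translates into a proportionally longer transfer depends on whether the link is limited by latency or by bandwidth.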

  
     - Steve

If you look at these original files (for example)

http://rocky.umeoce.maine.edu:8080/thredds/dodsC/gompom/operational/201004/gomoos.20100427.cdf.html

you can see that the variable "temp" is a "short integer" with
"scale_factor", "add_offset" and "missing_value", while in the FMRC "best
time series"

http://rocky.umeoce.maine.edu:8080/thredds/dodsC/gomoos/operational_model/UMaine_GoMOOS_cirulation_model_best.ncd.html

the variable "temp" is now a "float" with none of those attributes,
only NaN values.

I'm CC'ing John Caron, just to make sure I've got this right.


-Rich

On Tue, Apr 6, 2010 at 7:22 PM, Kevin O'Brien<Kevin.M.O'address@hidden>
wrote:


Hi Rich -

I added some of your USGS best time series data to the UAF clean catalog at:

   http://ferret.pmel.noaa.gov/geoide/geoIDECleanCatalog.html

I think we'll find that this is also a case where "NaN" is used for the
missing value, but not specified in the variable attributes from the best
time series. Maybe we should ask John Caron why the missing value of NaN
doesn't get set by default as an attribute in those datasets.   At any rate,
I'll be interested to hear what you have to report as far as performance
goes on those COAWST data...

Bob - I'm cc'ing you because you wanted to know when the catalog was
changed.   I've also added some NOAA coastwatch aggregations to the
catalog...

Let me know if there are any questions..

Kevin

--
Kevin O'Brien                   UW/JISAO
Research Scientist              NOAA/PMEL/TMAP
206-526-6751                    http://www.pmel.noaa.gov

"The contents of this message are mine personally and do
not necessarily reflect any position of the Government
or the  National Oceanic and Atmospheric Administration."

-- 
Steve Hankin, NOAA/PMEL -- address@hidden
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744

"The only thing necessary for the triumph of evil is for good men
to do nothing." -- Edmund Burke