
Re: NaN, _FillValue and the FMRC



Hi all:

FMRC is different from other aggregations in that it operates on GridDatasets rather than NetcdfFiles. The default enhance mode of GridDataset is to apply scale/offset and convert missing values to NaNs. Currently that setting can only be changed globally. Also, there's a bug in the 4.1 code that allows non-default enhancements, which I may not be able to fix until 4.2.
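A minimal Python sketch of what that default enhancement does to each value, assuming CF-style packing: a packed value equal to missing_value becomes NaN, and every other value is unpacked as packed * scale_factor + add_offset. The function name unpack() and the example numbers are hypothetical; this is not the GridDataset API.

```python
import math

def unpack(packed, scale_factor=1.0, add_offset=0.0, missing_value=None):
    """Hypothetical sketch of scale/offset + missing-to-NaN enhancement."""
    result = []
    for v in packed:
        # missing_value is compared against the packed (raw) value.
        if missing_value is not None and v == missing_value:
            result.append(math.nan)
        else:
            result.append(v * scale_factor + add_offset)
    return result
```

For example, unpack([0, 4, -32767], scale_factor=0.5, add_offset=10.0, missing_value=-32767) returns [10.0, 12.0, nan].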

Non-FMRC aggregations don't do any enhancements unless you ask for them in the NcML.

Whenever scale/offset is applied, the scale_factor and add_offset attributes are removed, so there shouldn't be any danger of applying them twice. Since other (non-CDM) client libraries don't apply scale/offset, it seems like the best default is to convert on the server. Performance (e.g., in the IDV) is typically limited by latency, not bandwidth, so the 2X increase in size hasn't been a problem. You may have use cases where it's more important.
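Why removing the attributes prevents double application can be sketched in Python. The enhance() helper below is hypothetical, not the CDM implementation: it unpacks the data and returns the attributes with scale_factor, add_offset, missing_value and _FillValue dropped, so running it again on its own output leaves the values unchanged. Dropping the missing-value attributes is an assumption that mirrors the FMRC output Rich describes later in this thread.

```python
import math

PACKING_ATTRS = ("scale_factor", "add_offset", "missing_value", "_FillValue")

def enhance(data, attrs):
    """Unpack data once and drop the packing attributes (hypothetical helper)."""
    scale = attrs.get("scale_factor", 1.0)
    offset = attrs.get("add_offset", 0.0)
    # Both missing_value and _FillValue mark missing packed values.
    missing = {attrs[k] for k in ("missing_value", "_FillValue") if k in attrs}
    out = []
    for v in data:
        if v in missing:
            out.append(math.nan)
        else:
            out.append(v * scale + offset)
    # With the packing attributes gone, a second call uses scale=1.0,
    # offset=0.0 and no missing values, so the data comes back unchanged.
    new_attrs = {k: v for k, v in attrs.items() if k not in PACKING_ATTRS}
    return out, new_attrs
```

With hypothetical attributes {"scale_factor": 0.5, "add_offset": 10.0, "missing_value": -32767, "units": "degC"}, the first call turns [0, 4, -32767] into [10.0, 12.0, nan] and leaves only {"units": "degC"}; a second call returns the same values.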

There's a lot of complexity around the enhancement code, and I will have to look carefully at what I can support. I'd like to hear your "gotta have" needs, plus your "would be nice if it doesn't make the code unstable" wishes.

John

On 4/27/2010 10:33 AM, Rich Signell wrote:
Steve,

Will wait to hear from John.  My  inclination would be that the aggregation
process ought not to cause _FillValue, scale_factor, add_offset and data
type to be presented differently than they would have been in the original
unaggregated files.
And I would agree.  I've been burned when people applied the wrong
scale_factor and add_offsets, and all I knew was that the values
looked funny (I didn't know they even had scale_factor and
add_offsets, but it eventually came out on investigation).   And
doesn't it take twice as long to deliver data over OPeNDAP if the data
is float instead of short?
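This question can be framed with simple arithmetic. A 32-bit float takes 4 bytes versus 2 for a 16-bit short, so the raw payload doubles; whether transfer time doubles depends on how much of it is fixed latency, which is John's point above. The grid shape, latency and bandwidth below are made-up numbers, and the linear "latency plus size over bandwidth" model is a simplifying assumption.

```python
import struct

def payload_bytes(shape, typecode):
    """Raw data size for an array; ignores protocol overhead and compression."""
    count = 1
    for dim in shape:
        count *= dim
    # '<h' is a standard 16-bit short, '<f' a standard 32-bit float.
    return count * struct.calcsize(typecode)

def transfer_seconds(nbytes, latency_s, bytes_per_s):
    """Simple model: fixed latency plus size divided by bandwidth."""
    return latency_s + nbytes / bytes_per_s

# Hypothetical request: 10 levels x 100 x 100 grid points.
shape = (10, 100, 100)
short_bytes = payload_bytes(shape, "<h")  # 200,000 bytes
float_bytes = payload_bytes(shape, "<f")  # 400,000 bytes

# Hypothetical link: 0.5 s latency, 10 MB/s bandwidth.
t_short = transfer_seconds(short_bytes, 0.5, 10e6)
t_float = transfer_seconds(float_bytes, 0.5, 10e6)
```

With these numbers the payload doubles, but the modeled time only grows from about 0.52 s to 0.54 s.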

-Rich

     - Steve

If you look at these original files (for example)

http://rocky.umeoce.maine.edu:8080/thredds/dodsC/gompom/operational/201004/gomoos.20100427.cdf.html

you can see that the variable "temp" is a "short integer" with
"scale_factor", "add_offset" and "missing_value" attributes, while in the
FMRC "best time series"

http://rocky.umeoce.maine.edu:8080/thredds/dodsC/gomoos/operational_model/UMaine_GoMOOS_cirulation_model_best.ncd.html

the variable "temp" is now a "float" with none of those attributes,
only NaN values.
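The mismatch described above can be expressed as a simple metadata comparison. This Python sketch represents each variable as a plain dict of its data type plus attributes (a hypothetical representation, not a netCDF library API); the attribute names and types come from the description above, but the numeric values are placeholders.

```python
PACKING_KEYS = ("dtype", "scale_factor", "add_offset", "missing_value", "_FillValue")

def packing_differences(original, aggregated):
    """List packing-related metadata that differs between two variables."""
    # A key absent from both dicts compares as None == None and is skipped.
    return [key for key in PACKING_KEYS if original.get(key) != aggregated.get(key)]

# "temp" as described above; the numeric values are placeholders.
original_temp = {"dtype": "short", "scale_factor": 0.001,
                 "add_offset": 15.0, "missing_value": -32767}
fmrc_temp = {"dtype": "float"}
```

Here packing_differences(original_temp, fmrc_temp) reports dtype, scale_factor, add_offset and missing_value, which is exactly the set of changes Steve argues the aggregation should not introduce.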

I'm CC'ing John Caron, just to make sure I've got this right.


-Rich

On Tue, Apr 6, 2010 at 7:22 PM, Kevin O'Brien <Kevin.M.O'address@hidden>
wrote:


Hi Rich -

I added some of your USGS best time series data to the UAF clean catalog at:

   http://ferret.pmel.noaa.gov/geoide/geoIDECleanCatalog.html

I think we'll find that this is also a case where "NaN" is used for the
missing value, but not specified in the variable attributes from the best
time series. Maybe we should ask John Caron why the missing value of NaN
doesn't get set by default as an attribute in those datasets.   At any rate,
I'll be interested to hear what you have to report as far as performance
goes on those COAWST data...
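As long as NaN is not declared in the attributes, a client reading the best time series has to recognize it on its own. A defensive rule, sketched in Python under the assumption that values arrive as plain Python numbers: treat NaN as missing, and still honor missing_value or _FillValue when they are present. The function names are hypothetical.

```python
import math

def is_missing(value, attrs):
    """True if value is NaN or matches a declared missing_value/_FillValue."""
    # NaN never compares equal to anything, so it must be tested explicitly.
    if isinstance(value, float) and math.isnan(value):
        return True
    for key in ("missing_value", "_FillValue"):
        if key in attrs and value == attrs[key]:
            return True
    return False

def valid_values(data, attrs):
    """Keep only the values that are not missing."""
    return [v for v in data if not is_missing(v, attrs)]
```

For example, valid_values([1.5, math.nan, 2.0], {}) keeps [1.5, 2.0] even though no missing-value attribute is present.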

Bob - I'm cc'ing you because you wanted to know when the catalog was
changed.   I've also added some NOAA coastwatch aggregations to the
catalog...

Let me know if there are any questions.

Kevin

--
Kevin O'Brien                   UW/JISAO
Research Scientist              NOAA/PMEL/TMAP
206-526-6751                    http://www.pmel.noaa.gov

"The contents of this message are mine personally and do
not necessarily reflect any position of the Government
or the  National Oceanic and Atmospheric Administration."