
Unidata Support: 960923: ncrob



No reply needed; this is forwarded just to get it into the reply database.

------- Forwarded Message

Date:    Mon, 23 Sep 1996 07:53:05 -0600
From:    Unidata Support <address@hidden>
To:      address@hidden
Subject: 960923: ncrob

>To: John Sheldon <address@hidden>
>cc: Jeff Ploshay <address@hidden>,
>cc: GRANEK Henry <address@hidden>,
>cc: address@hidden
>From: Harvey DAVIES <address@hidden>
>Subject: Re: ncrob
>Organization: .
>Keywords: 199609230732.AA08747

Hi John,

On Thu, 19 Sep 1996, John Sheldon wrote:

> > It should not involve copying since neither the header nor the
> > "non_recs data" increase in size. 
> 
> FYI - I noticed that the file is, indeed, copied during
> this process - I guess it's not just "growing" info that invokes the
> copy, but any change in the size of the metadata.

I was wrong & you are right!  I have clarified the situation after examining
the code, emailing Glenn Davis & Russ Rew at Unidata & doing some tests with
FAN.  A new file is created (by copying) if the header DECREASES in size, as
well as if it increases.  So my suggested change of the type of valid_range
from float to short decreased the header size & thus a new file was created.
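A rough model of why the type change mattered: in the netCDF classic format
each attribute is stored in the header as its name (a 4-byte length plus the
characters, padded to a multiple of 4 bytes), a 4-byte type code, a 4-byte
element count, and the values padded to a multiple of 4 bytes.  The Python
sketch below (function names are illustrative, not part of FAN or netCDF)
computes that size for valid_range and applies the rule described above,
assuming nothing else in the header changes.

```python
# Simplified model of netCDF classic-format attribute storage in the header.
# Only the types needed here are listed; sizes are bytes per element.
TYPE_SIZES = {"short": 2, "float": 4, "double": 8}


def pad4(nbytes):
    """Round up to the next multiple of 4 (names and values are padded)."""
    return (nbytes + 3) // 4 * 4


def attr_bytes(name, nc_type, nelems):
    """Header bytes for one attribute: name length + padded name,
    4-byte type code, 4-byte element count, padded values."""
    return 4 + pad4(len(name)) + 4 + 4 + pad4(nelems * TYPE_SIZES[nc_type])


def needs_copy(old_size, new_size):
    """Rule as described in this message: any change in header size,
    up or down, means the file is rewritten by copying.  Comparing
    attribute sizes is enough if the rest of the header is unchanged."""
    return old_size != new_size


if __name__ == "__main__":
    for t in ("float", "short", "double"):
        print(t, attr_bytes("valid_range", t, 2))
    # float -> short shrinks the attribute, so the file is copied
    print(needs_copy(attr_bytes("valid_range", "float", 2),
                     attr_bytes("valid_range", "short", 2)))
```

The same arithmetic shows that writing valid_range as double would grow the
attribute from 32 to 40 bytes, which would also trigger a copy.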

So, to avoid this copying, I now suggest you change the values, but not the
type, of valid_range with:

echo -32766 2934 | text2nc -t float hgt_ncks.nc 'hgt:valid_range'

It is necessary to specify the type as float because numeric attribute data
defaults to type double (even if the attribute already exists with some other
type).  I plan to fix this in the next version.
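The values -32766 and 2934 are presumably the original valid_range expressed
in packed units.  As an illustration only, assume the variable's unpacked
valid range is -700 to 35000 with add_offset = 32066 and scale_factor = 1
(figures not stated in this message); packing both ends then gives exactly
those numbers.  The Python sketch (hypothetical names) also shows the effect
John describes below: a raw short compared against a range given in unpacked
units is wrongly treated as out of range.

```python
# Illustrative packing parameters: ASSUMPTIONS, not taken from this message.
ADD_OFFSET = 32066.0
SCALE_FACTOR = 1.0
UNPACKED_VALID_RANGE = (-700.0, 35000.0)   # in unpacked units (metres)


def pack(value, add_offset=ADD_OFFSET, scale_factor=SCALE_FACTOR):
    """Inverse of the usual rule: unpacked = packed * scale_factor + add_offset."""
    return (value - add_offset) / scale_factor


def in_range(raw, lo, hi):
    """Range test on the raw (still packed) value, done before any unpacking."""
    return lo <= raw <= hi


PACKED_VALID_RANGE = tuple(pack(v) for v in UNPACKED_VALID_RANGE)

if __name__ == "__main__":
    raw = -26566   # packed short for a height of about 5500 m under these assumptions
    print(PACKED_VALID_RANGE)                      # the values given to text2nc above
    print(in_range(raw, *UNPACKED_VALID_RANGE))    # unpacked-units range: flagged missing
    print(in_range(raw, *PACKED_VALID_RANGE))      # packed-units range: valid
```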

> My misunderstanding, I think.  Let me see if I now understand what
> ncrob is doing...
> 
> As ncrob looks at a SHORT value, it looks to see if either (a) it
> matches the "missing_value", or (b) it is outside of "valid_range".
> This comparison is done, as it should be, before trying to do any
> unpacking.  But if the "valid_range" is given as a FLOAT, it promotes
> the SHORT data value to FLOAT to do the comparison (or maybe it
> promotes both of them to DOUBLE to do it?).  In any case, with the data
> we were working on, the data value then appears to be outside the
> "valid_range".  The "unpacked" value is then given the value of
> FILL_xxx (xxx=SHORT or FLOAT or... whatever).
> 
> Right?
> 
> Now, what would NCROB do if a SHORT data variable had both a
> "missing_value" and a "_FillValue" attribute, which were different?  A
> user might do this on purpose so that he could identify portions of
> data that were never written AND portions of data that were known not
> to hold any legitimate data (eg, 1000mb air temperature under the
> Rockies).  I don't know if there is a "right" answer for this one,
> but I'd like to get your ideas.

There are some differences between versions; the following applies to version
2.0.2.  The relevant code is in files get_valid_range.c & get_missing_value.c.
It is based on the idea of defining a valid range, outside of which values are
considered missing.  The following paragraph is extracted from get_valid_range.c:

 *  Uses attributes valid_range, valid_min, valid_max
 *  If these do not define both min & max then call function get_missing_value
 *  to define missing value (which depends on attributes missing_value
 *  & _FillValue).  If possible use this missing value to define min or max
 *  based on principle that missing value must be outside valid range.

So if attribute valid_range is defined (as with the NCEP data), then
attributes missing_value & _FillValue are ignored.
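A minimal Python sketch of the logic in the comment quoted above (this is not
FAN's C code).  Two details are assumptions not stated here: that
missing_value takes precedence over _FillValue in get_missing_value, and that
when neither bound is known the sign of the missing value decides which side
it bounds (the convention the netCDF User's Guide describes for _FillValue).
Bounds derived from the missing value are exclusive, since the missing value
itself must lie outside the valid range.

```python
INF = float("inf")


def get_missing_value(attrs):
    """ASSUMPTION: missing_value takes precedence over _FillValue."""
    for name in ("missing_value", "_FillValue"):
        if name in attrs:
            return attrs[name]
    return None


def get_valid_range(attrs):
    """Return ((lo, lo_inclusive), (hi, hi_inclusive))."""
    if "valid_range" in attrs:                 # valid_range wins; missing values ignored
        lo, hi = attrs["valid_range"]
        return (lo, True), (hi, True)
    lo = (attrs["valid_min"], True) if "valid_min" in attrs else None
    hi = (attrs["valid_max"], True) if "valid_max" in attrs else None
    if lo is None or hi is None:
        mv = get_missing_value(attrs)
        if mv is not None:
            # The missing value must be outside the valid range, so it can
            # serve as an exclusive bound on whichever side is still open.
            if lo is not None and mv > lo[0]:
                hi = (mv, False)
            elif hi is not None and mv < hi[0]:
                lo = (mv, False)
            elif lo is None and hi is None:
                # ASSUMPTION: sign rule from the netCDF _FillValue convention.
                if mv > 0:
                    hi = (mv, False)
                else:
                    lo = (mv, False)
    return lo or (-INF, True), hi or (INF, True)


def is_valid(value, attrs):
    """True if value lies inside the valid range defined by attrs."""
    (lo, lo_inc), (hi, hi_inc) = get_valid_range(attrs)
    above = value >= lo if lo_inc else value > lo
    below = value <= hi if hi_inc else value < hi
    return above and below


if __name__ == "__main__":
    # Illustrative attribute values only.
    ncep_like = {"valid_range": (-32766, 2934), "missing_value": 32766}
    both = {"missing_value": -999, "_FillValue": -32767}
    print(is_valid(32766, ncep_like), is_valid(0, ncep_like))
    print(is_valid(-999, both), is_valid(-32767, both), is_valid(5, both))
```

Under these assumptions the second case gives one answer to John's question:
with no valid range attributes, missing_value = -999 becomes an exclusive
minimum, so the different _FillValue (-32767, below it) is treated as missing
as well.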

See section 8.1 of the netCDF User's Guide (version 2.4.3) on valid_range,
missing_value & _FillValue.  (I was responsible for revising these sections, so
they correspond to FAN.  Well, not exactly to FAN version 2, but the next
version will.)

> At the risk of becoming a pest, I need to ask you about another error
> we've run into from ncrob.  This same user was trying to extract 500mb
> data at every other time period from the NCEP data file (with a
> corrected "valid_range"):
> 
>   ncrob '/archive/bgs/reanalysis/ncep/79hgt.nc hgt[time=1:1460:2, level=5]' / 79hgt500.nc
> 
> It runs and then gives an error message:
> 
>   'calloc: Unable to allocate space at line 89 of ncrob.c'
> 
> The header of the resultant file: 79hgt500.nc looks ok, but the values
> of hgt are all '_, _,...'
> 
> Now, if he runs with "[time=1:100:2, level=5]' / 79hgt500.nc", it works
> OK.  Is there a size/space problem here?  We tried it with "-b 4096"
> (and other values), to no avail.

Unless my memory is playing tricks, ncrob buffering only works for input, not
output, so it always tries to allocate enough memory for the whole output.
The only work-around I can suggest is a shell script with a loop, each pass
of which copies a chunk small enough to fit in memory.  Obviously you would
not use ncrob to create the output file; use, say, ncgen instead.
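The chunk boundaries for such a loop can be worked out as below.  This is a
Python sketch with a hypothetical helper, strided_chunks; it assumes 1-based,
inclusive start:end:stride subscripts as in the ncrob command quoted above,
and only prints the input subscripts.  How each chunk is then written into
the output file created with ncgen is not shown.

```python
def strided_chunks(first, last, stride, max_points):
    """Yield inclusive (start, end) bounds covering first..last on the
    stride grid, each chunk selecting at most max_points points."""
    span = stride * max_points          # distance from one chunk start to the next
    start = first
    while start <= last:
        end = min(start + span - stride, last)
        yield start, end
        start += span


if __name__ == "__main__":
    # Every other time step of 1460, at most 100 steps held in memory at once.
    for start, end in strided_chunks(1, 1460, 2, 100):
        print(f"hgt[time={start}:{end}:2, level=5]")
```

Keeping each chunk start on the stride grid (start advances by a whole
multiple of the stride) means the chunks together select exactly the same
time steps as the single time=1:1460:2 request.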

> One last usage question, which may turn into a suggestion for the next
> version.  This same user (a busy guy, obviously :-) was taking an
> average over a certain span of times.  "ncrob -r am" worked just fine,
> except that the variable it produced lost all sense of where it lay
> along the "time" axis; ie, it did not utilize the "time" dimension.
> After successfully generating averages for Jan, Feb, ..., these
> individual files could not be "nccat"-ed into one variable/file with
> "time" as an axis.
> 
> Is there any way to force ncrob to retain some sense of "time", or to
> re-impose "time" in a subsequent step?  In my routines here, I force the
> caller to tell me the start and end times of the averaging period, and
> the time coordinate he wants to give it (typically the midpoint of the
> period, but not always).  How would you do this?

Here again, do not use ncrob to create the output file/variable; use, say, ncgen.
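Following John's own approach, the caller supplies each averaging period's
start and end, and the time coordinate is taken here as the midpoint.  The
Python sketch below (cdl_for_averages, the periods and the units string are
all hypothetical) writes a CDL skeleton with an unlimited time dimension,
which ncgen can turn into the output file.  The averaged values would still
have to be written into the hgt variable afterwards.

```python
def midpoint(start, end):
    """Time coordinate for an average: the midpoint of its period."""
    return (start + end) / 2.0


def cdl_for_averages(name, periods, units):
    """Build a CDL skeleton for ncgen with one record per averaging period.
    periods: list of (start, end) times, both in the given units."""
    times = [midpoint(start, end) for start, end in periods]
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        "\ttime = UNLIMITED ;",
        "variables:",
        "\tdouble time(time) ;",
        f'\t\ttime:units = "{units}" ;',
        "\tfloat hgt(time) ;  // simplified: no level/lat/lon dimensions",
        "data:",
        "\ttime = " + ", ".join(f"{t:g}" for t in times) + " ;",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical 6-hourly periods for Jan and Feb 1979, in hours since 1979-01-01.
    periods = [(0, 738), (744, 1410)]
    print(cdl_for_averages("hgt_monthly", periods, "hours since 1979-01-01 00:00:00"))
```

Because time is the unlimited dimension, files built this way for successive
months keep a time axis and can later be concatenated along it.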

> Thanks for all the help.  I hope we're not taking too much of your time, 

It has taken time, but it has been worthwhile.  In fact one of our users here
is also trying to read NCEP data & having some of the same problems (including
the valid_range problem).  I appreciate feedback from experienced users like
you; it gives me ideas for improvements (not to mention making me aware of
bugs like the -t option bug I mentioned above!)

Hope this is useful,
Harvey

------- End of Forwarded Message