Thanks, Russ.

I thought I had my problem solved by defining a list of "unique" dimension IDs, where the "key" is the ID, but now I have found a test case where I generated a netCDF file that has *duplicated* dimension IDs!

So, I need some additional insight into this, please...

ncks is both a "display only" program, like ncdump, and a selective copy program, like nccopy; that is, it can generate a new file from selected variables.

Here's the test case, in partial CDL syntax. I generated a command that selects the following variables to the output file:

  netcdf in_grp {
    //root
    dimensions:
      time=unlimited;
      lev=3;
      vrt_nbr=2;
    variables:
      float ilev(lev,vrt_nbr);
      //coordinate variable (/lev)
      float lev(lev);
      //coordinate variable (/time)
      double time(time);
    group: g8 {
      dimensions:
        lev=3;
        vrt_nbr=2;
      variables:
        //coordinate variable (/g8/lev)
        float lev(lev);
        //coordinate variable (/g8/vrt_nbr)
        float vrt_nbr(vrt_nbr);
        float ilev(lev,vrt_nbr);
    } // end g8
    group: g10 {
      variables:
        float two_dmn_rec_var(time,lev);
    }// end g10

This test case has 5 dimensions: 3 in root

  time=unlimited;
  lev=3;
  vrt_nbr=2;

and 2 dimensions in group /g8

  group: g8 {
    dimensions:
      lev=3;
      vrt_nbr=2;

and some variables that use the dimensions, some the "local" ones, some on the root.

Note that the /g8 dimensions have the same relative name as the ones on the root.

These variables are written the following way:

1) Obtain the dimension IDs for the variable

  (void)nco_inq_vardimid(grp_in_id,var_in_id,dmn_in_id_var);

2) Loop over the dimensions of the variable

  for(int dmn_idx=0;dmn_idx<nbr_dmn_var;dmn_idx++){

3) I have now defined a list of dimensions where the "key" is the unique dimension ID in the input file; this key returns an object that has all the information about the dimension (path, number of coordinate variables, etc.). For this case I only need the full name (path); this is in the *input* file, the above CDL.

I need to know if that dimension name is defined for the output group (for simplicity, let's consider that the output group has the same location as the input).

In a netCDF3 case, all dimensions are in the same group, so things can be done with

  nc_inq_dimid(nc_id,dmn_nm,dmn_id);

that is, simply inquire if the dimension exists.

4) Since I am defining the output, I have to check if the group was created

  /* Test existence of group and create if not existent */
  if(nco_inq_grp_full_ncid_flg(nc_out_id,grp_out_fll,&grp_dmn_out_id)){

5) Then obtain its dimensions

  /* Check output group (only) dimensions */
  (void)nco_inq_dimids(grp_dmn_out_id,&nbr_dmn_out_grp,dmn_out_id_grp,0);

6) Loop over the group dimensions and match the *relative* name with the *relative* name for the variable. If there is a match, this tells me that the dimension was defined in that group, and I store the ID, to pass later to the variable definition

  /* A relative name for variable and group exists for this group...the dimension is already defined */
  if(strcmp(dmn_nm_grp,dmn_nm) == 0){
    /* Assign the defined ID to the dimension ID array for the variable */
    dmn_out_id[dmn_idx]=dmn_out_id_grp[dmn_idx_grp];

This works well... I do get *distinct* IDs if I print them following the above calls. This function returns me the dimension ID

  /* Define dimension and obtain dimension ID */
  (void)nco_def_dim(grp_dmn_out_id,dmn_nm,dmn_sz,&dmn_id_out);

Here are the IDs for this case; as you can see, they go from 0 to 4 (5 in total):

  ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=0 index [0]:</time> with size=10
  ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=1 index [1]:</lev> with size=3
  ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=2 index [0]:</g8/lev> with size=3
  ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=3 index [1]:</g8/vrt_nbr> with size=2
  ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=4 index [1]:</vrt_nbr> with size=2

In the variable definition, I get the assignment I expect. For example, </g10/two_dmn_rec_var> is defined with the /time and /lev dimensions from root (IDs 0 and 1), like in the input file:
ncks: INFO nco_cpy_var_dfn() DEFINING variable </g10/two_dmn_rec_var> with dimension IDS = #0 #1
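
Steps 4 to 6 above can be sketched with a small in-memory model. This is not the NCO or netCDF API: grp_sct, grp_find_dmn and grp_get_or_def_dmn are hypothetical names, and the single file-wide ID counter is only an assumption that mirrors the OUT_ID values 0 to 4 in the log. The sketch shows the "match the relative name inside one group, define it if missing" logic and nothing more.

```c
#include <assert.h>
#include <string.h>

#define MAX_DMN 16

/* Hypothetical in-memory stand-in for one output group; not a netCDF type. */
typedef struct {
    const char *grp_path;         /* full group path, e.g. "/" or "/g8" */
    const char *dmn_nm[MAX_DMN];  /* relative names of dimensions defined here */
    int dmn_id[MAX_DMN];          /* ID recorded when each dimension was defined */
    int nbr_dmn;                  /* number of dimensions in this group */
} grp_sct;

/* Return the ID of the dimension with relative name dmn_nm in this group
 * only (ancestors are not searched), or -1 if it is not defined here. */
static int grp_find_dmn(const grp_sct *grp, const char *dmn_nm)
{
    for (int idx = 0; idx < grp->nbr_dmn; idx++)
        if (strcmp(grp->dmn_nm[idx], dmn_nm) == 0)
            return grp->dmn_id[idx];
    return -1;
}

/* Reuse the dimension if the group already has it; otherwise define it.
 * ASSUMPTION: new IDs come from one file-wide counter (*nxt_id); this only
 * mirrors the OUT_ID values in the log, the real library may behave differently. */
static int grp_get_or_def_dmn(grp_sct *grp, const char *dmn_nm, int *nxt_id)
{
    int dmn_id = grp_find_dmn(grp, dmn_nm);
    if (dmn_id >= 0)
        return dmn_id;                 /* already defined in this group */
    assert(grp->nbr_dmn < MAX_DMN);    /* the toy model has a fixed capacity */
    grp->dmn_nm[grp->nbr_dmn] = dmn_nm;
    grp->dmn_id[grp->nbr_dmn] = *nxt_id;
    grp->nbr_dmn++;
    return (*nxt_id)++;
}
```

Defining /time, /lev, /g8/lev, /g8/vrt_nbr and /vrt_nbr in that order gives IDs 0 to 4 in this model, and asking again for "lev" in the root group returns 1 rather than creating a second dimension.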

So, all seems OK so far.

But... when I try to read the generated file, things go terribly wrong, because I now have duplicated IDs in the generated file...

I changed all my model assuming that dimension IDs are unique... in this case, the output file does *not* have unique IDs, so I get the wrong dimension while getting the variable's data.

Here are the dimensions in the output file, from my generated unique dimension list. The symbol # stands for ID, then comes the full path starting with /, and the dimension size in ():

  (#0/time) record dimension(10)
  (#1/lev) dimension(3)
  (#4/vrt_nbr) dimension(2)
  (#0/g8/lev) dimension(3)
  (#1/g8/vrt_nbr) dimension(2)

Just to make sure, I went to see what ncdump was telling me about these IDs, by reading the generated file.
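
The duplication in the list above can also be checked mechanically. The following is a minimal sketch in plain C, assuming the list is held as an array of (ID, full path) entries; dmn_ent_sct and cnt_dpl_ids are hypothetical names, not part of NCO or netCDF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list entry: one dimension as stored in a traversal list. */
typedef struct {
    int dmn_id;          /* dimension ID reported when reading the file */
    const char *nm_fll;  /* full path, e.g. "/g8/lev" */
} dmn_ent_sct;

/* Count the entries whose ID already appeared earlier in the list.
 * A result of 0 means the IDs can safely be used as unique keys. */
static size_t cnt_dpl_ids(const dmn_ent_sct *lst, size_t nbr)
{
    size_t nbr_dpl = 0;
    assert(lst != NULL || nbr == 0);
    for (size_t i = 1; i < nbr; i++) {
        for (size_t j = 0; j < i; j++) {
            if (lst[i].dmn_id == lst[j].dmn_id) {
                nbr_dpl++;
                break;   /* count each repeated entry once */
            }
        }
    }
    return nbr_dpl;
}
```

For the five entries listed (IDs 0, 1, 4, 0, 1) the count is 2: /g8/lev repeats the ID of /time, and /g8/vrt_nbr repeats the ID of /lev.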

ncdump does not print dimension IDs, but I put this call in line 1375 of ncdump.c

  printf("#%d,",dimids_grp[d_grp]);
  print_name(dims[d_grp].name);

in the loop where you print the dimension name. Sure enough, I do get the same ID numbers

  netcdf out {
  dimensions:
    #0,time = UNLIMITED ; // (10 currently)
    #1,lev = 3 ;
    #4,vrt_nbr = 2 ;
  variables:
    float ilev(lev, vrt_nbr) ;
    float lev(lev) ;
    double time(time) ;

  group: g10 {
    variables:
      float two_dmn_rec_var(time, lev) ;
    } // group g10

  group: g8 {
    dimensions:
      #0,lev = 3 ;
      #1,vrt_nbr = 2 ;
    variables:
      float ilev(lev, vrt_nbr) ;
    } // group g8
  }

ncdump is a "print on the fly" tool; that is, it reads and prints things as the groups are iterated. I do get the correct data in the generated file, because IDs are not stored or used other than to read at that moment.

But in my new ncks model, these dimension IDs are stored in the lists I mentioned before, from my previous comments:

The fact that group dimension IDs are in fact unique makes it possible to match them with dimension IDs for variables...

But only if I have a list of

1) The full path of all dimensions in the file
2) The full path of all dimensions for each variable

I already had this. I constructed my "path only" model by recursively iterating the file, starting at root, and for every group I store the current path passed as a parameter to the recursive function.

The API gets me all local info for variables, for the current group, including dimensions for variables and dimensions for groups.

The additional step is to store for each group, the dimension ID, and for every variable dimension, its ID. Then match them.

So, why do I get these duplicated dimension IDs in the generated file?

Note that all the above ID printout, from 0 to 4,

  ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=0 index [0]:</time> with size=10

was done while in *define* mode.

After that routine for defining groups and dimensions, which prints those values, define mode is ended and then the data for the variables is written.

From my understanding of the netCDF intermediate layer between the public API and the HDF5 layer, things like HDF5 datasets are not actually "defined" until define mode is ended, for example to allow assembling the dataset with chunking, compression, etc.

Could it be that somehow I need to leave define mode (but where? every *time* I define a new dimension?) so that those dimension IDs are "flushed"?

Or am I wrong in assuming that the dimension IDs are in fact "unique"? Can a netCDF4 file have duplicated dimension IDs, yes or no?
If no, then the API should have complained somewhere on my file generation?

Thanks

Pedro

------
Pedro Vicente, Earth System Science
University of California, Irvine
http://www.ess.uci.edu/

----- Original Message -----
From: "Russ Rew" <russ@xxxxxxxxxxxxxxxx>
To: "Pedro Vicente" <pvicente@xxxxxxx>
Cc: <netcdfgroup@xxxxxxxxxxxxxxxx>
Sent: Monday, March 04, 2013 5:27 AM
Subject: Re: [netcdfgroup] How to find the full dimension names (paths with groups) for a variable?

Hi Pedro,

You're right that it would be useful to have additional public netCDF functions to make it easy to get the absolute netCDF name from a dimension ID and the reverse. There is code for this in the source for ncdump and nccopy.

The ncdump utility outputs the absolute dimension name when there is an ambiguity, for example one of the test cases for ncdump outputs this variable declaration for a case where a variable uses several dimensions named "dim" in different groups (see ncdump/ref_tst_group_data.cdl):

  float var2(/dim, /g2/dim, dim) ;

The code for figuring out these names is in ncdump/ncdump.c, preceded by this comment

  /* Subtlety: The following code block is needed because
   * nc_inq_dimname() currently returns only a simple dimension
   * name, without a prefix identifying the group it came from.
   * That's OK unless the dimid identifies a dimension in an
   * ancestor group that has the same simple name as a
   * dimension in the current group (or some intermediate
   * group), in which case the simple name is ambiguous. This
   * code tests for that case and provides an absolute dimname
   * only in the case where a simple name would be
   * ambiguous. */

The 20 or so subsequent lines of code that implement this should be captured in a separate function, so other developers don't need to rediscover how to do it with the current public API.

For the other direction, there is this function in ncdump/utils.c that would also be useful in the public API:

  /* Missing functionality that should be in nc_inq_dimid(), to get
   * dimid from a full dimension path name that may include group
   * names */
  int
  nc_inq_dimid2(int ncid, const char *dimname, int *dimidp) {
  ...

We have some plans to provide API additions such as these for developers of generic netCDF tools in a future version. Thanks for pointing out the need for these.

--Russ
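
The ambiguity rule in the ncdump comment quoted above can be illustrated with a toy group tree. This is only a sketch of the rule as the comment states it, not the ncdump source: grp_node, grp_has_dim and dim_name_is_ambiguous are hypothetical names, and the group layout used to exercise it (a variable in /g2/g3 with a dimension named "dim" in each of /, /g2 and /g2/g3) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIM 8

/* Hypothetical in-memory group node; not a netCDF type. */
typedef struct grp_node {
    const struct grp_node *parent;  /* NULL for the root group */
    const char *dim_nm[MAX_DIM];    /* simple names of dimensions defined here */
    int nbr_dim;
} grp_node;

/* True if this group itself defines a dimension with simple name nm. */
static bool grp_has_dim(const grp_node *grp, const char *nm)
{
    for (int i = 0; i < grp->nbr_dim; i++)
        if (strcmp(grp->dim_nm[i], nm) == 0)
            return true;
    return false;
}

/* A variable in var_grp uses dimension nm defined in dim_grp, which is
 * var_grp itself or one of its ancestors. The simple name is ambiguous if
 * any group from var_grp up to, but excluding, dim_grp also defines a
 * dimension with the same simple name. */
static bool dim_name_is_ambiguous(const grp_node *var_grp,
                                  const grp_node *dim_grp, const char *nm)
{
    assert(var_grp != NULL && dim_grp != NULL && nm != NULL);
    for (const grp_node *g = var_grp; g != NULL && g != dim_grp; g = g->parent)
        if (grp_has_dim(g, nm))
            return true;
    return false;
}
```

Under the assumed layout, the dimensions from / and /g2 are ambiguous when seen from /g2/g3 and would need an absolute name, while the local one is not, which is consistent with the declaration float var2(/dim, /g2/dim, dim).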

ok, I think I found the solution...

The fact that group dimension IDs are in fact unique makes it possible to match them with dimension IDs for variables...

But only if I have a list of

1) The full path of all dimensions in the file
2) The full path of all dimensions for each variable

I already had this. I constructed my "path only" model by recursively iterating the file, starting at root, and for every group I store the current path passed as a parameter to the recursive function.

The API gets me all local info for variables, for the current group, including dimensions for variables and dimensions for groups.

The additional step is to store for each group, the dimension ID, and for every variable dimension, its ID. Then match them.

So, I take back my comment that "IDs are a recipe for disaster"; for dimensions they are actually the solution. I was thinking more of variable IDs, which can have duplicated values for each group; somehow I missed this dimension ID issue.

Here's my output with this patch applied

  ncks: INFO nco_bld_dmn_ids_trv() traversing variable </g16/g16g2/lon1_var> match <8> for var dim </g16/lon1> and group dim </g16/lon1>

In summary

1) the API does not get me the full dimension path for each variable, but it's possible to construct them.
2) I don't need variable IDs and group IDs

Pedro

------
Pedro Vicente, Earth System Science
University of California, Irvine
http://www.ess.uci.edu/
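
The "path only" model described in this message (recurse from root, pass the current path down, record each group dimension with its ID, then match variable dimension IDs against that table) can be sketched as follows. It is a toy in-memory version, not the NCO code: grp_node, dim_tbl, trv_grp and tbl_path_for_id are hypothetical names, and the tree stands in for what the netCDF inquiry calls would return.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHILD 4
#define MAX_DIM 4
#define MAX_TBL 32
#define MAX_PATH 128

/* Hypothetical in-memory group; stands in for what the API reports per group. */
typedef struct grp_node {
    const char *nm;                 /* relative group name ("" for root) */
    const char *dim_nm[MAX_DIM];    /* relative dimension names */
    int dim_id[MAX_DIM];            /* dimension IDs */
    int nbr_dim;
    const struct grp_node *child[MAX_CHILD];
    int nbr_child;
} grp_node;

typedef struct { int id; char nm_fll[MAX_PATH]; } dim_ent;
typedef struct { dim_ent ent[MAX_TBL]; int nbr; } dim_tbl;

/* Recursive traversal: path is the full path of grp ("/" for root). */
static void trv_grp(const grp_node *grp, const char *path, dim_tbl *tbl)
{
    const char *sep = (strcmp(path, "/") == 0) ? "" : "/";
    for (int i = 0; i < grp->nbr_dim; i++) {
        assert(tbl->nbr < MAX_TBL);
        dim_ent *e = &tbl->ent[tbl->nbr++];
        e->id = grp->dim_id[i];
        snprintf(e->nm_fll, sizeof e->nm_fll, "%s%s%s", path, sep, grp->dim_nm[i]);
    }
    for (int i = 0; i < grp->nbr_child; i++) {
        char sub[MAX_PATH];   /* full path of the child group */
        snprintf(sub, sizeof sub, "%s%s%s", path, sep, grp->child[i]->nm);
        trv_grp(grp->child[i], sub, tbl);
    }
}

/* Match a variable's dimension ID against the table. Returns the FIRST
 * entry with that ID, so the result is only meaningful if IDs are unique. */
static const char *tbl_path_for_id(const dim_tbl *tbl, int id)
{
    for (int i = 0; i < tbl->nbr; i++)
        if (tbl->ent[i].id == id)
            return tbl->ent[i].nm_fll;
    return NULL;
}
```

Because the lookup returns the first entry with a given ID, a file in which two dimensions report the same ID would make this sketch return the wrong path without any error.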