
[netCDF #AWT-862217]: nccopy chunking argument



Mark,

> I've been looking closer at the cause of the second problem, and have a 
> hypothesis. When you look at how nccopy iterates through a variable when 
> making the copy (i.e. in up_start_by_chunks() in nciter.c), it goes in reverse
> order of the dimensions, e.g. for CHL1_mean[date,lon,lat] it scans through
> lat first, then lon, then date. However, this can be very memory
> inefficient in the situation where you are trying to make the rearrangement 
> along the date dimension - you essentially have to load the entire file to 
> get enough data to write an entire date chunk....
> 
> I could see two solutions.
> 
> 1. automagically work out which dimension to scan in (hard to implement 
> robustly)
> 2. infer the scan direction from the -c argument i.e. if you only specify 
> date/5186 (and nothing else), and you have a variable with 
> date/1,lat/30,lon/30, then the most efficient way to rechunk it would be to 
> read along the date dimension first, then the lon and lats.....
> 
> Hmmm. I'm not sure that makes any sense - it's kind of hard to explain. Can 
> you follow my logic?

Yes, but I see some complications that make my head hurt.

If you want to rechunk a variable, it's not clear which is better: to read
the input one input chunk at a time and write the output in an inefficient
order, or to read the input in an inefficient order so that you can write
the output one output chunk at a time.

Currently the nc_next_iter() function in nciter.c does the former, but it 
sounds like you think it would be better if it did the latter. I think you
can construct examples where either strategy is efficient or horribly 
inefficient, depending on the shapes of chunks in the input and output 
files.
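
To make the trade-off concrete, here is a rough sketch that counts chunk accesses for both strategies. It is not nccopy code: the function names (`n_chunks`, `overlaps_along`, `access_counts`) and the example shapes are hypothetical, and the cost model assumes no chunk cache at all, so every partial read or write of a chunk counts as one access.

```python
from math import prod

def n_chunks(n, c):
    """Number of chunks of length c needed to cover a dimension of length n."""
    return -(-n // c)  # ceiling division

def overlaps_along(n, a, b):
    """Count (input chunk, output chunk) index pairs that share at least
    one element along a single dimension of length n."""
    return len({(i // a, i // b) for i in range(n)})

def access_counts(shape, in_chunks, out_chunks):
    """Return (chunk reads, chunk writes) for each strategy.

    Assumption (not from nccopy): no caching, so touching part of a chunk
    costs the same as touching all of it.
    """
    n_in = prod(n_chunks(n, c) for n, c in zip(shape, in_chunks))
    n_out = prod(n_chunks(n, c) for n, c in zip(shape, out_chunks))
    # Chunks are rectangular, so overlapping pairs multiply across dimensions.
    pairs = prod(overlaps_along(n, a, b)
                 for n, a, b in zip(shape, in_chunks, out_chunks))
    return {
        # Read each input chunk once; scatter it into every overlapping output chunk.
        "input_order": (n_in, pairs),
        # Write each output chunk once; gather it from every overlapping input chunk.
        "output_order": (pairs, n_out),
    }

if __name__ == "__main__":
    shape = (100, 60, 60)  # hypothetical (date, lon, lat) sizes
    print(access_counts(shape, (1, 30, 30), (100, 30, 30)))
    print(access_counts(shape, (100, 30, 30), (1, 30, 30)))
```

Under these assumptions, going from date/1 chunks to date/100 chunks costs 400 reads and 400 partial writes in input order but 400 reads and only 4 writes in output order. Reversing the direction gives 4 reads and 400 writes in input order versus 400 reads and 400 writes in output order. Neither strategy wins in both cases, and a real chunk cache would change the numbers again.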

I think the right thing to do would be to determine, from the chunk shapes
of input and output, which strategy to implement, or even whether to use
a hybrid strategy involving multiple passes and an intermediate file or
in-memory structure.  I tried to determine whether this research has
already been done, but couldn't find a paper that provided a clear solution.
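
One quantity such a decision could take into account is how much of the input a scan must pass through before the first output chunk is complete. The sketch below is only an illustration under stated assumptions: it treats the scan as a plain row-major walk over elements (last dimension varying fastest) and is not a model of what nciter.c or the HDF5 layer actually buffer. The function name and example shapes are hypothetical.

```python
def elements_before_first_chunk_done(shape, out_chunks):
    """Elements visited by a row-major scan (last dimension fastest) at the
    moment the output chunk anchored at the origin becomes complete."""
    # Multi-index of the last element belonging to the first output chunk.
    last = [min(c, n) - 1 for n, c in zip(shape, out_chunks)]
    # Convert that multi-index to a row-major linear offset.
    offset = 0
    for n, i in zip(shape, last):
        offset = offset * n + i
    return offset + 1

if __name__ == "__main__":
    shape = (100, 60, 60)  # hypothetical (date, lon, lat) sizes
    total = shape[0] * shape[1] * shape[2]
    for out_chunks in [(100, 30, 30), (1, 30, 30)]:
        seen = elements_before_first_chunk_done(shape, out_chunks)
        print(out_chunks, seen, f"{seen / total:.1%} of the variable")
```

With these made-up sizes, an output chunk spanning the whole date dimension is complete only after 358170 of 360000 elements have been scanned, which matches the observation that nearly the entire variable has to be read first. An output chunk of one date is complete after 1770 elements.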

Maybe it's easier than I'm making it out to be, and there's a clear and simple
solution.  If so, I'd like to implement it!

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: AWT-862217
Department: Support netCDF
Priority: Normal
Status: Closed

