Re: [netcdfgroup] [netCDF #CAE-726196]: optimal chunking and cache for interpolating sparse data in a large grid

  • To: Armin Corbin <corbin@xxxxxxxxxxxxxxxx>
  • Subject: Re: [netcdfgroup] [netCDF #CAE-726196]: optimal chunking and cache for interpolating sparse data in a large grid
  • From: Julian Kunkel <juliankunkel@xxxxxxxxxxxxxx>
  • Date: Fri, 24 Jul 2020 15:05:37 +0100
Dear Armin,
your question is not simple to answer; let me try to give you some
general advice that may help you analyze and improve your use case.

The chunk size of "1x19x18x18" means you have 6156 elements per
chunk, which is rather small and likely causes more overhead than
benefit, as you have encountered.
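For a quick sanity check: assuming the variables are stored as 8-byte
doubles (an assumption on my part), a chunk is only about 48 KB, which
helps explain the overhead you see:

  /* Back-of-the-envelope chunk size; assumes 8-byte doubles. */
  #include <stdio.h>

  int main(void)
  {
      size_t elements = (size_t)1 * 19 * 18 * 18;   /* = 6156 elements  */
      size_t bytes    = elements * sizeof(double);  /* ~48 KB per chunk */
      printf("%zu elements, %zu bytes per chunk\n", elements, bytes);
      return 0;
  }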

Overall, the file size of DEN and ZG is about 320 MB each.


It is important to know details about the system on which you perform
the experiment.

Some questions:

- Where is your data stored? It seems likely to me that it is on a
shared global file system and not on a local machine.

- Are you accessing the data using Python?

- Do you always access the same 8 height profiles in ZG? If so, you
could in principle rewrite the file to contain only those (see the
sketch after this list).
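If you do always need the same profiles, a minimal sketch of extracting
such a subset with the netCDF C API could look like the following; the
file name, the assumed dimension order (time, lev, lat, lon) and all
start/count values are hypothetical and need to be adapted to your data:

  /* Sketch only: read a hyperslab of ZG covering just the height levels
   * of interest.  File name, dimension layout and start/count values
   * below are placeholders. */
  #include <netcdf.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define CHECK(e) do { int _s = (e); if (_s != NC_NOERR) { \
      fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

  int main(void)
  {
      int ncid, varid;
      size_t start[4] = {0, 0, 0, 0};     /* first time step, first level */
      size_t count[4] = {1, 8, 90, 180};  /* 8 levels; lat/lon extents are
                                             placeholders for the full grid */
      double *buf = malloc(count[0] * count[1] * count[2] * count[3]
                           * sizeof(double));

      CHECK(nc_open("yourfile.nc", NC_NOWRITE, &ncid));
      CHECK(nc_inq_varid(ncid, "ZG", &varid));
      CHECK(nc_get_vara_double(ncid, varid, start, count, buf));
      /* ...write the subset to a smaller file, or interpolate directly... */
      CHECK(nc_close(ncid));
      free(buf);
      return 0;
  }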


However, I would initially ignore *any* NetCDF optimization and copy
the NC files to /dev/shm, which means they are stored in main memory.

Then access them from there and see whether that already leads to an
acceptable runtime => it should.

If not, I would do the same experiment but call

  nc_set_var_chunk_cache(ncid, varid, ((size_t) 1024) * 1024 * 1024, 144, 0.0);

i.e. use a 1 GB cache, which fits all data, with room for the 144
chunks you have.
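Put together, a minimal sketch of that experiment could look like this
(the path under /dev/shm and the variable name "ZG" are placeholders;
the cache call has to happen after the file is open and before the
reads):

  /* Sketch: enlarge the per-variable chunk cache before reading.
   * File path and variable name are placeholders. */
  #include <netcdf.h>
  #include <stdio.h>

  int main(void)
  {
      int ncid, varid, status;

      /* Reading from /dev/shm combines this with the main-memory test above. */
      status = nc_open("/dev/shm/yourfile.nc", NC_NOWRITE, &ncid);
      if (status != NC_NOERR) {
          fprintf(stderr, "%s\n", nc_strerror(status));
          return 1;
      }
      nc_inq_varid(ncid, "ZG", &varid);

      /* 1 GB cache with slots for the 144 chunks, preemption 0.0. */
      nc_set_var_chunk_cache(ncid, varid, (size_t)1024 * 1024 * 1024, 144, 0.0);

      /* ... perform the interpolation reads here ... */

      nc_close(ncid);
      return 0;
  }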


Feel free to share some performance numbers (you can PM me).


Best,

Julian