[python #HBO-649201]: IO bounds with large netCDF



Nathan,

Are you only doing this using Python? Or have you tried the netCDF-C library as 
well?

I'm wondering if there are issues with how the data are chunked at play
(see the sketch after these links):

https://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters
https://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_choosing_shapes
https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_perf_chunking.html
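
As a first check, it's worth confirming how the variable is actually chunked
on disk. A minimal sketch with netCDF4-python (the file name "data.nc" and
variable name "var" are placeholders, not names from your message):

    import netCDF4

    # Open the file read-only and report the on-disk layout of the big
    # variable. chunking() returns the string "contiguous" or a list of
    # chunk sizes, one per dimension.
    with netCDF4.Dataset("data.nc") as nc:
        var = nc.variables["var"]
        print("shape:   ", var.shape)
        print("chunking:", var.chunking())

If the chunks turn out to be a poor fit for both of your access patterns
(whole 2D planes and full-length time series), nccopy can rewrite the file
with new chunk shapes, e.g. "nccopy -c time/1024,y/64,x/64 in.nc out.nc";
the dimension names and sizes there are guesses, and the second blog post
above walks through choosing shapes that balance both patterns.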

Ryan

> Ryan,
> 
> Patrick Marsh here at SPC has pointed me in your direction to ask about an
> issue I am having. I have a netCDF file that has a variable in it that is
> 8784*2502*5852 in size. I have been using packages like h5netcdf to take
> advantage of parallel IO. The calculations I do are basically taking 2D
> arrays from each level of the first dimension and also taking 1D arrays at
> each point for all time (again, the first axis). What I am running into is
> that my analysis is IO bound. I have attempted to alleviate some of this
> problem by making temporary arrays that store a chunk of data before
> writing the results back to disk. This helps some, but processing would
> still take a day or more. Are you aware of anything I could look into to
> make this more efficient, or is there someone else I should ask about
> this? I appreciate whatever advice you might have.
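
For what it's worth, the temporary-array approach you describe can be
pushed further by making each read line up with whole chunks along the
time axis, so every chunk is read and decompressed only once. A minimal
sketch with plain netCDF4-python, assuming the calculation reduces each
2D plane to one value (the file names, variable name, and the reduction
itself are all placeholders):

    import netCDF4

    # Read this many time steps per pass; ideally a multiple of the
    # chunk size along the time dimension. Each pass here holds about
    # BLOCK * 2502 * 5852 * 4 bytes in memory, so size it to fit RAM.
    BLOCK = 8

    with netCDF4.Dataset("data.nc") as src, \
         netCDF4.Dataset("out.nc", "w") as dst:
        var = src.variables["var"]          # shape (8784, 2502, 5852)
        nt = var.shape[0]

        dst.createDimension("time", nt)
        out = dst.createVariable("result", "f4", ("time",))

        for t0 in range(0, nt, BLOCK):
            t1 = min(t0 + BLOCK, nt)
            block = var[t0:t1, :, :]        # one large contiguous read
            # Stand-in for the real per-plane calculation: a spatial
            # mean over each 2D slice in the block.
            out[t0:t1] = block.mean(axis=(1, 2))

The same idea applies with h5netcdf and parallel IO; what matters is that
the block boundaries fall on chunk boundaries so no chunk is read twice.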

Ticket Details
===================
Ticket ID: HBO-649201
Department: Support Python
Priority: Low
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.