
[python #SOS-815998]: Opening Large netCDF files on HPC



Greetings!

There are a couple of possible reasons for the difference on HPC vs. locally:
1. An issue with how the data are stored on the HPC system
2. A difference in versions of netcdf4-python, libnetcdf (netCDF-C), or HDF5 on
your local system vs. the HPC system--if the libraries are not configured
correctly on the HPC system, that could create major issues (a quick way to
compare the versions on both systems is shown below)
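
As a first check for #2, netcdf4-python can report both its own version and the
versions of the underlying C libraries it was built against; running this small
snippet on both systems should reveal any mismatch:

    import netCDF4 as nc4

    # Report the Python binding version and the versions of the
    # underlying C libraries it was built against
    print('netcdf4-python:', nc4.__version__)
    print('libnetcdf:', nc4.__netcdf4libversion__)
    print('HDF5:', nc4.__hdf5libversion__)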

The first test I would run is to see how this runs on the HPC system vs. locally:

    # Read the raw bytes without involving the netCDF library at all
    with open('path/to/netCDFdata.nc', 'rb') as fobj:
        while buf := fobj.read(1024 * 1024):
            continue

This reads the full file in 1-megabyte chunks using plain Python I/O only,
which helps narrow down whether the slowdown is in the filesystem itself or
somewhere in the netCDF/HDF5 stack.
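
If that raw read is also slow on the HPC system, timing it gives you a concrete
throughput number to hand to your system administrators. A minimal sketch of
the same loop with timing added (the path is a placeholder):

    import time

    start = time.perf_counter()
    nbytes = 0
    with open('path/to/netCDFdata.nc', 'rb') as fobj:
        while buf := fobj.read(1024 * 1024):
            nbytes += len(buf)
    elapsed = time.perf_counter() - start

    # Report sustained read throughput for comparison across systems
    print(f'Read {nbytes / 1e6:.0f} MB in {elapsed:.1f} s '
          f'({nbytes / 1e6 / elapsed:.0f} MB/s)')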

Out of curiosity, in this command:

    nc4.Dataset("path/to/netCDFdata.nc", mode="r", format="NETCDF4",
                diskless=True)

Why are you opening an existing file on disk using "diskless" mode? I believe
the normal use for that option is creating a new netCDF dataset without
writing the data to disk. Note also that, as I understand it, opening an
existing file with diskless=True reads the entire file into memory up front,
which for a 6GB file could by itself explain the long load time.
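
If the goal is simply to read the file, a plain read-only open should let the
library read variable data lazily (the format argument only matters when
creating a new file, so I have dropped it here; the path is a placeholder):

    import netCDF4 as nc4

    # Open read-only; variable data are read on demand, so the open
    # itself should be quick even for a multi-gigabyte file
    with nc4.Dataset('path/to/netCDFdata.nc', mode='r') as nc_data:
        print(list(nc_data.variables))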

Cheers!

Ryan

> Hi,
> 
> I am working with a 6GB netCDF file.
> 
> When I try to open the netCDF data on a GPFS HPC system, it keeps loading
> for a long time.
> However, I am able to open the data on my local computer.
> 
> Do you know of anything I can do to be able to load the dataset easily on an 
> HPC?
> 
> Below is the code I use to open the file:
> nc_data = nc4.Dataset("path/to/netCDFdata.nc", mode="r", format="NETCDF4",
> diskless=True)


Ticket Details
===================
Ticket ID: SOS-815998
Department: Support Python
Priority: Low
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.