[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[python #NQM-429600]: Possible bugs on THREDDS?



Greetings Ying,

I've driven your email into our support system and will respond here, that
way it won't get lost in my inbox.

The issue is that you have made a different time dimension subset request
against the variable time as compared to the variable sithick. Specifically, 

time[0:1:1979]

vs

sithick[0:1:1917][68:1:89][0:1:143]'

0:1:1979 vs 0:1;1917.

So, netCDF-C had to create a new dimension to hold the 0:1:1917 slice on the
sithick variable. If you make the time dimension subset the same length between
those two variables, then you are good:

ds = 
Dataset("https://esgf.nccs.nasa.gov/thredds/dodsC/CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r1i1p1f1.SImon.sithick.gn.sithick.20180827.aggregation.1?time[0:1:1917],lat[68:1:89],lon[0:1:143],sithick[0:1:1917][68:1:89][0:1:143]";)

ds.dimensions
OrderedDict([('lat',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 22),
             ('lon',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144),
             ('time',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'time', size = 
1918)])


That said, the normal way of working with OPeNDAP from python that I've seen is 
to
open the dataset without any subsetting parameters first, and then allow the 
library
to subset variables as you slice them. So, something like:

ds = 
Dataset("https://esgf.nccs.nasa.gov/thredds/dodsC/CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r1i1p1f1.SImon.sithick.gn.sithick.20180827.aggregation.1";)

At this point, only very basic metadata has been transferred (no actual data), 
so
it's very fast.

Then, when you access a variable and make a slice, only the data associated 
with the
slice is returned. So, if we look at the variable sithick, we see:

ds.variables["sithick"].shape
Out[12]: (1980, 90, 144)

To subset this one variable, just slice it like a numpy array - it's only at 
this
point that data are actually transferred. For example, to request 
sithick[0:1:1917][68:1:89][0:1:143], simply do something like:

sithick =  ds.variables["sithick"][0:1918,68:90,0:144]

Then, you will see that you've made the subset:

sithick .shape
Out[15]: (1918, 22, 144)

Note that I've added 1 to the end each part of the slice, because OPeNDAP 
indexing
is inclusive on the end of the slice, whereas numpy is not inclusive on the end 
of the
slice.

Cheers,

Sean

> Hi Sean
> 
> I wonder if you can help me out to find out if there is a bug or there is
> something wrong with my code.
> 
> I am trying to open a subset of aggregated dataset via THREDDS (this example
> is for the seaIce thickness, I only wanted to have latitude 45 North to the
> North Pole.  So in my python code
> 
> URL = 
> 'https://esgf.nccs.nasa.gov/thredds/dodsC/CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r1i1p1f1.SImon.sithick.gn.sithick.20180827.aggregation.1?time[0:1:1979],lat[68:1:89],lon[0:1:143],sithick[0:1:1917][68:1:89][0:1:143]'
> 
> nc1 = netCDF4.Dataset(URL)
> print(nc1)
> 
> ...... omitted ....
> 
> dimensions(sizes): lat(22), lon(144), time(1980), time_1(1918)
> variables(dimensions): float64 lat(lat), float64 lon(lon), float64 
> time(time), float32 sithick(time_1,lat,lon)
> 
> Where is time_1 coming from?  the variable sithick should have dim like 
> (time,lat,lon)
> but it changed to time_1 which was not in the original dataset.
> 
> if I don't specify a subset (from 45 north to the pole)
> 
> URL = 
> 'https://esgf.nccs.nasa.gov/thredds/dodsC/CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r1i1p1f1.SImon.sithick.gn.sithick.20180827.aggregation.1'
> 
> nc2 = netCDF4.Dataset(URL)
> 
> print(nc2)
> 
> ............... omitted...............
> 
> 
> dimensions(sizes): bnds(2), lat(90), lon(144), time(1980)
> 
> variables(dimensions): float64 lat(lat), float64 lat_bnds(lat,bnds), float64 
> lon(lon), float64 lon_bnds(lon,bnds), float64 time(time), float64 
> time_bnds(time,bnds), float32 sithick(time,lat,lon)
> 
> 
> So something strange (wrong) of using subset which the whole purpose of
> opendap is designed for, or perhaps I made a wrong request of subset ?
> 
> Thanks
> 
> Ying


Ticket Details
===================
Ticket ID: NQM-429600
Department: Support Python
Priority: Low
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.