Siphon 0.5.0 has been released with a few improvements and features:
- Datasets and catalog references can now be retrieved from their collections by position (index) as well as by name.
- Collections of datasets and catalog references now have helper functions that allow extracting a time range or item closest to a time, assuming the entries have appropriately formatted times in the names.
- Datasets gained functions that simplify setting up access over various TDS services.
- A catalog with a latest dataset now has a latest attribute that points directly to this dataset.
Full release notes are available on the GitHub Releases page.
Specific examples of new APIs
Two of the main improvements to the Siphon API are direct access to the collection of datasets by numeric index and simplified methods for using the different data access services. Previously in Siphon, one might do:
from siphon.catalog import TDSCatalog
from siphon.ncss import NCSS
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p25deg/catalog.xml')
# Take the first dataset in the catalog
ds = list(cat.datasets.values())[0]
ncss = NCSS(ds.access_urls['NetcdfSubset'])
With Siphon 0.5.0, this becomes:
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p25deg/catalog.xml')
# Index the collection directly and let the dataset set up NCSS access
ds = cat.datasets[0]
ncss = ds.subset()
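To illustrate how a collection can accept both a name and a numeric index, here is a minimal sketch. It is not taken from the Siphon source; the class name IndexableCollection and the dataset names are hypothetical, and the only assumption is that the collection keeps its entries in insertion order.

```python
from collections import OrderedDict


class IndexableCollection(OrderedDict):
    """Hypothetical ordered mapping that also accepts integer positions."""

    def __getitem__(self, key):
        # Try an ordinary lookup by name first.
        try:
            return super().__getitem__(key)
        except KeyError:
            # Fall back to positional access, but only for integer keys.
            if isinstance(key, int):
                return list(self.values())[key]
            raise


# Hypothetical dataset names, used only for illustration.
datasets = IndexableCollection()
datasets['GFS_20170101_0000.grib2'] = 'first dataset'
datasets['GFS_20170101_0600.grib2'] = 'second dataset'

print(datasets[0])                          # lookup by position
print(datasets['GFS_20170101_0600.grib2'])  # lookup by name
print(datasets[-1])                         # negative positions also work
```

Trying the name lookup first keeps existing name-based code working unchanged, and the positional fallback is only attempted when the key is an integer.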
Similarly, for OPeNDAP or CDMRemote access, you can now do:
nc = ds.remote_access()
Here, nc is a netCDF4-python Dataset object (or a similar object for CDMRemote). By default this uses CDMRemote where available (since it is built into Siphon), but it will fall back to OPeNDAP; the service can also be selected manually.
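The selection logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Siphon's implementation: choose_service is a hypothetical helper, the service keys 'CdmRemote' and 'OPENDAP' are assumed names for the entries in a dataset's access URLs, and the example.com URLs are placeholders.

```python
def choose_service(access_urls, preferred=('CdmRemote', 'OPENDAP'), service=None):
    """Return (service_name, url) for the service to use.

    Hypothetical helper: `access_urls` maps service names to URLs.
    """
    if service is not None:
        # Manual selection: fail loudly if the service is not offered.
        if service not in access_urls:
            raise ValueError(service + ' is not available for this dataset')
        return service, access_urls[service]

    # Automatic selection: walk the preference order and take the first hit.
    for name in preferred:
        if name in access_urls:
            return name, access_urls[name]
    raise ValueError('No supported remote access service is available')


# Placeholder URLs, for illustration only.
urls = {'CdmRemote': 'http://example.com/cdmremote/file.nc',
        'OPENDAP': 'http://example.com/dodsC/file.nc'}

print(choose_service(urls))                     # prefers CdmRemote
print(choose_service({'OPENDAP': 'http://example.com/dodsC/file.nc'}))
print(choose_service(urls, service='OPENDAP'))  # manual override
```

Keeping the preference order as a parameter makes the default explicit while still allowing a caller to force a particular service.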
There is also support for getting a file-like object for accessing the raw data using HTTP, or just downloading the file locally:
fobj = ds.remote_open()
# Download locally
ds.download('local/file/path')
Siphon has also simplified access to the automatically resolved latest dataset identified on THREDDS servers. Previously, this involved manually finding the latest within the collection of datasets, or using the helper function as:
from netCDF4 import Dataset
from siphon.catalog import get_latest_access_url
latest_opendap = get_latest_access_url('http://thredds.ucar.edu/thredds/catalog/grib/'
                                       'NCEP/GFS/Global_0p25deg/catalog.xml',
                                       'OPENDAP')
nc = Dataset(latest_opendap)
This now becomes:
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p25deg/catalog.xml')
nc = cat.latest.remote_access()
Siphon has also gained the ability to filter particular datasets from those in the catalog using dates and times. This relies on extracting times from the names using an assumed time format (defaults to YYYYMMDD_HHMM). So now users can do:
from datetime import datetime, timedelta
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p25deg/catalog.xml')
# Find the run closest to 6 hours ago
time = datetime.utcnow() - timedelta(hours=6)
ds = cat.datasets.filter_time_nearest(time)
# Find all runs from the last day
end = datetime.utcnow()
start = end - timedelta(days=1)
datasets = cat.datasets.filter_time_range(start, end)
It is possible to pass a custom regular expression to support other time formats.
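To make the idea of name-based time filtering concrete, here is a standalone sketch that uses only the Python standard library. It is not Siphon's code: the helpers parse_times, nearest and in_range, the dataset names, and the convention of named groups (year, month, day, hour, minute) are all assumptions made for illustration, as is treating the range as inclusive at both ends.

```python
import re
from datetime import datetime

# Assumed pattern for names that embed YYYYMMDD_HHMM.
DEFAULT_PATTERN = (r'(?P<year>\d{4})(?P<month>[01]\d)(?P<day>[0123]\d)_'
                   r'(?P<hour>[012]\d)(?P<minute>[0-5]\d)')


def parse_times(names, pattern=DEFAULT_PATTERN):
    """Return (datetime, name) pairs for the names that match the pattern."""
    regex = re.compile(pattern)
    found = []
    for name in names:
        match = regex.search(name)
        if match:
            # Named groups map directly onto datetime keyword arguments.
            parts = {key: int(val) for key, val in match.groupdict().items()}
            found.append((datetime(**parts), name))
    return found


def nearest(names, when, pattern=DEFAULT_PATTERN):
    """Return the name whose embedded time is closest to `when`."""
    return min(parse_times(names, pattern), key=lambda p: abs(p[0] - when))[1]


def in_range(names, start, end, pattern=DEFAULT_PATTERN):
    """Return names whose embedded time lies in [start, end] (inclusive)."""
    return [n for t, n in parse_times(names, pattern) if start <= t <= end]


# Hypothetical dataset names in the default format.
runs = ['GFS_Global_0p25deg_20170101_0000.grib2',
        'GFS_Global_0p25deg_20170101_0600.grib2',
        'GFS_Global_0p25deg_20170101_1200.grib2']
print(nearest(runs, datetime(2017, 1, 1, 7, 0)))
print(in_range(runs, datetime(2017, 1, 1, 5), datetime(2017, 1, 1, 13)))

# A custom pattern for a different (hypothetical) naming scheme.
iso_pattern = (r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T'
               r'(?P<hour>\d{2})(?P<minute>\d{2})')
scans = ['radar_2017-01-01T0600.nc', 'radar_2017-01-01T0900.nc']
print(nearest(scans, datetime(2017, 1, 1, 8, 30), pattern=iso_pattern))
```

Names that do not match the pattern are skipped, so entries without a recognizable time never take part in the comparison.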