The Future of NetCDF


Russ Rew

NetCDF Annual Update

2012-10-26

Overview

  • Short- and long-term plans for netCDF and other data access infrastructure development
  • Tentative plans for netCDF-4.3 and beyond
  • Speculations about the future of scientific data access ...

Goals for Unidata data access infrastructure


The next 5 years will be challenging for Unidata's data access infrastructure efforts.


Our efforts will be focused on incremental innovations to:


  • Manage a graceful transition from a simple data model (netCDF-3) to the enhanced Common Data Model of netCDF-4
  • Provide better support for remote access and server-side data analysis
  • Respond to the need to faithfully represent observational data as well as gridded data
  • Scale up to handle larger volumes of data efficiently
  • Serve a larger user community wishing to integrate satellite products, geospatial data, observations, and model outputs from growing archives

Near-term plans for netCDF


We are constrained by backward compatibility commitments:

  • Don't break archives: new versions must be able to access existing netCDF data
  • Don't break programs: new libraries must support previous APIs

 

Plans for the next year are fairly fluid. Follow changing plans on our projects site.

 

Tentative plans:


C-4.3 plans:

  • CMake support for Visual Studio builds on Windows
  • bug and documentation fixes

 

Fortran-4.3 plans:

  • addition of a few missing functions
  • Fortran-2003 C-interoperability support?
  • CMake support for Visual Studio builds on Windows?
  • bug and documentation fixes

Longer Term Plans

 

  • Finish documentation conversion to Doxygen
  • "Lazy open" for data from many large files
  • Improve compression to match GRIB2 levels
  • Client support for DAP4 protocol
  • Automatic packing/unpacking in library
  • Support array slice query notation
  • Big test data collection for tool developers
  • Support high-level chunking policies
  • Provide guidance on chunking & compression
  • Refactor into more & smaller utilities
  • Support asynchronous I/O for remote access
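
To make the "automatic packing/unpacking" item concrete: netCDF conventions commonly store packed data as small integers together with `scale_factor` and `add_offset` attributes, where the unpacked value is `packed * scale_factor + add_offset`. The following is a minimal sketch in plain Python of what a library could do on the user's behalf, assuming signed 16-bit packing and `data_max > data_min`. The function names are hypothetical and are not part of any netCDF API.

```python
def compute_packing(data_min, data_max, nbits=16):
    """Return (scale_factor, add_offset) that map [data_min, data_max]
    onto the full range of signed nbits integers.

    Hypothetical helper for illustration; assumes data_max > data_min.
    """
    if data_max <= data_min:
        raise ValueError("data_max must be greater than data_min")
    scale_factor = (data_max - data_min) / (2 ** nbits - 1)
    # Shift so that data_min lands on the most negative packed integer.
    add_offset = data_min + 2 ** (nbits - 1) * scale_factor
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Convert a floating-point value to its packed integer form."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Recover an approximate floating-point value from a packed integer."""
    return packed * scale_factor + add_offset


if __name__ == "__main__":
    # Example: temperatures in kelvin, assumed to lie between 200 and 330.
    scale, offset = compute_packing(200.0, 330.0)
    stored = pack(273.15, scale, offset)
    restored = unpack(stored, scale, offset)
    print(stored, restored, abs(restored - 273.15) <= scale / 2 + 1e-9)
```

Packing is lossy: the round-trip error is bounded by about half of `scale_factor`, so the chosen range and bit width set the achievable precision.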

Even Longer Term Plans

 

Some of these may just be crazy talk ...

  • Support data access by coordinates instead of indexes
  • Make more netCDF-Java advanced functionality available from C
  • Implement standard requests for server-side analysis
  • Keep up with HDF5 advances for high-performance computing
  • Develop and implement intelligent chunking & compression
  • Space-filling curves!
  • Make library updates easy for users
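
As an illustration of the "access by coordinates instead of indexes" idea, here is a minimal sketch in plain Python. It assumes a one-dimensional coordinate axis sorted in ascending order and translates coordinate values into the index values that index-based access requires. The function names are hypothetical; this is not how any netCDF library implements it.

```python
from bisect import bisect_left, bisect_right


def nearest_index(coords, value):
    """Return the index of the coordinate closest to value.

    Assumes coords is a non-empty list sorted in ascending order.
    """
    if not coords:
        raise ValueError("empty coordinate axis")
    pos = bisect_left(coords, value)
    if pos == 0:
        return 0
    if pos == len(coords):
        return len(coords) - 1
    # value lies between coords[pos - 1] and coords[pos]; pick the closer one
    # (ties go to the lower index).
    before, after = coords[pos - 1], coords[pos]
    return pos if (after - value) < (value - before) else pos - 1


def index_range(coords, low, high):
    """Return the half-open index range [start, stop) of all coordinates
    that fall within the closed interval [low, high]."""
    return bisect_left(coords, low), bisect_right(coords, high)


if __name__ == "__main__":
    lat = [-90.0, -45.0, 0.0, 45.0, 90.0]
    print(nearest_index(lat, 40.0))          # index of the latitude nearest 40
    start, stop = index_range(lat, -50.0, 50.0)
    print(lat[start:stop])                   # latitudes between -50 and 50
```

The resulting indexes can then be passed to ordinary index-based reads, so a coordinate-aware layer of this kind could sit on top of the existing API without changing it.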

Speculations


  • I/O bottlenecks for high-performance computing will worsen
  • Use of massively parallel shared-nothing file systems will grow
  • Data will be generated too fast to store, so filtering will become a priority
  • Multi-resolution wavelet representations will get more popular
  • Non-volatile memory technologies will replace most spinning disks and change programming
  • Lack of organizational support will lead to losses of valuable data
  • Format-independent conventions will continue to evolve too slowly

We appreciate feedback on netCDF plans!

  • Other speculations?
  • Questions?
  • Feedback?