NetCDF

Status Report: September 2014 - February 2015

Ward Fisher, Dennis Heimbigner, Russ Rew

Strategic Focus Areas

The netCDF group's activities support the following goals described in Unidata's Strategic Plan:

  1. Enable widespread, efficient access to geoscience data
    by developing netCDF and related cyberinfrastructure solutions to facilitate local and remote access to scientific data.
  2. Develop and provide open-source tools for effective use of geoscience data
    by supporting use of netCDF and related technologies for analyzing, integrating, and visualizing multidimensional geoscience data; enabling effective use of very large data sets; and accessing, managing, and sharing collections of heterogeneous data from diverse sources.
  3. Provide cyberinfrastructure leadership in data discovery, access, and use
    by developing useful data models, frameworks, and protocols for geoscience data; advancing geoscience data and metadata standards and conventions; and providing information and guidance on emerging cyberinfrastructure trends and technologies.
  4. Build, support, and advocate for the diverse geoscience community
    by providing expertise in implementing effective data management, conducting training workshops, responding to support questions, maintaining comprehensive documentation, maintaining example programs and files, and keeping online FAQs, best practices, and web site up to date; fostering interactions between community members; and advocating community perspectives at scientific meetings, conferences, and other venues.

 

Activities Since the Last Status Report

New Features, Performance Enhancements, and Bug Fixes

We use Jira and GitHub tools for the C, Fortran, and C++ interfaces to provide transparent feature development, handle performance issues, fix bugs, deploy new releases, and collaborate with other developers. We currently have 91 open issues for netCDF-C, 18 open issues for netCDF-Fortran, and 3 open issues for netCDF-C++. The Unidata CDM/TDS group maintains the netCDF Java interface, also using Jira and GitHub, and we collaborate with external developers in maintaining the Python interface.

  • In the netCDF group, progress has been made in the following areas since the last status report:
    • Integrate and test new floating-point compression plug-in technologies for use with netCDF-4
    • Improve ease of building Fortran interface
    • Fix organization of on-line documentation
    • Support continuous integration for development
  • Dependencies, challenges, problems, and risks include:
    • Small group of developers for supporting large project
    • Dependency on HDF5, controlled by external group
    • Slow progress in user adoption of netCDF-4 features

Planned Activities

Ongoing Activities

We plan to continue the following activities:

  • Provide support to a large world-wide community of netCDF developers and users
  • Continue development, maintenance, and testing of source code for multiple language libraries and generic netCDF utility programs
  • Improve organization of Doxygen-generated documentation for the netCDF-C and Fortran libraries

New Activities

Over the next three months, we plan to organize or take part in the following:

  • Prepare material for the Unidata Python training workshop in July
  • Respond to Naval Research Lab patent application for "System and Method for Importing NetCDF Data"
  • Incorporate support for 64-bit-everything netCDF format from parallel netCDF project at Argonne and Northwestern
  • Transition to new netCDF project head (Ward Fisher replacing Russ Rew)

Over the next twelve months, we plan to organize or take part in the following:

  • Submit an abstract for a netCDF update talk at the annual AMS meeting
  • Deploy a release with compression competitive with GRIB2
  • Participate in development of new CF 2.0 conventions for climate and forecast simulation output and observational data in netCDF-4 form
  • Continue to encourage and support use of netCDF-4's enhanced data model by third-party developers

Beyond a one-year time frame, we plan to organize or take part in the following:

  • Implement DAP-4 client support in the netCDF C library
  • Provide thread-safety for the netCDF C library
  • Improve scalability to handle huge datasets and collections

Areas for Committee Feedback

The netCDF group is requesting your feedback on the following topics:

  1. Are there any HDF5 features that you wish netCDF supported?
  2. If netCDF compression were better than GRIB, would you still have uses for GRIB?
  3. Should netCDF be ported to and maintained for any other programming languages or development environments?

 

Relevant Metrics

There are currently about 140,500 lines of code in the netCDF C library source.

The Coverity estimate for defect density (the number of defects per thousand lines of code) in the netCDF C library source has been reduced slightly from 0.36 six months ago to 0.35 today. According to Coverity's analysis of over 250 million lines of code in open source projects that use their analysis tools, the average defect density for projects with 100,000 to 500,000 lines of code is 0.50.
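
To put the densities in absolute terms, the short sketch below converts defects per thousand lines into an implied defect count. It is only a rough illustration: it assumes that Coverity's density applies to the same roughly 140,500 lines quoted above, and that the line count six months ago was about the same, neither of which this report confirms. The function name `implied_defects` is made up for this example.

```python
def implied_defects(density_per_kloc: float, lines_of_code: int) -> float:
    """Return the defect count implied by a density in defects per 1,000 lines."""
    return density_per_kloc * lines_of_code / 1000.0


# Line count quoted in this report; assumed (not confirmed) to be the
# same code base that Coverity measured.
LINES_OF_CODE = 140_500

densities = [
    ("netCDF C, six months ago", 0.36),
    ("netCDF C, today", 0.35),
    ("Coverity average, 100K-500K lines", 0.50),
]

for label, density in densities:
    count = implied_defects(density, LINES_OF_CODE)
    print(f"{label}: roughly {count:.0f} defects")
```

Under these assumptions, the gap between the netCDF C library and the Coverity average for projects of similar size corresponds to a difference of about twenty defects.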

There were a record number of downloads this year (over 135,000), and a monthly record for downloads in February 2015 (13,265).

Google hit counts for a term such as netCDF-4 don't seem very useful as a long-term metric, because the algorithms for quickly estimating the number of web pages containing a specified term or phrase are proprietary and seem to change frequently. However, hit counts taken at one particular time may be useful for comparing popularity among a set of related terms. Current Google hit counts, for comparison, are:

  • 528,000 for netCDF-3
  • 511,000 for netCDF-4
  • 375,000 for HDF5
  • 132,000 for GRIB2
  • 273,000,000 for "Taylor Swift"

Google Scholar hits, which supposedly count appearances in peer-reviewed scholarly publications, are:

  • 213 for netCDF-3
  • 312 for netCDF-4
  • 5,260 for HDF5
  • 428 for GRIB2
  • 2,540 for "Taylor Swift"

Google Patent hits, computed by searching both filed and published patent applications, are:

  • 1,450 for netCDF-3
  • 1,350 for netCDF-4
  • 284 for HDF5
  • 3 for GRIB2
  • 42 for "Taylor Swift"

So, we finally found a metric where netCDF beats Taylor Swift, and by a large margin.