Unidata Python Efforts

Status Report: October 2013 - March 2014

Sean Arms, Julien Chastang, Ben Domenico, Ward Fisher, Ryan May, Russ Rew

Python has been embraced by the earth science community for analysis, visualization and data exploration. Geoscience professionals are replacing collections of poorly integrated software tools and languages with this general purpose programming language that can handle remote data requests, statistics, analysis, and visualization. As a result, the Unidata 2018 Proposal highlights the Python programming language and ecosystem as an area where Unidata should focus efforts to benefit the core community. To that end, we have initiated Python training and software projects centered around existing Unidata technology.

Strategic Focus Areas

Python activity at Unidata supports the Unidata strategic goals in the following ways:

  1. Enable widespread, efficient access to geoscience data. Python can facilitate data-proximate computations and analyses through IPython Notebook technology. In particular, IPython Notebook web servers can be co-located with the data source so that analysis and visualization happen through a web browser. This capability, in turn, reduces the amount of data that must travel across computing networks.

  2. Develop and provide open-source tools for effective use of geoscience data. Our current and forthcoming Python efforts will facilitate analysis of geoscience data, chiefly by continuing to develop Python APIs tailored to Unidata technologies. For the fall 2013 Unidata training workshop, we developed an API to facilitate data access from a THREDDS data server. This effort was later encapsulated in the new pyUDL project (a collection of Python utilities for interacting with Unidata technologies). In addition, Unidata staff and collaborators are developing a pyUDL API to access satellite imagery from ADDE servers for subsequent analysis, visualization, and integration with other datasets. Moreover, Python coupled with the HTML5-based IPython Notebook has the potential to address "very large dataset" problems. In particular, an IPython Notebook could in principle be co-located with the data source and accessed via a web browser, allowing geoscience professionals to analyze data where the data reside without moving large amounts of information across networks. This concept fits well with the "Unidata in the cloud" vision. Lastly, as a general-purpose programming language, Python can analyze and visualize diverse data in one environment through numerous well-maintained open-source APIs.

  3. Provide cyberinfrastructure leadership in data discovery, access, and use. The TDS catalog crawling capabilities found in pyUDL will facilitate access to data remotely served by the Unidata TDS, as well as other TDS instances around the world. The goal of pyCDM is to construct a geoscience-focused data model in Python, based heavily on the netCDF-Java implementation of the Common Data Model (CDM). pyCDM is anticipated to provide a simple, Pythonic API to the higher-level functionality of the FeatureType layer of the CDM.

  4. Build, support, and advocate for the diverse geoscience community. Based on grassroots interest from the geoscience community, Unidata hosted a one-day training workshop aimed at leveraging Python to obtain and analyze data from the THREDDS data server. The workshop filled to capacity. Because of this promising start, we plan to expand the workshop to two days to explore "Python with Unidata technology" more broadly. In addition, we are now hosting Dr. Jeff Whitaker's netCDF-Python API on Unidata's GitHub account. Our aim is to raise the visibility of this project and foster increased code contributions from the geoscience open-source community.

Activities since last fall

TDS Python Workshop

Organized a well-attended workshop on Python with TDS technology. The workshop focused on geoscience analysis and visualization centered on Unidata technology, in particular netCDF and THREDDS.

netcdf4-python

  • Designated Jeff Whitaker's netcdf4-python library as the Python language bindings Unidata will recommend for the NetCDF library.

  • Migrated the project from Google Code to GitHub under the Unidata organization.

  • Moved the project from Subversion to Git and imported old issues into GitHub's issue tracker.

  • Unidata plans to help with support and open source management of the project, including hosting release downloads.

  • Shortly after moving to GitHub, the project already had pull requests (code contributions in git version control parlance) from the community, including one to enable automated testing.

Foundation work for Skew-T support in Matplotlib

  • Merged long-standing pull request #1664 into matplotlib, which paves the way for Skew-T plot support in matplotlib.

  • This feature should appear in matplotlib's 1.4.0 release.

  • An example use of this new feature is shown below:

[Figure: SkewT]

pyUDL Library

  • The library is currently focused on TDS access. It was originally spun off from the TDS Python workshop and later encapsulated into its own library.

  • There is functionality in place to interface with McIDAS ADDE servers. This work is in progress but will eventually enable satellite data viewing in Python and IPython environments.

  • pyUDL is now hosted on GitHub.
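The kind of TDS catalog access described above can be sketched with the Python standard library alone. The following is a minimal illustration, not pyUDL's actual API: the function name `list_opendap_urls`, the server `tds.example.org`, and the embedded sample catalog are all hypothetical. Only the THREDDS catalog XML namespace and the `service`/`dataset` element layout are taken from the THREDDS catalog format.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

# Hypothetical catalog, embedded so the sketch runs without network access.
SAMPLE_CATALOG = """\
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Hypothetical example catalog">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example collection">
    <dataset name="model_run_a.nc" urlPath="example/model_run_a.nc"/>
    <dataset name="model_run_b.nc" urlPath="example/model_run_b.nc"/>
  </dataset>
</catalog>
"""


def list_opendap_urls(catalog_xml, server):
    """Map dataset names to OPeNDAP URLs for one catalog document.

    Assumes a single, non-compound OPENDAP service entry; real catalogs
    can be more elaborate (nested services, catalog references).
    """
    root = ET.fromstring(catalog_xml)

    # Find the base path of the first OPeNDAP service, if any.
    base = None
    for service in root.iter(THREDDS_NS + "service"):
        if service.get("serviceType", "").upper() == "OPENDAP":
            base = service.get("base")
            break
    if base is None:
        return {}

    # Only datasets with a urlPath are directly accessible; container
    # datasets (collections) have no urlPath and are skipped.
    urls = {}
    for dataset in root.iter(THREDDS_NS + "dataset"):
        url_path = dataset.get("urlPath")
        if url_path:
            urls[dataset.get("name")] = server.rstrip("/") + base + url_path
    return urls


if __name__ == "__main__":
    found = list_opendap_urls(SAMPLE_CATALOG, "http://tds.example.org")
    for name, url in found.items():
        print(name, url)
```

Against a real server, a URL built this way could be passed to netcdf4-python's `netCDF4.Dataset`, provided the underlying netCDF library was built with OPeNDAP support.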

pyCWT

Cloud-based collaborative Python development

  • Wakari is a software vendor that provides web-based Python data analysis. As part of Unidata's first training workshop on software development using Python, we began to experiment with the Wakari cloud-hosted development solution. The objective is to enable server-side, data-proximate analysis and to simplify Python software installation for our user community. This work has continued, albeit at a slower pace, due to the departure of one of the main contributing software engineers.

Planned Activities

Ongoing Activities

We plan to continue the following paths of development and community engagement:

  • netcdf4-python

    • Help develop full support of the netCDF-4 data model.

    • Expose the ability to access data from the TDS using the CDM Remote access protocol.

  • OWSLib and Brokering

    • Since the training workshop, the cloud-based development has focused on using community-supported OWSLib tools to access data from OPeNDAP servers via a brokering layer that makes the data available through other standard interfaces, especially Web Map Service (WMS) and Web Coverage Service (WCS).

    • This collaborative effort continues as resources allow.

    • Unidata has been invited to participate in the Research Data Alliance (RDA) as a member of the brokering middleware governance working group.
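To illustrate what access through a standard interface such as WMS looks like from the client side, here is a minimal sketch that builds a WMS 1.3.0 GetMap request URL with the standard library. The endpoint `broker.example.org`, the layer name, and the helper `build_getmap_url` are hypothetical and do not describe the actual brokering deployment. The query parameter names are the ones defined by the WMS 1.3.0 specification. OWSLib wraps requests of this kind in a higher-level client.

```python
from urllib.parse import urlencode


def build_getmap_url(endpoint, layer, bbox, width=512, height=256,
                     image_format="image/png"):
    """Return a WMS 1.3.0 GetMap URL for one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 is used so the
    axis order is longitude, latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and layer name, for illustration only.
    print(build_getmap_url("http://broker.example.org/wms",
                           "air_temperature", (-130, 20, -60, 55)))
```

A WCS GetCoverage request follows the same key-value pattern with a different parameter set, which is part of why a brokering layer can expose one dataset through several such interfaces.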

New Activities

We plan to contribute to the Python ecosystem with the following effort:

  • pyCDM

    • Create an implementation of the Common Data Model (CDM) in Python.

    • Starting work on a proposal in anticipation of future RFPs.

    • Looking for collaborators.

Relevant Metrics

14 new issues were created for netcdf4-python in the 14 days after moving to GitHub, compared to 217 over the entire life of the project. This represents a significant increase in community participation in the project.