Unidata Python Efforts

Status Report: April 2013 - September 2014

Sean Arms, Julien Chastang, Ben Domenico, Ward Fisher, Ryan May, Russ Rew

Python has been embraced by the earth science community for analysis, visualization and data exploration. Geoscience professionals are replacing collections of poorly integrated software tools and languages with this general purpose programming language that can handle remote data requests, statistics, analysis, and visualization. As a result, the Unidata 2018 Proposal highlights the Python programming language and ecosystem as an area where Unidata should focus efforts to benefit the core community. To that end, we have initiated Python training and software projects centered around existing Unidata technology.

Strategic Focus Areas

Python activity at Unidata supports the Unidata strategic goals in the following ways:

  1. Enable widespread, efficient access to geoscience data. Python can facilitate data-proximate computations and analyses through IPython (now Jupyter) Notebook technology. In particular, IPython Notebook web servers can be co-located with the data source for analysis and visualization through web browsers. This capability, in turn, reduces the amount of data that must travel across computing networks. There are also external providers such as Wakari and coLaboratory that help to promote the use of this technology as a cloud service.

  2. Develop and provide open-source tools for effective use of geoscience data. Our current and forthcoming efforts in the Python arena will facilitate analysis of geoscience data. This goal will be achieved by continuing to develop Python APIs tailored to Unidata technologies. For the summer 2013 Unidata training workshop, we developed an API to facilitate data access from a THREDDS data server. This effort was later folded into the new pyUDL project (a collection of Python utilities for interacting with Unidata technologies). Moreover, Python coupled with HTML5 IPython Notebook technology has the potential to address "very large datasets" problems. In particular, an IPython Notebook can, in principle, be co-located with the data source and accessed via a web browser, thereby allowing geoscience professionals to analyze data where the data reside without having to move large amounts of information across networks. This concept fits nicely with the "Unidata in the cloud" vision. Lastly, as a general purpose programming language, Python has the capability to analyze and visualize diverse data in one environment through numerous, well-maintained open-source APIs.

  3. Provide cyberinfrastructure leadership in data discovery, access, and use. The TDS catalog crawling capabilities found in pyUDL will facilitate access to data remotely served by the Unidata TDS, as well as other TDS instances around the world. The goal of pyCDM is to construct a geoscience-focused data model in Python, based heavily on the netCDF-Java implementation of the Common Data Model (CDM). pyCDM is anticipated to provide a simple, pythonic API to the higher-level functionality of the FeatureType layer of the CDM.

  4. Build, support, and advocate for the diverse geoscience community. Based on grassroots interest from the geoscience community, Unidata, as part of its annual training workshop, will host a two-day session to explore "Python with Unidata technology". Also, to encourage the use of netCDF in Python, Unidata has promoted Jeff Whitaker's netcdf4-python project, including hosting its repository under Unidata's GitHub account.
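
As a rough illustration of the catalog crawling mentioned in item 3, the sketch below reads a small THREDDS catalog document and lists an OPeNDAP URL for each dataset it finds. It uses only the Python standard library and is not the pyUDL API. The sample catalog, the server name thredds.example.edu, and the function name list_opendap_urls are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# XML namespace of THREDDS InvCatalog 1.0 documents, in ElementTree notation.
TAG = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

# Hypothetical catalog, inlined so the example runs without network access.
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example catalog">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Model output">
    <dataset name="run_0000.nc" urlPath="model/run_0000.nc"/>
    <dataset name="run_1200.nc" urlPath="model/run_1200.nc"/>
  </dataset>
</catalog>
"""


def list_opendap_urls(catalog_xml, server):
    """Return (name, url) pairs for every dataset that has a urlPath."""
    root = ET.fromstring(catalog_xml.strip())

    # Find the base path of the first service advertised as OPeNDAP.
    base = None
    for service in root.iter(TAG + "service"):
        if service.get("serviceType", "").lower() == "opendap":
            base = service.get("base")
            break
    if base is None:
        raise ValueError("catalog does not advertise an OPeNDAP service")

    # Container datasets have no urlPath; only leaf datasets are returned.
    results = []
    for dataset in root.iter(TAG + "dataset"):
        url_path = dataset.get("urlPath")
        if url_path:
            results.append((dataset.get("name"), server + base + url_path))
    return results


for name, url in list_opendap_urls(SAMPLE_CATALOG, "http://thredds.example.edu"):
    print(name, url)
```

A URL of this form could then be handed to netCDF4.Dataset for remote access, assuming the underlying netCDF library was built with OPeNDAP support. A real crawler would also need to follow catalogRef links to nested catalogs, which this sketch leaves out.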

Activities since last spring

SciPy 2014

Ryan May and Julien Chastang attended the SciPy 2014 conference in Austin, TX. The atmospheric and oceanic sciences community continues to grow its presence, which is promising to see. Another common thread throughout the conference was the use of IPython, especially the notebook interface. It is clear that this technology is a vital and vibrant platform for development. Other notes:

  • The IRIS and Cartopy projects continue to be active and well-supported. Cartopy seems to be an excellent replacement for Basemap, with an API that better fits in with matplotlib. IRIS seems promising for working with data, but we lack experience with it in-house to fully evaluate and understand its capabilities.

  • The IPython project has been rebranded as "Project Jupyter" (Julia, Python, R). The goal of the rebrand is to emphasize its language-agnostic capabilities (many different languages can be used for computational kernels) and to foster a greater community around the concept fundamental to the project: a distributed, collaborative, and reproducible research environment. Part of the short-term work on this project will be to separate any remaining Python-specific parts of the core from the IPython kernel itself.

  • coLaboratory is based on Project Jupyter and provides an environment to collaborate on IPython notebooks through Google Drive.

  • With scientific reproducibility as a goal, "Conda" technology is meant to solve the somewhat bleak Python packaging problem. It is a system-level package manager that is cross-platform (Linux, OS X, Windows). Conda is Python-agnostic. It does not require administrator privileges. Conda installs binaries (no compilation required).

  • Binstar is the mechanism by which users and organizations share Conda packages.

  • Biggus is another SciTools project (like IRIS and Cartopy) that has emerged from the British Met Office. Biggus is for lazily handling very large arrays that cannot fit entirely into memory.

  • Julien Chastang presented a birds of a feather (BoF) session on Emacs and Python.

Unidata Python Workshop

We are organizing the Unidata Python Workshop. This workshop focuses on geoscience analysis and visualization centered around Unidata technology, in particular netCDF and THREDDS. In addition, the workshop introduces important tools from the scientific Python stack, such as git version control, numpy, and matplotlib. We continue to refine the materials used in the last workshop to make improvements and adjust their scope.

Cloud-based collaborative Python development

Wakari is a software vendor that provides web-based Python data analysis. As part of Unidata's first training workshop on software development using Python, we began to experiment with the Wakari cloud-hosted development solution. The objective here is to enable server-side, data-proximate analysis as well as to simplify the Python software installation process for our user community. This work has continued, albeit at a slower pace, due to the departure of one of the main contributing software engineers.

2014 Student Summer Internship

The Unidata Student Summer Internship program concluded its second year in August 2014. This year, two students participated in the program. One student, Florita Rodriguez from Texas A&M University, focused on using Python and the interactive widgets from IPython to interact with current and archived tropical storm and hurricane data from the National Hurricane Center. The project is open source and can be found under Unidata's GitHub account. More information can be found in Florita's blog post on the Unidata Developers Blog.

Planned Activities

Ongoing Activities

We plan to continue the following paths of development and community engagement:

  • netcdf4-python

    • Continue to supplement Jeff's user support as resources allow.

    • The move to GitHub has continued to yield increased community participation in terms of issues reported and submitted pull requests.

    • Help develop full support of the netCDF-4 data model.

    • Expose the ability to access data from the TDS using the CDM Remote access protocol.

  • OWSLib and Brokering

    • Since the training workshop, the cloud-based development has focused on using community-supported OWSLib tools for accessing data from OPeNDAP servers via a brokering layer that makes the data available via other standard interfaces, especially Web Map Service (WMS) and Web Coverage Service (WCS). Very recently, experimentation has begun with Sensor Observation Service (SOS) in the context of the ncSOS extension to TDS.

    • This collaborative effort continues as resources allow.

    • Unidata has been invited to participate in the Research Data Alliance (RDA) as a member of the brokering middleware governance working group.

  • matplotlib

    • Previously contributed code to enable Skew-T plots has been released with 1.4.0.

    • Plan to enhance animation support in matplotlib by adding a control toolbar. This request has been made by many in the matplotlib community, including Dr. Alex DeCaria of Millersville.

  • MetPy

    • Completed NEXRAD Level 2 / Level 3 decoders in support of testing these formats in netCDF-Java.

    • Need to develop a consistent internal data model (pyCDM?) for the library.
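
To make the OWSLib and Brokering item above more concrete, the sketch below shows the kind of WMS GetMap request that a brokering layer would answer. It builds the request URL with the Python standard library only and does not use the OWSLib API, which wraps this kind of request. The parameter names follow the WMS 1.3.0 standard; the endpoint, the layer name "temperature", and the function name build_getmap_url are hypothetical.

```python
from urllib.parse import urlencode


def build_getmap_url(endpoint, layer, bbox, width, height,
                     crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL for a single layer.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses
    longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = build_getmap_url(
    "http://thredds.example.edu/thredds/wms/model/run_0000.nc",
    "temperature",
    (-130, 20, -60, 55),
    800,
    400,
)
print(url)
```

Because the request is an ordinary HTTP GET, the same data can be reached from a browser, a GIS client, or a Python script, which is the interoperability the brokering work aims for.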

New Activities

We plan to contribute to the Python ecosystem with the following effort:

  • pyCDM

    • Create an implementation of the Common Data Model (CDM) in python.

    • Starting work on a proposal in anticipation of future RFPs.

    • Looking for collaborators.

    • In June, we met with Martin Schultz and Snehal Waychal from Forschungszentrum Jülich. They wanted to share with us their early development of a pyCDM library to facilitate their project, and they graciously shared their code with us.

  • IPython webGL-based visualization

    • Using IPython (without the notebook interface), we can interface Python analysis code on the server with JavaScript (and WebGL) code for visualization in the client.
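
One way to picture the server side of this arrangement is sketched below: Python analysis code packs a two-dimensional field into a JSON message that JavaScript in the browser could load into a typed array for WebGL rendering. This is only an assumption about how such an exchange might look, not a description of the planned implementation. The function name field_to_payload, the message layout, and the synthetic field are hypothetical, and the transport between kernel and browser is not shown.

```python
import json
import math


def field_to_payload(name, rows):
    """Pack a 2-D field (a list of equal-length rows) into a JSON string."""
    ny = len(rows)
    nx = len(rows[0]) if ny else 0
    if any(len(row) != nx for row in rows):
        raise ValueError("all rows must have the same length")

    # Flatten in row-major order so the client can index as y * nx + x.
    flat = [float(v) for row in rows for v in row]
    payload = {
        "name": name,
        "shape": [ny, nx],
        "min": min(flat) if flat else None,  # lets the client scale colors
        "max": max(flat) if flat else None,
        "values": flat,
    }
    return json.dumps(payload)


# Stand-in for a real analysis step: a small synthetic wave field.
field = [[math.sin(0.5 * x) * math.cos(0.5 * y) for x in range(4)]
         for y in range(3)]
message = field_to_payload("wave", field)
print(len(message), "characters of JSON ready to send to the client")
```

JSON keeps the sketch simple. For large fields a binary encoding would likely be preferable, since text serialization inflates the data that must cross the network.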

Relevant Metrics

33 issues and 22 pull requests created for netcdf4-python since 1 April 2014.