Upload and Download Support for TDS

As of version 5.0.0, TDS can be configured to support uploading and downloading files to and from the local file system using the "/thredds/download" url path. This is primarily intended to support local file materialization for server-side computing. The idea is that a component such as Jupyter can materialize files from TDS to make them available to code being run in Jupyter. Additionally, any final output from the code execution can be uploaded to a specific location in the TDS catalog to make it available externally.

Note that this functionality is not strictly necessary since it could all be done on the client side independent of TDS. It is, however, useful because the client does not need to duplicate code already available on the TDS server. This means that this service provides the following benefits to the client.

  1. It is lightweight with respect to the client
  2. It is language independent

Assumptions

The essential assumption for this service is that any external code using this service is running on the same machine as the Thredds server, or at least shares a file system with it, so that file system operations by Thredds are visible to the external code.

An additional assumption is that "nested" calls to the Thredds server will not cause a deadlock. This is how access to non-file datasets (e.g. via DAP2 or DAP4 or GRIB or NCML) is accomplished. That is, the download code on the server will do a nested call to the server to obtain the output of the request. Experimentation shows this is not currently a problem.

Supported File Formats

Currently the download service supports the creation of files in two formats:

  1. Netcdf classic (aka netcdf-3)
  2. Netcdf enhanced (aka netcdf-4)

Download Service Protocol

A set of query parameters controls the operation of this service. Note that all of the query parameter values (but not keys) are assumed to be url-encoded (%xx), so beware. Also, all return values are url-encoded.

Request and Reply

Invoking this service is accomplished using a URL pattern like this.

http://host:port/thredds/download/?key=value&key=value&...

In all cases, the reply value for the invocation will be of this form.

key=value&key=value&...

The specific keys depend on the invocation.
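
The reply format above can be decoded with a few lines of client code. The following is a minimal sketch in Python, not part of TDS itself; the function name parse_reply is hypothetical, and it assumes each key appears at most once in a reply.

```python
from urllib.parse import unquote


def parse_reply(reply):
    """Decode a 'key=value&key=value' reply into a dict.

    Assumption: values are %xx-encoded as described above, keys are not,
    and each key appears at most once.
    """
    result = {}
    for pair in reply.strip().split("&"):
        if not pair:
            continue  # tolerate a trailing or doubled '&'
        key, _, value = pair.partition("=")
        result[key] = unquote(value)
    return result
```

For example, a reply of "download=c%3A%2FTemp%2Fdownload%2Fnc3%2FtestData.nc3" would decode to a single "download" key holding the path "c:/Temp/download/nc3/testData.nc3".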

Defined Requests

The primary key is request. It indicates what action is requested of the server.

The set of defined values for the request key is as follows.

  • download
  • inquire

Request Keys Specific to "request=download"

  • format -- This specifies the format for the returned dataset; two values are currently defined: netcdf3 and netcdf4.

  • url -- This is a thredds server url specifying the actual dataset to be downloaded.

  • target -- This specifies the relative path for the downloaded file. If the file already exists, it will be overwritten. Any leading directories will be created underneath downloaddir (see below).

Reply Keys Specific to "request=download"

  • download -- The absolute path of the downloaded file. In all cases, it will be under the downloaddir directory.
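
As an illustration of how a client might assemble a download request from these keys, here is a minimal Python sketch. The helper name build_download_url and its parameters are hypothetical; only the key names and the "/thredds/download/" path come from the description above.

```python
from urllib.parse import quote

# Format names as used in the request examples in this post.
SUPPORTED_FORMATS = ("netcdf3", "netcdf4")


def build_download_url(server, dataset_url, target, fmt="netcdf3"):
    """Build a request=download URL with %xx-encoded parameter values.

    server      -- scheme, host and port of the TDS, e.g. "http://localhost:8081"
    dataset_url -- thredds url of the dataset to materialize
    target      -- relative path of the file to create under downloaddir
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported format: " + fmt)
    params = [
        ("request", "download"),
        ("format", fmt),
        ("target", target),
        ("url", dataset_url),
    ]
    # Encode values only; safe='' makes '/' and ':' encoded as well.
    query = "&".join(f"{key}={quote(value, safe='')}" for key, value in params)
    return f"{server.rstrip('/')}/thredds/download/?{query}"
```

Encoding with safe='' matters here because the url value itself contains "/" and ":" characters that would otherwise be left unencoded.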

Request Keys Specific to "request=inquire"

  • inquire -- This specifies a semicolon-separated list of keys whose values are desired. Currently, the only defined key is downloaddir, which returns the absolute path of the download directory. All downloaded files will be placed under this directory.

Reply Keys Specific to "request=inquire"

  • downloaddir -- The absolute path of the directory under which all downloaded files are placed.
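
An inquire exchange might be handled on the client along the following lines. This is a sketch under the assumptions stated in the comments; both function names are hypothetical and no network call is made.

```python
from urllib.parse import quote, unquote


def build_inquire_url(server, keys):
    """Build a request=inquire URL for a list of keys.

    Keys are joined with ';' and the joined value is %xx-encoded.
    """
    joined = ";".join(keys)
    return (f"{server.rstrip('/')}/thredds/download/"
            f"?request=inquire&inquire={quote(joined, safe='')}")


def download_dir_from_reply(reply):
    """Return the decoded downloaddir value from a reply, or None if absent."""
    for pair in reply.strip().split("&"):
        key, _, value = pair.partition("=")
        if key == "downloaddir":
            return unquote(value)
    return None
```

A client would typically issue this request once at startup and then resolve each download target relative to the returned directory.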

Upload Service Protocol

File upload is not handled directly by calling the Thredds server. Rather, it is handled by creating a directory that is to be scanned by the Thredds server to be made available at a specific point in the standard catalog.
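
Since upload amounts to placing a file in the scanned directory, client code can do it with an ordinary file copy. The sketch below assumes the caller already knows the upload directory path and has write permission to it; the function name publish_to_upload_dir is hypothetical.

```python
import shutil
from pathlib import Path


def publish_to_upload_dir(source, upload_dir, relative_name=None):
    """Copy a result file into the upload directory and return its new path.

    relative_name may include subdirectories; it defaults to the source
    file name. Paths that would land outside upload_dir are rejected.
    """
    upload_root = Path(upload_dir).resolve()
    name = relative_name or Path(source).name
    destination = (upload_root / name).resolve()
    if upload_root not in destination.parents:
        raise ValueError("destination escapes the upload directory")
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    return destination
```

The copied file becomes visible in the catalog only once the Thredds server scans the directory.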

Thredds Server Configuration

In order to activate upload and/or download, one or both of the following Java -D flags must be provided to the Thredds server.

  • -Dtds.download.dir -- Specify the absolute path of a directory into which files will be downloaded.
  • -Dtds.upload.dir -- Specify the absolute path of a directory into which files may be uploaded.

Security concerns (see below) must be addressed when setting the permission on these directories.

In order to complete the establishment of an upload directory, the following entry must be added to the catalog.xml file for the Thredds server.

<datasetScan name="Uploaded Files" ID="upload" location="${tds.upload.dir}" path="upload/">
    <metadata inherited="true">
      <serviceName>all</serviceName>
      <dataType>Station</dataType>
    </metadata>
</datasetScan>

Optionally, if one wants to make the download directory visible, the following can be added to the same file.

<datasetScan name="Downloaded Files" ID="download" location="${tds.download.dir}" path="download/">
    <metadata inherited="true">
      <serviceName>all</serviceName>
      <dataType>Station</dataType>
    </metadata>
</datasetScan>

Security Issues

It should be clear that providing upload and download capabilities can introduce security concerns.

The primary issue is that this service will cause the Thredds server to write into user-specified locations in the file system. In order to prevent malicious writing of files, the download directory (specified by tds.download.dir) should be created in a safe place. Typically, this means it should be placed under a directory such as "/tmp" on Linux or an equivalent location for other operating systems.

This directory will be read and written by the user running the Thredds server, typically "tomcat". The best practice is to create a specific user and group and set the download directory's user and group to those values. The Posix permissions for that directory should then be "rwxrwx---". Finally, the user "tomcat" should be added to the created group.

Corresponding concerns apply to the upload directory and so its owner, group, and permissions should be set similarly to the download directory.
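
A quick way to confirm that a directory carries the recommended "rwxrwx---" permissions is a small check like the one below. It is a sketch for Posix systems only; the function name check_directory_mode is hypothetical, and it does not verify owner or group membership.

```python
import os
import stat


def check_directory_mode(path, expected=0o770):
    """Return (matches, mode_string) for the permission bits of path.

    0o770 corresponds to "rwxrwx---": full access for owner and group,
    none for others.
    """
    st_mode = os.stat(path).st_mode
    matches = stat.S_IMODE(st_mode) == expected
    return matches, stat.filemode(st_mode)
```

Running this against both the download and upload directories after setup gives a simple sanity check before the server is started.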

The url used to specify the dataset to be downloaded also raises security concerns. The url is tested for two specific url patterns to ensure proper behavior.

  1. The pattern ".." is disallowed in order to avoid attempts to escape the thredds sandbox.
  2. The pattern "/download/" is disallowed in order to prevent an access loop in which a download call attempts to call download again.

In order to provide additional sandboxing, the url provided by the client is modified to ignore the host, port and servlet prefix. They are replaced with the "<host>:<port>/thredds" of the thredds server. This is to prevent attempts to use the thredds server to access external data sources, which would otherwise provide a security leak.
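
The two pattern checks and the host rewrite can be pictured with the following Python sketch. It is an illustration only, not the server's actual code: the function name sandbox_dataset_url is hypothetical, the url is assumed to be already decoded, and the servlet prefix is assumed to be the first path segment.

```python
from urllib.parse import urlsplit


def sandbox_dataset_url(client_url, server_base):
    """Validate a client-supplied url and rebase it onto the local server.

    client_url  -- decoded dataset url supplied by the client
    server_base -- e.g. "http://localhost:8081/thredds"
    """
    if ".." in client_url:
        raise ValueError("'..' is not allowed in the dataset url")
    if "/download/" in client_url:
        raise ValueError("'/download/' is not allowed in the dataset url")
    parts = urlsplit(client_url)
    # Drop scheme, host, port and the first path segment (assumed to be
    # the servlet prefix), keeping only the remainder of the path.
    segments = parts.path.lstrip("/").split("/", 1)
    remainder = segments[1] if len(segments) > 1 else ""
    rewritten = f"{server_base.rstrip('/')}/{remainder}"
    if parts.query:
        rewritten += "?" + parts.query
    return rewritten
```

With this approach, whatever host the client names, the nested request can only ever reach the local Thredds server.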

Finally, it is desirable that some additional access controls be applied. Specifically, Tomcat should be configured to require client-side certificates so that all clients using this service must have access to that certificate.

Examples

Example 1: Download a file (via fileServer protocol)

request:

http://localhost:8081/thredds/download/?request=download&format=netcdf3&target=nc3/testData.nc3&url=http://host:80/thredds/fileServer/localContent/testData.nc&testinfo=testdirs=d:/git/download/tds/src/test/resources/thredds/server/download/testfiles

reply:

download=c:/Temp/download/nc3/testData.nc3

Note: the encoded version of the request:

http://localhost:8081/thredds/download/?request=download&format=netcdf3&target=nc3%2FtestData.nc3&url=http%3A%2F%2Fhost%3A80%2Fthredds%2FfileServer%2FlocalContent%2FtestData.nc&testinfo=testdirs%3Dd%3A%2Fgit%2Fdownload%2Ftds%2Fsrc%2Ftest%2Fresources%2Fthredds%2Fserver%2Fdownload%2Ftestfiles

Example 2: Download a DAP2 request as a NetCDF-3 File

request:

http://localhost:8081/thredds/download/?request=download&format=netcdf3&target=testData.nc3&url=http://host:80/thredds/dodsC/localContent/testData.nc&testinfo=testdirs=d:/git/download/tds/src/test/resources/thredds/server/download/testfiles

reply:

download=c:/Temp/download/testData.nc3

Unidata Developer's Blog
A weblog about software development by Unidata developers*