Re: [thredds] How to download bulk datasets?

To: John Caron <caron@xxxxxxxxxxxxxxxx>
Subject: Re: [thredds] How to download bulk datasets?
From: Heiko Klein <Heiko.Klein@xxxxxx>
Date: Mon, 10 May 2010 11:17:35 +0200

Hi John,

I played a bit more with the catalog.xml, and it works well with wget. I
have now managed to download all the netCDF files from a directory:

wget -nc -r -l2 -A.nc -I /thredds/fileServer/,/thredds/catalog/ \
  'http://dev-vm188/thredds/catalog/osisaf/met.no/ice/'

Here I use the existing datasetScan catalog.xml file and fetch all
nc-files up to two links away. Besides each nc-file, I also get the
catalog page of that nc-file (e.g.
http://dev-vm188/thredds/catalog/osisaf/met.no/ice/catalog.html?dataset=met.no/ice/ice_conc_nh_200911261200_CF.nc).
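
For those who would rather skip the extra catalog pages, here is a rough
Python sketch that reads a catalog.xml and lists only the fileServer URLs of
the nc-files. It is a sketch under assumptions, not a tested client: the
function name file_urls is made up for illustration, and it assumes the
catalog uses the InvCatalog 1.0 namespace, advertises a service with
serviceType "HTTPServer", and gives each file dataset a urlPath attribute.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Assumed namespace of THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def file_urls(catalog_xml, catalog_url, suffix=".nc"):
    """List download URLs for all datasets in a catalog whose path ends in suffix.

    catalog_xml is the catalog document (str or bytes); catalog_url is the
    address it was fetched from, used to resolve the server-relative base.
    """
    root = ET.fromstring(catalog_xml)

    # Look for the plain-HTTP download service (may be nested in a compound one).
    base = None
    for service in root.iter("{%s}service" % THREDDS_NS):
        if service.get("serviceType", "").lower() == "httpserver":
            base = service.get("base")
            break
    if base is None:
        raise ValueError("catalog does not advertise an HTTPServer service")

    urls = []
    for dataset in root.iter("{%s}dataset" % THREDDS_NS):
        path = dataset.get("urlPath")  # container datasets have no urlPath
        if path and path.endswith(suffix):
            urls.append(urljoin(catalog_url, base + path))
    return urls
```

The returned list could be written to a text file and handed to
"wget -nc -i", so that wget only ever sees the data files themselves.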

A catalog-file in the fileServer would be safer, since the two levels
(parent and child) might include other information, but at least I can
already offer our users something.
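
For users without wget, the no-clobber behaviour of "wget -nc" can be
approximated in a few lines of Python. This is only a minimal sketch using the
standard library; the function name download_all and the ".part" temporary
suffix are my own choices for illustration, and it assumes each URL ends in a
plain file name.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlsplit


def download_all(urls, dest_dir):
    """Fetch each URL into dest_dir, skipping files that already exist.

    Returns the list of local paths that were newly downloaded.
    """
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for url in urls:
        name = os.path.basename(urlsplit(url).path)
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # already present from an earlier run, like wget -nc
        partial = target + ".part"
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        # Rename only after the transfer finished, so that an interrupted
        # download is retried on the next run instead of being skipped.
        os.replace(partial, target)
        fetched.append(target)
    return fetched
```

Running it a second time with the same list only fetches files that are still
missing, which is handy for directories that grow by one file per day.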


Best regards,

Heiko

On 2010-05-06 21:31, John Caron wrote:
> Hi Heiko:
> 
> We use catalog.xml exactly because there's no standard html index format.
> A simple java GUI app could make this easy to do, but I'm not clear if
> that would help your case.
> 
> John
> 
> On 5/6/2010 3:16 AM, Heiko Klein wrote:
>> Hi John,
>>
>> I don't think there is a standard format for directory index / listings.
>> Looking at the different implementations (Tomcat (DefaultServlet,
>> listings = true), Jetty (dirAllowed = true), Apache (mod_dir,
>> DirectoryIndex)) the common pattern is, that they all have links to all
>> (non-hidden) files in the directory, and not much more (possibly parent
>> directory and some gifs/png differing between file and directory).
>> Thredds listings of 'datasetScan' look very similar to the tomcat
>> listings, except that they link to the dataset-overview page, and not to
>> the fileServer page.
>>
>> RAMADDA looks like a solution for a completely different type of user,
>> except for the embedded ftp server.
>>
>> Best regards,
>>
>> Heiko
>>
>>
>> On 2010-05-05 01:28, John Caron wrote:
>>   
>>> Hi Heiko:
>>>
>>> TDS specializes in the logical subsetting of datasets, so we haven't
>>> thought much about file downloading.
>>>
>>> The index is provided by THREDDS catalogs, eg
>>>
>>> view-source:http://thredds.met.no/thredds/catalog/data/met.no/ice-drift/catalog.xml
>>>
>>>
>>>
>>> If it was me, I would write a nice little client app to make it easy to
>>> select files and download. Perhaps we will throw one together.
>>>
>>> If there is some standard format for "index.html" that works with wget
>>> and other clients, perhaps we can provide that.
>>>
>>> Otherwise, RAMADDA is another good solution.
>>>
>>> John
>>>
>>> On 5/3/2010 3:47 AM, Heiko Klein wrote:
>>>     
>>>> Hi,
>>>>
>>>> we are moving more and more from our ftp-solutions to thredds with http
>>>> and opendap enabled.
>>>>
>>>> Some users complain about this solution, since it is no longer possible
>>>> to download bulk datasets, that is, all files in one directory. Our
>>>> ftp-server supported 'ls' and several ftp-clients have support for that
>>>> so e.g.
>>>> ftp ftp.my.server
>>>> $ cd directory
>>>> $ mget *.nc
>>>> worked well.
>>>>
>>>> There are some http-downloaders which support mirroring of a directory,
>>>> which would be comparable, but this requires a proper directory-listing
>>>> for the http-download.
>>>>
>>>> An example:
>>>> http://thredds.met.no/thredds/catalog/data/met.no/ice-drift/
>>>> contains daily files of several years. Two clicks further,
>>>> http://thredds.met.no/thredds/fileServer/data/met.no/ice-drift/ice-drift_ice_drift_nh_polstere-625_multi-oi_200912311200-201001021200.nc
>>>>
>>>>
>>>> is one of those files.
>>>>
>>>> wget -r -l1 --no-parent -A.nc \
>>>>   'http://thredds.met.no/thredds/fileServer/data/met.no/ice-drift/'
>>>> was my best try to get all netcdf-files in the ice-drift catalog.
>>>> Unfortunately, this requires an ice-drift/index.html (or
>>>> directory-listing) which doesn't exist.
>>>>
>>>>
>>>> Does anybody know of a solution to download several (hundred) files
>>>> from a thredds-server in a simple way?
>>>> I even thought about aggregation, but as far as I can see, this doesn't
>>>> work with the http-downloader but requires an opendap client (e.g. nco),
>>>> which might be too complicated, and might lead to errors if products
>>>> change over the years (better resolution, updated metadata...)
>>>>
>>>> Best regards,
>>>>
>>>> Heiko
>>>>
>>>> _______________________________________________
>>>> thredds mailing list
>>>> thredds@xxxxxxxxxxxxxxxx
>>>> For list information or to unsubscribe,  visit:
>>>> http://www.unidata.ucar.edu/mailing_lists/
>>>>
>

