Re: [thredds] How to download bulk datasets?

I too wish there were a simple link on the server side that would let our users 
download all files of a collection with wget. 

I suspect that nearly every user will at some point want to download files in 
bulk from a thredds server, and I think it is reasonable for the provider to 
carry the burden of offering a simple link for bulk download of all granules. 

We can do some configuration or add a servlet for this, as Heiko has done, 
although I think it would be a nice-to-have feature directly in the TDS software. 
It seems to me that, for the simplest case, this could be implemented as a 
dynamic URL at the collection level that returns a list of HTTP download URLs 
('fileServer') of the files.
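
Until something like this exists on the server, a client-side approximation
is possible. The following is only a minimal sketch, not a TDS feature: it
assumes the catalog.xml uses the InvCatalog 1.0 namespace and declares an
HTTPServer service, and the function name, sample catalog and example.org
host are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS InvCatalog 1.0 documents (assumed for this sketch).
CATALOG_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical catalog, shaped like a datasetScan catalog.xml.
SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="all" serviceType="Compound" base="">
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/"/>
    <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  </service>
  <dataset name="ice" ID="met.no/ice">
    <dataset name="a.nc" urlPath="met.no/ice/a.nc"/>
    <dataset name="b.nc" urlPath="met.no/ice/b.nc"/>
    <dataset name="readme.txt" urlPath="met.no/ice/readme.txt"/>
  </dataset>
</catalog>"""


def file_server_urls(catalog_xml, server_root, suffix=".nc"):
    """Return HTTP download URLs for all datasets whose urlPath ends in suffix."""
    root = ET.fromstring(catalog_xml)

    # Find the base path of the first HTTPServer service (may be nested).
    base = None
    for service in root.iter("{%s}service" % CATALOG_NS):
        if service.get("serviceType", "").lower() == "httpserver":
            base = service.get("base", "")
            break
    if base is None:
        return []  # catalog offers no plain HTTP download

    prefix = server_root.rstrip("/") + "/" + base.strip("/") + "/"
    urls = []
    for dataset in root.iter("{%s}dataset" % CATALOG_NS):
        url_path = dataset.get("urlPath")
        if url_path and url_path.endswith(suffix):
            urls.append(prefix + url_path.lstrip("/"))
    return urls


if __name__ == "__main__":
    for url in file_server_urls(SAMPLE, "http://example.org"):
        print(url)
```

Saving the printed list to a file would let users fetch everything with
"wget -i urls.txt", which is roughly what a server-side URL list would offer.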

Comments?

Thanks,
-Jerry

> -----Original Message-----
> From: thredds-bounces@xxxxxxxxxxxxxxxx 
> [mailto:thredds-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Heiko Klein
> Sent: Monday, May 10, 2010 5:18 AM
> To: John Caron
> Cc: thredds@xxxxxxxxxxxxxxxx
> Subject: Re: [thredds] How to download bulk datasets?
> 
> Hi John,
> 
> I played a bit more with the catalog.xml. This works well 
> with wget. I managed now to download all the netcdf-files 
> from a directory:
> 
> wget -nc -r -l2 -A.nc -I /thredds/fileServer/,/thredds/catalog/ \
>   'http://dev-vm188/thredds/catalog/osisaf/met.no/ice/'
> 
> I use the existing datasetScan catalog.xml file here and 
> fetch all nc-files up to two links away. Besides each nc-file, 
> I also get the catalog page of that nc-file (e.g.
> http://dev-vm188/thredds/catalog/osisaf/met.no/ice/catalog.html?dataset=met.no/ice/ice_conc_nh_200911261200_CF.nc).
> 
> A catalog-file in the fileServer would be safer, since the 
> two levels (parent and child) might include other information, 
> but at least I can already offer our users something now.
> 
> 
> Best regards,
> 
> Heiko
> 
> On 2010-05-06 21:31, John Caron wrote:
> > Hi Heiko:
> > 
> > We use catalog.xml exactly because there's no standard html index format.
> > A simple java GUI app could make this easy to do, but I'm not clear if 
> > that would help your case.
> > 
> > John
> > 
> > On 5/6/2010 3:16 AM, Heiko Klein wrote:
> >> Hi John,
> >>
> >> I don't think there is a standard format for directory index / listings.
> >> Looking at the different implementations (Tomcat (DefaultServlet, 
> >> listings = true), Jetty (dirAllowed = true), Apache (mod_dir,
> >> DirectoryIndex)), the common pattern is that they all have links to 
> >> all (non-hidden) files in the directory, and not much more (possibly 
> >> a parent directory and some gifs/pngs differing between file and directory).
> >> Thredds listings of 'datasetScan' look very similar to the tomcat 
> >> listings, except that they link to the dataset-overview page, and not 
> >> to the fileServer page.
> >>
> >> RAMADDA looks like a solution for a completely different type of 
> >> user, except for the embedded ftp server.
> >>
> >> Best regards,
> >>
> >> Heiko
> >>
> >>
> >> On 2010-05-05 01:28, John Caron wrote:
> >>   
> >>> Hi Heiko:
> >>>
> >>> TDS specializes in the logical subsetting of datasets, so we haven't 
> >>> thought much about file downloading.
> >>>
> >>> The index is provided by THREDDS catalogs, e.g.
> >>>
> >>> view-source:http://thredds.met.no/thredds/catalog/data/met.no/ice-drift/catalog.xml
> >>>
> >>>
> >>>
> >>> If it was me, I would write a nice little client app to make it easy 
> >>> to select files and download. Perhaps we will throw one together.
> >>>
> >>> If there is some standard format for "index.html" that works with 
> >>> wget and other clients, perhaps we can provide that.
> >>>
> >>> Otherwise, RAMADDA is another good solution.
> >>>
> >>> John
> >>>
> >>> On 5/3/2010 3:47 AM, Heiko Klein wrote:
> >>>     
> >>>> Hi,
> >>>>
> >>>> we are moving more and more from our ftp-solutions to thredds with 
> >>>> http and opendap enabled.
> >>>>
> >>>> Some users complain about this solution, since it is no longer 
> >>>> possible to download bulk datasets, that is, all files in one 
> >>>> directory. Our ftp-server supported 'ls' and several ftp-clients 
> >>>> have support for that so e.g.
> >>>> ftp ftp.my.server
> >>>> $ cd directory
> >>>> $ mget *.nc
> >>>> worked well.
> >>>>
> >>>> There are some http-downloaders which support mirroring of a 
> >>>> directory, which would be comparable, but this requires a proper 
> >>>> directory-listing for the http-download.
> >>>>
> >>>> An example:
> >>>> http://thredds.met.no/thredds/catalog/data/met.no/ice-drift/
> >>>> contains daily files of several years. Two clicks further, 
> >>>> http://thredds.met.no/thredds/fileServer/data/met.no/ice-drift/ice-drift_ice_drift_nh_polstere-625_multi-oi_200912311200-201001021200.nc
> >>>> is one of those files.
> >>>>
> >>>> wget -r -l1 --no-parent -A.nc \
> >>>>   'http://thredds.met.no/thredds/fileServer/data/met.no/ice-drift/'
> >>>> was my best try to get all netcdf-files in the ice-drift catalog.
> >>>> Unfortunately, this requires an ice-drift/index.html (or
> >>>> directory-listing), which doesn't exist.
> >>>>
> >>>>
> >>>> Does anybody know of a solution to download several (hundred) 
> >>>> files from a thredds-server in a simple way?
> >>>> I even thought about aggregation, but as far as I can see, this doesn't 
> >>>> work with an http-downloader but requires an opendap client (e.g. 
> >>>> nco), which might be too complicated and might lead to errors if 
> >>>> products change over the years (better resolution, updated 
> >>>> metadata...)
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Heiko
> >>>>
> >>>> _______________________________________________
> >>>> thredds mailing list
> >>>> thredds@xxxxxxxxxxxxxxxx
> >>>> For list information or to unsubscribe,  visit:
> >>>> http://www.unidata.ucar.edu/mailing_lists/
> >>>>
> >>>>        
> > 
> 

