Re: [thredds] THREDDS Data Server serving from Amazon S3

To: Robert Casey <rob@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [thredds] THREDDS Data Server serving from Amazon S3
From: Jeff McWhirter <jeff.mcwhirter@xxxxxxxxx>
Date: Tue, 14 Jul 2015 14:00:15 -0600

Glacier could be used to store all the data that you need to keep
around but rarely, if ever, access - e.g., level-0 instrument output, raw
model output, etc. If your usage model can tolerate this kind of retrieval
latency, then the cost savings (1/10th) are significant.

This is where hiding the storage semantics behind a file system breaks
down. The application can't be agnostic of the underlying storage, as it
needs to support delays in staging data, communication of those delays to
the end-user, caching, etc.

-Jeff
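
As an illustration of the point above, here is a minimal sketch of an
application-level read path that is aware of a slow archive tier. It is not
RAMADDA or TDS code: TieredStore, StagingInProgress, and the in-memory
dictionaries standing in for the online (SSD/S3) and archive (Glacier) tiers
are all hypothetical names, and the restore delay is just a parameter
defaulting to the roughly 5 hours mentioned in this thread.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


class StagingInProgress(Exception):
    """Signals that the data exists but is still being restored."""

    def __init__(self, key: str, ready_at: float):
        super().__init__(f"{key} is being staged from the archive tier")
        self.key = key
        self.ready_at = ready_at  # time at which a retry should succeed


@dataclass
class TieredStore:
    # Assumption: restores take a fixed time; the thread cites ~5 hours.
    restore_delay: float = 5 * 3600.0
    clock: Callable[[], float] = time.time
    # Hypothetical in-memory stand-ins for the real storage back ends.
    online: Dict[str, bytes] = field(default_factory=dict)
    archive: Dict[str, bytes] = field(default_factory=dict)
    pending: Dict[str, float] = field(default_factory=dict)

    def read(self, key: str) -> bytes:
        # Fast path: the data is already in an immediately readable tier.
        if key in self.online:
            return self.online[key]
        if key not in self.archive:
            raise KeyError(key)
        now = self.clock()
        # The first request starts a restore; later requests reuse its deadline.
        ready_at = self.pending.setdefault(key, now + self.restore_delay)
        if now >= ready_at:
            # Restore finished: cache the data in the online tier.
            self.online[key] = self.archive[key]
            del self.pending[key]
            return self.online[key]
        # The caller has to tell the end-user to come back later.
        raise StagingInProgress(key, ready_at)
```

A caller would catch StagingInProgress and report ready_at to the user
instead of blocking, which is exactly the kind of behaviour a plain
file-system interface has no way to express.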



On Tue, Jul 14, 2015 at 1:35 PM, Robert Casey <rob@xxxxxxxxxxxxxxxxxxx>
wrote:

>
> Hi Jeff-
>
> Of note, Amazon Glacier is meant for infrequently needed data, so a
> call-up for data from that source will require something on the order of a
> 5 hour wait to retrieve to S3.  I think they are developing a near-line
> storage solution that is a bit more expensive to compete with Google's new
> near-line storage, which provides retrieval times on the order of seconds.
>
> -Rob
>
> On Jul 14, 2015, at 10:10 AM, Jeff McWhirter <jeff.mcwhirter@xxxxxxxxx>
> wrote:
>
> On this note -
> What I really want is a file system that can transparently manage  data
> between primary (SSD), secondary (S3) and tertiary (Amazon Glacier)
> stores.  Actively used data would migrate into primary storage. Old
> archived data moves off into cheaper tertiary storage. I've thought of
> implementing this at the application level in RAMADDA but a file system
> based approach would be much smarter.
>
> How do the archive folks on this list manage these kinds of storage
> environments?
>
> -Jeff
>
>
>
>
> On Tue, Jul 14, 2015 at 10:44 AM, John Caron <caron@xxxxxxxx> wrote:
>
>> Hi David:
>>
>> At the bottom of the TDM, we rely on RandomAccessFile. Do you know if S3
>> supports that abstraction (essentially POSIX file semantics, e.g. seek(),
>> read())? My guess is that S3 only allows complete file transfers (?)
>>
>> It would be worth investigating whether anyone has implemented a Java
>> FileSystemProvider for S3.
>>
>> Will have a closer look when I get time.
>>
>> John
>>
>> On Mon, Jul 13, 2015 at 7:59 PM, David Nahodil <David.Nahodil@xxxxxxxxxxx> wrote:
>>
>>> Hi all,
>>>
>>>
>>> We are looking at moving our THREDDS Data Server to Amazon EC2 instances
>>> with the data hosted on S3. I'm just wondering if anyone has tried using
>>> TDS with data hosted on S3?
>>>
>>>
>>> I had a quick back-and-forth with Sean at Unidata (see below) about this.
>>>
>>>
>>> Regards,
>>>
>>>
>>> David
>>>
>>>
>>> > > Unfortunately, I do not know of anyone who has done this, although
>>> we have had at least one other person ask. From what I understand, there is
>>> a way to mount S3 storage as a virtual file system, in which case I
>>> would *think* that the TDS would work as it normally does (depending on the
>>> kind of data you have).
>>>
>>>
>>> > We have considered mounting the S3 storage as a filesystem and running
>>> it like that. However, our feeling was that the tools were not really
>>> production ready and that we're really misrepresenting S3 by pretending it
>>> is a file system. So this is why we're investigating if anyone has used TDS
>>> with the S3 API directly.
>>>
>>>
>>> > > What kind of data do you have? Will your TDS also be in the cloud?
>>> Do you plan on serving the data inside of amazon to other EC2 instances, or
>>> do you plan on crossing the cloud/commodity web boundary with the data, in
>>> which case that could get very expensive quite quickly?
>>>
>>>
>>> > We have about 2 terabytes of marine and climate data that we are
>>> currently serving from our existing infrastructure. The plan is to move the
>>> infrastructure to Amazon Web Services so TDS would be hosted on EC2
>>> machines and the data on S3. We're hoping this setup should work okay, but
>>> we might still have a hurdle or two to come. :)
>>>
>>>
>>> > We have someone here who once wrote a plugin/adapter for TDS to work
>>> with an obscure filesystem that our data used to be stored on. So we have a
>>> little experience of what might be involved in doing the same with S3. We
>>> just wanted to make sure that, if anyone had already done some work on
>>> this, we made use of it.
>>>
>>> > > We very, very recently (as in a day ago) got some Amazon resources
>>> to play around on, but we won't have a chance to kick those tires until
>>> after our training workshops at the end of the month.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> thredds mailing list
>>> thredds@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe,  visit:
>>> http://www.unidata.ucar.edu/mailing_lists/
>>>
>>
>>
>> _______________________________________________
>> thredds mailing list
>> thredds@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe,  visit:
>> http://www.unidata.ucar.edu/mailing_lists/
>>
>
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit:
> http://www.unidata.ucar.edu/mailing_lists/
>
>
>

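On the RandomAccessFile question in the quoted thread: S3 GET requests
accept an HTTP Range header, so partial reads are possible in principle.
The sketch below shows how a seek()/read() view might be layered over a
store that only serves byte ranges. It is written in Python for brevity
rather than against the TDS/netCDF-Java API, and RangeReader, fetch_range
and in_memory_fetcher are hypothetical names. The fetcher here reads from an
in-memory blob so that the example runs anywhere; the S3 call named in the
comment is an assumption about how one might wire it up and is not
exercised here.

```python
from typing import Callable


class RangeReader:
    """Minimal seek()/read() view over a store that serves byte ranges."""

    def __init__(self, fetch_range: Callable[[int, int], bytes], size: int):
        # fetch_range(start, end_inclusive) is a hypothetical callback.
        # Against S3 it could issue a ranged GET, e.g. with boto3:
        #   s3.get_object(Bucket=..., Key=..., Range=f"bytes={start}-{end}")
        self._fetch = fetch_range
        self._size = size
        self._pos = 0

    def seek(self, offset: int, whence: int = 0) -> int:
        # Same whence convention as Python file objects:
        # 0 = from start, 1 = from current position, 2 = from end.
        if whence == 0:
            pos = offset
        elif whence == 1:
            pos = self._pos + offset
        elif whence == 2:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return pos

    def tell(self) -> int:
        return self._pos

    def read(self, n: int = -1) -> bytes:
        # A negative n means "read to the end of the object".
        end = self._size if n < 0 else min(self._pos + n, self._size)
        if end <= self._pos:
            return b""
        data = self._fetch(self._pos, end - 1)  # inclusive byte range
        self._pos += len(data)
        return data


def in_memory_fetcher(blob: bytes) -> Callable[[int, int], bytes]:
    """Stand-in for a remote store: serves inclusive byte ranges of blob."""
    def fetch(start: int, end: int) -> bytes:
        return blob[start:end + 1]
    return fetch
```

Every read() here turns into one range request, so a real adapter would
almost certainly want block-level caching or read-ahead on top; that ties
back to the point about applications not being agnostic of the storage
underneath them.
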
Follow-Ups:
- Re: [thredds] THREDDS Data Server serving from Amazon S3
  - From: Nathan Potter
- Re: [thredds] THREDDS Data Server serving from Amazon S3
  - From: Roy Mendelssohn - NOAA Federal

References:
- [thredds] THREDDS Data Server serving from Amazon S3
  - From: David Nahodil
- Re: [thredds] THREDDS Data Server serving from Amazon S3
  - From: John Caron
- Re: [thredds] THREDDS Data Server serving from Amazon S3
  - From: Jeff McWhirter
- Re: [thredds] THREDDS Data Server serving from Amazon S3
  - From: Robert Casey
