Re: [bww-users] Community-managed cloud storage services

NOTE: The bww-users mailing list is no longer active. The list archives are made available for historical reasons.

  • To: Mohan Ramamurthy <mohan@xxxxxxxx>
  • Subject: Re: [bww-users] Community-managed cloud storage services
  • From: Carlos Maltzahn <carlosm@xxxxxxxxxxxx>
  • Date: Mon, 15 Feb 2016 22:21:30 -0800
Thanks for the response, Mohan,

Here is what I mean by “community-managed”:

> “community-managed” means that the cost of the cloud storage service as well 
> as its usage is managed by an institution serving a (scientific) community, 
> including very large communities such as earth sciences or smaller ones such 
> as numerical weather prediction

I’m looking for governance models that increase the probability that valuable 
data sets stored on commercially provided cloud storage remain available.

Let’s take the BWW ensemble as an example and assume every one of the ensemble 
collaborators is uploading their ensemble contribution to, say, Amazon S3. 
Given the amount of data we expect ensemble collaborators to contribute per 
day, this is easy to accomplish and easily scales up to a large number of 
collaborators. But who pays for it? 

I can think of three models:

1. Contributor C pays. If C stops paying, the data set contributed by C 
   disappears.
2. Contributor C pays. If C stops paying, the data set contributed by C remains 
   available because payments to Amazon are now charged to a community-funded 
   pool. Availability remains until the pool runs out of money.
3. Contributor C pays. If C stops paying, the data set contributed by C remains 
   available because payments to Amazon are now charged to a pool funded by 
   one or more users of the data set. Availability remains until the pool 
   runs out of money.

The second alternative has the advantage that the data remains available, 
unless a poorly managed pool runs out of money. Discussing each case with 
members and community participants is a good way to start, but there might be 
many data sets, many authors, many users, and multiple commercial offerings. 
Eventually we will need a governance model with enforceable rules for how the 
pool gets financed and which data sets to select for preservation.
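
As a rough illustration of "until the pool runs out of money", here is a 
minimal Python sketch that estimates how many months a pool can keep paying. 
The function name, the monthly granularity, and the use of integer cents are 
all assumptions made for this example; a real pool would also have to handle 
changing prices and growing data volumes.

```python
def months_until_empty(balance, monthly_income, monthly_cost):
    """Return how many whole months the pool can pay the storage bill.

    All amounts are non-negative integers in cents (an assumption of this
    sketch). Returns None when income covers the cost, so the pool never
    runs dry.
    """
    if monthly_income >= monthly_cost:
        return None
    months = 0
    # Each month the pool receives its income and then pays the bill.
    while balance + monthly_income >= monthly_cost:
        balance += monthly_income - monthly_cost
        months += 1
    return months
```

For example, a pool holding 120000 cents with no income and a 10000-cent 
monthly bill lasts 12 months under these assumptions.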

The third alternative involves data set usage metering and potentially many 
pools instead of one: users are billed in proportion to their usage of a data 
set once the author stops paying for it. If there are no users, the data 
disappears. Usage metering comes almost for free once scientific workflows are 
fully integrated with distributed versioning infrastructures like Git and 
GitHub. This model might be less flexible than the second one and allows for 
more automation, but it can probably be gamed.
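
Billing in proportion to usage could look roughly like the following Python 
sketch. It assumes usage is metered as a simple count per user (for example, 
downloads) and that costs are tracked in integer cents; the function name and 
the largest-remainder rounding rule are choices made for this example, not 
part of any existing service.

```python
def split_storage_cost(monthly_cost_cents, usage_by_user):
    """Split one data set's monthly cost across users by metered usage.

    usage_by_user maps a user name to a non-negative usage count.
    Returns a dict of user -> cents owed, or None if nobody used the data
    set (under the third model the data set would then disappear).
    """
    total = sum(usage_by_user.values())
    if total == 0:
        return None
    bills = {}
    remainders = {}
    for user, count in usage_by_user.items():
        # Integer division keeps whole cents; the remainder is used below.
        bills[user], remainders[user] = divmod(monthly_cost_cents * count, total)
    # Hand out the cents lost to rounding, largest remainder first
    # (ties are broken by user name so the result is deterministic).
    leftover = monthly_cost_cents - sum(bills.values())
    ranked = sorted(remainders, key=lambda u: (-remainders[u], u))
    for user in ranked[:leftover]:
        bills[user] += 1
    return bills
```

The bills always add up to the full monthly cost, so the provider is paid in 
full as long as every user pays. Because the shares depend only on metered 
counts, a user could lower their bill by routing access through someone else, 
which is one way such a scheme can be gamed.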

I’m sure there exist many other models. I think it’s worthwhile to discuss 
these in this forum (and maybe even try out some) because I believe these 
models are key to how the Big Weather Web can leverage commercial cloud storage 
and other services and ultimately empower small scientific communities as well 
as large ones.

Carlos

> On Feb 14, 2016, at 5:52 AM, Mohan Ramamurthy <mohan@xxxxxxxx> wrote:
> 
> It is true that OCC is operating its own infrastructure and not leveraging 
> commercial cloud infrastructure. Not sure what you mean by community-managed, 
> but the projects that are hosted and the decisions on services, data sets, 
> etc. are based on discussions with members and community participants.
> 
> Mohan
> 
> On 2/13/16 9:09 PM, Carlos Maltzahn wrote:
>> Reading the About page <http://occ-data.org/about/>:
>> 
>>> To better understand our role, it is helpful to divide projects in these 
>>> areas into three groups:
>>> Individual researchers and small projects typically do not need much 
>>> computing infrastructure and can either operate their own or use a public 
>>> cloud service provider such as Amazon.
>>> The OCC is designed to serve medium to large size research projects by 
>>> managing and operating a cloud computing infrastructure that can be shared 
>>> across these projects.
>>> Very large research projects, such as the LHC, the LSST, and the OOI, 
>>> typically develop their own dedicated computing infrastructure.
>> It sounds like OCC is building its own cloud infrastructure instead of 
>> leveraging commercial cloud providers. Also, the membership benefits 
>> <http://occ-data.org/images/occ-fees-2016.pdf> do not include actual usage 
>> of the infrastructure and are more about participating in a standardization 
>> effort.
>> 
>> I’m interested in models of “community-managed” cloud storage services where 
>> the management involves cost and usage (as opposed to operating the hardware 
>> infrastructure). But I couldn’t find anything on the OCC web site that 
>> addresses that. 
>> 
>> Or am I missing something? 
>> 
>> Carlos
>> 
>>> On Feb 11, 2016, at 2:49 PM, Mohan Ramamurthy <mohan@xxxxxxxx> wrote:
>>> 
>>> On 2/11/16 3:44 PM, Scott Collis wrote:
>>>> So is this along the same lines as AWS S3?
>>>> 
>>> Yes.
>>>> Does it still rely on a download and compute framework?
>>>> 
>>>> 
>>> At the moment, this is true, but we are working to develop 
>>> data-proximate, server-side processing/analysis capabilities by moving our 
>>> wares (and client tools) to the cloud, and through the development and 
>>> implementation of the DAP4 protocol, which supports asynchronous computing 
>>> capabilities.
>>> 
>>> Mohan
>>>>> Mohan Ramamurthy <mailto:mohan@xxxxxxxx>   February 11, 2016 at 4:42 PM
>>>>> Carlos,
>>>>> 
>>>>> Unidata is working with the Open Commons Consortium 
>>>>> (<http://occ-data.org/>), which 
>>>>> provides "community-managed" cloud storage and computing services. At the 
>>>>> moment, Unidata's collaboration with OCC is focused on the NOAA Big Data 
>>>>> project, but we expect that to grow beyond the scope of that project.
>>>>> 
>>>>> Mohan
>>>>> 
>>>>> On 2/11/16 2:34 PM, Carlos Maltzahn wrote:
>>>>> 
>>>>> _______________________________________________
>>>>> bww-users mailing list
>>>>> bww-users@xxxxxxxxxxxxxxxx
>>>>> For list information, to unsubscribe, or change your membership options, 
>>>>> visit: http://www.unidata.ucar.edu/mailing_lists/
>>>>> Carlos Maltzahn <mailto:carlosm@xxxxxxxx>   February 11, 2016 at 4:34 PM
>>>>> All,
>>>>> 
>>>>> This is a request for examples of community-managed cloud storage 
>>>>> services where
>>>>> 
>>>>> “community-managed” means that the cost of the cloud storage service as 
>>>>> well as its usage is managed by an institution serving a (scientific) 
>>>>> community, including very large communities such as earth sciences or 
>>>>> smaller ones such as numerical weather prediction, and
>>>>> “cloud storage services” are commercial, highly available “pay-as-you-go” 
>>>>> services that provide safe and economic storage of large amounts of data 
>>>>> and allow global sharing of that data controlled by the party who pays, 
>>>>> but disappear as soon as payment for these services stops.
>>>>> 
>>>>> Today commercial cloud storage services are readily available and 
>>>>> successfully hide the many technical challenges of highly available 
>>>>> long-term storage at very attractive cost. Cloud storage also provides an 
>>>>> excellent platform for naming and sharing large (and small) data sets 
>>>>> which is essential for collaboration and reproducibility in 
>>>>> data-intensive scientific disciplines. Yet science communities are slow 
>>>>> to adopt cloud storage. There are probably many reasons for that, but one 
>>>>> that I have repeatedly come across is that the data stored in cloud 
>>>>> storage disappears when funding for the service runs out. 
>>>>> 
>>>>> If the availability of a particular data set depends on a single 
>>>>> community member's availability of funding, the likelihood of losing 
>>>>> data can be quite high, which makes cloud storage too brittle to be a 
>>>>> reliable medium for scientific data. A better approach might be to make 
>>>>> the availability of all data sets depend on the availability of funding 
>>>>> within an entire community. Such an arrangement would benefit that 
>>>>> community by facilitating data sharing and collaboration and by 
>>>>> maintaining greater reproducibility of scientific results. 
>>>>> 
>>>>> But community-funded cloud storage has all the management challenges of a 
>>>>> commons. For example, how should the storage space be governed? How much 
>>>>> money should the community spend on cloud storage? How is the money 
>>>>> raised among the members of the community? How do communities prevent The 
>>>>> Tragedy of the Commons 
>>>>> <https://en.wikipedia.org/wiki/Tragedy_of_the_commons>?
>>>>> 
>>>>> Please let me know of any examples you are aware of. Who is working on 
>>>>> this? Do examples exist with somewhat different definitions of 
>>>>> “community-managed” or “cloud storage services”?
>>>>> 
>>>>> Thanks,
>>>>> Carlos
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Carlos Maltzahn
>>>>> Adjunct Professor 
>>>>> Computer Science Department
>>>>> University of California, Santa Cruz
>>>>> <http://users.soe.ucsc.edu/~carlosm/>
