Re: [thredds] Catalog example for AWS S3 resource?

  • To: "H. Joe Lee" <hyoklee@xxxxxxxxxxxx>
  • Subject: Re: [thredds] Catalog example for AWS S3 resource?
  • From: Sean Arms <sarms@xxxxxxxx>
  • Date: Tue, 17 Mar 2020 17:34:18 -0600
Greetings Joe!

Sorry for the delay.

>   Two more questions (or requests if they don't work yet) about THREDDS 
> catalog example:
>
>   1) Will NcML work on S3, with the NcML pointing to an s3:// URL in the 
> "<netcdf location=" value?
>   2) Can THREDDS scan an S3 path? For example, the key value is not a specific 
> file (e.g., s3://unidata-bucket/testfiles/test.h5) but an S3 path (e.g., 
> s3://unidata-bucket/testfiles/).

Aggregations (NcML or otherwise) and scans do not work on S3 just yet.
I need to write some code to scan a bucket and work out which granules
are there. That can be tricky to do performance-wise, and tracking
updates will perhaps be trickier. The naive approach would be to scan
the entire bucket each time the aggregation is built (and try to be
good about caching where we can). While that works reasonably well for
small collections on disk (<100 files or so), I don't think it will do
well on S3...but that's what I'll do as a first pass. Maybe I'll be
surprised, but I doubt it.
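
As a rough illustration of the naive approach described above (list
everything under a prefix, cache the result for a while), here is a
minimal Python sketch. It is not TDS or netCDF-Java code. The
list_page callable is a hypothetical stand-in for a paginated S3
listing call, and the class name, TTL value, and clock parameter are
all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a paginated S3 listing call. It takes
# (prefix, continuation_token) and returns (keys, next_token or None).
ListPage = Callable[[str, Optional[str]], Tuple[List[str], Optional[str]]]


class CachedBucketScanner:
    """Naive full scan of a prefix, with a time-based cache of the key list."""

    def __init__(self, list_page: ListPage, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._list_page = list_page
        self._ttl = ttl_seconds
        self._clock = clock
        # prefix -> (time of last scan, sorted keys found by that scan)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def scan(self, prefix: str) -> List[str]:
        now = self._clock()
        cached = self._cache.get(prefix)
        if cached is not None and now - cached[0] < self._ttl:
            # Cache is still fresh: skip the listing calls entirely.
            return list(cached[1])

        # Cache miss or expired entry: walk every page under the prefix.
        keys: List[str] = []
        token: Optional[str] = None
        while True:
            page, token = self._list_page(prefix, token)
            keys.extend(page)
            if token is None:
                break
        keys.sort()
        self._cache[prefix] = (now, keys)
        return list(keys)
```

The cost of this approach grows with the number of objects under the
prefix, since every refresh walks all pages again, which is the
performance concern raised above. New or removed objects are also not
noticed until the cache entry expires.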

Cheers!

Sean

>
>   Anyway, thank you so much for the useful new feature, fast response, and 
> bug fixes!
>
> --
> Datafy everything in HDF for faster AI.
>
>
>
>
> On Tue, Mar 10, 2020 at 9:31 AM Sean Arms <sarms@xxxxxxxx> wrote:
>>
>> Greetings Joe,
>>
>> Thank you for the report! I just merged a fix for the
>> NegativeArraySizeException issue, and have enabled ToolsUI's NcML tab
>> to work with S3 objects (you just need to make sure the "modes ->
>> NetcdfFile -> use builders" menu option is checked). Keep in mind this
>> is a first pass, and not optimized at all at this point.
>>
>> Cheers,
>>
>> Sean
>>
>>
>> On Mon, Mar 9, 2020 at 3:08 PM H. Joe Lee <hyoklee@xxxxxxxxxxxx> wrote:
>> >
>> >  Thanks, Sean!
>> >
>> >   Both the .war file and the sample catalog.xml worked like a charm.
>> >   For example, I could visualize MOP03T v7 on S3 using Panoply via THREDDS 
>> > OPeNDAP.
>> >   The Unidata Java team is amazing!
>> >
>> >   So far, I have found two issues, though:
>> >
>> >   1) The toolsUI NcML tab doesn't work with s3:// URLs.
>> >   2) It can't open a huge (15G~40G) netCDF-4 file like TerraFusion [1].
>> >
>> >  Here's the error message that I got when I opened TerraFusion:
>> >
>> > Error {
>> >     code = 500;
>> >     message = "com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NegativeArraySizeException";
>> > };
>> >
>> >   Sincerely,
>> >
>> >
>> > [1] https://registry.opendata.aws/terrafusion/
>> > --
>> > Datafy everything in HDF for faster AI.
>> >
>> >
>> >
>> >
>> > On Mon, Mar 9, 2020 at 2:43 PM Sean Arms <sarms@xxxxxxxx> wrote:
>> >>
>> >> Greetings Joe,
>> >>
>> >> I recently split the netCDF-Java and TDS codebases into their own
>> >> repositories, and the repository holding the appropriate TDS code is
>> >> located at:
>> >>
>> >> https://github.com/Unidata/tds
>> >>
>> >> If you build the current master branch, you'll have everything you
>> >> need at this point. The most recent snapshot should work as well:
>> >>
>> >> https://artifacts.unidata.ucar.edu/repository/unidata-snapshots/edu/ucar/tds/5.0.0-SNAPSHOT/tds-5.0.0-20200308.175757-566.war
>> >>
>> >> (just be sure to rename it to thredds.war before deploying it).
>> >>
>> >> The sample catalog I added to our integration tests for the TDS can be
>> >> found here:
>> >>
>> >> https://github.com/Unidata/tds/blob/master/tds/src/test/content/thredds/tds-s3.xml
>> >>
>> >> Cheers,
>> >>
>> >> Sean
>> >>
>> >>
>> >> On Mon, Mar 9, 2020 at 8:39 AM H. Joe Lee <hyoklee@xxxxxxxxxxxx> wrote:
>> >> >
>> >> >   Thanks, Ethan!
>> >> >
>> >> >   It's so cool to see toolsUI can access NASA HDF-EOS5 on S3.
>> >> > I hope both IDV and Panoply can use the new netCDF-Java soon, too.
>> >> >
>> >> >   By the way, will the master branch of THREDDS use the latest 
>> >> > netCDF-Java?
>> >> > If not, what should I modify in the THREDDS source code to build
>> >> > THREDDS with a netCDF-Java snapshot?
>> >> >
>> >> >   I'm very excited to try the new THREDDS catalog with S3 datasetRoot 
>> >> > path!
>> >> >
>> >> > Sincerely,
>> >> >
>> >> > --
>> >> > Datafy everything in HDF for faster AI.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Mar 4, 2020 at 10:52 AM Ethan Davis <edavis@xxxxxxxx> wrote:
>> >> >>
>> >> >> Hi Joe,
>> >> >>
>> >> >> [Sorry for the delayed response.]
>> >> >>
>> >> >> The S3 work moved to the Unidata/netCDF-java repo in PR #173 ("S3 
>> >> >> Support"). This PR got merged into master a week or so ago and is 
>> >> >> available in the netCDF-Java 5.3.0-SNAPSHOT release (and will be in 
>> >> >> the upcoming 5.3.0 release). The latest TDS code built with 
>> >> >> netCDF-Java 5.3.0-SNAPSHOT can be configured to serve an individual 
>> >> >> netCDF file stored as an S3 object using a datasetRoot configuration, 
>> >> >> e.g.
>> >> >>
>> >> >>
>> >> >> <?xml version="1.0" encoding="UTF-8"?>
>> >> >> <catalog name="Test TDS S3"
>> >> >>   xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
>> >> >>   xmlns:xlink="http://www.w3.org/1999/xlink"
>> >> >>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >> >>   xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
>> >> >>     https://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.6.xsd">
>> >> >>
>> >> >>   <datasetRoot path="s3-test" location="s3://noaa-goes16" />
>> >> >>
>> >> >>   <dataset name="Test GOES-16 S3" ID="testS3Grid"
>> >> >>      urlPath="s3-test/ABI-L1b-RadC/2019/363/21/OR_ABI-L1b-RadC-M6C16_G16_s20193632101189_e20193632103574_c20193632104070.nc"
>> >> >>            dataType="Grid"/>
>> >> >>
>> >> >> </catalog>
>> >> >>
>> >> >>
>> >> >> In this case, the datasetRoot location is the bucket name, and the 
>> >> >> urlPath is the datasetRoot path combined with the key. We rely on the 
>> >> >> AWS Java SDK (v2) to handle credentials, setting of region, etc. For 
>> >> >> now, you can set the region by creating a credentials file 
>> >> >> ~/.aws/credentials that looks like:
>> >> >>
>> >> >>
>> >> >> [default]
>> >> >>
>> >> >> region=us-east-1
>> >> >>
>> >> >>
>> >> >> This is how netCDF-Java knows which region to use for bucket access. 
>> >> >> We may look at other mechanisms to make that a bit more integrated 
>> >> >> into TDS configuration, but for now that should work.
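
Before starting the TDS, it can be handy to confirm that the file
really contains a region for the default profile. The Python sketch
below is only a sanity check written for this example. It is not how
netCDF-Java or the AWS SDK reads the file, and the function name
read_region is hypothetical.

```python
import configparser
from typing import Optional


def read_region(text: str, profile: str = "default") -> Optional[str]:
    """Return the region set for a profile in INI-style text, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        return None
    return parser.get(profile, "region", fallback=None)
```

To check the real file, pass in the contents of
os.path.expanduser("~/.aws/credentials") read as text.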
>> >> >>
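The datasetRoot mapping described above (the location is the bucket,
and the urlPath is the datasetRoot path followed by the key) can be
sketched in Python as follows. This only illustrates the naming
convention and is not TDS code. The function name is hypothetical, and
it assumes the datasetRoot path is a single path segment, as in the
example catalog.

```python
from typing import Dict


def resolve_url_path(url_path: str, dataset_roots: Dict[str, str]) -> str:
    """Map a catalog urlPath to an s3:// location via datasetRoot entries.

    dataset_roots maps a datasetRoot path (e.g. "s3-test") to its
    location (e.g. "s3://noaa-goes16").
    """
    # Assumption: the datasetRoot path is the first segment of the urlPath.
    root_path, sep, key = url_path.partition("/")
    if not sep or root_path not in dataset_roots:
        raise ValueError("no datasetRoot matches urlPath: " + url_path)
    return dataset_roots[root_path].rstrip("/") + "/" + key
```

With the catalog above, the urlPath starting with s3-test/ resolves to
the same key under s3://noaa-goes16.
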
>> >> >>
>> >> >> Once the netCDF-Java 5.3.0 release comes out, TDS snapshot builds will 
>> >> >> be built with this capability. For now, you would need to build the 
>> >> >> TDS and explicitly tell it to build with netCDF-Java 5.3.0-SNAPSHOT.
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> Ethan
>> >> >>
>> >> >> On Tue, Feb 4, 2020 at 2:30 PM H. Joe Lee <hyoklee@xxxxxxxxxxxx> wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>>
>> >> >>>   Is it possible to serve netCDF data on AWS S3 using THREDDS?
>> >> >>>   I think it seems possible based on the S3 feature branch [1].
>> >> >>>
>> >> >>>   If so, can someone share an example THREDDS catalog configuration?
>> >> >>>
>> >> >>>   Regards,
>> >> >>>
>> >> >>> [1] https://github.com/Unidata/thredds/tree/feature/s3+hdfs
>> >> >>>
>> >> >>>
>> >> >>> _______________________________________________
>> >> >>> NOTE: All exchanges posted to Unidata maintained email lists are
>> >> >>> recorded in the Unidata inquiry tracking system and made publicly
>> >> >>> available through the web.  Users who post to any of the lists we
>> >> >>> maintain are reminded to remove any personal information that they
>> >> >>> do not want to be made public.
>> >> >>>
>> >> >>>
>> >> >>> thredds mailing list
>> >> >>> thredds@xxxxxxxxxxxxxxxx
>> >> >>> For list information or to unsubscribe,  visit: 
>> >> >>> https://www.unidata.ucar.edu/mailing_lists/
>> >> >

