Re: [thredds] Catalog example for AWS S3 resource?

  • To: Sean Arms <sarms@xxxxxxxx>
  • Subject: Re: [thredds] Catalog example for AWS S3 resource?
  • From: "H. Joe Lee" <hyoklee@xxxxxxxxxxxx>
  • Date: Tue, 10 Mar 2020 10:49:46 -0500
  Awesome, Sean!

  I can confirm that both issues are fixed. Unidata's work is impressive:
  it is fast enough for a big file like TerraFusion, and it also supports
  legacy HDF4. HDF4-on-S3 support is missing from every other solution that
  I'm aware of: OPeNDAP, Apache Drill, and HDF5 Server all work only for
  HDF5 on S3.

  Two more questions (or requests, if these don't work yet) about the
  THREDDS catalog example:

  1) Will NcML work on S3 when the NcML points to an s3:// URL in the
  location attribute of the <netcdf> element?
  2) Can THREDDS scan an S3 path? For example, when the key is not a
  specific file (e.g., s3://unidata-bucket/testfiles/test.h5) but an S3
  path (e.g., s3://unidata-bucket/testfiles/).
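
  To make question 2 concrete: the datasetRoot mapping Ethan describes in
  the quoted message further down (the location is the bucket, and the
  urlPath is the datasetRoot path plus the key) can be sketched in a few
  lines of Python. This is only an illustration, not TDS code. The function
  names resolve_s3_object and is_single_object are hypothetical, and the
  sketch assumes the datasetRoot location is a bare s3://<bucket> URL.

```python
def resolve_s3_object(url_path, root_path, root_location):
    """Map a catalog urlPath to (bucket, key).

    Assumes (as in the sample catalog) that the datasetRoot location is
    "s3://<bucket>" and that urlPath is "<datasetRoot path>/<key>".
    """
    scheme = "s3://"
    if not root_location.startswith(scheme):
        raise ValueError("datasetRoot location must start with s3://")
    bucket = root_location[len(scheme):].strip("/")
    if not bucket or "/" in bucket:
        raise ValueError("this sketch assumes the location is a bare bucket")
    prefix = root_path.strip("/") + "/"
    if not url_path.startswith(prefix):
        raise ValueError("urlPath does not begin with the datasetRoot path")
    # Everything after the datasetRoot path is treated as the object key.
    return bucket, url_path[len(prefix):]


def is_single_object(key):
    """True when the key names one object rather than a path ending in '/'."""
    return bool(key) and not key.endswith("/")


if __name__ == "__main__":
    # Values taken from the sample catalog quoted in this thread.
    bucket, key = resolve_s3_object(
        "s3-test/ABI-L1b-RadC/2019/363/21/"
        "OR_ABI-L1b-RadC-M6C16_G16_s20193632101189_e20193632103574_"
        "c20193632104070.nc",
        "s3-test",
        "s3://noaa-goes16",
    )
    print(bucket, key, is_single_object(key))
```

  A key such as testfiles/test.h5 names a single object, which is the case
  the sample catalog covers. A key such as testfiles/ only names a path,
  which is the case question 2 asks about.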

  Anyway, thank you so much for the useful new feature, fast response, and
bug fixes!

--
Datafy everything in HDF for faster AI.




On Tue, Mar 10, 2020 at 9:31 AM Sean Arms <sarms@xxxxxxxx> wrote:

> Greetings Joe,
>
> Thank you for the report! I just merged a fix for the
> NegativeArraySizeException issue, and have enabled ToolsUI's NcML tab
> to work with S3 objects (you just need to make sure the "modes ->
> NetcdfFile -> use builders" menu option is checked). Keep in mind this
> is a first pass, and not optimized at all at this point.
>
> Cheers,
>
> Sean
>
>
> On Mon, Mar 9, 2020 at 3:08 PM H. Joe Lee <hyoklee@xxxxxxxxxxxx> wrote:
> >
> >  Thanks, Sean!
> >
> >   Both the .war file and the sample catalog.xml worked like a charm.
> >   For example, I could visualize MOP03T v7 on S3 using Panoply via
> >   THREDDS OPeNDAP.
> >   The Unidata Java team is amazing!
> >
> >   So far, I have found two issues, though:
> >
> >   1) The ToolsUI NcML tab doesn't work with s3:// URLs.
> >   2) It can't open a huge (15 GB to 40 GB) netCDF-4 file like TerraFusion [1].
> >
> >  Here's the error message that I got when I opened TerraFusion:
> >
> > Error {
> >     code = 500;
> >     message = "com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NegativeArraySizeException";
> > };
> >
> >   Sincerely,
> >
> >
> > [1] https://registry.opendata.aws/terrafusion/
> > --
> > Datafy everything in HDF for faster AI.
> >
> >
> >
> >
> > On Mon, Mar 9, 2020 at 2:43 PM Sean Arms <sarms@xxxxxxxx> wrote:
> >>
> >> Greetings Joe,
> >>
> >> I recently split the netCDF-Java and TDS codebases into their own
> >> repositories, and the repository holding the appropriate TDS code is
> >> located at:
> >>
> >> https://github.com/Unidata/tds
> >>
> >> If you build the current master branch, you'll have everything you
> >> need at this point. The most recent snapshot should work as well:
> >>
> >> https://artifacts.unidata.ucar.edu/repository/unidata-snapshots/edu/ucar/tds/5.0.0-SNAPSHOT/tds-5.0.0-20200308.175757-566.war
> >>
> >> (just be sure to rename it to thredds.war before deploying it).
> >>
> >> The sample catalog I added to our integration tests for the TDS can be
> >> found here:
> >>
> >> https://github.com/Unidata/tds/blob/master/tds/src/test/content/thredds/tds-s3.xml
> >>
> >> Cheers,
> >>
> >> Sean
> >>
> >>
> >> On Mon, Mar 9, 2020 at 8:39 AM H. Joe Lee <hyoklee@xxxxxxxxxxxx> wrote:
> >> >
> >> >   Thanks, Ethan!
> >> >
> >> >   It's so cool to see toolsUI can access NASA HDF-EOS5 on S3.
> >> > I hope both IDV and Panoply can use the new netCDF-Java soon, too.
> >> >
> >> >   By the way, will the master branch of THREDDS use the latest
> >> > netCDF-Java? If not, where should I modify the THREDDS source code
> >> > to build THREDDS with a netCDF-Java snapshot?
> >> >
> >> >   I'm very excited to try the new THREDDS catalog with an S3
> >> > datasetRoot path!
> >> >
> >> > Sincerely,
> >> >
> >> > --
> >> > Datafy everything in HDF for faster AI.
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Mar 4, 2020 at 10:52 AM Ethan Davis <edavis@xxxxxxxx> wrote:
> >> >>
> >> >> Hi Joe,
> >> >>
> >> >> [Sorry for the delayed response.]
> >> >>
> >> >> The S3 work moved to the Unidata/netCDF-java repo in PR #173 ("S3
> >> >> Support"). This PR was merged into master a week or so ago and is
> >> >> available in the netCDF-Java 5.3.0-SNAPSHOT release (and will be in
> >> >> the upcoming 5.3.0 release). The latest TDS code built with
> >> >> netCDF-Java 5.3.0-SNAPSHOT can be configured to serve an individual
> >> >> netCDF file stored as an S3 object using a datasetRoot configuration,
> >> >> e.g.
> >> >>
> >> >>
> >> >> <?xml version="1.0" encoding="UTF-8"?>
> >> >> <catalog name="Test TDS S3"
> >> >>   xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
> >> >>   xmlns:xlink="http://www.w3.org/1999/xlink"
> >> >>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >> >>   xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
> >> >>     https://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.6.xsd">
> >> >>
> >> >>   <datasetRoot path="s3-test" location="s3://noaa-goes16" />
> >> >>
> >> >>   <dataset name="Test GOES-16 S3" ID="testS3Grid"
> >> >>            urlPath="s3-test/ABI-L1b-RadC/2019/363/21/OR_ABI-L1b-RadC-M6C16_G16_s20193632101189_e20193632103574_c20193632104070.nc"
> >> >>            dataType="Grid"/>
> >> >>
> >> >> </catalog>
> >> >>
> >> >>
> >> >> In this case, the datasetRoot location is the bucket name, and the
> >> >> urlPath is the datasetRoot path combined with the key. We rely on the
> >> >> AWS Java SDK (v2) to handle credentials, setting of the region, etc.
> >> >> For now, you can set the region by creating a credentials file,
> >> >> ~/.aws/credentials, that looks like:
> >> >>
> >> >> [default]
> >> >> region=us-east-1
> >> >>
> >> >> That is how netCDF-Java knows which region to use for bucket access.
> >> >> We may look at other mechanisms to make that a bit more integrated
> >> >> into the TDS configuration, but for now that should work.
> >> >>
> >> >>
> >> >> Once the netCDF-Java 5.3.0 release comes out, TDS snapshot builds
> >> >> will be built with this capability. For now, you would need to build
> >> >> the TDS and explicitly tell it to build with netCDF-Java
> >> >> 5.3.0-SNAPSHOT.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Ethan
> >> >>
> >> >> On Tue, Feb 4, 2020 at 2:30 PM H. Joe Lee <hyoklee@xxxxxxxxxxxx> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>>   Is it possible to serve netCDF data on AWS S3 using THREDDS?
> >> >>>   It seems possible, based on the S3 feature branch [1].
> >> >>>
> >> >>>   If so, can someone share an example THREDDS catalog configuration?
> >> >>>
> >> >>>   Regards,
> >> >>>
> >> >>> [1] https://github.com/Unidata/thredds/tree/feature/s3+hdfs
> >> >>>
> >> >>>
> >> >>> _______________________________________________
> >> >>> NOTE: All exchanges posted to Unidata maintained email lists are
> >> >>> recorded in the Unidata inquiry tracking system and made publicly
> >> >>> available through the web.  Users who post to any of the lists we
> >> >>> maintain are reminded to remove any personal information that they
> >> >>> do not want to be made public.
> >> >>>
> >> >>>
> >> >>> thredds mailing list
> >> >>> thredds@xxxxxxxxxxxxxxxx
> >> >>> For list information or to unsubscribe,  visit:
> https://www.unidata.ucar.edu/mailing_lists/