[netcdf-java] Slow CDMS3 access via netcdf java

I’m writing on behalf of PO.DAAC [0], which really wants to move its THREDDS 
server to the AWS cloud.

Our setup is essentially an EC2 instance with a lot of network bandwidth to 
our S3 datastores. We hope to use THREDDS to read from S3 directly. We’ve got 
this up and running and can get results for very small requests, but we 
noticed that any large or multi-file query takes too long to be useful.

Digging down, I’ve constructed a test that does a ‘read’ on a single 
variable from a large file (~720MB).

try {
    long startTime = System.currentTimeMillis();
    NetcdfFile ncfile = NetcdfFiles.open(
        "cdms3://ngap-cumulus-uat@aws/podaac-uat-cumulus-protected?MUR-JPL-L4-GLOB-v4.1/20210430090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc");
    long stopTime = System.currentTimeMillis();
    System.out.println("Read header took " + (stopTime - startTime) + " ms");

    startTime = System.currentTimeMillis();
    Variable v = ncfile.findVariable("analysed_sst");
    System.out.println(v.read());
    stopTime = System.currentTimeMillis();
    System.out.println("Read variable took " + (stopTime - startTime) + " ms");
} catch (Exception e) {
    e.printStackTrace();
}

This takes absolutely forever. I’m still waiting on some tests to return, but 
they’ve all taken more than 20 minutes, and I end up killing them to try to 
determine what’s going on. As a comparison, I’m able to download the entire 
720MB file using the AWS CLI in under a minute (around 25MiB/s over my wifi):


time aws s3 cp s3://podaac-uat-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20210430090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc . --profile ngap-service-uat

download: s3://podaac-uat-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20210430090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc to ./20210430090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc

real  0m45.867s
user  0m2.932s
sys   0m1.891s

Is there any control (or insight) that I have over why this is taking so long? 
My only guess is that it’s reading small pieces of the file, serially or even 
in parallel, and the per-request cost of connecting and downloading is high. 
Is there any way I can instruct it to simply download/cache the entire file, 
or to read much more data in a single request? Either would seem faster at 
this rate. Alternatively, speeding up the read in any way would be a benefit.
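
To put rough numbers on that guess, here is a back-of-the-envelope sketch of how the request size could dominate total time when pieces are fetched one after another. It is only a simple model, not a description of what netCDF-Java actually does: the class and method names are hypothetical, and the 50 ms per-request latency, the 20 MB/s bandwidth, and the chunk sizes are all assumed values chosen for illustration.

```java
public class RangeRequestEstimate {

    /** Number of sequential requests needed to cover totalBytes (ceiling division). */
    static long requestCount(long totalBytes, long chunkBytes) {
        return (totalBytes + chunkBytes - 1) / chunkBytes;
    }

    /**
     * Estimated wall-clock seconds when each request pays a fixed latency
     * and the payload streams at a constant bandwidth. Requests are assumed
     * to run strictly one after another (no parallelism, no caching).
     */
    static double estimateSeconds(long totalBytes, long chunkBytes,
                                  double latencySeconds, double bytesPerSecond) {
        long requests = requestCount(totalBytes, chunkBytes);
        return requests * latencySeconds + totalBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long totalBytes = 720L * 1000 * 1000; // ~720MB file, as in the test above
        double latencySeconds = 0.05;         // ASSUMED per-request overhead
        double bytesPerSecond = 20e6;         // ASSUMED sustained bandwidth

        // ASSUMED request sizes, from small ranged reads up to one whole-file GET.
        long[] chunkSizes = {8L * 1024, 256L * 1024, 8L * 1024 * 1024, totalBytes};
        for (long chunk : chunkSizes) {
            System.out.printf("chunk=%d bytes, requests=%d, estimated %.0f s%n",
                chunk, requestCount(totalBytes, chunk),
                estimateSeconds(totalBytes, chunk, latencySeconds, bytesPerSecond));
        }
    }
}
```

Under this model the transfer term is the same for every chunk size, so any difference between the rows comes purely from the number of round trips. That is why a larger read size or a single whole-file download would be expected to help if the guess is right.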

Thanks,
Mike

[0] https://podaac.jpl.nasa.gov/


From: netcdf-java <netcdf-java-bounces@xxxxxxxxxxxxxxxx> on behalf of Sean Arms 
<sarms@xxxxxxxx>
Date: Thursday, September 16, 2021 at 12:30 PM
To: "netcdf-java@xxxxxxxxxxxxxxxx" <netcdf-java@xxxxxxxxxxxxxxxx>, 
"thredds@xxxxxxxxxxxxxxxx" <thredds@xxxxxxxxxxxxxxxx>
Subject: [EXTERNAL] [netcdf-java] A farewell message

Dear THREDDS and Netcdf-Java community,

My last day at Unidata will be tomorrow, September 17th, 2021. It was not an 
easy decision, to say the least, but I believe this is the right choice for my 
family and me. It has been my pleasure to serve you over these past ten years.

Unidata will continue to host and support the development of the THREDDS stack. 
Hailey Johnson will be taking over as project lead, and will be reaching out to 
you with some details for future plans for the netCDF-Java library and the TDS. 
The roadmap that Hailey is working on contains many exciting developments for 
the future, and I look forward to watching and helping, in a very limited way, 
these developments move forward as a community contributor.

As always, projects like netCDF-java and the TDS rely upon community 
interactions and contributions to be sustainable. Contributions to the code 
base, documentation, tackling issues, and answering questions on the mailing 
lists are all ways that you can help keep these efforts moving into the future. 
Such efforts will be incredibly helpful during the next several months as 
Hailey continues spinning-up on the THREDDS efforts, and your continued support 
and patience will be greatly appreciated throughout this transition period.

With gratitude and hope for the future,

Sean