Re: Thredds out of memory

Tennessee Leeuwenburg wrote:

I turned the java heap size up to 1024m, and it was able to handle my 580Mb file.

Here's another question about the internals :

As some of you know, I have written a servlet which serves up NetCDF files via HTTP, retrieved and converted from a database. This works great for small data sets (<60Mb) but is behaving strangely for the large (580Mb) dataset.

Serving the file through apache, it takes, oh, a few minutes to get the DODS file onto my hard disk. Say 10 minutes, and that would be plenty.

so if i understand correctly, you have a client that requests the file via HTTP and then just copies it to a file on disk?


The servlet seems to take much longer. In terms of raw throughput when downloading over HTTP via Firefox, I get about 1.8Mb/s from apache, vs about 1.5Mb/s from my servlet. That's not a *huge* difference, and it's probably related to window size or something.

When I connect THREDDS to apache, there is a latency while the file is downloaded from apache, followed by throughput of about 1Mb/s and a slight reduction in file size.


When I connect THREDDS to my servlet, the initial latency is at least 10 minutes (he says, waiting for the download to start). I found this a little weird, so I included some debugging in my servlet so I could watch the contents of each packet. I'm serving the data in 8192-byte chunks, possibly not the quickest way to go about it. What I see is a generally increasing byte range being served, but occasionally bytes from earlier in the file are served. This seems a bit weird to me. I guess thredds is "going back" and looking things up in order to re-factor the data structure, but I want to make sure this is expected behaviour and that nothing nuts is going on.

what do you mean by "connect THREDDS to apache" or "my servlet"? The THREDDS data viewer?

generally a netcdf client like the thredds data viewer will treat the file as random access, and so may skip around in the file. if all you do is read the file sequentially, HTTP is ok. but for random access it can be really slow. Opendap is much better in this case.
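
To make the random-access point concrete, here is a rough sketch of what one such request costs on the server side, assuming the client asks for pieces of the file with the standard HTTP `Range` header (for example `Range: bytes=100-199`). The class `RangeReader` and its method names are made up for illustration; they are not part of THREDDS or of the servlet discussed here. Only single ranges are handled.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper: resolves one "bytes=first-last" range against a local file. */
final class RangeReader {

    /** Parses a single-range header such as "bytes=100-199" into {first, last}, both inclusive. */
    static long[] parseRange(String header, long fileLength) {
        if (header == null || !header.startsWith("bytes=")) {
            throw new IllegalArgumentException("unsupported Range header: " + header);
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0 || spec.indexOf(',') >= 0) {
            throw new IllegalArgumentException("only single ranges are handled: " + header);
        }
        String firstPart = spec.substring(0, dash).trim();
        String lastPart = spec.substring(dash + 1).trim();
        long first;
        long last;
        if (firstPart.isEmpty()) {
            // Suffix form "bytes=-N": the last N bytes of the file.
            long n = Long.parseLong(lastPart);
            first = Math.max(0, fileLength - n);
            last = fileLength - 1;
        } else {
            first = Long.parseLong(firstPart);
            // Open-ended form "bytes=N-" runs to the end of the file.
            last = lastPart.isEmpty()
                    ? fileLength - 1
                    : Math.min(Long.parseLong(lastPart), fileLength - 1);
        }
        if (first > last) {
            throw new IllegalArgumentException("unsatisfiable range: " + header);
        }
        return new long[] {first, last};
    }

    /** Seeks to 'first' and reads the inclusive range [first, last] into a new array. */
    static byte[] read(RandomAccessFile file, long first, long last) throws IOException {
        byte[] buf = new byte[(int) (last - first + 1)];
        file.seek(first);
        file.readFully(buf);
        return buf;
    }
}
```

Each out-of-order read by the client turns into one more round trip like this, which is why a sequential download can look fine while random access over plain HTTP crawls.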


I am trying to work out how to redress the situation. One easy thing to test is to vary the window size to a much larger number, say 500Kb or even megabytes. I could possibly alter this on the basis of the file size, or try to come up with some dynamic regime for altering the window size.

depends on your data access pattern.
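
If "window size" here means the buffer the servlet uses when copying the file into the response (that is a guess on my part), then making it a parameter is cheap and lets you try 8192 bytes against something much larger. A minimal sketch, with the hypothetical names `ChunkedCopy` and `copy`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: stream copy with a caller-chosen buffer size. */
final class ChunkedCopy {

    /** Copies everything from 'in' to 'out' in chunks of at most bufferSize bytes; returns the byte count. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; it may return fewer bytes than the buffer holds.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

A bigger buffer mostly helps a sequential download. It does little when the client is jumping around the file, since then the cost is dominated by the number of requests rather than by the size of each write.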


Is there a "magic number" in thredds which is a best window size to use? Would it "prefer" to get its data in any particular way? Thredds is basically the only client for this servlet, so I will just tune it for best performance.

what do you mean by "window size" ?


Or maybe it's just some inefficiency in java's random-access - if it's a separate request every time, maybe there's even a new instance handling each request and I'm getting bogged down in object creation. Now there's a thought! If that's the case, I'll have to implement some kind of static object containing the currently open files to avoid re-opening them...
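
A minimal sketch of that idea, assuming the data has already been written to a local file that can be reopened by path. The class `OpenFileCache` and its methods are hypothetical names, and the sketch needs Java 8 or later. It keeps one read-only handle per path in a static map, so repeated requests reuse it instead of reopening the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical static cache of open, read-only file handles keyed by path. */
final class OpenFileCache {

    private static final ConcurrentMap<String, RandomAccessFile> OPEN = new ConcurrentHashMap<>();

    /** Returns the cached handle for 'path', opening the file on first use. */
    static RandomAccessFile get(String path) {
        return OPEN.computeIfAbsent(path, p -> {
            try {
                return new RandomAccessFile(p, "r");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Reads 'length' bytes starting at 'offset' using the shared handle. */
    static byte[] read(String path, long offset, int length) throws IOException {
        RandomAccessFile file = get(path);
        byte[] buf = new byte[length];
        // The handle is shared between request threads, so seek and read must happen as one step.
        synchronized (file) {
            file.seek(offset);
            file.readFully(buf);
        }
        return buf;
    }

    /** Closes every cached handle, e.g. when the servlet is destroyed. */
    static void closeAll() throws IOException {
        for (RandomAccessFile file : OPEN.values()) {
            file.close();
        }
        OPEN.clear();
    }
}
```

This only helps if reopening the file really is the bottleneck, which would need measuring first. A real version would also want some eviction policy so handles are not held forever.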

Feedback welcome. Sorry to abuse the list for hare-brained developer questions. Maybe one day I'll be able to do something useful for you...

Download still waiting...

Cheers,
-Tennessee