
Re: nextInt and finalize



Hi Bob:

Bob Simons wrote:
*** Issue #1

Our copy of thredds uses netcdf-2.2.17.jar.

Our copy of thredds crashed yesterday. From the logs, we see it was throwing lots (several within a second) of this error (note "getTypicalDataset"):

2006-10-31T04:46:44.073 -0800 [1799565807][ 3743344] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:326)
    at ucar.nc2.ncml.Aggregation.getTypicalDataset(Aggregation.java:613)
    at ucar.nc2.ncml.Aggregation.aggExistingDimension(Aggregation.java:630)
    at ucar.nc2.ncml.Aggregation.finish(Aggregation.java:418)
    at ucar.nc2.ncml.NcMLReader.readNetcdf(NcMLReader.java:352)
    at thredds.servlet.DatasetHandler$NcmlFileFactory.open(DatasetHandler.java:146)
    at ucar.nc2.NetcdfFileCache.acquire(NetcdfFileCache.java:182)
    at thredds.servlet.DatasetHandler.getNcmlDataset(DatasetHandler.java:125)
    at thredds.servlet.DatasetHandler.getNetcdfFile(DatasetHandler.java:57)
    at dods.servers.netcdf.NcDODSServlet.getDataset(NcDODSServlet.java:355)
    at dods.servlet.DODSServlet.doGetDAS(DODSServlet.java:492)
    at dods.servlet.DODSServlet.doGet(DODSServlet.java:1451)
    at dods.servers.netcdf.NcDODSServlet.doGet(NcDODSServlet.java:274)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:237)
...



and then this series of errors:

2006-10-31T04:46:44.174 -0800 [1799565908][ 3743364] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.176 -0800 [1799565910][ 3743365] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.178 -0800 [1799565912][ 3743366] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.180 -0800 [1799565914][ 3743368] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.206 -0800 [1799565940][ 3743377] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.207 -0800 [1799565941][ 3743367] ERROR - dods.servers.netcdf.GuardedDatasetImpl - GuardedDatasetImpl close java.io.FileNotFoundException: /u00/sys/opt/jakarta-tomcat-5.0.28/content/thredds/cacheAged/gov.noaa.pfel.coastwatchsatellite-MY-chla-14day (Too many open files)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
    at ucar.nc2.ncml.Aggregation.persistWrite(Aggregation.java:269)
    at ucar.nc2.ncml.Aggregation.persist(Aggregation.java:237)
    at ucar.nc2.dataset.NetcdfDataset.close(NetcdfDataset.java:483)
    at dods.servers.netcdf.GuardedDatasetImpl.close(GuardedDatasetImpl.java:68)
    at dods.servers.netcdf.GuardedDatasetImpl.release(GuardedDatasetImpl.java:63)
    at dods.servlet.DODSServlet.doGetDAS(DODSServlet.java:504)
    at dods.servlet.DODSServlet.doGet(DODSServlet.java:1451)
    at dods.servers.netcdf.NcDODSServlet.doGet(NcDODSServlet.java:274)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:237)



At that point, our thredds server became unstable and generally wasn't working.

In tracking down the problem, I note that your code in netcdf-2.2.17.jar for ucar.nc2.ncml.Aggregation has the getTypicalDataset method:

protected Dataset getTypicalDataset() throws IOException {
    //if (typical != null)
    //  return typical;

    int n = nestedDatasets.size();
    if (n == 0) return null;
    // pick a random one, but not the last
    int select = (n < 2) ? 0 : Math.abs(new Random().nextInt()) % (n-1);
    return (Dataset) nestedDatasets.get(select);
}

I don't know if that is the most recent code. It is from the copy of netcdf-2.2.17.jar downloaded from your website today. The line numbers of the error messages don't line up with the line numbers in the .java files.

We are working to allow svn access soon; you will then be able to check out the
version that matches your jar.


The code for "int select =" looks like it should work, and always return 0 (when n < 2) or a value from 0 to n-2. But the error message at the very top
(java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:326)
    at ucar.nc2.ncml.Aggregation.getTypicalDataset(Aggregation.java:613)
)
indicates that select is probably being set to -1.

I note that Math.abs(Integer.MIN_VALUE) returns a negative number (-2147483648). I think that your code to generate 'select' then generates a negative number
  Math.abs(new Random().nextInt()) % (n-1)
For example, if n = 4, the result is -2.
But I couldn't find any n which would generate a select value of -1 so I can't explain the reference to -1 in the error message. So perhaps this isn't the problem, but it is suspicious.
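
The arithmetic above can be checked with a small standalone sketch. The class and method names (AbsModDemo, selectOld, countMinusOne) are made up for illustration and are not part of the netcdf library; selectOld just repeats the expression from getTypicalDataset with the random draw passed in as a parameter, so that the Integer.MIN_VALUE case can be forced instead of waiting for it to occur by chance.

```java
class AbsModDemo {
    // Same expression as in getTypicalDataset, but with the random
    // draw supplied by the caller (hypothetical helper, for testing only).
    static int selectOld(int randomInt, int n) {
        return (n < 2) ? 0 : Math.abs(randomInt) % (n - 1);
    }

    // Counts the list sizes n in [2, maxN] for which the worst-case
    // draw (Integer.MIN_VALUE) would produce an index of exactly -1.
    static int countMinusOne(int maxN) {
        int count = 0;
        for (int n = 2; n <= maxN; n++) {
            if (selectOld(Integer.MIN_VALUE, n) == -1) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Math.abs cannot represent +2147483648 as an int, so it
        // returns Integer.MIN_VALUE unchanged.
        System.out.println(Math.abs(Integer.MIN_VALUE));     // -2147483648
        // Java's % takes the sign of the dividend, so the index is negative.
        System.out.println(selectOld(Integer.MIN_VALUE, 4)); // -2
        System.out.println(countMinusOne(1000000));          // 0
    }
}
```

So a negative index is possible with this expression, but for list sizes in that range it is never exactly -1, which agrees with the observation above.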

As a solution:
I think your use of
  Math.abs(new Random().nextInt()) % (n-1)
is not recommended practice (see http://java.sun.com/developer/JDCTechTips/2001/tt0925.html). Instead, you can use
  new Random().nextInt(n - 1)
which is simpler, never generates a negative number, and generates better-distributed random numbers.

That is definitely a bug, and one I wouldn't have found. Many thanks.

Apparently nextInt(n) returns a value in the range [0,n), so I changed it to new Random().nextInt(n).
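
For reference, a small sketch of the bounded form. Random.nextInt(bound) returns a value from 0 (inclusive) to bound (exclusive), so nextInt(n - 1) covers indices 0 to n-2 (never the last element, which matches the "but not the last" comment in the method quoted above), while nextInt(n) can return any index, including the last. The class and method names here (BoundedSelect, pickNotLast, pickAny) are hypothetical and only illustrate the two ranges.

```java
import java.util.Random;

class BoundedSelect {
    // Index in [0, n-2]: any element except the last
    // (0 when the list has only one element).
    static int pickNotLast(Random rng, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        return (n < 2) ? 0 : rng.nextInt(n - 1);
    }

    // Index in [0, n-1]: any element, including the last.
    static int pickAny(Random rng, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        return rng.nextInt(n);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(pickNotLast(rng, 4) + " " + pickAny(rng, 4));
        }
    }
}
```

Neither form can return a negative index, because nextInt(bound) never goes through Math.abs.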



*** Issue #2
The exception which actually seems to cause thredds to fail is "(Too many open files)" (see above) which follows right after the other errors.

I suspect, but can't prove, that this is related to the problem I mentioned before about your code for NetcdfFile (and related File classes) not using a finalize method to ensure that the underlying File object is closed. I don't know if you have changed your code to use a finalize method. Your last email on the subject (8/22/2006 3:35 PM) didn't say if you were or weren't going to do it. Could you please do it? It seems like good insurance.

Yes, I've decided it is worth doing, and it will be in the next release.

There is probably a deeper problem, namely where the files are not getting
closed, but I agree this is good insurance.
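
A minimal sketch of the "insurance" idea, not the actual NetcdfFile code: a hypothetical wrapper class (GuardedFile) that owns a java.io.RandomAccessFile, has a close() that is safe to call more than once, and overrides finalize() so the handle is released if the object is garbage collected without having been closed. The JVM gives no guarantee about when, or whether, finalize() runs, so this does not replace closing files explicitly, and finalize() is deprecated in current Java versions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class GuardedFile {
    private RandomAccessFile raf; // null once closed

    GuardedFile(File file) throws IOException {
        this.raf = new RandomAccessFile(file, "r");
    }

    synchronized boolean isOpen() {
        return raf != null;
    }

    // Idempotent: a second call does nothing.
    synchronized void close() throws IOException {
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }

    // Safety net only: releases the file handle if the caller forgot
    // to call close() before the object became unreachable.
    @Override
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }
}
```

Callers would still be expected to call close() in a finally block; the finalizer only limits the damage from a missed close, and finding where the missed close happens is the separate, deeper problem.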

I'm going to try to fix a few more bugs; I'll send you an email when I have a
new version.

Thanks again.



*****

I could be wrong about these things, but that is my best guess.

Thanks for looking into this.



Sincerely,

Bob Simons
Satellite Data Product Manager
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
(831)648-0272
address@hidden
<>< <>< <>< <>< <>< <>< <>< <>< <><