Re: Thredds out of memory

_From owner-netcdf-java@xxxxxxxxxxxxxxxx Wed Mar 30 22:18:44 2005
Organization: UCAR/Unidata
Keywords: 200503310515.j2V5FTv2017249
CC: support-netcdf-java@xxxxxxxxxxxxxxxx,
        netcdf-java <netcdf-java@xxxxxxxxxxxxxxxx>
References: <42423D15.3080204@xxxxxxxxxx> <4242E000.4020807@xxxxxxxxxxxxxxxx> 
<42489127.3080901@xxxxxxxxxx> <4249A930.5040404@xxxxxxxxxxxxxxxx> 
<424A5BF6.1070807@xxxxxxxxxx> <424ADF41.4020006@xxxxxxxxxxxxxxxx>
Content-Type: multipart/mixed;
 boundary="------------070200080302040308050304"

This is a multi-part message in MIME format.
--------------070200080302040308050304
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Not quite. I have something called a MARS database. This is an 
object-oriented database that can only be accessed through a compiled C 
program, and it does not support network requests of any kind.

I wrote a Java servlet that parses the URL, extracts the query information, 
runs the query against the database, converts the resulting GRIB file to 
NetCDF, and serves the NetCDF file back to the client, which in this case 
is Thredds.

To the client, this is (or should be) indistinguishable from requesting a 
NetCDF file over HTTP from any other source, such as an Apache web server.

Thredds can read its data from HTTP sources as well as from files on the 
local disk. My configuration file looks something like this:

<!DOCTYPE catalog SYSTEM "http://www.unidata.ucar.edu/projects/THREDDS/xml/AggServerCatalog.dtd">
<catalog name="THREDDS - DODS Aggregation Server Catalog" version="0.6" 
         xmlns="http://www.unidata.ucar.edu/thredds" 
         xmlns:xlink="http://www.w3.org/1999/xlink">
    <dataset name="Top-Level Dataset" dataType="Grid" serviceName="this">
        <service name="this" serviceType="DODS" base=""/>

        <service name="apache" serviceType="NetCDF" base="http://kahless.ho.bom.gov.au/"/>
        <service name="marslet" serviceType="NetCDF" base="http://kahless.ho.bom.gov.au:8080/marslet/"/>

        <dataset name="Large Internal Marslet" serviceName="this">
            <property name="internalService" value="marslet"/>
            <dataset name="Surface Data" urlPath="verylarge.nc"/>
        </dataset>
       
        <dataset name="Large Internal Apache" serviceName="this">
            <property name="internalService" value="apache"/>
            <dataset name="Surface Data" urlPath="laps-levels-large.nc"/>
        </dataset>

    </dataset>
</catalog>
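Assuming the aggregation server forms each internal dataset's access URL by resolving its urlPath against the base of the service named in internalService (the usual THREDDS catalog convention, not something confirmed in this thread), the two datasets above would be fetched from the URLs computed by the sketch below. AccessUrl is a hypothetical illustration, not a Thredds class; fetching the resulting URLs directly with wget is a quick way to separate servlet problems from Thredds problems.

```java
import java.net.URI;

// Hypothetical illustration (not part of Thredds): resolve a catalog
// urlPath against a service base URL, as the catalog above implies.
final class AccessUrl {
    static String resolve(String serviceBase, String urlPath) {
        // A relative path resolves against the base's directory,
        // e.g. "/marslet/" + "verylarge.nc" -> "/marslet/verylarge.nc".
        return URI.create(serviceBase).resolve(urlPath).toString();
    }
}
```

For example, `AccessUrl.resolve("http://kahless.ho.bom.gov.au:8080/marslet/", "verylarge.nc")` gives the marslet URL. Note that in the attached code doMarsQuery() ignores the request, so the marslet URL serves /data/laps-levels-large.nc regardless of the urlPath.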

For small files, this works. For larger files, however, the interaction 
with the servlet breaks somehow, while the file served from Apache works 
fine.

I don't understand why this is the case. Software such as wget and Firefox 
can download the file and resume partial downloads without trouble, and 
Thredds has no problem fetching smaller files. I fail to see why file 
size affects the system so badly.
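Resumed downloads and netCDF random access both rely on the HTTP/1.1 Range header, whose byte-range specs take three forms: `first-last`, `first-` (to end of file) and `-N`, which means the final N bytes of the file rather than "byte N to EOF". Below is a minimal sketch of resolving one spec against the file size, clamping an end position past EOF and rejecting malformed or unsatisfiable specs. ByteRange is a hypothetical name, not a servlet API class.

```java
// Hypothetical helper (not part of the servlet API): resolves one HTTP/1.1
// byte-range spec, i.e. the text after "bytes=", against a file size.
final class ByteRange {
    final long first; // offset of the first byte to send
    final long last;  // offset of the last byte to send (inclusive)

    private ByteRange(long first, long last) {
        this.first = first;
        this.last = last;
    }

    long length() {
        return last - first + 1;
    }

    String contentRange(long fileSize) {
        return "bytes " + first + "-" + last + "/" + fileSize;
    }

    /** Returns null if the spec is malformed or cannot be satisfied. */
    static ByteRange parse(String spec, long fileSize) {
        spec = spec.trim();
        int dash = spec.indexOf('-');
        if (fileSize <= 0 || dash < 0 || spec.length() < 2) {
            return null;
        }
        try {
            if (dash == 0) {
                // "-N": the final N bytes of the file
                long suffix = Long.parseLong(spec.substring(1));
                if (suffix <= 0) {
                    return null;
                }
                return new ByteRange(Math.max(0, fileSize - suffix), fileSize - 1);
            }
            long first = Long.parseLong(spec.substring(0, dash));
            if (first >= fileSize) {
                return null; // starts at or past EOF: unsatisfiable
            }
            // "N-" runs to EOF; "N-M" is clamped to the last byte of the file
            long last = (dash == spec.length() - 1)
                    ? fileSize - 1
                    : Math.min(Long.parseLong(spec.substring(dash + 1)), fileSize - 1);
            return (first <= last) ? new ByteRange(first, last) : null;
        } catch (NumberFormatException e) {
            return null; // non-numeric byte position
        }
    }
}
```

Resolving to absolute, clamped offsets up front means Content-Length, Content-Range and the number of bytes actually written can all be derived from the same two numbers.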

> generally a netcdf client like the thredds data viewer will treat the 
> file as random access, and so may skip around in the file. if all you 
> do is read the file sequentially, HTTP is ok. but for random access it 
> can be really slow. Opendap is much better in this case.
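To make the quoted point concrete: a client reading a netCDF file over plain HTTP fetches each block it needs with its own Range request rather than downloading the file once. The sketch below uses java.net.HttpURLConnection to illustrate one such read. It is not how netcdf-java implements this internally, and RangeReader, rangeHeader and readAt are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch of one random-access read over HTTP.
final class RangeReader {

    /** Builds the Range header value for bytes [offset, offset + length). */
    static String rangeHeader(long offset, int length) {
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Reads up to length bytes (length <= buf.length) starting at offset. */
    static int readAt(URL url, long offset, byte[] buf, int length) throws IOException {
        // Assumes an http:// URL, so the cast below succeeds.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset, length));
        try {
            int status = conn.getResponseCode();
            // 206 means the range was honoured; 200 would mean the server
            // ignored the header and is sending the whole file from byte 0.
            if (status != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("Range not honoured: HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                int total = 0;
                int n;
                while (total < length && (n = in.read(buf, total, length - total)) != -1) {
                    total += n;
                }
                return total;
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Because a reader like this may issue many small requests against one large file, any per-request cost on the server side, such as streaming more than the requested range, is multiplied accordingly.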

That's exactly the goal: we want to use Opendap to serve data to the 
various software clients that will use it, so that we gain the advantages 
it offers. However, we first have to get the files into Thredds somehow.

>> Is there a "magic number" in thredds which is a best window size to 
>> use? Would it "prefer" to get its data in any particular way? Thredds 
>> is basically the only client for this servlet, so I will just tune it 
>> for best performance.
>
>
> what do you mean by "window size" ?

When serving data from my servlet, I create an 8 KB buffer, read the file 
from disk into it, and write each buffer-full to the output stream.

                // Sent in 8 KB chunks
                byte[] dataBuf = new byte[8192]; // we'll read 8K chunks
                in.seek(0L);
                in.skipBytes(firstByte);
                int length = 0;
               
                long bytecount = 0;
                while(in != null && (length = in.read(dataBuf, 0, dataBuf.length)) != -1) {
                    if(debug) servletContext.log("doGet() valid: serving bytes " + bytecount + " to " + (bytecount + length));
                    bytecount = bytecount + length;
                    out.write(dataBuf, 0, length);
                }
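One thing worth checking (a hypothesis, not a confirmed diagnosis): in the attached doGet() the response advertises Content-Length: nBytes, but the loop above stops only at end of file, so a request for a small block near the start of a large file streams everything from firstByte to EOF. RandomAccessFile.skipBytes() may also skip fewer bytes than requested. Below is a minimal sketch of a copy that seeks to the absolute offset and stops after exactly the requested count, keeping the 8 KB buffer. RangeCopy and copyRange are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

// Hypothetical helper: copy exactly one byte range from a file to a stream.
final class RangeCopy {

    /**
     * Copies nBytes starting at firstByte from the file to out using a
     * fixed-size buffer; returns the number of bytes actually written.
     */
    static long copyRange(RandomAccessFile in, OutputStream out,
                          long firstByte, long nBytes, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        in.seek(firstByte);                 // absolute position, unlike skipBytes()
        long remaining = nBytes;
        while (remaining > 0) {
            int toRead = (int) Math.min(buf.length, remaining);
            int n = in.read(buf, 0, toRead);
            if (n == -1) {
                break;                      // file shorter than requested
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return nBytes - remaining;
    }
}
```

In doGet() this would stand in for the seek/skipBytes/while sequence, e.g. `RangeCopy.copyRange(in, out, firstByte, nBytes, 8192);` (ServletOutputStream is an OutputStream). Once the transfer is bounded, the buffer size is mostly a throughput detail.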

I have attached the full code for your interest.

Cheers,
-Tennessee

--------------070200080302040308050304
Content-Type: text/x-java;
 name="Marslet.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="Marslet.java"

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;

public class Marslet extends HttpServlet
{
 
    boolean debug=true;    
    
    /**
     * This returns the header only for the equivalent GET request. It may be
     * used by some clients to establish file-sizes or otherwise make use
     * of summary information before performing a GET request. The HEAD
     * response must not include a message-body.
     */
    
    protected void doHead(HttpServletRequest request, HttpServletResponse 
response) 
    throws ServletException, IOException
    {
        
        HttpSession session = request.getSession();
        ServletContext servletContext = getServletContext();
        ServletOutputStream out = response.getOutputStream();
        File ncFile = doMarsQuery(request);
        
        // Abort if file cannot be found
        if(ncFile == null) {
            servletContext.log("Null netCDF file - cannot continue");
            out.println("Error in processing - database request failed. Please contact the administrator tjl@xxxxxxxxxx");
            return;
        }
        
        String filename = ncFile.getPath();
        RandomAccessFile in = null;
        String contentType = "application/x-netcdf";
        
        servletContext.log("Found file, processing HEAD request");
        
        try {

            long filesize = ncFile.length();
            if(debug) servletContext.log("doHead(): filesize is "+ filesize);
            
            String rangeHeader = request.getHeader("range");
            
            // Behave differently if request is bad for some reason
            if(!isRangeHeaderValid(rangeHeader, filesize)) {
                
                // Bad numbers in range, log error
                if(rangeHeader != null && !rangeHeader.equals("")) {
                    servletContext.log("*** Invalid byte range header, sending entire file: " + rangeHeader);   
                }
                
                servletContext.log("Sending entire file");
                //headEntireFile(response, out, ncFile, contentType, servletContext);
                
                // No usable range: describe the whole file with 200 OK. A HEAD
                // response must not include a body, so only headers are set here.
                int length = (int) filesize;
                response.setStatus(HttpServletResponse.SC_OK);
                response.setHeader("Accept-Ranges", "bytes");
                response.setContentLength(length);
                response.setContentType(contentType);
                
                if(debug) servletContext.log("doHead() invalid: " + HttpServletResponse.SC_OK
                    + " Accept-Ranges: bytes" + " Content-Length: " + length
                    + " Content-Type: " + contentType);
                
            }
            
            // If a valid byte range has been requested
            else {
                in = new RandomAccessFile(ncFile, "r");
                
                // byte range variables
                int firstByte = 0;
                int lastByte = 0;
                int nBytes = 0;
                
                StringTokenizer st = getRangeAsTokens(rangeHeader);
                String range = "";
                
                range = st.nextToken();
                range = range.trim();
                
                // Determine firstByte (-1 means the format is "-lastbyte")
                if(range.indexOf('-') == 0) { firstByte = -1; }
                else { firstByte = Integer.parseInt(range.substring(0, range.indexOf('-'))); }
                
                // Determine lastByte (-1 means the format is "firstbyte-")
                if(range.indexOf('-') == range.length() - 1) { lastByte = -1; }
                else { lastByte = Integer.parseInt(range.substring(range.indexOf('-') + 1)); }
                
                // If firstByte < 0, the client wants the final lastByte bytes
                if( firstByte < 0 ) {
                    firstByte = (int) filesize - lastByte;
                    lastByte = (int) filesize - 1;
                }
                
                // If last byte < 0, set to EOF
                if(lastByte < 0) { lastByte = (int) filesize - 1; }
                nBytes = (lastByte - firstByte) + 1;
                
                ////////////////////////////////////////
                // Send the headers, do not write data
                ////////////////////////////////////////
                
                String contentRange = "bytes " + firstByte + "-" + lastByte + 
"/" + filesize;
                
                response.setStatus(response.SC_PARTIAL_CONTENT);
                response.setHeader("Accept-Ranges", "bytes");
                response.setContentLength(nBytes);
                response.setContentType(contentType);
                response.setHeader("Content-Range", ""+contentRange);
                
                if(debug) servletContext.log("doHead() valid: " + 
response.SC_PARTIAL_CONTENT + "Accept-Ranges: bytes" + "Content-Length" + 
nBytes + "Content-Type" + contentType + "Content-Range: " + contentRange);
                
            }
            
            out.flush();
            out.close();
        }
        
        catch(Exception e) { 
            servletContext.log("Catching: " + e.toString(), e); 
            e.printStackTrace();
        }
    
        finally {
            
            try { if(in != null) { in.close(); }   }
            catch( Exception e2) { servletContext.log( "Finally: " + 
e2.toString(), e2); }
        }
        
    }    
    
    protected void doGet(HttpServletRequest request, HttpServletResponse 
response)
    throws ServletException, IOException
    {
        
        HttpSession session = request.getSession();
        ServletContext servletContext = getServletContext(); // For logging
        ServletOutputStream out = response.getOutputStream();
        File ncFile = doMarsQuery(request);
        
        if(ncFile == null) {
            servletContext.log("NULL netCDF file - cannot continue");
            out.println("Error in processing - database request failed. Please contact the administrator tjl@xxxxxxxxxx");
            
            return;   
        }
        String filename = ncFile.getPath();
        RandomAccessFile in = null;
        String contentType = "application/x-netcdf";

        servletContext.log("doGet(): Found file, processing GET request");
        
        try {
            
            Enumeration e = request.getHeaderNames();
            while(e.hasMoreElements()) {
                String headerName = (String) e.nextElement();
                
                Enumeration e2 = request.getHeaders(headerName);
                String requestStr = "";
                while(e2.hasMoreElements()) {
                    String headerValue = (String) e2.nextElement();
                    requestStr = requestStr + "Request> " + headerName + ": " + 
headerValue + "\n";
                }
                servletContext.log(requestStr);
            }


            long filesize = ncFile.length();
            
            String rangeHeader = request.getHeader("range");
            
            //If a bad byte-range has been requested
            if(!isRangeHeaderValid(rangeHeader, filesize)) {
                
                // Bad numbers in range, log error
                if(rangeHeader != null && !rangeHeader.equals("")) {
                    servletContext.log("*** Invalid byte range header, sending entire file: " + rangeHeader);
                }
                
                if(debug) servletContext.log("doGet() invalid: Sending entire file");
                sendEntireFile(response, out, ncFile, contentType, 
servletContext);
            }
            
            // If a valid byte range has been requested
            else {

                in = new RandomAccessFile(ncFile, "r");
                
                // byte range variables
                int firstByte = 0;
                int lastByte = 0;
                int nBytes = 0;
                
                StringTokenizer st = getRangeAsTokens(rangeHeader);
                String range = "";
                
                ///////////////////////////////////////////
                // Note - the original code snippet I found handled multiple
                // byte-range requests. HTTP/1.1 does allow several ranges in
                // one request, answered with a multipart/byteranges response
                // (see the HTTP 1.1 Specification), but this servlet does not
                // implement that: it retrieves only the first byte-range token
                // and serves that range alone.
                ///////////////////////////////////////////
                    
                range = st.nextToken();
                range = range.trim();
                
                // Determine firstByte
                if(range.indexOf('-') == 0) 
                    { firstByte = -1; } // range format is "-lastbyte"
                else 
                    { firstByte = Integer.parseInt(range.substring(0, range.indexOf('-'))); }
                
                // Determine lastByte
                if(range.indexOf('-') == range.length() - 1) 
                    { lastByte = -1; } // range format is "firstbyte-"
                else 
                    { lastByte = Integer.parseInt(range.substring(range.indexOf('-') + 1)); }
                
                // If firstByte < 0, the client wants the final lastByte bytes
                if(firstByte < 0) {
                    firstByte = (int) filesize - lastByte;
                    lastByte = (int) filesize - 1;
                }
                
                // If last byte is < 0, set to EOF
                if(lastByte < 0) { lastByte = (int) filesize - 1; }
                nBytes = (lastByte - firstByte) + 1;
                
                ////////////////////////////////////////////
                // Send the headers and start writing data 
                ////////////////////////////////////////////
                
                String contentRange = "bytes "+ firstByte + "-" + lastByte + 
"/" + filesize;                
                
                response.setStatus(response.SC_PARTIAL_CONTENT);
                response.setHeader("Accept-ranges", "bytes");
                response.setContentType(contentType);
                response.setHeader("Content-Range", ""+contentRange);
                response.setContentLength(nBytes);
                
                String responseStr = "";
                responseStr = responseStr + "Response> Status: " + 
response.SC_PARTIAL_CONTENT + "\n";
                responseStr = responseStr + "Response> Accept-ranges: bytes\n";
                responseStr = responseStr + "Response> Content-Type: " + 
contentType + "\n";
                responseStr = responseStr + "Response> Content-Range: " + 
contentRange + "\n";
                responseStr = responseStr + "Response> Content-length: " 
+nBytes + "\n";
                
                if(debug) servletContext.log("doGet(): valid \n" + responseStr);
                
                //out.println("Content-Length: " + nBytes);

                // How it used to be - suspect of causing a buffer overflow
                //byte[] dataBuf = new byte[8192]; //we'll read 8K chunks       
         
                //byte[] dataBuf = new byte[nBytes + 1];
                //in.seek(0L);
                //in.skipBytes(firstByte);
                //int length = in.read(dataBuf, 0, nBytes);
                //out.write(dataBuf, 0, length);

                // Sent in 8 KB chunks
                byte[] dataBuf = new byte[8192]; // we'll read 8K chunks
                in.seek(0L);
                in.skipBytes(firstByte);
                int length = 0;
                
                long bytecount = 0;
                while(in != null && (length = in.read(dataBuf, 0, dataBuf.length)) != -1) {
                    if(debug) servletContext.log("doGet() valid: serving bytes " + bytecount + " to " + (bytecount + length));
                    bytecount = bytecount + length;
                    out.write(dataBuf, 0, length);
                }
                
                out.flush();
                
            }
            
            out.close();
            
        }
        
        // Ignore some client-caused exceptions
        catch(java.io.IOException ioe) {

            // Ignore "Connection reset by peer" exceptions, which can be caused by
            // a number of reasons attributable to the client. They are generally
            // harmless and out of our control. Log others.
            // (The filter below is currently commented out, so every
            // IOException is logged.)
            
            //if(ioe.toString().compareToIgnoreCase("java.io.IOException: Connection reset by peer") != 0) {
                servletContext.log(ioe.toString(), ioe);
            //}
        }
        
        // Log generic exceptions
        catch(Exception e) {
            servletContext.log(e.toString(), e);   
        }
        
        // Try to close the file if it's still open
        finally {

            try {
                if(in != null) { in.close(); }   
            }
            catch(Exception e2) {
                servletContext.log(e2.toString());   
            }
            
        }
        
    }    
    /**
     * Interpret the GET variables into a mars request and execute.
     * Currently a stub that ignores the request and always returns
     * /data/laps-levels-large.nc.
     */
     
    private File doMarsQuery(HttpServletRequest request) {
            
        try 
        {
            return new File("/data/laps-levels-large.nc");
        }
        catch(Exception e) {
            getServletContext().log(e.getMessage());
            e.printStackTrace();
        }
        
        return null;
    }
    
    private File getFromCache(String requestString) {
        String tmpDirName = "/nm/scratch/marslet/";
        File tmpDir = new File(tmpDirName);        
        int hash = requestString.hashCode();
        File ncFile = new File(tmpDir, "marslet" + hash + ".nc");
        if(ncFile.exists()) { return ncFile; } else { return null; }
    }
    
    private void addToCache(File ncFile) {
        // Do nothing - no accounting just yet   
    }
    
    
    /**
     * Send the entire file to the client
     * <p>
     * @param response the response object
     * @param out the output stream to write the file to
     * @param file the file we're sending
     * @param contentType Content-Type header value
     * @param servletContext the servlet context (currently unused)
     */
    private void sendEntireFile(HttpServletResponse response,
                                ServletOutputStream out,
                                File file,
                                String contentType,
                                ServletContext servletContext
                                )
    throws IOException
    {

        FileInputStream inStream = null;
        
        try {
            inStream = new FileInputStream(file);
            int length = 0;
            response.setHeader("Accept-Ranges", "bytes");
            response.setContentType(contentType);
            response.setContentLength((int)file.length());
            //out.println("Content-Length: " + (int)file.length());
            
            byte[] buf = new byte[8192]; // we'll read 8K chunks
            while((length = inStream.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, length);   
            }
        }
        finally {
            if(inStream != null) {
                inStream.close();   
            }
        }
        
    }
    
    /**
     * Validate the byte range request header
     * The following byte range request header formats are supported:
     * <ul>
     *   <li> firstbyte-lastbyte (request for explicit range)
     *   <li> firstbyte- (request for 'firstbyte' byte to EOF)
     *   <li> -lastbyte (request for the final 'lastbyte' bytes of the file)
     * </ul>
     * @param rangeHeader the byte range header
     * @param filesize the size of the requested file in bytes
     * @return boolean true=valid header, false=invalid header
     */
     
    private boolean isRangeHeaderValid(String rangeHeader, long filesize) 
    {
        
        if(rangeHeader == null || rangeHeader.equals("")) { return false; }
        
        String range = "";
        int firstbyte = 0;
        int lastbyte = 0;
        
        StringTokenizer st = getRangeAsTokens(rangeHeader);
        while(st.hasMoreTokens()) {
            range = st.nextToken();
            range = range.trim();
            
            int index = range.indexOf('-');
            
            if( index == -1 ) { return false; } // Illegal: must contain a '-'
            if(range.length() <= 1) { return false; } // Illegal: must have more than '-'
            
            try {
                //Case -lastbyte
                if(index == 0) {
                    lastbyte = Integer.parseInt(range.substring(index + 1));
                    if(lastbyte <= 0 || lastbyte > filesize) { return false; }
                    else { continue; }
                }
                
                //Case firstbyte-
                if(index == range.length() - 1) {
                    firstbyte = Integer.parseInt(range.substring(0, index));
                    if(firstbyte >= filesize) { return false; }
                    else { continue; }
                }
                
                //Case firstbyte-lastbyte
                firstbyte = Integer.parseInt(range.substring(0, index));
                lastbyte = Integer.parseInt(range.substring(index + 1));
                if(firstbyte >= filesize || firstbyte > lastbyte) { return false; }
            }
            catch(NumberFormatException nfe) {
                return false; // Illegal: byte positions must be integers
            }
        }
        
        return true;
    }
    
    /**
     * Break the range header into tokens. Each token represents a single
     * requested range.
     *
     * The most common range header format is ...
     *
     * range = bytes=firstbyte-lastbyte,firstbyte-lastbyte
     *
     * ... with one or more firstbyte-lastbyte values, all comma separated.
     * Everything up to and including the '=' is stripped before tokenizing.
     *
     * @param rangeHeader the byte range header
     * @return StringTokenizer the tokenized string 
     */
     
    private StringTokenizer getRangeAsTokens(String rangeHeader)
    {
        String ranges = rangeHeader.substring(rangeHeader.indexOf('=') + 1, 
rangeHeader.length());
        return new StringTokenizer(ranges, ",");
    }
    
    /**
     * Handle HTTP post requests
     */
    
     public void doPost(HttpServletRequest request, HttpServletResponse 
response)
     throws ServletException, IOException
     {
         return;
     }
}

--------------070200080302040308050304--