TDS Tutorial: TDS Monitoring and Debugging

Access Logs

The Tomcat access log

Configuring Tomcat to enable access logging

  1. Modify server.xml to enable the AccessLogValve.
  2. Using your favorite editor open $TOMCAT_HOME/conf/server.xml:

    $ vi server.xml
    

    Locate the AccessLogValve contained in the Host element (should be near the bottom of the file):

    <!-- Define the default virtual host
            Note: XML Schema validation will not work with Xerces 2.2.
    -->
    <Host name="localhost"  appBase="webapps"
          unpackWARs="true" autoDeploy="true"
          xmlValidation="false" xmlNamespaceAware="false">
    
      . . .
      
      <!-- Access log processes all example.
              Documentation at: /docs/config/valve.html -->
      <!--
      <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"  
                prefix="localhost_access_log." suffix=".txt" pattern="common" resolveHosts="false"/>
      -->
    
    </Host>
    

    Uncomment the AccessLogValve and change the prefix and suffix attributes to customize the access log name:

    prefix="access." 
    suffix=".log"
    

    To provide more useful information about who is accessing the TDS, change the pattern element to customize the format of each log entry:

    pattern="%h %l %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot; %D"
    

    When you are finished with your edits, the AccessLogValve should look something like the following:

    <Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs"  
       prefix="access." 
       suffix=".log"
       pattern="%h %l %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot; %D"
       resolveHosts="false" />
    
  3. Verify the access log has been enabled.
  4. Restart Tomcat and verify an access log with a name format of access.yyyy-mm-dd.log has been generated in the $TOMCAT_HOME/logs/ directory:

    $ ls -l /home/thredds/apache-tomcat-6.0.20/logs/access*
    -rw-r--r-- 1 thredds workshop 0 2009-08-04 13:22 access.2009-08-04.log
    

Exercise One: Enable the access logging in Tomcat

  1. Using the steps outlined in the above section, enable access logging in Tomcat via the AccessLogValve.

Access log format

The access log entry format we are using is almost identical to the standard combined logging format with an addition: the %D which is used for documenting the Time taken to process the request, in millis will appear at the end of each log entry:

pattern="%h %l %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot; %D"

Access log format

For more information on access log format configuration, see the Tomcat Valve Component Documentation.

The above pattern makes use of the following codes:

The above pattern translates into:

128.117.140.183 - oxelson [09/Jul/2009:20:23:46 -0700] "GET /manager/html HTTP/1.1" 200 13022 "http://localhost:8080/" 
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3" 325

Another way of looking at it:

Remote host 128.117.140.183
Remote logical username from identd -
Authenticated user oxelson
Time and date of request [09/Jul/2009:20:23:46 -0700]
HTTP request method GET
Request URI /manager/html
Protocol used HTTP/1.1
HTTP server response 200
Bytes transferred 13022
Referer http://localhost:8080/
User Agent Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3
Response time (in millis) 325

Looking at access log data

Storing the information in the access log is only the start of log management. The next step is to analyze this information to produce useful statistics.

  1. Looking at raw access log data.
  2. Locate the access.2009.07.27.log file in your home directory which contains an access log file taken from one of our production servers.

    $ cd
    $ ls -l access* 
    -rw-r--r--  1 thredds workshop 350361 2009-08-04 13:14 access.2009-07-27.log
    

    Have a look at the log file and try to examine the data stored there. (Kind of hard to sort through it all, isn't it)?

    $ less access.2009-07-27.log
    
  3. Analyzing data subsets with awk and dig.
  4. Sometimes we don't need to see the "whole picture" and just need some quick statistics. E.g., we need to find the host that has made the most requests to the server.

    Run the following command:

    $ cat access.2009-07-27.log | awk '{print $1}' | sort | uniq -c | sort -n
       6 198.181.231.27
     168 128.117.140.75
     927 192.12.209.93
    1115 192.12.209.57
    

    The output of the access log file is piped into awk, which prints the first "field variable" $1. The result is sorted, identical lines are counted, and finally sorted again in descending order.

    192.12.209.57 is the host that has made the most requests to the server during this particular access log's period.

    But who is 192.12.209.57? (IPs are not too meaningful by themselves.)

    Run dig to find out the host information for 192.12.209.57:

    $  dig +short -x 192.12.209.57
    atm.cise-nsf.gov.
    

    atm.cise-nsf.gov is the host name for 192.12.209.57 (a bit more meaningful).

    Looking at the other IP addresses from the awk output. Without running dig, what do you think the domain of 192.12.209.93 is going to be?

    See if your assumption is correct:

    $  dig +short -x 192.12.209.93
    cise093.cise-nsf.gov.
    

    Run dig on the other addresses to get the host names for the other two remaining IPs:

    $  dig +short -x 128.117.140.75
    newhart.unidata.ucar.edu.
    $  dig +short -x 198.181.231.27
    hirudo.nsf.gov.
    

Exercise Two: Find out if any of the requests were made by authenticated users.

  1. Use the awk command sequence to find out if any authenticated users were responsible for making these requests.
  2. $ cat access.2009-07-27.log | awk '{print $3}' | sort | uniq -c | sort -n
    

    What did you find?

Exercise Three: Using parsed access log data.

  1. Parse log data.
  2. Run /usr/bin/parse_logs.pl to parse the access.2009.07.30.log file:

    $ /usr/bin/parse_logs.pl
    

    The resulting file logdata.csv will contain the log data in tab-delimited format that will have a .txt extension:

    $ ls -l *log* 
    -rw-r--r-- 1 thredds workshop 350361 2009-08-04 13:14 access.2009-07-27.log
    -rw-r--r-- 1 thredds workshop 339731 2009-08-04 13:28 logdata.csv
    
  3. Examine the log data.
  4. Open the logdata.csv file in the OpenOffice spreadsheet utility.

    What Requested URL took the most time to process?

    Response Codes

    See HTTP Status Code Definitions for a reference to server response codes.

    What types of Request Methods are recorded? What type of server Response/Status Codes did these Request Methods return?

    What was the most prevalent Referer data? What were the corresponding User Agents for these values? Does this make sense?

    Are there any entries that might be of interest from a security point of view? Is there a common data element among these entries? What do you think caused these entries?

Log Files Generated by the TDS

Introducing the TDS logging directory: $TOMCAT_HOME/content/thredds/logs

Keep in Mind

There are two locations of log files you will reference:

  1. $TOMCAT_HOME/logs/ is where you go to look at Tomcat-generated log files; and
  2. the TDS logging directory which is where you go to look at TDS-generated log files.

The log files generated by the TDS are located in the $TOMCAT_HOME/content/thredds/logs directory:

Log file Data the file contains
threddsServlet.log contains log messages used for debugging
serverStartup.log contains log messages from the initialization of the TDS
models.log contains log messages specific to gridded model data, especially GRIB data
cache.log contains log messages related to TDS object and file caching

About the threddsServlet.log

threddsServlet.log log format

  1. Look at the logging configurations for threddsServlet.log.
  2. Navigate to the unpacked thredds directory in $TOMCAT_HOME/webapps, and view the WEB-INF/log4j.xml:

    $ cd /home/thredds/apache-tomcat-6.0.20/webapps/thredds
    $ less WEB-INF/log4j.xml
    

    The TDS uses the OpenSource Apache Log4J library (hence, the name of the file: WEB-INF/log4j.xml).

    Find the first uncommented Appender element with a name of threddsServlet and a class of org.apache.log4j.DailyRollingFileAppender, and note the ConversionPattern:

      <appender name="threddsServlet" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="File" value="${tds.log.dir}/threddsServlet.log"/>
        <param name="DatePattern" value=".yyyy-MM-dd-HH"/>
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d{yyyy-MM-dd'T'HH:mm:ss.SSS Z} [%10r][%8X{ID}] %-5p - %c - %m%n"/>
          <!--param name="ConversionPattern" value="%d{ISO8601} [%10r - %10X{ID}] %-5p - %c - %m%n"/-->
        </layout>
      </appender>     
    
  3. Examine the log format being applied to the threddsServlet.log file.
  4. The ConversionPattern parameter specifies the Log4J pattern used to format the entries of the threddsServlet.log file:

    value="%d{yyyy-MM-dd'T'HH:mm:ss.SSS Z} [%10r][%8X{ID}] %-5p - %c - %m%n" 
    

    Log4J library

    For more information on the Log4J library, consult the online documentation.

    The above pattern makes use of the following Log4J codes:

    • %d{yyyy-MM-dd'T'HH:mm:ss.SSS Z} - Date and time
    • %10r - Time, in millis, since Tomcat startup
    • %8X{ID} - Transaction ID
    • %-5p - Logging level (options are DEBUG, INFO, WARN, ERROR, and FATAL)
    • %c - Location in the TDS code where the message was generated
    • %m - Content of the log message
    • %n - Newline character

    The above pattern translates into:

    2009-08-03T14:02:28.184 -0600 [1044713597][  214877] INFO  - thredds.server.catalogservice.LocalCatalogServiceController - handleRequestInternal(): Remote host: 192.87.16.142 - Request: "GET /thredds/catalog/nexrad/level3/NVL/VAX/20090619/catalog.xml HTTP/1.1"
    

    Another way of looking at it:

    Time and date 2009-08-03T14:02:28.184 -0600
    The number of milliseconds since the server was started [1044713597]
    Request ID [ 214877]
    Logging level INFO
    Location in the TDS code where the message was generated thredds.server.catalogservice.LocalCatalogServiceController - handleRequestInternal()
    Content of the log message Remote host: 192.87.16.142 - Request: "GET /thredds/catalog/nexrad/level3/NVL/VAX/20090619/catalog.xml HTTP/1.1

Transaction bracketing in threddsServlet.log

Start of transaction

The main message in a start of transaction log message includes:

For example:

... Remote host: 128.117.140.75 - Request: "GET /thredds/catalog.html HTTP/1.1"

End of transaction

The main message in a end of transaction log message includes:

For example:

... Request Completed - 200 - 3403 - 1

Most end of transaction messages include a status code that matches a standard HTTP status code. However, when the client breaks the connection or a request is forwarded to another TDS internal service, one of three TDS specific status codes are used:

If the request is forwarded successfully, another set of transaction messages should be initiated with the ending message containing a standard HTTP status code. If forwarding the request fails, another set of transaction message may or may not be initiated and the entry in the access log will probably either log a 400 (Not Found) or a 500 (Internal Server Error) HTTP status code.

Looking at threddsServlet.log data

Here is an example of some log messages from a request that resulted in error messages:

2009-06-03T21:49:07.147 -0600 [ 104481318][   27031] INFO  - thredds.server.opendap.OpendapServlet - Remote host: 128.117.140.75 - Request: "GET /thredds/dodsC/nexrad/level2/KFTG/20090602/Level2_KFTG_20090602_0346.ar2v.dds HTTP/1.1"
2009-06-03T21:49:07.415 -0600 [ 104481586][   27031] ERROR - thredds.server.opendap.OpendapServlet - path= /nexrad/level2/KFTG/20090602/Level2_KFTG_20090602_0346.ar2v.dds
java.io.IOException: Cant read /data/ldm/pub/native/radar/level2/KFTG/20090602/Level2_KFTG_20090602_0346.ar2v: not a valid NetCDF file.
        at ucar.nc2.NetcdfFile.open(NetcdfFile.java:716)
        at ucar.nc2.NetcdfFile.open(NetcdfFile.java:381)
 ...
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)
2009-06-03T21:49:07.416 -0600 [ 104481587][   27031] INFO  - thredds.server.opendap.OpendapServlet - Request Completed - 500 - -1 - 269

Keep in Mind

Log messages from multiple requests can be interspersed in the log file.

In the above example, note:

Exercise Four: Find an example of error transition bracketing in your own threddsServlet.log.

  1. Use the grep command to find out which threddsServlet.log file contains the error message:
  2. $ cd /home/thredds/apache-tomcat-6.0.20/content/thredds/logs
    $ grep ERROR threddsServlet.log*
    
  3. Open the threddsServlet.log file which contains the error message and find all the message transaction entries that correspond to that error.

Monitoring and Debugging Tools

Log analyzers

There are a variety of OpenSource and commercial log analysis packages you may wish to employ. Below are some resources to help locate a package:

Firefox Add-on: Live HTTP Headers

Live HTTP Headers

Exercise Five: Use Live HTTP Headers.

  1. Clear any authenticated browser sessions you may have running.
  2. Open Live HTTP Headers
  3. Go to the Tomcat manager application in your browser: http://localhost:8080/manager/html/ and login.
  4. Look at the resulting output in the Live HTTP Headers window.

Tools that come with the JDK: JConsole and JVisualVM

JVM monitoring tools

  • JVisualVM - Java Virtual Machine Monitoring, Troubleshooting and Profiling Tool.
  • JConsole - Java Monitoring and Management Console.
  1. Use JConsole to track the JVM running Tomcat.
  2. Invoke the jconsole program in $JAVA_HOME/bin:

    $ cd $JAVA_HOME/bin
    $ ./jconsole
    

    If you happen to be running more than one JVM on your machine, look to see what Process ID # for the JVM running Tomcat:

    $ ps -ef | grep tomcat
    thredds   5168     1  5 14:01 pts/2    00:00:12 /home/thredds/jdk1.6.0_14/bin/java -Djava.util.logging.config.file=/home/thredds/apache-tomcat-6.0.20/conf/logging.properties -Xmx256m -Xms256m -server -Djava.awt.headless=true -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/home/thredds/apache-tomcat-6.0.20/endorsed -classpath :/home/thredds/apache-tomcat-6.0.20/bin/bootstrap.jar -Dcatalina.base=/home/thredds/apache-tomcat-6.0.20 -Dcatalina.home=/home/thredds/apache-tomcat-6.0.20 -Djava.io.tmpdir=/home/thredds/apache-tomcat-6.0.20/temp org.apache.catalina.startup.Bootstrap start
    thredds   5451  4285  0 14:05 pts/2    00:00:00 grep tomcat
    

    Select the local JVM process corresponding to Tomcat and connect:

    Starting JConsole

  3. Try to create a PermGen error via the Tomcat manager application.
  4. In your browser window, continually Start, Stop and reload the TDS and other web applications using the manager application.

    In the meantime, chart the Memory Pool "PS Perm Gen" in the JConsoleMemory tab. Also monitor the Heap Memory Usage and Class Loading displays in the Overview tab.

  5. Try to create a PermGen error via the Tomcat manager application.
  6. Identify PermGen as it happens.
  7. The PermGen error manifests itself in the following ways:

    1. The manager application stops responding.
    2. JConsole will loose its connection with the Tomcat JVM:
    3. PermGen Error

    4. The following error is reported in catalina.out:
    5. Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: PermGen space
      
    6. In the two JConsole screen captures below, you can see the memory usage of PermGen increased with every reload of the application until it reaches the point where the JVM crashes with the OutOfMemory error. The default maximum PermGen amount of JVM is 64 MB.

      PermGen Error

      PermGen Error

Exercise Six: Use JConsole to monitor the JVM during a manager-induced PermGen error.

  1. Using the steps outlined above, try to induce a PermGen error using the manager application and monitor it via JConsole.

THREDDS This document is maintained by Unidata. Send comments to THREDDS support.