LDM Logfile Format and Interpretation

Contents


Introduction

While some LDM utilities are designed to write diagnostic and informational messages to standard output, most utilities (in particular rpc.ldmd) are designed to write such messages to a logfile. Typically, the logfile is $LDMHOME/logs/ldmd.log, where $LDMHOME is the home-directory of the LDM user.

In general, the format of most log-messages is

MMM DD hh:mm:ss localhost util[pid]: msg

Associated with each log-message is a priority-level. The levels are: error, warning, notice, informational, and debug. The priority-level is used to decide whether or not to write the message to the logfile. Most utilities log errors, warnings, and notices by default. This behavior can usually be changed when the utility is executed by using the "-v" (verbose) option. This option will cause the utility to also log informational messages (and cause the logfile to grow more rapidly). Another common option is "-x", which will cause the utility to also log debug messages (and cause the logfile to grow extremely fast). You should rarely, if ever, need to use the "-x" option.

The rpc.ldmd utility has another way to alter its logging-level. Reception of a SIGUSR2 signal by the utility will cause it to change the logging-level threshold in the following sequence:

notice -> informational -> debug -> notice -> ...
The kill(1) utility can be used to send a SIGUSR2 signal to the rpc.ldmd utility.

Message Templates

A message template is an abstract log-message in which concrete fields (e.g., UNIX process-ID, upstream host name) have been replaced with corresponding placeholders (e.g., pid, uphost). When searching this web page for the message template corresponding to an actual log-message, be sure to search for only that part of the message that doesn't vary (such parts are in a non-italic font in the templates).

In the following log-message templates, the timestamp (MMM DD DD) and localhost prefixes have been omitted and only the

util[pid]: msg
fields are shown.

Message Templates for


Message Templates for the Top-Level rpc.ldmd

rpc.ldmd[pid]: Starting Up (version: maj.min.bug; built: MMM-DD-YYYY hh:mm:ss)
A top-level (i.e., parent) rpc.ldmd was started. The package was built in month MMM day DD of year YYYY at hour hh minute mm and second ss, local time.

rpc.ldmd[pid]: child child-pid exited with status exit-code
The child process corresponding to child-pid terminated with an exit-code of exit-code.

rpc.ldmd[pid]: SIGTERM
The top-level rpc.ldmd process received a SIGTERM signal.

rpc.ldmd[pid]: Terminating process group
The top-level rpc.ldmd process is terminating all its child processes.

rpc.ldmd[pid]: NOTICE: Couldn't contact portmapper on host IP-addr, port port: rpc_msg
The top-level rpc.ldmd process can't connect to the RPC portmapper daemon on port port on the host whose IP address is IP-addr due to the RPC error rpc_msg. Generally, this is not a problem because the LDM can operate correctly even if the portmapper isn't running if it can use the well-known LDM port 388.

rpc.ldmd[pid]: Denying connection from downhost
The given top-level rpc.ldmd denied a connection attempt by downhost because the LDM configuration-file ldmd.conf did not authorize the connection.

rpc.ldmd[pid]: gethostbyaddr: failed for IP-addr
The IP address IP-addr could not be converted into a fully-qualified hostname.

rpc.ldmd[pid]: Version vers LDM already running on local host
The just-started, top-level rpc.ldmd succeeded in creating a client-side handle for communicating with a different rpc.ldmd on the same computer (there can be only one, top-level rpc.ldmd on any, given computer). The protocol version of the LDM is vers. This could be due to any of the following:
  • A top-level, rpc.ldmd is listening on port 388. This can be checked via the rpcinfo utility.
  • A top-level, rpc.ldmd is still registered with the RPC portmapper daemon on the local host -- even though it might have terminated. This can be checked via the netstat utility (which can also remove the registration).
  • The mapping from hostname to IP-address is not working on the local host. This can be checked via the nslookup or dig utilities using the following hostnames (in order): the fully-qualified hostname based on the output of the command "uname -n", "localhost", and "0.0.0.0".

rpc.ldmd[pid]: Adjusted pathological regular expression "ere" at line lineno, file pathname
The LDM configuration-file pathname contains the pathological regular-expression ere at line lineno. The LDM converted the pathological regular-expression to a non-pathological one, internally, and should work OK. At some time, however, you should edit the configuration-file to correct the regular-expression.

Message Templates for a Downstream LDM

Notices

uphost[pid]: Starting Up(maj.min.bug): uphost: criteria
A downstream LDM was started that will request data-products from uphost. The initial data-product selection criteria are criteria.

uphost[pid]: Desired product class: criteria
The data-product selection-criteria has been modified according to the most-recent data-product in the product-queue that matched the previous criteria. The new criteria are criteria.

uphost[pid]: Connected to upstream LDM-6
A downstream LDM 6 is connected to an upstream LDM 6 on uphost. The downstream LDM 6 will now send a request for data.

uphost[pid]: Upstream LDM is willing to feed
A downstream LDM 6 sent a FEEDME request to upstream LDM 6 on uphost and the reply indicates that the upstream LDM 6 is willing to send the requested data-products.

uphost[pid]: Connected to upstream LDM-5
A downstream LDM 5 is connected to an upstream LDM 5 on uphost.

uphost[pid]: FEEDME(eldm.fsl.noaa.gov): OK
A downstream LDM 5 received a positive reply to its request to be fed from an upstream LDM 5 running on uphost.

uphost[pid]: srcloc: Restricting request: criteria1 -> criteria2
A downstream LDM 6 was notified by its upstream LDM 6 on uphost that the data-product selection criteria was modified from criteria1 to criteria2. This occurred at srcloc.

uphost[pid]: SIGINT
A downstream LDM that was being fed from uphost received a SIGINT signal.

uphost[pid]: SIGTERM
A downstream LDM that was being fed from uphost received a SIGTERM signal.

uphost[pid]: Exiting
A downstream LDM that was being fed from uphost is starting to terminate.

Errors

uphost[pid]: ERROR: srcloc1; srcloc2: Couldn't connect to LDM 6 on uphost using port port: reason
A downstream LDM 6 couldn't connect to an upstream LDM 6 on uphost using the well-known LDM port port because of reason. This error occurred at srcloc2. Try the ping(1) utility to see if the host is offline.

uphost[pid]: ERROR: srcloc1; srcloc:2 Couldn't connect to LDM 6 on uphost using either port port or portmapper: reason
A downstream LDM 6 couldn't connect to an upstream LDM 6 on uphost uphost using either the well-known LDM port port (tried first) or the portmapper daemon on that host (tried second). The cause of the second failure was reason. This error occurred at srcloc2. Try the ping(1) utility to see if the host is offline.

uphost[pid]: NOTICE: srcloc1; srcloc2: nullproc_6 failure to uphost; srcloc3: RPC: Program/version mismatch; low version = 4, high version = 5
A downstream LDM 6 was unable to connect to an upstream LDM 6 on uphost because the upstream LDM only supports the LDM 4 protocol and the LDM 5 protocol. The downstream LDM 6 will try again as a downstream LDM 5. This error occurred at srcloc3 and was further interpreted at srcloc2 and at srcloc1.

uphost[pid]: ERROR: srcloc: Connection to upstream LDM closed
The connection to an upstream LDM was closed for some reason. The reason might be in the LDM logfile on uphost. This error occurred at srcloc.

uphost[pid]: ERROR: srcloc: Upstream LDM died
A downstream LDM 6 was informed by the parent rpc.ldmd of its upstream LDM 6 that the upstream LDM 6 had terminated. The downstream LDM only makes this inquiry if it hasn't received any data-products in over one minute. This error occurred at srcloc.

uphost[pid]: Connection closed by upstream LDM
An upstream LDM 5 closed the connection to a its downstream LDM 5 for some reason. The reason might be in the LDM logfile on uphost.

uphost[pid]: Connection timed-out
A downstream LDM 5 closed the connection to its upstream LDM 5 on uphost because the downstream LDM 5's reply to a message from the upstream LDM 5 timed-out.

uphost[pid]: ################################: never completed
A downstream LDM 5 started receiving a new data-product from uphost while still waiting for a previous data-product to complete. The data-product signature of the outstanding data-product was "################################".

Message Templates for an Upstream LDM

Log-messages from upstream LDMs usually have the string "(feed)" or "(noti)" in the util field of the log-message, corresponding to FEEDME and NOTIFYME requests from the downstream LDM, repectively.

Notices

downhost(feed)[pid]: srcloc: Starting Up(maj.min.bug/6): criteria
An upstream LDM 6 was started. It will send every data-product that matches criteria to its downstream LDM 6 running on downhost.

downhost(feed)[pid]: topo: downhost feedtype
An upstream LDM is ready to send data-products of feedtype feedtype to downhost.

downhost(feed)[pid]: SIGTERM
An upstream LDM sending data-products to downhost received a SIGTERM signal -- probably from its top-level (parent) rpc.ldmd.

downhost(feed)[pid]: Starting Up(maj.min.bug/5): criteria
An upstream LDM 5 was started. It will send every data-product that matches criteria. to its downstream LDM 5 running on downhost.

downhost[pid]: ldmprog_4: ldmping from downhost
An ldmping utility on downhost connected (briefly) to this upstream LDM.

Errors

downhost(feed)[pid]: srcloc: nullproc_6() failure to downhost: RPC: Timed out
An attempt by an upstream LDM 6 to flush the connection to its downstream LDM 6 on downhost failed due to an RPC timeout. This might be due to network congestion and the downstream LDM 6 might still be connected. This error occurred at srcloc.

downhost(feed)[pid]: srcloc: nullproc_6() failure to downhost: RPC: Unable to send; errno = Broken pipe
An attempt by an upstream LDM 6 to flush the connection to its downstream LDM 6 on downhost failed due to a broken connection. The reason for the broken connection is unknown. The reason might be in the LDM logfile on downhost. This error occurred at srcloc.

downhost(feed)[pid]: srcloc: HEREIS: RPC: Unable to send; errno = Broken pipe
An attempt by an upstream LDM 6 to send a data-product using a HEREIS message to its downstream LDM 6 on downhost failed due to a broken connection. The reason for the broken connection is unknown. The reason might, however, be in the LDM logfile on downhost. This error occurred at srcloc.

downhost(feed)[pid]: srcloc: Product send failure: I/O error
An attempt by an upstream LDM 6 to send a data-product to its downstream LDM 6 on downhost failed. The reason for the failure should have been previously printed. This error occurred at srcloc.

downhost(feed)[pid]: prod-ID: RPC: Unable to receive (4)
An upstream LDM 5 did not receive a timely response to a HEREIS message to its downstream LDM 5 on downhost. The upstream LDM 5 was trying to send the data-product identified by prod-ID.

downhost(feed)[pid]: pq_sequence failed: I/O error (errno = errno)
An upstream LDM 5 was unable to send a data-product to its downstream LDM 5 on downhost. The reason for the failure should have been previously printed. The operating-system error-code is errno.

downhost(feed)[pid]: srcloc: nullproc_6() failure to downhost: RPC: Unable to receive; errno = Connection reset by peer
An attempt by an upstream LDM 6 to flush the connection to its downstream LDM 6 on downhost failed because the downstream LDM 6 closed the connection. The LDM logfile on downhost might contain the reason. This error occurred at srcloc.

downhost[pid]: NOTICE: srcloc1; srcloc2: nullproc_6 failure to thelma.ucar.edu; srcloc3: RPC: Timed out
A downstream LDM 6 was unable to connect to an upstream LDM 6 on downhost because the NULLPROC message timed-out. The downstream LDM 6 will try again. This error occurred at srcloc1 and was further interpreted at srcloc2 and srcloc3.

downhost(feed)[pid]: prod-ID: RPC: Timed out
An upstream LDM 6 was unable to send a data-product to a downstream LDM 5 on downhost because the RPC layer timed-out. The data-product is identified by prod-ID.

downhost(feed)[pid]: RECLASS: criteria
A downstream LDM 5 running on downhost requested that the data-product selection-criteria be modified. The new selection-criteria are criteria. This action is typically due to the downstream LDM 5 noticing that the data-product creation-times of recently-arrived data-products are too old and advancing the beginning time of the data-product creation-time in order to catch-up.

downhost(feed)[pid]: h_clnt_call: downhost: 1: time elapsed elapsed-time
An upstream LDM 5 spent an unusually long time sending a data-product to a downstream LDM 5 running on downhost. This probably indicates network congestion.

downhost(feed)[pid]: srcloc: COMINGSOON: RPC: Unable to receive; errno = Connection reset by peer
An upstream LDM 6 didn't receive a reply to a COMINGSOON message that it sent its downstream LDM 6 running on downhost because the downstream LDM 6 closed the connection. The reason for the closure might exist in the LDM logfile on downhost. This error occurred at srcloc.

downhost(feed)[pid]: prod-ID: RPC: Unable to receive
An upstream LDM 6 did not receive a reply to a COMINGSOON message that it sent to a downstream LDM 5 on downhost. The data-product that was being sent is identified by prod-ID.

downhost(feed)[pid]: prod-ID: SHUTTING_DOWN
An upstream LDM 5 sent a data-product to its downstream LDM 5 on downhost which replied that it was terminating. The data-product that was sent is identified by prod-ID.

downhost(noti)[pid]: nullproc5(downhost): RPC: Unable to receive
A notifying upstream LDM 5 was unable to flush the connection to its downstream LDM 5 on downhost because the downstream LDM 5 did not reply. The downstream LDM 5 was probably a notifyme process that terminated.

downhost(feed)[pid]: srcloc: blkdata5(blockno): RPC: Timed out
An upstream LDM 5 was unable to send a block of data to its downstream LDM 5 on downhost because the RPC-call timed-out. This error occurred at srcloc.

downhost(feed)[pid]: srcloc: COMINGSOON: RPC: Timed out
An upstream LDM 6 was unable to send a COMINGSOON message to its downstream LDM 6 on downhost because the RPC-call timed-out. This error occurred at srcloc.

downhost(feed)[pid]: srcloc: Product send failure: No such process
An upstream LDM 6 was unable to send a data-product to to its downstream LDM 6 on downhost because that process mysteriously vanished. The reason might be found in the LDM logfile on downhost. This error occurred at srcloc.

downhost(noti)[pid]: prod-ID: RPC: Unable to receive
An upstream LDM 5 did not receive a timely response to its NOTIFICATION message to its downstream LDM 5 on downhost. The data-product identifier was prod-ID.

downhost(feed)[pid]: nullproc5(downhost): RPC: Timed out
An upstream LDM 5 was unable to send a NULLPROC message to its downstream LDM 5 on downhost because the RPC-call timed-out.

downhost(feed)[pid]: srcloc: COMINGSOON: RPC: Unable to send; errno = Broken pipe
An upstream LDM 6 was unable to send a COMINGSOON message to its downstream LDM 6 on downhost because the connection was broken. The reason might be found in the LDM logfile on downhost. This error occurred at srcloc.

downhost(feed)[pid]: srcloc: blkdata5(blockno): RPC: Server can't decode arguments
An upstream LDM 5 was unable to send a block of data to its downstream LDM 5 on downhost because the downstream LDM 5 was unable to decode the arguments of the RPC-call.

downhost(feed)[pid]: srcloc: COMINGSOON: RPC: Server can't decode arguments
An upstream LDM 6 was unable to send a COMINGSOON message to its downstream LDM 6 on downhost because the downstream LDM 6 was unable to decode the arguments of the RPC-call.

downhost(feed)[pid]: prod-ID: RPC: Server can't decode arguments (RPC-code)
An upstream LDM 5 was unable to send a data-product to its downstream LDM 5 on downhost because the downstream LDM 5 was unable to decode the arguments of the HEREIS message. The RPC error-code is RPC-code.

downhost(feed)[pid]: prod-ID: RPC: Unable to send
An upstream LDM 5 was unable to send a data-product to its downstream LDM 5 for an unknown reason. The reason might be found in the LDM logfile on downhost.

Message Templates for Non-rpc.ldmd Utilties

Notices

pqact[pid]: pbuf_flush fileno: time elapsed elapsed-time
A pqact process took an unusually long time to write a block of data to an LDM decoder. The computer might be overloaded or the computer or decoder might be too slow. The file-descriptor of the pipe was fileno.

pqact[pid]: child child-pid terminated by signal sig
The child process of a pqact process was terminated by signal sig. This had to have been done by somthing outside the LDM system.

pqact[pid]: unio_open: pathname: Interrupted system call
A pqact process was unable to create file pathname because the process was interrupted by a signal.

pqact[pid]: pipe_prodput: trying again
A pqact process was unable to write a data-product to an LDM decoder. Because this could be due to a temporary condition, the process will try again.

pqact[pid]: ReReading configuration file pathname
A pqact process received a SIGHUP signal and will try to reread its configuration-file.

rtstats[pid]: Starting Up (parent-pid)
The rtstats utility was started by the parent process whose process-ID is parent-pid (typically, the top-level rpc.ldmd).

rtstats[pid]: Interrupt
The rtstats process received an interrupt signal.

rtstats[pid]: Exiting
The rtstats process is exiting.

scour: [ID pid localn.notice] skipping, no recent files in pathname
A scour process skipped the directory pathname because it contained no recent files.

Errors

pqact[pid]: pbuf_flush (fileno) Timed out
A pqact process was unable to flush a pipe to an LDM decoder because the flush timed-out. The computer or decoder is too slow or the computer is overloaded. The file-descriptor of the pipe was fileno.

pqact[pid]: pipe_dbufput: pathname write error
A pqact process was unable to write to LDM decoder pathname.

pqact[pid]: pbuf_flush (fileno) write: Broken pipe
A pqact process was unable to write to an LDM decoder because the pipe was disconnected -- probably due to the decoder. The file-descriptor of the pipe was fileno.

Key

blockno
Is the origin-0 block-number of an LDM 5 BLKDATA message.
child-pid
Is the UNIX process-ID number of the relevant child process.
criteria
Is the Data-product selection criteria in the form
start stop {{feedtype "pattern"} [,...]}
where:
start
Is the start-time in the form YYYYMMDDhhmmss.sss or the string TS_ZERO. The data-product creation-time must be equal-to or greater than this. The string TS_ZERO is, effectively, the beginning of time.
stop
Is the stop-time in the form YYYYMMDDhhmmss.sss or the string TS_ENDT. The data-product creation-time must be less-than or equal-to this. The string TS_ENDT is, effectively, the end of time.
pattern
Is the ERE for matching the data-product identifier.
DD
Is the day of the month when the message was generated (e.g., 08).
downhost
Is the downstream host. This can be a hostname, a fully-qualified hostname, or an IP address.
feedtype
Is the data-product feedtype.
fileno
Is the UNIX file descriptor of an open file.
elapsed-time
Is the elapsed-time interval in seconds.
hh:mm:ss
Is the time of day when the message was generated. A 24-hour UTC clock is used (e.g., 17:25:42).
localhost
Is the identifier of the host on which the LDM system generated the log-message.
MMM
Is the abbreviation of the month when the message was generated (e.g., Jan).
msg
Is the message from the utility.
pathname
Is the pathname of a file or directory.
pid
Is the UNIX process-ID number of the process that generated the message.
prod-ID
Is the data-product identifier.
uphost
Is the upstream host identifier. This can be a hostname, a fully-qualified hostname, or an IP address.
util
Is the identifier of the utility that generated the message. This is usually the name of the program (e.g., rpc.ldmd, pqact). For a downstream LDM, however, the identifier is the name of computer that's running the upstream LDM. For an upstream LDM, the identifier has the form
downhost(mode)
where:
mode
Is (typically) "feed" or (atypically) "noti" to indicate that the upstream LDM results from a FEEDME or NOTIFYME request, respectively.
maj.min.bug
Is the major, minor, and bug-fix version numbers, respectively, of an LDM release.
srcloc
Is the location in the source-code where the log-message was generated in the form
filename:lineno
where:
filename
Is the name of the source-file where the log-message was generated.
lineno
Is the line-number in filename where the log-message was generated.
YYYYMMDDhhmmss.sss
Is a time composed of the year (YYYY) month (MM) day (DD) hour (hh) minute (mm) and second (ss.sss).