While some LDM utilities are designed to write diagnostic and informational messages to standard output, most utilities (in particular rpc.ldmd) are designed to write such messages to a logfile. Typically, the logfile is $LDMHOME/logs/ldmd.log, where $LDMHOME is the home-directory of the LDM user.
In general, the format of most log-messages is
MMM DD hh:mm:ss localhost util[pid]: msg
Associated with each log-message is a priority-level. The levels are: error, warning, notice, informational, and debug. The priority-level is used to decide whether or not to write the message to the logfile. Most utilities log errors, warnings, and notices by default. This behavior can usually be changed when the utility is executed by using the "-v" (verbose) option. This option will cause the utility to also log informational messages (and cause the logfile to grow more rapidly). Another common option is "-x", which will cause the utility to also log debug messages (and cause the logfile to grow extremely fast). You should rarely, if ever, need to use the "-x" option.
The rpc.ldmd utility has another way to alter its logging-level. Reception of a SIGUSR2 signal by the utility will cause it to change the logging-level threshold in the following sequence:
notice -> informational -> debug -> notice -> ...The kill(1) utility can be used to send a SIGUSR2 signal to the rpc.ldmd utility.
A message template is an abstract log-message in which concrete fields (e.g., UNIX process-ID, upstream host name) have been replaced with corresponding placeholders (e.g., pid, uphost). When searching this web page for the message template corresponding to an actual log-message, be sure to search for only that part of the message that doesn't vary (such parts are in a non-italic font in the templates).
In the following log-message templates, the timestamp (MMM DD DD) and localhost prefixes have been omitted and only the
util[pid]: msgfields are shown.
Message Templates for
- rpc.ldmd[pid]: Starting Up (version: maj.min.bug; built: MMM-DD-YYYY hh:mm:ss)
- A top-level (i.e., parent) rpc.ldmd was started. The package was built in month MMM day DD of year YYYY at hour hh minute mm and second ss, local time.
- rpc.ldmd[pid]: child child-pid exited with status exit-code
- The child process corresponding to child-pid terminated with an exit-code of exit-code.
- rpc.ldmd[pid]: SIGTERM
- The top-level rpc.ldmd process received a SIGTERM signal.
- rpc.ldmd[pid]: Terminating process group
- The top-level rpc.ldmd process is terminating all its child processes.
- rpc.ldmd[pid]: local_portmapper_running(): clnttcp_create() failure: : RPC: Remote system error - Connection refused
- The top-level rpc.ldmd process can't connect to the RPC portmapper daemon on the local host. This means that the process was unable to register its ports with the portmapper and will have to rely on other LDMs connecting to the well-known port 388. Generally, this is not a problem.
- rpc.ldmd[pid]: Denying connection from downhost
- The given top-level rpc.ldmd denied a connection attempt by downhost because the LDM configuration-file ldmd.conf did not authorize the connection.
- rpc.ldmd[pid]: gethostbyaddr: failed for IP-addr
- The IP address IP-addr could not be converted into a fully-qualified hostname.
- rpc.ldmd[pid]: Version vers LDM already running on local host
- The just-started, top-level rpc.ldmd succeeded in creating a client-side handle for communicating with a different rpc.ldmd on the same computer (there can be only one, top-level rpc.ldmd on any, given computer). The protocol version of the LDM is vers. This could be due to any of the following:
- A top-level, rpc.ldmd is listening on port 388. This can be checked via the rpcinfo utility.
- A top-level, rpc.ldmd is still registered with the RPC portmapper daemon on the local host -- even though it might have terminated. This can be checked via the netstat utility (which can also remove the registration).
- The mapping from hostname to IP-address is not working on the local host. This can be checked via the nslookup or dig utilities using the following hostnames (in order): the fully-qualified hostname based on the output of the command "uname -n", "localhost", and "0.0.0.0".
Notices
- uphost[pid]: Starting Up(maj.min.bug): uphost: criteria
- A downstream LDM was started that will request data-products from uphost. The initial data-product selection criteria are criteria.
- uphost[pid]: Desired product class: criteria
- The data-product selection-criteria has been modified according to the most-recent data-product in the product-queue that matched the previous criteria. The new criteria are criteria.
- uphost[pid]: Connected to upstream LDM-6
- A downstream LDM 6 is connected to an upstream LDM 6 on uphost. The downstream LDM 6 will now send a request for data.
- uphost[pid]: Upstream LDM is willing to feed
- A downstream LDM 6 sent a FEEDME request to upstream LDM 6 on uphost and the reply indicates that the upstream LDM 6 is willing to send the requested data-products.
- uphost[pid]: Connected to upstream LDM-5
- A downstream LDM 5 is connected to an upstream LDM 5 on uphost.
- uphost[pid]: FEEDME(eldm.fsl.noaa.gov): OK
- A downstream LDM 5 received a positive reply to its request to be fed from an upstream LDM 5 running on uphost.
- uphost[pid]: srcloc: Restricting request: criteria1 -> criteria2
- A downstream LDM 6 was notified by its upstream LDM 6 on uphost that the data-product selection criteria was modified from criteria1 to criteria2. This occurred at srcloc.
- uphost[pid]: SIGINT
- A downstream LDM that was being fed from uphost received a SIGINT signal.
- uphost[pid]: SIGTERM
- A downstream LDM that was being fed from uphost received a SIGTERM signal.
- uphost[pid]: Exiting
- A downstream LDM that was being fed from uphost is starting to terminate.
Errors
- uphost[pid]: ERROR: srcloc1; srcloc2: Couldn't connect to LDM 6 on uphost
- A downstream LDM 6 couldn't connect to an upstream LDM 6 on uphost. The host could be offline or not running an LDM 6. This error occurred at srcloc2. Try the ping(1) utility to see if the host is offline.
- uphost[pid]: NOTICE: srcloc1; srcloc2: nullproc_6 failure to uphost; srcloc3: RPC: Program/version mismatch; low version = 4, high version = 5
- A downstream LDM 6 was unable to connect to an upstream LDM 6 on uphost because the upstream LDM only supports the LDM 4 protocol and the LDM 5 protocol. The downstream LDM 6 will try again as a downstream LDM 5. This error occurred at srcloc3 and was further interpreted at srcloc2 and at srcloc1.
- uphost[pid]: ERROR: srcloc: Connection to upstream LDM closed
- The connection to an upstream LDM was closed for some reason. The reason might be in the LDM logfile on uphost. This error occurred at srcloc.
- uphost[pid]: ERROR: srcloc: Upstream LDM died
- A downstream LDM 6 was informed by the parent rpc.ldmd of its upstream LDM 6 that the upstream LDM 6 had terminated. The downstream LDM only makes this inquiry if it hasn't received any data-products in over one minute. This error occurred at srcloc.
- uphost[pid]: Connection closed by upstream LDM
- An upstream LDM 5 closed the connection to a its downstream LDM 5 for some reason. The reason might be in the LDM logfile on uphost.
- uphost[pid]: Connection timed-out
- A downstream LDM 5 closed the connection to its upstream LDM 5 on uphost because the downstream LDM 5's reply to a message from the upstream LDM 5 timed-out.
- uphost[pid]: ################################: never completed
- A downstream LDM 5 started receiving a new data-product from uphost while still waiting for a previous data-product to complete. The data-product signature of the outstanding data-product was "################################".
Log-messages from upstream LDMs usually have the string "(feed)" or "(noti)" in the util field of the log-message, corresponding to FEEDME and NOTIFYME requests from the downstream LDM, repectively.
Notices
- downhost(feed)[pid]: srcloc: Starting Up(maj.min.bug/6): criteria
- An upstream LDM 6 was started. It will send every data-product that matches criteria to its downstream LDM 6 running on downhost.
- downhost(feed)[pid]: topo: downhost feedtype
- An upstream LDM is ready to send data-products of feedtype feedtype to downhost.
- downhost(feed)[pid]: SIGTERM
- An upstream LDM sending data-products to downhost received a SIGTERM signal -- probably from its top-level (parent) rpc.ldmd.
- downhost(feed)[pid]: Starting Up(maj.min.bug/5): criteria
- An upstream LDM 5 was started. It will send every data-product that matches criteria. to its downstream LDM 5 running on downhost.
- downhost[pid]: ldmprog_4: ldmping from downhost
- An ldmping utility on downhost connected (briefly) to this upstream LDM.
Errors
- downhost(feed)[pid]: srcloc: nullproc_6() failure to downhost: RPC: Timed out
- An attempt by an upstream LDM 6 to flush the connection to its downstream LDM 6 on downhost failed due to an RPC timeout. This might be due to network congestion and the downstream LDM 6 might still be connected. This error occurred at srcloc.
- downhost(feed)[pid]: srcloc: nullproc_6() failure to downhost: RPC: Unable to send; errno = Broken pipe
- An attempt by an upstream LDM 6 to flush the connection to its downstream LDM 6 on downhost failed due to a broken connection. The reason for the broken connection is unknown. The reason might be in the LDM logfile on downhost. This error occurred at srcloc.
- downhost(feed)[pid]: srcloc: HEREIS: RPC: Unable to send; errno = Broken pipe
- An attempt by an upstream LDM 6 to send a data-product using a HEREIS message to its downstream LDM 6 on downhost failed due to a broken connection. The reason for the broken connection is unknown. The reason might, however, be in the LDM logfile on downhost. This error occurred at srcloc.
- downhost(feed)[pid]: srcloc: Product send failure: I/O error
- An attempt by an upstream LDM 6 to send a data-product to its downstream LDM 6 on downhost failed. The reason for the failure should have been previously printed. This error occurred at srcloc.
- downhost(feed)[pid]: prod-ID: RPC: Unable to receive (4)
- An upstream LDM 5 did not receive a timely response to a HEREIS message to its downstream LDM 5 on downhost. The upstream LDM 5 was trying to send the data-product identified by prod-ID.
- downhost(feed)[pid]: pq_sequence failed: I/O error (errno = errno)
- An upstream LDM 5 was unable to send a data-product to its downstream LDM 5 on downhost. The reason for the failure should have been previously printed. The operating-system error-code is errno.
- downhost(feed)[pid]: srcloc: nullproc_6() failure to downhost: RPC: Unable to receive; errno = Connection reset by peer
- An attempt by an upstream LDM 6 to flush the connection to its downstream LDM 6 on downhost failed because the downstream LDM 6 closed the connection. The LDM logfile on downhost might contain the reason. This error occurred at srcloc.
- downhost[pid]: NOTICE: srcloc1; srcloc2: nullproc_6 failure to thelma.ucar.edu; srcloc3: RPC: Timed out
- A downstream LDM 6 was unable to connect to an upstream LDM 6 on downhost because the NULLPROC message timed-out. The downstream LDM 6 will try again. This error occurred at srcloc1 and was further interpreted at srcloc2 and srcloc3.
- downhost(feed)[pid]: prod-ID: RPC: Timed out
- An upstream LDM 6 was unable to send a data-product to a downstream LDM 5 on downhost because the RPC layer timed-out. The data-product is identified by prod-ID.
- downhost(feed)[pid]: RECLASS: criteria
- A downstream LDM 5 running on downhost requested that the data-product selection-criteria be modified. The new selection-criteria are criteria. This action is typically due to the downstream LDM 5 noticing that the data-product creation-times of recently-arrived data-products are too old and advancing the beginning time of the data-product creation-time in order to catch-up.
- downhost(feed)[pid]: h_clnt_call: downhost: 1: time elapsed elapsed-time
- An upstream LDM 5 spent an unusually long time sending a data-product to a downstream LDM 5 running on downhost. This probably indicates network congestion.
- downhost(feed)[pid]: srcloc: COMINGSOON: RPC: Unable to receive; errno = Connection reset by peer
- An upstream LDM 6 didn't receive a reply to a COMINGSOON message that it sent its downstream LDM 6 running on downhost because the downstream LDM 6 closed the connection. The reason for the closure might exist in the LDM logfile on downhost. This error occurred at srcloc.
- downhost(feed)[pid]: prod-ID: RPC: Unable to receive
- An upstream LDM 6 did not receive a reply to a COMINGSOON message that it sent to a downstream LDM 5 on downhost. The data-product that was being sent is identified by prod-ID.
- downhost(feed)[pid]: prod-ID: SHUTTING_DOWN
- An upstream LDM 5 sent a data-product to its downstream LDM 5 on downhost which replied that it was terminating. The data-product that was sent is identified by prod-ID.
- downhost(noti)[pid]: nullproc5(downhost): RPC: Unable to receive
- A notifying upstream LDM 5 was unable to flush the connection to its downstream LDM 5 on downhost because the downstream LDM 5 did not reply. The downstream LDM 5 was probably a notifyme process that terminated.
- downhost(feed)[pid]: srcloc: blkdata5(blockno): RPC: Timed out
- An upstream LDM 5 was unable to send a block of data to its downstream LDM 5 on downhost because the RPC-call timed-out. This error occurred at srcloc.
- downhost(feed)[pid]: srcloc: COMINGSOON: RPC: Timed out
- An upstream LDM 6 was unable to send a COMINGSOON message to its downstream LDM 6 on downhost because the RPC-call timed-out. This error occurred at srcloc.
- downhost(feed)[pid]: srcloc: Product send failure: No such process
- An upstream LDM 6 was unable to send a data-product to to its downstream LDM 6 on downhost because that process mysteriously vanished. The reason might be found in the LDM logfile on downhost. This error occurred at srcloc.
- downhost(noti)[pid]: prod-ID: RPC: Unable to receive
- An upstream LDM 5 did not receive a timely response to its NOTIFICATION message to its downstream LDM 5 on downhost. The data-product identifier was prod-ID.
- downhost(feed)[pid]: nullproc5(downhost): RPC: Timed out
- An upstream LDM 5 was unable to send a NULLPROC message to its downstream LDM 5 on downhost because the RPC-call timed-out.
- downhost(feed)[pid]: srcloc: COMINGSOON: RPC: Unable to send; errno = Broken pipe
- An upstream LDM 6 was unable to send a COMINGSOON message to its downstream LDM 6 on downhost because the connection was broken. The reason might be found in the LDM logfile on downhost. This error occurred at srcloc.
- downhost(feed)[pid]: srcloc: blkdata5(blockno): RPC: Server can't decode arguments
- An upstream LDM 5 was unable to send a block of data to its downstream LDM 5 on downhost because the downstream LDM 5 was unable to decode the arguments of the RPC-call.
- downhost(feed)[pid]: srcloc: COMINGSOON: RPC: Server can't decode arguments
- An upstream LDM 6 was unable to send a COMINGSOON message to its downstream LDM 6 on downhost because the downstream LDM 6 was unable to decode the arguments of the RPC-call.
- downhost(feed)[pid]: prod-ID: RPC: Server can't decode arguments (RPC-code)
- An upstream LDM 5 was unable to send a data-product to its downstream LDM 5 on downhost because the downstream LDM 5 was unable to decode the arguments of the HEREIS message. The RPC error-code is RPC-code.
- downhost(feed)[pid]: prod-ID: RPC: Unable to send
- An upstream LDM 5 was unable to send a data-product to its downstream LDM 5 for an unknown reason. The reason might be found in the LDM logfile on downhost.
Notices
- pqact[pid]: pbuf_flush fileno: time elapsed elapsed-time
- A pqact process took an unusually long time to write a block of data to an LDM decoder. The computer might be overloaded or the computer or decoder might be too slow. The file-descriptor of the pipe was fileno.
- pqact[pid]: child child-pid terminated by signal sig
- The child process of a pqact process was terminated by signal sig. This had to have been done by somthing outside the LDM system.
- pqact[pid]: unio_open: pathname: Interrupted system call
- A pqact process was unable to create file pathname because the process was interrupted by a signal.
- pqact[pid]: pipe_prodput: trying again
- A pqact process was unable to write a data-product to an LDM decoder. Because this could be due to a temporary condition, the process will try again.
- pqact[pid]: ReReading configuration file pathname
- A pqact process received a SIGHUP signal and will try to reread its configuration-file.
- rtstats[pid]: Starting Up (parent-pid)
- The rtstats utility was started by the parent process whose process-ID is parent-pid (typically, the top-level rpc.ldmd).
- rtstats[pid]: Interrupt
- The rtstats process received an interrupt signal.
- rtstats[pid]: Exiting
- The rtstats process is exiting.
- scour: [ID pid local0.notice] skipping, no recent files in pathname
- A scour process skipped the directory pathname because it contained no recent files.
Errors
- pqact[pid]: pbuf_flush (fileno) Timed out
- A pqact process was unable to flush a pipe to an LDM decoder because the flush timed-out. The computer or decoder is too slow or the computer is overloaded. The file-descriptor of the pipe was fileno.
- pqact[pid]: pipe_dbufput: pathname write error
- A pqact process was unable to write to LDM decoder pathname.
- pqact[pid]: pbuf_flush (fileno) write: Broken pipe
- A pqact process was unable to write to an LDM decoder because the pipe was disconnected -- probably due to the decoder. The file-descriptor of the pipe was fileno.
- blockno
- Is the origin-0 block-number of an LDM 5 BLKDATA message.
- child-pid
- Is the UNIX process-ID number of the relevant child process.
- criteria
- Is the Data-product selection criteria in the form
start stop {{feedtype "pattern"} [,...]}where:
- start
- Is the start-time in the form YYYYMMDDhhmmss.sss or the string TS_ZERO. The data-product creation-time must be equal-to or greater than this. The string TS_ZERO is, effectively, the beginning of time.
- stop
- Is the stop-time in the form YYYYMMDDhhmmss.sss or the string TS_ENDT. The data-product creation-time must be less-than or equal-to this. The string TS_ENDT is, effectively, the end of time.
- pattern
- Is the ERE for matching the data-product identifier.
- DD
- Is the day of the month when the message was generated (e.g., 08).
- downhost
- Is the downstream host. This can be a hostname, a fully-qualified hostname, or an IP address.
- feedtype
- Is the data-product feedtype.
- fileno
- Is the UNIX file descriptor of an open file.
- elapsed-time
- Is the elapsed-time interval in seconds.
- hh:mm:ss
- Is the time of day when the message was generated. A 24-hour UTC clock is used (e.g., 17:25:42).
- localhost
- Is the identifier of the host on which the LDM system generated the log-message.
- MMM
- Is the abbreviation of the month when the message was generated (e.g., Jan).
- msg
- Is the message from the utility.
- pathname
- Is the pathname of a file or directory.
- pid
- Is the UNIX process-ID number of the process that generated the message.
- prod-ID
- Is the data-product identifier.
- uphost
- Is the upstream host identifier. This can be a hostname, a fully-qualified hostname, or an IP address.
- util
- Is the identifier of the utility that generated the message. This is usually the name of the program (e.g., rpc.ldmd, pqact). For a downstream LDM, however, the identifier is the name of computer that's running the upstream LDM. For an upstream LDM, the identifier has the form
where:downhost(mode)
- mode
- Is (typically) "feed" or (atypically) "noti" to indicate that the upstream LDM results from a FEEDME or NOTIFYME request, respectively.
- maj.min.bug
- Is the major, minor, and bug-fix version numbers, respectively, of an LDM release.
- srcloc
- Is the location in the source-code where the log-message was generated in the form
filename:linenowhere:
- filename
- Is the name of the source-file where the log-message was generated.
- lineno
- Is the line-number in filename where the log-message was generated.
- YYYYMMDDhhmmss.sss
- Is a time composed of the year (YYYY) month (MM) day (DD) hour (hh) minute (mm) and second (ss.sss).