
[TIGGE #CUA-629523]: Re: dataportal not receiving data from tigge-ldm.ecmwf.int



Manuel,

> This is the output of "lsof | egrep 'PID|unidata'" on tigge-portal:
> COMMAND     PID USER FD   TYPE  DEVICE SIZE NODE NAME
> rpc.ldmd  29317 ldm  0u   IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd  29317 ldm  1u   IPv4 1168838      TCP  tigge-portal.ecmwf.int:unidata-ldm->tigge-ldm.ecmwf.int:45328 (CLOSE_WAIT)
> rpc.ldmd  29321 ldm  0u   IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd  29321 ldm  4u   IPv4 2421682      TCP  tigge-portal.ecmwf.int:48653->tigge-ldm.ecmwf.int:unidata-ldm (ESTABLISHED)
> rpc.ldmd  29322 ldm  0u   IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd  29322 ldm  3u   IPv4 3064475      TCP  tigge-portal.ecmwf.int:55991->tigge-ldm.ecmwf.int:unidata-ldm (SYN_SENT)
> rpc.ldmd  29322 ldm  4u   IPv4 2421860      TCP  tigge-portal.ecmwf.int:48659->tigge-ldm.ecmwf.int:unidata-ldm (ESTABLISHED)
> rpc.ldmd  29323 ldm  0u   IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd  29323 ldm  3u   IPv4 3064477      TCP  tigge-portal.ecmwf.int:55992->tigge-ldm.ecmwf.int:unidata-ldm (ESTABLISHED)
> rpc.ldmd  29325 ldm  0u   IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd  29325 ldm  3u   IPv4 3064474      TCP  tigge-portal.ecmwf.int:55990->tigge-ldm.ecmwf.int:unidata-ldm (SYN_SENT)
> rpc.ldmd  29326 ldm  0u   IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd  29326 ldm  4u   IPv4 2421808      TCP  tigge-portal.ecmwf.int:48657->tigge-ldm.ecmwf.int:unidata-ldm (ESTABLISHED)
> 
> All rpc.ldmd do listen on port 388. And this is because when a process
> calls fork(2) to create another process, the child inherits the open file
> descriptors of the parent process. This is normal behaviour.

One of the very first things a child LDM process does is close the listening 
socket (see "server/ldmd.c"; search for "fork()"). Therefore, in my opinion, 
you should never see several processes holding the listening socket, as in 
your lsof output, unless something is very wrong.
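
For illustration, here is a minimal C sketch of that pattern. It is not the code in "server/ldmd.c"; the helpers open_listener(), fd_is_open() and fork_and_close_listener() are hypothetical, and the sketch binds an ephemeral loopback port instead of port 388 so that it doesn't need root. The point it shows: fork() duplicates the listening descriptor, the child closes its copy immediately, and only the parent remains a listener.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on an ephemeral loopback port.
 * (An LDM server listens on port 388; port 0 here avoids needing root.) */
static int open_listener(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if `fd` is an open descriptor in the calling process, else 0. */
static int fd_is_open(int fd)
{
    return !(fcntl(fd, F_GETFD) == -1 && errno == EBADF);
}

/* Hypothetical: fork a child that closes its inherited copy of the
 * listening socket before doing anything else. Returns 0 if the descriptor
 * is closed in the child but still open in the parent, -1 otherwise. */
static int fork_and_close_listener(void)
{
    int listen_fd = open_listener();
    if (listen_fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(listen_fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() duplicated the descriptor; drop our copy first. */
        close(listen_fd);
        /* ... a real child would now serve its single connection ... */
        _exit(fd_is_open(listen_fd) ? 1 : 0);
    }

    /* Parent: keeps the listener and waits for the child's report. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        close(listen_fd);
        return -1;
    }
    int parent_still_open = fd_is_open(listen_fd);
    close(listen_fd);

    return (WIFEXITED(status) && WEXITSTATUS(status) == 0 && parent_still_open) ? 0 : -1;
}
```

Calling assert(fork_and_close_listener() == 0) from a test main() should succeed on a POSIX system. A child that skipped the close() would keep the socket, and lsof would then list it as a LISTEN holder, which is the symptom in your output.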

Also, the ps(1) output you sent showed multiple top-level LDM servers. While 
not impossible, this also shouldn't happen.

The netstat(1) utility on one of our Linux systems has a "-p" option that 
prints the PID of the process that owns each socket. Can you use it to verify 
that there are multiple LDM listeners?
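
If the netstat output is long, a small filter can pick out the distinct PIDs it attributes to LISTEN sockets on the LDM port. The C sketch below assumes the Linux net-tools format of "netstat -tlnp" ("-n" prints the port as 388 rather than "unidata-ldm"; the last column is "PID/Program name", shown as "-" for processes you can't see unless you run it as root). The function name count_listener_pids() is hypothetical. Note that netstat typically names only one owner per socket, so this complements rather than replaces lsof.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 8

/* Hypothetical helper: scan `netstat -tlnp` output and store the distinct
 * PIDs that own a LISTEN socket on `port` (e.g. "388") in pids[0..max_pids).
 * Assumed line shape (Linux net-tools):
 *   tcp   0   0 0.0.0.0:388   0.0.0.0:*   LISTEN   29317/rpc.ldmd
 * Returns the number of distinct PIDs stored, or -1 on allocation failure. */
static int count_listener_pids(const char *netstat_output, const char *port,
                               long *pids, int max_pids)
{
    char *copy = strdup(netstat_output); /* strtok_r() modifies its input */
    if (copy == NULL)
        return -1;

    int npids = 0;
    char *line_save = NULL;
    for (char *line = strtok_r(copy, "\n", &line_save); line != NULL;
         line = strtok_r(NULL, "\n", &line_save)) {
        /* Split the line into whitespace-separated fields. */
        char *field[MAX_FIELDS];
        int nfields = 0;
        char *field_save = NULL;
        for (char *f = strtok_r(line, " \t", &field_save);
             f != NULL && nfields < MAX_FIELDS;
             f = strtok_r(NULL, " \t", &field_save))
            field[nfields++] = f;

        /* Keep only "tcp"/"tcp6" rows in LISTEN state. */
        if (nfields < 7 || strncmp(field[0], "tcp", 3) != 0 ||
            strcmp(field[5], "LISTEN") != 0)
            continue;

        /* Local address is "host:port"; compare what follows the last ':'. */
        const char *colon = strrchr(field[3], ':');
        if (colon == NULL || strcmp(colon + 1, port) != 0)
            continue;

        /* "29317/rpc.ldmd" -> 29317; "-" (owner not visible) is skipped. */
        long pid = strtol(field[6], NULL, 10);
        if (pid <= 0)
            continue;

        int seen = 0;
        for (int i = 0; i < npids; i++)
            if (pids[i] == pid)
                seen = 1;
        if (!seen && npids < max_pids)
            pids[npids++] = pid;
    }

    free(copy);
    return npids;
}
```

More than one PID on port 388 would indicate more than one listening socket for the LDM port.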

> I suppose the one with the lowest PID. I have been digging in logs, and
> this is the extract of the logfiles when I last started LDM:
> 
> Apr 11 08:46:04 tigge-ldm rpc.ldmd[31899] NOTE: Starting Up (version: 6.4.5.1; built: Jan 23 2006 22:38:02)
> Apr 11 08:46:04 tigge-ldm rpc.ldmd[31899] NOTE: Using local address 0.0.0.0:388
> Apr 11 08:46:04 tigge-ldm pqact[31903] NOTE: Starting Up
> Apr 11 08:46:04 tigge-ldm rtstats[31904] NOTE: Starting Up (31899)
> Apr 11 08:46:04 tigge-ldm tigge-portal[31907] NOTE: Starting Up(6.4.5.1): tigge-portal.ecmwf.int:388 20060411074604.938 TS_ENDT {{ANY,  "\.missing$"}}
> Apr 11 08:46:04 tigge-ldm dataportal[31906] NOTE: Starting Up(6.4.5.1): dataportal.ucar.edu:388 20060411074604.938 TS_ENDT {{ANY,  "\.missing$"}}
> Apr 11 08:46:04 tigge-ldm pqact[31903] INFO: Successfully read configuration-file "etc/tigge_pqact.conf"
> Apr 11 08:46:05 tigge-ldm pqact[31903] INFO: TS_ZERO TS_ENDT {{ANY,  "missing"}}
> Apr 11 08:46:05 tigge-ldm pqact[31903] INFO:        0 20060411084605.347 ANY 000  _BEGIN_
> Apr 11 08:46:05 tigge-ldm dataportal[31906] INFO: No matching data-product in product-queue
> Apr 11 08:46:05 tigge-ldm tigge-portal[31907] INFO: No matching data-product in product-queue
> Apr 11 08:46:05 tigge-ldm dataportal[31906] NOTE: LDM-6 desired product-class: 20060411074605.349 TS_ENDT {{ANY,  "\.missing$"}}
> Apr 11 08:46:05 tigge-ldm tigge-portal[31907] NOTE: LDM-6 desired product-class: 20060411074605.349 TS_ENDT {{ANY,  "\.missing$"}}
> Apr 11 08:46:05 tigge-ldm dataportal[31906] INFO: Connected to upstream LDM-6 on host dataportal.ucar.edu using port 388
> Apr 11 08:46:05 tigge-ldm dataportal[31906] NOTE: Upstream LDM-6 on dataportal.ucar.edu is willing to be a primary feeder
> pqinsert INFO:  9205744 20060411084605.849     EXP 000 z_tigge_c_ecmf_20060410120000.manifest
> Apr 11 08:46:06 tigge-ldm rpc.ldmd[31899] INFO: RPC buffer sizes for dataportal.ucar.edu: send=16384; recv=87380
> Apr 11 08:46:06 tigge-ldm dataportal[31913] INFO: Connection from dataportal.ucar.edu
> pqinsert INFO:   428963 20060411084606.439     EXP 000 z_tigge_c_ecmf_20060410120000_0001_pf_pl_0090_002_0600_u.grib:88065
> pqinsert INFO:   428963 20060411084606.484     EXP 000 z_tigge_c_ecmf_20060410120000_0001_pf_pl_0090_002_0600_v.grib:88066
> 
> 
> After all this information, what do you want me to do? Do you still
> want me to go ahead with:
> ldmadmin stop
> kill remaining
> ldmadmin clean
> pqcheck -v
> check everything is gone
> ldmadmin start

Try using netstat(1) to verify multiple listeners. Then stop everything, 
restart, and see whether you get multiple top-level LDMs again.
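
After the restart, one way to check for multiple top-level LDM servers is to list every process with its parent PID, for example with "ps -eo pid,ppid,comm", and count the rpc.ldmd processes whose parent is not itself an rpc.ldmd. The C sketch below does that; the function name count_top_level_ldmd() is hypothetical, and it assumes the command name appears as "rpc.ldmd", as in your lsof output. Child LDM processes have a top-level rpc.ldmd as their parent, so a healthy installation should yield 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROCS 4096

/* One row of `ps -eo pid,ppid,comm` output. */
struct proc {
    long pid;
    long ppid;
    int is_ldmd; /* command name is "rpc.ldmd" */
};

/* Hypothetical helper: count rpc.ldmd processes whose parent is not itself
 * an rpc.ldmd, i.e. top-level LDM servers. Lines that don't parse as
 * "PID PPID COMMAND" (such as the header) are ignored. */
static int count_top_level_ldmd(const char *ps_output)
{
    static struct proc procs[MAX_PROCS]; /* static: keep it off the stack */
    int n = 0;

    const char *line = ps_output;
    while (line != NULL && *line != '\0' && n < MAX_PROCS) {
        /* Copy one line so sscanf() can't run on into the next one. */
        char buf[256];
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        long pid, ppid;
        char comm[64];
        if (sscanf(buf, "%ld %ld %63s", &pid, &ppid, comm) == 3) {
            procs[n].pid = pid;
            procs[n].ppid = ppid;
            procs[n].is_ldmd = strcmp(comm, "rpc.ldmd") == 0;
            n++;
        }
        line = end ? end + 1 : NULL;
    }

    /* An rpc.ldmd is top-level if no rpc.ldmd in the list is its parent. */
    int top_level = 0;
    for (int i = 0; i < n; i++) {
        if (!procs[i].is_ldmd)
            continue;
        int parent_is_ldmd = 0;
        for (int j = 0; j < n; j++)
            if (procs[j].pid == procs[i].ppid && procs[j].is_ldmd)
                parent_is_ldmd = 1;
        if (!parent_is_ldmd)
            top_level++;
    }
    return top_level;
}
```

A result greater than 1 after a clean restart would reproduce the multiple top-level LDM servers seen in your earlier ps(1) output.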

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: CUA-629523
Department: Support IDD TIGGE
Priority: Normal
Status: On Hold


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.