
20041109: rpc.ldmd processes & their children (LDM 5.1.4/6.0.14)



Jamie,

> To: address@hidden
> From: Jamie Pelagatti <address@hidden>
> Subject: rpc.ldmd processes & their children (LDM 5.1.4/6.0.14)
> Organization: UCAR/Unidata
> Keywords: 200411091933.iA9JXhvV000066

The above message contained the following:

> We upgraded from LDM 5.1.4 to 6.0.14 a while ago and I'm trying to
> track down a problem that has appeared involving multiple rpc.ldmd
> processes and their child processes.
>
> We are acquiring NEXRAD data from 40 or so radars. For convenience, we
> have a script that starts the rpc.ldmd daemon, creating various LDM
> config files on the fly. The script is run once for each radar feed
> desired. Typically, we run our script only once on each machine that
> needs NEXRAD data. However, we may sometimes run it a number of times
> on the same machine for various reasons.
>
> All this worked fine with LDM 5.1.4 but, as we upgraded to 6.0.14, we
> found that we can no longer run the script more than once on a machine
> (i.e. spawn multiple rpc.ldmd processes).  Instead, rpc.ldmd simply
> complains and exits without spawning the needed process to ingest the
> data.
>
> Here's exactly what happens, assuming, say, we want two NEXRAD
> feeds. (I turned on debugging.)  In version 5.1.4:
> 
> 1. We start an rpc.ldmd for the first feed.
> 2. It spawns a child to do the ingesting.
> 
>   Nov 09 16:03:11 rpc.ldmd[29780]: Starting Up (built: Oct 30 2001 15:12:38)
>           Mapping 53305344
>   Nov 09 16:03:11 129.55.62.31[29782]: run_requester: Starting Up: 129.55.62.31
>   Nov 09 16:03:11 129.55.62.31[29782]: run_requester: 20041109154811.388 TS_ENDT {{NEXRD2,  "^L2.*KIWX"}}
>           port 43108
>           tcp sock: 0
>           FEEDME(129.55.62.31) returns OK
>   Nov 09 16:03:11 129.55.62.31[29782]: FEEDME(129.55.62.31): OK
>           PQUEUE_DUP
>         
> 3. We start a second rpc.ldmd for the second feed.
> 4. It spawns a child to do the ingesting.
> 5. The second rpc.ldmd detects another master rpc.ldmd running and exits,
>    leaving a master and two children to do the ingesting.
> 
>   Nov 09 16:05:00 rpc.ldmd[29809]: Starting Up (built: Oct 30 2001 15:12:38)
>           Mapping 53305344
>           port 45857
>   Nov 09 16:05:00 129.55.62.32[29811]: run_requester: Starting Up: 129.55.62.32
>   Nov 09 16:05:00 129.55.62.32[29811]: run_requester: 20041109155000.019 TS_ENDT {{NEXRD2,  "^L2.*KLOT"}}
>   Nov 09 16:05:00 rpc.ldmd[29809]: Another server is already running at 127.0.0.1 on port 43108.
>   Nov 09 16:05:00 rpc.ldmd[29809]: Version 5
>   Nov 09 16:05:00 rpc.ldmd[29809]: Exiting
>           FEEDME(129.55.62.32) returns OK
>   Nov 09 16:05:00 129.55.62.32[29811]: FEEDME(129.55.62.32): OK
> 
> But in version 6.0.14:
> 
> 1. We start an rpc.ldmd for the first feed.
> 2. It spawns a child to do the ingesting.
> 
>   Nov 09 15:51:33 rpc.ldmd[29700]: Starting Up (version: 6.0.14; built: May  6 2004 17:47:35)
>           main(): Opening product-queue
>           Mapping 53305344
>           main(): Creating service portal
>           create_ldm_tcp_svc(): Checking for another LDM
>           create_ldm_tcp_svc(): Getting TCP socket
>           create_ldm_tcp_svc(): Eliminating EADDRINUSE problem.
>           create_ldm_tcp_svc(): Getting root privs
>           create_ldm_tcp_svc(): Binding socket
>           create_ldm_tcp_svc(): Calling getsockname()
>           port 58420
>           create_ldm_tcp_svc(): Calling listen()
>           create_ldm_tcp_svc(): Checking portmapper
>           create_ldm_tcp_svc(): Registering
>           create_ldm_tcp_svc(): Releasing root privs
>           tcp sock: 0
>           main(): Reading configuration-file
>           main(): Serving socket
>   Nov 09 15:51:33 129.55.62.31[29702]: Starting Up(6.0.14): 129.55.62.31: TS_ZERO TS_ENDT {{CRAFT,  "^L2.*KIWX"}}
>   Nov 09 15:51:33 129.55.62.31[29702]: Desired product class: 20041109153633.345 TS_ENDT {{CRAFT,  "^L2.*KIWX"}}
>   Nov 09 15:51:33 129.55.62.31[29702]: Connected to upstream LDM-6
>           requester6.c:274: Calling feedme_6(...)
>   Nov 09 15:51:33 129.55.62.31[29702]: Upstream LDM is willing to feed
>           requester6.c:524: Calling run_service()
>           requester6.c:187: Downstream LDM initialized
>           PQUEUE_DUP
>           pq_insertNoSig(): rpqe_new() failure
> 
> 3. We start a second rpc.ldmd for the second feed.
> 4. The second rpc.ldmd detects another master rpc.ldmd running and exits.
> 
>   Nov 09 15:55:56 rpc.ldmd[29740]: Starting Up (version: 6.0.14; built: May  6 2004 17:47:35)
>           main(): Opening product-queue
>           Mapping 53305344
>           main(): Creating service portal
>           create_ldm_tcp_svc(): Checking for another LDM
>   Nov 09 15:55:57 rpc.ldmd[29740]: Version 6 LDM already running on local host
>   Nov 09 15:55:57 rpc.ldmd[29740]: Version 5 LDM already running on local host
>   Nov 09 15:55:57 rpc.ldmd[29740]: Exiting
> 
> In other words, the daemon now exits before spawning the required
> child process. I found nothing about this change of behavior in any
> release notes or problems list so I'm wondering if it's intentional. I
> have examined the CVS histories and I think the change occurred in
> ldmd.c version 1.174.
> 
> We are aware that we can have multiple data requests in the rpc.ldmd
> config file rather than run rpc.ldmd multiple times, but our scripts
> work well and are geared to the latter behavior.  Thus, I want to find
> out if this is a "bug" or a "feature" before I rewrite them.

I'm sorry to say that your scripts were relying on undocumented behavior
of the LDM system.  The LDM system was never intended to work if another
LDM system was already running.  That intention was reified in version
6.0 of the LDM.

On the bright side, version 6.1 of the LDM supports an "ldmadmin
restart" command, so it should be possible to add the additional data
requests to the LDM configuration-file and then execute this command,
rather than starting another rpc.ldmd.
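As a hypothetical sketch of that approach, a script like yours could emit
one request entry per radar into a single configuration-file instead of
starting one rpc.ldmd per radar. The helper name `request_lines` is made
up for illustration, and the entry layout `request <feedtype> "<pattern>"
<host>` is an assumption that should be checked against the documentation
for your LDM version. The feedtype, patterns, and hosts are taken from the
log excerpts above.

```python
from typing import Iterable, Tuple


def request_lines(feeds: Iterable[Tuple[str, str, str]]) -> str:
    """Return one request entry per (feedtype, pattern, host) tuple.

    Hypothetical helper: the entry layout is an assumption to be checked
    against the LDM documentation for the installed version.
    """
    entries = []
    for feedtype, pattern, host in feeds:
        # Quote the pattern so the regular expression stays a single token.
        entries.append(f'request {feedtype} "{pattern}" {host}\n')
    return "".join(entries)


if __name__ == "__main__":
    # Radar feeds taken from the log excerpts in this thread.
    radars = [
        ("CRAFT", "^L2.*KIWX", "129.55.62.31"),
        ("CRAFT", "^L2.*KLOT", "129.55.62.32"),
    ]
    # One configuration-file and one rpc.ldmd serving many requests.
    print(request_lines(radars), end="")
```

The generated entries would go into the one configuration-file, after
which "ldmadmin restart" (LDM 6.1) picks them up without a second
rpc.ldmd ever being started.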

> Thanks.
> 
> ---------------------------+---------------------------
> James M. Pelagatti (Jamie) | MIT Lincoln Laboratory    
>   Software Engineer        | Group 43 (Weather Sensing)
>   (781) 981-1886           | 244 Wood St., Room S1-611 
>   FAX: (781) 981-0632      | Lexington, MA 02420-9108  
>   mailto:address@hidden  | http://www.ll.mit.edu     

Regards,
Steve Emmerson

> NOTE: All email exchanges with Unidata User Support are recorded in the
> Unidata inquiry tracking system and then made publicly available
> through the web.  If you do not want to have your interactions made
> available in this way, you must let us know in each email you send to us.