
[Support #CYR-432236]: NWS/NCEP/AWC LDM 6.7.0 woes



Mick,

> > I am skilled enough, and a SIGSEGV shouldn't have been issued ---
> > at least not according to the strace(1) output (thanks for that).
> 
> You're absolutely right, and I think I know what's going on.
> 
> On a machine as "juicy" as this, one may be tempted to make use of the
> multiple CPUs by way of parallel builds ("make -j5" since this is a
> 4-way system). However, unless the build process (makefile) is
> specifically written to handle this, it'll most likely cause grief -
> which it surely did:
> 
> ================================8<================================
> make[1]: Entering directory `/usr/local/ldm/ldm-6.7.0/src'
> Making `all' in directory /usr/local/ldm/ldm-6.7.0/src/pqcreate
> 
> 
> Making `all' in directory /usr/local/ldm/ldm-6.7.0/src/pqinsert
> ================================8<================================
> 
> Here you can see the effects of parallel builds - make is entering
> pqcreate and pqinsert at the same time.
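
One way to spot this kind of interleaving in a saved build log is sketched below. It is only a heuristic, and it assumes GNU make's "Entering directory"/"Leaving directory" messages as they appear in these excerpts; the function name find_overlapping_dirs is made up for illustration.

```python
import re

# Matches GNU make's "make[N]: Entering directory `path'" and the
# corresponding "Leaving directory" lines (backtick or apostrophe quoting).
_DIR_LINE = re.compile(r"make\[(\d+)\]: (Entering|Leaving) directory [`'](.+)'")


def find_overlapping_dirs(log_lines):
    """Return (level, already_open_dir, newly_entered_dir) tuples.

    Heuristic (an assumption, not a make guarantee): in a serial build,
    a given make level has at most one directory open at a time, so two
    open directories at the same level suggest a parallel build.
    """
    open_dirs = {}  # make level -> directories entered but not yet left
    overlaps = []
    for line in log_lines:
        match = _DIR_LINE.search(line)
        if not match:
            continue
        level, action, path = int(match.group(1)), match.group(2), match.group(3)
        dirs = open_dirs.setdefault(level, [])
        if action == "Entering":
            for other in dirs:
                if other != path:
                    overlaps.append((level, other, path))
            dirs.append(path)
        elif path in dirs:
            dirs.remove(path)
    return overlaps


sample_log = [
    "make[1]: Entering directory `/usr/local/ldm/ldm-6.7.0/src'",
    "make[2]: Entering directory `/usr/local/ldm/ldm-6.7.0/src/pqcreate'",
    "make[2]: Entering directory `/usr/local/ldm/ldm-6.7.0/src/pqinsert'",
]
for level, first, second in find_overlapping_dirs(sample_log):
    print(f"level {level}: {first} still open when entering {second}")
```

Fed the two make[2] lines from the log, it flags pqinsert being entered while pqcreate is still open.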
> 
> ================================8<================================
> make[2]: Entering directory `/usr/local/ldm/ldm-6.7.0/src/pqcreate'
> c89 -c -g -m64 -DNDEBUG  -I.. -I../config -I../misc -I../ulog
> -I../protocol -I../pq pqcreate.c
> make[2]: Entering directory `/usr/local/ldm/ldm-6.7.0/src/pqinsert'
> c89 -c -g -m64 -DNDEBUG  -I.. -I../config -I../misc -I../ulog
> -I../protocol -I../pq pqinsert.c
> ar -cru ../libldm.a atofeedt.o autoshift.o globals.o h_clnt.o ldm4_svc.o
> ldm5_svc.o ldm_xdr.o ldm_xlen.o ldm5_clnt.o ldm6_clnt.o ldmprint.o
> ldm_clnt.o md5c.o one_svc_run.o prod_info.o prod_class.o savedInfo.o
> timestamp.o xdr_data.o
> ranlib ../libldm.a
> ================================8<================================
> 
> And since there's likely a dependency on libraries that haven't been
> built yet, hilarity ensues:
> 
> ================================8<================================
> ../libldm.a(pq.o): In function `tq_init':
> /usr/local/ldm/ldm-6.7.0/src/pq/pq.c:465: undefined reference to `TS_ENDT'
> /usr/local/ldm/ldm-6.7.0/src/pq/pq.c:465: undefined reference to `TS_ENDT'
> /usr/local/ldm/ldm-6.7.0/src/pq/pq.c:470: undefined reference to `TS_NONE'
> /usr/local/ldm/ldm-6.7.0/src/pq/pq.c:470: undefined reference to `TS_NONE'
> /usr/local/ldm/ldm-6.7.0/src/pq/pq.c:487: undefined reference to `TS_NONE'
> /usr/local/ldm/ldm-6.7.0/src/pq/pq.c:487: undefined reference to `TS_NONE'

Indeed!  The above errors should not have occurred.

> ================================8<================================
> 
> ...etc etc ad nauseam. One could say that had I paid close enough
> attention to the stuff coming out of the 'make -j5' command, I would
> have caught it. However, what I DID pay attention to didn't indicate a
> build failure:
> 
> ================================8<================================
> ar -cru ../libldm.a atofeedt.o autoshift.o globals.o h_clnt.o ldm4_svc.o
> ldm5_svc.o ldm_xdr.o ldm_xlen.o ldm5_clnt.o ldm6_clnt.o ldmprint.o
> ldm_clnt.o md5c.o one_svc_run.o prod_info.o prod_class.o savedInfo.o
> timestamp.o xdr_data.o
> ranlib ../libldm.a
> make[2]: Leaving directory `/usr/local/ldm/ldm-6.7.0/src/protocol'
> c89 -c -g -m64 -DNDEBUG xdr_array.c
> 
> Returning to directory /usr/local/ldm/ldm-6.7.0/src
> 
> make[1]: Leaving directory `/usr/local/ldm/ldm-6.7.0/src'
> c89 -c -g -m64 -DNDEBUG xdr_float.c
> c89 -c -g -m64 -DNDEBUG xdr_mem.c
> c89 -c -g -m64 -DNDEBUG xdr_rec.c
> c89 -c -g -m64 -DNDEBUG xdr_reference.c
> c89 -c -g -m64 -DNDEBUG xdr_stdio.c
> ar -cru ../libldm.a auth_none.o auth_unix.o  authunix_prot.o
> bindresvport.o clnt_generic.o clnt_perror.o clnt_raw.o clnt_simple.o
> clnt_tcp.o clnt_udp.o rpc_dtablesize.o get_myaddress.o getrpcport.o
> pmap_clnt.o pmap_getmaps.o pmap_getport.o pmap_prot.o pmap_prot2.o
> pmap_rmt.o rpc_prot.o rpc_callmsg.o rpc_commondata.o svc.o svc_auth.o
> svc_auth_unix.o svc_raw.o svc_run.o svc_simple.o svc_tcp.o svc_udp.o
> xdr.o xdr_array.o xdr_float.o xdr_mem.o xdr_rec.o xdr_reference.o
> xdr_stdio.o
> ranlib ../libldm.a
> make[1]: Leaving directory `/usr/local/ldm/ldm-6.7.0/src/rpc'
> 
> Returning to directory /usr/local/ldm/ldm-6.7.0/src
> ================================8<================================
> 
> So in short, we're talking about a PEBCAK issue.

Been there.  Done that.  Bought the T-shirt franchise.  :-)
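
For what it's worth, a small script can do the close reading for you. The sketch below scans a saved build log for the linker's "undefined reference to" message quoted earlier and for GNU make's usual "*** ... Error" lines. The pattern list and the name find_build_errors are illustrative assumptions, not part of the LDM build.

```python
import re

# Patterns that usually indicate a failed build. The first comes from the
# linker output quoted in this thread; the second assumes GNU make's
# customary "*** [target] Error N" message.
ERROR_PATTERNS = [
    re.compile(r"undefined reference to"),
    re.compile(r"\*\*\* .*Error"),
]


def find_build_errors(log_lines):
    """Return (line_number, text) for every log line matching a pattern."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            hits.append((number, line.rstrip()))
    return hits


sample_log = [
    "c89 -c -g -m64 -DNDEBUG xdr_mem.c",
    "/usr/local/ldm/ldm-6.7.0/src/pq/pq.c:465: undefined reference to `TS_ENDT'",
    "ranlib ../libldm.a",
]
for number, text in find_build_errors(sample_log):
    print(f"line {number}: {text}")
```

Running it over the whole log, rather than watching the tail of the output scroll by, avoids being reassured by a few clean-looking ar and ranlib lines.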

> To make sure you have the full picture of the events, I wanted to go
> through the motions of building it incorrectly, so I could provide a gdb
> trace. However, now it seems I'm having trouble recreating the problem.

Excellent!

> It seems to me as though the parallel build attempt was the problem.
> 
> Thanks!

Good luck.


Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: CYR-432236
Department: Support LDM
Priority: Normal
Status: Closed