
[Support #CYR-432236]: NWS/NCEP/AWC LDM 6.7.0 woes



Mick,

> I'm running into some LDM issues and Marc Singer recommended that I
> email you with my findings.
> 
> First, here are some system facts:
> 
> CPU:  2 x dual-core AMD Opteron 2216 @ 2.4 GHz
> MEM:  6 GiB
> OS:   RHEL5.3 x86_64
> Kernel:       2.6.18-128.1.1.el5
> GCC:  gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)
> LDM:  6.7.0

Nice system.  The LDM should run on it.

> I'm trying to build LDM 6.7.0, and while the build process itself had an
> issue ($LDMHOME/$VDIR/src/server - complaint about undefined
> 'ldm_version'), I was able to get past that:
> 
> ==============8<=========================
> --- Makefile.old        2009-03-25 14:31:40.000000000 +0000
> +++ Makefile    2009-03-25 12:27:35.000000000 +0000
> @@ -4,8 +4,9 @@
> #
> include ../macros.make
> 
> -INCLUDES = -I../config -I../misc -I../ulog -I../protocol -I../pq
> +INCLUDES = -I ../ -I../config -I../misc -I../ulog -I../protocol -I../pq
> TAG_SRCS       = \
> +       ../*.c ../*.h \
> ../misc/*.c ../misc/*.h \
> ../ulog/*.c ../ulog/*.h \
> ../protocol/*.c ../protocol/*.h \
> ==============8<=========================
> 
> However, when trying to create a queue (through ldmadmin or from
> commandline directly) I get a SIGSEGV. Here's an strace:
> 
> ==============8<=========================
> [ldm@server_101 pqcreate]$ strace ./pqcreate -v -f -s 100 -S 10 -q
> $HOME/data/ldm.pq
> execve("./pqcreate", ["./pqcreate", "-v", "-f", "-s", "100", "-S", "10",
> "-q", "/usr/local/ldm/data/ldm.pq"], [/* 22 vars */]) = 0
> brk(0)                                  = 0xb86d000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x2b321f397000
> uname({sys="Linux", node="server_101.eee", ...}) = 0
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or
> directory)
> open("/etc/ld.so.cache", O_RDONLY)      = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=35594, ...}) = 0
> mmap(NULL, 35594, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b321f398000
> close(3)                                = 0
> open("/lib64/libm.so.6", O_RDONLY)      = 3
> read(3,
> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`>\300\0239\0\0\0"...,
> 832) = 832
> fstat(3, {st_mode=S_IFREG|0755, st_size=615136, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x2b321f3a1000
> mmap(0x3913c00000, 2629848, PROT_READ|PROT_EXEC,
> MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3913c00000
> mprotect(0x3913c82000, 2093056, PROT_NONE) = 0
> mmap(0x3913e81000, 8192, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x81000) = 0x3913e81000
> close(3)                                = 0
> open("/lib64/libc.so.6", O_RDONLY)      = 3
> read(3,
> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\332\201\0229\0\0\0"...,
> 832) = 832
> fstat(3, {st_mode=S_IFREG|0755, st_size=1713088, ...}) = 0
> mmap(0x3912800000, 3494168, PROT_READ|PROT_EXEC,
> MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3912800000
> mprotect(0x391294c000, 2097152, PROT_NONE) = 0
> mmap(0x3912b4c000, 20480, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14c000) = 0x3912b4c000
> mmap(0x3912b51000, 16664, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3912b51000
> close(3)                                = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x2b321f3a2000
> arch_prctl(ARCH_SET_FS, 0x2b321f3a27c0) = 0
> mprotect(0x3912b4c000, 16384, PROT_READ) = 0
> mprotect(0x3913e81000, 4096, PROT_READ) = 0
> mprotect(0x391261b000, 4096, PROT_READ) = 0
> munmap(0x2b321f398000, 35594)           = 0
> fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x2b321f398000
> write(1, "pqfname=/usr/local/ldm/data/ldm."...,
> 47pqfname=/usr/local/ldm/data/ldm.pq, pflags=129
> ) = 47
> write(2, "Creating /usr/local/ldm/data/ldm"..., 61Creating
> /usr/local/ldm/data/ldm.pq, 100 bytes, 10 products.
> ) = 61
> brk(0)                                  = 0xb86d000
> brk(0xb88e000)                          = 0xb88e000
> open("/usr/local/ldm/data/ldm.pq", O_RDWR|O_CREAT|O_EXCL|O_TRUNC, 0666) = 3
> fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=4096}) = 0
> fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> lseek(3, 12284, SEEK_SET)               = 12284
> write(3, "\0\0\0\0", 4)                 = 4
> mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x2b321f399000
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
> ==============8<=========================
> 
> The result is the same regardless of whether the file size is 100 B or
> 500 MiB. It looks like that mmap makes the kernel unhappy, but I'm not
> skilled enough to see why.

I am skilled enough, and a SIGSEGV shouldn't have been issued, at
least not according to the strace(1) output (thanks for that): the
final mmap(2) call returned a valid address.

Firstly, is /usr/local/ldm/data local to the system?  Is it on a
RAID?
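
One quick way to answer the first question is to ask for the type of
the filesystem behind the directory.  This is only a sketch: the
function name fs_type is made up for illustration, and it assumes the
GNU stat(1) that ships with RHEL.

```shell
# Hypothetical helper (not part of the LDM): print the type of the
# filesystem that holds the given directory.
# Assumes GNU coreutils stat(1); "-f" queries the filesystem and
# "%T" prints its type in human-readable form.
fs_type() {
    stat -f -c %T "$1"
}

# Example:
#   fs_type /usr/local/ldm/data
# A type such as "nfs" would mean that the directory is not local.
```

This says nothing about whether a local disk is part of a RAID; that
still has to be checked separately.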

It's possible that the SIGSEGV didn't occur within the mmap(2) call
but just after its return.  Would you please execute the same
pqcreate(1) command in a debugger and see where the SIGSEGV occurs?
It would be best if debugging were enabled by 1) executing "make
distclean" in the top-level source-directory; 2) setting the
environment variable CFLAGS to "-g"; 3) executing the "configure"
script; and 4) executing "make".
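
As a sketch, the four steps could be wrapped as follows.  The function
name rebuild_with_debug is made up for illustration, and the path of
the top-level source-directory is whatever it is on your system.

```shell
# Hypothetical helper: rebuild a source tree with debugging symbols.
# Usage: rebuild_with_debug <top-level source-directory>
# The body runs in a subshell, so neither the cd nor the exported
# CFLAGS leaks into the calling shell.
rebuild_with_debug() (
    cd "$1" || exit 1
    make distclean          # 1) discard the previous build
    export CFLAGS=-g        # 2) ask for debugging symbols
    ./configure && make     # 3) reconfigure, then 4) rebuild
)
```

A failure of "make distclean" (for example, in a tree that was never
configured) is ignored on purpose; a failure of the "configure" script
skips the "make".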

When that's done, cd(1) into the "pqcreate" directory and execute
the commands

    rm /usr/local/ldm/data/ldm.pq
    gdb pqcreate

Inside gdb(1), execute the command

    run -s 400M -q /usr/local/ldm/data/ldm.pq

When it receives a SIGSEGV, execute the command "bt" and send me the
resulting stack trace.
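
If a non-interactive run is easier to capture, something like the
following should also work.  It is a sketch only: the function name
gdb_backtrace is made up for illustration, and it assumes a gdb(1)
that supports the "-batch", "-ex", and "--args" options.

```shell
# Hypothetical wrapper: run a program under gdb without a prompt and
# print a backtrace once the program stops (e.g., on a SIGSEGV).
# Usage: gdb_backtrace <program> [arguments...]
gdb_backtrace() {
    gdb -batch -ex run -ex bt --args "$@"
}

# Example, from within the "pqcreate" directory:
#   rm /usr/local/ldm/data/ldm.pq
#   gdb_backtrace ./pqcreate -s 400M -q /usr/local/ldm/data/ldm.pq
```

If the program exits normally instead, the "bt" step simply reports
that there is no stack.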

> The corresponding lines (187,188) in
> pqcreate.c are:
> 
> ==============8<=========================
> errnum = pq_create(pqfname, 0666, pflags,
> 0, initialsz, nproducts, &pq);
> ==============8<=========================
> 
> I did not save the configure/make logfiles thinking it was probably an
> issue past that, but if you want me to send those to you, I can.
> 
> Thank you in advance for any insight you can offer.


Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: CYR-432236
Department: Support LDM
Priority: Normal
Status: Closed