Ticket #106 (new bug)
Opened 6 months ago
Segmentation Fault with trove debugging enabled
| Reported by: | harms | Owned by: | slang |
|---|---|---|---|
| Priority: | major | Component: | SERVER |
| Version: | 2.8.1 | Keywords: | |
| Cc: |
Description
Hey everyone -
Nick and I have been digging into trove and have found a bug. When trove debugging is enabled (via the "trove" flag in the config file), the server crashes during I/O calls (namely pvfs2-cp).
It sometimes runs for a few seconds before crashing, but it is consistent enough to seg fault every time I try to transfer a 256 MB file onto or off of the server. I tested on both 32-bit RHEL5 and 64-bit Fedora 10, with release 2.8.1 on both. Only one server was running, acting as both a metadata and an I/O server.
I believe it has something to do with threading, since the crash happens while printing a status message (I'm fairly certain the call to gossip_debug() on line 327 of dbpf-bstream.c is the culprit). Here is the last bit of the log file and the stack trace from gdb on 32-bit RHEL5:
[D 05/26 13:39] aio_progress_notification: BSTREAM_READ_LIST complete: aio_return() says 262144 [fd = 11]
[D 05/26 13:39] *** starting delayed ops if any (state is LIST_PROC_ALLPOSTED)
[D 05/26 13:39] DBPF I/O ops in progress: 1
[New Thread 0xb56a0b90 (LWP 1272)]
[Thread 0xb2cfeb90 (LWP 1271) exited]
[D 05/26 13:39] issue_or_delay_io_operation: lio_listio posted 0xa0d0ec8 (handle 9223372036854775805, ret 0)
[D 05/26 13:39] --- aio_progress_notification called with handle 9223372036854775805 (0xa0d0ec8)
[D 05/26 13:39] aio_progress_notification: BSTREAM_READ_LIST complete: aio_return() says 262144 [fd = 11]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb56a0b90 (LWP 1272)]
0x00c04993 in strlen () from /lib/libc.so.6
(gdb) bt
#0 0x00c04993 in strlen () from /lib/libc.so.6
#1 0x00bd4bce in vfprintf () from /lib/libc.so.6
#2 0x00bf53b4 in vsnprintf () from /lib/libc.so.6
#3 0x08059bed in gossip_debug_fp_va (fp=0xb569fb5c,
prefix=<value optimized out>, format=0xb569fc80 "*** starting delayed ops if any (state is ST
complete: aio_return() says 262144 [fd = 11]\n", ap=0xb56a00d0 "t: hpz\016\b", ts=13455348)
at src/common/gossip/gossip.c:506
#4 0x0805a041 in gossip_debug (mask=65536, prefix=63 '?',
format=0x80dc3b0 "*** starting delayed ops if any (state is %s)\n") at src/common/gossip/gossip.c:281
#5 0x080a9ed9 in aio_progress_notification (sig=
{sival_int = 168627912, sival_ptr = 0xa0d0ec8})
at src/io/trove/trove-dbpf/dbpf-bstream.c:237
#6 0x080ba89c in alt_lio_thread (foo=0xa0d0ce8)
at src/io/trove/trove-dbpf/dbpf-alt-aio.c:275
#7 0x00d0f49b in start_thread () from /lib/libpthread.so.0
#8 0x00c6642e in clone () from /lib/libc.so.6
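For anyone trying to narrow this down, here is a minimal standalone sketch (not PVFS code) of a gossip_debug()-style logger called from several threads at once. It only illustrates two defensive habits that may or may not be relevant to the actual cause: formatting into a per-call buffer on the calling thread's stack rather than a shared one, and mapping the state to a string through a bounds-checked lookup so that "%s" never receives an invalid pointer, which is one way strlen() can fault inside vfprintf(). All names here (dbg_format, state_name, worker, run_workers, the demo_state enum) are hypothetical and are not taken from the gossip or trove sources; only the message text and LIST_PROC_ALLPOSTED come from the log above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORKERS 16

/* Hypothetical state enum; a stand-in, not the real trove definition. */
enum demo_state { DEMO_STATE_A, DEMO_STATE_B, DEMO_ALLPOSTED, DEMO_STATE_COUNT };

/* Map a state to a printable name; out-of-range values get a fallback
 * string instead of an unchecked array read. */
static const char *state_name(int state)
{
    static const char *const names[DEMO_STATE_COUNT] = {
        "DEMO_STATE_A", "DEMO_STATE_B", "LIST_PROC_ALLPOSTED"
    };
    if (state < 0 || state >= DEMO_STATE_COUNT)
        return "UNKNOWN";
    return names[state];
}

/* Format into a caller-provided buffer; returns the vsnprintf() result. */
static int dbg_format(char *out, size_t out_len, const char *format, ...)
{
    va_list ap;
    va_start(ap, format);
    int n = vsnprintf(out, out_len, format, ap);
    va_end(ap);
    return n;
}

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static int lines_logged = 0;

struct worker_arg { FILE *fp; int state; int iterations; };

static void *worker(void *p)
{
    const struct worker_arg *arg = p;
    char buf[256]; /* private to this call, so threads cannot clobber it */

    for (int i = 0; i < arg->iterations; i++) {
        dbg_format(buf, sizeof(buf),
                   "*** starting delayed ops if any (state is %s)\n",
                   state_name(arg->state));
        /* Serialize only the write and the shared counter update. */
        pthread_mutex_lock(&log_lock);
        fputs(buf, arg->fp);
        lines_logged++;
        pthread_mutex_unlock(&log_lock);
    }
    return NULL;
}

/* Start up to MAX_WORKERS threads, wait for them, return lines written. */
static int run_workers(FILE *fp, int nthreads, int iterations)
{
    pthread_t tids[MAX_WORKERS];
    struct worker_arg args[MAX_WORKERS];
    int started = 0;

    assert(fp != NULL);
    if (nthreads > MAX_WORKERS)
        nthreads = MAX_WORKERS;
    lines_logged = 0; /* no workers are running yet */

    for (int i = 0; i < nthreads; i++) {
        /* Deliberately include one out-of-range state to hit the fallback. */
        args[started].fp = fp;
        args[started].state = i % (DEMO_STATE_COUNT + 1);
        args[started].iterations = iterations;
        if (pthread_create(&tids[started], NULL, worker, &args[started]) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return lines_logged;
}
```

Calling run_workers(stderr, 4, 1000) from a small main() and building with -pthread should print the status line from four threads without interleaving. If a variant of this that shares one static buffer across threads, or that runs the threads with a very small stack, starts producing mixed-up messages like the format string in frame #3, that would point toward the same class of problem.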
Thanks, - Dave