After #1037 occurred (a SIGSEGV in sssd_be), logins were no longer possible on the system, although resolving of uids was still functional.
Logged in /var/log/secure was:
crond[8488]: pam_sss(crond:account): Request to sssd failed. Timer expired
Restarting SSSD restored normal operation, but sssd_pam segfaulted during the restart. The core file for this was not retained by abrtd.
syslog excerpt of the restart
This excerpt from syslog is interesting:
Oct 7 08:34:39 machine kernel: Process 12916(abrt-hook-ccpp) has RLIMIT_CORE set to 1
The abrt developers say that this indicates that abrt itself might have crashed. IIRC you're running 6.0; that release had a bug where if /proc/<PID>/exe disappeared before abrt was able to read it, abrt crashed. Incidentally, I saw something similar in logs of one of our test systems (running 6.2 beta):
kernel: sssd_pam[3919]: segfault at 3032756193 ip 0000003c1ac0247d sp 00007ffff9b48528 error 4 in libtevent.so.0.9.8[3c1ac00000+9000]
abrt[4159]: Can't read /proc/3919/exe link
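For illustration only, here is a minimal, hypothetical sketch (not abrt's actual code) of the step that fails in this scenario: a core-collecting hook that resolves the crashed binary through /proc/<pid>/exe. If the crashed process is reaped before the hook runs, readlink() fails, and a collector that ignores that error ends up working with an uninitialized buffer, which matches the "Can't read /proc/3919/exe link" message above.

/* Hypothetical core-collector sketch: resolve the crashed binary via
 * /proc/<pid>/exe and handle the case where the link has already vanished.
 * Not abrt code; names and behavior are illustrative only. */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <crashed-pid>\n", argv[0]);
        return EXIT_FAILURE;
    }

    char link[64];
    char exe[PATH_MAX];
    snprintf(link, sizeof(link), "/proc/%s/exe", argv[1]);

    ssize_t len = readlink(link, exe, sizeof(exe) - 1);
    if (len < 0) {
        /* The crashed process is already gone: report and bail out
         * instead of carrying on with a garbage buffer. */
        perror("Can't read /proc/<pid>/exe link");
        return EXIT_FAILURE;
    }
    exe[len] = '\0';    /* readlink() does not NUL-terminate */

    printf("crashed executable: %s\n", exe);
    return EXIT_SUCCESS;
}

Such a helper would typically be invoked by the kernel through /proc/sys/kernel/core_pattern with the crashed PID as an argument; the point of the sketch is simply that the error path must be handled, because the exe link can legitimately disappear before the helper gets to it.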
I'll try to reproduce the problem locally as well..
In the meantime, I've got two suggestions that might help us catch the bug should it hit you again:
1. Raise the value of /proc/sys/kernel/core_pipe_limit if it is set to a very low value (1 or 2, perhaps). If multiple processes crash at the same time, the kernel does not wait for the core-collecting process to grab the info, so /proc/<pid>/exe is no longer available by the time abrt gets to process it, and abrt crashes.
2. Raising the core file ulimit to unlimited should produce the core file in the "classic" way even if abrt chokes on it.
Picking this up; I can reproduce the crash at will now.
owner: somebody => jhrozek
Fields changed
milestone: NEEDS_TRIAGE => SSSD 1.5.14
priority: major => blocker
The backtrace:
(gdb) bt full
#0  _tevent_add_timer (ev=0x1927250, mem_ctx=0x19288b0, next_event=..., handler=0x340841b790 <ltdb_callback>, private_data=0x19288b0, handler_name=0x340842ae7e "ltdb_callback", location=0x340842ae67 "ldb_tdb/ldb_tdb.c:1282") at tevent.c:358
        No locals.
#1  0x000000340841a163 in ltdb_handle_request (module=0x1927860, req=0x1928620) at ldb_tdb/ldb_tdb.c:1282
        ldb = 0x1927160 ev = 0x1927250 ac = 0x19288b0 te = <value optimized out> tv = {tv_sec = 0, tv_usec = 0}
#2  0x000000340841196a in ldb_next_request (module=0x1927860, request=0x1928620) at common/ldb_modules.c:563
        ret = <value optimized out>
#3  0x000000340840808b in asq_search (module=0x1927380, req=0x1928620) at modules/asq.c:358
        ldb = 0x1927160 base_req = <value optimized out> control = 0x0 ac = <value optimized out> ret = 2
#4  0x000000340840bc2f in ldb_search (ldb=0x1927160, mem_ctx=0x192aef0, result=0x7fff1d8d0500, base=<value optimized out>, scope=<value optimized out>, attrs=0x647aa0, exp_fmt=0x43e5a0 "(&(objectclass=user)(|(nameAlias=%s)(name=%s)))") at common/ldb.c:1349
        req = 0x1928620 res = 0x1923cf0 expression = 0x19249b0 "(&(objectclass=user)(|(nameAlias=kau20)(name=kau20)))" ap = {{gp_offset = 48, fp_offset = 48, overflow_arg_area = 0x7fff1d8d0498, reg_save_area = 0x7fff1d8d03a0}} ret = <value optimized out>
#5  0x00000000004212dc in sysdb_getpwnam (mem_ctx=0x19269f0, ctx=0x1927060, domain=0x1922610, name=0x1929b20 "kau20", _res=0x1926a10) at src/db/sysdb_search.c:64
        tmpctx = 0x192aef0 attrs = {0x43ee84 "name", 0x43ef7d "uidNumber", 0x43ef61 "gidNumber", 0x43ef87 "gecos", 0x43ef8d "homeDirectory", 0x43ef9b "loginShell", 0x43ef19 "lastUpdate", 0x43ef24 "dataExpireTimestamp", 0x43ef38 "initgrExpireTimestamp", 0x43e6fd "objectClass", 0x0} base_dn = 0x1931c30 res = 0x0 sanitized_name = 0x1927310 "kau20" ret = 0
#6  0x000000000040bbef in pam_check_user_search (preq=0x19269f0) at src/responder/pam/pamsrv_cmd.c:844
        dom = 0x1922610 cctx = 0x1928cc0 name = 0x1929b20 "kau20" sysdb = 0x1927060 cacheExpire = 26365472 ret = 0 __FUNCTION__ = "pam_check_user_search"
#7  0x000000000040c36f in pam_check_user_dp_callback (err_maj=3, err_min=5, err_msg=0x4424fa "Internal Error", ptr=0x19269f0) at src/responder/pam/pamsrv_cmd.c:955
        preq = 0x19269f0 ret = 0 pctx = 0x1924060 __FUNCTION__ = "pam_check_user_dp_callback"
#8  0x0000000000436495 in sss_dp_req_destructor (ptr=0x1924e20) at src/responder/common/responder_dp.c:100
        sdp_req = 0x1924e20 cb = 0x192b760 next = 0x0 key = {type = HASH_KEY_STRING, {str = 0x1931920 "3kau20@LDAP", ul = 26417440}} hret = 0 __FUNCTION__ = "sss_dp_req_destructor"
#9  0x000000340a002d9e in _talloc_free_internal (ptr=0x1924e20, location=0x340a007b1d "talloc.c:1893") at talloc.c:600
        d = 0x4362da <sss_dp_req_destructor> tc = 0x4362da
#10 0x000000340a002c2b in _talloc_free_internal (ptr=0x19222b0, location=0x340a007b1d "talloc.c:1893") at talloc.c:631
        child = 0x1924e20 new_parent = 0x0 tc = 0x1924e20
#11 0x000000340a002c2b in _talloc_free_internal (ptr=0x1924060, location=0x340a007b1d "talloc.c:1893") at talloc.c:631
        child = 0x19222b0 new_parent = 0x0 tc = 0x19222b0
#12 0x000000340a002c2b in _talloc_free_internal (ptr=0x1921420, location=0x340a007b1d "talloc.c:1893") at talloc.c:631
        child = 0x1924060 new_parent = 0x0 tc = 0x1924060
#13 0x000000340a002c2b in _talloc_free_internal (ptr=0x1920320, location=0x340a007b1d "talloc.c:1893") at talloc.c:631
        child = 0x1921420 new_parent = 0x0 tc = 0x1921420
#14 0x000000340a001abb in _talloc_free_internal (ptr=0x1920140, location=0x340a007b1d "talloc.c:1893") at talloc.c:631
        child = 0x1920320 new_parent = 0x0
#15 _talloc_free (ptr=0x1920140, location=0x340a007b1d "talloc.c:1893") at talloc.c:1133
        tc = 0x1920320
#16 0x0000003406035d92 in __run_exit_handlers (status=0) at exit.c:78
        atfct = <value optimized out> onfct = <value optimized out> cxafct = <value optimized out> f = <value optimized out>
#17 exit (status=0) at exit.c:100
        No locals.
#18 0x000000000042d623 in default_quit (ev=0x1920320, se=0x19211a0, signum=15, count=1, siginfo=0x0, private_data=0x0) at src/util/server.c:251
        done_sigterm = 0 __FUNCTION__ = "default_quit"
#19 0x0000003407403baa in tevent_common_check_signal (ev=0x1920320) at tevent_signal.c:353
        se = 0x19211a0 count = 1 sl = <value optimized out> next = 0x0 counter = {count = <value optimized out>, seen = 0} clear_processed_siginfo = <value optimized out> i = <value optimized out>
#20 0x00000034074054fa in epoll_event_loop (ev=<value optimized out>, location=<value optimized out>) at tevent_standard.c:267
        ret = -1 i = <value optimized out> events = {{events = 17, data = {ptr = 0x1931c30, fd = 26418224, u32 = 26418224, u64 = 26418224}}} timeout = <value optimized out>
#21 std_event_loop_once (ev=<value optimized out>, location=<value optimized out>) at tevent_standard.c:544
        std_ev = 0x19203e0 tval = {tv_sec = 0, tv_usec = 999965}
#22 0x00000034074026d0 in _tevent_loop_once (ev=0x1920320, location=0x440ba7 "src/util/server.c:526") at tevent.c:490
        ret = <value optimized out> nesting_stack_ptr = 0x0
#23 0x000000340740273b in tevent_common_loop_wait (ev=0x1920320, location=0x440ba7 "src/util/server.c:526") at tevent.c:591
        ret = <value optimized out>
#24 0x000000000042e67f in server_loop (main_ctx=0x1921420) at src/util/server.c:526
        No locals.
#25 0x0000000000408487 in main (argc=4, argv=0x7fff1d8d0d08) at src/responder/pam/pamsrv.c:230
        opt = -1 pc = 0x191f010 main_ctx = 0x1921420 ret = 0 long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x647b40, val = 0, descrip = 0x43a1e6 "Help options:", argDescrip = 0x0}, {longName = 0x43a1f4 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x647c38, val = 0, descrip = 0x43a18f "Debug level", argDescrip = 0x0}, {longName = 0x43a200 "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x647c3c, val = 0, descrip = 0x43a1a0 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x43a20f "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x647b00, val = 0, descrip = 0x43a1d1 "Add debug timestamps", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} __FUNCTION__ = "main"
(gdb) print ev
$2 = (struct tevent_context *) 0x1927250
(gdb) print *ev
$3 = {ops = 0x3400000002, fd_events = 0x2, timer_events = 0x1927bf0, immediate_events = 0x0, signal_events = 0x0, additional_data = 0x1927310, pipe_fde = 0x0, pipe_fds = {49, 0}, debug_ops = {debug = 0x1926b70, context = 0x19286f0}, nesting = {allowed = true, level = 0, hook_fn = 0, hook_private = 0x30}}
Core file: http://jhrozek.fedorapeople.org/sssd_pam_crash/sssd_pam_debug.core.tar.gz
Valgrind seems to suggest we're accessing memory that was already free()'d:
==7271== Invalid read of size 8
==7271==    at 0x340A001EFA: talloc_get_name (in /usr/lib64/libtalloc.so.2.0.1)
==7271==    by 0x340A001F5D: talloc_check_name (in /usr/lib64/libtalloc.so.2.0.1)
==7271==    by 0x34074052E4: ??? (in /usr/lib64/libtevent.so.0.9.8)
==7271==    by 0x34074026CF: _tevent_loop_once (in /usr/lib64/libtevent.so.0.9.8)
==7271==    by 0x340840A463: ldb_wait (in /usr/lib64/libldb.so.0.9.10)
==7271==    by 0x340840BC48: ldb_search (in /usr/lib64/libldb.so.0.9.10)
==7271==    by 0x42BB5C: sysdb_getpwnam (sysdb_search.c:59)
==7271==    by 0x4093E4: pam_check_user_search (pamsrv_cmd.c:849)
==7271==    by 0x40D438: pam_check_user_dp_callback (pamsrv_cmd.c:960)
==7271==    by 0x4154E5: sss_dp_req_destructor (responder_dp.c:100)
==7271==    by 0x340A002D9D: ??? (in /usr/lib64/libtalloc.so.2.0.1)
==7271==    by 0x340A002C2A: ??? (in /usr/lib64/libtalloc.so.2.0.1)
==7271==  Address 0x4c4d3f0 is 48 bytes inside a block of size 104 free'd
==7271==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==7271==    by 0x340A002CA7: ??? (in /usr/lib64/libtalloc.so.2.0.1)
==7271==    by 0x340A002C2A: ??? (in /usr/lib64/libtalloc.so.2.0.1)
==7271==    by 0x340A001ABA: _talloc_free (in /usr/lib64/libtalloc.so.2.0.1)
==7271==    by 0x3406035D91: exit (in /lib64/libc-2.12.so)
==7271==    by 0x438426: default_quit (server.c:251)
==7271==    by 0x3407403BA9: ??? (in /usr/lib64/libtevent.so.0.9.8)
==7271==    by 0x34074054F9: ??? (in /usr/lib64/libtevent.so.0.9.8)
==7271==    by 0x34074026CF: _tevent_loop_once (in /usr/lib64/libtevent.so.0.9.8)
==7271==    by 0x340740273A: ??? (in /usr/lib64/libtevent.so.0.9.8)
==7271==    by 0x437E36: server_loop (server.c:571)
==7271==    by 0x4087B9: main (pamsrv.c:235)
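To make the failure mode easier to see, here is a minimal stand-alone sketch (hypothetical names, not SSSD code) of the pattern both the backtrace and the valgrind report point at: a talloc destructor that re-enters tevent after the event context it references has already been freed during teardown. Build with -ltalloc -ltevent; running it under valgrind reports an invalid read inside the timer-registration path, analogous to the report above.

/* Hypothetical sketch of a destructor-driven use-after-free during
 * teardown.  All names are made up; only the talloc/tevent calls are real. */
#include <stdio.h>
#include <talloc.h>
#include <tevent.h>

struct fake_request {
    struct tevent_context *ev;   /* borrowed pointer, not a talloc child */
};

static void timer_handler(struct tevent_context *ev, struct tevent_timer *te,
                          struct timeval now, void *private_data)
{
    (void)ev; (void)te; (void)now; (void)private_data;
}

/* Stand-in for a pending-request destructor: when the request is freed it
 * fires a "callback" that schedules more work on req->ev.  If req->ev was
 * already freed, tevent_add_timer() walks freed memory. */
static int fake_request_destructor(struct fake_request *req)
{
    tevent_add_timer(req->ev, req, tevent_timeval_current_ofs(0, 0),
                     timer_handler, req);
    return 0;
}

int main(void)
{
    TALLOC_CTX *top = talloc_new(NULL);
    struct tevent_context *ev = tevent_context_init(top);
    struct fake_request *req = talloc_zero(top, struct fake_request);

    req->ev = ev;
    talloc_set_destructor(req, fake_request_destructor);

    /* Teardown in the "wrong" order, as can happen when exit handlers free
     * a whole talloc hierarchy: the event context goes away first, then the
     * pending request's destructor runs against the dangling pointer. */
    talloc_free(ev);
    talloc_free(req);   /* use-after-free inside the destructor */

    talloc_free(top);
    return 0;
}

The sketch only illustrates the ordering hazard; in the real crash the same shape arises when exit() tears down the responder's talloc hierarchy and sss_dp_req_destructor's callback chain ends up back inside ldb/tevent.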
owner: jhrozek => jzeleny
patch: 0 => 1
status: new => assigned
Fixed in:
- 6d0fbda (master)
- 1b8f712 (sssd-1-6)
- 453698e (sssd-1-5)
resolution: => fixed
status: assigned => closed
rhbz: => 0
Metadata Update from @prefect:
- Issue assigned to jzeleny
- Issue set to the milestone: SSSD 1.5.14
SSSD is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in SSSD's GitHub repository.
This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/2080
If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the Subscribe button.
Thank you for understanding. We apologize for any inconvenience.