https://bugzilla.redhat.com/show_bug.cgi?id=863576 (Red Hat Enterprise Linux 6)
Description of problem:
Dirsrv deadlock in abandon

Version-Release number of selected component (if applicable):
ipa-server-2.2.0-16.el6.x86_64
389-ds-base-1.2.11.13-1.el6.x86_64

How reproducible:
Have not reproduced yet...

Test Env:
2 IPA Master Servers
2 IPA Clients
10K users
10 Groups (1k per group)
Sudo command setup

Activities:
- Running manually triggered admin through the UI (del sudo rules)
- Hourly running scheduled ssh sudo runtime client load (10 threads)

The system ran the load fine but began failing in the early morning, around 5:00 AM, and became inoperative: kinit failed and the server could not be restarted. The cause was determined to be a DirSrv deadlock.

Additional info:
Dev Noriko Hosoi indicates: This is a self-deadlock in abandon...
I see 3 threads waiting for the same mutex. The first one, thread 23, already acquired the mutex in do_abandon, but it tries to grab it again in pagedresults_free_one_msgid.

Thread 23 (Thread 0x7f446b458700 (LWP 22023)):
#0 0x0000003ffec0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003ffec093be in _L_lock_995 () from /lib64/libpthread.so.0
#2 0x0000003ffec09326 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000003001c23f79 in PR_Lock () from /lib64/libnspr4.so
#4 0x00000030014889f9 in pagedresults_free_one_msgid (conn=0x7f4470e68670, msgid=31) at ldap/servers/slapd/pagedresults.c:272
#5 0x000000000040d4ad in do_abandon (pb=0xa56ee70) at ldap/servers/slapd/abandon.c:156
#6 0x0000000000414104 in connection_dispatch_operation () at ldap/servers/slapd/connection.c:646
#7 connection_threadmain () at ldap/servers/slapd/connection.c:2338
#8 0x0000003001c299e3 in ?? () from /lib64/libnspr4.so
#9 0x0000003ffec07851 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003ffe8e767d in clone () from /lib64/libc.so.6
(gdb) p conn
$6 = (Connection *) 0x7f4470e68670
(gdb) p *conn
$7 = {c_sb = 0x88b1610, c_sd = 84, c_ldapversion = 3,
  c_dn = 0xa760d90 "fqdn=sti-high-3.testrelm.com,cn=computers,cn=accounts,dc=testrelm,dc=com",
  c_isroot = 0, c_isreplication_session = 0, c_authtype = 0x995b480 "SASL GSSAPI",
  c_external_dn = 0x0, c_external_authtype = 0x30014cf896 "none",
  cin_addr = 0xae24a80, cin_destaddr = 0xa607410, c_domain = 0x0, c_ops = 0xae9e640,
  c_gettingber = 0, c_currentber = 0x0, c_starttime = 1349254852, c_connid = 140400,
  c_opsinitiated = 59, c_opscompleted = 57, c_threadnumber = 2, c_refcnt = 3,
  c_mutex = 0x824f3e0, c_pdumutex = 0x89d5060, c_idlesince = 1349254862,
  c_private = 0x890d590, c_flags = 34, c_needpw = 0, c_client_cert = 0x0,
  c_prfd = 0x85bac50, c_ci = 84, c_fdi = -1, c_next = 0x7f4470e682c8,
  c_prev = 0x7f4470e69ed0, c_bi_backend = 0x0, c_extension = 0xb145440,
  c_sasl_conn = 0xb5e9990, c_local_ssf = 0, c_sasl_ssf = 56, c_ssl_ssf = 0,
  c_unix_local = 0, c_local_valid = 0, c_local_uid = 0, c_local_gid = 0,
  c_pagedresults = { prl_maxlen = 64, prl_count = 6, prl_list = 0x94f5c30},
  c_push_io_layer_cb = 0, c_pop_io_layer_cb = 0, c_io_layer_cb_data = 0x0}

Thread 19 (Thread 0x7f4468c54700 (LWP 22027)):
#0 0x0000003ffec0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003ffec093be in _L_lock_995 () from /lib64/libpthread.so.0
#2 0x0000003ffec09326 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000003001c23f79 in PR_Lock () from /lib64/libnspr4.so
#4 0x00000030014883fd in pagedresults_set_search_result (conn=0x7f4470e68670, sr=0x0, locked=0, index=0) at ldap/servers/slapd/pagedresults.c:360
#5 0x00007f447b1819fc in ldbm_back_next_search_entry_ext (pb=0xb02fca0, use_extension=0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:1426
#6 0x00000030014854c1 in iterate (pb=0xb02fca0, be=<value optimized out>, pnentries=0x7f4468c51394, pagesize=1000, pr_statp=0x7f4468c51388, send_result=1) at ldap/servers/slapd/opshared.c:1250
#7 0x00000030014859d7 in send_results_ext (pb=0xb02fca0, nentries=0x7f4468c51394, pagesize=1000, pr_stat=0x7f4468c51388, send_result=1) at ldap/servers/slapd/opshared.c:1651
#8 0x00000030014866cb in op_shared_search (pb=0xb02fca0, send_result=1) at ldap/servers/slapd/opshared.c:828
#9 0x00000000004263a4 in do_search (pb=<value optimized out>) at ldap/servers/slapd/search.c:400
#10 0x000000000041426a in connection_dispatch_operation () at ldap/servers/slapd/connection.c:621
#11 connection_threadmain () at ldap/servers/slapd/connection.c:2338
#12 0x0000003001c299e3 in ?? () from /lib64/libnspr4.so
#13 0x0000003ffec07851 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003ffe8e767d in clone () from /lib64/libc.so.6
(gdb) p conn
$4 = (Connection *) 0x7f4470e68670
(gdb) p *conn
$5 = {c_sb = 0x88b1610, c_sd = 84, c_ldapversion = 3,
  c_dn = 0xa760d90 "fqdn=sti-high-3.testrelm.com,cn=computers,cn=accounts,dc=testrelm,dc=com",
  c_isroot = 0, c_isreplication_session = 0, c_authtype = 0x995b480 "SASL GSSAPI",
  c_external_dn = 0x0, c_external_authtype = 0x30014cf896 "none",
  cin_addr = 0xae24a80, cin_destaddr = 0xa607410, c_domain = 0x0, c_ops = 0xae9e640,
  c_gettingber = 0, c_currentber = 0x0, c_starttime = 1349254852, c_connid = 140400,
  c_opsinitiated = 59, c_opscompleted = 57, c_threadnumber = 2, c_refcnt = 3,
  c_mutex = 0x824f3e0, c_pdumutex = 0x89d5060, c_idlesince = 1349254862,
  c_private = 0x890d590, c_flags = 34, c_needpw = 0, c_client_cert = 0x0,
  c_prfd = 0x85bac50, c_ci = 84, c_fdi = -1, c_next = 0x7f4470e682c8,
  c_prev = 0x7f4470e69ed0, c_bi_backend = 0x0, c_extension = 0xb145440,
  c_sasl_conn = 0xb5e9990, c_local_ssf = 0, c_sasl_ssf = 56, c_ssl_ssf = 0,
  c_unix_local = 0, c_local_valid = 0, c_local_uid = 0, c_local_gid = 0,
  c_pagedresults = { prl_maxlen = 64, prl_count = 6, prl_list = 0x94f5c30},
  c_push_io_layer_cb = 0, c_pop_io_layer_cb = 0, c_io_layer_cb_data = 0x0}

Thread 1 (Thread 0x7f44818d87c0 (LWP 21797)):
#0 0x0000003ffec0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003ffec093be in _L_lock_995 () from /lib64/libpthread.so.0
#2 0x0000003ffec09326 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000003001c23f79 in PR_Lock () from /lib64/libnspr4.so
#4 0x0000000000417ef6 in setup_pr_read_pds (ports=0x7fff13810910) at ldap/servers/slapd/daemon.c:1675
#5 slapd_daemon (ports=0x7fff13810910) at ldap/servers/slapd/daemon.c:1144
#6 0x000000000041f22f in main (argc=7, argv=0x7fff13810ca8) at ldap/servers/slapd/main.c:1253
(gdb) p c
$1 = (Connection *) 0x7f4470e68670
(gdb) p *c
$2 = {c_sb = 0x88b1610, c_sd = 84, c_ldapversion = 3,
  c_dn = 0xa760d90 "fqdn=sti-high-3.testrelm.com,cn=computers,cn=accounts,dc=testrelm,dc=com",
  c_isroot = 0, c_isreplication_session = 0, c_authtype = 0x995b480 "SASL GSSAPI",
  c_external_dn = 0x0, c_external_authtype = 0x30014cf896 "none",
  cin_addr = 0xae24a80, cin_destaddr = 0xa607410, c_domain = 0x0, c_ops = 0xae9e640,
  c_gettingber = 0, c_currentber = 0x0, c_starttime = 1349254852, c_connid = 140400,
  c_opsinitiated = 59, c_opscompleted = 57, c_threadnumber = 2, c_refcnt = 3,
  c_mutex = 0x824f3e0, c_pdumutex = 0x89d5060, c_idlesince = 1349254862,
  c_private = 0x890d590, c_flags = 34, c_needpw = 0, c_client_cert = 0x0,
  c_prfd = 0x85bac50, c_ci = 84, c_fdi = -1, c_next = 0x7f4470e682c8,
  c_prev = 0x7f4470e69ed0, c_bi_backend = 0x0, c_extension = 0xb145440,
  c_sasl_conn = 0xb5e9990, c_local_ssf = 0, c_sasl_ssf = 56, c_ssl_ssf = 0,
  c_unix_local = 0, c_local_valid = 0, c_local_uid = 0, c_local_gid = 0,
  c_pagedresults = { prl_maxlen = 64, prl_count = 6, prl_list = 0x94f5c30},
  c_push_io_layer_cb = 0, c_pop_io_layer_cb = 0, c_io_layer_cb_data = 0x0}
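The pattern described for thread 23 (a caller that holds the connection lock and then calls a helper that takes the same lock again) can be sketched in isolation. This is only an illustration, not the patch that was pushed for this ticket: every `demo_*` name is hypothetical, and the sketch swaps in a POSIX `PTHREAD_MUTEX_ERRORCHECK` mutex so that the second lock attempt returns `EDEADLK` instead of hanging the way the non-reentrant `PR_Lock` does in the backtrace. One common way to avoid the re-lock, assumed here, is a helper variant that expects the caller to already hold the lock.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for Connection: just a lock and a counter. */
typedef struct {
    pthread_mutex_t lock;  /* plays the role of c_mutex */
    int paged_count;       /* plays the role of c_pagedresults.prl_count */
} demo_conn;

/* Use an error-checking mutex so a self-deadlock is reported, not hung. */
int demo_conn_init(demo_conn *c, int count) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(&c->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    c->paged_count = count;
    return rc;
}

/* Helper that takes the lock itself (the shape of frame #4 in thread 23). */
int demo_free_one(demo_conn *c) {
    int rc = pthread_mutex_lock(&c->lock);
    if (rc != 0) return rc;  /* EDEADLK if this thread already owns it */
    if (c->paged_count > 0) c->paged_count--;
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* Variant that assumes the caller already holds c->lock. */
void demo_free_one_nolock(demo_conn *c) {
    if (c->paged_count > 0) c->paged_count--;
}

/* Problem shape: lock held, then a helper that locks again. */
int demo_abandon_buggy(demo_conn *c) {
    pthread_mutex_lock(&c->lock);
    int rc = demo_free_one(c);  /* second lock attempt on the same thread */
    pthread_mutex_unlock(&c->lock);
    return rc;
}

/* One possible remedy: call the no-lock variant while holding the lock. */
int demo_abandon_fixed(demo_conn *c) {
    pthread_mutex_lock(&c->lock);
    demo_free_one_nolock(c);
    pthread_mutex_unlock(&c->lock);
    return 0;
}
```

With a default (non-error-checking) mutex, `demo_abandon_buggy` would block forever, and any other thread that needs the same lock would queue up behind it, which matches threads 19 and 1 waiting in `PR_Lock` on the same connection.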
Reviewed by Rich (Thank you!!)
Pushed to master.
$ git push
Counting objects: 19, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 1.78 KiB, done.
Total 10 (delta 8), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   cd48bbd..c19bb9d  master -> master
Cherry-picked and pushed to external 389-ds-base-1.2.11.
$ git cherry-pick -x -e c19bb9d
$ git push origin 389-ds-base-1.2.11:389-ds-base-1.2.11
Counting objects: 19, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 1.83 KiB, done.
Total 10 (delta 8), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   009fd8c..4d82507  389-ds-base-1.2.11 -> 389-ds-base-1.2.11
Metadata Update from @nhosoi:
- Issue assigned to rmeggins
- Issue set to the milestone: 1.2.11.16
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/485
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)