In IPA upgrade this deadlock was encountered:
Thread 4 (Thread 0x7f27f97e2700 (LWP 11679)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007f282fa1a1af in _L_lock_1026 () from /lib64/libpthread.so.0 #2 0x00007f282fa1a151 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2833f46d18) at ../nptl/pthread_mutex_lock.c:134 #3 0x00007f28300722c9 in PR_Lock (lock=lock@entry=0x7f2833f46d18) at ../../../nspr/pr/src/pthreads/ptsynch.c:174 #4 0x00007f2830072883 in PR_EnterMonitor (mon=0x7f2833f46d10) at ../../../nspr/pr/src/pthreads/ptsynch.c:529 #5 0x00007f282612bff6 in dblayer_txn_begin () from /usr/lib64/dirsrv/plugins/libback-ldbm.so #6 0x00007f282614bba3 in ldbm_back_add () from /usr/lib64/dirsrv/plugins/libback-ldbm.so #7 0x00007f2831c4ec1a in op_shared_add () from /usr/lib64/dirsrv/libslapd.so.0 #8 0x00007f2831c4f4a3 in add_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0 #9 0x00007f28249daf0d in retrocl_postob () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so #10 0x00007f2831ca7cb0 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0 #11 0x00007f2831ca7f08 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0 #12 0x00007f2826166b91 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so #13 0x00007f2831c94fa1 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0 #14 0x00007f2831c95a54 in modify_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0 #15 0x00007f28258446a1 in memberof_fix_memberof_callback () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so #16 0x00007f2831cbc3cd in send_ldap_search_entry_ext () from /usr/lib64/dirsrv/libslapd.so.0 #17 0x00007f2831cbcc0c in send_ldap_search_entry () from /usr/lib64/dirsrv/libslapd.so.0 #18 0x00007f2831c9a593 in iterate.isra.0.constprop.3 () from /usr/lib64/dirsrv/libslapd.so.0 #19 0x00007f2831c9a73a in send_results_ext.constprop.2 () from /usr/lib64/dirsrv/libslapd.so.0 #20 0x00007f2831c9be21 in op_shared_search () from /usr/lib64/dirsrv/libslapd.so.0 #21 0x00007f2831cabc8e in search_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0 #22 0x00007f2825845ef7 in memberof_fixup_task_thread () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so #23 0x00007f2830077c10 in _pt_root (arg=0x7f28140485e0) at ../../../nspr/pr/src/pthreads/ptthread.c:204 #24 0x00007f282fa17f33 in start_thread (arg=0x7f27f97e2700) at pthread_create.c:309 #25 0x00007f282f745ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 36 (Thread 0x7f283204f700 (LWP 11643)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007f282fa1a1af in _L_lock_1026 () from /lib64/libpthread.so.0 #2 0x00007f282fa1a151 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2833f9fd38) at ../nptl/pthread_mutex_lock.c:134 #3 0x00007f28300722c9 in PR_Lock (lock=lock@entry=0x7f2833f9fd38) at ../../../nspr/pr/src/pthreads/ptsynch.c:174 #4 0x00007f2830072883 in PR_EnterMonitor (mon=0x7f2833f9fd30) at ../../../nspr/pr/src/pthreads/ptsynch.c:529 #5 0x00007f2825846eca in memberof_postop_del () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so #6 0x00007f2831ca7cb0 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0 #7 0x00007f2831ca7f08 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0 #8 0x00007f28261594de in ldbm_back_delete () from /usr/lib64/dirsrv/plugins/libback-ldbm.so #9 0x00007f2831c5beb0 in op_shared_delete () from /usr/lib64/dirsrv/libslapd.so.0 #10 0x00007f2831c5c062 in delete_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0 #11 0x00007f28249db6de in delete_changerecord () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so #12 0x00007f28249db9b1 in changelog_trim_thread_fn () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so #13 0x00007f2830077c10 in _pt_root (arg=0x7f28100008e0) at ../../../nspr/pr/src/pthreads/ptthread.c:204 #14 0x00007f282fa17f33 in start_thread (arg=0x7f283204f700) at pthread_create.c:309 #15 0x00007f282f745ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
There is a scope for memberof defined:
memberofentryscope: dc=ipa,dc=dev,dc=lan
which should exclude cn=changelog, but the check is only done after acquiring the memberof lock.
I think for postop_add and postop_del this check coould be done when memberof_oktodo() is called
Ludwig, setting the next 1.3.2 to the milestone. Please feel free to adjust it to your plan. Thanks!
Is this still an issue in the IPA upgrade (from which version to which)?
If so, could you put the 389-ds-base version and the stacktraces with the matched debuginfo (there are some functions which options and lines are missing...)?
Setting to FUTURE.
Comment by Ludwig: global lock and lmdb redesign should handle this
Version: 389-ds-base-1.3.3.1-15.el7_1.x86_64
Replying to [comment:10 mreynolds]: Thanks to Mark for picking up this ticket. Setting Milestone to 1.3.4.3...
attachment 0001-Ticket-47931-memberOf-retrocl-deadlocks.patch
Hi Mark!
I'm reviewing your fix. First, I'm trying to understand the "fix description"... The deadlock among plug-ins is always hard to solve, isn't it? ;) {{{ Bug Description: When concurrently updating multiple backends the memberOf and retrocl plugins can deadlock on each other. This is caused by the required retrocl lock, and the db lock on the changenumber index in the retrocl db.
Fix Description: Added scoping to the retrocl that allows subtrees/suffixes to be included or excluded. Also moved the existing memberOf scoping outside of its global lock. }}} So, the deadlock was caused by the memberof scoping inside the global lock and retrocl lock (retrocl_internal_lock or retrocl_cn_lock?)? And basically just moving the memberof scoping out solves the deadlock?
Replying to [comment:14 nhosoi]:
Hi Mark! I'm reviewing your fix. First, I'm trying to understand the "fix description"... The deadlock among plug-ins is always hard to solve, isn't it? ;)
I'm reviewing your fix. First, I'm trying to understand the "fix description"... The deadlock among plug-ins is always hard to solve, isn't it? ;)
It is, and this fix is almost a workaround. The retrocl design makes it very difficult to avoid deadlocks when dealing with multiple backends.
{{{ Bug Description: When concurrently updating multiple backends the memberOf and retrocl plugins can deadlock on each other. This is caused by the required retrocl lock, and the db lock on the changenumber index in the retrocl db. Fix Description: Added scoping to the retrocl that allows subtrees/suffixes to be included or excluded. Also moved the existing memberOf scoping outside of its global lock. }}} So, the deadlock was caused by the memberof scoping inside the global lock and retrocl lock (retrocl_internal_lock or retrocl_cn_lock?)? And basically just moving the memberof scoping out solves the deadlock?
{{{ Bug Description: When concurrently updating multiple backends the memberOf and retrocl plugins can deadlock on each other. This is caused by the required retrocl lock, and the db lock on the changenumber index in the retrocl db.
There are two different deadlocks this patch tries to address. The first one is very easy to reproduce, the other I can not reproduce. Here is the stack from the first deadloack that is easy to reproduce:
{{{ Thread 14
Thread 13 (Thread 0x7fc678ff1700 (LWP 1931)):
}}}
Thread 14 is updating many members in one mod operation, so the retrocl is being called multiple times in the modify operation. It holds a lock on the changennumber index(via retrocl plugin). Then there are updates occuring on a different backend (thread 13) that grab the retrocl lock, but then gets stuck waiting on the changenumber index which thread 13 holds. But then thread 14, which has the changenumber lock, tries to grab the retrocl lock again - and we deadlock between the two threads.
So by adding scoping to retrocl, where we exclude the other backends, we avoid this deadlock.
The second deadlock, the stack is listed in this ticket, I can not reproduce. But... By moving the scoping check outside of the memberOf lock, and combined with the new retrocl scoping, should address this deadlock as well.
Really, the problem is the retro changelog design. The retrocl plugin almost needs to be either rewritten/redesigned entirely, or simply removed and the replication changelog extended to fulfill the needs that the retrocl was providing.
So for now, this patch addresses the issues that current customers are running into with IPA, and it also adds some new functionality/flexibility to the retrocl plugin.
Thank you for your explanation, Mark! Now I understood the design of your fix.
2 minor issues... 1. Is 'i' supposed to get incremented? {{{ 645 / Check if the the entry include scope is a child of the sdn / 646 static Slapi_DN 647 memberof_scope_is_child_of_dn(MemberOfConfig config, Slapi_DN sdn) 648 { 649 int i = 0; 650 651 while(config->entryScopes && config->entryScopes[i]){ 652 if(slapi_sdn_issuffix(config->entryScopes[i], sdn)){ 653 return config->entryScopes[i]; 654 } 655 } 656 return NULL; 657 } }}} 2. Is retro changelog applicable to the dynamic plug-in? If yes, retrocl_includes|excludes could leak? {{{ 48 static Slapi_DN retrocl_includes = NULL; 49 static Slapi_DN retrocl_excludes = NULL; 391 retrocl_excludes = (Slapi_DN )slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1); 411 retrocl_includes = (Slapi_DN )slapi_ch_calloc(sizeof(Slapi_DN ),num_vals+1); }}}
Replying to [comment:16 nhosoi]:
Thank you for your explanation, Mark! Now I understood the design of your fix. 2 minor issues... 1. Is 'i' supposed to get incremented? {{{ 645 / Check if the the entry include scope is a child of the sdn / 646 static Slapi_DN 647 memberof_scope_is_child_of_dn(MemberOfConfig config, Slapi_DN *sdn) 648 { 649 int i = 0; 650 651 while(config->entryScopes && config->entryScopes[i]){ 652 if(slapi_sdn_issuffix(config->entryScopes[i], sdn)){ 653 return config->entryScopes[i]; 654 } 655 } 656 return NULL; 657 } }}}
2 minor issues... 1. Is 'i' supposed to get incremented? {{{ 645 / Check if the the entry include scope is a child of the sdn / 646 static Slapi_DN 647 memberof_scope_is_child_of_dn(MemberOfConfig config, Slapi_DN *sdn) 648 { 649 int i = 0; 650 651 while(config->entryScopes && config->entryScopes[i]){ 652 if(slapi_sdn_issuffix(config->entryScopes[i], sdn)){ 653 return config->entryScopes[i]; 654 } 655 } 656 return NULL; 657 } }}}
Good catch
Is retro changelog applicable to the dynamic plug-in? If yes, retrocl_includes|excludes could leak? {{{ 48 static Slapi_DN retrocl_includes = NULL; 49 static Slapi_DN retrocl_excludes = NULL; 391 retrocl_excludes = (Slapi_DN )slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1); 411 retrocl_includes = (Slapi_DN )slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1); }}}
This is freed in the plugin stop function, and that's how dynamic plugins work. Any change to plugin's config triggers the plugin to be restarted. So this is not leaked.
revision 0001-Ticket-47931-memberOf-retrocl-deadlocks.2.patch
New patch attached...
Hi Mark, thanks for waiting. Please push it!
6b138a2..fd959ac master -> master commit fd959ac864d6d86d24928bc2c6f097d1a6031ecd Author: Mark Reynolds mreynolds@redhat.com Date: Tue Aug 4 12:19:31 2015 -0400
e62b481..d810847 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit d8108476d3bedbcc03f6c61bfb3d50e921faaf42
Coverity issues found, reopening...
Fix coverity issues 0002-Ticket-47931-Fix-coverity-issues.patch
fd959ac..5daea97 master -> master commit 5daea973e4526584ee41d7b9f4b1b4993b4de6f1 Author: Mark Reynolds mreynolds@redhat.com Date: Mon Aug 10 10:42:40 2015 -0400
d810847..9a0047e 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 9a0047ef75f6dbeb1980ac77fab5d62865c77e6a
08eabd5..f86dab3 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit f86dab3bc94482df4f6850ea6a9a90371c1cfc03
f86dab3..e6343e6 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit e6343e66aa2e4537388b386c055b7ca8044f8f34
Debug logging was accidentally left in the source - it has now been removed:
29c09a5..db7153f master -> master commit db7153f89bf3dda935e6ef4f175697bda32fe720 Author: Mark Reynolds mreynolds@redhat.com Date: Wed Aug 19 10:03:50 2015 -0400
2c5e0d5..1781280 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 1781280f133c4877f83949400294641a558f5406
895dc4f..5c48f28 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit 5c48f28d2d6c46aa3cd2883207c18002d01e4994
lib389 test script 0001-Ticket-47931-lib389-test-memberof-and-retrocl-deadlo.patch
Replying to [comment:27 mreynolds]: Probably, a silly question... :p Could you tell me where test_ticket47931 is called?
Replying to [comment:28 nhosoi]:
Simon modified create-ticket.py to make new ticket tests more compliant with the pytest way. So now in the main function we call
{{{ pytest.main("-s %s" % CURRENT_FILE) }}}
Which executes all the functions that start with "test".
I see. Thanks!
Thanks Noriko!
4277f2b..48470e7 master -> master commit 48470e7a6b0a350149ced0a1377e625b836dba64 Author: Mark Reynolds mreynolds@redhat.com Date: Fri Aug 28 15:01:58 2015 -0400
8c9c2fc..22e4de7 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 22e4de79107e49af74e0ff7615552b61db240b71
f10e2e6..e7d3b2f 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit e7d3b2f120e0a3e9ce642733b803686fbcc49222
Metadata Update from @mreynolds: - Issue assigned to mreynolds - Issue set to the milestone: 1.3.3.13
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1262
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Login to comment on this ticket.