#47449 deadlock after adding and deleting entries
Closed: wontfix None Opened 10 years ago by mreynolds.

If you have multiple clients, each adding and deleting users, the server will deadlock. I created 5 ldif files. Each ldif file added and then deleted 200 entries. Using 5 separate ldapmodify clients, the server will deadlock within a minute or so. Appears to be an issue with an entry cache lock not being unlocked:

Thread 29 (Thread 0x7f7d16bfd700 (LWP 8337)):
#3 0x0000003b56223fe9 in PR_Lock () from /lib64/libnspr4.so
#4 0x0000003b5622410b in PR_EnterMonitor () from /lib64/libnspr4.so
#5 0x00007f7d29eb19bb in dblayer_lock_backend (be=0x2094160) at ../ds/ldap/servers/slapd/back-ldbm/dblayer.c:3942
#6 0x00007f7d29eb102f in dblayer_txn_begin (be=0x2094160, parent_txn=0x0, txn=0x7f7d16bfa860) at ../ds/ldap/servers/slapd/back-ldbm/dblayer.c:3664
#7 0x00007f7d29eeb814 in ldbm_back_delete (pb=0x7f7d16bfcaa0) at ../ds/ldap/servers/slapd/back-ldbm/ldbm_delete.c:257
#8 0x00007f7d2dc8def4 in op_shared_delete (pb=0x7f7d16bfcaa0) at ../ds/ldap/servers/slapd/delete.c:364
#9 0x00007f7d2dc8d6dd in do_delete (pb=0x7f7d16bfcaa0) at ../ds/ldap/servers/slapd/delete.c:128
#10 0x000000000041578e in connection_dispatch_operation (conn=0x7f7d2464f730, op=0x231ee80, pb=0x7f7d16bfcaa0) at ../ds/ldap/servers/slapd/connection.c:643
#11 0x0000000000417524 in connection_threadmain () at ../ds/ldap/servers/slapd/connection.c:2482

Thread 27 (Thread 0x7f7d157fb700 (LWP 8339)):
#3 0x0000003b56223fe9 in PR_Lock () from /lib64/libnspr4.so
#4 0x0000003b5622410b in PR_EnterMonitor () from /lib64/libnspr4.so
#5 0x00007f7d29eb19bb in dblayer_lock_backend (be=0x2094160) at ../ds/ldap/servers/slapd/back-ldbm/dblayer.c:3942
#6 0x00007f7d29eb102f in dblayer_txn_begin (be=0x2094160, parent_txn=0x0, txn=0x7f7d157f6790) at ../ds/ldap/servers/slapd/back-ldbm/dblayer.c:3664
#7 0x00007f7d29ede3f1 in ldbm_back_add (pb=0x7f7d157faaa0) at ../ds/ldap/servers/slapd/back-ldbm/ldbm_add.c:261
#8 0x00007f7d2dc7dc4b in op_shared_add (pb=0x7f7d157faaa0) at ../ds/ldap/servers/slapd/add.c:735
#9 0x00007f7d2dc7cb96 in do_add (pb=0x7f7d157faaa0) at ../ds/ldap/servers/slapd/add.c:258
#10 0x000000000041576c in connection_dispatch_operation (conn=0x7f7d2464f878, op=0x22ec000, pb=0x7f7d157faaa0) at ../ds/ldap/servers/slapd/connection.c:638
#11 0x0000000000417524 in connection_threadmain () at ../ds/ldap/servers/slapd/connection.c:2482

Thread 15 (Thread 0x7f7d09bf5700 (LWP 8351)):
#0 0x000000377560e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000037756093be in _L_lock_995 () from /lib64/libpthread.so.0
#2 0x0000003775609326 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000003b56223fe9 in PR_Lock () from /lib64/libnspr4.so
#4 0x0000003b5622410b in PR_EnterMonitor () from /lib64/libnspr4.so
#5 0x00007f7d29eb19bb in dblayer_lock_backend (be=0x2094160) at ../ds/ldap/servers/slapd/back-ldbm/dblayer.c:3942
#6 0x00007f7d29eb102f in dblayer_txn_begin (be=0x2094160, parent_txn=0x0, txn=0x7f7d09bf0790) at ../ds/ldap/servers/slapd/back-ldbm/dblayer.c:3664
#7 0x00007f7d29ede3f1 in ldbm_back_add (pb=0x7f7d09bf4aa0) at ../ds/ldap/servers/slapd/back-ldbm/ldbm_add.c:261
#8 0x00007f7d2dc7dc4b in op_shared_add (pb=0x7f7d09bf4aa0) at ../ds/ldap/servers/slapd/add.c:735
#9 0x00007f7d2dc7cb96 in do_add (pb=0x7f7d09bf4aa0) at ../ds/ldap/servers/slapd/add.c:258
#10 0x000000000041576c in connection_dispatch_operation (conn=0x7f7d2464f4a0, op=0x232c850, pb=0x7f7d09bf4aa0) at ../ds/ldap/servers/slapd/connection.c:638
#11 0x0000000000417524 in connection_threadmain () at ../ds/ldap/servers/slapd/connection.c:2482

---> this thread is causing the deadlock
#3 0x0000003b56223fe9 in PR_Lock () from /lib64/libnspr4.so
#4 0x0000003b5622410b in PR_EnterMonitor () from /lib64/libnspr4.so
#5 0x00007f7d29ea7169 in cache_lock_entry (cache=0x21130b8, e=0x22f67a0) at ../ds/ldap/servers/slapd/back-ldbm/cache.c:1502
#6 0x00007f7d29ebee77 in find_entry_internal_dn (pb=0x7f7d073f0aa0, be=0x2094160, sdn=0x7f7ca400dec0, lock=1, txn=0x7f7d073ee860, flags=0) at ../ds/ldap/servers/slapd/back-ldbm/findentry.c:155
#7 0x00007f7d29ebf446 in find_entry_internal (pb=0x7f7d073f0aa0, be=0x2094160, addr=0x22f4b68, lock=1, txn=0x7f7d073ee860, flags=0) at ../ds/ldap/servers/slapd/back-ldbm/findentry.c:293
#8 0x00007f7d29ebf530 in find_entry2modify (pb=0x7f7d073f0aa0, be=0x2094160, addr=0x22f4b68, txn=0x7f7d073ee860) at ../ds/ldap/servers/slapd/back-ldbm/findentry.c:324
#9 0x00007f7d29eeb8b4 in ldbm_back_delete (pb=0x7f7d073f0aa0) at ../ds/ldap/servers/slapd/back-ldbm/ldbm_delete.c:273
#10 0x00007f7d2dc8def4 in op_shared_delete (pb=0x7f7d073f0aa0) at ../ds/ldap/servers/slapd/delete.c:364
#11 0x00007f7d2dc8d6dd in do_delete (pb=0x7f7d073f0aa0) at ../ds/ldap/servers/slapd/delete.c:128
#12 0x000000000041578e in connection_dispatch_operation (conn=0x7f7d2464f358, op=0x22f4a90, pb=0x7f7d073f0aa0) at ../ds/ldap/servers/slapd/connection.c:643
#13 0x0000000000417524 in connection_threadmain () at ../ds/ldap/servers/slapd/connection.c:2482
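The reproduction described above (5 ldif files, each adding and then deleting 200 entries) could be scripted roughly as follows. This is only a sketch: the original ldif files are not attached to this ticket, so the suffix dc=example,dc=com, the ou=People container, the uid naming scheme, the object classes, and the clientN.ldif file names are all assumptions.

```python
def make_ldif(client_id, count=200, suffix="dc=example,dc=com"):
    """Return LDIF text that adds `count` users and then deletes them.

    The suffix, container and naming scheme are assumptions; the ticket
    does not show the ldif files that were actually used.
    """
    uids = [f"c{client_id}_user{i}" for i in range(count)]
    lines = []
    # First the adds ...
    for uid in uids:
        lines += [
            f"dn: uid={uid},ou=People,{suffix}",
            "changetype: add",
            "objectClass: top",
            "objectClass: person",
            "objectClass: organizationalPerson",
            "objectClass: inetOrgPerson",
            f"uid: {uid}",
            f"cn: {uid}",
            f"sn: {uid}",
            "",
        ]
    # ... then the deletes of the same entries.
    for uid in uids:
        lines += [
            f"dn: uid={uid},ou=People,{suffix}",
            "changetype: delete",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # One file per client; each client gets its own set of DNs.
    for client in range(1, 6):
        with open(f"client{client}.ldif", "w") as fh:
            fh.write(make_ldif(client))
```

Each generated file would then be fed to its own ldapmodify process, with all 5 running at the same time against the same server. The bind DN and credentials to use depend on the test instance and are not given in this ticket.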

Version: 1.3.0
Is this version info correct? (I guess it could be 1.3.2/master?)

Replying to [comment:2 nhosoi]:

Version: 1.3.0
Is this version info correct? (I guess it could be 1.3.2/master?)

Yes, this is with 1.3.2 (master).

In ldbm_back_delete() I also forced the setting of the error code in case any future code shuffling occurs.

in ldbm_delete - in the first 3 cases, retval = -1 already - it is not necessary to set it, except perhaps to make the assumptions more clear

in ldbm_modrdn - would rather not make a change that is only formatting

Replying to [comment:6 rmeggins]:

in ldbm_delete - in the first 3 cases, retval = -1 already - it is not necessary to set it, except perhaps to make the assumptions more clear

Right, I was under the assumption that this bug might have happened from code being shuffled around, so I hard-set it to avoid future mistakes. But this "mistake" is present in all versions of 389 (at least 1.2.11 and up). Anyway, I just removed it.

in ldbm_modrdn - would rather not make a change that is only formatting.

No problem, it was only formatting.

New patch attached.
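For readers unfamiliar with this class of bug: the patch itself is not shown in this ticket, so the following is only a generic illustration of "a lock not being unlocked" on an error path, not the 389-ds code. The function names are hypothetical, and a plain threading.Lock stands in for the per-entry cache monitor.

```python
import threading


def delete_entry_buggy(entry_lock, fail):
    """Hypothetical delete: the error path returns with the lock still held."""
    entry_lock.acquire()
    if fail:
        return -1  # lock is leaked here
    entry_lock.release()
    return 0


def delete_entry_fixed(entry_lock, fail):
    """Hypothetical delete: the lock is released on every return path."""
    entry_lock.acquire()
    try:
        if fail:
            return -1
        return 0
    finally:
        entry_lock.release()


def next_caller_blocks(entry_lock, timeout=0.2):
    """Return True if another thread cannot take the lock within `timeout`."""
    result = []

    def worker():
        got = entry_lock.acquire(timeout=timeout)
        if got:
            entry_lock.release()
        result.append(not got)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result[0]


if __name__ == "__main__":
    lock = threading.Lock()
    delete_entry_fixed(lock, fail=True)
    print("blocked after fixed error path:", next_caller_blocks(lock))
    delete_entry_buggy(lock, fail=True)
    print("blocked after buggy error path:", next_caller_blocks(lock))
```

In a real server the second caller would wait without a timeout, which is the kind of permanently blocked thread that shows up in a stack trace like the one in the description.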

git merge ticket47449
Updating aa55789..6bd78b3
Fast-forward
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

git push origin master
To ssh://git.fedorahosted.org/git/389/ds.git
aa55789..6bd78b3 master -> master

commit 6bd78b3

1.3.1

024abee..8ea067b 389-ds-base-1.3.1 -> 389-ds-base-1.3.1

1.3.0

3f75400..93bde65 389-ds-base-1.3.0 -> 389-ds-base-1.3.0

1.2.11

c1dcfc6..66fbebc 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 66fbebc

Metadata Update from @nkinder:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.2.11.22

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/786

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
