If using betxn pre/post operation plugins, and a betxn pre/post op plugin returns a failure code, the dblayer_env_lock is not released. During shutdown, or backend deletion, the code attempts to acquire a write lock, and the write lock will block forever because the previous read locks were not released.
I think this unnecessary result code checking accidentally skipped the dblayer_txn_abort... The problematic code is found only in ldbm_modify. (Other ops -- add, delete, modrdn -- do not check ldap_result_code before calling dblayer_txn_abort.) I'd like to run some more tests to verify my theory... Could you share your test case? Thanks!
diff --git a/ldap/servers/slapd/back-ldbm/ldbm_modify.c b/ldap/servers/slapd/bac index f4f1263..10c38cf 100644 --- a/ldap/servers/slapd/back-ldbm/ldbm_modify.c +++ b/ldap/servers/slapd/back-ldbm/ldbm_modify.c @@ -614,7 +614,7 @@ error_return:
if (disk_full) { rc= return_on_disk_full(li);
I found this running the Managed Entry and Linked Attributes tests with the plugin type changed to betxnpreoperation. The tests would hang during the cleanup phase. I have a fix that I'm testing - the fix just does this: {{{ - if (retry_count > 0) { + if (txn.back_txn_txn && (txn.back_txn_txn != parent_txn)) { }}}
So far it seems to be working fine - no hangs.
Argh.. You are right. Stupid bugs. :(
Please add this change (in ldbm_back_modify) to your fix, too. The check is not needed. - } else if (ldap_result_code != LDAP_SUCCESS) { + } else {
Thanks!
ok - but is there a case where ldap_result_code == LDAP_SUCCESS and we have to abort the transaction?
Replying to [comment:4 rmeggins]:
I don't (want to) think there is such a case. But if an error occurs in one of the plugins (at other than the LDAP operations) and the plugin could slip setting ldap_result_code but just return error (which is set to retval only.)
git patch file (master) 0001-Trac-Ticket-290-server-hangs-during-shutdown-if-betx.patch
1) 1.2.11 will have a different way to set the txn from plugins - so just remove those linked attrs changes 2) in the diskfull return case, I think the txn should be aborted too 3) the value sdn free patch looks good, but could we have that in a separate commit?
Replying to [comment:7 rmeggins]:
1) 1.2.11 will have a different way to set the txn from plugins - so just remove those linked attrs changes
That's what I thought. All right. I'll remove them.
2) in the diskfull return case, I think the txn should be aborted too
Theoretically, yes. But most likely, it crashes in txn_abort. (And the crash cannot be captured...):
3) the value sdn free patch looks good, but could we have that in a separate commit?
Yeah, that'd be better. Do we open a new ticket for such a case?
Replying to [comment:8 nhosoi]:
Replying to [comment:7 rmeggins]: 1) 1.2.11 will have a different way to set the txn from plugins - so just remove those linked attrs changes That's what I thought. All right. I'll remove them. Thanks 2) in the diskfull return case, I think the txn should be aborted too Theoretically, yes. But most likely, it crashes in txn_abort. (And the crash cannot be captured...): ok - then what you have is fine 3) the value sdn free patch looks good, but could we have that in a separate commit? Yeah, that'd be better. Do we open a new ticket for such a case?
That's what I thought. All right. I'll remove them. Thanks
Theoretically, yes. But most likely, it crashes in txn_abort. (And the crash cannot be captured...): ok - then what you have is fine
No, just a different commit - will make it easier to track and cherry-pick if necessary
Thanks to Rich for his reviews and comments.
I've removed the changes on linked_attrs.c. And separated the fixes into 2 parts as follows:
commit 69087b4 Author: Noriko Hosoi nhosoi@ponyo.sjc.redhat.com Date: Tue Feb 21 13:22:07 2012 -0800
Trac Ticket #290 - server hangs during shutdown if betxn pre/post op fails https://fedorahosted.org/389/ticket/290 Fix description: If a dn type modify value is invalid, the modify operation could crash the server. This patch fixes it.
commit 4495cf3 Author: Noriko Hosoi nhosoi@ponyo.sjc.redhat.com Date: Tue Feb 21 13:20:12 2012 -0800
Trac Ticket #290 - server hangs during shutdown if betxn pre/post op fails https://fedorahosted.org/389/ticket/290 Fix description: The logic to call dblayer_txn_abort in the ldbm_back update functions was wrong. If the operations fail, the abort function has to be called if dblayer_txn_begin is already called and dblayer_txn_commit is not. This patch checks the txn value set by dblayer_txn_commit to determine to call dblayer_txn_abort or not.
Pushed to master.
$ git merge trac290 Updating 2e5ee4d..69087b4 Fast-forward ldap/servers/slapd/back-ldbm/ldbm_add.c | 4 ++-- ldap/servers/slapd/back-ldbm/ldbm_delete.c | 4 ++-- ldap/servers/slapd/back-ldbm/ldbm_modify.c | 6 +++--- ldap/servers/slapd/back-ldbm/ldbm_modrdn.c | 5 +++-- ldap/servers/slapd/value.c | 3 ++- 5 files changed, 12 insertions(+), 10 deletions(-)
$ git push Counting objects: 26, done. Delta compression using up to 4 threads. Compressing objects: 100% (16/16), done. Writing objects: 100% (16/16), 1.85 KiB, done. Total 16 (delta 13), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 2e5ee4d..69087b4 master -> master
this fix was cherry-picked to 1.2.10 commit changeset:b197245/389-ds-base Author: Noriko Hosoi nhosoi@ponyo.sjc.redhat.com Date: Tue Feb 21 13:22:07 2012 -0800 Fix description: If a dn type modify value is invalid, the modify operation could crash the server. This patch fixes it. (cherry picked from commit changeset:69087b4/389-ds-base)
commit changeset:b197245/389-ds-base Author: Noriko Hosoi nhosoi@ponyo.sjc.redhat.com Date: Tue Feb 21 13:22:07 2012 -0800 Fix description: If a dn type modify value is invalid, the modify operation could crash the server. This patch fixes it. (cherry picked from commit changeset:69087b4/389-ds-base) 1.2.10 branch
Added initial screened field value.
Metadata Update from @nhosoi: - Issue assigned to nhosoi - Issue set to the milestone: 1.2.11.a1
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/290
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Login to comment on this ticket.