https://bugzilla.redhat.com/show_bug.cgi?id=742582
The FreeIPA development team reported an issue with ns-slapd hanging under load from IPA (https://fedorahosted.org/freeipa/ticket/1885). I am able to reproduce this issue on F15 x86_64. The hang looks like a deadlock within libdb. When the hang occurs, the ns-slapd process goes to 100% CPU and stops making progress. I built debug versions of 389-ds-base and db4 and got some interesting stack traces from 3 threads that seem to contribute to the problem: one thread writing to the memberOf index db, another thread reading from the memberOf index db, and the checkpoint thread attempting a checkpoint. All three threads seem to be yielding or waiting on a mutex. The db_stat tool doesn't show anything waiting for locks, so the problem seems to be internal to libdb.
batch move to milestone 1.3
I attempted to reproduce this again today on F16 using the 389-ds-base-1.2.10.rc1 bits, as I was having trouble reproducing this the last time I looked at it. I am still able to reproduce the hang with the test scripts that are attached to the original bug, but the hang does not occur every time.
The current test script has a loop that performs 3000 membership iterations. I would recommend increasing this to 5000 or more. Even then the test may sometimes finish successfully, since the issue is timing related.
Results of the dshang reproducer
The test was run on Fedora 15 x86_64 (dual core).
The attachment dshang_result.tar.gz contains:
  with_db4-4.8.30/db_stat.dshang
  with_db4-4.8.30/typescript.dshang
  with_db4-4.8.30/dshang.out
  with_db5-5.1.25-3/db_stat.dshang
  with_db5-5.1.25-3/typescript.dshang
  with_db5-5.1.25-3/dshang.out
The bad news is that the deadlock is observed with both db4 and db5.
Another observation: if I run the same dshang test on Fedora 16 x86_64 (dual core), the hang does not occur with either db4 or db5.
I ran the dshang test script on RHEL 6.3:
  Red Hat Enterprise Linux Server release 6.3 Beta (Santiago)
  # lscpu
  Architecture: x86_64
  CPU(s): 8
The test successfully ran for 36 hours.
Note: In this test, DS is linked against db4-4.7.25-16.el6.x86_64; db5 (libdb) is not available on RHEL 6:
  No package libdb available.
Since the symptom has been observed only on Fedora 15 and on no other platform, we are closing this ticket for now.
Metadata Update from @nhosoi: - Issue assigned to nhosoi - Issue set to the milestone: 1.2.11.rc1
389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in 389-ds-base's GitHub repository.
This issue has been cloned to GitHub and is available here: https://github.com/389ds/389-ds-base/issues/25
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the Subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Invalid)