Issue #335: transaction retries need to be cache aware - 389-ds-base

389-ds-base

#335 transaction retries need to be cache aware

Closed: wontfix None Opened 12 years ago by rmeggins.

The code that resets the original entries/mods/etc. when retrying deadlocked transactions in ldbm_back_delete et. al. need to be cache aware when deleting entries that are cached - for example, ldbm_delete.c:485

            backentry_free(&e);

The problem is that e is cached, so cannot be freed like this. Freeing cached entries seems to cause problems in the server, assertion failures, other crashes.

nhosoi commented 12 years ago

Agh.. You are right.

We have to do something like this...
cache_find_dn/id
if exists, cache_return; then cache_remove
otherwise backentry_free

Could this be the cause of the other crash bug, e.g., bz808770?

rmeggins commented 12 years ago

Replying to [comment:1 nhosoi]:

Agh.. You are right.

We have to do something like this...
cache_find_dn/id
if exists, cache_return; then cache_remove
otherwise backentry_free

Could this be the cause of the other crash bug, e.g., bz808770?

No, because the txn retry code is only in 1.2.11 - I found it trying to reproduce 808770 with 1.2.11 . . .

nhosoi commented 12 years ago

Rich, I'm wondering if you are already working on this ticket?
This ticket is assigned to me. Is it okay for me to start looking into this?

rmeggins commented 12 years ago

Replying to [comment:3 nhosoi]:

Rich, I'm wondering if you are already working on this ticket?
This ticket is assigned to me. Is it okay for me to start looking into this?

I'm not working on this. Feel free to take it.

nhosoi commented 12 years ago

Replying to [comment:4 rmeggins]:

I'm not working on this. Feel free to take it.
Thanks!

nhosoi commented 12 years ago

Rich, do you happen to have a reproducer (for the assertion failure / crashes) handy?

rmeggins commented 12 years ago

Replying to [comment:6 nhosoi]:

Rich, do you happen to have a reproducer (for the assertion failure / crashes) handy?

not easily - what we really need is a way to "inject" deadlocks - that is, we need to be able to trigger a deadlock programatically rather than having to put the directory server under a heavy load with many updates + searches at the same time

nhosoi commented 12 years ago

Replying to [comment:7 rmeggins]:

Replying to [comment:6 nhosoi]:

Rich, do you happen to have a reproducer (for the assertion failure / crashes) handy?

not easily - what we really need is a way to "inject" deadlocks - that is, we need to be able to trigger a deadlock programatically rather than having to put the directory server under a heavy load with many updates + searches at the same time

I see. Thanks... We may be able to make it happen easier once we disable the backend SERIALLOCK. BTW, the SERIALLOCK is not applied to the replicated operation. Are you getting the assertion failure/crash in the MMR topology?

rmeggins commented 12 years ago

Replying to [comment:8 nhosoi]:

Replying to [comment:7 rmeggins]:

Replying to [comment:6 nhosoi]:

Rich, do you happen to have a reproducer (for the assertion failure / crashes) handy?

not easily - what we really need is a way to "inject" deadlocks - that is, we need to be able to trigger a deadlock programatically rather than having to put the directory server under a heavy load with many updates + searches at the same time

I see. Thanks... We may be able to make it happen easier once we disable the backend SERIALLOCK. BTW, the SERIALLOCK is not applied to the replicated operation. Are you getting the assertion failure/crash in the MMR topology?

My test replicates the same setup as the reported problem, which does use MMR, but I don't think you need to use MMR to reproduce the problem - we just need some way to make calls to db->put() db->get() dbc->c_put() dbc->c_get() and dbc->c_close() return DB_LOCK_DEADLOCK errors without having to generate large loads.

nhosoi commented 12 years ago

Thanks! I could reproduce the problem quite easily on the standalone server. I'm seeing the assertion failures and crashes. Continue debugging...

nhosoi commented 12 years ago

git patch file (master)
0001-Trac-Ticket-335-transaction-retries-need-to-be-cache.patch

nhosoi commented 12 years ago

Fix description:
When libdb returns DEADLOCK and backend update function retries
the operation, the target entry is reset to the original shape.
The target entry could be or could not be in the entry cache.
Regardless of the status, the original code just released the
entry with backentry_free before going into the next loop, which
causes the cache error.

This patch checks the status of the entry. If it is in the entry
cache, remove it from the entry cache and add a new entry back to
the cache if necessary. To get the accurate cache status of each
entry, the output argument cache_res to id2entry_add_ext is added.

Additinally, error checking for the conflict value in index_add_mods
was week (curr_attr). This patch is adding the check.

nhosoi commented 12 years ago

Reviewed by Rich (Thank you!!)

Pushed to master.

$ git merge ticket335
Updating a50f8c0..bddb5a4
Fast-forward
ldap/servers/slapd/back-ldbm/cache.c | 2 +-
ldap/servers/slapd/back-ldbm/id2entry.c | 16 ++++++---
ldap/servers/slapd/back-ldbm/import-threads.c | 2 +-
ldap/servers/slapd/back-ldbm/index.c | 35 +++++++++++---------
ldap/servers/slapd/back-ldbm/ldbm_add.c | 20 ++++++++++-
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 33 ++++++++++++++-----
ldap/servers/slapd/back-ldbm/ldbm_modify.c | 41 +++++++++++++++++++-----
ldap/servers/slapd/back-ldbm/ldbm_modrdn.c | 23 +++++++++++--
ldap/servers/slapd/back-ldbm/proto-back-ldbm.h | 2 +-
9 files changed, 129 insertions(+), 45 deletions(-)
$ git push
Counting objects: 88, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (34/34), done.
Writing objects: 100% (34/34), 6.17 KiB, done.
Total 34 (delta 30), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
a50f8c0..bddb5a4 master -> master

nhosoi commented 11 years ago

git patch file (master)
0002-Trac-Ticket-335-transaction-retries-need-to-be-cache.patch

nhosoi commented 11 years ago

Fix description:
Commit bddb5a4 includes this fix:

Additinally, error checking for the conflict value in index_add_mods
was week (curr_attr). This patch is adding the check.

The fix was incomplete. If an add-attempted attribute type itself
does not exist in the entry (not only the attribute value) after
mods applied, the attribute type/value should not have been indexed.
This patch fixes it.

It's been reviewed by Rich on bz832560. (Thanks, Rich!!)

Pushed to master.

$ git push
Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 992 bytes, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
79799c9..420c6c2 master -> master

Pushed to 389-ds-base-1.2.11, as well.

$ git push origin ds1211-local:389-ds-base-1.2.11
Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 1.02 KiB, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
ff11ccc..7d6801f ds1211-local -> 389-ds-base-1.2.11

nkinder commented 11 years ago

Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=833202

rmeggins commented 11 years ago

master:
commit changeset:420c6c2/389-ds-base
Author: Noriko Hosoi nhosoi@totoro.usersys.redhat.com
Date: Fri Jun 15 16:32:09 2012 -0700

1.2.11:
commit changset:7d6801f/389-ds-base
Author: Noriko Hosoi nhosoi@totoro.usersys.redhat.com
Date: Fri Jun 15 16:32:09 2012 -0700

nkinder commented 11 years ago

Added initial screened field value.

Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.2.11.rc1

7 years ago

spichugi commented 3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/335

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Metadata

Assignee

nhosoi

Tags

None

Blocking

None

Depending on

None

Priority

critical

Milestone

1.2.11.rc1

reviewstatus

ack

rhbz

https://bugzilla.redhat.com/show_bug.cgi?id=833202

origin

None

389-ds-base

Source Code

#335 transaction retries need to be cache aware Closed: wontfix None Opened 12 years ago by rmeggins.

Metadata

#335 transaction retries need to be cache aware

Closed: wontfix None Opened 12 years ago by rmeggins.