#351 use betxn plugins by default
Closed: wontfix None Opened 11 years ago by rmeggins.

Make all regular plugins to be betxn plugins


To ssh://git.fedorahosted.org/git/389/ds.git
dd65701..9c0f530 master -> master

commit changeset:9c0f530/389-ds-base
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Apr 30 20:42:34 2012 -0600

had to do a partial revert for 1.2.11
commit changeset:d63d4f2/389-ds-base
Author: Rich Megginson rmeggins@redhat.com
Date: Fri May 4 14:38:13 2012 -0600

This causes deadlocks in memberof, and the risk for other deadlocks is high - need to do more testing before we can be sure this will work

changing milestone to 1.3.0.a1

set default ticket origin to Community

Added initial screened field value.

{{{
Fix Description:
. Enabled betxn by default on the following plugins:
7-bit check Plugin, Attribute uniqueness Plugin
Auto Membership Plugin, Class of Service Plugin
Managed Entries, MemberOf Plugin
Multimaster Replication Plugin
PAM Pass Through Auth Plugin
Referential integrity postoperation Plugin
Roles Plugin, State Change Plugin, USN Plugin
. Exposed backend transaction to plugins:
slapi_back_transaction_begin|commit|abort.
. Backend serial lock is held just before the backend transaction,
instead of at the earliest timing into the backend db plugin.
. dse: adjusting to the bepost behaviour, put betxn post hook into
"need_be_postop" clause.
. MemberOf Plugin:
+ If betxn is on, MemberOf post operations are called at the
betxn postop timing, which is aborted if the main operation
fails.
+ When betxn is on, member of operations are in the transaction
as sell as in the backend serial lock. Taking advantage of it,
memberof_lock is not held if betxn is on.
+ MemberOf fixup task uses exposed transaction APIs.
. Multimaster Replication Plugin
+ If betxn is on, Multimaster Replication bepost operations are
called at the betxn postop timing. Since betxn post callbacks
are already declared, each bepost callback is called from the
existing betxn post callbacks (see multimaster_be_betxnpostop_*).
. PAM Pass Through Auth Plugin:
+ If betxn is on, PAM Pass Through pre/post operations are called
at the betxn preop/postop timing, respectivly.
. Referential integrity postoperation Plugin
+ If betxn is on, Referential integrity post operations are called
at the betxn postop timing.
+ When betxn is on, referential integirity post operations are in
the transaction as sell as in the backend serial lock. Taking
advantage of it, referint_lock is not held if betxn is on.

. Miscellaneous
+ cos_cache.h: added '#include "ldaplog.h" and removed copied
LDAPDebug from cos.c and cos_cache.c.
+ cos_cache.c: added missing CR at the end of some error messages.
+ repl5_replica.c: removed (nscpentrydn=*) from searching tombstone
entry filter.
+ back-ldbm.h: increased RETRY_TIMES count from 50 to 1024.
+ entry.c: in addition to "true"|"false", "yes"|"no", and digits,
let slapi_entry_attr_get_bool accept "on"|"off".
+ mapping_tree.c: changed the log level of a warning "Mapping tree
node entry for "" point to an unknown backend" issued in mtn_get_be
to BACKLDBM". This message is logged at the start up time of Class
of Service plugin from the dse hook, which is benign.
}}}

The patch looks good.

You are still using bepreop for multimaster - why not betxn preop? Shouldn't those urp operations be done in the same transaction as the main database update and the betxn postops?

Why RETRY_COUNT 1024 instead of 50? Are you seeing a lot of DB_DEADLOCKs?

Replying to [comment:10 rmeggins]:

The patch looks good.

You are still using bepreop for multimaster - why not betxn preop? Shouldn't those urp operations be done in the same transaction as the main database update and the betxn postops?

First, I put them into betxn and ran into a problem, which was caused by this call in multimaster_bepreop_modify/modrdn. We should not clean up state info once transaction is started. :)
/ Clean up old state information /
purge_entry_state_information(pb);
And SLAPI_TXN_RUV_MODS_FN is set in bepreop, which is used before the transaction starts (ldbm_txn_ruv_modify_context). So, this should be done in bepreop, too.
slapi_pblock_set(pb, SLAPI_TXN_RUV_MODS_FN, (void *)replica_ruv_smods_for_op);

Regarding preop urp codes..., I thought their main task is examining the conflicts and the real fix is done in bepost, but I guess that's not true...

It looks I need to divide beporeop callbacks into 2 pieces: real bepreop and betxnpreop (urp).

Let me work on it...

Why RETRY_COUNT 1024 instead of 50? Are you seeing a lot of DB_DEADLOCKs?
At the early stage, I saw some and increased it. But once the code is settled, I don't see any deadlock reports. I think I'm going to revert it.

Thanks a lot, Rich!

In addition to the fixes following the Rich's suggestions (comment 11), the take 2 patch contains ...
. to register cos_post_op are POSTOP only to avoid the deadlock between the transaction and cos change_lock.
. to move transaction begin before "finding" an backend entry in ldbm_back_add/delete/modify/modrdn to avoid the deadlock between the serial lock and the individual entry cache lock.

the formatting in ldbm_back_modrdn() is off

So it looks like now the BE_PREOPERATION plugins are called inside of the transaction? Perhaps then we don't need BE_PREOPERATION plugins any more, and they should all be BETXN_PREOPERATION plugins instead?

Replying to [comment:14 rmeggins]:

the formatting in ldbm_back_modrdn() is off

Oops. Thanks!

So it looks like now the BE_PREOPERATION plugins are called inside of the transaction? Perhaps then we don't need BE_PREOPERATION plugins any more, and they should all be BETXN_PREOPERATION plugins instead?

Yeah, I thought about it, too. The difference is BE_PREOPERATION is called just once, but BETXN_PREOPERATION is called multiple times if libdb retry occurs. Plugin side may want to keep the difference of the behaviour?

git patch file (master) - take 2 (fixed ldbm_back_modrdn format)
0001-Trac-Ticket-351-use-betxn-plugins-by-default.3.patch

copy/paste entire sections of code inside the retry loop, just so we can do the target entry lookup inside the transaction . . . is there really no other way to do this?

Replying to [comment:16 rmeggins]:

copy/paste entire sections of code inside the retry loop, just so we can do the target entry lookup inside the transaction . . . is there really no other way to do this?

Currently, the code looks like this ...
for (repeat up to RETRY_TIMES) {
If not the first time, abort the transaction.
Start the transaction.
If it is the first time, run the ex-pre-transaction code.
Run the real in-transaction code.
If deadlock occurs, continue.
Otherwise, commit the transaction.
}

We could just re-format the code like this ...
(but basically, taking the same path...?)
Start the transaction.
Run the ex-pre-transaction code.
for (repeat up to RETRY_TIMES) {
Run the in-transaction code.
If deadlock occurs, abort the transaction, start the transaction and continue.
Otherwise, commit the transaction.
}

Replying to [comment:17 nhosoi]:

Replying to [comment:16 rmeggins]:

copy/paste entire sections of code inside the retry loop, just so we can do the target entry lookup inside the transaction . . . is there really no other way to do this?

Currently, the code looks like this ...
for (repeat up to RETRY_TIMES) {
If not the first time, abort the transaction.
Start the transaction.
If it is the first time, run the ex-pre-transaction code.
Run the real in-transaction code.
If deadlock occurs, continue.
Otherwise, commit the transaction.
}

We could just re-format the code like this ...
(but basically, taking the same path...?)
Start the transaction.
Run the ex-pre-transaction code.
for (repeat up to RETRY_TIMES) {
Run the in-transaction code.
If deadlock occurs, abort the transaction, start the transaction and continue.
Otherwise, commit the transaction.
}

Ok. I would rather leave it the way you have in your patch. Ack.

Thanks, Rich. Reliab15 revealed a bug in multimaster_be_betxnpostop_*. Debugging it now... {{{ Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f602bff7700 (LWP 26397)] 0x00007f604feaacee in csn_get_replicaid (csn=0x0) at ldap/servers/slapd/csn.c:193 193 return csn->rid; (gdb) bt #0 0x00007f604feaacee in csn_get_replicaid (csn=0x0) at ldap/servers/slapd/csn.c:193 #1 0x00007f604a29e901 in write_changelog_and_ruv (pb=0x1aa1270) at ldap/servers/plugins/replication/repl5_plugins.c:1216 #2 0x00007f604a29e3b8 in multimaster_be_betxnpostop_add (pb=0x1aa1270) at ldap/servers/plugins/replication/repl5_plugins.c:1016 #3 0x00007f604ff00941 in plugin_call_func (list=0x1adcd10, operation=560, pb=0x1aa1270, call_one=0) at ldap/servers/slapd/plugin.c:1453 #4 0x00007f604ff00801 in plugin_call_list (list=0x1aaf1d0, operation=560, pb=0x1aa1270) at ldap/servers/slapd/plugin.c:1415 #5 0x00007f604fefed0c in plugin_call_plugins (pb=0x1aa1270, whichfunction=560) at ldap/servers/slapd/plugin.c:398 #6 0x00007f604a8cd9d3 in ldbm_back_add (pb=0x1aa1270) at ldap/servers/slapd/back-ldbm/ldbm_add.c:992 #7 0x00007f604fe9ef31 in op_shared_add (pb=0x1aa1270) at ldap/servers/slapd/add.c:734 #8 0x00007f604fe9e001 in do_add (pb=0x1aa1270) at ldap/servers/slapd/add.c:258 #9 0x0000000000414740 in connection_dispatch_operation (conn=0x7f6044524e10, op=0x1a9a800, pb=0x1aa1270) at ldap/servers/slapd/connection.c:578 #10 0x000000000041611e in connection_threadmain () at ldap/servers/slapd/connection.c:2338 #11 0x00000033e7628c03 in ?? () from /usr/lib64/libnspr4.so #12 0x00000033e4607d14 in start_thread () from /usr/lib64/libpthread.so.0 #13 0x00000033e3ef167d in clone () from /usr/lib64/libc.so.6 (gdb) p csn $1 = (const CSN *) 0x0 }}}

Reviewed by Rich (Thank you sooooo much!)

Pushed to master.

$ git merge trac351-0
Updating 3c6756f..cd48bbd
Fast-forward
ldap/ldif/50replication-plugins.ldif | 1 +
ldap/ldif/template-dse.ldif.in | 18 +-
ldap/ldif/template-pampta.ldif.in | 2 +-
ldap/servers/plugins/cos/cos.c | 110 +--
ldap/servers/plugins/cos/cos_cache.c | 75 +-
ldap/servers/plugins/cos/cos_cache.h | 1 +
ldap/servers/plugins/memberof/memberof.c | 46 +-
ldap/servers/plugins/pam_passthru/pam_passthru.h | 1 +
ldap/servers/plugins/pam_passthru/pam_ptpreop.c | 190 +++--
ldap/servers/plugins/referint/referint.c | 65 +-
ldap/servers/plugins/replication/repl5.h | 11 +-
ldap/servers/plugins/replication/repl5_init.c | 178 ++++-
ldap/servers/plugins/replication/repl5_plugins.c | 155 +++-
ldap/servers/plugins/replication/repl5_replica.c | 2 +-
ldap/servers/plugins/replication/repl_extop.c | 1 +
ldap/servers/plugins/roles/roles_plugin.c | 6 +-
ldap/servers/slapd/back-ldbm/dblayer.c | 124 ++-
ldap/servers/slapd/back-ldbm/idl.c | 12 +-
ldap/servers/slapd/back-ldbm/idl_new.c | 45 +-
ldap/servers/slapd/back-ldbm/ldbm_add.c | 830 ++++++++++----------
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 629 ++++++++-------
ldap/servers/slapd/back-ldbm/ldbm_modify.c | 223 +++---
ldap/servers/slapd/back-ldbm/ldbm_modrdn.c | 1091 +++++++++++++-------------
ldap/servers/slapd/back-ldbm/ldif2ldbm.c | 22 +-
ldap/servers/slapd/back-ldbm/proto-back-ldbm.h | 18 +-
ldap/servers/slapd/backend.c | 28 +
ldap/servers/slapd/dse.c | 49 +-
ldap/servers/slapd/entry.c | 40 +-
ldap/servers/slapd/mapping_tree.c | 10 +-
ldap/servers/slapd/pblock.c | 5 +
ldap/servers/slapd/slapi-plugin.h | 34 +
31 files changed, 2303 insertions(+), 1719 deletions(-)

$ git push
Counting objects: 89, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (45/45), done.
Writing objects: 100% (45/45), 19.74 KiB, done.
Total 45 (delta 40), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
3c6756f..cd48bbd master -> master

Metadata Update from @rmeggins:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.3.0.a1

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/351

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Login to comment on this ticket.

Metadata