If a plugin operation succeeded, but the main operation itself fails and is aborted, the RUV is left in an incorrect state (rolled up to the successful plugin op).
Here is a test case: enable memberof and add two members to a group, where the second member has no objectclass allowing memberOf, so the complete operation fails with err=65.
[13/Oct/2016:14:48:21 +0200] NSMMReplicationPlugin - ruv_add_csn_inprogress: successfully inserted csn 57ff838e000000640000 into pending list   <<<< main MOD
[13/Oct/2016:14:48:21 +0200] NSMMReplicationPlugin - ruv_add_csn_inprogress: successfully inserted csn 57ff838e000100640000 into pending list   <<<< successful memberof
[13/Oct/2016:14:48:21 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 57ff838e000100640000
[13/Oct/2016:14:48:21 +0200] NSMMReplicationPlugin - ruv_add_csn_inprogress: successfully inserted csn 57ff838e000200640000 into pending list   <<< failing memberof
[13/Oct/2016:14:48:21 +0200] NSMMReplicationPlugin - csn=57ff838e000200640000 process postop: canceling operation csn
[13/Oct/2016:14:48:22 +0200] NSMMReplicationPlugin - conn=866 op=1 csn=57ff838e000000640000 process postop: canceling operation csn
[13/Oct/2016:14:48:22 +0200] NSMMReplicationPlugin - agmt="cn=to200" (localhost:4945): {replica 100 ldap://localhost:30522} 57f4cb95000000640000 57ff838e000100640000 57ff8295
nsds50ruv: {replicageneration} 57bf05e40000012c0000
nsds50ruv: {replica 100 ldap://localhost:30522} 57f4cb95000000640000 57ff838e000100640000
Looks like the pending list mechanism does not work correctly.
Looks like the behaviour was introduced by the fix for #359.
We have a real dilemma in how to roll up the pending list and decide when to update the RUV.
If we do it as it is done at the moment (after the fix for #359), the RUV gets updated even if a smaller CSN is not yet committed.
But if we do it as before, the uncommitted CSN would prevent the RUV update; however, if the parent operation is cancelled, it could leave uncommitted CSNs in the pending list and then prevent the RUV from ever being updated.
A fix has to pass the test for #359.
This can be seen when a locker is killed to resolve a deadlock. The messages that could show we are hitting this bug are:
==============================
[17/Oct/2016:10:32:07 +0000] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=5804e0db000800030000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
[17/Oct/2016:10:32:07 +0000] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (5804e0db000800030000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
[17/Oct/2016:10:32:07 +0000] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for <entry dn> (uniqid: 40d752a2-4fbb11e5-93a0d259-b00ed7d3, optype: 32) to changelog csn 5804e0db000800030000
[17/Oct/2016:10:32:07 +0000] - SLAPI_PLUGIN_BE_TXN_POST_DELETE_FN plugin returned error code but did not set SLAPI_RESULT_CODE
[17/Oct/2016:10:32:07 +0000] agmt="<agmt dn>" (<host>:389) - Can't locate CSN 5804e0dc000200060000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
=========================================
Thanks for the details, Ludwig. I agree that #359 introduced a bug.
And this proposal looks correct. https://fedorahosted.org/389/ticket/359#comment:23
Here is my summary and a proposal from an offline discussion:
{{{
Problem:
A csn pending list is a list of CSNs which have not yet been used to update the maxcsn of the RUV. There are two scenarios where it is necessary to keep a pending list and only update the RUV with a specific CSN from this list after all smaller CSNs in the list are committed.
Scenario 1: There are parallel write operations.
Assume operations 1, 2, 3 are started with csn_1 < csn_2 < csn_3. If op_3 completes first and would commit and update the RUV, the server would send op_3 to a consumer, the consumer would update its RUV with csn_3, and if op_1 and op_2 are committed later they would not be sent to the consumer because their csns are < csn_3.
NOTE: this scenario is not possible with the current use of the backend lock, but I think this very scenario was the reason to introduce the pending list.
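To make scenario 1 concrete, here is a toy sketch (Python, plain integers standing in for real CSN structures; this is an illustration, not the server's C code) of a per-replica pending list whose rollup only advances the maxcsn over a contiguous committed prefix:

```python
class PendingList:
    """Toy model of a per-replica CSN pending list (integer CSNs)."""

    def __init__(self):
        self.pending = []  # [csn, committed] pairs, kept sorted by csn
        self.maxcsn = 0    # RUV maxcsn for this replica

    def insert(self, csn):
        self.pending.append([csn, False])
        self.pending.sort()

    def commit(self, csn):
        """Mark csn committed, then roll up: advance maxcsn only over a
        contiguous committed prefix, so no smaller uncommitted csn is
        skipped."""
        for entry in self.pending:
            if entry[0] == csn:
                entry[1] = True
        while self.pending and self.pending[0][1]:
            self.maxcsn = self.pending.pop(0)[0]

# ops 1, 2, 3 start concurrently; op 3 commits first
pl = PendingList()
for csn in (1, 2, 3):
    pl.insert(csn)
pl.commit(3)
assert pl.maxcsn == 0   # RUV must not advance past uncommitted csns 1, 2
pl.commit(1)
assert pl.maxcsn == 1
pl.commit(2)
assert pl.maxcsn == 3   # the rollup now reaches csn 3
```

Without the pending list, committing op_3 first would push the maxcsn to csn_3 immediately and op_1/op_2 would never be replicated.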
Scenario 2: Internal operations inside the main operation.
An operation is started with csn_1. Plugins are called and can trigger internal updates with csn_2, ..., csn_5. All these operations are done in nested transactions. If one of the plugins or the main operation fails, the transactions are aborted and undone. None of the changes will survive and they will not be in the changelog. If any of the CSNs csn_1 ... csn_5 had updated the RUV, the RUV would be ahead of the changelog and positioning the changelog cursor would fail. Therefore the pending list should ensure that the RUV is updated only after all the csns 1..5 have successfully been committed.
There are two variants of this scenario:
Scenario 2.1: the main operation is a direct client operation, so all csns have the same replica id
Scenario 2.2: the main operation is a replicated operation, so the csn for the main operation and the internal operation have different replica ids
NOTE: There is a nightmare variant of this scenario:
Scenario 2.3: One of the plugins triggers an internal update in another backend with a different RUV and pending lists.
What is currently implemented?
We have two types of csn pending list.
1] A pending list of min csns maintained in the replica object.
This is a very specific list and only used until the min csn in the RUV for the local replicaID is set.
(I think this could be achieved by using the ruv pending list, see below)
2] a pending list in each ruv element
When a csn is created or received (either in replica_generate_next_csn() or in process_operation() in mmr preop), it is inserted into the pending list of the ruv element with the corresponding replica ID.
When an operation fails, in process_postop the csn is cancelled and removed from the pending list.
When an operation succeeds, write_changelog_and_ruv() triggers a special handling of the pending list.
It sets the state of the csn of the completed operation to committed and it rolls up the pending list to detect the highest committed csn and updates the RUV with this csn.
What are the problems with the current implementation?
All csns in the pending list are seen as csns of independent operations; the fact that csns of internal ops are tied to the main op is not reflected. If the main op is aborted, the RUV can already have been updated with a csn from an internal op, since the fix for ticket #359 makes the pending list rollup ignore uncommitted csns and select the highest committed csn to update the RUV.
The pending lists for the different RUV elements are treated independently. Even if the fix for #359 were reverted, this would handle only scenario 2.1, not 2.2.
There is an asymmetry between committing and cancelling a csn. The commit is inside the txn and backend lock, whereas the cancelling is in postop, and in some cases (urp) the ldap_error could already have been reset, so the csn will not be cancelled and remains uncommitted (this was the scenario which was fixed in #359).
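The post-#359 rollup can be modeled the same way (again a Python toy with integer CSNs, not the actual csnpl code): picking the highest committed csn, regardless of smaller uncommitted ones, reproduces the bug from the original report when the main op is later aborted.

```python
class Csnpl359:
    """Toy model of the pending-list rollup after the #359 fix: the
    rollup ignores uncommitted csns and advances the maxcsn to the
    highest committed one."""

    def __init__(self):
        self.pending = {}  # csn -> committed flag
        self.maxcsn = 0

    def insert(self, csn):
        self.pending[csn] = False

    def commit(self, csn):
        self.pending[csn] = True
        committed = [c for c, done in self.pending.items() if done]
        if committed:
            self.maxcsn = max(self.maxcsn, max(committed))

    def cancel(self, csn):
        self.pending.pop(csn, None)

# main MOD (csn 1) triggers a successful memberOf update (csn 2),
# then a failing memberOf update (csn 3) aborts the whole operation
pl = Csnpl359()
pl.insert(1)
pl.insert(2)
pl.commit(2)           # internal op committed -> maxcsn jumps to 2
pl.insert(3)
pl.cancel(3)           # failing internal op
pl.cancel(1)           # main op aborted in postop
assert pl.maxcsn == 2  # RUV rolled up to the plugin op's csn: the bug
```

This matches the log excerpt above: the RUV ends at the successful memberOf csn even though the whole operation was rolled back.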
What should be done?
Here is a proposal for redesign of pending list management.
First some terminology:
Primary csn: the csn of the external operation, direct or replicated
Secondary csn: csns of internal operations, generated inside the op with the primary csn
Full operation (need a better name): the set of operations for a primary and its secondary csns
And some conditions:
All operations for a primary csn and its secondary csns are processed in one thread
There are only one or two replica IDs used in the pending list for one replica in a full operation. In the scenario 2.3 there are several replicas affected, but in each this condition holds.
Proposal:
Maintain the primary csn in thread local data, like it is used for the agreement name (or txn stack): prim_csn.
Extend the data structure of the pending list to keep prim_csn for each inserted csn
If a csn is created or received check prim_csn: if it exists use it, if it doesn't exist set it
when inserting a csn to the pending list pass the prim_csn
when cancelling a csn, if it is the prim_csn, also cancel all secondary csns
when committing a csn,
if it is not the primary csn, just set the committed flag
if it is the prim_csn trigger the pending list rollup, stop at the first not committed csn
if the RID of the prim_csn is not the local RID also rollup the pending list for the local RID.
Does this also handle scenario 2.3, where a full operation spans more than one backend ?
No, but it could be extended: instead of only storing the prim_csn in the thread data, also store a reference to the affected replica. Cancelling or committing could then run replica_update_ruv for multiple replicas. }}}
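The proposal above can be sketched like this (a Python toy with integer CSNs; prim_csn is a plain argument here rather than real thread-local data, and the names are illustrative, not the server's):

```python
class PrimCsnPendingList:
    """Toy model of the proposed scheme: every pending csn records its
    primary csn; cancelling the primary cascades to its secondary csns,
    and only the primary's commit triggers the rollup, which stops at
    the first uncommitted csn."""

    def __init__(self):
        self.pending = []  # [csn, prim_csn, committed], sorted by csn
        self.maxcsn = 0

    def insert(self, csn, prim_csn):
        self.pending.append([csn, prim_csn, False])
        self.pending.sort()

    def cancel(self, csn):
        # cancelling the primary csn also cancels all its secondary csns
        self.pending = [e for e in self.pending
                        if e[0] != csn and e[1] != csn]

    def commit(self, csn):
        is_primary = False
        for e in self.pending:
            if e[0] == csn:
                e[2] = True
                is_primary = (e[1] == csn)
        if is_primary:
            # roll up, stopping at the first not-committed csn
            while self.pending and self.pending[0][2]:
                self.maxcsn = self.pending.pop(0)[0]

# aborted operation: internal op csn 2 commits, then main op (csn 1) fails
pl = PrimCsnPendingList()
pl.insert(1, prim_csn=1)
pl.insert(2, prim_csn=1)
pl.commit(2)            # secondary: only the committed flag is set
pl.insert(3, prim_csn=1)
pl.cancel(1)            # cancelling the primary cancels csns 2 and 3 too
assert pl.maxcsn == 0   # RUV untouched, pending list is clean

# successful operation: rollup happens when the primary commits
pl.insert(1, prim_csn=1)
pl.insert(2, prim_csn=1)
pl.insert(3, prim_csn=1)
pl.commit(2)
pl.commit(3)
pl.commit(1)
assert pl.maxcsn == 3
```

Under this scheme the aborted-operation case from the original report no longer advances the RUV, while the successful case still rolls up correctly.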
The proposal might not work in some deployments.
The basic idea is: an operation is started, direct update or replicated; the csn of this op is the primary csn. Then plugins are called and can generate mods with new csns. When the main operation is committed or aborted, the pending list is rolled up and the RUV is updated, or the pending list is cleared.
This nesting of ops and commits holds for the transaction, but the RUV update is done by the repl plugin in the txn_post call in write_changelog_and_ruv, and since replication is a plugin itself, write_changelog_and_ruv for the primary csn could be called and then another plugin could generate an additional mod.
So if the repl plugin is called after the plugins triggering internal ops, everything is fine; if not, there is no guarantee.
I have been discussing this already today with Thierry and there are a few options to handle this:
change the precedence of the repl plugin so that it will always be last
move the write_changelog_and_ruv out of the plugin call; this is possible by a mechanism already used in managing the ruv_context: register a function in the pblock and call it between plugin_call_plugins() and dblayer_txn_[commit|abort]
split the write_changelog and update_ruv: do the write_changelog still inside the txn and do the update_ruv in the postop
attachment 0001-ticket-49008-test-implementation-of-new-csn-pending-.patch
Just a minor comment about the patch: when prim_csn is set in thread private data (ruv_add_csn_inprogress line 1648), it calls
{{{ set_thread_primary_csn(csn_dup(csn)); }}}
But set_thread_primary_csn also duplicates the csn before setting it into the thread private data. The first duplicate will leak.
Good catch, it was introduced at the last moment. When I looked at the compiler warnings, one said that
{{{ set_thread_primary_csn(csn); }}} loses the const from csn, and I just thought: yes, it will be freed, so it has to be duplicated. We need to remove one of the csn_dup() calls.
Also, if setting NULL it should free the current contents of the thread data.
Replying to [comment:12 mreynolds]:
This is done in the destructor provided here: {{{ PR_NewThreadPrivateIndex (&thread_primary_csn, csnplFreeCSN); }}}
Replying to [comment:13 lkrispen]:
> Replying to [comment:12 mreynolds]:
> > Also, if setting NULL it should free the current contents of the thread data.
> This is done in the destructor provided here: {{{ PR_NewThreadPrivateIndex (&thread_primary_csn, csnplFreeCSN); }}}
Yup, my bad, thanks for confirming!
Reading again the patch it looks really great. Just minor points:
Thanks for looking into this again.
The dup of the prim_csn is just an error (I was removing compiler warnings and did the wrong thing). Adding a comment is also a good idea
In my concept a csn should only be committed when the primary csn is committed; otherwise the pending list rollup could update the RUV with a csn which can still be aborted.
I fully agree and prefer the option to commit the sub-csns in a row when the primary is committed. My remark was just about https://fedorahosted.org/389/ticket/49008#comment:7 In that update you mentioned
{{{ when committing a csn,
if it is not the primary csn, just set the committed flag }}}
That is not exactly what is implemented. But as I said, I prefer your implementation.
attachment 0001-Ticket-49008-aborted-operation-can-leave-RUV-in-inco.patch
attachment ticket49008_test.py
Test case looks good to me. And it passes with the fix.
Very nice! However, there are indentation issues in csnplCommitAll() :)
attachment 0001-Ticket-49008-Add-CI-test.patch
I've cleaned up the test case a bit. Feel free to add it to your commit or I can push it for you, if you're okay with the changes.
attachment 0001-Ticket-49008-v2-aborted-operation-can-leave-RUV-in-i.patch
Replying to [comment:20 mreynolds]:
see new patch
Yes, very nice job. Just a minor thing: csnplCommitAll and csnplRemoveAll always return 0. The code could be simplified by not testing their return value.
Thanks for the comment - you are right, but if you do not insist on it I would like to keep it as is. I did not change the logic of the callers when moving from csnplCommit/Remove to ..All; at the moment I do not see when they should return an error, but I am not sure it will stay like this. Also, for the csnplCommitAll call there are two different log messages depending on the rc; now only the "success" info msg will be logged, but I would not like to skip checking an rc and always log success.
I agree, failures of those routines (even if they do not occur now) should be caught. So I am fine with current patch (except the indentation reported by Mark). You have my ACK
attachment 0001-Ticket-49008-backport-1.3.5-aborted-operation-can-le.patch
backported patch to 1.3.5 attached
committed to master:
commit 0c7ef56
Add CI test written by Ludwig Krispenz lkrispen@redhat.com.
To ssh://git.fedorahosted.org/git/389/ds.git
   0c7ef56..360d797  master -> master
commit 360d797
Author: Simon Pichugin spichugi@redhat.com
Date: Tue Jan 17 15:32:38 2017 +0100
committed to 1.3.5:
commit 79a3dea
attachment 0001-fix-regression-of-patch-for-49008-handle-read-only-r.patch
Thanks for the patch, Ludwig!
I have one question... Is the change ok for HUB? It shares the same RID?
If it works, you have my ack.
attachment 0001-fix-for-reg-in-49008-check-if-ruv-element-exists.patch
The replicaid on a hub should also be the read_only_replicaid, but I am not entirely sure what happens when downgrading a master to a hub. An alternative patch checks if the ruv element exists; new patch attached.
Metadata Update from @tbordaz: - Issue set to the milestone: 1.2.11.33
Metadata Update from @mreynolds: - Custom field component reset (from Replication - General) - Custom field reviewstatus reset (from review?) - Issue close_status updated to: None
Adjust CI test for new MO plugin behavior
attachment 0001-Ticket-49008-Adjust-CI-test-for-new-memberOf-behavio.patch
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review
That's clever. I think I did something else to fix this but ack from me!
Metadata Update from @firstyear: - Custom field reviewstatus adjusted to ack (was: review)
Yeah I'm worried about that, as we are making MO plugin bullet proof. But it's an easy plugin to trigger errors to test other conditions. Most other plugins ignore replicated ops. So if we keep enforcing proper values for auto objectclass it could actually break some CI tests like it did to this one. But that's a discussion for the other issue.
5d5b9c6..d74ba63 master -> master
04904b6..d77e285 389-ds-base-1.3.6 -> 389-ds-base-1.3.6
Metadata Update from @mreynolds: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
Fix MO betxn CI test suite:
e1dca08..d42024e master -> master
2141ea7..6bd1b7d 389-ds-base-1.3.6 -> 389-ds-base-1.3.6
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2067
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)