Here is the test case I used:
Setup: 2 masters M1 and M2; pause RA M1->M2 and RA M2->M1.
On M1:
 - delete an entry (e.g. cn=user1,cn=staged users,dc=example,dc=com)
 - mod a test entry (e.g. cn=test1,dc=example,dc=com)
 - sleep 1s so that delete.csn and modrdn.csn are different
On M2:
 - modrdn the entry
 - mod a test entry (e.g. cn=test2,dc=example,dc=com)
Resume RA M1->M2 and RA M2->M1, then:
 - check that replication is working
 - check the status of the DEL/MODRDN entry on both servers
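For reference, the modrdn step on M2 can be expressed as an LDIF changerecord (the newrdn value `cn=user1_modrdn` is hypothetical; the test script picks its own values):

```ldif
dn: cn=user1,cn=staged users,dc=example,dc=com
changetype: modrdn
newrdn: cn=user1_modrdn
deleteoldrdn: 0
```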
A good point is that I was not able to reproduce a replication failure, so replication is not broken (the mods on the test entries are always successfully replicated). I ran the following tests on the master branch to check the final state of the tombstone on both servers:
rename (same rdn + delold=0 + same superior): the entry is identical on both servers

rename (same rdn + delold=1 + same superior): the entry is identical on both servers

rename (change rdn (new_account1 -> new_account1_modrdn) + delold=0 + same superior): the entry differs
 - M1[dn] = nsuniqueid=1708c18c-c56711e3-a07accf0-3a563faf,cn=new_account1,cn=staged user,dc=example,dc=com
 - M2[dn] = nsuniqueid=1708c18c-c56711e3-a07accf0-3a563faf,cn=new_account1_modrdn,cn=staged user,dc=example,dc=com
 - M2[cn] = new_account1_modrdn

rename (change rdn (new_account2 -> new_account2_modrdn) + delold=1 + same superior): the entry differs
 - M1[dn] = nsuniqueid=1708c18d-c56711e3-a07accf0-3a563faf,cn=new_account2,cn=staged user,dc=example,dc=com
 - M2[dn] = nsuniqueid=1708c18d-c56711e3-a07accf0-3a563faf,cn=new_account2_modrdn,cn=staged user,dc=example,dc=com
 - M1[cn] = new_account2
 - M2[cn] = new_account2_modrdn

rename (same rdn + delold=0 + new superior): the entry differs
 - M1[dn] = nsuniqueid=1708c190-c56711e3-a07accf0-3a563faf,cn=new_account5,cn=staged user,dc=example,dc=com
 - M2[dn] = nsuniqueid=1708c190-c56711e3-a07accf0-3a563faf,cn=new_account5,cn=accounts,dc=example,dc=com
 - M1[nsParentUniqueId] = 1708c189-c56711e3-a07accf0-3a563faf
 - M2[nsParentUniqueId] = 1708c18a-c56711e3-a07accf0-3a563faf

rename (same rdn + delold=1 + new superior): the entry differs
 - M1[dn] = nsuniqueid=1708c191-c56711e3-a07accf0-3a563faf,cn=new_account6,cn=staged user,dc=example,dc=com
 - M2[dn] = nsuniqueid=1708c191-c56711e3-a07accf0-3a563faf,cn=new_account6,cn=accounts,dc=example,dc=com
 - M1[nsParentUniqueId] = 1708c189-c56711e3-a07accf0-3a563faf
 - M2[nsParentUniqueId] = 1708c18a-c56711e3-a07accf0-3a563faf

rename (change rdn (new_account7 -> new_account7_modrdn) + delold=0 + new superior): the entry differs
 - M1[dn] = nsuniqueid=1708c192-c56711e3-a07accf0-3a563faf,cn=new_account7,cn=staged user,dc=example,dc=com
 - M2[dn] = nsuniqueid=1708c192-c56711e3-a07accf0-3a563faf,cn=new_account7_modrdn,cn=accounts,dc=example,dc=com
 - M1[nsParentUniqueId] = 1708c189-c56711e3-a07accf0-3a563faf
 - M2[cn] = new_account7_modrdn
 - M2[nsParentUniqueId] = 1708c18a-c56711e3-a07accf0-3a563faf
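The per-server results above boil down to comparing the same tombstone as fetched from each master, attribute by attribute. A minimal sketch of that comparison (the entry dictionaries are illustrative, mirroring the new_account7 case; the real test uses lib389 search results):

```python
# Sketch: compare one entry as seen on M1 and on M2 and report the
# attributes whose values differ (illustrative data, new_account7 case).
def diff_entries(ent_m1, ent_m2):
    """Return {attr: (m1_value, m2_value)} for attributes that differ."""
    diffs = {}
    for attr in sorted(set(ent_m1) | set(ent_m2)):
        if ent_m1.get(attr) != ent_m2.get(attr):
            diffs[attr] = (ent_m1.get(attr), ent_m2.get(attr))
    return diffs

ent_m1 = {
    "dn": "nsuniqueid=1708c192-c56711e3-a07accf0-3a563faf,"
          "cn=new_account7,cn=staged user,dc=example,dc=com",
    "cn": "new_account7",
    "nsParentUniqueId": "1708c189-c56711e3-a07accf0-3a563faf",
}
ent_m2 = {
    "dn": "nsuniqueid=1708c192-c56711e3-a07accf0-3a563faf,"
          "cn=new_account7_modrdn,cn=accounts,dc=example,dc=com",
    "cn": "new_account7_modrdn",
    "nsParentUniqueId": "1708c18a-c56711e3-a07accf0-3a563faf",
}

for attr, (v1, v2) in diff_entries(ent_m1, ent_m2).items():
    print(f"{attr}: M1={v1!r} M2={v2!r}")
```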
attachment ticket47783_test.py
no cloning, upstream tests already written.
Hi Thierry, I tried to run the reproducer ticket47783_test.py, but so far no luck.
First, it failed because a constant was not found, so I removed the line:
{{{
20d19
< from constants import
}}}
Then, it fails to create an instance at line 180:
{{{
179     # Create the instances
180     master1.create()
}}}
This is the last part of the output from the test script:
{{{
/home/nhosoi/.dirsrv/dirsrv-
/home/nhosoi/install/etc/sysconfig/dirsrv-*
Adding group dirsrv
Traceback (most recent call last):
  File "/export/src/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 1163, in <module>
    run_isolated()
  File "/export/src/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 1143, in run_isolated
    topo = topology(True)
  File "/export/src/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 180, in topology
    master1.create()
  File "/export/src/389tests/lib389/lib389/__init__.py", line 793, in create
    self._createDirsrv(verbose=self.verbose)
  File "/export/src/389tests/lib389/lib389/__init__.py", line 747, in _createDirsrv
    DirSrvTools.lib389User(user=DEFAULT_USER)
  File "/export/src/389tests/lib389/lib389/tools.py", line 860, in lib389User
    DirSrvTools.makeGroup(group=user)
  File "/export/src/389tests/lib389/lib389/tools.py", line 841, in makeGroup
    subprocess.Popen(cmd)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
}}}
Could you please tell me what is wrong with my attempt to run the script? I guess I must be missing something in the procedure... Please note that I'm using the master branch of lib389, which is up to date.
Hi Noriko,
The test case is running fine on my laptop, even with lib389 master up to date.
When running as 'root', it starts instances belonging to the default user/group 'dirsrv'. When running as a regular user (e.g. xyz), instances will belong to the user/group 'xyz', but it still checks/creates the user/group 'dirsrv' (although it is not useful in that case).
On my system I created that user/group (dirsrv/dirsrv), and I think that is why it succeeded. Could you check whether that user and group exist on your machine? If not, could you create them and rerun the test?
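For reference, a small sketch to check for the expected user/group and show what to create if they are missing (the `groupadd`/`useradd` commands themselves require root):

```shell
# Check whether the 'dirsrv' user and group exist (lib389 expects them
# when running as root); print what to create if they are missing.
if getent group dirsrv >/dev/null 2>&1; then
    group_status="present"
else
    group_status="missing (create with: groupadd -r dirsrv)"
fi
if getent passwd dirsrv >/dev/null 2>&1; then
    user_status="present"
else
    user_status="missing (create with: useradd -r -g dirsrv dirsrv)"
fi
echo "group dirsrv: $group_status"
echo "user dirsrv: $user_status"
```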
Replying to [comment:4 tbordaz]:
I just made an update to lib389 to use 'dirsrv/dirsrv' when running lib389 as 'root'. You do not need to create this user/group; lib389 will do it for you if it does not already exist. Noriko, make sure you do a 'git pull' on your lib389 source.
Replying to [comment:6 mreynolds]:
Hi Mark,
Yes, lib389 will create the user/group... on the condition that it is run as root. On a brand new machine, a regular user running a test case will not be allowed to create the dirsrv/dirsrv user/group.
Thank you, Thierry. That was it. I created a user dirsrv (I already had a group dirsrv), and the test started running!
Now I'm getting this assertion failure:
{{{
Update succeeded: status 0
Total update succeeded
Traceback (most recent call last):
  File "/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 1163, in <module>
    run_isolated()
  File "/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 1148, in run_isolated
    test_ticket47783_2(topo)
  File "/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 609, in test_ticket47783_2
    _status_entry_both_server(topology, name=name, desc="chg rdn + delold=0 + same superior", debug=DEBUG_FLAG)
  File "/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 297, in _status_entry_both_server
    assert ent_m1.dn == ent_m2.dn
AssertionError
}}}
Does this assertion failure mean I could reproduce the bug? Thanks!
It is a URP issue (or issues :).
Master2
{{{
[18/Aug/2015:16:14:36 -0700] conn=3 op=25 MODRDN dn="cn=new_account1,cn=staged user,dc=example,dc=com" newrdn="cn=new_account1_modrdn" newsuperior="(null)"
[18/Aug/2015:16:14:36 -0700] conn=3 op=25 RESULT err=0 tag=109 nentries=0 etime=0 csn=55d3bc5d000000020000
...
[18/Aug/2015:16:14:37 -0700] conn=6 op=3 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[18/Aug/2015:16:14:37 -0700] conn=6 op=3 RESULT err=0 tag=120 nentries=0 etime=0
[18/Aug/2015:16:14:37 -0700] conn=6 op=4 DEL dn="cn=new_account1,cn=staged user,dc=example,dc=com"   <=== REPLICATED OP
[18/Aug/2015:16:14:37 -0700] conn=6 op=4 RESULT err=0 tag=107 nentries=0 etime=0 csn=55d3bc5d000000010000
}}}
Master1
{{{
[18/Aug/2015:16:14:36 -0700] conn=3 op=42 DEL dn="cn=new_account1,cn=staged user,dc=example,dc=com"
[18/Aug/2015:16:14:36 -0700] conn=3 op=42 RESULT err=0 tag=107 nentries=0 etime=0 csn=55d3bc5d000000010000
...
[18/Aug/2015:16:14:39 -0700] conn=6 op=5 MODRDN dn="cn=new_account1,cn=staged user,dc=example,dc=com" newrdn="cn=new_account1_modrdn" newsuperior="(null)"   <=== REPLICATED OP
[18/Aug/2015:16:14:39 -0700] conn=6 op=5 RESULT err=0 tag=109 nentries=0 etime=0 csn=55d3bc5d000000020000
}}}
In the URP code, a conflicting modrdn is skipped if the target entry is already deleted:
{{{
268 /*
269  * Return 0 for OK, -1 for Error, >0 for action code
270  * Action Code Bit 0: Fetch existing entry.
271  * Action Code Bit 1: Fetch parent entry.
272  */
273 int
274 urp_modrdn_operation( Slapi_PBlock *pb )
275 {
...
377     slapi_log_error(SLAPI_LOG_FATAL, sessionid,
378         "urp_modrdn (%s): target entry is a tombstone.\n",
379         slapi_entry_get_dn_const(target_entry));
380     rc = SLAPI_PLUGIN_NOOP; /* Ignore the modrdn */
}}}
But in this test case, the delete on M1 is done prior to the modrdn on M2. In that case, should the delete on M1 win? On M2, should URP undo the modrdn, then apply the delete? Probably the answer is yes. On Master2, conn=6 op=4 DEL dn="cn=new_account1..." deletes the entry even though the DN is different, since a replicated op can use the nsuniqueid to identify the entry. Probably, we have to compare the DNs and adjust to the one which wins...
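The CSNs in the access logs encode the ordering directly. Assuming the standard 389-ds CSN string layout (20 hex digits: 8 of timestamp, 4 of sequence number, 4 of replica id, 4 of subsequence number), a minimal sketch of parsing and comparing the two CSNs from the logs:

```python
# Sketch: parse a 389-ds CSN string into its fields and compare two CSNs.
# Assumed layout (20 hex chars): timestamp(8) seqnum(4) replica-id(4) subseq(4).
from collections import namedtuple

CSN = namedtuple("CSN", "timestamp seqnum rid subseq")

def parse_csn(s):
    assert len(s) == 20, "a CSN string is 20 hex digits"
    return CSN(int(s[0:8], 16), int(s[8:12], 16),
               int(s[12:16], 16), int(s[16:20], 16))

del_csn = parse_csn("55d3bc5d000000010000")     # DEL issued on M1 (rid 1)
modrdn_csn = parse_csn("55d3bc5d000000020000")  # MODRDN issued on M2 (rid 2)

# Same timestamp and seqnum here; the tie is broken by the replica id,
# so the DEL orders before the MODRDN (namedtuples compare field-wise).
print(del_csn)
print(modrdn_csn)
print(del_csn < modrdn_csn)
```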
But the issue could be larger than that. If multiple modifications are done on each master separately and then replication is resumed, what should we do?
For instance,
{{{
M1                   M2
----------+----------
add entry -> replicated
     (replication paused)
add attr1 val1
                     add attr1 val1
                     mod attr1 val1'
                     del entry
mod attr1 val1"
add attr2 val2
rename to newentry
     (replication resumed)
----------+----------
}}}
In this case, what is the correct tombstone to be created on the masters?
{{{
dn: entry     <== original, not newentry, since it was renamed after the deletion?
attr1: val1'  <== last value before the deletion?
(no attr2, since it was added on M1 after the deletion on M2?)
}}}
If replication is always enabled, then mod attr1: val1" and the rest won't be accepted since the entry is already deleted. But if replication is resumed once all of the operations are done on each master, the order of the replay is not fixed. This would require the URP code to traverse the history (maybe using the timestamps in the CSNs?) and adjust the current operation?
IMHO, the order of replay is fixed and follows the CSN order (possibly down to the subseq number). In the described test case, on M2 'attr1: val1"' and 'attr2: val2' are applied on the tombstone, then the modrdn should trigger a 'rename' of the tombstone. IMHO this looks a bit complex to implement, and I do not see much benefit, as the entry is now a tombstone.
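The fixed replay order can be illustrated with a small sketch: tag each pending operation with its CSN (the tuple values below are hypothetical, with replica id 1 for M1 and 2 for M2, mirroring the scenario above) and sort. Every master derives the same canonical order, regardless of when replication resumes:

```python
# Sketch: replay order is determined by sorting operations by CSN.
# CSNs are hypothetical (timestamp, seqnum, replica_id, subseq) tuples;
# replica id 1 = M1, 2 = M2, mirroring the paused-replication scenario.
pending_ops = [
    ((100, 0, 2, 0), "M2: add attr1 val1"),
    ((101, 0, 2, 0), "M2: mod attr1 val1'"),
    ((102, 0, 2, 0), "M2: del entry"),
    ((103, 0, 1, 0), 'M1: mod attr1 val1"'),
    ((104, 0, 1, 0), "M1: add attr2 val2"),
    ((105, 0, 1, 0), "M1: rename to newentry"),
    ((100, 0, 1, 0), "M1: add attr1 val1"),
]

# Whatever order the ops arrive in, sorting by CSN yields one canonical
# replay order; ties on timestamp/seqnum are broken by replica id. Note
# that the del orders before M1's later mods, which therefore land on
# the tombstone, as described above.
for csn, op in sorted(pending_ops):
    print(op)
```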
On M1, 'attr1: val1' and 'attr1: val1'' are applied on the renamed entry, then the renamed entry is deleted.
That is right, the entries diverge on the two servers. But considering that it is a corner case and there is no benefit (the entry is deleted), I think it is a minor issue and it could be risky to fix.
Thank you for the input, Thierry. I'm pushing this to 1.3.6 backlog.
Metadata Update from @tbordaz: - Issue set to the milestone: 1.3.6 backlog
Metadata Update from @mreynolds: - Custom field rhbz reset (from 0) - Issue set to the milestone: 1.3.7 backlog (was: 1.3.6 backlog)
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to None - Issue set to the milestone: 1.4 backlog (was: 1.3.7 backlog)
Metadata Update from @mreynolds: - Issue set to the milestone: 1.4.4 (was: 1.4 backlog)
389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in 389-ds-base's GitHub repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1115
If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the subscribe button.
Thank you for your understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix - Issue status updated to: Closed (was: Open)