In a master-master-consumer topology where the account policy plugin is enabled on all servers with "always record login time", it can happen in specific scenarios that invalid/unneeded state information accumulates on the consumer, like:
{{{
nscpentrywsi: lastLoginTime;adcsn-579b491a000100c80000;vdcsn-579b491a000100c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;adcsn-579b48ff000100c80000;vdcsn-579b48ff000100c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-579b48dd000100c80000;deletedattribute;deleted:
}}}
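To make the accumulation concrete, here is a rough sketch (not server code) that parses such nscpentrywsi lines and counts how many lastLoginTime values carry the ";deleted" state subtype versus how many are live:

```python
def count_deleted_values(nscpentrywsi_lines, attr="lastLoginTime"):
    """Return (live, deleted) counts for the given attribute."""
    live = deleted = 0
    for line in nscpentrywsi_lines:
        # each line looks like "lastLoginTime;adcsn-...;vdcsn-...;deleted: value"
        name_part = line.split(":", 1)[0]
        options = name_part.split(";")
        if options[0] != attr:
            continue
        if "deleted" in options:
            deleted += 1
        else:
            live += 1
    return live, deleted

lines = [
    "lastLoginTime;adcsn-579b491a000100c80000;vdcsn-579b491a000100c80000;deletedattribute;deleted:",
    "lastLoginTime;adcsn-579b48ff000100c80000;vdcsn-579b48ff000100c80000;deletedattribute;deleted:",
    "lastLoginTime;vdcsn-579b48dd000100c80000;deletedattribute;deleted:",
]
print(count_deleted_values(lines))  # → (0, 3): no live value, three deleted
```

In a healthy entry you would expect exactly one live value; here all values are deleted tombstones.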
This can be reproduced by the following steps:
1. Stop M1, M2, C1.
2. Start M1; bind user1 on M1, lastLoginTime will be updated with CSN time T1.
3. Stop M1. Start M2; bind user1 on M2, lastLoginTime will be updated with CSN time T2 > T1.
4. Stop M2. Start C1. Start M2 to ensure the change with T2 is replicated earlier than T1. We now have this change properly updated:
{{{
nscpentrywsi: lastLoginTime;adcsn-5799e628000000c80000;vucsn-5799e628000000c80000: 20160728110200Z
nscpentrywsi: lastLoginTime;adcsn-5799e4f0000000c80000;vdcsn-5799e4f0000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-5799d191002c00c80000;deletedattribute;deleted:
}}}
5. Now bind user1 on C1. For a read-only replica no CSN is generated; the value is replaced and the adcsn is kept:
{{{
nscpentrywsi: lastLoginTime;adcsn-5799e628000000c80000: 20160728110449Z
nscpentrywsi: lastLoginTime;adcsn-5799e4f0000000c80000;vdcsn-5799e4f0000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-5799d191002c00c80000;deletedattribute;deleted:
}}}
6. Now start M1 so that the older CSN is received, and the weirdness starts:
{{{
nscpentrywsi: lastLoginTime;adcsn-5799e628000000c80000;vdcsn-5799e628000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;adcsn-5799e4f0000000c80000;vdcsn-5799e4f0000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-5799d191002c00c80000;deletedattribute;deleted:
}}}
There is no proper lastLoginTime value any more!
7. Bind on the read-only replica again:
{{{
nscpentrywsi: lastLoginTime: 20160728110525Z
nscpentrywsi: lastLoginTime;adcsn-5799e628000000c80000;vdcsn-5799e628000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;adcsn-5799e4f0000000c80000;vdcsn-5799e4f0000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-5799d191002c00c80000;deletedattribute;deleted:
}}}
There is a new value without any CSN, and the deleted values are still there.
8. Bind to M1 again and there will be a normal value plus the deleted ones.
The core problem is that if the BIND is done on the read-only replica, lastLoginTime is updated on the replica without a CSN being assigned. So the entry is updated without entry state resolution: an existing adcsn is preserved, but the newly added value does not have a vdcsn.
When the next update for this attribute is received from a master, the outcome depends on comparing the CSN of the update to the existing adcsn. If the update's CSN is older, entry state resolution fails.
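The comparison itself is straightforward; a hedged sketch of how CSNs order (fields per the 389-ds CSN layout: 8 hex digits of timestamp, 4 of sequence number, 4 of replica ID, 4 of subsequence; since the fields are fixed-width hex, most significant first, plain string comparison gives time order):

```python
def csn_fields(csn):
    """Split a 20-hex-digit CSN into its components."""
    return {
        "time": int(csn[0:8], 16),
        "seq": int(csn[8:12], 16),
        "rid": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }

def csn_newer(a, b):
    # fixed-width hex, most significant field first: string order == time order
    return a > b

adcsn = "5799e628000000c80000"   # adcsn kept on the consumer after the local bind
update = "5799e4f0000000c80000"  # older update arriving from a master
print(csn_newer(update, adcsn))  # → False: the incoming update loses the comparison
```

In the scenario above the locally replaced value "borrowed" a newer adcsn without a matching vucsn, so this comparison is made against stale state and resolution goes wrong.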
How to fix this?
One option would be to assign a CSN to local mods using replica ID ffff; then entry state resolution should work properly. But it would have to be investigated whether the presence of CSNs with the read-only replica ID would cause problems.
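A minimal sketch of what option 1 would amount to, assuming ffff is usable as a reserved replica ID for read-only replicas (the function name and shape are illustrative, not taken from the server code):

```python
import time

READONLY_RID = 0xFFFF  # hypothetical reserved replica ID for read-only replicas

def make_local_csn(seq=0, subseq=0, now=None):
    """Stamp a local mod on a consumer with a CSN carrying rid 0xffff."""
    ts = int(now if now is not None else time.time())
    return f"{ts:08x}{seq:04x}{READONLY_RID:04x}{subseq:04x}"

csn = make_local_csn(seq=1, now=0x5799e628)
print(csn)  # → "5799e6280001ffff0000"
```

With such a CSN attached, the locally replaced value would carry its own vucsn and entry state resolution could compare it against incoming updates like any other value.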
The other option is to fix entry state resolution for this specific scenario. This would probably resolve the current problem, but would not address the general problem of updates on read-only replicas.
We have this same problem with password policy attributes. How do we handle that case?
Another option is to only support read-only replicas when using chain on bind or update: http://www.port389.org/docs/389ds/howto/howto-chainonupdate.html
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1364277
Target milestones: 1.3.5.13 and 1.2.11.33.
Pushed to 1.3.6.0 and 1.2.11.33.
I have another scenario to reproduce this, and it generates more accumulated values than the manual steps.
1. Create a two-master, one-consumer topology.
2. Enable the account policy plugin, always write last login.
3. Pick a set of users and create a binddn file for ldclt, e.g.:
{{{
cn=Abbate,ou=people,dc=example,dc=com password
cn=Abbate,ou=people,dc=example,dc=com password
cn=Abbate,ou=people,dc=example,dc=com password
cn=Abbate,ou=people,dc=example,dc=com password
cn=Abbie,ou=people,dc=example,dc=com password
cn=Abbie,ou=people,dc=example,dc=com password
cn=Abbie,ou=people,dc=example,dc=com password
cn=Abbie,ou=people,dc=example,dc=com password
cn=Abbott,ou=people,dc=example,dc=com password
cn=Abbott,ou=people,dc=example,dc=com password
cn=Abbott,ou=people,dc=example,dc=com password
cn=Abbott,ou=people,dc=example,dc=com password
}}}
List the same DN several times to increase the chance of concurrent binds with the same DN (not sure if this is really needed).
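The binddn file above can be generated with a few lines of Python (a hypothetical helper, not part of any tooling; the DN suffix and repeat count are the values used in this reproduction):

```python
def write_bindfile(path, users, password="password", repeats=4):
    """Write an ldclt binddn file: 'dn password' per line, each user repeated."""
    with open(path, "w") as f:
        for cn in users:
            dn = f"cn={cn},ou=people,dc=example,dc=com"
            # repeat each DN to raise the odds of concurrent binds on one entry
            for _ in range(repeats):
                f.write(f"{dn} {password}\n")

write_bindfile("bindfile4s.txt", ["Abbate", "Abbie", "Abbott"])
```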
4. Run an ldclt load test against all three instances:
{{{
ldclt -h localhost -p 4011 -e "bindeach,bindonly" -e randombinddnfromfile=bindfile4s.txt -n 10 -W 1 -e esearch -f "objectclass=organizationalunit"
}}}
5. After some time, stop the load and check the last login time. It will produce output like:
{{{
================================== cn=Adeney ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27802000100030000: 20160909085120Z
nscpentrywsi: lastLoginTime;adcsn-57d2752f001500020000;vdcsn-57d2752f001500020
nscpentrywsi: lastLoginTime;adcsn-57d2706f001100030000;vdcsn-57d2706f001100030
================================== cn=Aderhold ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27800000800030000: 20160909085124Z
================================== cn=Adey ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27806001f00020000;vucsn-57d27806001f00020
nscpentrywsi: lastLoginTime;adcsn-57d2776f000400020000;vdcsn-57d2776f000400020
nscpentrywsi: lastLoginTime;adcsn-57d2766b001400020000;vdcsn-57d2766b001400020
nscpentrywsi: lastLoginTime;adcsn-57d27669001200020000;vdcsn-57d27669001200020
nscpentrywsi: lastLoginTime;adcsn-57d27542002600020000;vdcsn-57d27542002600020
nscpentrywsi: lastLoginTime;adcsn-57d274ec000000020000;vdcsn-57d274ec000000020
nscpentrywsi: lastLoginTime;adcsn-57d274bf000f00020000;vdcsn-57d274bf000f00020
nscpentrywsi: lastLoginTime;adcsn-57d26f17001c00030000;vdcsn-57d26f17001c00030
================================== cn=Adhem ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27806001d00020000: 20160909085124Z
nscpentrywsi: lastLoginTime;adcsn-57d275bd001c00030000;vdcsn-57d275bd001c00030
}}}
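A rough sketch (not part of the patch) of how one might scan such a dump and report, per entry, how many deleted state values (vdcsn) have accumulated:

```python
import re

def accumulated(dump):
    """Map each cn in the dump to its count of vdcsn (deleted) state values."""
    counts, user = {}, None
    for line in dump.splitlines():
        m = re.search(r"cn=(\w+)", line)
        if "=====" in line and m:
            # a "=== cn=Name ===" header starts the next entry
            user = m.group(1)
            counts[user] = 0
        elif user is not None and ";vdcsn" in line:
            counts[user] += 1
    return counts

dump = """\
================================== cn=Adeney ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27802000100030000: 20160909085120Z
nscpentrywsi: lastLoginTime;adcsn-57d2752f001500020000;vdcsn-57d2752f001500020
================================== cn=Aderhold ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27800000800030000: 20160909085124Z
"""
print(accumulated(dump))  # → {'Adeney': 1, 'Aderhold': 0}
```

On the full output above, cn=Adey would show seven accumulated deleted values for a single attribute.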
attachment 0001-Ticket-48944-on-a-read-only-replica-invalid-state-in.patch
The attached patch is for master; the code is different in 1.2.11.
It stops the accumulation of invalid state, but does not remove existing invalid data; that would have to be cleaned up by export/import.
Replying to [comment:8 lkrispen]:
After the purge delay has passed, the entry is cleaned up.
Ludwig, your fix looks good to me.
attachment 0001-Ticket-48944-backport-1.2.11-on-a-read-only-replica-.patch
Backport to 1.2.11 attached.
Thank you, Ludwig! You have my ack.
Could you please push it to the 389-ds-base-1.2.11 branch?
Hi Ludwig,
Is it ok to close this ticket as fixed? (hint hint :)
Thanks!
Metadata Update from @nhosoi: - Issue assigned to lkrispen - Issue set to the milestone: 1.3.6.0
Metadata Update from @sramling: - Custom field reviewstatus adjusted to review (was: ack) - Issue close_status updated to: None
Adding a test script to cover this from pytest.
attachment 0001-Add-tests-for-ticket48944.patch
Fixed master1 to master2.

attachment 0001-Add-tests-for-ticket48944.patch
Thanks for the test case, it looks good and it passes. A few things we can improve, so it will be easier for others to understand what's going on.
I think it will be better to use 'as topo' here:
from lib389.topologies import topology_m2c2 as topo
Please, use 'logging' instead of writing comments:
# Checking if entries are synced across masters and cosumers
log.info('Checking if entries are synced across masters and cosumers')  # you have a typo here, BTW
Use one standard for your variables. You have plenty of formats: n1c1entry, m2nlastLogin, nm2entry, c2n3lastLogin. All these 'n1', 'n2', 'n' are really confusing. I think, when you do a search, it is okay to use just 'entry'; there is no need to make it unique. And for 'lastLoginTime' you can use something like this:
entry = topo.cs['consumer1'].search_s(userdn, ldap.SCOPE_BASE, 'objectClass=*', ['lastLoginTime'])
lastLogin_c1_4 = entry[0].lastLoginTime
Also, you can wrap these two lines into one function and remove some redundant code.
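The suggested wrapper might look something like this; the function name, the raw python-ldap result shape, and the stub are illustrative assumptions, not taken from the patch:

```python
def get_last_login(inst, userdn):
    """Return the lastLoginTime value of userdn on the given instance."""
    # 0 == ldap.SCOPE_BASE; spelled out so this sketch needs no ldap import
    entries = inst.search_s(userdn, 0, 'objectClass=*', ['lastLoginTime'])
    return entries[0][1]['lastLoginTime'][0]

# quick self-check with a stub standing in for a live DirSrv connection
class StubConn:
    def search_s(self, dn, scope, flt, attrs):
        return [(dn, {'lastLoginTime': [b'20160728110525Z']})]

print(get_last_login(StubConn(), 'cn=Abbate,ou=people,dc=example,dc=com'))
# → b'20160728110525Z'
```

Each check in the test then becomes a single call per instance instead of two repeated lines.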
> Thanks for the test case, it looks good and it passes. A few things we can improve, so it will be easier for others to understand what's going on.

Sure.

> I think it will be better 'as topo' here: from lib389.topologies import topology_m2c2 as topo

This is how other replication tests are written, so I thought I would stick to that format. In fact, I initially wrote it as topo.

> Please, use 'logging' instead of writing comments: log.info('Checking if entries are synced across masters and cosumers') # you have a typo here, BTW

Sure, will fix that.

> Use one standard for your variables. You have plenty of formats: n1c1entry, m2nlastLogin, nm2entry, c2n3lastLogin. All these 'n1', 'n2', 'n' are really confusing. I think, when you do a search, it is okay to use just 'entry', there is no need to make it unique. And for 'lastLoginTime' you can use something like this: entry = topo.cs['consumer1'].search_s(userdn, ldap.SCOPE_BASE, 'objectClass=*', ['lastLoginTime']) lastLogin_c1_4 = entry[0].lastLoginTime

Sure, I will make changes.

> Also, you can wrap these two lines into one function and remove some redundant code.

Sure, will do.
@spichugi , I have incorporated all your comments.
attachment 0001-Add-tests-for-ticket48944.patch
Metadata Update from @spichugi: - Custom field reviewstatus adjusted to ack (was: review)
3cf8563..e1dca08  master -> master

commit e1dca08
Author: Sankar Ramalingam sramling@redhat.com
Date: Thu Jun 8 20:36:59 2017 +0530
Metadata Update from @spichugi: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2003
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)