There is a scenario where the added keep alive entry is in the database but is missing from the changelog. When this entry is updated the replication of the MOD fails because the ADD of the entry is not replicated.
It looks like the changelog is recreated just after adding the keep alive entry
[29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - Need to create replication keep alive entry <cn=repl keep alive 3,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com> [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - add dn: cn=repl keep alive 3,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com objectclass: top objectclass: ldapsubentry objectclass: extensibleObject cn: repl keep alive 3 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - ruv_add_csn_inprogress: successfully inserted csn 56ab4413000100030000 into pending list [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5GetDBFileByReplicaName: no DB object found for database /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d_56ab4413000000030000.db [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: semaphore /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d.sema [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: maxConcurrentWrites=2 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5GetEntryCount: 0 changes for replica 2abae513-c67611e5-b4f3d71a-a8c5143d [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5AddDBFile: Added new DB object 55a280d8fb30 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5DBOpenFileByReplicaName: created new DB object 55a280d8fb30 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5GetDBFileByReplicaName: found DB object 55a280d8fb30 for database /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d_56ab4413000000030000.db [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 56ab4413000100030000 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - conn=0 op=0 repl="dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com": Released replica held by locking_purl=conn=5 id=5 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - replica_enable_replication: replica dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com is relinquished [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - agmt="cn=meTovm-058-073.abc.idm.lab.eng.brq.redhat.com" (vm-058-073:389): No linger to cancel on the connection [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - agmt="cn=meTovm-058-073.abc.idm.lab.eng.brq.redhat.com" (vm-058-073:389): Disconnected from the consumer [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - agmt="cn=meTovm-058-073.abc.idm.lab.eng.brq.redhat.com" (vm-058-073:389): State: start -> ready_to_acquire_replica [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - agmt="cn=meTovm-058-073.abc.idm.lab.eng.brq.redhat.com" (vm-058-073:389): Trying non-secure slapi_ldap_init_ext [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - agmt="cn=meTovm-058-073.abc.idm.lab.eng.brq.redhat.com" (vm-058-073:389): binddn = , passwd = [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - agmt="cn=meTovm-058-073.abc.idm.lab.eng.brq.redhat.com" (vm-058-073:389): Unable to acquire replica: permission denied. The bind dn "" does not have permission to supply replication updates to the replica. Will retry later. [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - agmt="cn=meTovm-058-073.abc.idm.lab.eng.brq.redhat.com" (vm-058-073:389): Beginning linger on the connection [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - agmt="cn=meTovm-058-073.abc.idm.lab.eng.brq.redhat.com" (vm-058-073:389): State: ready_to_acquire_replica -> start_backoff [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5GetDBFile: found DB object 55a280d8fb30 for database /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d_56ab4413000000030000.db [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5DBDeleteFile: removed DB object 55a280d8fb30 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5DBCloseFile: Closing database /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d_56ab4413000000030000.db [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5DBCloseFile: Closed the changelog database handle for /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d_56ab4413000000030000.db (rc: 0) [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5DBCloseFile: removing the changelog /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d_56ab4413000000030000.db (flag 256) [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5DBCloseFile: Deleted the changelog database file /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d_56ab4413000000030000.db [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5GetDBFileByReplicaName: no DB object found for database /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d_56ab4397000000040000.db [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: semaphore /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-a8c5143d.sema [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: maxConcurrentWrites=2 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5GetEntryCount: 0 changes for replica 2abae513-c67611e5-b4f3d71a-a8c5143d [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5AddDBFile: Added new DB object 7f02342026d0 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5DBOpenFileByReplicaName: created new DB object 7f02342026d0 [29/Jan/2016:11:50:59 +0100] NSMMReplicationPlugin - changelog program - _cl5GetDBFileByReplicaName: found DB object 7f02342026d0 for database /var/lib/dirsrv/slapd-ABC-IDM-LAB-ENG-BRQ-REDHAT-COM/cldb/2abae513-c67611e5-b4f3d71a-
On F23 freeipa 4.3 / 389-ds 1.3.4.6-1 a Test case is:
{{{ Master: ipa-server-install --hostname=<VM-MASTER-FQDN> -n <domain> -r <realm> -p Secret123 -a Secret123 -U
Replica: edit /usr/share/dirsrv/data/template-dse.ldif to set nsslapd-errorlog-level: 8192 ipa-client-install --server=<VM-MASTER-FQDN> --domain=<domain> --realm=<realm> -p admin -w Secret123 -U echo Secret123 | kinit admin ipa-replica-install
}}}
attachment 0001-Tentative-fix-for-keep-alive-ADD-missing.patch
This fix prevent the missing of 'ADD keep alive' issue, but need more thinking about other possible impact to not create a keep alive entry when the replica is enabled
To verify that this fix still allows the creation (on need) of the keep alive entry. We can run the following test case:
{{{
ldapsearch -LLL -D "cn=directory manager" -w Secret123 -s one -b "<suffix>" "(objectclass=ldapsubentry)"
ipa user-add --first=t --last=b tb1 --password <password Secret123>
CPT=0 while [ $CPT -ne 100 ] do # do this update 'krblastsuccessfulauth' that is not replicated # so after more than 100 update, it will create the consumer keep alive entry # echo Secret123 | kinit tb1 CPT=CPT+1 done
The fix was successfully tested during IPA tests with a large number of instances (31). All (30) the instances having the fix, successfully created the keep alive entry and replicated it. The instance without the fix, cleared the ADD operation of the keep alive during CL recreation and it was the only keep alive entry that was missing.
attachment 0001-Ticket-48445-keep-alive-entries-can-break-replicatio.patch
git push origin master Counting objects: 13, done. Delta compression using up to 4 threads. Compressing objects: 100% (7/7), done. Writing objects: 100% (7/7), 1.04 KiB, done. Total 7 (delta 5), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git d7b598d..71a891f master -> master
commit 71a891f Author: Thierry Bordaz tbordaz@redhat.com Date: Wed Feb 10 15:17:02 2016 +0100
On behalf of Thierry,
Pushed to 389-ds-base-1.3.4: bd254a2..02af085 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 02af085
Pushed to 389-ds-base-1.3.3: 1cfd332..3bbcf20 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit 3bbcf20
Pushed to 389-ds-base-1.2.11: e122758..70e6c01 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 70e6c01
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1307151
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1307152
Just to
Metadata Update from @tbordaz: - Issue assigned to tbordaz - Issue set to the milestone: 1.2.11.33
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1767
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Login to comment on this ticket.