Tested in the context of IPA.
I installed two masters. On one master I disabled the replication agreement, then did a full re-init. The re-initialization never completes.
To reproduce:
389-ds-base-1.3.0.2-1.fc18.x86_64
The context of this is backup and restore in a replicated environment. I'm looking for a way to pause replication when doing a restore, so that all non-restored masters can get re-initialized from the restored master.
Anything interesting in the errors log on the supplier or consumer?
Sorry, I've already removed the VMs.
My reason for wanting to do it this way is in the context of restoring a master to a known good state. I want to disable replication on all masters, restore one of them, then re-initialize all the other masters against the restored one.
I can enable replication again, then immediately do a re-initialize operation, but I have the feeling there is a small window where the replication plugin could start sending out updates before re-init happens.
My initialization is failing, not hanging, and it looks like this behavior is by design:
[27/Mar/2013:16:21:30 -0400] NSMMReplicationPlugin - Total update aborted: Replication agreement for "agmt="cn=to master 2" (localhost:16483)" can not be updated while the replica is disabled
Could this failure be the reason "ipa-replica-manage" is hanging? How does it detect whether the initialization worked or failed?
I guess this comes down to what it means to "disable" a replication agreement. Should a disabled agreement still block a total update? It's tough to say, but I think it's a valid request, and it would make disabling replication agreements more robust.
Well, I need to do some more investigation on this, but let me know more about how IPA detects the initialization status. Did it really hang, or was the script waiting on a response that never came because of the error above?
Ok, so it looks like we need a new option. A disabled agmt is a disabled agmt, period. There's just too much going on, or not going on, to allow for just total updates.
So what you really need is a start/stop replication agmt feature. This means it would allow updates (incremental/total), but it wouldn't send any updates out. Is that something that would work for you?
Replying to [comment:6 mreynolds]:
> Ok, so it looks like we need a new option. A disabled agmt is a disabled agmt, period. There's just too much going on, or not going on, to allow for just total updates. So what you really need is a start/stop replication agmt feature. This means it would allow updates (incremental/total), but it wouldn't send any updates out. Is that something that would work for you?
What if we had an attribute like nsds5ReplDisable: incoming/outgoing/both?
For the hang, yes, we check the replication status and apparently it never errored out (I didn't look at the response code). IPA considered it as still updating.
I'm actually fine having to re-enable replication when I do a re-initialization, it just feels like there would be a window where the db could send out updates in the current two-step process I use (enable, then do a re-init).
Would doing everything in the same update make a difference? Or is there a way I should wipe the changelog/database, then enable replication, then re-init?
Replying to [comment:8 rcritten]:
> For the hang, yes, we check the replication status and apparently it never errored out (I didn't look at the response code).
When you say check the replication status what are you referring to? nsds5replicaLastUpdateStatus?
> IPA considered it as still updating. I'm actually fine having to re-enable replication when I do a re-initialization, it just feels like there would be a window where the db could send out updates in the current two-step process I use (enable, then do a re-init).
But even if updates do go out, that replica would get reinitialized anyway. So does it really matter in the end?
> Would doing everything in the same update make a difference?
It should reduce the possibility of updates going out.
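For illustration, "doing everything in the same update" could be a single LDIF modify record that re-enables the agreement and requests a total update together. This is a sketch: the helper name `enable_and_reinit_ldif` is hypothetical, the DN is the test agreement used later in this ticket, and whether the server applies both changes before any incremental update can go out is an assumption, not something confirmed here.

```python
# Hypothetical helper: builds ONE LDIF modify record that both re-enables a
# replication agreement and requests a total update (re-init). Whether the
# server processes the two mods before any incremental update is sent is an
# assumption, not a documented guarantee.

def enable_and_reinit_ldif(agreement_dn: str) -> str:
    """Return LDIF for a single modify setting nsds5ReplicaEnabled and
    nsds5BeginReplicaRefresh in the same operation."""
    lines = [
        f"dn: {agreement_dn}",
        "changetype: modify",
        "replace: nsds5ReplicaEnabled",
        "nsds5ReplicaEnabled: on",
        "-",  # each mod-spec in an LDIF change record ends with "-"
        "replace: nsds5BeginReplicaRefresh",
        "nsds5BeginReplicaRefresh: start",
        "-",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Raw string so the escaped DN characters (\3D, \2C) are kept literally.
    dn = r"cn=MARK,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config"
    print(enable_and_reinit_ldif(dn), end="")
```

The output could be piped to ldapmodify in the same way as the ldapmodify commands in this ticket.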
> Or is there a way I should wipe the changelog/database, then enable replication, then re-init?
If you remove the changelog, that master replica would need to be reinitialized as well. I'm not sure I completely understand this sequence.
The IPA code checks for nsds5BeginReplicaRefresh. If it has no value yet, we print "Update in progress". I saw this status for over 5 minutes, even though re-init generally takes 4 seconds. I didn't look at nsds5replicaUpdateInProgress or nsds5ReplicaLastInitStatus.
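Whichever attribute is checked, a status that never changes turns into an endless "Update in progress". A minimal sketch of bounding the wait, assuming a caller-supplied `check()` that returns None while the refresh is still pending and a result once it has finished (the LDAP read itself is not shown, and `poll_until` is a hypothetical helper, not an IPA function):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(check: Callable[[], Optional[T]],
               timeout: float,
               interval: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call check() until it returns something other than None.

    Raises TimeoutError once `timeout` seconds have elapsed, so a status
    that never changes (e.g. a refresh attribute stuck at "start") is
    reported as an error instead of being polled forever.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(interval)
```

Since a re-init normally finishes in seconds, a timeout of a few minutes would turn the 5-minute wait described above into an explicit failure. The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.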
What I'm afraid of is this: I have masters A and B. Let's say there have been a lot of recent changes, and then I restore A. I don't want B to "catch up" A with those changes.
To do this, I disable all replication agreements. I then want to re-initialize B from A, but to do that I need to re-enable the agreement and then re-init. During that short period, I don't want any changes on B to flow to A.
Ok, as for the IPA script and the disabled agmt, I do see nsds5BeginReplicaRefresh is set to "start" even though it failed:
nsds5ReplicaEnabled: off
nsds5replicaLastUpdateStatus: 0 Replica acquired successfully: agreement disabled
nsds5BeginReplicaRefresh: start
nsds5replicaLastInitStatus: 12 Total update aborted: Replication agreement for agmt="cn=MARK" (localhost:22222) can not be updated while the replica is disabled. (If the suffix is disabled you must enable it then restart the server for replication to take place).
Are you saying nsds5BeginReplicaRefresh is not even present?
As for master B sending updates to master A before it can be initialized, you can always disable the agmt on B that points to A. Keep the repl agmt on A that points to B enabled, and initialize. Then on B re-enable the agmt to A.
Would this work for you?
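The suggested sequence, written out as ordered LDIF changes per server. This is a sketch: `replace_ldif` and `reinit_plan` are hypothetical helpers, the agreement DNs are placeholders supplied by the caller, and the caller is assumed to wait for the total update in step 2 to finish before applying step 3.

```python
from typing import List, Tuple


def replace_ldif(dn: str, attr: str, value: str) -> str:
    """One LDIF modify record replacing a single attribute value."""
    return f"dn: {dn}\nchangetype: modify\nreplace: {attr}\n{attr}: {value}\n-\n"


def reinit_plan(b_to_a_dn: str, a_to_b_dn: str) -> List[Tuple[str, str]]:
    """Ordered (server, LDIF) steps for re-initializing B from restored A.

    1. On B: disable the B->A agreement so B cannot push changes to A.
    2. On A: request a total update over the still-enabled A->B agreement.
    3. On B: re-enable B->A, only after step 2 has completed.
    """
    return [
        ("B", replace_ldif(b_to_a_dn, "nsds5ReplicaEnabled", "off")),
        ("A", replace_ldif(a_to_b_dn, "nsds5BeginReplicaRefresh", "start")),
        ("B", replace_ldif(b_to_a_dn, "nsds5ReplicaEnabled", "on")),
    ]
```

Keeping the A->B agreement enabled throughout means step 2 never hits the "agreement disabled" error, while B has no enabled path back to A until it has been reinitialized.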
I've made some code changes that alter this behavior. Now, when you try to initialize a disabled agmt, you get an error (previously it returned success):
ldapmodify -D cn=dm -w password
dn: cn=MARK,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: modify
replace: nsds5beginreplicaRefresh
nsds5beginreplicaRefresh: start

modifying entry "cn=MARK,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config"
ldap_modify: Server is unwilling to perform (53)
	additional info: Replication agreement is disabled

[root@localhost ldif]# ldapsearch -xLLL -D cn=dm -w password -b "cn=MARK,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config" objectclass=top nsds5beginreplicaRefresh
dn: cn=MARK,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds5beginreplicaRefresh: Total update aborted
I could also simply remove the attribute "nsds5beginreplicaRefresh".
The real question is, what can we do to make the IPA script detect the error?
attachment 0001-Ticket-47304-reinitialization-of-a-master-with-a-dis.patch
Here is the new behavior:
Error 53 is returned to the client (along with error text), and the nsds5BeginReplicaRefresh attribute is removed from the replication agreement. Previously, the attribute was never removed and stayed set to "start".
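With this behavior, a client such as the IPA script can treat a missing nsds5BeginReplicaRefresh as "finished" and then read nsds5ReplicaLastInitStatus. A minimal sketch, assuming the agreement's attributes have already been fetched into a dict; `classify_init` is hypothetical, and treating a leading status code of 0 as success is an assumption based on the status strings shown in this ticket.

```python
from typing import Mapping


def classify_init(attrs: Mapping[str, str]) -> str:
    """Return 'in-progress', 'succeeded', 'failed' or 'unknown'.

    Assumes the patched behavior: nsds5BeginReplicaRefresh is removed once
    a total update ends, including when it is refused because the
    agreement is disabled.
    """
    # LDAP attribute names are case-insensitive; normalize before lookup.
    a = {k.lower(): v for k, v in attrs.items()}
    if a.get("nsds5beginreplicarefresh"):
        return "in-progress"
    status = a.get("nsds5replicalastinitstatus", "").strip()
    if not status:
        return "unknown"
    # The status starts with a numeric code, e.g. "12 Total update aborted: ...".
    head = status.split(None, 1)[0]
    if not head.lstrip("-").isdigit():
        return "unknown"
    # Assumption: 0 means the total update succeeded; anything else failed.
    return "succeeded" if int(head) == 0 else "failed"
```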
Sending patch out for review...
git merge ticket47304
Updating 9d5dedd..db6bcd7
Fast-forward
 ldap/servers/plugins/replication/repl5_agmt.c     | 28 +++++++++++++-------
 ldap/servers/plugins/replication/repl5_agmtlist.c | 23 ++++++++++------
 2 files changed, 32 insertions(+), 19 deletions(-)

git push origin master
Counting objects: 15, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 1.37 KiB, done.
Total 8 (delta 6), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   9d5dedd..db6bcd7  master -> master
commit db6bcd7
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=949361
Metadata Update from @mreynolds: - Issue assigned to mreynolds - Issue set to the milestone: 1.3.1
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/641
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the Subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)