Issue #4756: IPA Replicate creation fails with error "Update failed! Status: [10 Total update abortedLDAP error: Referral]" - freeipa

Closed: Invalid None Opened 9 years ago by jcholast.

Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 7): Bug 1166265

Please note that this Bug is private and may not be accessible as it contains confidential Red Hat customer information.

Description of problem:

IPA replica creation is failing in RHEL 7 with error "Update failed! Status:
[10 Total update abortedLDAP error: Referral]"

The replica issue is observed only if the replica server is a VM.

MASTER          REPLICA         Result
=======         =======         =======
Physical        Physical        Working
Physical        Virtual         Not working


1) In Master - nsds5replicaLastInitStatus for replica

-------------------------------------------------------------------------------
nsds5replicaLastInitStatus: 10 Total update abortedLDAP error: Referral
-------------------------------------------------------------------------------

2) In Replica - nsds5replicaLastUpdateStatus for master

-------------------------------------------------------------------------------
nsds5replicaLastUpdateStatus: 402 Replication error acquiring replica: unknown error - Replica has different database generation ID, remote replica may need to be initialized
-------------------------------------------------------------------------------
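
Both status values above start with a numeric code followed by a free-form message. As a minimal sketch, assuming that "code, then message" layout holds for these attributes in general, the value can be split for logging or monitoring. The names `ReplicaStatus` and `parse_status` are hypothetical and not part of IPA or 389 DS.

```python
import re
from typing import NamedTuple


class ReplicaStatus(NamedTuple):
    code: int      # leading numeric code, e.g. 10 or 402
    message: str   # remaining free-form text


# Assumption: the value is "<integer><whitespace><message>".
_STATUS_RE = re.compile(r"^\s*(-?\d+)\s+(.*)$", re.DOTALL)


def parse_status(value: str) -> ReplicaStatus:
    """Split a replication status value into its code and message."""
    match = _STATUS_RE.match(value)
    if match is None:
        raise ValueError(f"unexpected status format: {value!r}")
    # Collapse internal whitespace so folded or wrapped values compare cleanly.
    message = " ".join(match.group(2).split())
    return ReplicaStatus(int(match.group(1)), message)
```

For the first value, `parse_status("10 Total update abortedLDAP error: Referral").code` yields 10, which matches the LDAP result code for a referral.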

The issue is observed when data already exists in the IPA master and the number
of user/group/netgroup records is above 1000 (tested with 1500).

Replication works successfully when the number of records is lower.


Version-Release number of selected component (if applicable):

ipa-python-3.3.3-28.el7.x86_64
ipa-client-3.3.3-28.el7.x86_64
ipa-admintools-3.3.3-28.el7.x86_64
ipa-server-3.3.3-28.el7.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Install the IPA master on a hardware server with 1000+ records.
2. Initiate the replica creation process on a VM server.
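
One way to get 1000+ records onto a test master is to generate `ipa user-add` commands. This is only a sketch: the helper name `user_add_commands`, the `testuser` login prefix, and the name values are assumptions for illustration, and the report does not say how the original data was created.

```python
from typing import List


def user_add_commands(count: int, prefix: str = "testuser") -> List[str]:
    """Build one `ipa user-add` command line per test user."""
    return [
        f"ipa user-add {prefix}{i:04d} --first=Test --last=User{i:04d}"
        for i in range(1, count + 1)
    ]
```

Printing `"\n".join(user_add_commands(1500))` into a shell script and running it on the master with a valid admin Kerberos ticket would create 1500 users, the count the report was tested with.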

Actual results:

The replica creation process fails with error

----------------------------------------------------------------------------
Update in progress, 128 seconds elapsed
Update in progress yet not in progress

[ipaserver.example.com] reports: Update failed! Status: [10 Total update
abortedLDAP error: Referral]
----------------------------------------------------------------------------

Expected results:

Replica creation should succeed.

Additional info:

mkosek commented 9 years ago

Thierry's assessment:

In IPA, there are several possible approaches, which could also be combined:

- Increase nsds5replicaTimeout. However, we never know how long we have to wait before concluding that there is a problem.
- IPA could (in ipa-replica-prepare?) test whether there are big entries and adapt the timeout accordingly.
- ipa-replica-install could test the init status. If it fails because of a timeout, it could retry with a larger timeout, up to a maximum limit (e.g. 1/2h), after which it reports that initialisation of the replica is not possible. This kind of fix requires the fix in DS.

As a rapid and likely good enough fix, nsds5replicaTimeout could be set to 600 (10 min) instead of 120.
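
The retry idea can be sketched as follows. This is not ipa-replica-install code: `init_with_growing_timeout` and the `attempt` callback are hypothetical, the doubling factor is an assumption, and the 1800-second ceiling is applied here to a single attempt's timeout as one reading of the "1/2h" limit.

```python
from typing import Callable, Optional


def init_with_growing_timeout(
    attempt: Callable[[int], bool],
    first_timeout: int = 120,   # the current default mentioned above
    max_timeout: int = 1800,    # assumed ceiling: half an hour
    factor: int = 2,            # assumed growth factor
) -> Optional[int]:
    """Retry `attempt` with a growing timeout.

    `attempt(timeout)` is a caller-supplied hook that runs one total
    initialisation with the given timeout in seconds and returns True
    on success. Returns the timeout that worked, or None if even the
    ceiling failed.
    """
    timeout = first_timeout
    while True:
        if attempt(timeout):
            return timeout
        if timeout >= max_timeout:
            return None  # give up: initialisation is not possible
        timeout = min(timeout * factor, max_timeout)
```

With the defaults, the timeouts tried are 120, 240, 480, 960 and 1800 seconds.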

tbordaz commented 9 years ago

The possible fixes described in https://fedorahosted.org/freeipa/ticket/4756#comment:1 are valid when there is a timeout issue during a full initialisation.

But after investigating https://bugzilla.redhat.com/show_bug.cgi?id=1166265 further, I think there is a bug in DS. The bug is not systematic (it was actually only seen on VMs), and slowing down the master (by adding breakpoints) makes the full init succeed. During a full update, the replica agreement tests (polls) the connection before sending the next entry. When the bug is hit, the poll returns on timeout, which triggers the abort of the initialisation.
So changing nsds5replicaTimeout is NOT a workaround or a fix for the bug.

mkosek commented 9 years ago

Thierry, I am also wondering - isn't this a duplicate of #4048? It was also happening in similar scenarios.

tbordaz commented 9 years ago

I think #4048 (and #3314) are not directly related to full initialisation failure.

They are related to replication failing due to large updates, so nsslapd-maxbersize needs to be adapted. #4048 provides the ability to tune it, in addition to other cache parameters.

The current ticket is more related to the dynamics of the replica agreement, which sends updates and receives results, and to the consumer's ability to handle the load.
The supplier (RA) does not look flexible enough in the way it sends updates and receives results. It sends tons of entries until it hangs, waiting for more room to send the remaining entries. This may prevent the RA receiver from reading the results.
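
The stall described here can be illustrated with a toy model. It is not 389 DS code: the function `simulate`, the two bounded queues, and the buffer size are assumptions chosen only to show how a sender that never reads results can block itself, while small transfers still finish.

```python
from collections import deque


def simulate(total_entries: int, buf_size: int, drain_results: bool) -> int:
    """Return how many entries were acknowledged before completion or stall.

    entries: supplier -> consumer, at most buf_size items in flight
    results: consumer -> supplier, at most buf_size items in flight
    If drain_results is False, the supplier reads results only after it
    has sent every entry.
    """
    entries, results = deque(), deque()
    sent = acked = 0
    while acked < total_entries:
        progressed = False
        # Supplier sends the next entry if the outgoing buffer has room.
        if sent < total_entries and len(entries) < buf_size:
            entries.append(sent)
            sent += 1
            progressed = True
        # Consumer handles an entry only if it can queue the result.
        if entries and len(results) < buf_size:
            results.append(entries.popleft())
            progressed = True
        # Supplier reads one result, if its policy allows it right now.
        if results and (drain_results or sent == total_entries):
            results.popleft()
            acked += 1
            progressed = True
        if not progressed:
            break  # nobody can move: the transfer has stalled
    return acked
```

In this model, `simulate(1500, 100, drain_results=False)` stalls with nothing acknowledged, `simulate(150, 100, drain_results=False)` completes because everything fits in the buffers, and `simulate(1500, 100, drain_results=True)` completes because results are read while sending.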

tbordaz commented 9 years ago

A fix has been identified (https://fedorahosted.org/389/ticket/47942); it is waiting for triage on that DS ticket.

mkosek commented 9 years ago

Right. This is a 389 fix (https://bugzilla.redhat.com/show_bug.cgi?id=1166265#c23), so we can now close the FreeIPA ticket.

Metadata Update from @jcholast:
- Issue assigned to someone
- Issue set to the milestone: FreeIPA 4.1.3

7 years ago

Metadata

Assignee: someone
Tags: None
Blocking: None
Depending on: None
Priority: normal
Milestone: FreeIPA 4.1.3
affects_doc: None
source: None
knownissue: None
type: defect
blockedby: None
test_case: None
component: Replication
blocking: None
on_review:
keywords: None
test_coverage: None
reviewer: None
external_tracker: None
rhbz: https://bugzilla.redhat.com/show_bug.cgi?id=1166265
tester: None
changelog: None
design: None
