#442 Ipa master system initiated more than a dozen simultaneous replication sessions, shut itself down and wiped out its db
Closed: wontfix. Opened 11 years ago by rmeggins.

https://bugzilla.redhat.com/show_bug.cgi?id=852202 (Red Hat Enterprise Linux 6)

Description of problem:
The IPA LDAP server shut itself down while running, and the admin was unable to
restart it.  Dev indicates the system initiated more than a dozen simultaneous
replication sessions where only 1 should be running.  The db was wiped almost
clean.  Currently unable to recover the server with ipa-replica-manage
re-initialize.

Test Env:
2 Ipa Masters
2 Ipa Clients

What was Active:
-Sudo client running 10 runtime allowed sudo command threads, each thread with
a 5 sec delay.
-One admin client creating 10 sudo rules with a 5 min delay after each rule
-2 UIs were up, no load applied via the IPA UI

Version-Release number of selected component (if applicable):
ipa-server-2.2.0-16.el6.x86_64
389-ds-base-1.2.10.2-19.el6_3.x86_64

How reproducible:
intermittent, have not yet reproduced...

Symptoms:

* Sudo client load began failing, Can't contact LDAP server
* UI not connecting, kinit failing
* Admin load not connecting, kinit failing

*Could not restart via ipactl:
[root@sti-high-1 slapd-TESTRELM-COM]# ipactl start
Starting Directory Service
Starting dirsrv:
    PKI-IPA...[  OK  ]
    TESTRELM-COM...[  OK  ]
Failed to read data from Directory Service: Failed to get list of services to
probe status!
Configured hostname 'sti-high-1.testrelm.com' does not match any master server
in LDAP:
No master found because of error: {'matched': 'dc=testrelm,dc=com', 'desc': 'No
such object'}
Shutting down
Shutting down dirsrv:
    PKI-IPA...[  OK  ]
    TESTRELM-COM...[  OK  ]



***/var/log/dirsrv/slapd-TESTRELM-COM/errors, last few messages as it went
down...

[27/Aug/2012:12:32:21 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Warning: unable to send
endReplication extended operation (Timed out)
[27/Aug/2012:12:32:26 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with
GSSAPI auth resumed
[27/Aug/2012:12:35:45 -0400] NSMMReplicationPlugin -
multimaster_be_state_change: replica dc=testrelm,dc=com is going offline;
disabling replication
[27/Aug/2012:12:36:11 -0400] NSMMReplicationPlugin -
replica_replace_ruv_tombstone: failed to update replication update vector for
replica dc=testrelm,dc=com: LDAP error - 1
[27/Aug/2012:12:36:12 -0400] - WARNING: Import is running with
nsslapd-db-private-import-mem on; No other process is allowed to access the
database
[27/Aug/2012:12:36:32 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:36:52 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:37:12 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:37:32 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:37:52 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:38:12 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:38:32 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:38:46 -0400] - ERROR bulk import abandoned
[27/Aug/2012:12:38:46 -0400] - import userRoot: Aborting all Import threads...
[27/Aug/2012:12:38:52 -0400] - import userRoot: Import threads aborted.
[27/Aug/2012:12:38:52 -0400] - import userRoot: Closing files...
[27/Aug/2012:12:38:52 -0400] - import userRoot: Import failed.
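
The excerpt above shows the bulk import reporting zero processed entries every 20 seconds until it is abandoned. A minimal Python sketch for spotting that pattern in an errors log follows. The helper names (`join_wrapped`, `stalled_imports`) are hypothetical and not part of 389-ds-base or IPA, and the regular expressions assume only the message format seen in this excerpt.

```python
import re

# A record starts with a timestamp such as "[27/Aug/2012:12:36:32 -0400]".
TIMESTAMP = re.compile(r"^\[\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]")
# Progress report, e.g. "import userRoot: Processed 0 entries".
PROGRESS = re.compile(r"import (\S+): Processed (\d+) entries")


def join_wrapped(lines):
    """Rejoin records that were wrapped across several lines.

    A non-blank line without a leading timestamp is treated as a
    continuation of the previous record (an assumption that matches
    the pasted log above).
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def stalled_imports(lines):
    """Count progress reports with 0 processed entries, per backend."""
    counts = {}
    for record in join_wrapped(lines):
        match = PROGRESS.search(record)
        if match and int(match.group(2)) == 0:
            backend = match.group(1)
            counts[backend] = counts.get(backend, 0) + 1
    return counts
```

Passing the lines of an errors log (for example from an open file object) to `stalled_imports` returns a dictionary keyed by backend name. A backend with several consecutive zero-entry reports, as `userRoot` has here, suggests an import that never received any data.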


*Ldap db was wiped out on one of the Ipa Master servers
[root@sti-high-1 slapd-TESTRELM-COM]# ldapsearch -x -D "cn=Directory Manager"
-w Secret123 -b "dc=testrelm,dc=com"
# extended LDIF
#
# LDAPv3
# base <dc=testrelm,dc=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# compat, testrelm.com
dn: cn=compat,dc=testrelm,dc=com
objectClass: extensibleObject
cn: compat

# groups, compat, testrelm.com
dn: cn=groups,cn=compat,dc=testrelm,dc=com
objectClass: extensibleObject
cn: groups

# ng, compat, testrelm.com
dn: cn=ng,cn=compat,dc=testrelm,dc=com
objectClass: extensibleObject
cn: ng

# users, compat, testrelm.com
dn: cn=users,cn=compat,dc=testrelm,dc=com
objectClass: extensibleObject
cn: users

# sudoers, testrelm.com
dn: ou=sudoers,dc=testrelm,dc=com
objectClass: extensibleObject
ou: sudoers

# search result
search: 2
result: 32 No such object

# numResponses: 6
# numEntries: 5


*The other ipa master server info:
[root@sti-high-2 ~]# ldapsearch -x -D "cn=Directory Manager" -w Secret123 -b
"cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com"
# extended LDIF
#
# LDAPv3
# base <cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# masters, ipa, etc, testrelm.com
dn: cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com
cn: masters
objectClass: nsContainer
objectClass: top

# sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com
cn: sti-high-1.testrelm.com
objectClass: top
objectClass: nsContainer

# CA, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=CA,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=c
 om
cn: CA
ipaConfigString: enabledService
ipaConfigString: startOrder 50
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# KDC, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=KDC,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=
 com
cn: KDC
ipaConfigString: enabledService
ipaConfigString: startOrder 10
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# KPASSWD, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=KPASSWD,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm
 ,dc=com
cn: KPASSWD
ipaConfigString: enabledService
ipaConfigString: startOrder 20
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# MEMCACHE, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=MEMCACHE,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrel
 m,dc=com
cn: MEMCACHE
ipaConfigString: enabledService
ipaConfigString: startOrder 39
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# HTTP, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=HTTP,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc
 =com
cn: HTTP
ipaConfigString: enabledService
ipaConfigString: startOrder 40
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# DNS, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=DNS,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=
 com
cn: DNS
ipaConfigString: enabledService
ipaConfigString: startOrder 30
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com
objectClass: top
objectClass: nsContainer
cn: sti-high-2.testrelm.com

# KDC, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=KDC,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=
 com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 10
cn: KDC

# KPASSWD, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=KPASSWD,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm
 ,dc=com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 20
cn: KPASSWD

# MEMCACHE, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=MEMCACHE,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrel
 m,dc=com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 39
cn: MEMCACHE

# HTTP, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=HTTP,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc
 =com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 40
cn: HTTP

# DNS, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=DNS,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=
 com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 30
cn: DNS

# search result
search: 2
result: 0 Success

# numResponses: 15
# numEntries: 14
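
Several dn lines in the output above are folded: LDIF continues a long line on the next line, which starts with a single space (for example `dc=c` followed by ` om`). A minimal Python sketch for unfolding such output before comparing DNs between the two masters follows. The function name `unfold_ldif` is a hypothetical helper, not part of any LDAP tool.

```python
def unfold_ldif(text):
    """Undo LDIF line folding.

    A line that begins with one space continues the previous line;
    the newline and that single leading space are dropped.
    """
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)
    return unfolded
```

After unfolding, each `dn:` line holds the complete DN, so a plain string comparison or `grep` over the result does not miss entries whose DN happened to wrap.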


Additional Dev Comments:
Rich Megginson's chat comments indicate the issue looks like a combination of:
https://fedorahosted.org/389/ticket/374
https://fedorahosted.org/freeipa/ticket/2842

This is a duplicate of ticket #374.

Metadata Update from @nkinder:
- Issue assigned to rmeggins
- Issue set to the milestone: N/A

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/442

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Duplicate)

3 years ago
