#1486 Document workaround for 1454 in 'pkispawn' man page
Closed: Fixed. Opened 8 years ago by mharmsen.

Based upon the findings in PKI TRAC Ticket #1454 (pkispawn clone CA using an existing base DN with pki_ds_remove_data=True in the .inf file fails), the following workaround needs to be documented in the 'pkispawn' man page:

OK: After some experimentation, this is what I found.

    This problem happens if you create the clone, destroy it, and then immediately try to re-create the exact same clone with the same deployment.cfg file. 

This fails during the LDIF import process, specifically while importing the VLV index file. The failure happens shortly after replication. Once the import fails, the LDAP server can no longer be contacted by the CA clone being installed, and we get a stream of messages like this:

Still checking wait_dn 'cn=index1160589769, cn=index, cn=tasks, cn=config' (netscape.ldap.LDAPException: failed to connect to server ldap://sparks.idmqe.lab.eng.bos.redhat.com:1901 (91))

    I found a condition in the DS logs that might be important: 

[07/Jul/2015:20:35:16 -0400] - ldbm: Bringing pki-ca-ldap offline...
[07/Jul/2015:20:35:16 -0400] - ldbm: removing 'pki-ca-ldap'.
[07/Jul/2015:20:35:16 -0400] - Destructor for instance pki-ca-ldap called
[07/Jul/2015:20:35:19 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=pki-ca is going offline; disabling replication
[07/Jul/2015:20:35:20 -0400] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-sparks.idmqe.lab.eng.bos.redhat.com-clone1" (sparks:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
[07/Jul/2015:20:35:20 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database

It looks like some error condition has been detected and the server is shutting down, but in our case it never really comes back.

    I have found a workaround for this that seems to work every time. 

After running pkidestroy on the first clone, simply restart the DS server. Try the clone again and it works flawlessly.
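
For reference, a rough sketch of the workaround sequence is below; the subsystem, instance names, and deployment file path are illustrative assumptions, not values taken from this ticket:

    # Remove the failed clone subsystem (instance name is an assumption)
    pkidestroy -s CA -i pki-tomcat

    # Restart the 389 DS instance backing the clone before retrying
    # (replace INSTANCE with the actual dirsrv instance name)
    systemctl restart dirsrv@INSTANCE.service

    # Re-run the clone installation with the same deployment file
    pkispawn -s CA -f /path/to/clone-deployment.cfg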

My theory is that after a clone is destroyed, something from the previous replication agreement is left out of sync, and it shows up when the exact same agreement is attempted again. Restarting the DS server clears that state, and the subsequent cloning operation then succeeds.

Further digging would be needed to figure out exactly what is going on here.
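
For anyone doing that digging, one possible starting point is to inspect the replication agreements that remain on the master DS after pkidestroy. This ldapsearch is a generic sketch; the host, port, and bind DN are assumptions:

    # List replication agreements and their last update/init status on the master DS
    ldapsearch -x -H ldap://localhost:389 -D "cn=Directory Manager" -W \
        -b "cn=config" "(objectClass=nsds5replicationAgreement)" \
        cn nsds5replicaLastUpdateStatus nsds5replicaLastInitStatus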

checkin:

83b954648254a100c7c3b390089626ce68351c5a

Metadata Update from @mharmsen:
- Issue assigned to jmagne
- Issue set to the milestone: 10.2 Backlog

7 years ago

Dogtag PKI is moving from Pagure issues to GitHub issues. This means that existing or new
issues will be reported and tracked through Dogtag PKI's GitHub Issue tracker.

This issue has been cloned to GitHub and is available here:
https://github.com/dogtagpki/pki/issues/2045

If you want to receive further updates on the issue, please navigate to the
GitHub issue and click on the Subscribe button.

Thank you for understanding, and we apologize for any inconvenience.
