Reported on http://lists.fedoraproject.org/pipermail/389-users/2012-January/013944.html
In my environment I have a total of 4 directory servers, 2 multi-masters in production (ServerA, ServerB) and 2 multi-masters to test with (ServerC, ServerD). Basically here’s what I did:
1) Took a backup of one of the production directory servers, ServerA.
2) Copied ServerA’s backup to ServerC (test).
3) Deleted the replication agreement on ServerC to ServerD (but not the agreement from ServerD to ServerC).
4) Ran /usr/lib64/dirsrv/slapd-ServerC/bak2db 2011_12_29_15_27_35
The restore started, and never stopped running. I eventually killed it and tried again, this time capturing the output:
[03/Jan/2012:15:06:43 -0700] 389-Directory/1.2.9.9 - debug level: backend (524288)
[03/Jan/2012:15:06:43 -0700] - Deleting log file: (/var/lib/dirsrv/slapd-ServerC/db/log.0000000021)
[03/Jan/2012:15:06:43 -0700] - Restoring file 1 (/var/lib/dirsrv/slapd-ServerC/db/DBVERSION)
[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/DBVERSION to /var/lib/dirsrv/slapd-ServerC/db/DBVERSION
[03/Jan/2012:15:06:43 -0700] - Restoring file 2 (/var/lib/dirsrv/slapd-ServerC/db/log.0000000021)
[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/log.0000000021 to /var/lib/dirsrv/slapd-ServerC/db/log.0000000021
[ lines removed to reduce size ]
[03/Jan/2012:15:06:43 -0700] - Restoring file 33 (/var/lib/dirsrv/slapd-ServerC/db/userRoot/uid.db4)
[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/userRoot/uid.db4 to /var/lib/dirsrv/slapd-ServerC/db/userRoot/uid.db4
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=entryrdn,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=entryrdn,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nscpEntryDN,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nscpEntryDN,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsds5ReplConflict,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsds5ReplConflict,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsuniqueid,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsuniqueid,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=numsubordinates,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=numsubordinates,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=objectclass,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=objectclass,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=parentid,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=parentid,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
That is, after the message about deleting cn=parentid, it starts over again with cn=aci, skipping the other default indexes: cn=seealso, cn=sn, cn=telephoneNumber, cn=uid, and cn=uniquemember.
389-ds-base-1.2.9.9-1.el5 RedHat EL 5.5
In your backup directory /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35 there is a file called dse_index.ldif - can you please attach that file to this ticket?
{{{
Steps to reproduce:
1) install 389-ds from epel on an el5 system (389-ds-base-1.2.9.9 and other packages)
2) setup-ds-admin.pl
3) service dirsrv stop
4) db2bak /path/to/mybakdir
5) edit /path/to/mybakdir/dse_index.ldif - remove all entries associated with NetscapeRoot
6) bak2db /path/to/mybakdir
You will reproduce the same issue.
In your case - is ServerC a configuration directory server (has the NetscapeRoot database)? And is ServerA not a configuration directory server? That would reproduce the situation I was able to "fake" - attempting to restore from a backup which has different databases. If you are unsure, do this:
on ServerA: grep \^nsslapd-backend /etc/dirsrv/slapd-INSTANCE/dse.ldif
on ServerC: grep \^nsslapd-backend /etc/dirsrv/slapd-INSTANCE/dse.ldif
That being said, there is still a looping problem which needs to be fixed. }}}
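The backend comparison suggested above can also be scripted. This is only a minimal sketch: the function name `backends_in_dse` is made up for illustration, and the two sample strings are hypothetical excerpts (chosen so that the backends differ), not the contents of any real server's dse.ldif.

```python
def backends_in_dse(dse_text):
    """Collect the values of every nsslapd-backend line in dse.ldif text."""
    names = set()
    for line in dse_text.splitlines():
        attr, sep, value = line.partition(":")
        # LDAP attribute names are case-insensitive, so compare in lower case.
        if sep and attr.strip().lower() == "nsslapd-backend":
            names.add(value.strip())
    return names


# Hypothetical excerpts; in practice read /etc/dirsrv/slapd-INSTANCE/dse.ldif.
SERVER_A = "nsslapd-backend: userRoot\nnsslapd-backend: NetscapeRoot\n"
SERVER_C = "nsslapd-backend: userRoot\n"

only_in_a = backends_in_dse(SERVER_A) - backends_in_dse(SERVER_C)
only_in_c = backends_in_dse(SERVER_C) - backends_in_dse(SERVER_A)
print("only on ServerA:", sorted(only_in_a))
print("only on ServerC:", sorted(only_in_c))
```

If either set is non-empty, the backup and the restore target do not have the same databases, which is the situation the reproducer fakes.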
I will attach the file in a couple minutes, but both ServerA and ServerC have the netscapeRoot database:
{{{
root@ServerA# grep ^nsslapd-backend /etc/dirsrv/slapd-ServerA/dse.ldif
nsslapd-backend: userRoot
nsslapd-backend: NetscapeRoot
}}}
{{{
root@ServerC# grep ^nsslapd-backend /etc/dirsrv/slapd-ServerC/dse.ldif
nsslapd-backend: userRoot
nsslapd-backend: NetscapeRoot
}}}
Attachment: dse_index.ldif (dse_index.ldif file from backup of ServerA)
In the reproducer steps, how about adding "-n BACKEND_INSTANCE" to step (6)?
6) bak2db /path/to/mybakdir
For instance, if the backend instance name is userRoot, it'd be bak2db /path/to/mybakdir -n userRoot
Please see also: http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Shell_Scripts.html#Shell_Scripts-bak2db_Restore_database_from_backup
The problem is that slapi_entries_diff, used to add/modify/delete indexes in the config, is b0rken. I have a patch, but I think there is a larger question - should bak2db attempt to "fix" the dse.ldif at all? That is, if I have added/deleted/modified indexes, bak2db will completely wipe out my changes.
should bak2db attempt to "fix" the dse.ldif at all? That is, if I have added/deleted/modified indexes, bak2db will completely wipe out my changes.
The original intention was to solve the mismatch (if any) between the dse.ldif for the backup and the current dse.ldif. I guess the backup does not contain the corresponding index files...
Attachment: 0001-bak2db-gets-stuck-in-infinite-loop.patch
So modifying indexes caused this? I added/deleted some indexes on both ServerC and ServerD, but just tried restoring ServerA backup onto ServerD and it completed just fine. The only difference from this try and the one that failed above is that I completely disabled replication instead of only removing the replication agreements.
I should say I'm not 100% sure I deleted the indexes on both servers, but I followed the same steps on ServerD that I did on ServerC.
Replying to [comment:9 rike255]:
So modifying indexes caused this?
Most likely, yes.
I added/deleted some indexes on both ServerC and ServerD, but just tried restoring ServerA backup onto ServerD and it completed just fine. The only difference from this try and the one that failed above is that I completely disabled replication instead of only removing the replication agreements.
I don't see why replication agreements would have anything to do with this problem.
The problem definitely has to do with either too many or too few indexes in the backup, compared to the target to which you are restoring.
The larger question is - should restore "fix" your current index configuration to match what's in the backup? Or should it instead just tell you "hey, since you made this backup, you changed indexes A, B, C - you added index D - you deleted index E" and let you apply those changes?
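To make the "just tell you what changed" option concrete, here is a rough sketch of such a report. It is not what bak2db does. The function name `index_changes`, the dict-of-dicts representation, and the sample index entries are all assumptions for illustration, and DNs are compared as plain strings without LDAP normalization.

```python
def index_changes(backup, current):
    """Describe how the current index config differs from the backup.

    Both arguments map an index entry DN to a dict of its attributes.
    """
    added = sorted(dn for dn in current if dn not in backup)
    deleted = sorted(dn for dn in backup if dn not in current)
    changed = sorted(
        dn for dn in current if dn in backup and current[dn] != backup[dn]
    )
    return {"added": added, "deleted": deleted, "changed": changed}


SUFFIX = ",cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config"

# Hypothetical index entries, as they might appear in dse_index.ldif and dse.ldif.
BACKUP = {
    "cn=uid" + SUFFIX: {"nsIndexType": ["eq", "pres"]},
    "cn=sn" + SUFFIX: {"nsIndexType": ["eq", "sub"]},
}
CURRENT = {
    "cn=uid" + SUFFIX: {"nsIndexType": ["eq"]},
    "cn=mail" + SUFFIX: {"nsIndexType": ["eq"]},
}

report = index_changes(BACKUP, CURRENT)
for kind in ("added", "deleted", "changed"):
    for dn in report[kind]:
        print(f"since this backup was made, index {kind}: {dn}")
```

A report like this leaves the current dse.ldif untouched and lets the administrator decide which differences to apply.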
Replying to [comment:10 rmeggins]:
Sorry I forgot to take that part out after reading your fix.
The problem definitely has to do with either too many or too few indexes in the backup, compared to the target to which you are restoring.
OK, ServerD and ServerA definitely have the same indexes. Unfortunately I can't check ServerC, but it sounds like you have all you need from this issue. I'll make sure to keep my indexes in sync across all environments to avoid the problem. Thanks for the help!
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=772779
To ssh://git.fedorahosted.org/git/389/ds.git
1bbbb3e..0836138 master -> master
commit changeset:0836138/389-ds-base
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Jan 4 15:40:57 2012 -0700

bak2db gets stuck in infinite loop
https://fedorahosted.org/389/ticket/4
Reviewed by: nkinder (Thanks!)
Branch: master
Fix Description: The logic was just plain faulty, so I rewrote the algorithm and made it much simpler.

    for each curr_ent: /* entry in current dse.ldif */
        for each old_ent: /* entry in dse_index.ldif in backup */
            if curr_ent.dn == old_ent.dn:
                modify curr_ent to look like old_ent
                curr_ent.inboth = old_ent.inboth = True
    for each old_ent:
        if old_ent.inboth:
            clear old_ent.inboth
        else: /* old_ent not in current config */
            add old_ent to current dse.ldif
    for each curr_ent:
        if curr_ent.inboth:
            clear curr_ent.inboth
        else: /* curr_ent not in old config */
            delete curr_ent from current dse.ldif

Platforms tested: RHEL6 x86_64, RHEL5 i386
Flag Day: no
Doc impact: no
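For readers who want to experiment with the three passes described in the commit message, the following is a small Python sketch of the same idea. It is not the C code from the patch: it works on in-memory dicts instead of dse.ldif entries, uses a set of DNs where the pseudocode uses the `inboth` flag, and the name `reconcile` and the shortened sample DNs are hypothetical.

```python
def reconcile(current, backup):
    """Make `current` match `backup` in place; return what was done.

    Both arguments map an entry DN to a dict of attributes.
    """
    original_dns = list(current)  # snapshot, so later additions are not revisited
    in_both = {dn for dn in original_dns if dn in backup}

    # Pass 1: entries present on both sides are made to look like the backup.
    for dn in in_both:
        current[dn] = dict(backup[dn])

    # Pass 2: entries only in the backup are added to the current config.
    added = [dn for dn in backup if dn not in in_both]
    for dn in added:
        current[dn] = dict(backup[dn])

    # Pass 3: entries only in the original current config are deleted.
    deleted = [dn for dn in original_dns if dn not in in_both]
    for dn in deleted:
        del current[dn]

    return {"modified": sorted(in_both), "added": sorted(added), "deleted": sorted(deleted)}


# Hypothetical, shortened DNs for illustration only.
current_cfg = {"cn=aci": {"nsIndexType": ["pres"]}, "cn=mail": {"nsIndexType": ["eq"]}}
backup_cfg = {"cn=aci": {"nsIndexType": ["eq"]}, "cn=uid": {"nsIndexType": ["eq"]}}

summary = reconcile(current_cfg, backup_cfg)
print(summary)
```

Each pass walks a fixed list exactly once, so the procedure always terminates, and afterwards the current configuration holds exactly the entries from the backup.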
Added initial screened field value.
Metadata Update from @rmeggins: - Issue assigned to rmeggins - Issue set to the milestone: 1.2.10.a7
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/4
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)