#4 bak2db gets stuck in infinite loop
Closed: wontfix None Opened 12 years ago by rmeggins.

Reported on http://lists.fedoraproject.org/pipermail/389-users/2012-January/013944.html

In my environment I have a total of 4 directory servers, 2 multi-masters in production (ServerA, ServerB) and 2 multi-masters to test with (ServerC, ServerD). Basically here’s what I did:

1) Took a backup of one of the production directory servers, ServerA.
2) Copied ServerA’s backup to ServerC (test).
3) Deleted the replication agreement on ServerC to ServerD (but not the agreement from ServerD to ServerC).
4) Ran /usr/lib64/dirsrv/slapd-ServerC/bak2db 2011_12_29_15_27_35

The restore started, and never stopped running. I eventually killed it and tried again, this time capturing the output:

/usr/lib64/dirsrv/slapd-ServerC/bak2db 2011_12_29_15_27_35

[03/Jan/2012:15:06:43 -0700] 389-Directory/1.2.9.9 - debug level: backend (524288)

[03/Jan/2012:15:06:43 -0700] - Deleting log file: (/var/lib/dirsrv/slapd-ServerC/db/log.0000000021)

[03/Jan/2012:15:06:43 -0700] - Restoring file 1 (/var/lib/dirsrv/slapd-ServerC/db/DBVERSION)

[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/DBVERSION to /var/lib/dirsrv/slapd-ServerC/db/DBVERSION

[03/Jan/2012:15:06:43 -0700] - Restoring file 2 (/var/lib/dirsrv/slapd-ServerC/db/log.0000000021)

[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/log.0000000021 to /var/lib/dirsrv/slapd-ServerC/db/log.0000000021

[ lines removed to reduce size ]

[03/Jan/2012:15:06:43 -0700] - Restoring file 33 (/var/lib/dirsrv/slapd-ServerC/db/userRoot/uid.db4)

[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/userRoot/uid.db4 to /var/lib/dirsrv/slapd-ServerC/db/userRoot/uid.db4

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=entryrdn,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=entryrdn,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nscpEntryDN,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nscpEntryDN,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsds5ReplConflict,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsds5ReplConflict,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsuniqueid,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsuniqueid,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=numsubordinates,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=numsubordinates,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=objectclass,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=objectclass,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=parentid,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=parentid,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

That is, after the message about deleting cn=parentid, it starts over again with cn=aci and never reaches the other default indexes: cn=seealso, cn=sn, cn=telephoneNumber, cn=uid, and cn=uniquemember.

389-ds-base-1.2.9.9-1.el5
RedHat EL 5.5
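
A quick way to confirm the looping from a captured log is to look for the same "Del Index Config Entry" DN being reported more than once. The sketch below is only an illustration; it assumes the message format shown in the output above, and the function name `find_repeated_deletes` is made up for this example.

```python
import re

# Matches the DN at the end of a "Del Index Config Entry" log line
# (format assumed from the bak2db output quoted in this ticket).
DEL_RE = re.compile(r"Del Index Config Entry (\S.*)$")


def find_repeated_deletes(log_lines):
    """Return DNs reported as deleted more than once, in order of first repeat."""
    seen = set()
    repeated = []
    for line in log_lines:
        match = DEL_RE.search(line.strip())
        if not match:
            continue
        dn = match.group(1).lower()  # DNs compare case-insensitively
        if dn in seen and dn not in repeated:
            repeated.append(dn)
        seen.add(dn)
    return repeated


sample = [
    "[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config",
    "[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=parentid,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config",
    "[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config",
]
print(find_repeated_deletes(sample))
```

A non-empty result suggests the restore is cycling through the same index entries rather than making progress.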


In your backup directory /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35 there is a file called dse_index.ldif - can you please attach that file to this ticket?

{{{
Steps to reproduce:
1) install 389-ds from epel on an el5 system (389-ds-base-1.2.9.9 and other packages)
2) setup-ds-admin.pl
3) service dirsrv stop
4) db2bak /path/to/mybakdir
5) edit /path/to/mybakdir/dse_index.ldif - remove all entries associated with NetscapeRoot
6) bak2db /path/to/mybakdir

You will reproduce the same issue.

In your case - is ServerC a configuration directory server (has the NetscapeRoot database)? And is ServerA not a configuration directory server? That would reproduce the situation I was able to "fake" - attempting to restore from a backup which has different databases. If you are unsure, do this:

on ServerA:
grep \^nsslapd-backend /etc/dirsrv/slapd-INSTANCE/dse.ldif

on ServerC:
grep \^nsslapd-backend /etc/dirsrv/slapd-INSTANCE/dse.ldif

That being said, there is still a looping problem which needs to be fixed.
}}}
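
The two grep commands above can also be compared mechanically. This is a minimal sketch, assuming each `nsslapd-backend` value sits on a single unfolded line in dse.ldif; the helper names `backends` and `compare_backends` are hypothetical and not part of 389-ds.

```python
def backends(dse_text):
    """Collect the nsslapd-backend values found in the text of a dse.ldif file."""
    names = set()
    for line in dse_text.splitlines():
        # Attribute names are case-insensitive in LDIF.
        if line.lower().startswith("nsslapd-backend:"):
            names.add(line.split(":", 1)[1].strip())
    return names


def compare_backends(source_text, target_text):
    """Return (only_in_source, only_in_target) as sorted lists of backend names."""
    src, tgt = backends(source_text), backends(target_text)
    return sorted(src - tgt), sorted(tgt - src)


# Example: the backup source lacks NetscapeRoot, the restore target has it.
server_a = "nsslapd-backend: userRoot\n"
server_c = "nsslapd-backend: userRoot\nnsslapd-backend: NetscapeRoot\n"
print(compare_backends(server_a, server_c))
```

If either list is non-empty, the backup and the restore target have different databases, which is the situation the reproducer fakes.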

I will attach the file in a couple of minutes, but both ServerA and ServerC have the NetscapeRoot database:

{{{
root@ServerA# grep ^nsslapd-backend /etc/dirsrv/slapd-ServerA/dse.ldif
nsslapd-backend: userRoot
nsslapd-backend: NetscapeRoot
}}}

{{{
root@ServerC# grep ^nsslapd-backend /etc/dirsrv/slapd-ServerC/dse.ldif
nsslapd-backend: userRoot
nsslapd-backend: NetscapeRoot
}}}

dse_index.ldif file from backup of ServerA
dse_index.ldif

In the reproducer steps,
how about adding "-n BACKEND_INSTANCE" to step (6)?
6) bak2db /path/to/mybakdir

For instance, if the backend instance name is userRoot, it'd be
bak2db /path/to/mybakdir -n userRoot

Please see also:
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Shell_Scripts.html#Shell_Scripts-bak2db_Restore_database_from_backup

The problem is that slapi_entries_diff, used to add/modify/delete indexes in the config, is b0rken. I have a patch, but I think there is a larger question - should bak2db attempt to "fix" the dse.ldif at all? That is, if I have added/deleted/modified indexes, bak2db will completely wipe out my changes.

> should bak2db attempt to "fix" the dse.ldif at all? That is, if I have added/deleted/modified indexes, bak2db will completely wipe out my changes.

The original intention was to solve the mismatch (if any) between the dse.ldif for the backup and the current dse.ldif. I guess the backup does not contain the corresponding index files...

So modifying indexes caused this?
I added/deleted some indexes on both ServerC and ServerD, but just tried restoring ServerA backup onto ServerD and it completed just fine.
The only difference from this try and the one that failed above is that I completely disabled replication instead of only removing the replication agreements.

I should say I'm not 100% sure I deleted the indexes on both servers, but I followed the same steps on ServerD that I did on ServerC.

Replying to [comment:9 rike255]:
> So modifying indexes caused this?

Most likely, yes.

> I added/deleted some indexes on both ServerC and ServerD, but just tried restoring ServerA backup onto ServerD and it completed just fine.
> The only difference from this try and the one that failed above is that I completely disabled replication instead of only removing the replication agreements.

I don't see why replication agreements would have anything to do with this problem.

> I should say I'm not 100% sure I deleted the indexes on both servers, but I followed the same steps on ServerD that I did on ServerC.

The problem definitely has to do with either too many or too few indexes in the backup, compared to the target to which you are restoring.

The larger question is - should restore "fix" your current index configuration to match what's in the backup? Or should it instead just tell you "hey, since you made this backup, you changed indexes A, B, C - you added index D - you deleted index E" and let you apply those changes?
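
To make the "just tell you" alternative concrete, here is a rough sketch of a report-only comparison that changes nothing. It assumes the index entries from the backup's dse_index.ldif and from the current dse.ldif have already been parsed into dictionaries mapping DN to attributes; the function name and data shapes are illustrative only and do not reflect how bak2db works.

```python
def report_index_changes(backup, current):
    """Describe how the current index config differs from the backup.

    Both arguments map an index entry DN to a dict of its attributes.
    Nothing is modified; the caller decides what to apply.
    """
    # DNs compare case-insensitively, so normalize the keys first.
    old = {dn.lower(): attrs for dn, attrs in backup.items()}
    new = {dn.lower(): attrs for dn, attrs in current.items()}
    return {
        "added": sorted(dn for dn in new if dn not in old),
        "deleted": sorted(dn for dn in old if dn not in new),
        "changed": sorted(dn for dn in new if dn in old and new[dn] != old[dn]),
    }


backup_cfg = {
    "cn=aci,cn=index": {"nsIndexType": ["pres"]},
    "cn=sn,cn=index": {"nsIndexType": ["eq"]},
}
current_cfg = {
    "cn=aci,cn=index": {"nsIndexType": ["pres", "eq"]},  # changed since backup
    "cn=mail,cn=index": {"nsIndexType": ["eq"]},         # added since backup
}
print(report_index_changes(backup_cfg, current_cfg))
```

With this approach an administrator who deliberately added or removed indexes after the backup keeps those changes unless they choose otherwise.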

Replying to [comment:10 rmeggins]:
> Replying to [comment:9 rike255]:
> > So modifying indexes caused this?
>
> Most likely, yes.
>
> > I added/deleted some indexes on both ServerC and ServerD, but just tried restoring ServerA backup onto ServerD and it completed just fine.
> > The only difference from this try and the one that failed above is that I completely disabled replication instead of only removing the replication agreements.
>
> I don't see why replication agreements would have anything to do with this problem.

Sorry, I forgot to take that part out after reading your fix.

> > I should say I'm not 100% sure I deleted the indexes on both servers, but I followed the same steps on ServerD that I did on ServerC.
>
> The problem definitely has to do with either too many or too few indexes in the backup, compared to the target to which you are restoring.

Ok ya ServerD and ServerA definitely have the same indexes. I can't check ServerC unfortunately but sounds like you guys have all you need from this issue, I'll make sure to keep my indexes in sync across all environments to avoid the issue. Thanks for the help!

> The larger question is - should restore "fix" your current index configuration to match what's in the backup? Or should it instead just tell you "hey, since you made this backup, you changed indexes A, B, C - you added index D - you deleted index E" and let you apply those changes?

To ssh://git.fedorahosted.org/git/389/ds.git
1bbbb3e..0836138 master -> master
commit changeset:0836138/389-ds-base
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Jan 4 15:40:57 2012 -0700

bak2db gets stuck in infinite loop

https://fedorahosted.org/389/ticket/4
Reviewed by: nkinder (Thanks!)
Branch: master
Fix Description: The logic was just plain faulty, so I rewrote the algorithm
and made it much simpler.
for each curr_ent: /* entry in current dse.ldif */
  for each old_ent: /* entry in dse_index.ldif in backup */
    if curr_ent.dn == old_ent.dn:
      modify curr_ent to look like old_ent
      curr_ent.inboth = old_ent.inboth = True

for each old_ent:
  if old_ent.inboth:
    clear old_ent.inboth
  else: /* old_ent not in current config */
    add old_ent to current dse.ldif
for each curr_ent:
  if curr_ent.inboth:
    clear curr_ent.inboth
  else: /* curr_ent not in old config */
    delete curr_ent from current dse.ldif

Platforms tested: RHEL6 x86_64, RHEL5 i386
Flag Day: no
Doc impact: no
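
For readers who want to trace the three passes described in the commit message, the following is a small Python rendering of that pseudocode. It is not the actual C change: entries are plain dictionaries, the function name `sync_index_config` is invented for the example, and the "modify" step simply copies the backup's attributes.

```python
def sync_index_config(current, backup):
    """Make the current index entries match the backup.

    Each entry is a dict with "dn" and "attrs". Returns the resulting
    entry list and a log of (operation, dn) tuples.
    """
    ops = []
    # Pass 1: entries present on both sides are updated and flagged.
    for curr_ent in current:
        for old_ent in backup:
            if curr_ent["dn"].lower() == old_ent["dn"].lower():
                if curr_ent["attrs"] != old_ent["attrs"]:
                    curr_ent["attrs"] = dict(old_ent["attrs"])
                    ops.append(("modify", curr_ent["dn"]))
                curr_ent["inboth"] = old_ent["inboth"] = True
    # Pass 2: backup entries missing from the current config are added.
    added = []
    for old_ent in backup:
        if old_ent.pop("inboth", False):  # pop also clears the flag
            continue
        added.append({"dn": old_ent["dn"], "attrs": dict(old_ent["attrs"])})
        ops.append(("add", old_ent["dn"]))
    # Pass 3: current entries missing from the backup are deleted.
    kept = []
    for curr_ent in current:
        if curr_ent.pop("inboth", False):
            kept.append(curr_ent)
        else:
            ops.append(("delete", curr_ent["dn"]))
    return kept + added, ops


current = [
    {"dn": "cn=aci", "attrs": {"nsIndexType": ["pres"]}},
    {"dn": "cn=extra", "attrs": {"nsIndexType": ["eq"]}},
]
backup = [
    {"dn": "CN=aci", "attrs": {"nsIndexType": ["pres", "eq"]}},
    {"dn": "cn=uid", "attrs": {"nsIndexType": ["eq"]}},
]
result, ops = sync_index_config(current, backup)
print(ops)
```

Each pass walks a fixed list exactly once (the first with a nested scan), so the procedure always terminates regardless of how the two configurations differ.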

Added initial screened field value.

Metadata Update from @rmeggins:
- Issue assigned to rmeggins
- Issue set to the milestone: 1.2.10.a7

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/4

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
