Original summary: 2Master replication with SASL/GSSAPI auth broken
Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 6): Bug 1006846
Description of problem: When 2Master replication is set up with SASL/GSSAPI authentication, I can reproduce these 3 problems: 1) subtree search on replicated suffix fails to return child entries 2) after DS restart entries under replicated suffix are missing, though db2ldif shows they are still in database 3) it is possible to add the same entries under replicated suffix, creating conflict for replication and duplicating content of database Version-Release number of selected component (if applicable): 389-ds-base-1.2.11.15-23.el6.x86_64 How reproducible: always Steps to Reproduce: 1. Set up 2 instances of directory server (I used dstet.example.com and dstet2.example.com) and set them up for Kerberos auth and replication (SASL mappings, create replication manager entries ..) 2. On both servers: create subsuffix ou=foo,dc=example,dc=com with new backend foo. 3. Enable replica foo: set both servers to multiple master, set Replica IDs and replication managers. 4. Create new replication agreements for replica foo: use LDAP (no encryption), SASL/GSSAPI for authentication 5. initialize consumers on both servers 6. On server1 add top level entry ou=foo,dc=example,dc=com. Search on server2 to verify replication: [jrusnack@dstet ~]$ ldapsearch -h dstet2.example.com -p 22222 -D "cn=directory manager" -w Secret123 -LLL -b "ou=foo,dc=example,dc=com" dn: ou=foo,dc=example,dc=com ou: foo objectClass: top objectClass: organizationalunit Entry successfully replicated from server1 to server2. 7. On server2 add user uid=testuser1,ou=foo,dc=example,dc=com. Search on server1 to verify replication: [jrusnack@dstet 389-scripts]$ ldapsearch -h dstet.example.com -p 22221 -D "cn=directory manager" -w Secret123 -LLL -b "uid=testuser,ou=foo,dc=example,dc=com" dn: uid=testuser,ou=foo,dc=example,dc=com uid: testuser givenName: testuser objectClass: top objectClass: person objectClass: organizationalPerson objectClass: inetorgperson sn: testuser cn: testuser Entry successfully replicated from server2 to server1. 8. Issue subtree search on ou=foo,dc=example,dc=com: [jrusnack@dstet 389-scripts]$ ldapsearch -h dstet2.example.com -p 22222 -D "cn=directory manager" -w Secret123 -b "ou=foo,dc=example,dc=com" -s sub # extended LDIF # # LDAPv3 # base <ou=foo,dc=example,dc=com> with scope subtree # filter: (objectclass=*) # requesting: ALL # # foo, example.com dn: ou=foo,dc=example,dc=com ou: foo objectClass: top objectClass: organizationalunit # search result search: 2 result: 0 Success # numResponses: 2 # numEntries: 1 [jrusnack@dstet 389-scripts]$ ldapsearch -h dstet.example.com -p 22221 -D "cn=directory manager" -w Secret123 -b "ou=foo,dc=example,dc=com" -s sub # extended LDIF # # LDAPv3 # base <ou=foo,dc=example,dc=com> with scope subtree # filter: (objectclass=*) # requesting: ALL # # foo, example.com dn: ou=foo,dc=example,dc=com ou: foo objectClass: top objectClass: organizationalunit # search result search: 2 result: 0 Success # numResponses: 2 # numEntries: 1 Search does not show entry uid=testuser1,ou=foo,dc=example,dc=com, and returns only ou=foo,dc=example,dc=com. With verbose logging this search generates this output in logs (error and access log): [11/Sep/2013:13:12:21 +0200] - => get_filter_internal [11/Sep/2013:13:12:21 +0200] - PRESENT [11/Sep/2013:13:12:21 +0200] - <= get_filter_internal 0 [11/Sep/2013:13:12:22 +0200] get_filter - before optimize: (objectClass=*) [11/Sep/2013:13:12:22 +0200] get_filter - after optimize: (objectClass=*) [11/Sep/2013:13:12:22 +0200] index_subsys_assign_filter_decoders - before: (objectClass=*) [11/Sep/2013:13:12:22 +0200] index_subsys_assign_filter_decoders - after: (objectClass=*) [11/Sep/2013:13:12:22 +0200] - slapi_str2filter "objectclass=referral" [11/Sep/2013:13:12:22 +0200] - slapi_str2filter: default [11/Sep/2013:13:12:22 +0200] - str2simple "objectclass=referral" [11/Sep/2013:13:12:22 +0200] - OR [11/Sep/2013:13:12:22 +0200] - PRESENT [11/Sep/2013:13:12:22 +0200] - EQUALITY [11/Sep/2013:13:12:22 +0200] - => slapi_attr_assertion2keys_ava_sv [11/Sep/2013:13:12:22 +0200] - <= slapi_attr_assertion2keys_ava_sv 0 [11/Sep/2013:13:12:22 +0200] - slapi_filter_free type 0xA1 [11/Sep/2013:13:12:22 +0200] - slapi_filter_free type 0xA3 [11/Sep/2013:13:12:22 +0200] - slapi_filter_dup type 0x87 [11/Sep/2013:13:12:22 +0200] - => slapi_vattr_filter_test_ext [11/Sep/2013:13:12:22 +0200] - => test_substring_filter [11/Sep/2013:13:12:22 +0200] - PRESENT [11/Sep/2013:13:12:22 +0200] - <= slapi_vattr_filter_test 0 [11/Sep/2013:13:12:22 +0200] - slapi_filter_free type 0x87 [11/Sep/2013:13:12:22 +0200] - slapi_filter_free type 0x87 [11/Sep/2013:13:16:08 +0200] conn=21 fd=69 slot=69 connection from ::1 to ::1 [11/Sep/2013:13:16:08 +0200] conn=21 op=0 BIND dn="cn=directory manager" method=128 version=3 [11/Sep/2013:13:16:08 +0200] conn=21 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager" [11/Sep/2013:13:16:08 +0200] conn=21 op=1 SRCH base="ou=foo,dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL [11/Sep/2013:13:16:08 +0200] conn=21 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U [11/Sep/2013:13:16:08 +0200] conn=21 op=2 UNBIND 9. Export db to ldif: [jrusnack@dstet 389-scripts]$ /usr/lib64/dirsrv/slapd-dstet/db2ldif -n foo Exported ldif file: /var/lib/dirsrv/slapd-dstet/ldif/dstet-foo-2013_09_11_131412.ldif ldiffile: /var/lib/dirsrv/slapd-dstet/ldif/dstet-foo-2013_09_11_131412.ldif [11/Sep/2013:13:14:12 +0200] - export foo: Processed 2 entries (100%). [11/Sep/2013:13:14:13 +0200] - All database threads now stopped [jrusnack@dstet 389-scripts]$ cat /var/lib/dirsrv/slapd-dstet/ldif/dstet-foo-2013_09_11_131412.ldif version: 1 # entry-id: 2 dn: ou=foo,dc=example,dc=com ou: foo objectClass: top objectClass: organizationalunit creatorsName: cn=directory manager modifiersName: cn=directory manager createTimestamp: 20130911110308Z modifyTimestamp: 20130911110308Z nsUniqueId: b5b0d981-1ad111e3-9b54eb82-8658ba38 # entry-id: 3 dn: uid=testuser,ou=foo,dc=example,dc=com uid: testuser givenName: testuser objectClass: top objectClass: person objectClass: organizationalPerson objectClass: inetorgperson sn: testuser cn: testuser creatorsName: cn=directory manager modifiersName: cn=directory manager createTimestamp: 20130911110450Z modifyTimestamp: 20130911110450Z nsUniqueId: db3de301-1ad111e3-9ef18a1c-6d145b1d 10. Restart server1 and search for ou=foo.. : [jrusnack@dstet 389-scripts]$ sudo service dirsrv stop Shutting down dirsrv: dstet... [ OK ] [jrusnack@dstet 389-scripts]$ sudo service dirsrv start Starting dirsrv: dstet... [ OK ] [jrusnack@dstet 389-scripts]$ ldapsearch -h dstet.example.com -p 22221 -D "cn=directory manager" -w Secret123 -b "ou=foo,dc=example,dc=com" -s sub # extended LDIF # # LDAPv3 # base <ou=foo,dc=example,dc=com> with scope subtree # filter: (objectclass=*) # requesting: ALL # # search result search: 2 result: 32 No such object # numResponses: 1 [jrusnack@dstet 389-scripts]$ ldapsearch -h dstet2.example.com -p 22222 -D "cn=directory manager" -w Secret123 -b "ou=foo,dc=example,dc=com" -s sub -LLL dn: ou=foo,dc=example,dc=com ou: foo objectClass: top objectClass: organizationalunit Entry is "missing" on server1 but is present on server2. 11. Add entry ou=foo,dc=example,dc=com on server1 and search: [jrusnack@dstet 389-scripts]$ ldapsearch -h dstet.example.com -p 22221 -D "cn=directory manager" -w Secret123 -b "ou=foo,dc=example,dc=com" -s sub -LLL dn: ou=foo,dc=example,dc=com ou: foo objectClass: top objectClass: organizationalunit [jrusnack@dstet 389-scripts]$ tail /var/log/dirsrv/slapd-dstet/errors ... [11/Sep/2013:13:14:53 +0200] - slapd stopped. [11/Sep/2013:13:14:58 +0200] - 389-Directory/1.2.11.15 B2013.240.1920 starting up [11/Sep/2013:13:14:58 +0200] - I'm resizing my cache now...cache was 20000000 and is now 8000000 [11/Sep/2013:13:15:00 +0200] - slapd started. Listening on All Interfaces port 22221 for LDAP requests [11/Sep/2013:13:18:24 +0200] NSMMReplicationPlugin - agmt="cn=foo" (dstet2:22222): Consumer failed to replay change (uniqueid cfebb701-1ad311e3-9b54eb82-8658ba38, CSN 5230517f000000010000): Operations error (1). Will retry later. 12. Export db to ldif: [jrusnack@dstet 389-scripts]$ /usr/lib64/dirsrv/slapd-dstet/db2ldif -n foo Exported ldif file: /var/lib/dirsrv/slapd-dstet/ldif/dstet-foo-2013_09_11_132022.ldif ldiffile: /var/lib/dirsrv/slapd-dstet/ldif/dstet-foo-2013_09_11_132022.ldif [11/Sep/2013:13:20:22 +0200] - export foo: Processed 3 entries (100%). [11/Sep/2013:13:20:22 +0200] - All database threads now stopped [jrusnack@dstet 389-scripts]$ cat /var/lib/dirsrv/slapd-dstet/ldif/dstet-foo-2013_09_11_132022.ldif version: 1 # entry-id: 2 dn: ou=foo,dc=example,dc=com ou: foo objectClass: top objectClass: organizationalunit creatorsName: cn=directory manager modifiersName: cn=directory manager createTimestamp: 20130911110308Z modifyTimestamp: 20130911110308Z nsUniqueId: b5b0d981-1ad111e3-9b54eb82-8658ba38 # entry-id: 3 dn: uid=testuser,ou=foo,dc=example,dc=com uid: testuser givenName: testuser objectClass: top objectClass: person objectClass: organizationalPerson objectClass: inetorgperson sn: testuser cn: testuser creatorsName: cn=directory manager modifiersName: cn=directory manager createTimestamp: 20130911110450Z modifyTimestamp: 20130911110450Z nsUniqueId: db3de301-1ad111e3-9ef18a1c-6d145b1d # entry-id: 4 dn: ou=foo,dc=example,dc=com ou: foo objectClass: top objectClass: organizationalunit creatorsName: cn=directory manager modifiersName: cn=directory manager createTimestamp: 20130911111822Z modifyTimestamp: 20130911111822Z nsUniqueId: cfebb701-1ad311e3-9b54eb82-8658ba38 There are 2 entries ou=foo,dc=example,dc=com that differ only in nsUniqueId.
Bug description: If a replication is configured against a backend before initializing the backend with a suffix entry, an RUV entry is inserted first with the entryid 1. The RUV entry's entryrdn is added to the entryrdn index with a suffix entry which is a parent entry of the RUV entry having a temporary entryid 0, which was to be replaced with the real entryid when the real suffix entry is added. But the replacement code was not executed.
Fix description: When a real suffix is added to the entryrdn index, it returns DB_KEYEXIST, which used to be ignored by resetting 0 (== SUCCESS). This patch returns DB_KEYEXIST to the caller and let _entryrdn_insert_key use the info to replace the temporary entryid
with the real one. The error code is ignored by the other callers.
git patch file (master) 0001-Ticket-47523-Set-up-replcation-agreement-before-init.patch
Reviewed by Nathan (Thank you!!)
Pushed to master: da59cff..e6eab21 master -> master commit e6eab21
Pushed to 389-ds-base-1.3.1: f7156e0..6b35dc7 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 6b35dc7
Pushed to 389-ds-base-1.2.11: 373e36a..df1f0ed 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit df1f0ed
Does this still handle correctly other cases where we want DB_KEYEXIST to be a real error? e.g. modrdn where the new rdn already exists? Other cases?
Replying to [comment:7 rmeggins]:
Yes, MODRDN handles it correctly: {{{ $ ldapsearch -LLLx [...] -b "ou=people,dc=example,dc=com" "(uid=tuser1)" dn dn: uid=tuser1,ou=subou1,ou=ou1,ou=People,dc=example,dc=com dn: uid=tuser1,ou=People,dc=example,dc=com dn: uid=tuser1,ou=ou1,ou=People,dc=example,dc=com
$ ldapmodify [...] dn: uid=tuser1,ou=ou1,ou=People,dc=example,dc=com changetype: modrdn newrdn: uid=tuser1 deleteoldrdn: 1 newsuperior: ou=People,dc=example,dc=com
modifying RDN of entry uid=tuser1,ou=ou1,ou=People,dc=example,dc=com and/or moving it beneath a new parent
ldap_rename: Already exists
dn: uid=tuser1,ou=subou1,ou=ou1,ou=People,dc=example,dc=com changetype: modrdn newrdn: uid=tuser1 deleteoldrdn: 1 newsuperior: ou=People,dc=example,dc=com
modifying RDN of entry uid=tuser1,ou=subou1,ou=ou1,ou=People,dc=example,dc=com and/or moving it beneath a new parent
[..] conn=9 op=1 MODRDN dn="uid=tuser1,ou=subou1,ou=ou1,ou=People,dc=example,dc=com" newrdn="uid=tuser1" newsuperior="ou=People,dc=example,dc=com" [..] conn=9 op=1 RESULT err=68 tag=109 nentries=0 etime=0 }}}
ok - ack
Metadata Update from @nhosoi: - Issue assigned to nhosoi - Issue set to the milestone: 1.2.11.23
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/860
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Login to comment on this ticket.