Hello,
Under certain circumstances (cold cache, empty cache) the NSS responder answers with the wrong number of secondary groups when queried with a lot of parallel requests.
There has been a discussion about this issue on the mailing list:
https://lists.fedorahosted.org/pipermail/sssd-users/2015-April/002823.html
Someone else managed to reproduce the bug with my python script (Chris Petty, using ldap + ad; we are only using ldap with the rfc2307 schema on our side), so I think it is a good time to open an official bug.
For me, that's a blocker (it means we cannot use sssd on our compute cluster, we'll use a dump of passwd/group/shadow) but feel free to adjust the priority of this bug.
I've got a test case that does not involve any third party software. It is quite reproducible on my machine. Since it looks like a race, you may need to tweak the parameters of the python script.
The basic idea is to run a bunch of processes, each of which waits for a short random amount of time before calling the initgroups libc function for a specific user.
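For readers without the attachment, here is a minimal sketch of that idea. It is not the attached initgroups.py: the function names (`count_secondary_groups`, `check_once`, `run_test`) are hypothetical, and counting "every group except the primary gid" after `os.initgroups` is my assumption about how the check could be done. The delay is drawn in the parent so each child gets its own value.

```python
import multiprocessing
import os
import random
import sys
import time


def count_secondary_groups(login, primary_gid):
    """Call initgroups(3) for login and count the groups other than primary_gid.

    Needs root: os.initgroups replaces the group list of the calling process.
    """
    os.initgroups(login, primary_gid)
    return len(set(os.getgroups()) - {primary_gid})


def check_once(login, primary_gid, expected, delay_ms, lookup=count_secondary_groups):
    """Sleep delay_ms, run the lookup, and return True if the count matches."""
    time.sleep(delay_ms / 1000.0)
    found = lookup(login, primary_gid)
    if found != expected:
        print("wrong number of secondary groups in process %d : %d instead of %d (sleep %dms)"
              % (os.getpid(), found, expected, delay_ms))
        return False
    return True


def _child(login, primary_gid, expected, delay_ms):
    # Report the result to the parent through the exit status.
    sys.exit(0 if check_once(login, primary_gid, expected, delay_ms) else 1)


def run_test(login, primary_gid, expected, num_procs, max_delay_ms):
    """Start num_procs processes at once and return how many saw a wrong count."""
    procs = []
    for _ in range(num_procs):
        delay_ms = random.randint(0, max_delay_ms)  # randomized up to the maximum
        procs.append(multiprocessing.Process(
            target=_child, args=(login, primary_gid, expected, delay_ms)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    failed = sum(1 for p in procs if p.exitcode != 0)
    print("%d/%d failed" % (failed, num_procs))
    return failed

# Example (as root, after cleaning the sssd state): run_test("jbdenis", 110, 5, 24, 200)
```

The `lookup` argument only exists so the comparison logic can be exercised without root or a running sssd.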
You have to log in as root and not use sudo, to prevent the sssd cache from being populated before the test is started. You also need to clean up the sssd state before running the test.
usage:
## log in as root
## check the number of secondary groups for a user, using id for example
# id jbdenis
uid=21489(jbdenis) gid=110(sis) groups=110(sis),3044(CIB),19(floppy),1177(dump-projets),56(netadm),3125(vpn-ssl-admin)
Here I've got 5 secondary groups (sis is my primary group)
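If you would rather not count by hand, the number can be derived from the `id` output. This is only a sketch: `secondary_gids` is a hypothetical helper, and it assumes group names contain no spaces (which may not hold for some AD groups).

```python
import re


def secondary_gids(id_output):
    """Return the gids listed in `groups=` that differ from the primary `gid=`."""
    gid = re.search(r"\bgid=(\d+)", id_output)
    groups = re.search(r"\bgroups=(\S+)", id_output)
    if gid is None or groups is None:
        raise ValueError("unexpected id output: %r" % id_output)
    primary = int(gid.group(1))
    result = []
    for entry in groups.group(1).split(","):
        number = re.match(r"\d+", entry)  # entries look like 3044(CIB)
        if number is None:
            raise ValueError("unexpected group entry: %r" % entry)
        if int(number.group()) != primary:
            result.append(int(number.group()))
    return result


line = ("uid=21489(jbdenis) gid=110(sis) groups=110(sis),3044(CIB),19(floppy),"
        "1177(dump-projets),56(netadm),3125(vpn-ssl-admin)")
print(len(secondary_gids(line)))  # third parameter for the test script
```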
## !! VERY IMPORTANT !! cleanup sssd state
# /etc/init.d/sssd stop && rm -f /var/lib/sss/mc/* /var/lib/sss/db/* && /etc/init.d/sssd start
## run this program
# python initgroups.py jbdenis 110 5 24 200
wrong number of secondary groups in process 17145 : 0 instead of 5 (sleep 55ms)
wrong number of secondary groups in process 17149 : 0 instead of 5 (sleep 55ms)
2/24 failed
# first parameter is a login
# second parameter is your primary gid (could be anything)
# third parameter is your number of secondary groups
# fourth parameter is the number of processes you want to run concurrently
# the last parameter is the maximum delay in milliseconds before calling initgroups (the delay is randomized up to this maximum)
I've got good results with 24 processes and a randomized delay of up to 200ms before each call. Those parameters probably depend on the machine you're running the script on. You may have to run this test multiple times before triggering the bug.
I'm unable to reproduce the bug when I use 0 delay, and I think that's why we could reproduce it with our initial test case.
I've reproduced the bug with 1.12.4, 1.11.6 and 1.9.7.
Here is the output from Chris Petty on the mailing list :
I actually tried it and it was reproducible on my system using sssd 1.11.6 ( ad and ldap config ).
[root@dirac linux]# python initgroups.py cmp12 119549 95 24 200
wrongs number of secondary groups in process 4363 : 5 instead of 95 (sleep 78ms)
wrongs number of secondary groups in process 4366 : 5 instead of 95 (sleep 95ms)
wrongs number of secondary groups in process 4353 : 5 instead of 95 (sleep 90ms)
wrongs number of secondary groups in process 4362 : 5 instead of 95 (sleep 108ms)
wrongs number of secondary groups in process 4358 : 5 instead of 95 (sleep 110ms)
wrongs number of secondary groups in process 4371 : 5 instead of 95 (sleep 121ms)
attachment sssd.conf
attachment initgroups.py
Linked to Bugzilla bug: https://bugzilla.redhat.com/show_bug.cgi?id=1215765 (Red Hat Enterprise Linux 6)
rhbz: => [https://bugzilla.redhat.com/show_bug.cgi?id=1215765 1215765]
Lukas was able to reproduce with the help of the reporter.
owner: somebody => lslebodn
attachment ldap-init.ldif
attachment slapd-minimal.conf
attachment sssd-minimal.conf
Just to keep everything within this ticket :
We've got a "recipe" and configuration files to reproduce the bug from scratch, on a vanilla CentOS 6 distro (the ldap part is inspired by http://wiki.openiam.com/pages/viewpage.action?pageId=7635198)
# yum install sssd sssd-common openldap-servers openldap-clients perl-LDAP.noarch
# cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
# chown -R ldap:ldap /var/lib/ldap
# cd /etc/openldap && mv slapd.d slapd.d.original
# cp /root/slapd-minimal.conf /etc/openldap/slapd.conf # use the one provided with this message
# chown ldap:ldap /etc/openldap/slapd.conf
# chmod 600 /etc/openldap/slapd.conf
# Add this line in /etc/sysconfig/ldap
SLAPD_OPTIONS="-h \"ldap://127.0.0.1 ldaps://127.0.0.1\""
# service slapd start
# chkconfig slapd on
Check that you can connect (the Manager password is "openldap") :
# ldapsearch -h localhost -x -w openldap -D 'cn=Manager,dc=example,dc=com' -b 'dc=example,dc=com' 'objectclass=*'
Time to populate our ldap server with our provided file (one user "user1" with password "openldap" belonging to 29 secondary groups):
# ldapadd -h localhost -x -w openldap -D 'cn=Manager,dc=example,dc=com' -f /root/ldap-init.ldif
You can check that everything went fine with the previous ldapsearch command.
Copy our sssd configuration file:
# cp /root/sssd-minimal.conf /etc/sssd/sssd.conf
# chown root:root /etc/sssd/sssd.conf && chmod 600 /etc/sssd/sssd.conf
# service sssd start
# chkconfig sssd on
# # not sure if the authconfig is strictly necessary here
# authconfig --enablesssd --enablesssdauth --enablelocauthorize --enablemkhomedir --enablepamaccess --updateall --nostart
# service sssd restart
In /etc/nsswitch.conf, check for :
passwd: files sss
shadow: files sss
group: files sss

# cat /etc/sssd/sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = ldap_local

[nss]
filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd
override_shell = /bin/bash

[pam]

[domain/ldap_local]
override_homedir = /home/%u
auth_provider = ldap
ldap_schema = rfc2307
ldap_search_base = ou=people,dc=example,dc=com
ldap_group_search_base = ou=group,dc=example,dc=com
id_provider = ldap
ldap_uri = ldap://localhost/
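The nsswitch.conf check can also be scripted. A small sketch, where `missing_sss` is a hypothetical helper name and the parsing is deliberately simple (it ignores comments and only looks for the bare word `sss` among the sources):

```python
def missing_sss(nsswitch_text, databases=("passwd", "shadow", "group")):
    """Return the databases whose nsswitch.conf entry does not list the sss source."""
    sources = {}
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        sources[name.strip()] = rest.split()
    return [db for db in databases if "sss" not in sources.get(db, [])]


# Typical use on the test machine:
# with open("/etc/nsswitch.conf") as f:
#     print(missing_sss(f.read()))  # an empty list means all three use sss
```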
You can now run your script or mine. Just adapt the initgroups.py call or use the one provided with this message:
python initgroups.py user1 50001 29 $num_proc $delay
And run:
# ./run_initgroups.sh
Stopping sssd: [ OK ]
Starting sssd: [ OK ]
.wrongs number of secondary groups in process 17626 : 0 instead of 29 (sleep 16ms)
wrongs number of secondary groups in process 17630 : 0 instead of 29 (sleep 26ms)
wrongs number of secondary groups in process 17634 : 0 instead of 29 (sleep 49ms)
wrongs number of secondary groups in process 17615 : 0 instead of 29 (sleep 53ms)
4/24 failed
OR
# ./reproduce.sh
Stopping sssd: [ OK ]
Starting sssd: [ OK ]
wrongs number of secondary groups in process 15664 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15672 : 0 instead of 29 (sleep 9ms)
wrongs number of secondary groups in process 15673 : 0 instead of 29 (sleep 10ms)
3/20 failed
Stopping sssd: [ OK ]
Starting sssd: [ OK ]
wrongs number of secondary groups in process 15747 : 0 instead of 29 (sleep 3ms)
wrongs number of secondary groups in process 15734 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15735 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15748 : 0 instead of 29 (sleep 3ms)
wrongs number of secondary groups in process 15743 : 0 instead of 29 (sleep 7ms)
wrongs number of secondary groups in process 15745 : 0 instead of 29 (sleep 7ms)
wrongs number of secondary groups in process 15736 : 0 instead of 29 (sleep 5ms)
wrongs number of secondary groups in process 15742 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15731 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15732 : 0 instead of 29 (sleep 14ms)
wrongs number of secondary groups in process 15739 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15749 : 0 instead of 29 (sleep 4ms)
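The wrapper scripts themselves are not reproduced in this ticket. As a rough sketch of what one iteration of such a wrapper does (stop sssd, remove the cache files, start sssd, run the test), assuming the paths from the cleanup command given earlier in this ticket; `reset_sssd` and `run_reproducer` are hypothetical names, and what exit status initgroups.py returns is not something I am asserting here:

```python
import glob
import os
import subprocess

# Cache locations taken from the cleanup command earlier in this ticket.
CACHE_GLOBS = ("/var/lib/sss/mc/*", "/var/lib/sss/db/*")


def reset_sssd(run=subprocess.check_call, remove=os.remove, find=glob.glob):
    """Stop sssd, delete its on-disk caches, and start it again (needs root)."""
    run(["/etc/init.d/sssd", "stop"])
    for pattern in CACHE_GLOBS:
        for path in find(pattern):
            remove(path)
    run(["/etc/init.d/sssd", "start"])


def run_reproducer(login, gid, ngroups, num_proc, delay_ms,
                   run=subprocess.call, reset=reset_sssd):
    """Reset the sssd state, then launch initgroups.py with the five parameters."""
    reset()
    cmd = ["python", "initgroups.py", login,
           str(gid), str(ngroups), str(num_proc), str(delay_ms)]
    return run(cmd)  # exit status of the script, passed through unchanged

# Example (as root, in the directory holding initgroups.py):
# run_reproducer("user1", 50001, 29, 24, 200)
```

The `run`, `remove`, `find` and `reset` arguments are only there so the sequence can be checked without touching a real sssd installation.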
Fields changed
milestone: NEEDS_TRIAGE => SSSD 1.12.5
patch: 0 => 1 status: new => assigned
Fixed upstream.
master:
- dca7411
- d0cc678
- fd60528
- 390de02
and sssd-1-12:
- cd4e784
- 9ae6567
- 521eb7c
- 21431d9
- c3d7e06
- 17f2f1c
- eb6be4e
resolution: => fixed status: assigned => closed
Metadata Update from @jbd: - Issue assigned to lslebodn - Issue set to the milestone: SSSD 1.12.5
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/3675
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.