#1229 sssd_nss gets hung processing identical search requests
Closed: Fixed None Opened 12 years ago by jraquino.

There is a recurring intermittent issue that I have been experiencing with sssd 1.7: during or after a failure and recovery of a FreeIPA server, sssd fails to process a user and forgets to mark the status. Thus, when the FreeIPA server comes back, an affected user cannot log in. (This is particularly painful when this user is listed in pam_access as a user who is not allowed to log in, since it causes pam to stop processing while it waits for sssd to finish processing the pam_access user, and it never gets to the user who is actually logging in.) This issue seems to occur only in certain timing-dependent situations where the user is being accessed during an outage of the FreeIPA server.

(I am updating to 1.8 in the hope that it solves this problem.)

Here is a snippet from the sssd_nss.log:

(Mon Mar  5 14:43:08 2012) [sssd[nss]] [accept_fd_handler] (0x0100): Client connected!
(Mon Mar  5 14:43:08 2012) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
(Mon Mar  5 14:43:08 2012) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
(Mon Mar  5 14:43:08 2012) [sssd[nss]] [nss_cmd_getpwnam] (0x0100): Requesting info for [brokenuser] from [<ALL>]
(Mon Mar  5 14:43:08 2012) [sssd[nss]] [sss_ncache_check_str] (0x2000): Checking negative cache for [NCE/USER/example.com/brokenuser]
(Mon Mar  5 14:43:08 2012) [sssd[nss]] [nss_cmd_getpwnam_search] (0x0100): Requesting info for [brokenuser@example.com]
(Mon Mar  5 14:43:08 2012) [sssd[nss]] [sss_dp_get_account_send] (0x0400): Identical request in progress: [1:brokenuser@example.com]

This problem seems to stick around indefinitely (until you restart sssd, which is tough if you can't actually log in to the system).
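To check whether a responder is repeatedly hitting this state, the log can be scanned for the message in the last line of the excerpt. The following is only a minimal sketch: it assumes the log line layout shown above, and the helper name `count_identical_requests` is hypothetical, not part of sssd.

```python
import re
from collections import Counter

# Matches the message logged by sss_dp_get_account_send in the excerpt above
# and captures the request key, e.g. "1:brokenuser@example.com".
PATTERN = re.compile(
    r"\[sss_dp_get_account_send\].*Identical request in progress: \[([^\]]+)\]"
)


def count_identical_requests(lines):
    """Count "Identical request in progress" messages per request key."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the lines of sssd_nss.log to this function gives a per-request tally. A key whose count keeps growing while no reply is ever logged for it is a candidate for the stall described here.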

Fields changed

owner: somebody => simo
status: new => assigned

This was a nasty little bug to track down.
It turned out that someone forgot to update a cleanup function when the hash table location was moved elsewhere.

patch: 0 => 1

To reproduce, do the following:
1. Clear your caches (or expire them all) and start sssd.
2. Run ps xa | grep sssd to find the backend PID, then run kill -STOP <pid>.
3. Run getent passwd username. The sssd_nss responder will send a message to the backend, but the backend is stopped, so it will not act on it.
4. Kill the backend with kill -9 <pid>.
5. Run getent passwd username again in another terminal. Look at the nss logs (level 8) and you'll see that the call is stalled, waiting for a reply that will never come.

With the patch, the second name resolution causes the cleanup function to fire, and a brand new call to the backend is issued.
No more stalling.
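The failure mode can be illustrated with a toy model. This is not the sssd code: the `Responder` class, its attribute names, and the assumption that the cleanup runs when a lookup notices the backend is gone are all hypothetical, chosen only to show why a cleanup function that still points at the old hash table leaves a stale "identical request" entry behind.

```python
class Responder:
    """Toy model of a responder that de-duplicates in-flight requests."""

    def __init__(self, cleanup_targets_live_table):
        self.in_flight = {}   # live table: request key -> waiting callers
        self.old_table = {}   # previous location, no longer read by lookups
        self.cleanup_targets_live_table = cleanup_targets_live_table
        self.backend_alive = True
        self.sent = []        # requests actually forwarded to the backend

    def backend_died(self):
        self.backend_alive = False

    def _cleanup(self):
        # The buggy variant clears the table that nothing consults any more,
        # so stale entries survive in the live table.
        if self.cleanup_targets_live_table:
            self.in_flight.clear()
        else:
            self.old_table.clear()

    def lookup(self, key, caller):
        if not self.backend_alive:
            self._cleanup()
            self.backend_alive = True  # assume the backend has been restarted
        if key in self.in_flight:
            # Piggyback on the pending request and wait for its reply.
            self.in_flight[key].append(caller)
            return "identical request in progress"
        self.in_flight[key] = [caller]
        self.sent.append(key)
        return "sent to backend"


key = "1:brokenuser@example.com"

buggy = Responder(cleanup_targets_live_table=False)
buggy.lookup(key, "first getent")
buggy.backend_died()                       # kill -9 before any reply arrives
stalled = buggy.lookup(key, "second getent")

fixed = Responder(cleanup_targets_live_table=True)
fixed.lookup(key, "first getent")
fixed.backend_died()
reissued = fixed.lookup(key, "second getent")
```

In the buggy variant the second lookup attaches itself to a request whose reply will never arrive. In the fixed variant the stale entry is dropped and the request is sent again. A real implementation would also have to notify the callers that were waiting on the dropped entry, which this sketch leaves out.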

milestone: NEEDS_TRIAGE => SSSD 1.8.1 (LTM)

Fixed by:
- 65976ea (master)
- de9b723 (sssd-1-8)

component: SSSD => NSS
description: log excerpt wrapped in a {{{ }}} code block (text otherwise unchanged)
resolution: => fixed
status: assigned => closed

Ignore comment 7, it was meant for ticket #1227

Metadata Update from @jraquino:
- Issue assigned to simo
- Issue set to the milestone: SSSD 1.8.1 (LTM)

7 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/2271

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.
