#5464 ipa-extdom-extop plugin can exhaust DS worker threads
Closed: fixed 6 years ago Opened 8 years ago by tbordaz.

ipa-extdom-extop is used to resolve AD trust users/groups. It does this using libnss calls like getpwnam, getgrnam, etc.

libnss calls are serialized by a simple lock and each call can last a long time because it has to get info from SSSD/AD.

If a DS server is flooded with "IPA trusted domain ID mapper" extops, many worker threads will be busy for a long time. The worst case is when all the workers are busy with such extops.
DS is then no longer able to process other requests and appears to hang transiently.
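
The effect can be reproduced with a small simulation. This is only a sketch: `slow_lookup` and `delay_of_unrelated_request` are hypothetical names, the thread pool stands in for the DS worker threads, and `time.sleep` stands in for a lookup that waits on SSSD/AD.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Stands in for the simple lock that serializes libnss calls (assumption).
nss_lock = threading.Lock()


def slow_lookup(duration):
    """Hypothetical stand-in for an extdom request waiting on SSSD/AD."""
    with nss_lock:
        time.sleep(duration)


def delay_of_unrelated_request(workers, extdom_requests, lookup_time):
    """Return how long an unrelated, instant request waits for a free worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(extdom_requests):
            pool.submit(slow_lookup, lookup_time)
        submitted = time.monotonic()
        # This request does no work; it only records when a worker picked it up.
        started = pool.submit(time.monotonic).result()
    return started - submitted
```

With 2 workers and 4 queued lookups, the unrelated request cannot start until three of the serialized lookups have completed, which is the transient hang described above on a small scale.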

ipa-extdom-extop should handle these extops with its own threads (possibly like persistent searches) so that they do not impact DS.


4.4.0 was released, moving open tickets to 4.4.1

This should be implemented by creating a worker thread controlled by the plugin that queries the synchronous interface serially, and then when an answer is received passes the data to a thread pool of worker threads used to send results to clients.

Unfortunately, in order to do that we need helpers from DS to be able to manage connections; this 389 ticket blocks progress: https://fedorahosted.org/389/ticket/47661
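
The proposed design (one plugin-controlled thread that queries the synchronous interface serially, plus a pool of threads that send results to clients) could look roughly like the sketch below. It is not the plugin's code: `ExtdomDispatcher`, `lookup` and `send` are hypothetical names, and the DS connection-management helpers mentioned above are not modelled.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class ExtdomDispatcher:
    """One lookup thread drains a queue; a small pool sends the replies."""

    def __init__(self, lookup, send, senders=4):
        self._lookup = lookup  # blocking, serialized call (e.g. an NSS wrapper)
        self._send = send      # callable(request, result) replying to the client
        self._requests = queue.Queue()
        self._senders = ThreadPoolExecutor(max_workers=senders)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, request):
        """Called from a server worker thread; returns immediately."""
        self._requests.put(request)

    def _run(self):
        while True:
            request = self._requests.get()
            if request is None:  # shutdown sentinel
                break
            try:
                result = self._lookup(request)
            except Exception as exc:  # pass failures on instead of killing the thread
                result = exc
            self._senders.submit(self._send, request, result)

    def close(self):
        """Finish queued requests, then stop the lookup thread and the senders."""
        self._requests.put(None)
        self._thread.join()
        self._senders.shutdown(wait=True)
```

The point of the split is that a server worker only pays for `submit()`, so slow lookups queue up inside the plugin instead of occupying DS workers.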

moving out tickets not implemented in 4.4.1

4.4.2 is a stabilization milestone. If this bug is important stabilization bug then please put it to NEEDS TRIAGE milestone for retriage.

Just for the record:
- Even a small number of SSSD clients can exhaust the workers, because the same client can abandon an EXTDOM request and then issue a new EXTDOM request over a new connection. The problem is that 389-ds will not detect the abandon until the SSSD/AD lookup returns, so the abandoned request consumes a worker until the AD/SSSD lookup times out (default is 5 min).

  • A successful workaround is to reduce the SSSD->AD timeout to 5s
  • An improvement on the SSSD side would be to provide an nss call with a timeout, so that 389-ds can time out on its own rather than relying on SSSD/AD tuning
  • Another fix that was discussed is to maintain a global counter at the EXTDOM plugin level. When the counter hits a configurable limit, the worker loops on sleep until either the counter goes down or the operation is abandoned. This fix should be quite easy to implement
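
The global-counter idea from the last bullet can be sketched as follows. This is an illustration under assumptions, not the fix that was committed: `ExtdomLimiter`, `is_abandoned` and `poll_interval` are hypothetical names, and a condition variable with a timeout plays the role of the "loop on sleep".

```python
import threading


class ExtdomLimiter:
    """Caps how many worker threads may be inside an extdom lookup at once."""

    def __init__(self, max_in_flight):
        self._max = max_in_flight  # the configurable limit
        self._in_flight = 0        # the global counter
        self._cond = threading.Condition()

    def acquire(self, is_abandoned, poll_interval=0.1):
        """Wait for a free slot; return False if the operation was abandoned."""
        with self._cond:
            while self._in_flight >= self._max:
                if is_abandoned():
                    return False
                # Wake up periodically to re-check the abandon flag.
                self._cond.wait(timeout=poll_interval)
            if is_abandoned():
                # No point starting a lookup for an abandoned operation.
                return False
            self._in_flight += 1
            return True

    def release(self):
        """Give the slot back once the lookup has returned."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify()
```

A worker would call `acquire()` before the NSS lookup and `release()` afterwards (for example in a `finally` block), skipping the lookup entirely when `acquire()` returns `False`.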

Metadata Update from @tbordaz:
- Issue assigned to tbordaz
- Issue set to the milestone: FreeIPA 4.5 backlog

7 years ago

master:

  • 78ad1cf ipa-extdom-extop: refactor nsswitch operations

Metadata Update from @abbra:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

ipa-4-6:

  • d1dd794 ipa-extdom-extop: refactor nsswitch operations

ipa-4-5:

  • a2da9f9 ipa-extdom-extop: refactor nsswitch operations
