#2252 Document that `sssd` cache needs to be cleared manually, if ID mapping configuration changes
Closed: Fixed. Opened 10 years ago by jhrozek.

Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 7): Bug 1060389

Description of problem:

Sometimes the `sssd` cache has to be cleared manually, for example after
changing the `sssd` configuration or when `sssd` fails to start. When the cache
must be cleared manually because `sssd` finds it unusable (a common issue), the
customer would expect one of two things:

a) sssd logs the reason for the startup failure somewhere less hidden.  It
should not be necessary to set debug_level on the daemon and infer the meaning
from one of the files inside /var/log/sssd/.  systemd already suggests checking
the logs with 'systemctl status sssd.service' or 'journalctl -xn', so the
reason should be visible there.  At present sssd simply logs:

    Jan 15 10:36:13 abc.xyz.com systemd[1]: Starting System Security Services Daemon...
    Jan 15 10:36:13 abc.xyz.com sssd[20912]: Starting up
    Jan 15 10:36:13 abc.xyz.com sssd[be[20913]: Starting up
    Jan 15 10:36:13 abc.xyz.com sssd[be[20914]: Starting up
    Jan 15 10:36:15 abc.xyz.com sssd[be[20915]: Starting up
    Jan 15 10:36:19 abc.xyz.com sssd[be[20920]: Starting up
    Jan 15 10:36:19 abc.xyz.com systemd[1]: sssd.service: control process exited, code=exited status=1
    Jan 15 10:36:19 abc.xyz.com systemd[1]: Failed to start System Security Services Daemon.
    Jan 15 10:36:19 abc.xyz.com systemd[1]: Unit sssd.service entered failed state.

Note: `sssd` should log an additional entry along the lines of:

    Jan 15 10:36:19 abc.xyz.com sssd[20920]: Unable to start, local cache is unusable.  Try rm -f /var/lib/sss/db/*cache* and restarting sssd.
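
Before running the `rm -f /var/lib/sss/db/*cache*` command suggested in that log entry, an admin may want to see which files the glob would match. The following is only a rough sketch of such a preview; `list_cache_files` is a hypothetical helper written for this ticket, not part of sssd, and the default directory is simply the path quoted above.

```python
import glob
import os


def list_cache_files(db_dir="/var/lib/sss/db"):
    """Return the sorted paths that the glob <db_dir>/*cache* matches.

    Hypothetical helper for illustration only; it deletes nothing.
    """
    pattern = os.path.join(db_dir, "*cache*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Print one candidate per line so the list can be reviewed first.
    for path in list_cache_files():
        print(path)
```

Files in the directory whose names do not contain "cache" are left out of the list, just as the shell glob would leave them untouched.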

Right now, even after knowing to turn up debug_level and look inside
/var/log/sssd/, it is not readily apparent that the failures reported are the
result of a cache that sssd finds unusable.

b) Don't fail on an unusable cache, but clean it up for the user.  If the cache
file is unusable/corrupt to the point that sssd literally cannot even start,
then the cache is providing no benefit to the user or system.  Rather than
crashing, sssd should clear the cache files (or move them to
/var/lib/sss/db-corrupt) and continue its startup.  This action should be
logged to the system log so log monitoring tools can capture the event.  I am
sure somewhere someone will not want the cache auto-cleared, but that is easily
addressed by a config file flag to disable the auto-cleanup.
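
The behaviour requested in (b) could look roughly like the sketch below. It is an illustration of the idea only, not how sssd is implemented: the function name `quarantine_cache`, the `auto_cleanup` flag (standing in for the proposed config file option) and the `log` callback (standing in for the system log) are all assumptions made for this example.

```python
import glob
import os
import shutil


def quarantine_cache(db_dir="/var/lib/sss/db",
                     corrupt_dir="/var/lib/sss/db-corrupt",
                     auto_cleanup=True,
                     log=print):
    """Move files matching <db_dir>/*cache* into corrupt_dir.

    Returns the list of new paths. Does nothing when auto_cleanup is False,
    which models the opt-out flag proposed in the ticket (hypothetical name).
    """
    if not auto_cleanup:
        return []
    moved = []
    for src in sorted(glob.glob(os.path.join(db_dir, "*cache*"))):
        # Create the quarantine directory only when there is something to move.
        os.makedirs(corrupt_dir, exist_ok=True)
        dst = os.path.join(corrupt_dir, os.path.basename(src))
        shutil.move(src, dst)
        moved.append(dst)
        # Report each moved file so log monitoring tools can pick it up.
        log("unusable cache moved: %s -> %s" % (src, dst))
    return moved
```

A startup path would call this once after detecting an unusable cache and then continue with an empty cache; moving rather than deleting keeps the old files available for later analysis.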

Options (a) and (b) are mutually exclusive.  While (a) is a minimal and
hopefully relatively easy option to accomplish, (b) would provide the host a
more robust identity lookup service and reduce admin frustration.

Regards,
Chinmay Paradkar

Fields changed

design_review: => 0
milestone: NEEDS_TRIAGE => SSSD 1.13 beta
review: True => 0
testsupdated: => 0

Fields changed

milestone: SSSD 1.13 beta => SSSD 1.11.5

Bumping the priority as this patch was requested by downstream.

priority: major => critical

Fields changed

owner: somebody => jhrozek
patch: 0 => 1
status: new => assigned

resolution: => fixed
status: assigned => closed

Metadata Update from @jhrozek:
- Issue assigned to jhrozek
- Issue set to the milestone: SSSD 1.11.5

7 years ago

SSSD is moving from Pagure to GitHub. New issues and pull requests will be
accepted only in SSSD's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/3294

To receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
