Memory consumption of process sssd_be grows indefinitely when:
Unfortunately I am not aware of any further details. Have lots of machines running this version of sssd, but only those using enumeration seem to be affected.
Based on the version in the ticket, I assume this is el6. I am not aware of any leaks in sssd itself, but IIRC there might be a leak in libtevent which is fixed in el7.
https://bugzilla.redhat.com/show_bug.cgi?id=1324387
BTW the leak might be also caused by some corner case with AD and enabled enumeration.
I hit enter too fast. Is it reproducible on el7 as well? Could you try to rebuild libtevent-0.9.26-1.el7_2.1 and use it on el6?
If that does not help, it would be good to generate a talloc report with gdb. The assumption is that the leak is in sssd_be:

    gdb -ex 'call talloc_enable_null_tracking()' \
        -ex 'call talloc_report_full(0, debug_file)' \
        -ex 'detach' /usr/libexec/sssd/sssd_be \
        -ex 'quit' `pgrep sssd_be`

Try to run the gdb commands immediately after start and again later when sssd_be is idle; otherwise it would be difficult to compare the results.
ping
Hi, sorry, I have no time for this any more. I just disabled enumeration (it's a bad habit anyway), so I won't be investigating this further.
I am closing this ticket as cannot fix due to insufficient data. Feel free to re-open it and provide a log file with the talloc dumps described in the 2nd comment.
resolution: => cantfix status: new => closed
Hi,
I think we're hitting this bug on Debian Jessie with SSSD 1.11.7-3. I've already manually patched the libtevent bug (#1324387, mentioned in comment 1), which is also present on Debian, but we still see the memory of the sssd_be process slowly increasing day by day.
Comparing two talloc_report dumps I see many repetitions of "struct ldb_dn" for the same group, e.g:
On Nov 4th - 11:56 sssd was started.
This is the situation on Nov 4th at 14:59:
    # grep "name=.*GROUP1 " talloc_dump_nov4.log
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x2004460
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x2026230
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x2021bb0
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x200f930
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x1fc4090
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x1f48800
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x1fa6650
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x1f78130
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x1f84d30
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x1f14da0
    name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains 82 bytes in 1 blocks (ref 0) 0x1f15460
And on Nov 8th at 14:49, this increased to 363:
    # grep "name=.*GROUP1 " talloc_dump_nov8.log |wc -l
    363
This is just the output for 1 group "GROUP1" but it's happening for all groups we have defined in AD / that are looked up by SSSD. I assume this should not be happening? To be honest I don't know exactly how to interpret the talloc report, but I guess every group should only be there once and it should come from the SSSD cache?
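Rather than grepping one group at a time, the two dumps can be compared for every chunk name at once. The following is only a rough sketch: the function names talloc_name_counts and talloc_growth are made up for illustration, and it assumes each report line has the shape shown in the excerpts above ("<name> contains N bytes in M blocks (ref R) <address>"), possibly indented.

```shell
# Hypothetical helpers (not part of SSSD or talloc) for comparing two
# talloc report dumps. Assumed line shape, taken from the excerpts above:
#   <name> contains <N> bytes in <M> blocks (ref <R>) <address>

# Print "<count><TAB><name>" for every chunk name found in one dump.
talloc_name_counts() {
    awk '/ contains [0-9]+ bytes in / {
             name = $0
             sub(/ contains [0-9]+ bytes in .*/, "", name)  # drop size, refs, address
             sub(/^[ \t]+/, "", name)                       # drop tree indentation
             count[name]++
         }
         END { for (n in count) printf "%d\t%s\n", count[n], n }' "$1"
}

# Print names that occur more often in the newer dump than in the older one.
# Columns: growth, old count, new count, name. Largest growth first.
talloc_growth() {
    old_counts=$(mktemp) || return 1
    new_counts=$(mktemp) || return 1
    talloc_name_counts "$1" > "$old_counts"
    talloc_name_counts "$2" > "$new_counts"
    awk -F '\t' '
        FILENAME == ARGV[1] { old[$2] = $1 + 0; next }   # first file: remember old counts
        {
            before = ($2 in old) ? old[$2] : 0
            if ($1 + 0 > before)
                printf "%d\t%d\t%d\t%s\n", $1 - before, before, $1, $2
        }' "$old_counts" "$new_counts" | sort -rn
    rm -f "$old_counts" "$new_counts"
}

# Example call, using the file names from this comment:
#   talloc_growth talloc_dump_nov4.log talloc_dump_nov8.log | head -n 20
```

Names at the top of that list are the ones accumulating between the two reports. A name that grows does not prove a leak by itself, but it narrows down which allocations are worth looking at in the debug log.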
Is there anything specific you need from the talloc report? It looks a bit difficult to anonymize it completely; if this info is not yet enough, I can try to set up a test environment and reproduce it there.
Maybe/probably related: On this machine a keepalived/VRRP service check is doing an authentication check every 20 seconds of an AD user via SSSD.
This is SSSD with LDAP backend connected to AD and enumeration enabled.
Thx, Geert
resolution: cantfix => status: closed => reopened
I assume you use enumeration as well. My assumption is that an error occurred while processing a few groups, and therefore there might be leaks.
Could you provide a log file with a high debug level plus a few talloc reports (taken with reasonably long delays between them)? If the file is big, you can upload it somewhere and send me a private mail with a link to the log: my_nick at fedoraproject dot org
BTW it would also be good if you could test with the latest 1.12 or 1.13.
For completeness: issue I'm seeing is a leak of "struct sysdb_attrs" allocations on null_context, at least present in Debian package sssd-1.11.7-3.
Issue can not be reproduced with latest 1.14.2 & 1.13.5 (git).
Please close again and sorry for noise :) Thx for your help Lukas!
Since the issue is confirmed to be resolved by upgrading, I'm closing this ticket.
resolution: => worksforme status: reopened => closed
Metadata Update from @ondrejv2: - Issue set to the milestone: NEEDS_TRIAGE
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: https://github.com/SSSD/sssd/issues/4209
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.