#3176 Memory leak in SSSD
Closed: Invalid. Opened 7 years ago by ondrejv2.

Memory consumption of process sssd_be grows indefinitely when:

  • AD backend selected
  • enumeration turned on

Unfortunately I am not aware of any further details. I have lots of machines running this version of sssd, but only those using enumeration seem to be affected.
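
For reference, this is roughly the kind of configuration involved; a minimal sketch with invented names, not taken from the report:

    [sssd]
    services = nss, pam
    domains = example.com

    [domain/example.com]
    id_provider = ad
    enumerate = true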


Based on the version in the ticket I assume it is el6.
I am not aware of leaks in sssd itself, but IIRC there might be a leak in libtevent which is fixed in el7:

https://bugzilla.redhat.com/show_bug.cgi?id=1324387

BTW, the leak might also be caused by some corner case with AD and enumeration enabled.

I hit enter too fast.
Is it reproducible on el7 as well?
Could you try to rebuild libtevent-0.9.26-1.el7_2.1 and use it on el6?
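
A rough sketch of that rebuild, assuming the el7 source rpm is fetched from the CentOS vault (URL and exact steps are not from this ticket):

    # fetch libtevent-0.9.26-1.el7_2.1.src.rpm (location is an assumption), then on the el6 box:
    rpmbuild --rebuild libtevent-0.9.26-1.el7_2.1.src.rpm
    # install the resulting packages from ~/rpmbuild/RPMS/ and restart sssd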

If it does not help, it would be good to generate a talloc report with gdb.
The assumption is that the leak is in sssd_be:

gdb -ex 'call talloc_enable_null_tracking()' \
    -ex 'call talloc_report_full(0, debug_file)' \
    -ex 'detach' /usr/libexec/sssd/sssd_be \
    -ex 'quit' `pgrep sssd_be`

Try to run the gdb commands immediately after start and again later when sssd_be is idle; otherwise it would be difficult to compare results.
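
A possible way to capture the two snapshots for comparison (the delay and the batch-mode invocation are my own assumptions, not from the ticket; the report is written into sssd_be's debug log):

    # take a talloc snapshot now and another one later, e.g. an hour apart
    for i in 1 2; do
        gdb -batch \
            -ex 'call talloc_enable_null_tracking()' \
            -ex 'call talloc_report_full(0, debug_file)' \
            -ex 'detach' \
            /usr/libexec/sssd/sssd_be `pgrep sssd_be`
        [ "$i" = 1 ] && sleep 3600    # arbitrary delay between snapshots
    done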

cc: => lslebodn

Hi, sorry.
No time for this any more: I just disabled enumeration (it's a bad habit anyway), so I won't be investigating this further...

I am closing this ticket as cannot-fix due to insufficient data. Feel free to re-open and provide a log file with the talloc dumps described in the 2nd comment.

resolution: => cantfix
status: new => closed

Hi,

I think we're hitting this bug on Debian Jessie with SSSD 1.11.7-3. I've already manually patched the libtevent bug (#1324387, mentioned in comment 1), which is also present on Debian, but we still see memory slowly increasing day by day in the sssd_be process.

Comparing two talloc_report dumps, I see many repetitions of "struct ldb_dn" for the same group, e.g.:

SSSD was started on Nov 4th at 11:56.

This is the situation on Nov 4th at 14:59:

# grep "name=.*GROUP1 " talloc_dump_nov4.log
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x2004460
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x2026230
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x2021bb0
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x200f930
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1fc4090
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f48800
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1fa6650
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f78130
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f84d30
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f14da0
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f15460

And on Nov 8th at 14:49, this increased to 363:

# grep "name=.*GROUP1 " talloc_dump_nov8.log |wc -l
363

This is just the output for one group, "GROUP1", but it's happening for all groups defined in AD that are looked up by SSSD.
I assume this should not be happening? To be honest, I don't know exactly how to interpret the talloc report, but I guess every group should only appear once, served from the SSSD cache?
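
Something along these lines (assuming the dump format shown above) should list the most-repeated group DNs across all groups, not just GROUP1:

    # strip the size/address part and count identical DN lines
    grep 'cn=groups,cn=' talloc_dump_nov8.log \
        | sed 's/ contains .*//' \
        | sort | uniq -c | sort -rn | head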

Is there anything specific you need from the talloc report? It looks a bit difficult to anonymize it completely; if this info is not enough, I can try to set up a test environment and reproduce it there.

Maybe/probably related: On this machine a keepalived/VRRP service check is doing an authentication check every 20 seconds of an AD user via SSSD.
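
For context, such a check might look roughly like the script below; this is purely illustrative (the user name is invented, and it only performs a user lookup through SSSD rather than a real authentication):

    #!/bin/sh
    # hypothetical keepalived MISC_CHECK script, run every 20 seconds;
    # a failing lookup of the AD user marks the service as down
    getent passwd monitor_user@REALM.DOMAIN.COM > /dev/null || exit 1
    exit 0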

This is SSSD with LDAP backend connected to AD and enumeration enabled.
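
In sssd.conf terms that means roughly the following backend stanza rather than id_provider = ad (again only a sketch; the server name is invented):

    [domain/REALM.DOMAIN.COM]
    id_provider = ldap
    ldap_uri = ldap://dc1.realm.domain.com
    ldap_schema = ad          # assumption: the AD schema mapping
    enumerate = true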

Thx,
Geert

resolution: cantfix =>
status: closed => reopened

I assume you use enumeration as well.
My assumption is that there was some error while processing a few groups, and therefore there might be leaks.

Could you provide a log file with a high debug level plus a few talloc reports (with reasonably long delays between them)?
If the file is big, you can upload it somewhere and send me a private mail with a link to the log: my_nick at fedoraproject dot org
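
One way to raise the debug level on a 1.11-era install is sketched below (the value 9 is just a suggestion):

    # bump the debug level of the running sssd services without a restart
    sss_debuglevel 9

    # or persistently, in the affected [domain/...] section of sssd.conf:
    #   debug_level = 9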

BTW, it would also be good if you could test with the latest 1.12 or 1.13.

For completeness: the issue I'm seeing is a leak of "struct sysdb_attrs" allocations on the null_context, at least present in the Debian package sssd 1.11.7-3.
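
A quick way to confirm that from the dumps mentioned above is simply counting those blocks, e.g.:

    # count "struct sysdb_attrs" allocations reported in each dump
    grep -c 'struct sysdb_attrs' talloc_dump_nov4.log talloc_dump_nov8.log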

The issue cannot be reproduced with the latest 1.14.2 and 1.13.5 (git).

Please close again and sorry for noise :) Thx for your help Lukas!

Since the issue is confirmed to be resolved by upgrading, I'm closing this ticket.

resolution: => worksforme
status: reopened => closed

Metadata Update from @ondrejv2:
- Issue set to the milestone: NEEDS_TRIAGE

7 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/4209

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
