#3176 Memory leak in SSSD
Closed: Invalid. Opened 7 years ago by ondrejv2.

Memory consumption of process sssd_be grows indefinitely when:

  • AD backend selected
  • enumeration turned on

Unfortunately I am not aware of any further details. I have lots of machines running this version of sssd, but only those using enumeration seem to be affected.
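
For reference, this is roughly the kind of configuration involved; a minimal sketch with invented names, not taken from the report:

    [sssd]
    services = nss, pam
    domains = example.com

    [domain/example.com]
    id_provider = ad
    enumerate = true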


Based on the version in the ticket I assume it is el6.
I am not aware of leaks in sssd itself, but IIRC there might be a leak in libtevent which is fixed in el7:

https://bugzilla.redhat.com/show_bug.cgi?id=1324387

BTW, the leak might also be caused by some corner case with AD and enumeration enabled.

I hit enter too fast.
Is it reproducible on el7 as well?
Could you try to rebuild libtevent-0.9.26-1.el7_2.1 and use it on el6?
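
A rough sketch of that rebuild, assuming the el7 source rpm is fetched from the CentOS vault (URL and exact steps are not from this ticket):

    # fetch libtevent-0.9.26-1.el7_2.1.src.rpm (location is an assumption), then on the el6 box:
    rpmbuild --rebuild libtevent-0.9.26-1.el7_2.1.src.rpm
    # install the resulting packages from ~/rpmbuild/RPMS/ and restart sssd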

If it does not help, it would be good to generate a talloc report with gdb.
The assumption is that the leak is in sssd_be:

gdb -ex 'call talloc_enable_null_tracking()' \
    -ex 'call talloc_report_full(0, debug_file)' \
    -ex 'detach' /usr/libexec/sssd/sssd_be \
    -ex 'quit' `pgrep sssd_be`

Try to run the gdb commands immediately after start and again later when sssd_be is idle; otherwise it would be difficult to compare results.
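
A possible way to capture the two snapshots for comparison (the delay and the batch-mode invocation are my own assumptions, not from the ticket; the report is written into sssd_be's debug log):

    # take a talloc snapshot now and another one later, e.g. an hour apart
    for i in 1 2; do
        gdb -batch \
            -ex 'call talloc_enable_null_tracking()' \
            -ex 'call talloc_report_full(0, debug_file)' \
            -ex 'detach' \
            /usr/libexec/sssd/sssd_be `pgrep sssd_be`
        [ "$i" = 1 ] && sleep 3600    # arbitrary delay between snapshots
    done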

cc: => lslebodn

Hi, sorry.
No time for this any more: I just disabled enumeration (it's a bad habit anyway), so I won't be investigating this further...

I am closing this ticket as cannot-fix due to insufficient data. Feel free to re-open and provide a log file with the talloc dumps described in the 2nd comment.

resolution: => cantfix
status: new => closed

Hi,

I think we're hitting this bug on Debian Jessie with SSSD 1.11.7-3. I've already manually patched the libtevent bug (#1324387, mentioned in comment 1), which is also present on Debian, but we still see memory slowly increasing day by day in the sssd_be process.

Comparing two talloc_report dumps, I see many repetitions of "struct ldb_dn" for the same group, e.g.:

SSSD was started on Nov 4th at 11:56.

This is the situation on Nov 4th at 14:59:

# grep "name=.*GROUP1 " talloc_dump_nov4.log
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x2004460
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x2026230
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x2021bb0
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x200f930
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1fc4090
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f48800
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1fa6650
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f78130
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f84d30
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f14da0
        name=GROUP1,cn=groups,cn=REALM.DOMAIN.COM,cn=sysdb contains     82 bytes in   1 blocks (ref 0) 0x1f15460

And on Nov 8th at 14:49, this increased to 363:

# grep "name=.*GROUP1 " talloc_dump_nov8.log |wc -l
363

This is just the output for one group, "GROUP1", but it's happening for all groups defined in AD that are looked up by SSSD.
I assume this should not be happening? To be honest, I don't know exactly how to interpret the talloc report, but I guess every group should only appear once, served from the SSSD cache?
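
Something along these lines (assuming the dump format shown above) should list the most-repeated group DNs across all groups, not just GROUP1:

    # strip the size/address part and count identical DN lines
    grep 'cn=groups,cn=' talloc_dump_nov8.log \
        | sed 's/ contains .*//' \
        | sort | uniq -c | sort -rn | head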

Is there anything specific you need from the talloc report? It looks a bit difficult to anonymize it completely; if this info is not enough, I can try to set up a test environment and reproduce it there.

Maybe/probably related: On this machine a keepalived/VRRP service check is doing an authentication check every 20 seconds of an AD user via SSSD.
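
For context, such a check might look roughly like the script below; this is purely illustrative (the user name is invented, and it only performs a user lookup through SSSD rather than a real authentication):

    #!/bin/sh
    # hypothetical keepalived MISC_CHECK script, run every 20 seconds;
    # a failing lookup of the AD user marks the service as down
    getent passwd monitor_user@REALM.DOMAIN.COM > /dev/null || exit 1
    exit 0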

This is SSSD with LDAP backend connected to AD and enumeration enabled.
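
In sssd.conf terms that means roughly the following backend stanza rather than id_provider = ad (again only a sketch; the server name is invented):

    [domain/REALM.DOMAIN.COM]
    id_provider = ldap
    ldap_uri = ldap://dc1.realm.domain.com
    ldap_schema = ad          # assumption: the AD schema mapping
    enumerate = true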

Thx,
Geert

resolution: cantfix =>
status: closed => reopened

I assume you use enumeration as well.
My assumption is that there was some error while processing a few groups, and therefore there might be leaks.

Could you provide a log file with a high debug level plus a few talloc reports (with reasonably long delays between them)?
If the file is big, you can upload it somewhere and send me a private mail with a link to the log: my_nick at fedoraproject dot org
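
One way to raise the debug level on a 1.11-era install is sketched below (the value 9 is just a suggestion):

    # bump the debug level of the running sssd services without a restart
    sss_debuglevel 9

    # or persistently, in the affected [domain/...] section of sssd.conf:
    #   debug_level = 9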

BTW, it would also be good if you could test with the latest 1.12 or 1.13.

For completeness: the issue I'm seeing is a leak of "struct sysdb_attrs" allocations on the null_context, at least present in the Debian package sssd 1.11.7-3.
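
A quick way to confirm that from the dumps mentioned above is simply counting those blocks, e.g.:

    # count "struct sysdb_attrs" allocations reported in each dump
    grep -c 'struct sysdb_attrs' talloc_dump_nov4.log talloc_dump_nov8.log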

The issue cannot be reproduced with the latest 1.14.2 and 1.13.5 (git).

Please close again and sorry for noise :) Thx for your help Lukas!

Since the issue is confirmed to be resolved by upgrading, I'm closing this ticket.

resolution: => worksforme
status: reopened => closed

Metadata Update from @ondrejv2:
- Issue set to the milestone: NEEDS_TRIAGE

7 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/4209

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
