Dear SSSD team,
we are experiencing an issue with SSSD where sssd_be is consuming a lot of memory. The RAM consumption grows continuously in a certain setup where an SFTP/SSH login happens every 30 seconds.
Over a period of 17 hours, the memory usage of SSSD increased by 20%, i.e. 200 MB (of 1024 MB system memory). After restarting SSSD, the memory consumption goes back to normal. The attached debug log file shows the login sequence of the mentioned SFTP user.
Since we don't have a large LDAP directory (~60 Unix users / ~20 Unix groups), I suppose we might have a misconfiguration in our sssd.conf (see attachments).
When I remove SSSD's cache file and run id on 50 LDAP users, the memory consumption grows only by about 5 MB. The memory usage stays the same even when I run the id command over and over again (executed at least 20 times). The commands getent passwd and getent group also do not increase the memory usage of SSSD.
Information about the environment/system:
- LDAP is the ID and AUTH provider
- LDAP schema is rfc2307bis
- RHEL 6.2 / CentOS 6.2
- sssd-client-1.5.1-66.el6_2.3.i686
- sssd-1.5.1-66.el6_2.3.i686
Attachments:
- sssd.conf (sanitized)
- sssd_EXAMPLE.log (sanitized)
- sssd_mem_usage.png (graph)
If you need any further debug information, please let me know. Many thanks for looking into this issue.
Kind regards, mayak
sssd_mem_usage.png (graph) <img alt="sssd_mem_usage.png" src="/SSSD/sssd/issue/raw/files/08c39e97f06898b17937fec7b196dc7520f4a46b68bc311a796d7d8bd8f60a9e-sssd_mem_usage.png" />
Do you happen to know which particular process consumes the memory? SSSD spawns several processes - with your configuration that would be sssd, sssd_nss, sssd_pam and sssd_be.
Replying to [comment:1 jhrozek]:
The original report reads: "where sssd_be is consuming a lot of memory".
component: SSSD => Data Provider
milestone: NEEDS_TRIAGE => SSSD 1.8.2 (LTM)
priority: major => critical
Fields changed
owner: somebody => jhrozek
status: new => assigned
daily memory growth (cronjob restarts sssd) <img alt="memutil.png" src="/SSSD/sssd/issue/raw/files/bfc840c6c66b84211f2bf636bf7ea6ea145304fcd86cddfe0c941e2f190055f4-memutil.png" />
In case you were waiting for my confirmation - yes, the process is called sssd_be. Sorry for the late reply. Thanks for your efforts.
Replying to [comment:5 mayak]:
Thank you, I'm looking into the issue, but so far I've been unable to reproduce a memory growth in my test environment. I'll run another round of tests today.
Replying to [comment:6 jhrozek]:
Thank you, I'm looking into the issue, but so far I've been unable to reproduce a memory growth in my test environment. I'll run another round of tests today.

I have to admit that I am also unable to reproduce this issue under 'normal' circumstances. It only appears on this one host, where a proprietary application establishes an SFTP connection every 30 seconds. When I manually open several SFTP connections with WinSCP/SCP, the memory consumption of sssd_be does not increase.

Please let me know if I can provide any more useful information or debug something for you. Thank you.
Replying to [comment:7 mayak]:
I have to admit that I am also unable to reproduce this issue under 'normal' circumstances. It only appears on this one host, where a proprietary application establishes an SFTP connection every 30 seconds.
Would you be willing to run valgrind for us and send us the results? You can do this by installing valgrind (via yum) and then adding the following line to your sssd.conf in the [domain/EXAMPLE] section (substituting EXAMPLE with your actual SSSD domain name):
command = /usr/bin/valgrind --log-file=/tmp/EXAMPLE-grind.%p.log /usr/libexec/sssd/sssd_be --domain EXAMPLE --debug-to-files
Run 'service sssd restart' and note the PID of the running sssd_be process (using ps -ef | grep sssd_be). Run the test for a few minutes, then run 'service sssd stop' and attach /tmp/EXAMPLE-grind.<PID>.log to this ticket.
Replying to [comment:8 sgallagh]:
Would you be willing to run valgrind for us and send us the results?
Yes, I will try to provide you with that debug information. I will need a few minutes/hours. Thanks for your patience.
Please find the generated EXAMPLE-grind.<PID>.log in the attachments. The sssd daemon was running for about 5 minutes with this additional command. The command line of the valgrind process was (per ps -ef):

/usr/bin/valgrind --log-file=/tmp/EXAMPLE-grind.%p.log /usr/libexec/sssd/sssd_be --domain EXAMPLE --debug-to-files

I hope this helps. Many thanks.
attachment EXAMPLE-grind.8141.log
Replying to [comment:10 mayak]:
Hi, sorry, but you also need to add --leak-check=full to the list of valgrind options. Without this switch, valgrind only reports memory-access issues (such as use-after-free), but not the leaks.
Sorry about the inconvenience. We would still very much appreciate the data.
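Putting the two suggestions together, the wrapper line in the [domain/EXAMPLE] section of sssd.conf would then read roughly as follows (a sketch for reference; EXAMPLE and the log path are the placeholders used above):

```ini
[domain/EXAMPLE]
# valgrind wrapper with leak checking enabled; without --leak-check=full,
# valgrind reports only memory-access errors, not leaks
command = /usr/bin/valgrind --leak-check=full --log-file=/tmp/EXAMPLE-grind.%p.log /usr/libexec/sssd/sssd_be --domain EXAMPLE --debug-to-files
```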
By the way, I've seen the invalid file descriptor message during my testing with 1.5.x, but never with 1.8.
Dear jhrozek, please find the requested log file in the attachments. This time I ran valgrind with the --leak-check=full option. The sssd service was running for at least 5 minutes. Let me know if you need more information. Thanks.
valgrind output (--leak-check=full) EXAMPLE-grind.13700.log
That's quite helpful, thank you. These are the two biggest leaks:
==13700== 163,300 (800 direct, 162,500 indirect) bytes in 10 blocks are definitely lost in loss record 560 of 564
==13700==    at 0x40053B3: calloc (vg_replace_malloc.c:467)
==13700==    by 0xCBBD00: ber_memcalloc_x (in /lib/liblber-2.4.so.2.5.6)
==13700==    by 0x12249F: ldap_send_server_request (in /lib/libldap-2.4.so.2.5.6)
==13700==    by 0x123323: ldap_chase_v3referrals (in /lib/libldap-2.4.so.2.5.6)
==13700==    by 0x10D4EB: ldap_result (in /lib/libldap-2.4.so.2.5.6)
==13700==    by 0x485E262: ??? (in /usr/lib/sssd/libsss_ldap.so.1.0.0)
==13700==    by 0xB71E8A: ??? (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0xB74124: ??? (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0xB70F17: _tevent_loop_once (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0xB70FAE: ??? (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0xB70C88: _tevent_loop_wait (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0x8080D1C: server_loop (in /usr/libexec/sssd/sssd_be)
==13700==
==13700== 987,424 (4,720 direct, 982,704 indirect) bytes in 59 blocks are definitely lost in loss record 564 of 564
==13700==    at 0x40053B3: calloc (vg_replace_malloc.c:467)
==13700==    by 0xCBBD00: ber_memcalloc_x (in /lib/liblber-2.4.so.2.5.6)
==13700==    by 0x12249F: ldap_send_server_request (in /lib/libldap-2.4.so.2.5.6)
==13700==    by 0x123323: ldap_chase_v3referrals (in /lib/libldap-2.4.so.2.5.6)
==13700==    by 0x10D4EB: ldap_result (in /lib/libldap-2.4.so.2.5.6)
==13700==    by 0x485E262: ??? (in /usr/lib/sssd/libsss_ldap.so.1.0.0)
==13700==    by 0xB74262: ??? (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0xB70F17: _tevent_loop_once (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0xB70FAE: ??? (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0xB70C88: _tevent_loop_wait (in /usr/lib/libtevent.so.0.9.8)
==13700==    by 0x8080D1C: server_loop (in /usr/libexec/sssd/sssd_be)
==13700==    by 0x8055BB8: main (in /usr/libexec/sssd/sssd_be)
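As an aside, ranking leak records by hand gets tedious in a long log. A short script can pull the "definitely lost" totals out of a valgrind log such as EXAMPLE-grind.<PID>.log. This is only a sketch, not part of the ticket; the function name is mine, and it assumes valgrind's default text output format:

```python
import re

# Matches valgrind "definitely lost" summary lines such as:
# ==13700== 163,300 (800 direct, 162,500 indirect) bytes in 10 blocks are definitely lost ...
LOSS_RE = re.compile(
    r"==\d+==\s+([\d,]+)\s+\(([\d,]+) direct, ([\d,]+) indirect\) bytes"
    r" in [\d,]+ blocks are definitely lost"
)

def summarize_definite_losses(log_text):
    """Return (total, direct, indirect) byte counts per leak record, largest first."""
    records = []
    for m in LOSS_RE.finditer(log_text):
        # Strip the thousands separators before converting to int.
        total, direct, indirect = (int(g.replace(",", "")) for g in m.groups())
        records.append((total, direct, indirect))
    return sorted(records, reverse=True)
```

Fed the log above, the two records come out as (987424, 4720, 982704) and (163300, 800, 162500), matching the summary lines valgrind printed.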
May I guess that your LDAP server is Microsoft Active Directory? The valgrind log shows memory leaks during referral chasing, and MSAD makes quite heavy use of referrals.
If your environment does not use referrals, can you check if setting ldap_referrals = false makes the memory consumption better?
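For reference, the option goes into the domain section of sssd.conf (EXAMPLE being a placeholder for the actual domain name):

```ini
[domain/EXAMPLE]
# do not chase LDAP referrals; this avoids the leaky referral code path
ldap_referrals = false
```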
Also, there is a large number of leaks coming from moznss/nspr. I saw those when I tested on RHEL 6, but not on Fedora. I will follow up with the openldap maintainers to see if they are aware of any leaks.
There is also a resolver-related memory leak that is already fixed in 6.3. That one shouldn't be a problem, though, because hostname resolution is a relatively rare operation.
Replying to [comment:13 jhrozek]:
That's right, we use MSAD as the directory server.
For testing purposes I have added the option ldap_referrals = false to sssd.conf on this one host. I don't think it is a coincidence that the execution time of the id command (on about 50 LDAP users) became much quicker!
Some additional testing:
I cleaned the sssd cache file and ran the id command (50 users). With ldap_referrals = true, the sssd_be process was consuming 1.6% of the system memory (~16 MB).
When I did the same with ldap_referrals = false, the sssd_be process was consuming only 0.5% of the system memory (~5 MB).
To check whether the option ldap_referrals = false improves the memory consumption, I will keep sssd running this way until next Monday (~2 days). I will keep you updated. Thank you for all the hints and efforts.
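For anyone who wants to repeat this measurement, the RSS of sssd_be can be sampled from ps output. A minimal sketch (the helper names are mine, not from the ticket); the parsing is kept separate from the ps call so it can be checked in isolation:

```python
import subprocess

def rss_kib(ps_output, name="sssd_be"):
    """Parse `ps -eo rss,comm` output; return the RSS in KiB of the first
    process named `name`, or None if it is not running."""
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == name:
            return int(parts[0])
    return None

def sample_sssd_be_rss():
    """Take one RSS sample of the running sssd_be process."""
    out = subprocess.run(["ps", "-eo", "rss,comm"],
                         capture_output=True, text=True, check=True).stdout
    return rss_kib(out)
```

Calling sample_sssd_be_rss() from a cron job or a loop and logging the values over a day would reproduce a graph like sssd_mem_usage.png.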
Replying to [comment:14 mayak]:
To check if the option ldap_referrals = false makes the memory consumption better I will keep sssd running this way until next Monday (~2 days).
The option ldap_referrals = false massively improved the memory consumption of sssd_be. Logins and LDAP queries are also faster. I will keep this new configuration for SSSD. Please let me know if you need more details. Many thanks.
memory utilization with 'ldap_referrals = false' <img alt="ldap_referrals-false.png" src="/SSSD/sssd/issue/raw/files/729ed46e525c055672b065c98cd6dfced312fdb0d1cebfef2fd47589af3b559c-ldap_referrals-false.png" />
Replying to [comment:15 mayak]:
The option ldap_referrals = false massively improved the memory consumption of sssd_be. Logins and LDAP queries are also faster.
As both the graph and the valgrind log show, there seems to be a memory leak somewhere in the referral support of the openldap libraries. I will follow up with the openldap maintainer to check whether this is a known issue.
Also, Stephen found a memory leak in SSSD's TLS setup, which might have contributed to the growth. Those will be fixed in the next SSSD release.
The memory leak in openldap was not known to the openldap maintainer and is now being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=807363
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=808064
rhbz: => [https://bugzilla.redhat.com/show_bug.cgi?id=808064 808064]
The TLS leak was fixed in:
The openldap referral memory leak is being tracked in the Red Hat Bugzilla now.
patch: 0 => 1
Marking as complete. The openldap issue is out of our control.
resolution: => fixed
status: assigned => closed
Metadata Update from @mayak: - Issue assigned to jhrozek - Issue set to the milestone: SSSD 1.8.2 (LTM)
SSSD is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in SSSD's GitHub repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/2293
If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the subscribe button.
Thank you for understanding. We apologize for any inconvenience.