#1885 Possible DS memory leak when using FreeIPA
Closed: Fixed None Opened 12 years ago by jzeleny.

I wrote a simple script:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
#!/bin/bash

ipa group-add large-group --desc="Large group"

for i in `seq 1 3000`; do
    echo "Adding user $i to IPA ... "
    ipa user-add largeuser$i --first=Large --last=User$i > /dev/null
    echo "Adding user $i to large-group ... "
    ipa group-add-member large-group --user=largeuser$i > /dev/null
done

When I run this script, somewhere around $i == 1100 the DS consumes all system memory and IPA commands don't work any more. The only option at that point is to kill ns-slapd with SIGKILL.


I'm guessing that you are making all of the worker threads busy as they serialize the operations due to the memberOf lock. Once all of the worker threads are waiting, you will notice the client start to stall until a thread frees up. None of the ipa-* commands will be responsive if ns-slapd is unable to respond. I will need to confirm this by checking out the threads in gdb once the process is in this state.

As for the memory usage, how large is ns-slapd getting? How much physical memory does your test system have? When ns-slapd runs out of memory (malloc calls fail), it will detect this and exit the process. This must not be happing in your case since the process is still running.

I am attempting to reproduce this on a F15 x86_64 system using the following packages:

389-ds-base-1.2.9-10-2.fc15.x86_64
freeipa-server-2.1.0-1.20110927T1551Zgit08ec4b0.fc15.x86_64

My test is still running, but it would be useful if you could get a stacktrace from all of the ns-slapd threads when you encounter this problem. Install the 389-ds-base-debuginfo package, and do a 'thread apply all bt' with gdb attached to ns-slapd when it is non-responsive.

In my test, the client stalled out adding user 771 to large-group. The resident memory used by ns-slapd did rise steadily to 305m, with virtual memory going from 1520m to about 1537m. In my case, it doesn't seem to be a memory related issue, but instead it looks to be CPU related. My ns-slapd process is sitting at 99-100% CPU usage when it appears to be hung.

I attached to the ns-slapd process with gdb, and I notice that most of the worker threads are idle and waiting for new connections. I see no contention over the memberOf operation lock. There is one modify operation that is doing work around memberOf, and one search operation in another thread. I see 3 interesting threads that indicate that this is an issue within libdb:

Thread 39 (Thread 0x7f3b2d065700 (LWP 5730)):

0 0x0000003dd00d91f3 in select () from /lib64/libc.so.6

#1 0x0000003ddc133175 in __os_yield () from /lib64/libdb-4.8.so

2 0x0000003ddc12eb3f in __memp_sync_int () from /lib64/libdb-4.8.so

#3 0x0000003ddc140f3a in __txn_checkpoint () from /lib64/libdb-4.8.so
#4 0x0000003ddc1412b1 in __txn_checkpoint_pp () from /lib64/libdb-4.8.so
#5 0x00007f3b314611e4 in checkpoint_threadmain (param=0x948f00)
at ldap/servers/slapd/back-ldbm/dblayer.c:3858
#6 0x0000003ddb828553 in ?? () from /lib64/libnspr4.so
#7 0x0000003dd0407b31 in start_thread () from /lib64/libpthread.so.0
#8 0x0000003dd00dfd2d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f3b167f4700 (LWP 5750)):

0 0x0000003dd00c7b57 in sched_yield () from /lib64/libc.so.6

#1 0x0000003ddc1331ad in __os_yield () from /lib64/libdb-4.8.so

2 0x0000003ddc025124 in __db_tas_mutex_lock () from /lib64/libdb-4.8.so

#3 0x0000003ddc121fca in __memp_fget () from /lib64/libdb-4.8.so
#4 0x0000003ddc041b1a in __bam_get_root () from /lib64/libdb-4.8.so
#5 0x0000003ddc0420bd in __bam_search () from /lib64/libdb-4.8.so
#6 0x0000003ddc02e1b6 in ?? () from /lib64/libdb-4.8.so
#7 0x0000003ddc0325e4 in ?? () from /lib64/libdb-4.8.so
#8 0x0000003ddc0d496c in __dbc_iput () from /lib64/libdb-4.8.so
#9 0x0000003ddc0d66a9 in __dbc_put () from /lib64/libdb-4.8.so
#10 0x0000003ddc0c8c5e in __db_put () from /lib64/libdb-4.8.so
#11 0x0000003ddc0e13cb in __db_put_pp () from /lib64/libdb-4.8.so
#12 0x00007f3b31469cdf in idl_new_insert_key (be=<optimized out>,
db=<optimized out>, key=<optimized out>, id=1743, txn=<optimized out>,
a=<optimized out>, disposition=0x0)
at ldap/servers/slapd/back-ldbm/idl_new.c:424
#13 0x00007f3b31475e9f in addordel_values_sv (be=0xa4dfd0, db=0x7f3b0404cff0,
type=<optimized out>, indextype=<optimized out>, vals=0x1d37ee8, id=1743,
flags=1, txn=0x7f3b167e9470, a=0xafdde0, idl_disposition=0x0,
buffer_handle=0x0) at ldap/servers/slapd/back-ldbm/index.c:1780
#14 0x00007f3b31476b04 in index_addordel_values_ext_sv (be=0xa4dfd0,
type=<optimized out>, vals=0x1d3d970, evals=0x0, id=1743, flags=1,
txn=0x7f3b167e9470, idl_disposition=0x0, buffer_handle=0x0)
at ldap/servers/slapd/back-ldbm/index.c:1917
#15 0x00007f3b31476e84 in index_addordel_values_sv (be=<optimized out>,
type=<optimized out>, vals=<optimized out>, evals=<optimized out>,
id=<optimized out>, flags=<optimized out>, txn=0x7f3b167e9470)
at ldap/servers/slapd/back-ldbm/index.c:1839
#16 0x00007f3b31477020 in index_add_mods (be=0xa4dfd0, mods=0x1d39270,
olde=0x7f3ad9ddc060, newe=0x1d38910, txn=0x7f3b167e9470)
at ldap/servers/slapd/back-ldbm/index.c:607
#17 0x00007f3b31490430 in ldbm_back_modify (pb=0x1d3aee0)
at ldap/servers/slapd/back-ldbm/ldbm_modify.c:443
#18 0x00007f3b35c4abef in op_shared_modify (pb=<optimized out>, pw_change=0,
old_pw=0x0) at ldap/servers/slapd/modify.c:888
#19 0x00007f3b35c4b1e1 in modify_internal_pb (pb=0x1d3aee0)
at ldap/servers/slapd/modify.c:562
#20 0x00007f3b30ba78cb in memberof_fix_memberof_callback (e=<optimized out>,
callback_data=<optimized out>)
at ldap/servers/plugins/memberof/memberof.c:2371
#21 0x00007f3b30ba8078 in memberof_modop_one_replace_r (pb=0x1bd2650,
config=0x7f3b167ed9b0, mod_op=0,
group_dn=0x1d09190 "cn=large-group,cn=groups,cn=accounts,dc=example,dc=com", op_this=0x1d09190 "cn=large-group,cn=groups,cn=accounts,dc=example,dc=com",
replace_with=0x0,
op_to=0x1d38980 "uid=largeuser771,cn=users,cn=accounts,dc=example,dc=com",
stack=0x0) at ldap/servers/plugins/memberof/memberof.c:1261
#22 0x00007f3b30ba8895 in memberof_modop_one_r (stack=0x0,
op_to=0x1d38980 "uid=largeuser771,cn=users,cn=accounts,dc=example,dc=com",
op_this=0x1d09190 "cn=large-group,cn=groups,cn=accounts,dc=example,dc=com", group_dn=0x1d09190 "cn=large-group,cn=groups,cn=accounts,dc=example,dc=com",
mod_op=0, config=0x7f3b167ed9b0, pb=0x1bd2650)
at ldap/servers/plugins/memberof/memberof.c:1047
#23 memberof_modop_one (
op_to=0x1d38980 "uid=largeuser771,cn=users,cn=accounts,dc=example,dc=com",
op_this=0x1d09190 "cn=large-group,cn=groups,cn=accounts,dc=example,dc=com", mod_op=0, config=0x7f3b167ed9b0, pb=0x1bd2650)
at ldap/servers/plugins/memberof/memberof.c:1036
#24 memberof_mod_smod_list (pb=0x1bd2650, config=0x7f3b167ed9b0, mod=0,
group_dn=0x1d09190 "cn=large-group,cn=groups,cn=accounts,dc=example,dc=com", smod=0x1b48b50) at ldap/servers/plugins/memberof/memberof.c:1378
#25 0x00007f3b30ba9317 in memberof_add_smod_list (smod=0x1b48b50,
groupdn=0x1d09190 "cn=large-group,cn=groups,cn=accounts,dc=example,dc=com",
config=0x7f3b167ed9b0, pb=0x1bd2650)
at ldap/servers/plugins/memberof/memberof.c:1398
#26 memberof_postop_modify (pb=0x1bd2650)
at ldap/servers/plugins/memberof/memberof.c:830
#27 0x00007f3b35c57bcd in plugin_call_func (list=0x95ac50, operation=505,
pb=0x1bd2650, call_one=0) at ldap/servers/slapd/plugin.c:1439
#28 0x00007f3b35c57ddd in plugin_call_list (pb=0x1bd2650, operation=505,
list=<optimized out>) at ldap/servers/slapd/plugin.c:1401
#29 plugin_call_plugins (pb=0x1bd2650, whichfunction=505)
at ldap/servers/slapd/plugin.c:383
#30 0x00007f3b35c4aca7 in op_shared_modify (pb=<optimized out>, pw_change=0,
old_pw=0x0) at ldap/servers/slapd/modify.c:946
#31 0x00007f3b35c4b9a1 in do_modify (pb=0x1bd2650)
at ldap/servers/slapd/modify.c:384
#32 0x0000000000416d0e in connection_dispatch_operation (pb=<optimized out>,
op=0x1bb55b0, conn=0x7f3b295c4280) at ldap/servers/slapd/connection.c:583
#33 connection_threadmain () at ldap/servers/slapd/connection.c:2328
#34 0x0000003ddb828553 in ?? () from /lib64/libnspr4.so
#35 0x0000003dd0407b31 in start_thread () from /lib64/libpthread.so.0
#36 0x0000003dd00dfd2d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f3b0e7e4700 (LWP 5766)):

0 0x0000003dd040b4b5 in pthread_cond_wait@@GLIBC_2.3.2 ()

from /lib64/libpthread.so.0
#1 0x0000003ddc025b5b in __db_pthread_mutex_lock () from /lib64/libdb-4.8.so

2 0x0000003ddc02553e in __db_tas_mutex_readlock () from /lib64/libdb-4.8.so

#3 0x0000003ddc121ed1 in __memp_fget () from /lib64/libdb-4.8.so
#4 0x0000003ddc0d5485 in __dbc_iget () from /lib64/libdb-4.8.so
#5 0x0000003ddc0e063d in __dbc_get_pp () from /lib64/libdb-4.8.so
#6 0x00007f3b31469a86 in idl_new_fetch (be=0xa4dfd0, db=<optimized out>,
inkey=<optimized out>, txn=<optimized out>, a=0xafdde0,
flag_err=0x7f3b0e7db30c) at ldap/servers/slapd/back-ldbm/idl_new.c:271
#7 0x00007f3b31478cdd in index_read_ext (be=0xa4dfd0, type=<optimized out>,
indextype=0x36 <Address 0x36 out of bounds>, val=<optimized out>, txn=0x0,
err=0x7f3b0e7db30c, unindexed=0x7f3b0e7d8cd8)
at ldap/servers/slapd/back-ldbm/index.c:948
#8 0x00007f3b31462c08 in keys2idl (be=0xa4dfd0,
type=0x7f3b244893c0 "memberOf", indextype=0x7f3b314b09cb "eq",
ivals=<optimized out>, err=0x7f3b0e7db30c, unindexed=0x7f3b0e7d8cd8)
at ldap/servers/slapd/back-ldbm/filterindex.c:912
#9 0x00007f3b3146332a in ava_candidates (pb=0x1d3c2f0, be=0xa4dfd0,
f=<optimized out>, ftype=<optimized out>, nextf=<optimized out>,
range=<optimized out>, err=0x7f3b0e7db30c)
at ldap/servers/slapd/back-ldbm/filterindex.c:286
#10 0x00007f3b314639e4 in filter_candidates (pb=0x1d3c2f0, be=0xa4dfd0,
base=0x7f3b24738030 "uid=largeuser16,cn=users,cn=accounts,dc=example,dc=com", f=0x7f3b244c76e0, nextf=0x0, range=0, err=0x7f3b0e7db30c)
at ldap/servers/slapd/back-ldbm/filterindex.c:129
#11 0x00007f3b31464683 in list_candidates (pb=0x1d3c2f0, be=0xa4dfd0,
base=0x7f3b24738030 "uid=largeuser16,cn=users,cn=accounts,dc=example,dc=com", flist=0x7f3b2474d7d0, ftype=<optimized out>, err=0x7f3b0e7db30c)
at ldap/servers/slapd/back-ldbm/filterindex.c:756
#12 0x00007f3b31463b0a in filter_candidates (pb=0x1d3c2f0, be=0xa4dfd0,
base=0x7f3b24738030 "uid=largeuser16,cn=users,cn=accounts,dc=example,dc=com", f=0x7f3b2474d7d0, nextf=0x0, range=0, err=0x7f3b0e7db30c)
at ldap/servers/slapd/back-ldbm/filterindex.c:162
#13 0x00007f3b314940c5 in subtree_candidates (pb=0x1d3c2f0, be=0xa4dfd0,
base=0x7f3b24738030 "uid=largeuser16,cn=users,cn=accounts,dc=example,dc=com"
, e=0x7f3b0000fa20, filter=0x7f3b244c76e0, managedsait=<optimized out>,
allids_before_scopingp=0x7f3b0e7db2fc, err=0x7f3b0e7db30c)
at ldap/servers/slapd/back-ldbm/ldbm_search.c:966
#14 0x00007f3b3149540d in build_candidate_list (candidates=0x7f3b0e7db278,
lookup_returned_allidsp=0x7f3b0e7db2fc, scope=2,
base=0x7f3b24738030 "uid=largeuser16,cn=users,cn=accounts,dc=example,dc=com", e=0x7f3b0000fa20, be=<optimized out>, pb=0x1d3c2f0)
at ldap/servers/slapd/back-ldbm/ldbm_search.c:775
#15 ldbm_back_search (pb=0x1d3c2f0)
at ldap/servers/slapd/back-ldbm/ldbm_search.c:475
#16 0x00007f3b35c509c7 in op_shared_search (pb=0x1d3c2f0, send_result=1)
at ldap/servers/slapd/opshared.c:686
#17 0x0000000000424b92 in do_search (pb=<optimized out>)
at ldap/servers/slapd/search.c:393
#18 0x0000000000416d90 in connection_dispatch_operation (pb=<optimized out>,
op=0x1d01090, conn=0x7f3b295c4280) at ldap/servers/slapd/connection.c:611
#19 connection_threadmain () at ldap/servers/slapd/connection.c:2328
#20 0x0000003ddb828553 in ?? () from /lib64/libnspr4.so
#21 0x0000003dd0407b31 in start_thread () from /lib64/libpthread.so.0
#22 0x0000003dd00dfd2d in clone () from /lib64/libc.so.6

Should I still attach the stack trace from my machine? I'm trying this on a virtual machine, 1G RAM, F15 x86_64.

When calculating indirect membership it would run through the entire list of members looking to see if they too had members. This is is unnecessary for entry types that do not have members (like users and hosts).

What one would observe prior to this patch is each member dn would be queried for a member attribute, including users:

[04/Oct/2011:13:38:54 -0700] conn=2055 op=9 SRCH base="uid=largeuser1,cn=users,cn=accounts,dc=example,dc=com" scope=2 filter="(memberOf=cn=large-group,cn=groups,cn=accounts,dc=example,dc=com)" attrs="member"
[04/Oct/2011:13:38:54 -0700] conn=2055 op=9 RESULT err=0 tag=101 nentries=1 etime=0
[04/Oct/2011:13:38:54 -0700] conn=2055 op=10 SRCH base="uid=largeuser2,cn=users,cn=accounts,dc=example,dc=com" scope=2 filter="(memberOf=cn=large-group,cn=groups,cn=accounts,dc=example,dc=com)" attrs="member"
[04/Oct/2011:13:38:54 -0700] conn=2055 op=10 RESULT err=0 tag=101 nentries=1 etime=0
[04/Oct/2011:13:38:54 -0700] conn=2055 op=11 SRCH base="uid=largeuser3,cn=users,cn=accounts,dc=example,dc=com" scope=2 filter="(memberOf=cn=large-group,cn=groups,cn=accounts,dc=example,dc=com)" attrs="member"
[04/Oct/2011:13:38:54 -0700] conn=2055 op=7 RESULT err=0 tag=103 nentries=0 etime=0

There are several things to test here:

  1. Run the script to confirm that these unnecessary queries aren't being done
  2. Run the nesting unit test
  3. Create several users and add them to a group. Add this group to another group. Display this group to confirm that all the users are indirect members.

Reduce the # of queries we made when calculating indirect membership:

master: d6698eb2585999f0cdeec0fab02390ab66e9f8f0

ipa-2-1: c5384f2

Nathan found that we are doigng a sub scope search when finding members, we should do a base scope search. He thinks this will also alleviate the performance problem as we won't be hammering the memberOf index.

Honza, please look into the scoping isuses, patch where necessary and re-assign to me when that part is complete.

Honza is pretty busy in 2.1.3, reassigning to myself as I have some experience with our ldap2 framework.

master: e5389ff

ipa-2-1: 5aa6e99

Reassigning to myself to track 389-ds improvements so we can update spec file.

Moving the ticket to the next month iteration.

Moving to next month iteration.

As Nathan mentioned in his comments above, he has a reproducer for this bug. We could duplicate the bug on FC15 with db4.8.30; we could not on FC16 with db5.2.36. The bug was in libdb and fixed by db5.2.36.

Could you please tell us on which version of db and Fedora you were running when this bug was found?

Related ticket: https://fedorahosted.org/389/ticket/25

I wonder if this was related to slapi-nis too where a number of memory leaks were fixed.

I disabled the compat plugin and was able to run the script successfully.

Marking as complete. Using updated 389-ds (1.2.10) and slapi-nis (0.36) not seeing memory explosion.

Metadata Update from @jzeleny:
- Issue assigned to rcritten
- Issue set to the milestone: FreeIPA 2.2 Core Effort - 2012/02

7 years ago

Login to comment on this ticket.

Metadata