#47612 ns-slapd eats all the memory in IPA replicated environment on F20
Closed: wontfix None Opened 10 years ago by jcholast.

After installing IPA server and replica on F20, the ns-slapd process on both master and the replica eats all the available memory in a matter of seconds and is subsequently killed by OOM killer.

This happens with freeipa-server-3.3.3-2.fc20 and 389-ds-base-1.3.2.7-1.fc20. After downgrading to 389-ds-base-1.3.1.10-1.fc20 (http://koji.fedoraproject.org/koji/buildinfo?buildID=467667) the problem goes away.


Is there any chance that you could run ns-slapd under valgrind (with debuginfo for 389-ds-base and freeipa-server installed)?

I see the following resource leaks in the tip of the 1.3.2 branch running a Coverity scan. These leak reports do not show up when scanning the 1.3.1 branch:

{{{
Error: RESOURCE_LEAK (CWE-772):
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:636: alloc_fn: Storage is returned from allocation function "slapi_ch_calloc(unsigned long, unsigned long)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:243:2: alloc_fn: Storage is returned from allocation function "calloc(size_t, size_t)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:243:2: var_assign: Assigning: "newmem" = "calloc(nelem, size)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:266:2: return_alloc: Returning allocated memory "newmem".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:636: var_assign: Assigning: "ectrls" = storage returned from "slapi_ch_calloc(2UL, 8UL)".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:639: noescape: Resource "ectrls + 0" is not freed or pointed-to in function "sync_create_state_control(Slapi_Entry , LDAPControl , int, Sync_Cookie )".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_util.c:158:58: noescape: "sync_create_state_control(Slapi_Entry
, LDAPControl
, int, Sync_Cookie
)" does not free or save its pointer parameter "ctrlp".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:640: noescape: Resource "ectrls" is not freed or pointed-to in function "slapi_send_ldap_search_entry(Slapi_PBlock , Slapi_Entry , LDAPControl , char , int)".
389-ds-base-1.3.2.7/ldap/servers/slapd/plugin.c:599:79: noescape: "slapi_send_ldap_search_entry(Slapi_PBlock , Slapi_Entry , LDAPControl , char , int)" does not free or save its pointer parameter "ectrls".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:648: noescape: Resource "ectrls" is not freed or pointed-to in unimplemented function "ldap_controls_free(LDAPControl **)".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:656: leaked_storage: Variable "ectrls" going out of scope leaks the storage it points to.

Error: RESOURCE_LEAK (CWE-772):
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:663: alloc_fn: Storage is returned from allocation function "slapi_ch_calloc(unsigned long, unsigned long)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:243:2: alloc_fn: Storage is returned from allocation function "calloc(size_t, size_t)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:243:2: var_assign: Assigning: "newmem" = "calloc(nelem, size)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:266:2: return_alloc: Returning allocated memory "newmem".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:663: var_assign: Assigning: "ctrl" = storage returned from "slapi_ch_calloc(2UL, 8UL)".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:664: noescape: Resource "ctrl + 0" is not freed or pointed-to in function "sync_create_state_control(Slapi_Entry , LDAPControl , int, Sync_Cookie )".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_util.c:158:58: noescape: "sync_create_state_control(Slapi_Entry
, LDAPControl
, int, Sync_Cookie
)" does not free or save its pointer parameter "ctrlp".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:665: noescape: Resource "ctrl" is not freed or pointed-to in function "slapi_send_ldap_search_entry(Slapi_PBlock , Slapi_Entry , LDAPControl , char , int)".
389-ds-base-1.3.2.7/ldap/servers/slapd/plugin.c:599:79: noescape: "slapi_send_ldap_search_entry(Slapi_PBlock , Slapi_Entry , LDAPControl , char , int)" does not free or save its pointer parameter "ectrls".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:666: noescape: Resource "ctrl" is not freed or pointed-to in unimplemented function "ldap_controls_free(LDAPControl **)".
389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:667: leaked_storage: Variable "ctrl" going out of scope leaks the storage it points to.

Error: RESOURCE_LEAK (CWE-772):
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/upgrade.c:249: alloc_fn: Storage is returned from allocation function "dblayer_get_full_inst_dir(struct ldbminfo , ldbm_instance , char , int)".
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dblayer.c:1427:13: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem".
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dblayer.c:1427:13: var_assign: Assigning: "buf" = "slapi_ch_malloc(mylen)".
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dblayer.c:1428:9: noescape: Resource "buf" is not freed or pointed-to in function "sprintf(char * restrict, char const * restrict, ...)".
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dblayer.c:1447:5: return_alloc: Returning allocated memory "buf".
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/upgrade.c:249: var_assign: Assigning: "inst_dirp" = storage returned from "dblayer_get_full_inst_dir(inst->inst_li, inst, inst_dir, 8192)".
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/upgrade.c:252: noescape: Resource "inst_dirp" is not freed or pointed-to in function "dbversion_read(struct ldbminfo
, char const , char , char )".
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dbversion.c:158:49: noescape: "dbversion_read(struct ldbminfo
, char const *, char , char )" does not free or save its pointer parameter "directory".
389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/upgrade.c:269: leaked_storage: Variable "inst_dirp" going out of scope leaks the storage it points to.

Error: RESOURCE_LEAK (CWE-772):
389-ds-base-1.3.2.7/ldap/servers/slapd/result.c:753: alloc_arg: "charray_add(char , char )" allocates memory that is stored into "attrs".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:59:5: alloc_arg: "slapi_ch_array_add_ext(char
, char )" allocates memory that is stored into "a".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: var_assign: Assigning: "
a" = "slapi_ch_malloc(16UL)".
389-ds-base-1.3.2.7/ldap/servers/slapd/result.c:811: leaked_storage: Variable "attrs" going out of scope leaks the storage it points to.

Error: RESOURCE_LEAK (CWE-772):
389-ds-base-1.3.2.7/ldap/servers/slapd/result.c:753: alloc_arg: "charray_add(char , char )" allocates memory that is stored into "attrs".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:59:5: alloc_arg: "slapi_ch_array_add_ext(char
, char )" allocates memory that is stored into "a".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: var_assign: Assigning: "
a" = "slapi_ch_malloc(16UL)".
389-ds-base-1.3.2.7/ldap/servers/slapd/result.c:813: leaked_storage: Variable "attrs" going out of scope leaks the storage it points to.

Error: RESOURCE_LEAK (CWE-772):
389-ds-base-1.3.2.7/ldap/servers/slapd/schema.c:3979: alloc_arg: "charray_add(char , char )" allocates memory that is stored into "extensions->values".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:59:5: alloc_arg: "slapi_ch_array_add_ext(char
, char )" allocates memory that is stored into "a".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: var_assign: Assigning: "
a" = "slapi_ch_malloc(16UL)".
389-ds-base-1.3.2.7/ldap/servers/slapd/schema.c:4008: overwrite_var: Overwriting "extensions->values" in "extensions = head" leaks the storage that "extensions->values" points to.

Error: RESOURCE_LEAK (CWE-772):
389-ds-base-1.3.2.7/ldap/servers/slapd/schema.c:4272: alloc_arg: "charray_add(char , char )" allocates memory that is stored into "extensions->values".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:59:5: alloc_arg: "slapi_ch_array_add_ext(char
, char )" allocates memory that is stored into "a".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)".
389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem".
389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: var_assign: Assigning: "
a" = "slapi_ch_malloc(16UL)".
389-ds-base-1.3.2.7/ldap/servers/slapd/schema.c:4300: overwrite_var: Overwriting "extensions->values" in "extensions = head" leaks the storage that "extensions->values" points to.
}}}

I looked into the coverty results, bu I think they are either false or will not account for eatimng up all memory very fast.

  • the first two refer to repl sync and it says that ldap_controls_free is unimplemented, but it is and is widely called, so the controls should be freed, does coverty look into the components like libldap ?
  • there are two related to entry state controls, and there is a leak but only if the attr list in the control contains 1.1 AND attrs

Both cases would also only apply if these controls are used.

There is one leak in the upfrade chack, but should only be called at startup

There are two reports regarding schema parsing, but I am not sure if coverty is right, and it should only be visible in schema updates

I think it would be good to get a valgrind report or instructions to reproduce

How do I run valgrind to produce output useful for you?

I would:

stop the server to check and then run

valgrind -q --log-file=<outputfile> --leak-check=full /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-<instance> -i /var/run/dirsrv/slapd-<instance>.pid

let it run for some time until you think thee leak has occurred and stop the server, upload the output file

Unfortunately I was not able to reproduce with valgrind, because SASL/GSSAPI authentication stopped working with valgrind.

Additional info:

  • To reproduce the issue, do the following:
    1. Install IPA server on machine A with ipa-server-install
    2. Add IPA user on machine A with ipa user-add user
    3. Prepare replica on machine A with ipa-replica-prepare
    4. Install IPA replica on machine B with ipa-replica-install
    5. Delete IPA user on machine B with ipa user-del user
    6. Show user info on machine A with ipa user-show user

This makes ns-slapd on both the machines eat all the memory up and is killed (not sure if it happens 100% of the time though).

  • After starting the dirsrv service on both machines again, I was unable to re-add the user, because the private user group (managed by MEP) of the user was not deleted along with the user. The group remains in undeletable state, because in order to delete it, you first must detach it from the user (with ipa group-detach), but there is no longer a user to detach it from. Could this be some MEP + replication issue?

  • After adding a new user on machine A and deleting it on machine B, ns-slapd on machine B eats all the memory and is killed.

I can provide you logs if you are interested in them.

yes, do you have auditlogs as well ? And the dse.ldif files from both servers.
Is it possible to get a few pstacks from the process just before it has to be killed ?

I could reproduce the issue on the machines Jan provided.
Starting the replica as soon as a replicated mod for a group arrives the process grows quickly until it dies. I didn't get it in gdb so far, but pstacks show the it is in backentry_dup and is copying an entry. The entry is small, so it maybe corruted somehow so that the loop over attributes or values is infinite.

Will try again to catch it in gdb

my suspicion was correct, the entry contains an attribute, which has its next pointer pointing to itself, so when copying this attribute it will infinitely copy it until all memory is exhausted

(gdb) p e->ep_entry->e_deleted_attrs
$37 = (Slapi_Attr ) 0x7f2848007f20
(gdb) p
e->ep_entry->e_deleted_attrs
$38 = {a_type = 0x7f2848007fa0 "member", a_present_values = {num = 0, max = 0, sorted = 0x0, va = 0x0}, a_flags = 131588, a_plugin = 0x7f2882166170, a_deleted_values = {num = 1, max = 2, sorted = 0x0,
va = 0x7f2848007fc0}, a_listtofree = 0x0, a_next = 0x7f2848007f20, a_deletioncsn = 0x0, a_mr_eq_plugin = 0x7f28821669e0, a_mr_ord_plugin = 0x0, a_mr_sub_plugin = 0x0}
(gdb) p *e->ep_entry->e_deleted_attrs->a_next
$39 = {a_type = 0x7f2848007fa0 "member", a_present_values = {num = 0, max = 0, sorted = 0x0, va = 0x0}, a_flags = 131588, a_plugin = 0x7f2882166170, a_deleted_values = {num = 1, max = 2, sorted = 0x0,
va = 0x7f2848007fc0}, a_listtofree = 0x0, a_next = 0x7f2848007f20, a_deletioncsn = 0x0, a_mr_eq_plugin = 0x7f28821669e0, a_mr_ord_plugin = 0x0, a_mr_sub_plugin = 0x0}

Have to find how it gets into this state

the replicated mod which brought the server into this state was the delete of an already deleted user.

I was able to reproduce this (or very similar behaviour) by deleting a user (the only user in the group) simultaneously on both servers.

Is it possible that ipa user-del could have sent the delete to both servers ?

$ git merge ticket47612
Updating 11898c3..2cc32e6
Fast-forward
ldap/servers/slapd/entrywsi.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
$ git push origin master
Counting objects: 11, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 915 bytes, done.
Total 6 (delta 4), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
11898c3..2cc32e6 master -> master

This was checked into 389-ds-base-1.3.2 branch as well:

    9258b1836d7f5a5c22cab2d961044400a9d8857b

Marking as closed.

Metadata Update from @nkinder:
- Issue assigned to lkrispen
- Issue set to the milestone: 1.3.2.8

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/949

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Login to comment on this ticket.

Metadata