After installing IPA server and replica on F20, the ns-slapd process on both master and the replica eats all the available memory in a matter of seconds and is subsequently killed by OOM killer.
This happens with freeipa-server-3.3.3-2.fc20 and 389-ds-base-1.3.2.7-1.fc20. After downgrading to 389-ds-base-1.3.1.10-1.fc20 (http://koji.fedoraproject.org/koji/buildinfo?buildID=467667) the problem goes away.
Is there any chance that you could run ns-slapd under valgrind (with debuginfo for 389-ds-base and freeipa-server installed)?
I see the following resource leaks in the tip of the 1.3.2 branch running a Coverity scan. These leak reports do not show up when scanning the 1.3.1 branch:
{{{ Error: RESOURCE_LEAK (CWE-772): 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:636: alloc_fn: Storage is returned from allocation function "slapi_ch_calloc(unsigned long, unsigned long)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:243:2: alloc_fn: Storage is returned from allocation function "calloc(size_t, size_t)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:243:2: var_assign: Assigning: "newmem" = "calloc(nelem, size)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:266:2: return_alloc: Returning allocated memory "newmem". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:636: var_assign: Assigning: "ectrls" = storage returned from "slapi_ch_calloc(2UL, 8UL)". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:639: noescape: Resource "ectrls + 0" is not freed or pointed-to in function "sync_create_state_control(Slapi_Entry , LDAPControl , int, Sync_Cookie )". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_util.c:158:58: noescape: "sync_create_state_control(Slapi_Entry , LDAPControl , int, Sync_Cookie )" does not free or save its pointer parameter "ctrlp". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:640: noescape: Resource "ectrls" is not freed or pointed-to in function "slapi_send_ldap_search_entry(Slapi_PBlock , Slapi_Entry , LDAPControl , char , int)". 389-ds-base-1.3.2.7/ldap/servers/slapd/plugin.c:599:79: noescape: "slapi_send_ldap_search_entry(Slapi_PBlock , Slapi_Entry , LDAPControl , char , int)" does not free or save its pointer parameter "ectrls". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:648: noescape: Resource "ectrls" is not freed or pointed-to in unimplemented function "ldap_controls_free(LDAPControl **)". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_persist.c:656: leaked_storage: Variable "ectrls" going out of scope leaks the storage it points to.
Error: RESOURCE_LEAK (CWE-772): 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:663: alloc_fn: Storage is returned from allocation function "slapi_ch_calloc(unsigned long, unsigned long)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:243:2: alloc_fn: Storage is returned from allocation function "calloc(size_t, size_t)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:243:2: var_assign: Assigning: "newmem" = "calloc(nelem, size)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:266:2: return_alloc: Returning allocated memory "newmem". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:663: var_assign: Assigning: "ctrl" = storage returned from "slapi_ch_calloc(2UL, 8UL)". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:664: noescape: Resource "ctrl + 0" is not freed or pointed-to in function "sync_create_state_control(Slapi_Entry , LDAPControl , int, Sync_Cookie )". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_util.c:158:58: noescape: "sync_create_state_control(Slapi_Entry , LDAPControl , int, Sync_Cookie )" does not free or save its pointer parameter "ctrlp". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:665: noescape: Resource "ctrl" is not freed or pointed-to in function "slapi_send_ldap_search_entry(Slapi_PBlock , Slapi_Entry , LDAPControl , char , int)". 389-ds-base-1.3.2.7/ldap/servers/slapd/plugin.c:599:79: noescape: "slapi_send_ldap_search_entry(Slapi_PBlock , Slapi_Entry , LDAPControl , char , int)" does not free or save its pointer parameter "ectrls". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:666: noescape: Resource "ctrl" is not freed or pointed-to in unimplemented function "ldap_controls_free(LDAPControl **)". 389-ds-base-1.3.2.7/ldap/servers/plugins/sync/sync_refresh.c:667: leaked_storage: Variable "ctrl" going out of scope leaks the storage it points to.
Error: RESOURCE_LEAK (CWE-772): 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/upgrade.c:249: alloc_fn: Storage is returned from allocation function "dblayer_get_full_inst_dir(struct ldbminfo , ldbm_instance , char , int)". 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dblayer.c:1427:13: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem". 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dblayer.c:1427:13: var_assign: Assigning: "buf" = "slapi_ch_malloc(mylen)". 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dblayer.c:1428:9: noescape: Resource "buf" is not freed or pointed-to in function "sprintf(char * restrict, char const * restrict, ...)". 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dblayer.c:1447:5: return_alloc: Returning allocated memory "buf". 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/upgrade.c:249: var_assign: Assigning: "inst_dirp" = storage returned from "dblayer_get_full_inst_dir(inst->inst_li, inst, inst_dir, 8192)". 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/upgrade.c:252: noescape: Resource "inst_dirp" is not freed or pointed-to in function "dbversion_read(struct ldbminfo , char const , char , char )". 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/dbversion.c:158:49: noescape: "dbversion_read(struct ldbminfo , char const *, char , char )" does not free or save its pointer parameter "directory". 389-ds-base-1.3.2.7/ldap/servers/slapd/back-ldbm/upgrade.c:269: leaked_storage: Variable "inst_dirp" going out of scope leaks the storage it points to.
Error: RESOURCE_LEAK (CWE-772): 389-ds-base-1.3.2.7/ldap/servers/slapd/result.c:753: alloc_arg: "charray_add(char , char )" allocates memory that is stored into "attrs". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:59:5: alloc_arg: "slapi_ch_array_add_ext(char , char )" allocates memory that is stored into "a". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: var_assign: Assigning: "a" = "slapi_ch_malloc(16UL)". 389-ds-base-1.3.2.7/ldap/servers/slapd/result.c:811: leaked_storage: Variable "attrs" going out of scope leaks the storage it points to.
Error: RESOURCE_LEAK (CWE-772): 389-ds-base-1.3.2.7/ldap/servers/slapd/result.c:753: alloc_arg: "charray_add(char , char )" allocates memory that is stored into "attrs". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:59:5: alloc_arg: "slapi_ch_array_add_ext(char , char )" allocates memory that is stored into "a". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: var_assign: Assigning: "a" = "slapi_ch_malloc(16UL)". 389-ds-base-1.3.2.7/ldap/servers/slapd/result.c:813: leaked_storage: Variable "attrs" going out of scope leaks the storage it points to.
Error: RESOURCE_LEAK (CWE-772): 389-ds-base-1.3.2.7/ldap/servers/slapd/schema.c:3979: alloc_arg: "charray_add(char , char )" allocates memory that is stored into "extensions->values". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:59:5: alloc_arg: "slapi_ch_array_add_ext(char , char )" allocates memory that is stored into "a". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: var_assign: Assigning: "a" = "slapi_ch_malloc(16UL)". 389-ds-base-1.3.2.7/ldap/servers/slapd/schema.c:4008: overwrite_var: Overwriting "extensions->values" in "extensions = head" leaks the storage that "extensions->values" points to.
Error: RESOURCE_LEAK (CWE-772): 389-ds-base-1.3.2.7/ldap/servers/slapd/schema.c:4272: alloc_arg: "charray_add(char , char )" allocates memory that is stored into "extensions->values". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:59:5: alloc_arg: "slapi_ch_array_add_ext(char , char )" allocates memory that is stored into "a". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: alloc_fn: Storage is returned from allocation function "slapi_ch_malloc(unsigned long)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: alloc_fn: Storage is returned from allocation function "malloc(size_t)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:155:2: var_assign: Assigning: "newmem" = "malloc(size)". 389-ds-base-1.3.2.7/ldap/servers/slapd/ch_malloc.c:179:2: return_alloc: Returning allocated memory "newmem". 389-ds-base-1.3.2.7/ldap/servers/slapd/charray.c:69:9: var_assign: Assigning: "a" = "slapi_ch_malloc(16UL)". 389-ds-base-1.3.2.7/ldap/servers/slapd/schema.c:4300: overwrite_var: Overwriting "extensions->values" in "extensions = head" leaks the storage that "extensions->values" points to. }}}
I looked into the coverty results, bu I think they are either false or will not account for eatimng up all memory very fast.
Both cases would also only apply if these controls are used.
There is one leak in the upfrade chack, but should only be called at startup
There are two reports regarding schema parsing, but I am not sure if coverty is right, and it should only be visible in schema updates
I think it would be good to get a valgrind report or instructions to reproduce
How do I run valgrind to produce output useful for you?
I would:
stop the server to check and then run
valgrind -q --log-file=<outputfile> --leak-check=full /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-<instance> -i /var/run/dirsrv/slapd-<instance>.pid
let it run for some time until you think thee leak has occurred and stop the server, upload the output file
Unfortunately I was not able to reproduce with valgrind, because SASL/GSSAPI authentication stopped working with valgrind.
Additional info:
ipa-server-install
ipa user-add user
ipa-replica-prepare
ipa-replica-install
ipa user-del user
ipa user-show user
This makes ns-slapd on both the machines eat all the memory up and is killed (not sure if it happens 100% of the time though).
ns-slapd
After starting the dirsrv service on both machines again, I was unable to re-add the user, because the private user group (managed by MEP) of the user was not deleted along with the user. The group remains in undeletable state, because in order to delete it, you first must detach it from the user (with ipa group-detach), but there is no longer a user to detach it from. Could this be some MEP + replication issue?
ipa group-detach
After adding a new user on machine A and deleting it on machine B, ns-slapd on machine B eats all the memory and is killed.
I can provide you logs if you are interested in them.
yes, do you have auditlogs as well ? And the dse.ldif files from both servers. Is it possible to get a few pstacks from the process just before it has to be killed ?
I could reproduce the issue on the machines Jan provided. Starting the replica as soon as a replicated mod for a group arrives the process grows quickly until it dies. I didn't get it in gdb so far, but pstacks show the it is in backentry_dup and is copying an entry. The entry is small, so it maybe corruted somehow so that the loop over attributes or values is infinite.
Will try again to catch it in gdb
my suspicion was correct, the entry contains an attribute, which has its next pointer pointing to itself, so when copying this attribute it will infinitely copy it until all memory is exhausted
(gdb) p e->ep_entry->e_deleted_attrs $37 = (Slapi_Attr ) 0x7f2848007f20 (gdb) p e->ep_entry->e_deleted_attrs $38 = {a_type = 0x7f2848007fa0 "member", a_present_values = {num = 0, max = 0, sorted = 0x0, va = 0x0}, a_flags = 131588, a_plugin = 0x7f2882166170, a_deleted_values = {num = 1, max = 2, sorted = 0x0, va = 0x7f2848007fc0}, a_listtofree = 0x0, a_next = 0x7f2848007f20, a_deletioncsn = 0x0, a_mr_eq_plugin = 0x7f28821669e0, a_mr_ord_plugin = 0x0, a_mr_sub_plugin = 0x0} (gdb) p *e->ep_entry->e_deleted_attrs->a_next $39 = {a_type = 0x7f2848007fa0 "member", a_present_values = {num = 0, max = 0, sorted = 0x0, va = 0x0}, a_flags = 131588, a_plugin = 0x7f2882166170, a_deleted_values = {num = 1, max = 2, sorted = 0x0, va = 0x7f2848007fc0}, a_listtofree = 0x0, a_next = 0x7f2848007f20, a_deletioncsn = 0x0, a_mr_eq_plugin = 0x7f28821669e0, a_mr_ord_plugin = 0x0, a_mr_sub_plugin = 0x0}
Have to find how it gets into this state
the replicated mod which brought the server into this state was the delete of an already deleted user.
I was able to reproduce this (or very similar behaviour) by deleting a user (the only user in the group) simultaneously on both servers.
Is it possible that ipa user-del could have sent the delete to both servers ?
attachment 0001-Ticket-47612-ns-slapd-eats-all-the-memory.patch
$ git merge ticket47612 Updating 11898c3..2cc32e6 Fast-forward ldap/servers/slapd/entrywsi.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) $ git push origin master Counting objects: 11, done. Delta compression using up to 4 threads. Compressing objects: 100% (6/6), done. Writing objects: 100% (6/6), 915 bytes, done. Total 6 (delta 4), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 11898c3..2cc32e6 master -> master
This was checked into 389-ds-base-1.3.2 branch as well:
9258b1836d7f5a5c22cab2d961044400a9d8857b
Marking as closed.
Metadata Update from @nkinder: - Issue assigned to lkrispen - Issue set to the milestone: 1.3.2.8
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/949
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Login to comment on this ticket.