#387 managed entry sometimes doesn't delete the managed entry
Closed: wontfix None Opened 11 years ago by rmeggins.

I don't believe this is related to ticket #358

TET managedentry tests pass - IPA server and replica install work

But the http://freeipa.org/page/Testing tests/test_xmlrpc/test_user_plugin.py fails sometimes. The test may pass, but it will leave the group entry behind. During test cleanup, it is supposed to remove the user entries, and the group entry should be removed by mep. But if this fails, the group entry cannot be deleted directly without first removing the mepManagedEntry objectclass. Subsequent tests runs will fail due to the presence of this group entry.

I've been able to reproduce this very seldom. Martin saw this:
find_entry_internal_uniqueid retry count exceeded; uniqueid = (%s)
which suggests either a mishandled deadlock or a cache problem.

At the very least we need some error reporting from mep_del_post_op - there is currently no error checking or logging for the delete.


If you can reproduce this problem easily, please try turning on the LDAP_DEBUG_CACHE (32768) level error logging and see if you can reproduce. This may help narrow down the problem to the cache.
http://port389.org/wiki/FAQ#Troubleshooting

I have been able to reproduce the issue using a debug build from the tip of the 389-ds-base-1.2.11 branch. The issue occurs intermittently. I have had times where it fails after 2 or 3 cycles of the IPA test_user_plugins.py tests, or after 15. With LDAP_DEBUG_CACHE logging enabled, it took nearly 50 test cycles for the issue to occur. This indicates that it is timing related.

When the problem occurs, I am left with a dangling managed entry for tuser1 (the group entry). I dumped id2entry.db4 using dbscan and confirmed that the managed entry does actually exist in the database and not just the entry cache.

I will attach the errors log with LDAP_DEBUG_CACHE logging enabled.

The errors log doesn't show much of interest, though I notice that the dncache return messages related to the tuser1 managed group entry stop once the problem occurs. This might simply be a byproduct of the operations that the test performs related to the tuser1 user entry no longer go through.

One other thing to note is that the managed entry plug-in thinks that the internal delete was successful. I applied a patch from Mark that would log an error in mep_del_post_op() if the internal delete operation of the managed entry returned anything other than LDAP_SUCCESS. That new logging statement never appears in my logfile.

Added more logging, see attachment (logging-diff) to try and test Nathans theory that another operation is deleting an entry while the post op from another thread is adding mep objectclass and attribute.

In the process of reproducing with a script...

I was able to reproduce this with "plain" 389. I say "plain" because I had to reproduce the ipa dse config, schema, and data tree - basically ipa without the ipa and slapi-nis plugins.

The problem occurs when we retry a modify operation due to a deadlock. The problem is that a be_txn_preop plugin can modify the entry and the mod list. The first problem is that we do not have a good way to keep track of this and restore it. For example, the code currently stores the original entry i.e. the entry before modifying. If we have to retry, we have to reproduce the after entry by applying the mods to the before entry, expanding objectclasses, checking for schema/syntax/etc. If we do this, we may lose any changes made to the "after" entry that were made in a bepreop. The original entry gets written to the db, which is a no-op, and the data we thought was supposed to be written is not, because it was discarded.

bepreop and betxnpreop code can alter the data that goes into the db in two ways - by altering the SLAPI_MODIFY_EXISTING_ENTRY Slapi_Entry*, and by altering the SLAPI_MODIFY_MODS mods list. After the plugins are called, the code applies the mods to the after entry, slapi_schema_expand_objectclasses, slapi_entry_schema_check, slapi_mods_syntax_check, slapi_entry_rdn_values_present, etc.

We should save the "after" entry in the original_entry. That way the betxnpreop code can further modify the entry modified by the bepreop plugins. The problem then comes in with applying the mods to this entry. Since we save the original mods, we would have to ''reapply all of the mods that we already applied'', in addition to any mods made in the betxnpreop. So instead, we just assume that the betxnpreop should '''not''' alter any of the original mods in any way, and instead should just append/extend the mods list. Then we can just apply any additional mods added by the betxnpreop (including mods that could ''undo'' any previous mods in the list). So a betxnpreop that wants to alter the mods list can do one of the following:

1) alter the mods list instead in the bepreop
2) extend the mods list with ''undo'' ops e.g. add a mod/delete to counter an earlier mod/add.

Your patches look good to me.
I think this is safe, but if you could give me some more details regarding the ec_in_cache case in the patch (0002-Ticket-387-managed-entry-sometimes-doesn-t-delete-th.2.patch), it makes me feel better. :) My question is when an error occurs AND ec is in cache, I don't see it's removed by CACHE_REMOVE, but it's just returned. Could it leave ec ("after" entry with mods applied) in the entry cache when the mod operation fails?

Replying to [comment:10 nhosoi]:

Your patches look good to me.

Thanks!

I think this is safe, but if you could give me some more details regarding the ec_in_cache case in the patch (0002-Ticket-387-managed-entry-sometimes-doesn-t-delete-th.2.patch), it makes me feel better. :)

Sure, no problem.

My question is when an error occurs AND ec is in cache, I don't see it's removed by CACHE_REMOVE, but it's just returned. Could it leave ec ("after" entry with mods applied) in the entry cache when the mod operation fails?

Yes, I think you are correct. I do need to cache_remove the ec entry in that case. Another patch coming up.

clean up cache handling in ldbm_back_modify - plus cache removal
0001-Ticket-387-managed-entry-sometimes-doesn-t-delete-th.patch

preserve the bepre "after" entry and restore it after txn failures
0002-Ticket-387-managed-entry-sometimes-doesn-t-delete-th.3.patch

master:
commit changeset:6c17ec5/389-ds-base
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Jun 18 12:21:50 2012 -0600

commit changeset:c49496a/389-ds-base
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Jun 13 21:03:37 2012 -0600

1.2.11:
commit changeset:8e67aac/389-ds-base
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Jun 18 12:21:50 2012 -0600

commit changeset:f281882/389-ds-base
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Jun 13 21:03:37 2012 -0600

Added initial screened field value.

Metadata Update from @nkinder:
- Issue assigned to rmeggins
- Issue set to the milestone: 1.2.11.5

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/387

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Login to comment on this ticket.

Metadata