https://bugzilla.redhat.com/show_bug.cgi?id=908759 (Fedora)
Description of problem:
Ran ipa-client-install to join an IPA domain. 10 minutes later sssd crash notification came up.

Version-Release number of selected component:
sssd-1.9.3-1.fc18

Additional info:
backtrace_rating: 4
cmdline: /usr/libexec/sssd/sssd_be --domain ipa.thewalter.lan --debug-to-files
crash_function: talloc_abort
executable: /usr/libexec/sssd/sssd_be
kernel: 3.6.6-3.fc18.x86_64
remote_result: NOTFOUND
uid: 0
var_log_messages: Feb 7 13:31:37 stef-rawhide-thewalter-lan abrt[9689]: Saved core dump of pid 8282 (/usr/libexec/sssd/sssd_be) to /var/spool/abrt/ccpp-2013-02-07-13:31:36-8282 (18923520 bytes)

Truncated backtrace:
Thread no. 1 (10 frames)
 #2 talloc_abort at ../talloc.c:317
 #3 talloc_abort_access_after_free at ../talloc.c:336
 #4 talloc_chunk_from_ptr at ../talloc.c:357
 #6 talloc_get_name at ../talloc.c:1153
 #7 talloc_check_name at ../talloc.c:1172
 #8 ipa_dyndns_child_handler at src/providers/ipa/ipa_dyndns.c:1173
 #9 child_invoke_callback at src/util/child_common.c:578
 #10 tevent_common_loop_immediate at ../tevent_immediate.c:135
 #11 std_event_loop_once at ../tevent_standard.c:556
 #12 _tevent_loop_once at ../tevent.c:507
This might shed some light:
#2 0x00007fe2a544a2e6 in talloc_abort (reason=0x7fe2a5450718 "Bad talloc magic value - access after free") at ../talloc.c:317
No locals.
A use-after-free should be visible in valgrind during normal operation. Please also check the core file (perhaps based on the tevent_req return value) to see whether the event completed successfully or timed out.
Moving to 1.9.5 for further investigation. We don't have logs, so we should try to find the issue in the code.
Fields changed
milestone: NEEDS_TRIAGE => SSSD 1.9.5
The problem here is that the fork_nsupdate_send request was finished before nsupdate exited. Thus, when SIGCHLD is received and ipa_dyndns_child_handler() tries to retrieve its private data as struct tevent_req, it accesses a request that has already been freed.
There are only two scenarios in which this can happen:
1. We reach IPA_DYNDNS_TIMEOUT (15 seconds), which calls tevent_req_error(req, ETIMEDOUT).
2. We fail to write data to the pipe; ipa_dyndns_stdin_done() then gets ret != EOK from write_pipe_recv() and calls tevent_req_error(req, ret).
=> In both cases the callback is called and the request is freed, but the SIGCHLD handler is still waiting for the signal. When the handler fires, it accesses already freed data, which crashes sssd_be.
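To make the ordering problem concrete, here is a small standalone C simulation of the timeout scenario. It is not SSSD code: every type and function name is hypothetical. Actually freeing the request would make the demo itself undefined behaviour, so "freeing" overwrites a magic value instead. This is similar in spirit to how talloc recognises a freed chunk and aborts with "Bad talloc magic value - access after free" (frames #2 to #4 above).

```c
/* Hypothetical simulation of the crash ordering; not SSSD code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define REQ_MAGIC   0x5eed5eedu
#define FREED_MAGIC 0xdeaddeadu

struct fake_req {
    unsigned magic;   /* REQ_MAGIC while alive, FREED_MAGIC after "free" */
    int error;        /* 0 = success, otherwise an errno-style code */
};

/* Stand-in for the SIGCHLD watch that keeps the request as private data. */
struct fake_child_watch {
    bool armed;
    struct fake_req *pvt;
};

/* Marks the request as freed instead of releasing the memory. */
static void fake_req_free(struct fake_req *req)
{
    req->magic = FREED_MAGIC;
}

/* Scenario 1: the timeout fires before the child exits. */
static void on_timeout(struct fake_req *req)
{
    req->error = ETIMEDOUT;   /* analogous to tevent_req_error(req, ETIMEDOUT) */
    fake_req_free(req);       /* the caller's callback frees the request */
    /* BUG being modelled: the child watch stays armed and still points at req. */
}

/* Later SIGCHLD; returns true if the handler touched a freed request. */
static bool on_sigchld(struct fake_child_watch *w)
{
    if (!w->armed) {
        return false;
    }
    w->armed = false;
    if (w->pvt->magic != REQ_MAGIC) {
        fprintf(stderr, "access after free detected\n");
        return true;
    }
    return false;
}
```

Calling on_timeout() on an armed request and then on_sigchld() on its watch reproduces the sequence from the ticket: the request is already finished and "freed" when the child handler runs, so the handler's access is flagged.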
Possible solutions:
1. Do not call tevent_req_error() or tevent_req_done() outside the SIGCHLD handler. However, this would make the timeout useless.
2. Provide a way to remove the SIGCHLD handler, and remove it before marking the request as finished.
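Solution 2 can be sketched in plain C as follows, assuming a single "finish" function through which every exit path (timeout, pipe write failure, child exit) goes. All names are hypothetical. This illustrates the idea only and is not the change that went in with commit 9cb46bc. The key point is the order of operations: disarm the child watch first, then report the result, so a late SIGCHLD finds nothing to call back into.

```c
/* Hypothetical sketch of "remove the SIGCHLD handler before finishing". */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct child_watch;

struct nsupdate_req {
    int error;                   /* -1 while pending, otherwise the result */
    struct child_watch *watch;   /* NULL once the watch has been disarmed */
};

struct child_watch {
    bool armed;
    struct nsupdate_req *pvt;    /* private data, cleared on removal */
    int fired;                   /* how many times the callback actually ran */
};

/* Disarm the watch and drop its pointer to the request. */
static void child_watch_remove(struct child_watch *w)
{
    w->armed = false;
    w->pvt = NULL;
}

/* Single exit point: disarm first, then report the result. */
static void nsupdate_req_finish(struct nsupdate_req *req, int error)
{
    if (req->watch != NULL) {
        child_watch_remove(req->watch);
        req->watch = NULL;
    }
    req->error = error;
}

/* Called when SIGCHLD arrives for the child this watch belongs to. */
static void sigchld_dispatch(struct child_watch *w, int exit_status)
{
    if (!w->armed) {
        return;                  /* request already finished: ignore */
    }
    w->fired++;
    nsupdate_req_finish(w->pvt, exit_status == 0 ? 0 : EIO);
}
```

On the timeout path the caller would call nsupdate_req_finish(req, ETIMEDOUT) and may then free the request safely; a SIGCHLD that arrives afterwards is dropped because the watch is no longer armed. On the normal path the child exits first, and the same function disarms the watch before the result is reported.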
owner: somebody => lslebodn
Will be fixed along with the AD dyndns enhancement.
milestone: SSSD 1.9.5 => SSSD 1.10 beta owner: lslebodn => jhrozek review: => 0
patch: 0 => 1
This access-after-free was fixed as a byproduct of 9cb46bc
resolution: => fixed status: new => closed
Metadata Update from @jhrozek: - Issue assigned to jhrozek - Issue set to the milestone: SSSD 1.10 beta
SSSD is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in SSSD's GitHub repository.
This issue has been cloned to GitHub and is available here: https://github.com/SSSD/sssd/issues/2844
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the subscribe button.
Thank you for understanding. We apologize for any inconvenience.