This has been happening for a while now. I am opening this bug to help track it and to see if we can spot patterns or come up with debugging ideas.
What I know of the problem:
From time to time the parent (root-owned) httpd process dies on our app servers. The child processes are all still around, but they are unable to handle requests because they can no longer talk to wsgi.
Logs contain:
[Wed Oct 03 15:29:25 2012] [alert] Child 527 returned a Fatal error... Apache is exiting! Resource temporarily unavailable (thread.cpp:85)
You have to do a 'killall -9 httpd' (to clean out the old children) and then 'service httpd restart' to return the app server to service. Note that due to our setup with haproxy, this causes no outages: the failing app server is simply dropped from rotation while it's failing.
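The failure state described above (no root-owned parent, children still present) could be detected with something like the following. This is only a rough sketch: the function name `httpd_state` is made up for illustration, and it assumes the children run as a non-root user, which is the usual setup but has not been checked on the app servers. The recovery commands are the ones from this ticket and are left commented out.

```shell
#!/bin/sh
# Hypothetical helper: classify httpd from "USER COMMAND" pairs on stdin.
# Assumption: the parent httpd is owned by root, the children are not.
httpd_state() {
    awk '
        $2 == "httpd" && $1 == "root" { parent++ }
        $2 == "httpd" && $1 != "root" { children++ }
        END {
            if (parent > 0)        print "ok"        # parent still alive
            else if (children > 0) print "orphaned"  # the state in this ticket
            else                   print "stopped"   # no httpd at all
        }'
}

# Live usage, commented out so the sketch has no side effects:
# state=$(ps -e -o user= -o comm= | httpd_state)
# if [ "$state" = "orphaned" ]; then
#     killall -9 httpd        # clean out the old children
#     service httpd restart   # bring the app server back
# fi
```

Feeding it `ps` output on stdin keeps the check separate from the destructive part, so it can be tried on a healthy server first.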
Debugging ideas:
Check logs to see if there is any pattern of requests or errors before the crashes.
Disable services we don't need currently. This includes raffle as there are no pending raffles.
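For the log check, one possible approach is to pull out the lines that immediately precede each fatal alert and compare them across crashes. A minimal sketch, assuming GNU grep (for `-B`); the helper name `crash_context` is hypothetical, and the log path in the usage line is the stock RHEL location, which may not match our configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the N lines before each "Apache is exiting!"
# alert, plus the alert line itself. N defaults to 20.
crash_context() {
    log_file=$1
    lines_before=${2:-20}
    # -F matches the string literally; -B adds leading context (GNU grep).
    grep -B "$lines_before" -F 'Apache is exiting!' "$log_file"
}

# Example usage (path is an assumption):
# crash_context /var/log/httpd/error_log 50
```

Running this on each app server after a crash and comparing the output should show whether the same requests or errors keep appearing before the parent dies.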
httpd-2.2.15-15.el6_2.1.i686
No. It seems to just happen randomly.
Is there any easy way to list them? :)
Replying to [comment:2 kevin]:
httpd -M
{{{
Loaded Modules:
 core_module (static)
 mpm_prefork_module (static)
 http_module (static)
 so_module (static)
 authn_default_module (shared)
 authz_default_module (shared)
 authz_host_module (shared)
 authz_user_module (shared)
 include_module (shared)
 log_config_module (shared)
 env_module (shared)
 expires_module (shared)
 headers_module (shared)
 setenvif_module (shared)
 mime_module (shared)
 status_module (shared)
 autoindex_module (shared)
 info_module (shared)
 dir_module (shared)
 alias_module (shared)
 rewrite_module (shared)
 proxy_module (shared)
 proxy_http_module (shared)
 cgi_module (shared)
 deflate_module (shared)
 php5_module (shared)
 ssl_module (shared)
 wsgi_module (shared)
Syntax OK
}}}
app01 just showed the issue.
There's a bodhi bugzilla traceback before the issue, but it's unclear if it's related or not.
testing something, please ignore.
remove blocked by test
This has not happened in a while, hopefully it's gone. ;)