#1326 monitor kojira stalled process
Closed: Fixed None Opened 15 years ago by jkeating.

Kojira has gotten into a stalled state a few times recently, where the log file hasn't been updated in hours. We should set a monitor on the kojira log file /var/log/kojira/kojira.log and alert if the log file has grown stale (an hour without an update seems reasonable; double-check with the koji devs).


Well, this won't work, as the kojira process could be running on either koji01 or koji02. ;)

It's controlled by including the kojiactivehub class in the koji01 or koji02 puppet files.

So, we need a way to monitor it only on the machine that has the kojiactivehub include (right now that's koji01, but it could be koji02 anytime we change it).

How about this:

We make a custom plugin that runs on both koji01 and koji02 (from nrpe).

It does:

  • Check whether the kojira process is running.
    If it's not, exit 0.
    If it is:
  • Check the log file and see if it's stalled.
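
The two steps above could be sketched roughly as follows. This is only an illustration, not the script anyone in the thread wrote: the language (Python), the function names, the "KOJIRA" message prefix, and the choice of CRITICAL for a stale log are all assumptions. The log path and the one-hour threshold come from the ticket description, and `ps -C kojira` is the same process test proposed later in the thread. Whether "log not updated for an hour" really means kojira is stalled is still an open question for the koji devs.

```python
"""Sketch of a Nagios/NRPE check: is kojira running here, and is its log fresh?"""
import os
import subprocess
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

LOG_PATH = "/var/log/kojira/kojira.log"  # path named in the ticket
MAX_AGE_SECONDS = 3600  # assumption: the "one hour" suggested in the ticket


def kojira_running():
    """Return True if `ps -C kojira` finds a process (assumes procps `ps`)."""
    result = subprocess.run(
        ["ps", "-C", "kojira"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def log_age_seconds(path, now=None):
    """Return seconds since the file was last modified, or None if it cannot be read."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        return None
    return (time.time() if now is None else now) - mtime


def evaluate(running, age, max_age=MAX_AGE_SECONDS):
    """Map the two observations to an (exit_code, message) pair."""
    if not running:
        # Standby hub: nothing to monitor here, so report OK (the "exit 0" branch).
        return OK, "KOJIRA OK: kojira is not running here"
    if age is None:
        return UNKNOWN, "KOJIRA UNKNOWN: cannot stat log file"
    if age > max_age:
        return CRITICAL, "KOJIRA CRITICAL: log not updated for %d seconds" % age
    return OK, "KOJIRA OK: log updated %d seconds ago" % age


def main():
    """Run both checks, print the one-line status, and return the exit code."""
    running = kojira_running()
    age = log_age_seconds(LOG_PATH) if running else None
    code, message = evaluate(running, age)
    print(message)
    return code
```

To use this as an actual plugin, the file would end with `raise SystemExit(main())` and be run from NRPE on both koji01 and koji02. Keeping the decision in `evaluate` separate from the process and file checks means the thresholds can be tried out without a running kojira.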

Thoughts?

Great idea. I can write this script if we decide to implement this type of check.

I created this code:

# Skip the file-age check on hosts where kojira is not running.
system("ps -C kojira > /dev/null");

if ($? != 0) {
    print "FILE_AGE OK: Kojira is not running here\n";
    exit $ERRORS{'OK'};
}

What do you think about putting this in the /usr/lib/nagios/plugins/check_file_age file, before all the other checks?

Well, that's part of it... but we also need to check the log file to see if it's stalled.
We may have to ask Dennis what that might look like.

Also, I wouldn't put it in check_file_age... we should make it its own script.

In a meeting with Kevin and Dennis, we decided to close this ticket, because it is not clear what we should be checking for.
