#4290 [easyfix] ansible check-host playbook
Closed: Fixed. Opened 9 years ago by kevin.

I'd like to have a 'check-host' playbook (or we could call it something nicer, I don't care much). It would need to work for both Fedora and RHEL hosts. This playbook would do the following:

  • Operate on a list of target hosts/groups passed in (many of our other playbooks do this already).

We may want to give it two 'modes': a 'check' mode by default that doesn't change anything, and a 'fix' mode that tries to fix some of the issues noted.

  • Check what services are set to start on boot, i.e. whether httpd is enabled in systemd or chkconfig.
  • Check the status of those services and, if one is down/not running, restart it.
  • Check for pending updates on the hosts and list them.
  • Run needs-restart and list services that need restarting.
  • Run 'rpm -Va' and list any output, possibly with a whitelist of things we know we modify.
  • Check that a firewall is loaded.
  • Note if selinux is permissive or enforcing and if that matches the setting on boot. (A rough sketch of a couple of these checks follows below.)
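A very rough sketch of what a couple of these check-only tasks could look like (the yum check and the debug task are assumptions for illustration, not an actual playbook from the repo):

    - name: list pending updates (check only)
      command: yum -q check-update
      register: pending_updates
      changed_when: false
      # yum check-update exits 100 when updates are available, so don't treat that as a failure
      failed_when: false

    - name: note the current selinux state from the gathered facts
      debug: var=ansible_selinux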

This may be a nice ticket for multiple apprentices to collaborate on.


Alternatively, if not a playbook, it could be a python script that uses ansible's machinery. It could be modelled off of the selinux-info script we have: http://infrastructure.fedoraproject.org/cgit/ansible.git/tree/scripts/selinux-info

Is this still being worked on, or can I take over and give it a shot?

I am too busy currently, so you can take over.

Is anyone working on this? If not, can I take it over?

No one I know of. Feel free to start in on it. ;)

Can I work on this fix?

I just added the playbook check-host.yml. For checking you can use the tag "check"; for fixing you can use the tag "fix". Example:
ansible-playbook check-host.yml -i hosts --tags="check"

There are other tags, like "iptables" and "selinux", that can be used to run only those tasks or to skip them (especially the "verify" tasks, which take a long time to finish).
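As an illustration of how the tag split works, a check/fix pair of tasks might look roughly like this (simplified placeholders, not the actual check-host.yml contents):

    - name: check whether httpd is running
      command: systemctl is-active httpd
      register: httpd_state
      changed_when: false
      failed_when: false
      tags:
        - check

    - name: make sure httpd is running (fix mode)
      service: name=httpd state=started
      tags:
        - fix

Running with --tags="check" only reports; --tags="fix" actually changes state.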

Could somebody with privileges to run ansible playbooks on Fedora Infrastructure servers please run this playbook and report problems or suggestions for improving it?

ramesh1909, if you don't mind, I would like to own this ticket.

I finally got to looking at this. Very sorry for the long delay. ;(

Overall it's a great start. :) Some comments:

  • I just want it to check things and report, not make any changes. We should remove the plays that restart or try to fix things. Perhaps they could go in another playbook, but there are likely lots of reasons why something might need fixing, and the fix might live in different places.

  • Some of the tasks, for instance the ones calling systemctl, should use a 'when:' clause so they only run on systems that use systemd (RHEL 7 or Fedora). See other playbooks we have that already use those conditionals.
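For instance, a systemctl-based task might be guarded roughly like this (the exact condition is an assumption; the existing playbooks may phrase it differently):

    - name: check whether httpd is enabled in systemd
      command: systemctl is-enabled httpd
      register: httpd_enabled
      changed_when: false
      failed_when: false
      # only run on systemd hosts: Fedora, or RHEL 7 and later
      when: ansible_distribution == 'Fedora' or (ansible_distribution == 'RedHat' and ansible_distribution_major_version|int >= 7)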

Could you re-arrange things based on that?

Hi.

Sorry also for the long delay. I have been travelling and very busy at work.

In the description it was stated that it would be good to have two modes of operation: check and fix. Both modes were added to the playbook.

I am going to modify it as Kevin suggests.

If no one is working on this, I'd like to.

This version outputs host:updateCount,updatePackageNames for each host on a single line. I've tested it on a group of two machines.
needs-updates.diff.3
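For illustration, a line of that output might look something like the following (hostname and package names are made up):

    host01.example.org:3,kernel glibc openssl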

Starting to look pretty nice. ;)

Can you attach the entire current version of the playbook (just a bit handier than downloading all the diffs)?

Replying to [comment:18 kevin]:

Starting to look pretty nice. ;)

Can you attach the entire current version of the playbook (just a bit handier than downloading all the diffs)?

There are two items for this fix: the playbook (doing the checking) and a modified version of the needs-updates script found under /srv/web/infra/ansible/scripts.
Attaching both as:
chk-host.yml (the new playbook)
needs-updates (a modified version of the existing script)

Kindly check and provide feedback.

What is the needs-updates change for? To allow listing out all the pending updates?

A few more minor comments:

  • No need for sudo if user is root.

  • On line 29/30 I assume that should be inactive instead of active again?

  • The rpm-va-file may need to use mktemp or the like so it doesn't conflict when two runs happen at once. We may also want to remove this file at the end of the run.

  • In line 172, we should probably drop the -t filter and just show all tables.

  • For some of these things, like the selinux state, you can just use the ansible facts that are gathered. Run 'ansible -m setup host' and you can see all the things it gathers; selinux is in ansible_selinux, for example.
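For example, instead of shelling out to getenforce, a task could just read the gathered fact (a sketch, assuming facts are gathered for the play):

    - name: report the selinux state from the gathered facts
      debug: var=ansible_selinux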

Thanks a lot for working on this... it's coming along nicely.

Replying to [comment:20 kevin]:

What is the needs-updates change for? To allow listing out all the pending updates?

yes

A few more minor comments:

  • No need for sudo if user is root.

sudo removed

  • On line 29/30 I assume that should be inactive instead of active again?

yup, well spotted

  • The rpm-va-file may need to use mktemp or the like so it doesn't conflict when two runs happen at once. We may also want to remove this file at the end of the run.

Not quite sure I understood the case you mentioned. The file gets overwritten between runs on the same host, and it's not clear why/how the same tasks would run at the same time; my understanding is that a single instance of the task will be active on a particular host in any run, but I might be wrong. So, check the new task.
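For reference, one way to get a per-run temporary file and clean it up afterwards would be roughly the following (an illustration, not necessarily what the updated task does):

    - name: create a unique file for the rpm -Va output
      command: mktemp /tmp/rpm-va.XXXXXX
      register: rpm_va_file
      changed_when: false

    - name: collect rpm -Va output into the temp file
      shell: rpm -Va > {{ rpm_va_file.stdout }} 2>&1 || true
      changed_when: false

    - name: remove the temp file at the end of the run
      file: path={{ rpm_va_file.stdout }} state=absent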

  • In line 172, we should probably drop the -t filter and just show all tables.

done

  • For some of these things, like the selinux state, you can just use the ansible facts that are gathered. Run 'ansible -m setup host' and you can see all the things it gathers; selinux is in ansible_selinux, for example.

Did not know that, but when attempting the command on a freshly installed RHEL 6.6 box, ansible returns: ansible_selinux: false. So I guess the point of the ansible fact fades away here, right?
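If the fact can come back as a bare false like that (which, as far as I know, happens when the libselinux Python bindings are missing on the host), a task could guard against it, roughly:

    - name: report the selinux state from the gathered facts
      debug: var=ansible_selinux
      # skipped when the fact is just 'false'
      when: ansible_selinux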

Thanks a lot for working on this... it's coming along nicely.

Thanks for the kind words.

Kindly check the updated play and provide feedback.

The last attachment (a replacement) addresses all of kevin's comments and changes the selinux tasks to use ansible facts instead of shell, since the latter is hacky (according to kevin on IRC :)).

Kindly test and update with feedback.

Cool.

  • Yeah, there's no need to load the various vars files here, we don't use them anywhere.

  • On the hosts: all, perhaps we could make it take a -e 'target=hostname' like we do for other scripts. See the ansible/scripts/ dir for that (a rough sketch follows after this list).

  • The rpm -Va output is a bit of a jumble. I wonder if we could look at just making this playbook write to a file and display that at the end? we might be able to make it nicer format wise doing that. Just a thought.

  • Finally, is there anything else you can think of we should check? What are the sorts of things you might want to look at on a host to tell if it's operating normally?
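On the target point above, the usual pattern looks roughly like this (a sketch; the hostname in the invocation is a placeholder):

    - hosts: "{{ target }}"
      gather_facts: true
      tasks:
        - name: placeholder check
          debug: msg="the actual checks go here"

invoked as:

    ansible-playbook check-host.yml -i hosts -e 'target=somehost.fedoraproject.org'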

Replying to [comment:23 kevin]:

Cool.

  • Yeah, there's no need to load the various vars files here, we don't use them anywhere.

removed

  • On the hosts: all, perhaps we could make it take a -e 'target=hostname' like we do for other scripts. See ansible/scripts/ dir for that.

done

  • The rpm -Va output is a bit of a jumble. I wonder if we could look at just making this playbook write to a file and display that at the end? we might be able to make it nicer format wise doing that. Just a thought.

You're right, I didn't like the iterator myself.
Hope this version (which replaces the previous one) is clearer to read.
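One readable-output pattern, for reference (a sketch, not necessarily what the attached version does):

    - name: run rpm -Va and capture the output
      shell: rpm -Va 2>&1 || true
      register: rpm_va
      changed_when: false

    - name: show the rpm -Va output as one block per host
      debug: var=rpm_va.stdout_lines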

  • Finally, is there anything else you can think of we should check? What are the sorts of things you might want to look at on a host to tell if it's operating normally?

One thing I can think of is a server "signature" saying "This IS a Fedora-Infra Server".
I haven't been around long enough, at least on the operational side, to figure out what it might be, but at the very least I see that fedmsg is a core component of all infra servers, and having it present and operational somewhat reflects this signature, but I might be wrong.

Kindly test and provide feedback.

Yeah, that looks pretty ok now for output.

Well, if we are connecting to the host via ansible/ssh then we know that the ssh host key is right and that it has our correct key. ;)

So, I think we could commit this as we have it now, but I wonder about the idea of outputting to a file. That would allow us to run this at a point in time and save that file, then run it again say a week later and diff the two. There should ideally be no difference (except timestamps). Thoughts?

Replying to [comment:25 kevin]:

Yeah, that looks pretty ok now for output.

Well, if we are connecting to the host via ansible/ssh then we know that the ssh host key is right and that it has our correct key. ;)

makes sense

So, I think we could commit this as we have it now, but I wonder about the idea of outputting to a file. That would allow us to run this at a point in time and save that file, then run it again say a week later and diff the two. There should ideally be no difference (except timestamps). Thoughts?

Attaching my attempt at the persist-to-a-file approach.
The file is: chk-host-with-diff.yml

Kindly test and give feedback.

Looks pretty good, but instead of a prev dir, I would use the timestamp. That way we can keep around as many as we like and compare each to the latest.

The diff could just look for the last one (whatever its timestamp) and diff against that.

Aside from that this looks great now. ;)
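For what it's worth, the timestamp idea could look roughly like this (the directory, the report_text variable, and the diff command are all assumptions):

    - name: write this run's report to a timestamped file on the control host
      local_action:
        module: copy
        # report_text is assumed to already hold the collected check output
        content: "{{ report_text }}"
        dest: "/var/tmp/check-host/{{ inventory_hostname }}-{{ ansible_date_time.iso8601 }}.txt"

    - name: diff the two most recent reports for this host
      local_action: shell cd /var/tmp/check-host && diff -u $(ls -t {{ inventory_hostname }}-*.txt | sed -n 2p) $(ls -t {{ inventory_hostname }}-*.txt | sed -n 1p)
      register: report_diff
      changed_when: false
      failed_when: false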

Replying to [comment:27 kevin]:

Looks pretty good, but instead of a prev dir, I would use the timestamp. That way we can keep around as many as we like and compare each to the latest.

The diff could just look for the last one (whatever its timestamp) and diff against that.

Aside from that this looks great now. ;)

To clarify the first attempt: the idea is to establish a "happy with" state, against which the diffs show any deviation. If there is any difference, then someone (whoever gets the email with the diffs) must decide whether we're happy with it or not. If not, things should go back (by the next diff) to the happy state, since someone must have attended to the diff and resolved it. Otherwise, scrap the original happy state and establish a new one (manually remove the prev dir and re-run the playbook).

With timestamps and diffing against the last run, one risks a diff passing unnoticed.

I might have missed something in the usage scenario you have in mind, so I'm attaching a timestamp version as well.

Kindly test and give feedback.
Thanks.

Sure. My thought was that we could run it periodically and keep those old runs, then we could see exactly what timeframe it changed in.

If you just wanted to run it as a one-off, you can go look at the output and see if anything looks like it needs fixing.

I'm happy with this as a first cut and we can improve it over time as we go along. ;)

I've committed it to the repo. Thanks for all your work on this!
