I'd like to have a 'check-host' playbook (or we could call it something nicer, I don't care much). It would need to work for both Fedora and RHEL hosts. This playbook would do the following:
We may want to give it two modes: 'check' (the default), which doesn't change anything, and 'fix', which tries to fix some of the issues found.
This may be a nice ticket for multiple apprentices to collaborate on.
Alternatively, if not a playbook, it could be a Python script that uses Ansible's machinery. It could be modelled on the selinux-info script we have: http://infrastructure.fedoraproject.org/cgit/ansible.git/tree/scripts/selinux-info
Is this still being worked on, or can I take over and give it a shot?
I am too busy at the moment, so you can take over.
Is anyone working on this? If not, can I take it over?
No one I know of. Feel free to start in on it. ;)
Can I work on this fix?
Playbook check-host.yml
I just added the playbook check-host.yml. To check, use the tag "check"; to fix, use the tag "fix". Example: ansible-playbook check-host.yml -i hosts --tags="check"
There are other tags, like "iptables" and "selinux", that can be used to run only those tasks or to skip them (especially the "verify" tasks, which take a long time to finish).
Could somebody with privileges to run Ansible playbooks on the Fedora infrastructure servers please run this playbook and report any problems or suggestions for improving it?
ramesh1909, if you don't mind, I would like to own this ticket.
I finally got to looking at this. Very sorry for the long delay. ;(
Overall it's a great start. :) Some comments:
I just want it to check things and report, not make any changes. We should remove the plays that restart or try to fix things. Perhaps they could go in another playbook, but there are likely lots of reasons why something might need fixing and it might be in different places.
Some of the tasks, for instance the ones calling systemctl, should use a 'when: ' clause so that they only run on systems that use systemd (RHEL 7 or Fedora). See other playbooks we have that already use those conditionals.
Could you re-arrange things based on that?
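As a rough illustration of the systemd condition suggested above, the following Python sketch makes the same decision from gathered facts. The helper name uses_systemd is hypothetical, the distribution string "RedHat" for RHEL is an assumption to verify against real setup output, and the cut-off (any Fedora, or RHEL at major version 7 or later) is read from the comment above rather than from the playbook. In the playbook itself this would be expressed as a 'when:' clause.

```python
def uses_systemd(facts):
    """Return True if the host is expected to run systemd.

    Assumption: any Fedora host, or a RHEL host at major version 7 or later.
    `facts` is a dict of gathered Ansible facts for one host.
    """
    distro = facts.get("ansible_distribution", "")
    try:
        major = int(facts.get("ansible_distribution_major_version", 0))
    except (TypeError, ValueError):
        # Missing or non-numeric version: treat as unknown.
        major = 0
    if distro == "Fedora":
        return True
    # Assumed distribution name for RHEL hosts.
    return distro == "RedHat" and major >= 7
```

A host with no usable facts is treated as not running systemd, so the systemctl tasks would simply be skipped there.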
Hi.
Sorry too for the long delay. I have been travelling and very busy at work.
The description stated that it could be good to have two modes of operation, check and fix, so both modes were added to the playbook.
I am going to modify it to be as Kevin suggests.
If no one is working on this, I'd like to
attachment check-host.2.yml
attachment needs-updates.diff
attachment needs-updates.diff.2
This version outputs host:updateCount,updatePackageNames for each host on a single line. I've tested it on a group of two machines.
attachment needs-updates.diff.3
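To make the single-line output format described above concrete, here is a minimal Python sketch. The function name format_update_line is hypothetical, and both the comma separator between package names and the handling of a host with no updates are assumptions; the modified needs-updates script may do either differently.

```python
def format_update_line(host, packages):
    """Format one host as 'host:updateCount,updatePackageNames'.

    Assumptions: package names are comma separated and sorted, and a host
    with no pending updates is printed as 'host:0'.
    """
    names = sorted(packages)
    if not names:
        return "%s:0" % host
    return "%s:%d,%s" % (host, len(names), ",".join(names))
```

Keeping each host on one line makes the output easy to grep or to split on ':' in a later script.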
Starting to look pretty nice. ;)
Can you attach the entire current version of the playbook (just a bit handier than downloading all the diffs).
Replying to [comment:18 kevin]:
Starting to look pretty nice. ;) Can you attach the entire current version of the playbook (just a bit handier than downloading all the diffs).
There are two items for this fix: the playbook (doing the checking) and a modified version of the needs-updates script found under /srv/web/infra/ansible/scripts. Attaching both as: chk-host.yml (the new playbook) and needs-updates (the modified version of the existing script).
Kindly check and provide feedback.
attachment chk-host.yml
attachment needs-updates
What is the needs-updates change for? To allow listing out all the pending updates?
A few more minor comments:
No need for sudo if user is root.
On line 29/30 I assume that should be inactive instead of active again?
The rpm-va-file may need to use mktemp or the like so that two runs at the same time don't conflict. We may also want to remove this file at the end of the run.
In line 172, we should probably drop the -t filter and just show all tables.
For some of these things, like SELinux state, you can just use the Ansible facts that are gathered. Run 'ansible -m setup host' and you can see everything it gathers. SELinux is in ansible_selinux, for example.
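The mktemp suggestion for the rpm-va-file in the comments above can be sketched in Python as follows. This is only an illustration under assumptions: the function names and the "rpm-va-" prefix are hypothetical, and the playbook may well do this with a shell mktemp call instead.

```python
import os
import tempfile


def save_rpm_va_output(text):
    """Write output to a uniquely named temp file and return its path.

    mkstemp picks a fresh name on every call, so two runs at the same
    time cannot overwrite each other's file.
    """
    fd, path = tempfile.mkstemp(prefix="rpm-va-", suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


def read_and_remove(path):
    """Return the file's contents, then delete it so nothing is left behind."""
    try:
        with open(path) as handle:
            return handle.read()
    finally:
        os.remove(path)
```

Removing the file in a finally clause means it is cleaned up even if reading it fails.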
Thanks a lot for working on this... it's coming along nicely.
Replying to [comment:20 kevin]:
yes
A few more minor comments: No need for sudo if user is root.
sudo removed
yup, well spotted
I'm not quite sure I understood the case you mentioned. The file gets overwritten between runs on the same host, and it's not clear to me why or how the same tasks would run at the same time. My understanding is that only a single instance of the task will be active on a particular host in any run, but I might be wrong. So, check the new task.
done
I did not know that, but when I tried the command on a freshly installed RHEL 6.6 box, Ansible returned: ansible_selinux: false. So I guess the Ansible fact doesn't help here, right?
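One way to cope with both shapes of the fact, the false value reported above and the usual dictionary, is sketched below in Python. The function name selinux_summary is hypothetical, and the "status" and "mode" keys are assumptions that should be checked against what 'ansible -m setup host' actually prints on the target hosts.

```python
def selinux_summary(facts):
    """Summarise the ansible_selinux fact as a short string.

    Assumption: when the fact is usable it is a dict with 'status' and,
    when enabled, 'mode' keys. Anything else (for example False) is
    reported as unknown rather than guessed at.
    """
    selinux = facts.get("ansible_selinux")
    if not isinstance(selinux, dict):
        return "unknown (fact not available)"
    status = selinux.get("status", "unknown")
    mode = selinux.get("mode")
    if mode:
        return "%s (%s)" % (status, mode)
    return status
```

Reporting "unknown" keeps the check read-only and leaves it to the reader to decide whether the host needs a closer look.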
Thanks for the kind words.
Kindly, check the updated play and provide feedback.
The last attachment (a replacement) addresses all of kevin's comments and changes the SELinux tasks to use Ansible facts instead of shell, since the latter is hacky, according to kevin on IRC. :)
Kindly test and update with feedback.
Cool.
Yeah, there's no need to load the various vars files here, we don't use them anywhere.
Regarding the 'hosts: all', perhaps we could make it take a -e 'target=hostname' like we do for other scripts. See the ansible/scripts/ dir for that.
The rpm -Va output is a bit of a jumble. I wonder if we could look at just making this playbook write to a file and display that at the end? We might be able to make it nicer format-wise by doing that. Just a thought.
Finally, is there anything else you can think of that we should check? What are the sorts of things you might want to look at on a host to tell if it's operating normally?
attachment chk-host.yml.1
Replying to [comment:23 kevin]:
Cool. Yeah, there's no need to load the various vars files here, we don't use them anywhere.
removed
You're right, I didn't like the iterator myself. I hope this version (which replaces the previous one) is clearer to read.
One thing I can think of is a server "signature" saying "This IS a Fedora-Infra Server". I haven't been around long enough, at least on the operational side, to figure out what it might be, but at the very least I see that fedmsg is a core component of all infra servers, and having it present and operational somehow reflects this signature. I might be wrong, though.
Kindly, test and provide feedback.
Yeah, that looks pretty ok now for output.
Well, if we are connecting to the host via ansible/ssh then we know that the ssh host key is right and that it has our correct key. ;)
So, I think we could commit this as we have it now, but I wonder about the idea of outputting to a file. That would allow us to run this at a point in time and save that file, then run it again, say, a week later and diff the two. Ideally there should be no difference (except timestamps). Thoughts?
attachment chk-host-and-diff.yml
Replying to [comment:25 kevin]:
Yeah, that looks pretty ok now for output. Well, if we are connecting to the host via ansible/ssh then we know that the ssh host key is right and that it has our correct key. ;)
Makes sense.
Attaching my attempt at the approach of persisting the output to a file. The file is: chk-host-with-diff.yml
Kindly test and give feedback.
Looks pretty good, but instead of a prev dir, I would use the timestamp. That way we can keep around as many as we like and compare each to the latest.
The diff could just look for the last one (whatever its timestamp) and diff against that.
Aside from that this looks great now. ;)
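The timestamp idea in this comment could look roughly like the Python sketch below. It is only an illustration under assumptions: the "chk-host-" file prefix, the timestamp format, and both function names are hypothetical, and the attached playbook may store and compare its reports differently.

```python
import difflib
import os
import time

PREFIX = "chk-host-"  # hypothetical report file prefix


def save_report(directory, text, stamp=None):
    """Save a report under a timestamped name and return its path."""
    if stamp is None:
        # Sortable timestamp, so file name order matches run order.
        stamp = time.strftime("%Y%m%dT%H%M%S")
    path = os.path.join(directory, "%s%s.txt" % (PREFIX, stamp))
    with open(path, "w") as handle:
        handle.write(text)
    return path


def diff_latest_two(directory):
    """Return unified diff lines between the two most recent reports."""
    names = sorted(n for n in os.listdir(directory) if n.startswith(PREFIX))
    if len(names) < 2:
        return []  # nothing to compare against yet
    older, newer = names[-2], names[-1]
    with open(os.path.join(directory, older)) as handle:
        old_lines = handle.read().splitlines()
    with open(os.path.join(directory, newer)) as handle:
        new_lines = handle.read().splitlines()
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=older, tofile=newer,
                                     lineterm=""))
```

Because older reports are never overwritten, any two of them can be compared later to narrow down when a change appeared.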
Replying to [comment:27 kevin]:
Looks pretty good, but instead of a prev dir, I would use the timestamp. That way we can keep around as many as we like and compare each to the latest. The diff could just look for the last one (whatever its timestamp) and diff against that. Aside from that this looks great now. ;)
To clarify the first attempt: the idea is to establish a "happy with" state, against which the diffs will show any deviation. If there is any difference, then someone (whoever gets the email with the diffs) must decide whether we're happy with it or not. If not, things should go back to happy on the next diff, as someone must have attended to the diff and resolved it. Otherwise, scrap the original happy state and establish a new one (manually remove the prev dir and re-run the playbook).
With timestamps and diffing against the last one, one risks the diff passing unnoticed.
I might have missed something in the usage scenario you have in mind, so I'm attaching a timestamp version as well.
Kindly test and give feedback. Thanks.
attachment chk-host-and-diff-timestamp.yml
Sure. My thought was that we could run it periodically and keep those old runs, then we could see exactly what timeframe it changed in.
If you just wanted to run it one off you can go look at the output and see if anything looks like it needs fixing.
I'm happy with this as a first cut and we can improve it over time as we go along. ;)
I've committed it to the repo. Thanks for all your work on this!