wiki:VMware_FencingConfig
Last modified on 05/22/11 02:45:31

VMware fence agent & Red Hat Cluster Suite

Last updated 04-May-2010

Updates history:

  • 2010-05-04: fence_vmware in RHEL 5.5 is the former fence_vmware_ng (same syntax, but named fence_vmware), so fence_vmware_ng is no longer needed.
  • 2010-01-15: fence_vmware from RHEL 5.3 and 5.4 doesn't work with ESX 4.0. A working agent is included.
  • 2009-10-07: Tested on ESX 4.0.0 and vCenter 4.0.0.
  • 2009-01-19: Fixed vmware-ng. The old one could fail if somebody powered on the VM (for example, the VMware cluster itself) before the agent did; this led to an error and the fence agent failing. Now only a warning is displayed and fencing is considered successful.
  • 2009-01-15: New vmware-ng. The status operation (and thus whole fencing) is faster when many VMs are registered in VMware. The default type esx is now really the default.

We have two agents for fencing VMware virtual machines.

  • The first fence_vmware is in the RHEL 5.3 and 5.4/STABLE2 branches. It's designed and tested against VMware ESX Server (not ESXi!) and Server 1.x. It is replaced by the new fence_vmware in RHEL 5.5/STABLE3.
  • The second is in the master/STABLE3 branch. It's designed and tested against VMware ESX/ESXi/VC and Server 2.x and 1.x. This is what replaced the old fence_vmware (in master/STABLE3 it is actually named fence_vmware).

Fence_vmware

It's a union of two older agents: fence_vmware_vix and fence_vmware_vi.

VI (in the following text, VI API means not only the original VI Perl API, whose last version is 1.6, but also the VMware vSphere SDK for Perl) is the VMware API for controlling their main business-class products (ESX/VC). This API is fully cluster aware (VMware cluster), so this agent can fence guest machines physically running on an ESX but managed by a VC, and keeps working without any reconfiguration when a guest migrates to another ESX.

VIX is a newer API, working on VMware's "low-end" products (Server 2.x and 1.x), but there is some support for ESX/ESXi 3.5 update 2 and VC 2.5 update 2. This API is NOT cluster aware and is recommended only for Server 2.x and 1.x. But if you are using only one ESX/ESXi, or don't have a VMware cluster and never use migration, you can use this API too.

If you are using RHEL 5.5/RHEL 6, just install the fence-agents package and you are ready to use fence_vmware. For distributions with an older fence-agents, you can get this agent from the GIT (RHEL 5.5/STABLE3/master) repository (please make sure to use the current library, fencing.py, too).

Pre-req

The VI Perl API and/or the VIX API must be installed on every node in the cluster. This is the big difference from the older agent, where you didn't need to install anything; in exchange, the new agent's configuration is a little less painful (and it has many bonuses).
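A quick way to verify this prerequisite on a node is a sketch like the following (assumption: the VI Perl toolkit provides the VMware::VIRuntime module, as the vSphere SDK for Perl's own scripts use):

```shell
# Hedged check: try to load the VI Perl toolkit's main module.
# If it loads, the VI Perl API is installed on this node.
perl -MVMware::VIRuntime -e 'print "VI Perl API present\n"' 2>/dev/null \
    || echo "VI Perl API missing"
```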

Running

If you run fence_vmware with -h you will see something like this:

Options:
   -o <action>    Action: status, reboot (default), off or on
   -a <ip>        IP address or hostname of fencing device
   -l <name>      Login name
   -p <password>  Login password or passphrase
   -S <script>    Script to run to retrieve password
   -n <id>        Physical plug number on device or name of virtual machine
   -e             Command to execute
   -d             Type of VMware to connect
   -x             Use ssh connection
   -s             VMWare datacenter filter
   -q             Quiet mode
   -v             Verbose mode
   -D <debugfile> Debugging to output file
   -V             Output version information and exit
   -h             Display this help and exit

Now the parameters one by one, in a little more depth (format is: short option - XML argument name - description).

  • o - action - The same as with any other agent.
  • a - ipaddr - Hostname/IP address of the VMware ESX/ESXi/VC or Server 2.x/1.x. You can append a TCP port in the usual way (hostname:port). The port is not needed for standard ESX/ESXi/VC installations, but Server 2.x runs its management console on a non-standard port, which is why this possibility exists.
  • l - login - Login name for the management console.
  • p - passwd - Password for the management console.
  • S - passwd_script - Script which retrieves the password.
  • n - port - Virtual machine name. With the VI API this is the guest name as you see it in the VI Client (for example node1). With the VIX API, this name is in the form [datacenter] path/name.vmx.
  • d - vmware_type - Type of VMware to connect to. This parameter decides which API is used (VI or VIX). Possible values are esx, server2 and server1. Default is esx.
    • esx - VI API is used. The only cluster-aware option; works with ESX/ESXi/VC.
    • server2 - VIX API is used. Works with Server 2.x, ESX/ESXi 3.5 update 2 and VC 2.5 update 2, but is not cluster aware!!!
    • server1 - VIX API. Works only with Server 1.x.
  • s - vmware_datacenter - Filters the available guests. By default all guests in all datacenters are shown. With this option you can fence same-named guests in different datacenters (so two node1 guests aren't a problem). If you never have same-named guests, this option is useless for you.
  • e - exec - Executable to run. In every mode this agent works by forking a helper program which does the real work: with VI it's the Perl vmware_fence_helper, with VIX it's vmrun from the VIX API package. If these commands live in non-standard locations, use this option to say where the command lives.
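As a sketch, the -d and -s options above map to the vmware_type and vmware_datacenter attributes in cluster.conf (the host, credentials and the datacenter name DC1 here are hypothetical examples):

```xml
<fencedevice agent="fence_vmware" name="vmware1" ipaddr="vccenter"
             login="Administrator" passwd="pass" port="guest1"
             vmware_type="esx" vmware_datacenter="DC1"/>
```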

Example usage of the agent in CLI mode: you have a VC (named vccenter) with a node1 that you want to fence, and you will use the Administrator account with password pass.

fence_vmware -a vccenter -l Administrator -p pass -n 'node1'

If everything works, you can modify your cluster.conf as follows (in this example, you have two nodes, guest1 and guest2):

      ...
      <clusternodes>
              <clusternode name="guest1" nodeid="1" votes="1">
                      <fence>
                              <method name="1">
                                      <device name="vmware1"/>
                              </method>
                      </fence>
              </clusternode>
              <clusternode name="guest2" nodeid="2" votes="1">
                      <fence>
                              <method name="1">
                                      <device name="vmware2"/>
                              </method>
                      </fence>
              </clusternode>
      </clusternodes>
      <fencedevices>
              <fencedevice agent="fence_vmware" ipaddr="vccenter" login="Administrator" name="vmware1" passwd="pass" port="guest1"/>
              <fencedevice agent="fence_vmware" ipaddr="vccenter" login="Administrator" name="vmware2" passwd="pass" port="guest2"/>
      </fencedevices>
      ...

You can test the setup with the fence_node <fqdn> command.

Changing configuration from old fence_vmware to new fence_vmware

  • Install the needed VI Perl API on every node.
  • Remove the login and passwd parameters.
  • Change vmlogin to login and vmpasswd to passwd.
  • Change the port value to the shorter name (basically, remove the /full/path/ and the .vmx suffix).
  • If you have vmipaddr, delete ipaddr and change vmipaddr to ipaddr.
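The port rename step can be sketched in shell (using the example .vmx path from this page; your path will differ):

```shell
# Old agent's port: the full path to the .vmx file.
old_port="/vmfs/volumes/48bfcbd1-4624461c-8250-0015c5f3ef0f/Rhel/Rhel.vmx"
# New agent's port: just the guest name -- drop the path and the .vmx suffix.
new_port=$(basename "$old_port" .vmx)
echo "$new_port"   # prints "Rhel"
```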

Problems

One of the biggest problems: ESX 3.5/ESXi 3.5/VC 2.5 behave very badly when many virtual machines are registered, because getting the list of VMs takes too long. This makes fencing in a larger datacenter unusable: with 100+ registered VMs, whole fencing can take a few minutes. This appears to be fixed in ESX 4.0.0/vCenter 4.0.0 (with 200+ registered VMs, fencing one of them takes ~17 sec). If you don't want to upgrade, you can use a separate datacenter for each cluster.

Old Fence_vmware

This is the older fence agent, which should work on every ESX server that allows ssh connections and has the vmware-cmd command on it. The basic idea of this agent is to connect via ssh to the ESX server and there run vmware-cmd, which is able to start/shut down a virtual machine.

In ESX 4.0, vmware-cmd changed slightly, so the agent no longer works. You can solve this by deleting lines 32 and 33 ('if options.has_key("-A"):' and 'cmd_line+=" -v"'), or by downloading fence_vmware.gz, unpacking it and replacing the original /sbin/fence_vmware.
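The two-line deletion can be done with sed; here is a sketch demonstrated on a throwaway stand-in file (on a real node, back up /sbin/fence_vmware first and run the sed on it instead):

```shell
# Stand-in file with 40 numbered lines, playing the role of the agent.
seq 1 40 > /tmp/fence_vmware_demo
# Delete lines 32 and 33, as described above.
sed -i '32,33d' /tmp/fence_vmware_demo
wc -l < /tmp/fence_vmware_demo   # 38 lines remain
```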

The biggest problem with this solution is the many parameters which must be entered.

If you run fence_vmware with -h you will see something like this:

   -o <action>    Action: status, reboot (default), off or on
   -a <ip>        IP address or hostname of fencing device
   -l <name>      Login name
   -p <password>  Login password or passphrase
   -S <script>    Script to run to retrieve password
   -x             Use ssh connection
   -k <filename>  Identity file (private key) for ssh
   -n <id>        Physical plug number on device or name of virtual machine
   -A <ip>        IP address or hostname of managed VMware ESX (default localhost)
   -L <name>      VMware ESX management login name
   -P <password>  VMware ESX management login password
   -B <script>    Script to run to retrieve VMware ESX management password
   -q             Quiet mode
   -v             Verbose mode
   -D <debugfile> Debugging to output file
   -V             Output version information and exit
   -h             Display this help and exit

Now the parameters one by one, in a little more depth (format is: short option - XML argument name - description).

  • o - action - The same as with any other agent.
  • a - ipaddr - Hostname/IP address of the VMware ESX for ssh.
  • l - login - Login name for the ESX ssh.
  • p - passwd - Password for the ESX ssh.
  • S - passwd_script - Script which retrieves the password.
  • A - vmipaddr - VMware ESX hostname/IP address. Here it starts to get more confusing. What's the biggest difference between -a and -A? -a is the address of the computer you ssh to; -A is the address of the VMware server to operate on. This is mostly localhost (consider that after you ssh to your ESX, you want to operate on that machine -> localhost).
  • L - vmlogin - VMware ESX user login. The difference between -l and -L is much the same as between -a and -A: -L is the login to the VMware server being operated on.
  • P - vmpasswd - VMware ESX user password.
  • B - vmpasswd_script - Script to retrieve -P. This script is run on the GUEST machine, not on the VMware ESX machine!
  • n - port - Virtual machine name. This is the output of vmware-cmd -l, mostly something like /vmfs/volumes/48bfcbd1-4624461c-8250-0015c5f3ef0f/Rhel/Rhel.vmx.

I'm a big fan of pictures, so here's an example situation:

  
+---------------------------------------------------------------------------------------+
| +----------                                                                           |
| | guest1  | ssh to VMware ESX - can be, where guest1 run                              |
| | RHEL 5  |------------------+                                                        |
| +---------+                  |                                                        |
|                             \/                                                        |
| +----------      +--------SSH (22)---------------------------------+                  |
| | guest2  |      |        ------> run vmware-cmd with params off --|-> Kill guest1 VM | 
| | RHEL 5  |      |                                                 |                  |
| +---------+      |    dom0 - VMware management console             |                  |
|                  | (192.168.1.1) - Has user test with password test|                  |
|                  |               - Has vmware-cmd                  |                  |
|                  +-------------------------------------------------+                  |
|                                                                                       |
|            VMware ESX hypervisor                                                      |
+---------------------------------------------------------------------------------------+

As you can see, guest1 connects to the VMware management console (with the ssh hostname/login/password, -a/-l/-p) and there vmware-cmd is run (with the VMware hostname/login/password, -A/-L/-P).

So why do we have two sets of parameters? Because:

  • On the guest machine, you don't need to install anything, not even vmware-cmd. You just connect to another machine which has this command.
  • On dom0, ssh is not allowed for the root user, so we can't use the root login/password.
  • Mostly, the owner of the virtual machine is root, but again, ssh is not allowed for that user.

The recommended way to use this agent is:

  • Install the ESX server and set the root password (for example root).
  • Create a normal (non-root) user on the VMware dom0 (in this example, I will assume user test with password test).
  • Create a virtual machine (there is a funny thing here: you must use Windows, because the web console is not able to create a new virtual machine :) ).
  • Install your cluster node.

Once everything is done, test fencing via the command line (on one of the guests):

fence_vmware -a 192.168.1.1 -l test -p test -L root -P root -o status -n /vmfs/volumes/48bfcbd1-4624461c-8250-0015c5f3ef0f/Rhel/Rhel.vmx

You should get the status of the virtual machine named Rhel.

If everything works, you can modify your cluster.conf like this:

      ...
      <clusternodes>
              <clusternode name="guest1" nodeid="1" votes="1">
                      <fence>
                              <method name="1">
                                      <device name="vmware1"/>
                              </method>
                      </fence>
              </clusternode>
              <clusternode name="guest2" nodeid="2" votes="1">
                      <fence>
                              <method name="1">
                                      <device name="vmware2"/>
                              </method>
                      </fence>
              </clusternode>
      </clusternodes>
      <fencedevices>
              <fencedevice agent="fence_vmware" ipaddr="192.168.1.1" login="test" name="vmware1" passwd="test" vmlogin="root" vmpasswd="root" port="PATH_TO_VMX"/>
              <fencedevice agent="fence_vmware" ipaddr="192.168.1.1" login="test" name="vmware2" passwd="test" vmlogin="root" vmpasswd="root" port="PATH_TO_VMX"/>
      </fencedevices>
      ...

Recommendation for every VMware

The VMware guest ("client") machines should have VMware Tools installed, so I recommend installing VMware Tools on all cluster machines. This improves guest speed.


CategoryHowTo
