KvmMigration

RGManager supports migration of Xen and KVM guests at this time. To get KVM guests to migrate cleanly, there are a few steps you must perform first.

Requirements

  • Each VM name must be unique cluster wide, including local-only / non-cluster VMs. Libvirtd only enforces unique names on a per-host basis, so if you clone a VM by hand, you must change the name in the clone's configuration file (a quick check is sketched after this list).
  • UUIDs must be unique cluster wide. If you clone a VM by hand, you must change the UUID in the clone's configuration file.
  • A shared location for the VM guest images.
  • SSH with the proper configuration, described below.
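
To check the first two requirements by hand, compare the names and UUIDs that libvirt knows about on every host. A minimal sketch, assuming the guest description files live in /etc/libvirt/qemu:

[root@marathon-01 ~]# virsh list --all
[root@marathon-01 ~]# grep -E '<(name|uuid)>' /etc/libvirt/qemu/*.xml

Run the same two commands on each cluster node and make sure no two different guests share a name or UUID.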

SSH configuration

RGManager uses qemu+ssh migration as the root user to migrate QEMU/KVM virtual machines from one host to another. SSH prevents the contents of the virtual machine from being read while in-flight, but careful configuration of SSH is required to allow the cluster software to perform migrations in a reasonably sane manner.

Generally speaking, the prerequisite is that 'root' must be able to ssh to any other host in the cluster without having to enter a password or passphrase.

Generate ssh keys for root

The first step is to generate ssh keys for the 'root' user. These keys must have no passphrase, and should be different from the standard ssh id_rsa (or id_dsa) keys. For these examples, we will use id_rsa_cluster for the private key and id_rsa_cluster.pub for the public key. You can tell ssh-keygen to use a different file with the -f parameter:

[root@marathon-01 ~]# ssh-keygen -f /root/.ssh/id_rsa_cluster
Generating public/private rsa key pair.
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa_cluster.
Your public key has been saved in /root/.ssh/id_rsa_cluster.pub.
The key fingerprint is:
8c:c1:3e:ea:8a:70:a1:62:a1:31:e8:26:a5:22:40:31 root@marathon-01

This step must be performed on each cluster node.

Edit root's .ssh/config

Create special entries for each cluster node in .ssh/config for the root user. They should look something like:

Host molly.foo.com
        StrictHostKeyChecking no
        IdentityFile /root/.ssh/id_rsa_cluster
Host frederick.foo.com
        StrictHostKeyChecking no
        IdentityFile /root/.ssh/id_rsa_cluster

If you have names like marathon-01 and marathon-02, your configuration file can be simpler:

Host marathon-*
        StrictHostKeyChecking no
        IdentityFile /root/.ssh/id_rsa_cluster

Note: If you are using a migration mapping for virtual machines in order to control what NIC is used for migration, you should be using the hostnames or IP addresses on the migration network, not the normal hostnames.

Note: The entries in .ssh/config must match the hostnames or migration maps in cluster.conf. For example, if you use the FQDN in cluster.conf and the short hostname in .ssh/config, the match will fail, since ssh does not perform hostname lookups during pattern matching. This will cause ssh to prompt for a password, which is exactly what we do not want. You may add another Host block for each FQDN or short hostname as needed.

This configuration addition must be done on each cluster node. If you created a new .ssh/config file, you may simply copy it over to the other hosts.
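
For example, assuming the marathon-0N hostnames used elsewhere on this page (ssh will still ask for the root password at this point):

[root@marathon-01 ~]# scp /root/.ssh/config root@marathon-02:/root/.ssh/config
[root@marathon-01 ~]# scp /root/.ssh/config root@marathon-03:/root/.ssh/config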

Distribute the public keys

Now, we need to create an authorized_keys2 file which includes the public keys we just generated on all of the cluster nodes. The first key is probably on the host we're logged in to, so we can just start with that.

[root@marathon-01 ~]# cat .ssh/id_rsa_cluster.pub >> authorized_keys2_cluster

For the other hosts, we will have to use scp to copy the file locally, then append it to our authorized_keys2_cluster file.

[root@marathon-01 ~]# scp root@marathon-02:.ssh/id_rsa_cluster.pub .
root@marathon-02's password:
id_rsa_cluster.pub                                         100%  398     0.4KB/s   00:00
[root@marathon-01 ~]# cat id_rsa_cluster.pub >> authorized_keys2_cluster
[root@marathon-01 ~]# rm -f id_rsa_cluster.pub
[root@marathon-01 ~]# scp root@marathon-03:.ssh/id_rsa_cluster.pub .
root@marathon-03's password:
id_rsa_cluster.pub                                         100%  398     0.4KB/s   00:00
[root@marathon-01 ~]# cat id_rsa_cluster.pub >> authorized_keys2_cluster
[root@marathon-01 ~]# rm -f id_rsa_cluster.pub
...

Once this is complete, you need to copy the file we just created to all of the cluster nodes and append it to root's authorized_keys2 file. Again, the first host is easy if you're already logged in to it.

[root@marathon-01 ~]# cat authorized_keys2_cluster >> .ssh/authorized_keys2

Copy the file to each cluster node and append it to each host's authorized_keys2 file:

[root@marathon-01 ~]# scp authorized_keys2_cluster root@marathon-02:
root@marathon-02's password:
authorized_keys2_cluster                                   100% 1990     1.9KB/s   00:00
[root@marathon-01 ~]# ssh marathon-02 "cat authorized_keys2_cluster >> .ssh/authorized_keys2"
root@marathon-02's password:
...
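
If you have more than a couple of nodes, a small loop can do the copy-and-append in one pass. This is only a sketch; the hostnames are placeholders, and ssh will still prompt for each host's root password at this stage:

# for h in marathon-02 marathon-03; do
      scp authorized_keys2_cluster root@"$h": &&
      ssh root@"$h" 'cat authorized_keys2_cluster >> .ssh/authorized_keys2'
  done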

Test 'ssh' from one host to another as root

You should be able to run ssh from any cluster node to any other cluster node if you have configured this correctly. This is the basic prerequisite for getting KVM migration working with Linux-cluster.

[root@marathon-01 ~]# ssh marathon-02
Last login: Tue Nov 10 11:10:26 2009 from marathon-01.foo.com
[root@marathon-02 ~]#
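
To spot a missed host quickly, you can loop over every node name (placeholders below) from each node in turn and confirm that no password prompt appears:

# for i in marathon-01 marathon-02 marathon-03; do
      ssh root@"$i" uptime
  done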

Cluster Configuration

Generally speaking, configuration of KVM guests is the same as for Xen guests, except that the description files reside in a different directory: Xen stores its configuration files in /etc/xen, while libvirt looks for XML descriptions for KVM in /etc/libvirt/qemu.

Virtual Machine Back-end Storage Requirements

Virtual machine ("guest", "domain") root images may be stored on:

  • cluster file systems such as GFS or GFS2
  • network file systems such as NFS
  • on CLVM-managed LVM LVs on shared disks/LUNs (all clustered VGs must be activated on the target before initiating a migration; see the sketch after this list)
  • on shared disks/LUNs directly
  • on shared disk/LUN partitions
  • on synchronously distributed RAID-1 targets like DRBD in multi-master mode

If your storage is not in the above list, DO NOT USE IT, or ask on #linux-cluster first.
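
For the CLVM case above, the clustered volume group backing the guest must be active on the migration target before the first migration. A minimal sketch, assuming a clustered volume group named guests_vg:

[root@marathon-02 ~]# vgchange -a y guests_vg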

Sharing QEMU configurations

Virtual machine description files must be shared between all cluster nodes.

STABLE3 & RHEL55 branches

The current RHEL55 & STABLE3 branches contain enabling patches which allow the guest description XML files to be stored on cluster or network file systems; rgmanager then uses libvirt's transient domain feature to define and start guests on the fly.

Example vm using xmlfile:

  <vm name="mydomain" xmlfile="/mounts/gfs-guests/mydomain.xml" />

Example vm config using path (pathspec, like $PATH in a Linux shell):

  <vm name="mydomain" path="/mounts/gfs-guests:/guests" />

RHEL54 branch (optionally, STABLE3 & RHEL55 branches)

If you are not running cluster-3.0.0 or later, you will not have either of these patches. In this case, your virtual machine configuration files all need to be stored in /etc/libvirt/qemu. Copy the .xml files for your guests to each cluster node.

[root@marathon-01 ~]# scp /etc/libvirt/qemu/*xml root@marathon-02:/etc/libvirt/qemu
qa-virt-01.xml                                             100%  914     0.9KB/s   00:00
qa-virt-02.xml                                             100%  914     0.9KB/s   00:00
qa-virt-03.xml                                             100%  914     0.9KB/s   00:00

If you copy a virtual machine description from one host to another, you must restart libvirtd on that host. Simply ssh to the host and run 'service libvirtd restart'. This causes libvirtd to recognize the new virtual machine configurations.

Note: It is recommended that you freeze running VMs in place using clusvcadm -Z prior to restarting libvirtd.
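
Putting the last two steps together, a sketch for a single guest (reusing the qa-virt-01 example above; -Z freezes the service in place and -U unfreezes it):

[root@marathon-01 ~]# clusvcadm -Z vm:qa-virt-01
[root@marathon-01 ~]# scp /etc/libvirt/qemu/qa-virt-01.xml root@marathon-02:/etc/libvirt/qemu/
[root@marathon-01 ~]# ssh root@marathon-02 'service libvirtd restart'
[root@marathon-01 ~]# clusvcadm -U vm:qa-virt-01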

Known issues

First migration to a host fails

After all of that key distribution, and even with StrictHostKeyChecking set to no, the first migration to a given host produces an error from virsh, causing rgmanager to think that the migration failed even though it has actually succeeded.

The logs look like this:

Feb 26 13:49:01 marathon-01 clurgmgrd: [7000]: <err> Migrate virt-01 to marathon-04 failed:
Feb 26 13:49:01 marathon-01 clurgmgrd: [7000]: <err> Warning: Permanently added 'marathon-04,10.1.2.3' (RSA) to the list of known hosts.^M
Feb 26 13:49:01 marathon-01 clurgmgrd[7000]: <notice> migrate on vm "virt-01" returned 1 (generic error)
Feb 26 13:49:01 marathon-01 clurgmgrd[7000]: <err> Migration of vm:virt-01 to marathon-04 failed; return code 1

This appears to be a bug in virsh (it should not report an error in this case), but it can be worked around as follows:

  • From node-01, ssh to node-01...node-NN. This will generate a file called .ssh/known_hosts
      # for i in `grep -r "clusternode.*name=" /etc/clu*/cluster.conf|awk -F 'name="' '{print $2}'|cut -d'"' -f1`
        do
             ssh -o StrictHostKeyChecking=no "$i" 'echo hi' >/dev/null
        done
    
  • Remember to use the migration mapping names or cluster node names when doing this.
  • Distribute this file to each cluster member using scp
      # scp .ssh/known_hosts root@marathon-02:.ssh
      # scp .ssh/known_hosts root@marathon-03:.ssh
      # scp .ssh/known_hosts root@marathon-NN:.ssh
    

If your virtual machine migration fails due to the above problem, it is very easy to get the VM into the correct state, due to the way rgmanager handles virtual machines:

  1. Log in to the source host.
  2. Disable the virtual machine: clusvcadm -d vm:my_vm
  3. Enable the virtual machine: clusvcadm -e vm:my_vm

RGManager will search the cluster and simply mark the virtual machine as started on the current owner.

SSH won't work after following the above instructions

If each host's .ssh/authorized_keys2 file already contained public keys corresponding to the other cluster members, you must remove those old entries.
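
One way to do that, assuming no other keys need to remain in the file, is to back it up and rebuild it from the cluster key file created earlier:

[root@marathon-01 ~]# cp .ssh/authorized_keys2 .ssh/authorized_keys2.bak
[root@marathon-01 ~]# cp authorized_keys2_cluster .ssh/authorized_keys2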

SELinux can cause trouble on recent versions of selinux-policy. A known-working context for ~root/.ssh files is as follows:

# ls -lZ .
-rw-------. root root unconfined_u:object_r:ssh_home_t:s0 authorized_keys
-rw-------. root root unconfined_u:object_r:ssh_home_t:s0 id_rsa
-rw-------. root root unconfined_u:object_r:ssh_home_t:s0 id_rsa_cluster
-rw-r--r--. root root unconfined_u:object_r:ssh_home_t:s0 id_rsa_cluster.pub
-rw-r--r--. root root unconfined_u:object_r:ssh_home_t:s0 id_rsa.pub
-rw-r--r--. root root unconfined_u:object_r:ssh_home_t:s0 known_hosts
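
If ls -lZ shows different contexts on your hosts, restorecon can usually reset the files to the policy defaults; verify with ls -lZ afterwards:

# restorecon -R -v /root/.ssh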