Skip to main content

Live migration with OpenStack on Ubuntu 14.04

In this post I going to configure the compute nodes to enable the instance live migration on kvm instances backed with CEPH. In my set-up cinder volumes and the nova instances ephimeral disks are backed with CEPH so all the compute nodes can see all the storage.

Assuming that cinder and nova is correctly integrated with CEPH we have to follow these steps to set up live migration:

In libvirt-bin service configuration file we have to enable -l flag to libvirt-bin service args so it listen through tcp socket.
/etc/default/libvirt-bin

# Defaults for libvirt-bin initscript (/etc/init.d/libvirt-bin)
# This is a POSIX shell fragment

# Start libvirtd to handle qemu/kvm:
start_libvirtd="yes"

# options passed to libvirtd, add "-l" to listen on tcp
libvirtd_opts="-d -l"

In libvirtd configuration, set the options needed to listen on tcp:
/etc/libvirt/libvirtd.conf

# Flag listening for secure TLS connections on the public TCP/IP port.
listen_tls = 0
# Listen for unencrypted TCP connections on the public TCP/IP port.
listen_tcp = 1
tcp_port = "16509"
# Override the default configuration which binds to all network
# interfaces. This can be a numeric IPv4/6 address, or hostname
listen_addr = "172.17.16.117"
# Authentication.
#
#  - none: do not perform auth checks. If you can connect to the
#          socket you are allowed. This is suitable if there are
#          restrictions on connecting to the socket (eg, UNIX
#          socket permissions), or if there is a lower layer in
#          the network providing auth (eg, TLS/x509 certilos resultadosficates)
auth_unix_ro = "none"
auth_unix_rw = "none"
auth_tcp = "none"

Because we are setting no auth for tcp connection you should take other actions for your production environment to ensure only certain servers are allowed to connect to this port, for example using iptables.

Configure qemu user and group with root.
/etc/libvirt/qemu.conf

# The user for QEMU processes run by the system instance. It can be
# specified as a user name or as a user id. The qemu driver will try to
# parse this value first as a name and then, if the name doesn't exist,
# as a user id.
user = "root"
# The group for QEMU processes run by the system instance. It can be
# specified in a similar way to user.
group = "root"
# Whether libvirt should dynamically change file ownership
# to match the configured user/group above. Defaults to 1.
# Set to 0 to disable file ownership changes.
dynamic_ownership = 0

Once the changes are made restart the libvirt-bin service:

$ sudo service libvirt-bin restart
libvirt-bin stop/waiting
libvirt-bin start/running, process 21411

Check if libvirt-bin is listening on tcp port 16509

$ sudo netstat -npta | grep 16509  
tcp        0      0 172.17.16.117:16509     0.0.0.0:*               LISTEN    21411/libvirtd  

Set the needed flags in libvirt for live migration:
/etc/nova/nova.conf

[libvirt]
[..]
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED
live_migration_uri=qemu+tcp://%s/system

Assuming that the compute nodes have different hardware you have to set up a common cpu model in nova.conf configuration file. You can set kvm64, the most compatible mode across Intel and AMD platforms or if you have intel cpus, like me, you can set SandyBridge. In any case, the mode you selected must be supported in all compute nodes.

/etc/nova/nova.conf

[libvirt]
[..]
type = qemu
cpu_mode=custom
cpu_model=kvm64
[libvirt]
[..]
type = qemu
cpu_mode=custom
cpu_model=SandyBridge

You can see all the cpu modes that kvm support with:

$ /usr/bin/qemu-system-x86_64 -cpu help
x86           qemu64  QEMU Virtual CPU version 2.0.0                  
x86           phenom  AMD Phenom(tm) 9550 Quad-Core Processor         
x86         core2duo  Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz 
x86            kvm64  Common KVM processor                            
x86           qemu32  QEMU Virtual CPU version 2.0.0                  
x86            kvm32  Common 32-bit KVM processor                     
x86          coreduo  Genuine Intel(R) CPU           T2600  @ 2.16GHz 
x86              486                                                  
x86          pentium                                                  
x86         pentium2                                                  
x86         pentium3                                                  
x86           athlon  QEMU Virtual CPU version 2.0.0                  
x86             n270  Intel(R) Atom(TM) CPU N270   @ 1.60GHz          
x86           Conroe  Intel Celeron_4x0 (Conroe/Merom Class Core 2)   
x86           Penryn  Intel Core 2 Duo P9xxx (Penryn Class Core 2)    
x86          Nehalem  Intel Core i7 9xx (Nehalem Class Core i7)       
x86         Westmere  Westmere E56xx/L56xx/X56xx (Nehalem-C)          
x86      SandyBridge  Intel Xeon E312xx (Sandy Bridge)                
x86          Haswell  Intel Core Processor (Haswell)                  
x86       Opteron_G1  AMD Opteron 240 (Gen 1 Class Opteron)           
x86       Opteron_G2  AMD Opteron 22xx (Gen 2 Class Opteron)          
x86       Opteron_G3  AMD Opteron 23xx (Gen 3 Class Opteron)          
x86       Opteron_G4  AMD Opteron 62xx class CPU                      
x86       Opteron_G5  AMD Opteron 63xx class CPU                      
x86             host  KVM processor with all supported host features (only available in KVM mode)

After these changes, if you see a message like this:

$ sudo nova live-migration 6fba9cbe-66e2-484d-ba90-18ad519865ff host3
ERROR (BadRequest): Unacceptable CPU info: CPU doesn't have compatibility.

It could be caused by this bug #1082414. In Juno, as a work around, you can comment out the line number 5010 “self._compare_cpu(source_cpu_info)” in libvirt driver:
/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py

# Compare CPU
source_cpu_info = src_compute_info['cpu_info']
#self._compare_cpu(source_cpu_info)

In KILO this bug should be fixed, so no changes are needed in driver.py.

I’m not so sure that the following is a requirement for the live migration, but it definitively is to enable the migration process and the instance resize, because some commands are run through a ssh connection.

Enable ssh access between compute nodes with nova user. First edit each /etc/passwd file and enable shell access for your nova user:
/etc/passwd

nova:x:108:113::/var/lib/nova:/bin/sh

Put this ssh configuration file in your nova home directory to avoid checking host’s keys between the compute nodes.
/var/lib/nova/.ssh/config

Host *
    StrictHostKeyChecking no

For each compute node create a rsa key pair as the nova user:

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/nova/.ssh/id_rsa): 
Created directory '/var/lib/nova/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /var/lib/nova/.ssh/id_rsa.
Your public key has been saved in /var/lib/nova/.ssh/id_rsa.pub.
The key fingerprint is:
e1:97:a7:f5:10:71:bb:1f:9a:91:dd:c8:66:22:be:49 nova@host
The key's randomart image is:
+--[ RSA 2048]----+
|            . .  |
|             o . |
|        .   . .  |
|       . . . ooo.|
|        S + =o*o.|
|         o = *+..|
|          E  o. .|
|         . o     |
|          o      |
+-----------------+

Copy all the created public keys content to an authorized_keys file and share it in all the compute nodes for the nova user:
/var/lib/nova/.ssh/authorized_keys

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDl+XPbYlzlDm3F+5N2SCiZlCRL/wZ9WAD3xwC5uNeza7NbQwy9jL5t2jHQn+bLMHP27GJO5Afl0cx9aPMe+mUvXDf0kk1yhND/eqRauNjQ/NONhUT9VDMiQBL7F28xWD+d0XTSr/G1/ddYxt/ouoZF94nPXCLmzqY4JdwWCq2VV/ChJRAXqs0tzPpOxmAGWNm7+mOxL4SFiFRCHR4LxxveV5rf10EzrOJFOEewUQ51yTqn8tuIs59nPuVzwNezYVJ4iZM3gcdm+rnE/40I/sodePDhiuIVkcT0Zl1stGVxVJrpsUtzE8+YsZLe+aH/IlsHXMPdpCIbinyv0vmzIG1H nova@host1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUTvfP4RmRdRXIlWn72X+y+DKnwiDlz9iWqB+0zVhMmy3T4bYY4Okw5qXCZ6xOA2BLzsuY07QLNdFCHDs6FjPjEtT+A8U4w3x4aZDwS+jgl6eC3vpTU/rkEpCDF/KOvkvoP+U8zuKS4r1r5+UAoFAKvDCM8RGGwY6mC2+uEqv23at9OIrWrbkdHVlVnxhSYk4prg2PnePMFchs3Sh9yEaLw/3F2wGBJGjYbVkfAu87UbQy6mRqWepJx8qSP2XYvIuVKleYpHS41Vk3H/+L4tTR0ibYBD+eDR80IRN4qGE6vzdf7hJW1Gl0Ozx9fzSzO0u6f/8254PqrNxya0PMmCbb nova@host2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDapvnExGGOKVx0XVqTPNWTwXR0kXLfzb2se1slb7oAL7clZShhUKDwFHOVRO16tV7k/VD3mEf0Z+VBmU2MyxXa5nOIwbBCIIy9E/01fXh9QcP5dn1Qs8GzsoNh4j3AHSDbmYgsaG0d+BrBxmF/HpU+qZvBOMudT8reXT++5VQFNMP5cXkd6b8gyeYlrRH2SAaa7kIy44z3ZqQHzmFA+TJwYSrMoawgpdDE75HWQMAgiECXFK2Nb71+gd9sHOttzNPGmSx6TmbkHAi1W9rGYSZ88n1+19tHbnyZi+Qn8HYvKmLMyQFhje71DMwzK3FzbSpZuTaMfiEslRS9skYD6OTd nova@host3
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrBNLab4QNjAIwGm7Ajc0CGHrtSlLnbV447vAdc/QWRoU+yiBlv4NxWq3aOogczuq6ar3hufXAnUX7ClMTon6f2Fcq/cv2D5V8YkXG7NtZQUKj0F6R27dEOUMPX64w2PGZen2QpcJNxLJXokbdTnDRc2odJ+0kw8rGKWDPioeLDjw5Qrb6EfddxWBJLbk3+gravyc2zHWMCzLUhRU4JMxBMutk3AXV2XBUflnOBoUMFixv8Mrm4wWQE3w29dZGL6wYtl2dAt9YENo9UIko/jVreuAc5gTIr4v1iywzaDivLT2HR2BjqTkABOd9cuWw6o7ZS0lTTPf8skGxAGNSOoQT nova@host4

Check if you are able to run the ls command in a remote host in your compute nodes:

$ ssh nova@host1 ls -l /etc/nova/nova.conf 
-rw-r----- 1 nova nova 3329 sep 21 11:17 /etc/nova/nova.conf

Now, you should be able to do an instance live migration between your compute nodes and the instance resize/migration should work too without problems.