Deploying Highly Available OpenStack

Arthur Berezin

September 2015

In this article by Arthur Berezin, the author of the book OpenStack Configuration Cookbook, we will cover the following topics:

  • Installing Pacemaker
  • Installing HAProxy
  • Configuring Galera cluster for MariaDB
  • Installing RabbitMQ with mirrored queues
  • Configuring highly available OpenStack services


Many organizations choose OpenStack for its distributed architecture and its ability to deliver an Infrastructure as a Service (IaaS) platform for mission-critical applications. In such environments, it is crucial to configure all OpenStack services in a highly available configuration to provide as much uptime as possible for the control plane services of the cloud. Deploying a highly available control plane for OpenStack can be achieved in various configurations. Each of these configurations serves a certain set of demands and introduces a growing set of prerequisites.

Pacemaker is used to create active-active clusters that guarantee services' resilience to possible faults. Pacemaker is also used to create a virtual IP address for each of the services. HAProxy serves as a load balancer for incoming calls to the services' APIs.

This article discusses neither high availability of virtual machine instances nor the Nova-Compute service running on the hypervisors.


Most of the OpenStack services are stateless; they store persistent data in an SQL database, which is a potential single point of failure that we should make highly available. In this article, we will deploy a highly available database using MariaDB and Galera, which implements multimaster replication. To ensure availability of the message bus, we will configure RabbitMQ with mirrored queues.

This article discusses configuring each service separately on a three-controller layout that runs the OpenStack controller services, including Neutron, the database, and the RabbitMQ message bus. All services can be configured on the same set of controller nodes, or each service can be deployed on its own separate set of hosts.

Installing Pacemaker

All OpenStack services run as standard Linux system services. The first step in ensuring service availability is to configure Pacemaker clusters for each service, so that Pacemaker monitors the services. In case of failure, Pacemaker restarts the failed service. In addition, we will use Pacemaker to create a virtual IP address for each of OpenStack's services to ensure that a service remains accessible at the same IP address when a failure occurs and the actual service has relocated to another host.

In this section, we will install Pacemaker and prepare it to configure highly available OpenStack services.

Getting ready

To ensure maximum availability, we will install and configure three hosts to serve as controller nodes. Prepare three controller hosts with identical hardware and network layout. We will base our configuration for most of the OpenStack services on the configuration used in a single controller layout, and we will deploy Neutron network services on all three controller nodes.

How to do it…

Run the following steps on all three highly available controller nodes:

  1. Install the Pacemaker packages:
    [root@controller1 ~]# yum install -y pcs pacemaker corosync fence-agents-all resource-agents
  2. Enable and start the pcsd service:
    [root@controller1 ~]# systemctl enable pcsd
    [root@controller1 ~]# systemctl start pcsd
    
  3. Set a password for the hacluster user; the password should be identical on all the nodes:
    [root@controller1 ~]# echo 'password' | passwd --stdin hacluster
    

    We will use the hacluster password throughout the cluster configuration.

  4. Authenticate all controller nodes, using the -u and -p options to provide the hacluster user and the same password you set in the previous step:
    [root@controller1 ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p password --force

    At this point, you may run pcs commands from a single controller node instead of running commands on each node separately.
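Before running pcs cluster auth, it can save debugging time to confirm that pcsd is actually reachable on every node; pcsd listens on TCP port 2224. The following probe is an illustrative sketch using only bash built-ins; the controller host names are the ones used in this article:

```shell
# Probe pcsd's TCP port (2224) on a node using bash's /dev/tcp device.
# Returns 0 if a connection can be opened, non-zero otherwise.
pcsd_reachable() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/2224" 2>/dev/null
}

for node in controller1 controller2 controller3; do
  if pcsd_reachable "$node"; then
    echo "$node: pcsd is reachable"
  else
    echo "$node: pcsd is NOT reachable"
  fi
done
```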

There's more...

You may find the complete Pacemaker documentation, which includes installation instructions, a complete configuration reference, and examples, on the Cluster Labs website at http://clusterlabs.org/doc/.

Installing HAProxy

Addressing high availability for OpenStack includes avoiding high load on a single host and ensuring that incoming TCP connections to all API endpoints are balanced across the controller hosts. We will use HAProxy, an open source load balancer, which is particularly suited to HTTP load balancing as it supports session persistence and layer 7 processing.

Getting ready

In this section, we will install HAProxy on all controller hosts, configure a Pacemaker cluster for the HAProxy service, and prepare for the configuration of the OpenStack services.

How to do it...

Run the following steps on all controller nodes:

  1. Install HAProxy package:
    # yum install -y haproxy
  2. Enable the nonlocal binding kernel parameter:
    # echo net.ipv4.ip_nonlocal_bind=1 >>
    /etc/sysctl.d/haproxy.conf
    # echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind
  3. Configure the HAProxy load balancer settings for the Galera database, RabbitMQ, and the Keystone service, as shown in the configuration in the next step.

  4. Edit /etc/haproxy/haproxy.cfg with the following configuration:
    global
       daemon
    defaults
       mode tcp
       maxconn 10000
       timeout connect 2s
       timeout client 10s
       timeout server 10s
     
    frontend vip-db
       bind 192.168.16.200:3306
       timeout client 90s
       default_backend db-vms-galera
     
    backend db-vms-galera
       option httpchk
       stick-table type ip size 2
       stick on dst
       timeout server 90s
       server rhos5-db1 192.168.16.58:3306 check inter 1s port 9200
       server rhos5-db2 192.168.16.59:3306 check inter 1s port 9200
       server rhos5-db3 192.168.16.60:3306 check inter 1s port 9200
     
    frontend vip-rabbitmq
       bind 192.168.16.213:5672
       timeout client 900m
       default_backend rabbitmq-vms
     
    backend rabbitmq-vms
       balance roundrobin
       timeout server 900m
       server rhos5-rabbitmq1 192.168.16.61:5672 check inter 1s
       server rhos5-rabbitmq2 192.168.16.62:5672 check inter 1s
       server rhos5-rabbitmq3 192.168.16.63:5672 check inter 1s
     
    frontend vip-keystone-admin
       bind 192.168.16.202:35357
       default_backend keystone-admin-vms
    backend keystone-admin-vms
       balance roundrobin
       server rhos5-keystone1 192.168.16.64:35357 check inter 1s
       server rhos5-keystone2 192.168.16.65:35357 check inter 1s
       server rhos5-keystone3 192.168.16.66:35357 check inter 1s
     
    frontend vip-keystone-public
       bind 192.168.16.202:5000
       default_backend keystone-public-vms
    backend keystone-public-vms
       balance roundrobin
       server rhos5-keystone1 192.168.16.64:5000 check inter 1s
       server rhos5-keystone2 192.168.16.65:5000 check inter 1s
       server rhos5-keystone3 192.168.16.66:5000 check inter 1s

    This configuration file is an example of configuring HAProxy as a load balancer for the MariaDB, RabbitMQ, and Keystone services.

  5. Before we can configure all nodes from a single point, we need to authenticate on all nodes. Use the previously configured hacluster user and password to do this:
    # pcs cluster auth controller1 controller2 controller3 -u
    hacluster -p password --force
  6. Create a Pacemaker cluster for HAProxy service as follows:

    Note that you can run pcs commands now from a single controller node.

    # pcs cluster setup --name ha-controller controller1 controller2 controller3
    # pcs cluster enable --all
    # pcs cluster start --all
    
  7. Finally, using pcs resource create command, create a cloned systemd resource that will run a highly available active-active HAProxy service on all controller hosts:
    # pcs resource create lb-haproxy systemd:haproxy op monitor start-delay=10s --clone
  8. Create the virtual IP address for each of the services:
    # pcs resource create vip-db IPaddr2 ip=192.168.16.200
    # pcs resource create vip-rabbitmq IPaddr2 ip=192.168.16.213
    # pcs resource create vip-keystone IPaddr2 ip=192.168.16.202
    
  9. You may use the pcs status command to verify that all resources are running successfully:

    # pcs status
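Beyond eyeballing the pcs status output, it can be scanned for resources that failed to start. The helper below is an illustrative sketch, not a pcs feature; it simply flags Stopped or FAILED entries in the status text:

```shell
# Scan `pcs status` output (read from stdin) for unhealthy resources.
# Prints offending lines and returns non-zero if any are found.
check_started() {
  if grep -E 'Stopped|FAILED'; then
    return 1      # at least one resource is not running
  fi
  return 0        # nothing unhealthy found
}

# On a live cluster (not run here):
#   pcs status | check_started || echo "some resources are down"
```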

Configuring Galera cluster for MariaDB

Galera is a multimaster cluster solution for MariaDB that is based on synchronous replication between all cluster nodes. Effectively, Galera treats a cluster of MariaDB nodes as a single master that can be read from and written to on all nodes. Galera replication happens at transaction commit time by broadcasting the transaction's write set to the cluster for application. The client connects directly to the DBMS and experiences behavior close to that of the native DBMS. The wsrep API (write set replication API) defines the interface between Galera replication and the DBMS.
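Once the cluster from the following recipe is up, a node's replication state can be read from the wsrep_% status variables. The helper below is an illustrative sketch; credentials are omitted, and the expected cluster size of 3 matches this article's three-controller layout:

```shell
# Check Galera health from `mysql -N -e "SHOW STATUS LIKE 'wsrep_%'"`
# output (tab-separated key/value pairs on stdin): the node must report
# state "Synced" and the cluster must contain all 3 members.
galera_synced() {
  awk -F'\t' '
    $1 == "wsrep_local_state_comment" && $2 == "Synced" { synced = 1 }
    $1 == "wsrep_cluster_size"        && $2 == 3        { sized  = 1 }
    END { exit !(synced && sized) }'
}

# On a live node (not run here):
#   mysql -N -e "SHOW STATUS LIKE 'wsrep_%'" | galera_synced && echo OK
```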


Getting ready

In this section, we will install the Galera cluster packages for MariaDB on our three controller nodes; then we will configure Pacemaker to monitor all Galera services.

If Pacemaker is still running from the previous steps, stop it on all cluster nodes as shown:

# pcs cluster stop --all

How to do it...

Perform the following steps on all controller nodes:

  1. Install galera packages for MariaDB:
    # yum install -y mariadb-galera-server xinetd resource-agents
  2. Edit /etc/sysconfig/clustercheck and add the following lines:
    MYSQL_USERNAME="clustercheck"
    MYSQL_PASSWORD="password"
    MYSQL_HOST="localhost"
  3. Edit Galera configuration file /etc/my.cnf.d/galera.cnf with the following lines:

    Make sure to enter the host's IP address in the bind-address parameter.

[mysqld]
skip-name-resolve=1
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=[host-IP-address]
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="galera_cluster"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd=
wsrep_sst_method=rsync

You can learn more about each of Galera's default options on the documentation page at http://galeracluster.com/documentation-webpages/configuration.html.

  4. Add the following lines to the xinetd configuration file /etc/xinetd.d/galera-monitor:
    service galera-monitor
    {
           port           = 9200
           disable         = no
           socket_type     = stream
           protocol       = tcp
           wait           = no
           user           = root
           group           = root
           groups         = yes
           server         = /usr/bin/clustercheck
           type           = UNLISTED
           per_source     = UNLIMITED
           log_on_success =
           log_on_failure = HOST
           flags           = REUSE
    }
  5. Start and enable the xinetd and pcsd services:
    # systemctl enable xinetd
    # systemctl start xinetd
    # systemctl enable pcsd
    # systemctl start pcsd
  6. Authenticate on all nodes. Use the previously configured hacluster user and password to do this as follows:
    # pcs cluster auth controller1 controller2 controller3 -u 
    hacluster -p password --force

    Now commands can be run from a single controller node.

  7. Create a Pacemaker cluster for the Galera service:
    # pcs cluster setup --name controller-db controller1 
    controller2 controller3
    # pcs cluster enable --all
    # pcs cluster start --all
  8. Add the Galera service resource to the Galera Pacemaker cluster:
    # pcs resource create galera galera enable_creation=true wsrep_cluster_address="gcomm://controller1,controller2,controller3" meta master-max=3 ordered=true op promote timeout=300s on-fail=block --master
  9. Create a user for the clustercheck xinetd service:
    # mysql -e "CREATE USER 'clustercheck'@'localhost' IDENTIFIED BY 'password';"
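The xinetd service configured above is exactly what HAProxy's check ... port 9200 directives from the earlier section probe, and it can also be probed by hand with curl. The response-parsing helper below is an illustrative sketch, separated out so it can be tried without a live node:

```shell
# Decide whether a clustercheck HTTP response indicates a synced node.
# clustercheck answers HTTP 200 when the node is synced, 503 otherwise.
is_synced_response() {
  head -n 1 | grep -q '200 OK'
}

# On a live deployment (not run here):
#   curl -si http://controller1:9200/ | is_synced_response && echo synced
```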

See also

You can find the complete Galera documentation, which includes installation instructions, a complete configuration reference, and examples, on the Galera Cluster website at http://galeracluster.com/documentation-webpages/.

Installing RabbitMQ with mirrored queues

RabbitMQ is used as a message bus for services to communicate with each other. By default, the queues are located on a single node, which makes the RabbitMQ service a single point of failure. To avoid this, we will configure RabbitMQ to use mirrored queues across multiple nodes. Each mirrored queue consists of one master and one or more slaves, with the oldest slave being promoted to the new master if the old master disappears for any reason. Messages published to the queue are replicated to all slaves.

Getting ready

In this section, we will install the RabbitMQ packages on our three controller nodes and configure RabbitMQ to mirror its queues across all controller nodes; then we will configure Pacemaker to monitor all RabbitMQ services.

How to do it...

Perform the following steps on all controller nodes:

  1. Install RabbitMQ packages on all controller nodes:
    # yum -y install rabbitmq-server
  2. Start the rabbitmq-server service once so that RabbitMQ generates its Erlang cookie, and then stop it:
    # systemctl start rabbitmq-server
    # systemctl stop rabbitmq-server
  3. RabbitMQ cluster nodes use a cookie to determine whether they are allowed to communicate with each other; for nodes to be able to communicate, they must have the same cookie. Copy erlang.cookie from controller1 to controller2 and controller3:
    [root@controller1 ~]# scp /var/lib/rabbitmq/.erlang.cookie 
    root@controller2:/var/lib/rabbitmq/
    [root@controller1 ~]# scp /var/lib/rabbitmq/.erlang.cookie 
    root@controller3:/var/lib/rabbitmq/
  4. Enable and start the pcsd service on all nodes:

    # systemctl enable pcsd
    # systemctl start pcsd

    Since we already authenticated all the nodes of the cluster in the previous section, we can now run the following commands from controller1.

  5. Create a new Pacemaker cluster for RabbitMQ service as follows:

    [root@controller1 ~]# pcs cluster setup --name rabbitmq 
    controller1 controller2 controller3
    [root@controller1 ~]# pcs cluster enable --all
    [root@controller1 ~]# pcs cluster start --all
  6. Add a systemd resource for the RabbitMQ service to the Pacemaker cluster:

    [root@controller1 ~]# pcs resource create rabbitmq-server 
    systemd:rabbitmq-server op monitor start-delay=20s --clone
    
  7. Since all RabbitMQ nodes must join the cluster one at a time, stop RabbitMQ on controller2 and controller3:

    [root@controller2 ~]# rabbitmqctl stop_app
    [root@controller3 ~]# rabbitmqctl stop_app
    
  8. Join controller2 to the cluster and start RabbitMQ on it:

    [root@controller2 ~]# rabbitmqctl join_cluster 
    rabbit@controller1
    [root@controller2 ~]# rabbitmqctl start_app
    
  9. Now join controller3 to the cluster as well and start RabbitMQ on it:

    [root@controller3 ~]# rabbitmqctl join_cluster 
    rabbit@controller1
    [root@controller3 ~]# rabbitmqctl start_app
    
  10. At this point, the cluster should be configured, and we need to set RabbitMQ's HA policy to mirror the queues to all RabbitMQ cluster nodes as follows:

    [root@controller1 ~]# rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'
There's more...

The RabbitMQ cluster should now be configured with all queues mirrored to all controller nodes. To verify the cluster's state, you can use the rabbitmqctl cluster_status and rabbitmqctl list_policies commands on each of the controller nodes as follows:

[root@controller1 ~]# rabbitmqctl cluster_status
[root@controller1 ~]# rabbitmqctl list_policies
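The pattern '^(?!amq\.).*' in the HA policy deserves a note: the negative lookahead mirrors every queue except RabbitMQ's internal amq.* ones. GNU grep -P applies the same PCRE semantics, so the pattern can be tried outside RabbitMQ:

```shell
# Which queue names does the HA policy pattern match? amq.* queues are
# excluded by the anchored negative lookahead; everything else matches
# and would therefore be mirrored.
printf 'amq.direct\namq.fanout\nnova\ncinder-scheduler\n' \
  | grep -P '^(?!amq\.).*'
# prints:
#   nova
#   cinder-scheduler
```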

To verify the Pacemaker cluster's status, you may use the pcs status command as follows:

[root@controller1 ~]# pcs status

See also

For complete documentation on how RabbitMQ implements the mirrored queues feature, as well as additional configuration options, refer to the project's documentation pages at https://www.rabbitmq.com/clustering.html and https://www.rabbitmq.com/ha.html.

Configuring highly available OpenStack services

Most OpenStack services are stateless web services that keep persistent data in an SQL database and use a message bus for inter-service communication. We will use Pacemaker and HAProxy to run OpenStack services in an active-active highly available configuration, so that traffic for each of the services is load balanced across all controller nodes and the cloud can easily be scaled out to more controller nodes if needed. We will configure Pacemaker clusters for each of the services that will run on all controller nodes. We will also use Pacemaker to create a virtual IP address for each of OpenStack's services, so that rather than addressing a specific node, services are addressed by their corresponding virtual IP address. We will use HAProxy to load balance incoming requests to the services across all controller nodes.

Getting ready

In this section, we will use the virtual IP addresses we created for the services with Pacemaker and HAProxy in the previous sections. We will also configure the OpenStack services to use the highly available Galera-clustered database and RabbitMQ with mirrored queues.

This is an example for the Keystone service. Please refer to the Packt website URL here for complete configuration of all OpenStack services.

How to do it...

Perform the following steps on all controller nodes:

  1. Install the Keystone service on all controller nodes:
    yum install -y openstack-keystone openstack-utils
    openstack-selinux
  2. Generate a Keystone service token on controller1 and copy it to controller2 and controller3 using scp:
    [root@controller1 ~]# export SERVICE_TOKEN=$(openssl rand -hex 10)
    [root@controller1 ~]# echo $SERVICE_TOKEN > ~/keystone_admin_token
    [root@controller1 ~]# scp ~/keystone_admin_token root@controller2:~/keystone_admin_token
    [root@controller1 ~]# scp ~/keystone_admin_token root@controller3:~/keystone_admin_token
  3. Export the Keystone service token on controller2 and controller3 as well:
    [root@controller2 ~]# export SERVICE_TOKEN=$(cat 
    ~/keystone_admin_token)
    [root@controller3 ~]# export SERVICE_TOKEN=$(cat 
    ~/keystone_admin_token)

    Note: Perform the following commands on all controller nodes.

  4. Configure the Keystone service on all controller nodes to use vip-rabbitmq:
    # openstack-config --set /etc/keystone/keystone.conf DEFAULT
    admin_token $SERVICE_TOKEN
    # openstack-config --set /etc/keystone/keystone.conf DEFAULT
    rabbit_host vip-rabbitmq
  5. Configure the Keystone service endpoints to point to Keystone virtual IP:
    # openstack-config --set /etc/keystone/keystone.conf DEFAULT
    admin_endpoint 'http://vip-keystone:%(admin_port)s/'
    # openstack-config --set /etc/keystone/keystone.conf DEFAULT
    public_endpoint 'http://vip-keystone:%(public_port)s/'
  6. Configure Keystone to connect to the SQL database using the Galera cluster virtual IP:
    # openstack-config --set /etc/keystone/keystone.conf database 
    connection mysql://keystone:keystonetest@vip-db/keystone
    # openstack-config --set /etc/keystone/keystone.conf database 
    max_retries -1
  7. On controller1, create the Keystone PKI certificates and sync the database:
    [root@controller1 ~]# keystone-manage pki_setup --keystone-user keystone --keystone-group keystone
    [root@controller1 ~]# chown -R keystone:keystone /var/log/keystone /etc/keystone/ssl/
    [root@controller1 ~]# su keystone -s /bin/sh -c "keystone-manage db_sync"
  8. Using rsync, copy the Keystone SSL certificates from controller1 to controller2 and controller3:

    [root@controller1 ~]# rsync -av /etc/keystone/ssl/
    controller2:/etc/keystone/ssl/
    [root@controller1 ~]# rsync -av /etc/keystone/ssl/
    controller3:/etc/keystone/ssl/
  9. Make sure that the keystone user owns the newly copied files on controller2 and controller3:

    [root@controller2 ~]# chown -R keystone:keystone
    /etc/keystone/ssl/
    [root@controller3 ~]# chown -R keystone:keystone
    /etc/keystone/ssl/
    
  10. Create a systemd resource for the Keystone service; use --clone to ensure it runs in an active-active configuration:
    [root@controller1 ~]# pcs resource create keystone
    systemd:openstack-keystone op monitor start-delay=10s --clone
  11. Create the endpoint and user account for Keystone with the Keystone VIP as follows:
    [root@controller1 ~]# export SERVICE_ENDPOINT="http://vip-keystone:35357/v2.0"
    [root@controller1 ~]# keystone service-create --name=keystone --type=identity --description="Keystone Identity Service"
    [root@controller1 ~]# keystone endpoint-create --service keystone --publicurl 'http://vip-keystone:5000/v2.0' --adminurl 'http://vip-keystone:35357/v2.0' --internalurl 'http://vip-keystone:5000/v2.0'
     
    [root@controller1 ~]# keystone user-create --name admin --pass keystonetest
    [root@controller1 ~]# keystone role-create --name admin
    [root@controller1 ~]# keystone tenant-create --name admin
    [root@controller1 ~]# keystone user-role-add --user admin --role admin --tenant admin
  12. On all controller nodes, create a keystonerc_admin file with the OpenStack admin credentials using the Keystone VIP:
    cat > ~/keystonerc_admin << EOF
    export OS_USERNAME=admin
    export OS_TENANT_NAME=admin
    export OS_PASSWORD=keystonetest
    export OS_AUTH_URL=http://vip-keystone:35357/v2.0/
    export PS1='[\u@\h \W(keystone_admin)]\$ '
    EOF
  13. Source the keystonerc_admin credentials file to be able to run authenticated OpenStack commands:

    [root@controller1 ~]# source ~/keystonerc_admin
  14. At this point, you should be able to execute Keystone commands and create the services tenant:
    [root@controller1 ~]# keystone tenant-create --name services
    --description "Services Tenant"
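With everything in place, requests to the Keystone virtual IP should succeed regardless of which controller answers. A lightweight check, assuming the VIP name and port from this section, is to fetch the version document from the public endpoint; the validation helper below is an illustrative sketch:

```shell
# Validate that a response body (on stdin) looks like a Keystone
# version document rather than, say, an HAProxy error page.
looks_like_keystone() {
  grep -q '"version"'
}

# On a live deployment (not run here):
#   curl -s http://vip-keystone:5000/v2.0/ | looks_like_keystone \
#     && echo "Keystone answers on the VIP"
```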

Summary

In this article, we have covered the installation of Pacemaker and HAProxy, configuration of Galera cluster for MariaDB, installation of RabbitMQ with mirrored queues, and configuration of highly available OpenStack services.

You've been reading an excerpt of:

Production Ready OpenStack - Recipes for Successful Environments
