OpenStack is one of the fastest growing open source projects in history. It is rapidly becoming the standard for open source, public and private clouds. Since its first release in 2010, there have been 12 major releases, with the thirteenth being planned as of the writing of this book. The project has grown from a few thousand lines of code written by dozens of developers to over 2.6 million lines of code from over 2,100 contributors. OpenStack originally started with two projects, Object Storage (Swift) and Compute (Nova). OpenStack has grown to include over 40 projects. This huge amount of commitment and contribution has led to the momentum that OpenStack enjoys today.
OpenStack has become very popular among companies and organizations because it allows them to provide public and private clouds to their employees, partners, customers, and constituents. In addition, the vibrant community around OpenStack allows its adopters to avoid lock in and gives them freedom to work with the technologies of their choice. As an open source project, those who adopt it have the freedom to work with the community and add functionalities and features as they see fit. This flexibility has enticed hundreds of organizations to join this community, many dedicating developers to the cause.
OpenStack is extremely powerful, but it is not without complexity. One of the side effects of its rapid growth and large community involvement is the fact that things often change quickly. New projects are added regularly, and along with those projects, come new functionalities. As the community finds better ways to implement things, it often necessitates change. As the projects begin to get more and more integrated, it becomes very important to understand how these projects flow and interrelate. While the growth of OpenStack has been rapid, the development of OpenStack talent has not kept pace. As a result, individuals with OpenStack skills are in high demand.
OpenStack requires operators with the ability to identify, isolate, and troubleshoot errors that might arise in the environment. Troubleshooting OpenStack is not always straightforward because the functionality of an OpenStack cloud is delivered by several different projects all working together under the OpenStack umbrella. In addition, the OpenStack projects are further augmented by external open source technologies. With OpenStack's power and flexibility comes the challenge of pinpointing the source of errors and problems. While this challenge is real, it is by no means insurmountable.
In this book, we will show you how to find success with OpenStack troubleshooting. We will introduce you to inside tips and a winning methodology to troubleshoot your OpenStack cluster. It is assumed that you are familiar with the basic Linux administration, cloud computing in general, and OpenStack in particular. We will walk you through a set of useful tools to troubleshoot OpenStack, and we will provide a step-by-step guide to address common problems in installation, performance, availability, and automation. We will focus on central OpenStack projects, including those providing compute, storage, and networking. By the time we reach the end of this book, you will be better prepared to tackle the OpenStack troubleshooting challenges that may come your way. You will have a better understanding of how OpenStack works under the hood, and this understanding, along with the tips and methodologies presented in this book, will make you an efficient and confident OpenStack troubleshooter.
In this chapter, we will cover the following topics:
The project overview of OpenStack
Basic troubleshooting methods and tools
The more you understand about OpenStack, how it is organized and architected, the more successful you will be at troubleshooting it. In this section, we provide you with a strong foundation of understanding about OpenStack. Throughout the book, we will build on this foundation, going deeper into each project as we encounter them in future chapters. To start the journey, we will introduce you to some of the projects that are commonly deployed in an OpenStack cluster. It's worth pointing out that we won't cover every OpenStack project, but we will attempt to adequately cover each of the commonly deployed projects.
Keystone is the OpenStack Identity service. It is responsible for authentication and is involved in authorization for an OpenStack cluster. Keystone is also responsible for service discovery, allowing users to see which services are available in a cluster. A user-initiated request will typically flow through Keystone; so, learning to troubleshoot this service is a wise investment.
Glance is the OpenStack Image service. Glance is primarily responsible for image registration and storage. As an example, compute instances can be created based on machine images. These images are typically stored through and retrieved via Glance.
Neutron is the OpenStack Networking service. Networking is hard, and it is no different in OpenStack. Neutron is responsible for abstracting the network-related functionality in OpenStack. This is an area where many operators may run into trouble. Learning how to skillfully troubleshoot Neutron will serve you well as an OpenStack administrator.
Nova is the OpenStack Compute service. Nova provides compute instances in an OpenStack cloud. This is one of the largest OpenStack projects and one of the oldest. Nova is used heavily in an OpenStack cloud, and it is critical that troubleshooters understand this project, its concepts, components, and architecture.
Cinder is the project that provides block storage services for OpenStack. Cinder abstracts and provides access to several backend storage options. Compute instances will often receive their block storage via the Cinder service.
Swift is the OpenStack Object Storage service. Swift provides object-based storage, which is accessed via an API. Unlike Cinder, Swift does not expose block-level storage, but it does offer a system that allows you to store petabytes of data on a cluster that is built on commodity hardware.
The OpenStack Orchestration service is named Heat. Heat allows users to leverage a declarative template language to describe, build, and deploy OpenStack resources. It is designed to allow users to manage the entire life cycle of their cloud resources.
The OpenStack dashboard is named Horizon. Horizon provides the graphical user interface for OpenStack. It relies on the OpenStack APIs to present much of this functionality. It is an extremely useful tool when troubleshooting the APIs or OpenStack functionality in general.
Oslo is the OpenStack project that contains the shared Python libraries that are leveraged across all projects. Examples of these include code that supports messaging, command-line programs, configuration, and logging.
One of the strengths of the OpenStack community is that it treats documentation as a first-class citizen. In the community, documentation is just as important as code. The documentation project is structured like the others and receives a good amount of exposure and attention.
In addition to the these projects, there are several other popular projects worth mentioning. These projects include the following:
While Nova typically handles the provisioning of virtual machines, with Ironic, users can provision physical hardware in a cloudy way. The Ironic driver allows you to deploy bare metal hardware in a similar fashion to the way you deploy virtual machines.
Magnum is a project designed to allow users to manage application containers in OpenStack. This allows container orchestration engines, such as Docker and Kubernetes, to be leveraged through OpenStack.
Congress provides a policy as a service for OpenStack. The aim of the project is to provide a framework for regulatory compliance and governance across cloud services. Its responsibility is policy enforcement.
These are just some of the many projects under the Big Tent of OpenStack. New projects with the promise of a new functionality are created regularly. As these projects gain more and more adoption, the chance that you will need to troubleshoot them increases.
One of the design tenants of OpenStack, since its inception, is to not reinvent the wheel. In other words, when a solid technology existed that met the needs of the project, the original developers would leverage the existing technology as opposed to creating their own version. The result is that OpenStack is built upon many technologies that administrators already know and love. The tools used to troubleshoot these technologies can also be used to troubleshoot OpenStack. In this section, we will go over a few of the supporting technologies that are commonly used across OpenStack projects. Where different projects use specific supporting technologies, they will be covered in the respective chapters for those projects.
The OpenStack software runs on Linux. The primary OpenStack services are Linux services. Rest-assured that all your experience in troubleshooting the Linux operating systems will serve you well in the world of OpenStack. You will come across OpenStack clusters running on just about every Linux distribution. Some deployments will leverage Linux networking, and experience in this area is extremely valuable in OpenStack. Many of the most popular Linux distributions offer packages to deploy OpenStack. Operators may optionally deploy from source or leverage one of the installers available in the market. Either way, Linux is critical to any OpenStack deployment, and we will make use of many common Linux tools when troubleshooting.
Most OpenStack services are backed by a database. The Oslo project in OpenStack provides common Python code for OpenStack projects that need to access a database. Oslo provides libraries to connect to a Postgres or MySQL database. Experience with these database engines, and others like them, is very useful when troubleshooting. As you understand the different projects and what they store in the database, you can trace a request to ensure that the state recorded in the database matches the state reported elsewhere.
OpenStack often leverages a message broker to facilitate communication between its components. To avoid tight coupling, most components do not communicate directly with one another, but instead communication is routed through a message broker. With the message broker playing a central role in component communication, it is a powerful resource for the troubleshooter. It is possible to trace messages from one component to another and spot messages that may not be generated or delivered. This information can help lead you down the right path when attempting to isolate an issue.
There are many paths an OpenStack troubleshooter can follow when attempting to resolve an issue. It is worth arguing that there is more than one way to approach any troubleshooting problem. Operators and administrators will need to find a methodology that works well for them and the context in which they operate. With this in mind, I would like to share a methodology that I have found useful when working with OpenStack, specifically the following methodologies:
Services: Confirm that the required services are up and running.
Authentication: Ensure that authentication is properly configured.
CLI Debug: Run the CLI commands in the debug mode, looking for errors.
Execute the request against the API directly, looking for issues.
Check Logs: Check log files for traces or errors.
I have found that working through these steps when troubleshooting OpenStack will yield useful clues that will help identify, isolate, and resolve issues.
There are many tools available when troubleshooting OpenStack. In the following sections, we will cover a few of the tools that we leverage frequently. I would recommend that you add these to your toolbox if you are not already using them.
OpenStack is deployed in a Linux environment; therefore, administrators can leverage popular Linux tools when troubleshooting. If you are an experienced Linux administrator, you should be comfortable with most of these tools, and you should find that your existing Linux experience will serve you well as you troubleshoot OpenStack. In this section, we will walk you through some of the more common tools that are used. We will explain how each tool can be leveraged in an OpenStack environment specifically, but if you are interested in learning how the tools work generally, much can be learned by researching each tool on the Internet.
OpenStack runs several processes that are critical to its smooth operation. Understanding each process can be very helpful to quickly identify and resolve problems in your cluster. It is not uncommon for the source of your problems to be rooted in the fact that a process has died or not started successfully. Bringing your cluster back to health may be as simple as restarting the necessary process. As we tackle each OpenStack project, we will introduce you to the key processes for that project's service. Like any Linux process, there are several commands that we can leverage to check these processes. Some of the common commands that we will leverage are detailed in the following sections.
ps command is already familiar to you as a Linux administrator. We leverage this command in OpenStack to get a snapshot of the current processes running on our host machines. The command will quickly allow us to see which OpenStack processes are running, and more importantly when troubleshooting, which OpenStack processes are not running.
We typically use the
ps command with the standard
-aux options and then pipe that to
grep in order to find the OpenStack process we are interested in:
For example, the preceding code would list each of the OpenStack Nova processes, which, by convention, are prefixed with
nova-. It's also worth pointing out that this command may also reveal the
âlog- file option set when the process was launched. This will give you the location of the log files for each process, which will be extremely valuable during our troubleshooting.
In addition to the
ps command that is used to look at processes, you can also leverage the
pgrep command. This command allows you to look up processes based on a pattern. For example, you can list processes based on their names:
This command will list all the processes that have
nova in their name. Without the
-l option, the command would only list the process ID. If we want to see the process name too, we simply add
-l. If you'd like to see the full-line output like we saw with
ps, then you can add the
-a option. With this option, you will be able to see extra attributes that are used when starting the process, including log file locations.
The preceding command would kill the process with
PID 20069. This can be useful in situations where you have process hanging and you need to restart them. This is an alternative to the standard
pgrep provide us with a snapshot of the running processes,
htop will give us an interactive view of our processes. The
htop commands are very similar, but
htop provides you with a little added interface sugar, including the ability to scroll data. You may need to install
htop on your servers if you decide to use it. Using either of these commands, you will be able to see the processes interactively sorted by things, such as percentage of CPU used by the process or percentage of memory. If you find your cluster in a situation where there is resource contention on the host, this tool can begin to give you an idea of which process to focus on first. The following screenshot is a sample output from
It's likely you will need to troubleshoot an issue that is related to hard drives when dealing with OpenStack. You can leverage standard Linux tools to interrogate the hard drive and assist you in troubleshooting.
There will be several moments in our OpenStack journey where we will be concerned about storage and the hard drives in our cluster that provide some of that storage. The
df command will be leveraged to report on the disk space used by our filesystem. We can add the
-h option to make the values human readable:
The output from this command tells us which filesystems are currently mounted and provides usage information for each, such as the size of the filesystem, the amount used, and the amount available. The command also tells us the mount point for each filesystem.
In addition to
df, we will leverage
fdisk command allows us to work with the disk partition tables. This may become necessary when troubleshooting OpenStack Block Storage or working with images in OpenStack. Take the following code as an example:
The preceding command will list the partition table. From the list, you can see the details about the disk, including its name and size. You can also see which partitions correspond to the disk. In addition to listing the partition table, you can also modify the partitions.
This command will allow you to change the partition table for the disk named
/dev/xvda. After running this command, type
m to see the menu of commands. Using this menu, you can create new partitions, delete existing ones, or change existing partitions.
As we will discover later in this book, there are some use cases where you can't use
fdisk. In those situations, we will look to another partitioning tool named
parted. This tool also allows us to work with partitions. With
parted, we can create, resize, copy, move, and delete partitions. The
parted tool allows you to work with many different types of filesystems as compared to
The preceding command will start the parted tool. Once the tool starts, you can type
help in the prompt to see a list of menu items. Some of the functionalities listed include
makefs, to make filesystems; and
makepart, to make a partition; or
makepartfs, to make both at the same time.
On Debian and Ubuntu systems, we can leverage the advanced package tool by running the
apt command. This tool can be used to provide insight into which packages are installed on our system. This knowledge can be useful when troubleshooting OpenStack problems related to packages:
apt search openstack | less
The preceding command will list the packages that have the word
openstack in their description and paginate the output. Running this command will give you a sense of some of the packages that come together to create an OpenStack cloud. Not all the packages listed are required, but this will give you an idea of which packages are available:
apt list | grep nova
apt list command will list the packages on the system. For example, we can pipe
apt list to
nova to see a list of packages with nova in the name. We can take any of these packages and run them through
apt show to get the details about the package. Take this line of command, for example:
apt show nova-api
One of the many commands you may find useful when troubleshooting is the watch command. This command provides a convenient way to execute a command on a given time interval. I often use it to keep an eye on my processes when I'm trying to get them to restart. I've also leveraged this command when troubleshooting instance creation, as it allows me to check whether and when the instance becomes active:
watch pgrep -l nova
The preceding command will run the
pgrep -l nova command every two seconds by default. You can adjust the interval at which the command is run by passing the
watch -n 3 nova list
This command will run the nova list command every 3 seconds.
cat: This is used to print files or input to the standard output.
less: This is used to view files and allows you to page through those files.
find: This allows you to search through files in a hierarchy of directories.
grep: This allows you to search through files for lines that match a given pattern. We will use the
grepcommand quite a bit as we are searching through logs for different types of messages.
tail: This allows you to output the last part of a file. We will leverage the
tailcommand often with the
-fargument, which will allow us to follow a file as it is updated. This is used to watch logs live as we run different services or commands.
One of the central components of any OpenStack cluster is the messaging broker. OpenStack uses a message broker to pass information back and forth between its components. The message queue facilitates intra-component communication, and as a result, it can often be a useful place to search for troubleshooting clues in an OpenStack cluster. While the default message broker installed with OpenStack is RabbitMQ, deployers have the ability to select from several other messaging technologies including ZeroMQ or QPid. We will explore some high-level troubleshooting tools for RabbitMQ in the following sections.
The RabbitMQ message system comes with a handy utility named
rabbitmqctl. This tool allows operators to complete several useful tasks, but here we will highlight a few that are particularly helpful to troubleshoot OpenStack.
The preceding command will return the status of your RabbitMQ message broker. It can be helpful to check the output of this command for any errors. For example, if you see an error that starts with
Error: unable to connect to node, then this means that RabbitMQ is likely not running. You can run the following command on Ubuntu to try and start it:
sudo service rabbitmq-server start rabbitmqctl stop
The preceding command will stop your RabbitMQ message broker. You can use the same service command from the one we used here to restart it.
list_queues command will list the queues in your message broker. When you run this on an OpenStack cluster, you will be able to see the decent number of queues used by the software to pass messages back and forth between OpenStack components. In addition to the name of the queues, this command can show you several attributes for each queue. Select the attributes you want to see by passing them after your
In this command, we are requesting several columns of data directly after the
list_queues command. Running the command this way will return a tab-delimited list including the name of the queue, whether or not the queue is durable, the number of messages ready to be read from the queue, the number of consumers listening to the queue, and the current queue state.
list_queues command, there is also a
list_exchanges command, which allows you to see the exchanges in your message broker. Running this command on an OpenStack cluster will allow you to see the exchanges that OpenStack leverages. Exchanges sit between message producers and the queues where those messages will eventually reside. The exchange is responsible for taking the message from the message producer and delivering that message to the appropriate queue if there is any queue at all:
rabbitmqctl list_exchanges name type durable policy
list_exchanges command with the name, type, durable, and policy column headers, as demonstrated in the preceding line of code, will output the exchanges and their respective values for each column. Specifically, the name of the exchange; the exchange type which is either direct, topic, headers, or fanout; whether or not the exchange is durable, meaning it will survive a server restart; and finally, the exchange's policy. Policies are a method by which administrators can control the behavior of queues and exchanges across the entire RabbitMQ cluster. You can see which policies, if any, are configured by running this:
In RabbitMQ, exchanges are related to queues through bindings. Queues use bindings to tell exchanges that they are interested in messages flowing through that exchange. You can list the bindings by running this code:
list bindings command will display each of the bindings between exchanges and queues. By default, it will list the name of the source, the type of the source, the name of the destination, the type of the destination, a routing key, and any binding arguments.
To view a list of RabbitMQ clients with connections to the RabbitMQ server, run the
This command will list the username associated with the connection, the hostname of the connected peer, the port of the peer, and the state of the connection.
In this chapter, we explored the OpenStack projects at a high level. You learned a little about each of the core projects and some of the optional projects that are deployed. We looked at the main supporting technologies that OpenStack leverages to provide things like, data persistence, and messaging. Finally, we introduced the troubleshooting methodology that we will use throughout this book and a few of the troubleshooting tools that we will take advantage of while working with OpenStack. In the next chapter, we will dive into troubleshooting OpenStack Keystone, the identity service.