Mastering OpenStack

By Omar Khedher
About this book
Publication date: July 2015
Publisher: Packt
Pages: 400
ISBN: 9781784395643

 

Chapter 1. Designing OpenStack Cloud Architecture

Owing to the widespread adoption of OpenStack around the globe, several enterprises have already started switching to a new and exciting way of acquiring infrastructure resources while reducing the investment costs of their IT environments. What makes this opportunity great is the open source experience that it offers. Well, you may claim that several other cloud solutions are open source as well. What makes OpenStack unique is its exposure: it is widely open to other open source solutions and stands as a shining example of a flexible, multi-component integrated platform. All that you really need is a good design that fulfills most of your requirements and the right decisions on how and what to deploy.

If you browse the pages of this book, you might wonder what makes a volume entitled Mastering such a great deal to you as a system administrator, cloud architect, DevOps engineer, or any technical person operating on the Linux platform. Basically, whether you are working on a project, going on a vacation, building a house, or redesigning your fancy apartment, you will always need a strategy. The Japanese swordsman Miyamoto Musashi wrote the following very impressive thought on perception and sight in The Book of Five Rings, Start Publishing LLC:

"In strategy, it is important to see distant things as if they were close and to take a distanced view of close things."

Ultimately, based on what you have learned from the OpenStack literature and what you have deployed or practiced, you will probably ask the famous key question: how does OpenStack work? Well, the OpenStack community is very rich in terms of topics and tutorials, some of which you may have already tried out. It is time to go ahead and raise the curtain on OpenStack design and architecture.

Basically, the goal of this chapter is to get you from where you are today to the point where you can confidently build a private cloud based on OpenStack with your own design choice.

At the end of this chapter, you will have a good perspective on ways to design your project by putting the details under the microscope. You will also learn about how OpenStack services work together and be ready for the next stage of our adventure by starting the deployment of an OpenStack environment with best practices.

This chapter will cover the following points:

  • Getting acquainted with the logical architecture of the OpenStack ecosystem and the way its different core components interact with each other

  • Learning how to design an OpenStack environment by choosing the right core services for the right environment

  • Designing the first OpenStack architecture for a large-scale environment while bearing in mind that OpenStack can be designed in numerous ways

  • Learning some best practices and the process of capacity planning for a robust OpenStack environment

Let's start the mission by putting the spotlight on where the core OpenStack components fit into the overall picture.

 

OpenStack – think again


Today, cloud computing is about Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). The challenge set by the public cloud is one of agility, speed, and service efficiency. Most companies have expensive IT systems that they have developed and deployed over the years, but these systems are siloed. In many cases, the IT systems struggle to respond with the agility and speed of the public cloud services from within their own private silos in their own private data centers. The traditional data center model and siloed infrastructure might prove unsustainable. In fact, today's enterprise data center focuses on what it takes to become a next-generation data center. The shift to this new generation has driven the adoption of a software-based model for the management and provisioning of infrastructure. It has been accompanied by a shift from workload isolation in the traditional model to a mixed workload model. With an increasing number of users utilizing cloud services, next-generation data centers are able to handle multitenancy, whereas the traditional data center was limited to a single tenancy. Moreover, enterprises today look for the ability to scale down as well as to scale up. Changing the way an entire infrastructure is handled is a huge step in data center technology.

The big move to a software-defined infrastructure has allowed administrators and operators to deliver a fully automated infrastructure within minutes. The next-generation data center reduces the infrastructure to a single, big, agile, scalable, and automated unit. The end result is that administrators will have to program the infrastructure. This is where OpenStack comes into the picture: the next-generation data center operating system. The ubiquitous influence of OpenStack has been felt by many big global cloud enterprises such as VMware, Cisco, Juniper, IBM, Red Hat, Rackspace, PayPal, and eBay, to name but a few. Today, many of them run very large, scalable private clouds based on OpenStack in their production environments. If you intend to be part of a winning, innovative cloud enterprise, you should jump to the next-generation data center and gain valuable experience by adopting OpenStack in your IT infrastructure.

Note

To read more about the success stories of many companies, visit https://www.openstack.org/user-stories.

 

Introducing the OpenStack logical architecture


Before delving into the architecture of OpenStack, we need to refresh, or fill any gaps in, our knowledge of the basic concepts and usage of each core component.

In order to get a better understanding of how it works, it is beneficial to first briefly examine the things that make it work. Assuming that you have already installed OpenStack, or even deployed it in a small or medium-sized environment, let's put the essential core components under the microscope and go a bit further by considering their use cases and asking the question: what is the purpose of such a component?

Keystone

From an architectural perspective, Keystone is the simplest service in the OpenStack composition. It is the core component that provides the identity service, and it integrates functions for authentication, catalog services, and policies to register and manage different tenants and users in OpenStack projects. The API requests between OpenStack services are processed by Keystone to ensure that the right user or service is able to utilize the requested OpenStack service. Keystone supports numerous authentication mechanisms, such as username/password as well as token-based authentication. Additionally, it is possible to integrate it with an existing backend directory service such as the Lightweight Directory Access Protocol (LDAP) or the Pluggable Authentication Module (PAM).

A similar real-life example is a game city. You can purchase a gaming day card that entitles you to play a certain number of games during a certain period of time. Before you start gaming, you have to ask for the card at the main entrance to get access to the game city. Every time you would like to try a new game, you must check in at the game stage machine. This generates a request to a central authentication system, which checks the validity of the card and whether it covers the requested game. By analogy, the token in Keystone can be compared to the gaming day card, except that it is not depleted by your requests. The identity service is thus considered a central and common authentication system that provides access to the users.
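To make this concrete, here is a minimal sketch of requesting a token from the Keystone v2.0 API with curl, as used by the releases covered in this book; the endpoint address, tenant, and credentials are placeholders:

    $ curl -s -X POST http://192.168.27.47:5000/v2.0/tokens \
      -H "Content-Type: application/json" \
      -d '{"auth": {"tenantName": "demo",
                    "passwordCredentials": {"username": "demo",
                                            "password": "verysecuredpassword"}}}'

The JSON response carries the token ID together with the service catalog, which we will meet again later in this chapter.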

Swift

Although Swift has been available to users since the earliest OpenStack releases, it is interesting to see how it has empowered what is referred to as cloud storage. Most enterprises in the last decade did not hide their fears about a critical part of the IT infrastructure: the storage where the data is held. Thus, purchasing expensive hardware to stay in the safe zone had become a habit. There are always certain challenges faced by storage engineers, and without a doubt, one of these challenges is minimizing downtime while increasing data availability. Despite the rise of many smart storage solutions in the last few years, the traditional way still needs to change. Make it cloudy! Swift was introduced to fulfill this mission.

We will leave the details pertaining to the Swift architecture for later, but you should keep in mind that Swift is an object storage software, which has a number of benefits:

  • No central brain, which means no Single Point Of Failure (SPOF)

  • Curative, which means automatic recovery in the case of failure

  • Highly scalable for petabyte-scale stores, accessed by scaling horizontally

  • Better performance, achieved by spreading the load over the storage nodes

  • Inexpensive hardware can be used for redundant storage clusters
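As a quick illustration, and assuming the python-swiftclient tool is installed and your credentials are exported in the environment, storing and retrieving objects boils down to a few commands (the container and file names here are examples only):

    # Upload a local file as an object into a container (created on the fly)
    $ swift upload backups db_dump.tar.gz
    # List the containers, then the objects within one of them
    $ swift list
    $ swift list backups
    # Show account-wide statistics: containers, objects, and bytes used
    $ swift stat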

Glance

When I gave my first presentation on the core components and architecture of OpenStack at my first cloud software company, I was surprised by a question raised by the CTO: what is the difference between Glance and Swift? Both handle storage. Well, despite my deployment of OpenStack (Cactus and Diablo were the releases at the time) and familiarity with the majority of the components' services, I found the question quite tough to answer! As a system architect or technical designer, you may come across the same questions: what is the difference between them? Why do I need to integrate such a solution? On one hand, it is important to distinguish between interacting system components so that it is easier to troubleshoot and operate them in production environments. On the other hand, it is important to satisfy needs and conditions that go beyond the limits of your IT infrastructure.

To alleviate any confusion, let's keep it simple. Swift and Glance are both storage systems. However, they differ in what they store. Swift is designed to be an object store where you can keep data such as virtual disks, images, backup archives, and so forth, while Glance stores the metadata of images. Metadata can be information such as the kernel, disk images, disk format, and so forth. Do not be surprised to learn that Glance was originally designed to store images: the first release of OpenStack included only Nova and Swift (code name Austin, October 21, 2010), and Glance was integrated in the second release (code name Bexar, February 3, 2011).

The mission of Glance is to be an image registry. From this point, we can see how OpenStack has paved the way toward a more modular and loosely coupled core component model. Using Glance purely to store virtual disk images is one possibility, but at an architectural level, querying image information via the Image Service API provided by Glance, while keeping the actual images in an independent storage backend such as Swift, brings better performance and a well-organized set of core services. In this way, a client (a user or an external service) can register a new virtual disk image and, for example, stream it from a highly scalable and redundant store. At this level, as a technical operator, you may face another challenge: performance. This will be discussed at the end of the book.
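For example, registering a virtual disk image in Glance is a one-liner with the python-glanceclient tool; the image name and file below are illustrative only:

    # Register a QCOW2 image in the Glance registry
    $ glance image-create --name ubuntu-14.04 --disk-format qcow2 \
      --container-format bare --is-public True --file ubuntu-14.04.qcow2
    # Query the registry and inspect the stored metadata
    $ glance image-list
    $ glance image-show ubuntu-14.04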

Cinder

You may wonder whether there is another way to provide storage in OpenStack. Indeed, the management of persistent block storage is integrated into OpenStack through Cinder. Its main capability is to present block-level storage: it provides raw volumes that can be built into logical volumes in the filesystem and mounted into the virtual machine.

Some of the features that Cinder offers are as follows:

  • Volume management: this allows the creation or deletion of a volume

  • Snapshot management: this allows the creation or deletion of a snapshot of a volume

  • Attaching or detaching volumes from instances

  • Cloning volumes

  • Creating volumes from snapshots

  • Copying images to volumes and vice versa
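The first few of these features map directly onto the Cinder CLI; the following is a hedged sketch (the volume names, size, and volume ID are examples):

    # Create a 10 GB volume
    $ cinder create --display-name db-volume 10
    # Snapshot it
    $ cinder snapshot-create --display-name db-snap db-volume
    # Attach the volume to a running instance as /dev/vdb
    $ nova volume-attach my-instance <volume-id> /dev/vdb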

Several storage options have been proposed in the OpenStack core. Without a doubt, you will be asked this question: what kind of storage will be best for you? A decision-making process requires a list of pros and cons. The following is a very simplistic table that describes the differences between the storage types in OpenStack, to avoid any confusion when choosing the storage management option for your future architecture design:

Specification              | Object storage | Block storage
Performance                | -              | OK
Database storage           | -              | OK
Restoring backup data      | OK             | OK
Setup for volume providers | -              | OK
Persistence                | OK             | OK
Access                     | Anywhere       | Within VM
Image storage              | OK             | -

It is very important to keep in mind that, unlike the Glance and Keystone services, Cinder features are delivered by orchestrating volume providers through a configurable driver architecture, with drivers for backends such as IBM, NetApp, Nexenta, and VMware.

Whatever choice you make, no storage option is perfect. Although Cinder has proven to be an ideal replacement for the old nova-volume service that existed before the Folsom release, at an architectural level it is essentially an organized catalog of block-based storage devices with several differing characteristics, and it inherits the obvious limitations of commodity storage with regard to redundancy and autoscaling. Eventually, the block storage service, as a main core service of OpenStack, can be improved if a few gaps are filled, such as the addition of the following values:

  • Quality of service

  • Replication

  • Tiering

The aforementioned Cinder specification reveals its no-vendor-lock-in characteristic: it is possible to change the backend easily or perform data migration between two different storage backends. Therefore, a better storage design architecture for a Cinder use case brings a third party into the scalability game. More details will be covered in Chapter 4, Learning OpenStack Storage – Deploying the Hybrid Storage Model. For now, keep in mind that Cinder is essential to our private cloud design, but it lacks some capacity-scaling features.

Nova

As you may already know, Nova is the original core component of OpenStack. From an architectural level, it is considered one of the most complicated components of OpenStack.

In a nutshell, Nova handles a large number of requests that collaborate to turn a user request into a running VM. Let's break down Nova's blob image by considering that, as a distributed application, its architecture needs orchestration to carry out tasks between its different components.

nova-api

The nova-api component accepts and responds to end user compute API calls. End users or other components communicate with the OpenStack nova-api interface to create instances via the OpenStack API or the EC2 API.

Note

Nova-api initiates most of the orchestrating activities such as the running of an instance or the enforcement of some particular policies.

nova-compute

The nova-compute component is primarily a worker daemon that creates and terminates VM instances via the hypervisor's APIs (XenAPI for XenServer, libvirt for KVM, and the VMware API for VMware).

It is important to depict how such a process works. The following steps delineate this process:

  1. Accept actions from the queue and perform system commands, such as launching a KVM instance, while updating the state in the database.

  2. Work closely with nova-volume to provide iSCSI or Ceph RADOS block devices to instances.

    Note

    Ceph is an open source storage software platform for object, block, and file storage in a highly available storage environment. This will be further discussed in Chapter 4, Learning OpenStack Storage – Deploying the Hybrid Storage Model.

nova-volume

The nova-volume component manages the creation, attachment, and detachment of volumes to compute instances (similar to Amazon's EBS).

Note

Cinder is a replacement of the nova-volume service.

nova-network

The nova-network component accepts networking tasks from the queue and then performs these tasks to manipulate the network (such as setting up bridging interfaces or changing iptables rules).

Note

Neutron is a replacement of the nova-network service.

nova-scheduler

The nova-scheduler component takes a VM instance request from the queue and determines where it should run (specifically, which compute host it should run on). At an application architecture level, the term scheduling or scheduler invokes a systematic search for the best fit for a given infrastructure to improve its performance.

Nova also provides console services that allow end users to access the console of the virtual instance through a proxy such as nova-console, nova-novncproxy, and nova-consoleauth.

By zooming out to the general components of OpenStack, we find that Nova interacts with several services such as Keystone for authentication, Glance for images, and Horizon as the web interface. For example, the Glance interaction is central: the API process can upload queries to Glance, while nova-compute downloads images to launch instances.
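Pulling these daemons together, a single boot request exercises most of them. A hedged example with the era-appropriate nova CLI (the flavor, image, and network ID are placeholders):

    # nova-api receives the call, nova-scheduler picks a host, and
    # nova-compute launches the VM through the hypervisor's API
    $ nova boot --flavor m1.small --image ubuntu-14.04 \
      --nic net-id=<private-net-id> my-instance
    # Follow the instance state as the request flows through the daemons
    $ nova show my-instance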

Queue

The queue provides a central hub for passing messages between daemons. This is where information is shared between different Nova daemons, facilitating communication between discrete processes in an asynchronous way.

Any service can easily communicate with any other service via the APIs and the queue. One major advantage of the queuing system is that it can buffer a large workload. Rather than relying on a synchronous RPC service alone, a queue system can absorb a large workload and provide eventual consistency.
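On a deployment that uses RabbitMQ, you can peek at this hub directly; the following are standard rabbitmqctl invocations, shown purely as an illustration:

    # List the queues declared by the OpenStack daemons, with the
    # number of messages waiting in each
    $ rabbitmqctl list_queues name messages
    # Check the overall health of the broker
    $ rabbitmqctl status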

Database

A database stores most of the build-time and runtime state for the cloud infrastructure, including the instance types that are available for use, the instances in use, the available networks, and the projects. It is the second essential means of sharing information across all OpenStack components.

Neutron

Neutron provides a real Network as a Service (NaaS) between interface devices that are managed by OpenStack services such as Nova. There are various characteristics that should be considered for Neutron:

  • It allows users to create their own networks and then attach server interfaces to them

  • Its pluggable backend architecture lets users take advantage of the commodity gear or vendor-supported equipment

  • Extensions allow additional network services, software, or hardware to be integrated

Neutron has many core network features that are constantly growing and maturing. Some of these features are useful for routers, virtual switches, and the SDN networking controllers.

Note

The OpenStack networking service was introduced in the Folsom release as a project named Quantum, which was later renamed Neutron in the Havana release and incorporated into the mainline project in subsequent releases. The examples elaborated in this book are based on the Havana release and later.

Neutron introduces new concepts, which includes the following:

  • Port: A port in Neutron refers to a virtual switch connection. These connections are where instances and network services attach to networks. When an interface is attached to a subnet, the defined MAC and IP addresses of the interface are plugged into the port.

  • Networks: Neutron defines networks as isolated Layer 2 network segments. Operators will see networks as logical switches implemented by the Linux bridging tools, Open vSwitch, or some other software. Unlike physical networks, these can be defined by either the operators or the users in OpenStack.

    Note

    Subnets in Neutron represent a block of IP addresses associated with a network. They will be assigned to instances in an associated network.

  • Routers: Routers provide gateways between various networks.

  • Private and floating IPs: Private and floating IP addresses refer to the IP addresses that are assigned to instances. Private IP addresses are visible within the instance and are usually part of a private network dedicated to a tenant. This network allows the tenant's instances to communicate with each other while remaining isolated from the other tenants.

    • Private IP addresses are not visible to the Internet.

    • Floating IPs are virtual IPs that Neutron maps to instances' private IPs via Network Address Translation (NAT). A floating IP address is assigned to an instance so that it can connect to external networks and access the Internet. Floating IPs are exposed as public IPs, but the guest's operating system has no idea that the IP address was assigned to it.

In addition to Neutron's low-level orchestration of Layer 1 through Layer 3 components such as IP addressing, subnetting, and routing, it can also manage high-level services. For example, Neutron provides Load Balancing as a Service (LBaaS) utilizing HAProxy to distribute traffic among multiple compute node instances.

Note

You can refer to the last updated documentation for more information on networking in OpenStack at http://docs.openstack.org/networking-guide/intro_networking.html.

The Neutron architecture

There are three main components of the Neutron architecture that you ought to know, in order to validate your decisions later with regard to each component's use case within the new releases of OpenStack:

  • Neutron-server: It accepts API requests and routes them to the appropriate neutron-plugin for action

  • Neutron plugins and agents: They perform the actual work, such as plugging in or unplugging ports, creating networks and subnets, or IP addressing.

    Note

    Agents and plugins differ depending on the vendor technology of a particular cloud: virtual and physical Cisco switches, NEC, OpenFlow, Open vSwitch, Linux bridging, and so on.

  • Queue: This routes messages between the neutron-server and the various agents, and works with the database that stores the plugin state

Neutron is a system that manages networks and IP addresses. OpenStack networking ensures that the network will not be turned into a bottleneck or limiting factor in a cloud deployment and gives users real self-service, even over their network configurations.

Another advantage of Neutron is its capability to provide a way for organizations to relieve stress within the network of cloud environments and to make it easier to deliver NaaS in the cloud. It is designed to provide a plugin mechanism that will provide an option for the network operators to enable different technologies via the Neutron API.

It also lets its tenants create multiple private networks and control the IP addressing on them.
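As a sketch of that tenant workflow with the Havana-era neutron CLI (all the names and the CIDR are illustrative):

    # Create a private tenant network and a subnet on it
    $ neutron net-create private-net
    $ neutron subnet-create --name private-subnet private-net 10.10.1.0/24
    # Create a router and plug the subnet into it
    $ neutron router-create tenant-router
    $ neutron router-interface-add tenant-router private-subnet
    # Set the router's gateway to an existing external network
    $ neutron router-gateway-set tenant-router public-net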

As a result of the API extensions, organizations have additional control over security and compliance policies, quality of service, monitoring, and troubleshooting, in addition to paving the way to deploying advanced network services such as firewalls, intrusion detection systems, or VPNs. More details about this will be covered in Chapter 5, Implementing OpenStack Networking and Security, and Chapter 8, Extending OpenStack – Advanced Networking Features and Deploying Multi-tier Applications.

Note

Keep in mind that Neutron allows users to manage and create networks or connect servers and nodes to various networks.

The scalability advantage will be discussed in a later topic in the context of the Software Defined Network (SDN) technology, which is an attraction to many networks and administrators who seek a high-level network multitenancy.

Horizon

Horizon is the web dashboard that pools all the different pieces together from your OpenStack ecosystem.

Horizon provides a web frontend for the OpenStack services. Currently, it includes all the OpenStack services as well as some incubated projects. It was designed as a stateless and data-less web application; it does nothing more than initiate actions in the OpenStack services via API calls and display the information that OpenStack returns to it. It does not keep any data except the session information in its own data store. It is designed as a reference implementation that can be customized and extended by operators for a particular cloud. It forms the basis for several public clouds, most notably the HP Public Cloud, and at its heart is its extensible, modular approach to construction.

Horizon is based on a series of modules called panels that define the interaction of each service. Its modules can be enabled or disabled, depending on the service availability of the particular cloud. In addition to this functional flexibility, Horizon is easy to style with Cascading Style Sheets (CSS).

Most cloud provider distributions provide a company's specific theme for their dashboard implementation.

 

Gathering the pieces and building a picture


Let's try to see how OpenStack works by chaining all the service cores covered in the previous sections in a series of steps:

  1. A user accesses the OpenStack environment via a web interface (HTTP/REST).

  2. Authentication is the first action performed. This is where Keystone comes into the picture.

  3. A conversation is started with Keystone—"Hey, I would like to authenticate and here are my credentials".

  4. Keystone responds, "OK, you are authenticated; here is your token", once the credentials have been accepted.

  5. You may remember that the service catalog comes with the token; it is the piece that will allow you to access resources. Now you have it!

  6. The service catalog, in its turn, responds: "Here are the resources available, so you can go through and get what you need from your accessible list".

    Note

    The service catalog is a JSON structure that exposes the resources available upon a token request.

    You can use the following example of querying by tenant to get a list of servers:

    $ curl -v -H "X-Auth-Token:token" http://192.168.27.47:8774/v2/tenant_id/servers
    

    A list of server details, including how to gain access to the servers, is returned:

    {
        "server": {
            "adminPass": "verysecuredpassword",
            "id": "5aaee3c3-12ee-7633-b32b-635489236232fbfbf",
            "links": [
                {
                    "href": "http://myopenstack.com/v2/openstack/servers/5aaee3c3-12ee-7633-b32b-635489236232fbfbf",
                    "rel": "self"
                },
                {
                    "href": "http://myopenstack.com/v2/openstack/servers/5aaee3c3-12ee-7633-b32b-635489236232fbfbf",
                    "rel": "bookmark"
                }
            ]
        }
    }
  7. Typically, once authenticated, you can talk to an API node. There are different APIs in the OpenStack ecosystem (OpenStack API and EC2 API).

  8. Once we authenticate and request access, we have the following services that will do the homework under the hood:

    • Compute nodes that deal with hypervisor

    • Volume services that deal with storage

    • Network services that make all the connections between VLANs and virtual network interfaces that work and talk to each other

    The next figure summarizes these first pieces of how OpenStack works:

  9. However, how do we get these services to talk? In such cases, you should think about the wondrous connector, the RabbitMQ queuing system.

    For anyone who is unfamiliar with queuing systems, we can consider the example of a central airport:

    You have booked a flight and have been assigned a specific gate that only you and the other passengers on your flight are interested in. This gate gets you directly to your seat on the plane. In the same way, a queuing system allows you to tune in to the server or service that you are interested in.

    A queuing system also takes care of issues such as: who wants to do the work? By analogy, although everybody can hear the airport announcement channel, only the concerned passengers (those with the same destination) act on that information and proceed to the gate.

    Now, we have this information in the queue.

    Note

    If you have a look at the Python source tree, for any service, you will see a network directory for the network code, and there will be an api.py file for every one of these services.

    Let's take an example. If you want to create an instance on a compute node, the code might say, "import the nova-compute API; there is a method/function there to create the instance". So, it will do the whole job of getting over the wire and spinning up the server instance on the appropriate node.

  10. Another element of the picture is the scheduler, which looks at the services and states, "this is what you have as memory, CPU, disk, network, and so on".

    When a new request comes in, the scheduler might notify you, "you will get these from the available resources".

    Note

    The scheduling process in OpenStack can perform different algorithms such as simple, chance, and zone. An advanced way to do this is by deploying weight filtering, ranking the servers according to their available resources.

    Using this option, the node will spin up the server while you create your own rules. Here, you distribute your servers based on the number of processors and how much memory you want in your spun-up servers.

    The last piece of the picture is that we need to get the information back. So, we have all these services doing something. Remember that they have a special airport gate. Again, our queue performs some actions and sends notifications as these actions occur. Services might subscribe to find out certain things, such as whether the network is up, whether the server is ready, or whether the server has crashed.
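As a brief illustration of the filter-and-weight scheduling described in step 10, the scheduler's behavior is driven by options in nova.conf; the following excerpt shows the kind of filter chain you might configure (the values are examples, not a recommendation):

    # Excerpt from /etc/nova/nova.conf (illustrative values)
    scheduler_available_filters=nova.scheduler.filters.all_filters
    scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter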

Provisioning a flow under the hood

It is important to understand how the different services in OpenStack work together, leading to a running virtual machine. We have already seen how a request is processed in OpenStack via the APIs. Now, we can go further and closely check how these services and subsystems, including authentication, computing, images, networks, queuing, and databases, work in tandem, performing a complete workflow to provision an instance in OpenStack. The next series of steps describes how the service components work together once an instance provisioning request has been submitted:

  1. A client enters the user credentials via Horizon, which makes the REST call to Keystone for authentication.

  2. The authentication request is handled by Keystone, which generates and sends back an authentication token. The token is stored by Keystone and will be used to authenticate against the rest of the OpenStack components through their APIs.

  3. The Launch Instance action in the dashboard converts the request for the creation of a new instance into an API request, which is sent to the nova-api service.

  4. The nova-api service receives the authentication request and sends it for validation and access permission to Keystone.

  5. Keystone checks the token and sends an authentication validation, which includes roles and permissions.

  6. The nova-api service then creates an initial entry for the instance in the database and contacts the queuing system via an RPC call (rpc.cast). The call request is sent to nova-scheduler to determine which host the instance will run on.

  7. The nova-scheduler contacts the queue and picks up the new instance request.

  8. The nova-scheduler performs the information gathering process from the database to find out the appropriate host based on its weighting and filtering algorithms.

  9. Once a host has been chosen, the nova-scheduler sends back an RPC call (rpc.cast) to start launching the instance; the call remains in the queue.

  10. The nova-compute service contacts the queue and picks up the call issued by the nova-scheduler. It then proceeds with the instance request and sends an RPC call (rpc.call) in order to get instance-related information, such as the instance characteristics (CPU, RAM, and disk) and the host ID. The RPC call remains in the queue.

  11. The nova-conductor contacts the queue and picks up the call.

  12. The nova-conductor picks up the new instance request from the queue, interrogates the database to get the instance information, and publishes its state in the queue.

  13. The nova-compute service picks up the instance information from the queue and sends an authentication token in a REST call to glance-api to get a specific image URI from Glance.

    The image URI is obtained from the image ID, locating the requested image in the image repository.

  14. The glance-api will verify the authentication token with Keystone.

  15. Once validated, glance-api returns the image URI, including its metadata, which specifies the location details of the requested image.

    Note

    If the images are stored in a Swift cluster, they will be requested as Swift objects via REST calls. Keep in mind that it is not the job of nova-compute to fetch directly from the Swift storage; Swift is accessed via its APIs to perform object requests. More details about this will be covered in Chapter 4, Learning OpenStack Storage – Deploying the Hybrid Storage Model.

  16. The nova-compute sends the authentication token to a neutron-server via a REST call to configure the network for the instance.

  17. The neutron-server checks the token with Keystone.

  18. Once validated, the neutron-server contacts its agents, such as the neutron-l2-agent and neutron-dhcp-agent, by submitting the request in the queue.

  19. The Neutron agents pick up the calls from the queue and reply by sending the network information pertaining to the instance. For example, the neutron-l2-agent gets the L2 configuration from Libvirt and publishes it in the queue, while the neutron-dhcp-agent contacts dnsmasq for the IP allocation and returns an IP reply in the queue.

    Note

    dnsmasq is lightweight software that provides network infrastructure services such as a DNS forwarder and a DHCP server.

  20. The neutron-server collects all the network settings from the queue and records them in the database. It then sends an RPC call back to the queue along with all the network details.

  21. Nova-compute contacts the queue and grabs the instance network configuration.

  22. Nova-compute sends the authentication token to cinder-api via a REST call to get the volume, which will be attached to the instance.

  23. The cinder-api checks the token with Keystone.

  24. Once validated, the cinder-api returns the volume information to the queue.

  25. Nova-compute contacts the queue and grabs the block storage information.

  26. At this stage, the nova-compute executes a request to the specified hypervisor via Libvirt to start the virtual machine.

  27. In order to get the instance state, nova-compute sends an RPC call (rpc.call) to nova-conductor.

  28. The nova-conductor picks the call from the queue and replies to the queue by mentioning the new instance state.

  29. Polling of the instance state is always performed via nova-api, which consults the database to get the state information and sends it back to the client.
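Step 29 is easy to observe from the client side: the same polling can be performed against nova-api with the token obtained in step 2 (the address, tenant, and server ID are placeholders, following the catalog example shown earlier):

    # Poll the instance state through nova-api
    $ curl -s -H "X-Auth-Token:token" \
      http://192.168.27.47:8774/v2/tenant_id/servers/<server-id> | python -m json.tool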

Let's see how all of this fits together by referring to the following simple architecture diagram:

Expanding the picture

You may face certain limitations that are typically associated with network switches: switches can only create a limited number of VLANs and virtual networks, which becomes a constraint when a lot of traffic enters the data center.

Let's imagine a scenario with 250 compute hosts. You can conclude that a mesh of rack servers will be placed in the data center.

Now, we take the step of growing our data center to be geographically redundant across Europe and Africa: a data center each in London, Amsterdam, and Tunis.

We have a data center in each of these new locations, and each location is able to communicate with the others. At this point, a new term is introduced: the cell concept.

To scale this out even further, we will take into consideration the entire system. We will take just the worker nodes and put them in other cells.

A special scheduler works at the top-level cell and forwards each request into a child cell. The child cells can then do the work, and they can worry about the VLAN and network issues.

The cells can share certain pieces of infrastructure, such as the database, authentication service Keystone, and some of the Glance image services. This is depicted in the following diagram:
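Cells are enabled through nova.conf on both the top-level (API) cell and the child cells; here is a minimal, hedged excerpt based on the Havana cells documentation referenced in the following note:

    # Excerpt from /etc/nova/nova.conf on the top-level (API) cell
    [cells]
    enable=True
    name=api
    cell_type=api

    # Excerpt from /etc/nova/nova.conf on a child cell
    [cells]
    enable=True
    name=cell1
    cell_type=compute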

Note

More information about the concept of cells and their configuration setup in OpenStack can be found in the Havana release documentation at the following reference: http://docs.openstack.org/havana/config-reference/content/section_compute-cells.html.

 

A sample architecture setup


Let us first go through the deployment process, which is explained in the following sections. Bear in mind that this is not the only architecture that can be deployed.

Deployment

Deployment is a huge step that draws on all the OpenStack components covered previously. It confirms your understanding of how to start designing a complete OpenStack environment. Of course, given the versatility and flexibility of such a cloud management platform, OpenStack offers several possibilities that might be considered an advantage. On the other hand, however, you may face the challenge of making the right design decision to suit your needs.

Basically, what you should consider in the first place is responsibility in the cloud. Depending on your cloud computing mission, it is essential to know which IaaS role you will be coordinating. The following are the use cases:

  • Cloud service brokerage: This is a facilitating, intermediary role in cloud service consumption across several providers, including maintenance

  • Cloud service provider: This provides XaaS to private instances

  • Self cloud service: This provides XaaS with its own IT for private usage

Apart from the knowledge of the aforementioned cloud service model providers, there are a few master keys that you should take into account in order to give a well-defined architecture a good basis, ready to be deployed.

Though system architecture design has evolved, accompanied by the adoption of several methodology frameworks, many enterprises have successfully deployed OpenStack environments by going through a 3D process: a conceptual model design, a logical model design, and a physical model design.

It might be obvious that complexity increases from the conceptual to the logical design and from the logical to the physical design.

The conceptual model design

In the first, conceptual phase, we will take a high-level look at what we will need from certain generic classes of the OpenStack architecture:

Class          | Role
Compute        | Stores virtual machine images; provides a user interface
Image          | Stores disk files; provides a user interface
Object storage | Provides a user interface
Block storage  | Provides volumes; provides a user interface
Network        | Provides network connectivity; provides a user interface
Identity       | Provides authentication
Dashboard      | Provides a graphical user interface

Let's map the generic basic classes in the following simplified diagram:

Keep in mind that the illustrated diagram will be refined over and over again since we will aim to integrate more services within our first basic design. In other words, we are following an incremental design approach within which we should exploit the flexibility of the OpenStack architecture.

At this level, we can have a vision and direction of the main goal without worrying about the details.

The logical model design

Based on the conceptual reflection phase, we are ready to construct the logical design. Most probably, you already have a good idea about the different OpenStack core components, which will form the basis of the logical design, laid out through their logical representations.

Even though we have already taken the core of the OpenStack services component by component, you may need to map each of them to a logical solution within the complete design.

To do so, we will start by outlining the relations and dependencies between the service cores of OpenStack. Most importantly, we aim to stay at the architectural level, keeping the details for the end. Thus, we will consider the repartition of the OpenStack services between two new service packages: the cloud controller and the compute node. You may wonder why such a consideration belongs in a physical design classification. However, viewing the cloud controller and compute nodes as simple packages that encapsulate a bunch of OpenStack services will help you refine your design at an early stage. Furthermore, this approach plans in advance for high availability and scalability features, allowing you to introduce them later in more detail.

Note

Chapter 3, Learning OpenStack Clustering – Cloud Controllers and Compute Nodes, describes in depth how to distribute the OpenStack services between cloud controllers and compute nodes.

Thus, the physical model design will be elaborated based on the previous theoretical phases by assigning parameters and values to our design. Let's start with our first logical iteration:

Obviously, in a highly available setup, we should achieve a degree of redundancy in each service within OpenStack. You may wonder about the critical OpenStack services claimed in the first part of this chapter: the database and the message queue. Why can't they be separately clustered or packaged on their own? This is a pertinent question. Remember that we are still in the second, logical phase, where we dive slowly and softly into the infrastructure without getting into the details. Besides, we keep going from the general to the specific, focusing first on the generic picture. Decoupling RabbitMQ or MySQL at this stage may lead you to overlook other generic design topics. On the other hand, preparing a generic logical design will keep you from sticking to just one possible combination, since future physical designs will rely on it.

Note

The previous logical figure includes several essential solutions for a highly scalable and redundant OpenStack environment, such as virtual IP (VIP), HAProxy, and Pacemaker. These technologies will be discussed in more detail in Chapter 6, OpenStack HA and Failover.

Compute nodes are relatively simple, as they are intended only to run the virtual machines' workload. In order to manage the VMs, the nova-compute service is assigned to each compute node. Besides, we should not forget that the compute nodes will not be isolated: a Neutron agent and an optional Ceilometer compute agent may also run on each node.

What about storage?

You should now have a deeper understanding of the storage types within OpenStack—Swift and Cinder.

However, we have not yet covered a third-party software-defined storage solution called Ceph, which may combine with or replace either or both of Swift and Cinder.

More details will be covered in Chapter 4, Learning OpenStack Storage – Deploying the Hybrid Storage Model. For now, we will design from a basic point where we have to decide how Cinder and/or Swift will be a part of our logical design.

Ultimately, a storage system becomes more complicated when it faces an exponential growth in the amount of data. Thus, the designing of your storage system is one of the critical steps that is required for a robust architecture.

Depending on the size of your OpenStack environment, how much data do you need to store? Will your future PaaS host a wide range of applications that run heavy data analysis? What about the planned Environment as a Service (EaaS) model, where developers will need to incrementally back up their virtual machine snapshots? We need persistent storage.

Don't put all your eggs in one basket; this is why we will include both Cinder and Swift in the mission. Many thinkers will ask the following question: if one can be satisfied by ephemeral storage, why offer block storage? To answer this question, think of ephemeral storage as storage whose associated virtual disk the end user can no longer access once the VM is terminated. Ephemeral storage should be used carefully in production in a high-scale environment, where users are actively concerned about their data, VMs, or applications. If you plan for your storage design to be 100 percent persistent, backing up everything wisely will make you feel better. This helps you figure out the best way to store data that grows exponentially, by using specific techniques that keep it available at any time. Remember that the current design applies to medium to large infrastructures. Ephemeral storage can also be a choice for certain users, for example, when they are building a test environment. Considering the same question for Swift, we claimed previously that object storage might be used to store machine images. When is this the case?

Simply, when you provide the extra hardware that fulfils certain Swift requirements: replication and redundancy.

Running a wide production environment while storing machine images on the local filesystem is not really good practice. First, an image can be accessed by different services and requested by thousands of users at a time. The controller is already exhausted by forwarding and routing the requests between the different APIs, in addition to the disk I/O, memory, and CPU each request costs. Each request will cause performance degradation, but it will not necessarily fail! Serving an image from a filesystem under heavy load will certainly drive the controller to high latency, and it may fail.

Henceforth, we might consider loosely coupled models, where storage with a specific performance profile is considered the best fit for the production environment.

Thus, Swift will be used to store images, while Cinder will be used for the persistent volumes of virtual machines (see the Swift controller node in the following diagram):

Obviously, Cinder LVM does not provide any redundancy capability between the Cinder LVM nodes; losing the data in a Cinder LVM node is a disaster. You may want to perform a backup of each node. This can be helpful, but it will be a very tedious task! Let's design for resiliency. We have put everything necessary on the table; now, what we need is the glue!

Networking

One of the most complicated steps in designing a system is the part concerning the network! Now, let's look under the hood to see how all the different services defined previously should be connected.

The logical networking design

OpenStack offers a wide range of networking configurations that vary between the basic and the complicated. Terms such as Open vSwitch, Neutron coupled with VLAN segmentation, and VMware NSX with Neutron are not intuitively obvious, and they cannot be implemented without first identifying their use case in our design. Thus, this important step implies that you should be able to differentiate between the different network topologies, understanding the reasons behind every choice and why it may work for a given use case.

OpenStack has moved from simplistic network features to complicated ones, but of course there is a reason: more flexibility! This is why OpenStack is here; it brings as much flexibility as it can! Without making any random network-related decisions, let's see which network modes are available. We will keep filtering until we hit the first correct target topology:

Network mode | Network specification                                                 | Implementation
Nova-network | Flat network design without VM grouping or isolation                  | Nova-network FlatDHCP
Nova-network | Multiple tenants and VMs; predefined fixed private IP space size      | Nova-network VLANManager
Neutron      | Multiple tenants and VMs; predefined switch and router configuration  | Neutron VLAN
Neutron      | Increased number of tenants and VM groups; lower performance          | Neutron GRE

The preceding table shows a simple differentiation between the logical network designs available for OpenStack. Every mode has its own requirements, which are very important and should be taken into consideration before deployment.

To justify our example choice: since we aim to deploy a very flexible, large-scale environment, we will opt for Neutron for network management instead of nova-network.

Note that it is also possible to continue with nova-network, but you would have to worry about the SPOF: since the nova-network service can run on a single node (the cloud controller) next to other network services such as DHCP and DNS, you would need to implement nova-network in a multihost networking model, where cloning the service onto every compute node saves you from a bottleneck scenario. In addition, the choice was made for Neutron because we are starting from a basic network deployment; we will cover more advanced features in the subsequent chapters of this book.

We would like to exploit a major advantage of Neutron compared to the nova-network, which is the virtualization of layers 2 and 3 of the OSI network model.

Remember that Neutron will enable us to support more subnets per private network segment. Based on Open vSwitch, you will discover that Neutron is becoming a vast network technology.

Let's see how we can expose our logical network design. For performance reasons, it is highly recommended to implement a topology that can handle different types of traffic by using separated logical networks.

In this way, as your network grows, it will still be manageable in case a sudden bottleneck or an unexpected failure affects a segment.

Network layout

Let us look at the different networks that are needed to operate the OpenStack environment.

The external network

The features of an external or a public network are as follows:

  • Global connectivity

  • It performs SNAT for traffic going from the VM instances running on the compute nodes to the Internet, and it hosts the floating IPs

    Note

    SNAT refers to Source Network Address Translation. It allows traffic from a private network to go out to the Internet. OpenStack supports SNAT through its Neutron APIs for routers. More information can be found at http://en.wikipedia.org/wiki/Network_address_translation.

  • It provides connection to the controller nodes in order to access the OpenStack interfaces

  • It provides virtual IPs (VIPs) for the public endpoints that are used to connect to the OpenStack services' APIs

    Note

    A VIP is an IP address that is shared among several servers. It involves a one-to-many mapping between the IP address and the servers behind it. Its main purpose is to provide redundancy for the attached servers.

  • It provides a connection to the external services that need to be public, such as access to the Zabbix monitoring system

    Note

    While using VLANs, by tagging networks and combining multiple networks into one Network Interface Card (NIC), you can optionally leave the public network untagged on that NIC to simplify access to the OpenStack dashboard and the public OpenStack API endpoints.

The storage network

The main feature of a storage network is that it separates the storage traffic by means of VLAN isolation.

The management network

An Orchestrator node was not described previously, since it is not a native OpenStack service. This is where it comes into play: the different nodes need to obtain IP addresses and rely on DNS and DHCP services, which the Orchestrator node provides. You should also keep in mind that in a large environment, you will need a node-provisioning technique, with your nodes configured to boot using PXE and TFTP.

Thus, the management network will act as an Orchestrator data network that provides the following:

  • Administrative networking tasks

  • OpenStack services communication

  • Separate HA traffic

Note

For a large-scale OpenStack environment, you can use a dedicated network for most of the critical internal OpenStack communication, such as RabbitMQ messaging and database queries, by separating the messaging and database services onto their own cluster nodes.

The internal VM traffic

The features of the internal virtual machine network are as follows:

  • Private network between virtual machines

  • Nonroutable IPs

  • A closed network between the virtual machines and the network L3 nodes, which route traffic to the Internet and map the floating IPs back to the VMs

For the sake of simplicity, we will not go into the details of, for instance, the Neutron VLAN segmentation.

The next step is to validate our network design in a simple diagram.

The physical model design

Finally, we will bring our logical design to life in the form of a physical design. At this stage, we need to assign values to its parameters. The physical design encompasses all the components that we dealt with previously in the logical design. You will appreciate how this stepwise refinement breaks down the complexity of the OpenStack environment and helps us distinguish between the types of hardware specifications that we will need.

We can start with a limited number of servers, just enough to set up the first deployment of our environment effectively. First, we will consider a small production environment that is highly scalable and extensible. This is what we covered previously: expecting sudden growth and being ready for an exponentially increasing number of requests to service instances.

You have to consider the fact that selecting commodity hardware is what makes a massively scalable architecture economically feasible.

Estimating your hardware capabilities


Since the architecture is designed to scale horizontally, cost-effective commodity hardware can be used.

In order to keep our infrastructure economical, it helps to make some basic hardware calculations for a first estimate of our exact requirements.

Considering the possibility of contention for resources such as CPU, RAM, network, and disk, you cannot wait for a particular physical component to fail before taking corrective action, which might be much more complicated by then.

Let's inspect a real-life example of the impact of underestimating capacity planning. A cloud-hosting company set up two medium servers: one as an e-mail server, and the other to host the official website. The company, one of our several clients, grew over a few months and eventually ran out of disk space. We expected such an issue to be resolved in a few hours, but it took days. The problem was that neither party made proper use of the "cloud" in its "on demand" sense. The issue was serious for both parties. The e-mail server, one of the most critical systems of a company, was overloaded, and the Mean Time To Repair (MTTR) increased exponentially. The cloud provider did not expect this!

Well, it would be ridiculous to write in your SLA report, under the incident management section, that the reason was "we did not expect such growth"! Even after the virtual machine was redeployed with more disk space, the e-mail server kept irritating everyone in the company with complaints such as, "We can authenticate, but our e-mails are not being sent! They are queued!" Another user claimed, "I finally sent an e-mail two hours ago, and I just got a phone call confirming that it was received." The cloud paradigm was designed precisely to avoid such scenarios and to deliver the success factors that hosting providers can achieve. Capacity management is a day-to-day responsibility in which you have to stay up to date with regard to software and hardware upgrades.

Through a continuous monitoring process of service consumption, you will be able to reduce the IT risk and provide a quick response to the customer's needs.

From your first hardware deployment, keep running your capacity management processes by looping through tuning, monitoring, and analysis.

The next iteration will take your tuned parameters into account and introduce the right changes to your hardware and software, in synergy with the change management process.

Let's make our first calculation based on certain requirements. We aim to run 200 VMs in our OpenStack environment.

CPU calculations

The following are the calculation-related assumptions:

  • 200 virtual machines

  • No CPU oversubscription

    Note

    Processor oversubscription occurs when the total number of virtual CPUs assigned to all powered-on virtual machines exceeds the number of physical CPU cores on the host. In GHz terms, if the capacity demanded by the VMs is greater than the GHz purchased, the environment is said to be oversubscribed.

  • Number of GHz per core: 2.6 GHz

  • Hyper-threading supported: use factor 2

  • Number of GHz per VM (AVG compute units) = 2 GHz

  • Number of GHz per VM (MAX compute units) = 16 GHz

  • Intel Xeon E5-2648L v2: 10 cores per CPU

  • CPU sockets per server = 2

  • Number of CPU cores per virtual machine:

    16 / (2.6 * 2) = 3.076

    We need to assign at least 3 CPU cores per VM.

    The formula for this calculation is as follows: max GHz per VM / (number of GHz per core x 2 for hyper-threading)

    Note

    If your CPU does not support hyper-threading, you should multiply the number of GHz per core by a factor of 1.3 instead of 2.

  • Total number of CPU cores:

    (200 * 2) / 2.6 = 153.846

    We will need approximately 154 CPU cores for 200 VMs.

    The formula for calculation will be as follows:

    (number of VMs x number of GHz per VM) / number of GHz per core

  • Number of CPU sockets:

    154 / 10 = 15.4

    We will need around 15 to 16 CPU sockets.

    The formula for calculation will be as follows:

    Total number of CPU cores / number of cores per socket

  • Number of dual-socket servers:

    15.4 / 2 = 7.7

    You will need around 7 to 8 dual-socket servers; we will plan for 8.

    The formula for calculation will be as follows:

    Total number of sockets / Number of sockets per server

  • The number of virtual machines per server, with 8 dual-socket servers, will be calculated as follows:

    200 / 8 = 25

    The formula for calculation will be as follows:

    Number of virtual machines / number of servers

    We can deploy 25 virtual machines per server.
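To make these figures reproducible, here is a small Python sketch encoding the preceding CPU formulas. The variable names are ours rather than part of any OpenStack tooling; only the input values mirror the stated assumptions:

```python
import math

# Assumptions from the example above.
vms = 200                   # target number of virtual machines
ghz_per_core = 2.6          # clock speed per physical core
ht_factor = 2               # hyper-threading factor (use 1.3 without HT)
avg_ghz_per_vm = 2          # average compute units per VM
max_ghz_per_vm = 16         # peak compute units per VM
cores_per_socket = 10       # Intel Xeon E5-2648L v2
sockets_per_server = 2

# CPU cores to assign per VM at peak load.
cores_per_vm = max_ghz_per_vm / (ghz_per_core * ht_factor)    # ~3.08

# Total cores, sockets, and servers for the whole fleet.
total_cores = (vms * avg_ghz_per_vm) / ghz_per_core           # ~153.85
total_sockets = total_cores / cores_per_socket                # ~15.4
servers = math.ceil(total_sockets / sockets_per_server)       # 8
vms_per_server = vms / servers                                # 25

print(cores_per_vm, total_cores, total_sockets, servers, vms_per_server)
```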

Memory calculations

Based on the previous example, 25 VMs can be deployed per compute node. Memory sizing is also important to avoid making unreasonable resource allocations.

Let's make an assumption list:

  • 2 GB RAM per VM

  • 8 GB RAM maximum dynamic allocation per VM

  • Compute nodes supporting slots of: 2, 4, 8, and 16 GB sticks

    Keep in mind that it always depends on your budget and needs

  • RAM required per compute node:

    8 * 25 = 200 GB

    Considering the number of sticks supported by your server, you will need around 256 GB installed. Therefore, the total number of RAM sticks installed can be calculated in the following way:

    256 / 16 = 16

    The formula for calculation is as follows:

    Total installed RAM / maximum available RAM stick size
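The memory math can be sketched in the same way; the rounding up to 256 GB reflects the stick sizes the slots support:

```python
import math

vms_per_server = 25
max_ram_per_vm_gb = 8       # maximum dynamic allocation per VM
stick_size_gb = 16          # largest supported RAM stick

ram_needed_gb = max_ram_per_vm_gb * vms_per_server     # 200 GB
ram_installed_gb = 256                                 # next realistic config
sticks = math.ceil(ram_installed_gb / stick_size_gb)   # 16 sticks

print(ram_needed_gb, ram_installed_gb, sticks)
```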

Network calculations

To fulfill the plans that were drawn for the network previously, we need to achieve the best performance and networking experience. Let's have a look at our assumptions:

  • 200 Mbits/second is needed per VM

  • Minimum network latency

To do this, it might be sufficient to serve our VMs by using a 10 Gbps link for each server, which will give:

10,000 Mbits/second / 25 VMs = 400 Mbits/second per VM

This is a very satisfying value. We also need to consider another factor: a highly available network architecture. Thus, an alternative is to use two data switches with a minimum of 24 data ports each.
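As a quick sanity check of this bandwidth budget (the link speed and VM density follow our assumptions):

```python
link_mbps = 10000            # one 10 Gbps NIC per compute server
vms_per_server = 25
required_mbps_per_vm = 200

available_per_vm = link_mbps / vms_per_server    # 400 Mbps per VM
assert available_per_vm >= required_mbps_per_vm
print(available_per_vm)
```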

Thinking about growth from day one, two 48-port switches will be put in place.

What about the growth of the rack size? In this case, you should consider switch aggregation using the Virtual Link Trunking (VLT) technology between the aggregation switches. This feature allows each server rack to split its links across the pair of switches, achieving powerful active-active forwarding at full bandwidth with no need for spanning tree.

Note

VLT is a layer 2 link aggregation protocol used between servers and a pair of switches, offering a redundant, load-balancing connection to the core network and replacing the spanning-tree protocol.

Storage calculations

Considering the previous example, you need to plan for an initial storage capacity per server that will serve 25 VMs each.

Let's make the following assumptions:

  • The use of ephemeral storage for the VM's local drive

  • 100 GB of storage for each VM's local drive

  • The use of persistent storage for volumes remotely attached to VMs

A simple calculation: for 200 VMs, we provide 200 * 100 GB = 20 TB of local storage.

You can assign 200 GB of persistent storage per VM to have 200 * 200 GB = 40 TB of persistent storage.

Therefore, we can conclude that each server hosting 25 VMs should provide 25 * 100 GB = 2.5 TB of local storage.

If you plan to include object storage as we mentioned earlier, we can assume that we will need 25 TB of object storage.

Most probably, you have an idea about object storage replication in OpenStack, which implies using three times the required space.

In other words, the X TB you plan for object storage will effectively be multiplied by three; based on our assumption, 25 * 3 = 75 TB.

Also, if you consider object storage based on zoning, you will have to accommodate at least five times the needed space. This means 25 * 5 = 125 TB.

Other considerations, such as pursuing the best storage performance using SSDs, can be useful for better throughput; you can invest in more boxes to get increased IOPS.

For example, working with SSDs rated at 20K IOPS, installed in a server with eight drive slots, will give each of the 25 VMs:

(20K * 8) / 25 = 6.4K read IOPS and, assuming writes run at roughly half the read rate, 3.2K write IOPS

That is not bad for a production starter!
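Finally, the storage budget can be summarized as a short Python sketch; the replication and zoning factors follow the assumptions above, and the 200 GB persistent figure matches the earlier arithmetic:

```python
vms = 200
vms_per_server = 25
ephemeral_gb_per_vm = 100
persistent_gb_per_vm = 200
object_storage_tb = 25

local_tb = vms * ephemeral_gb_per_vm / 1000           # 20 TB fleet-wide
persistent_tb = vms * persistent_gb_per_vm / 1000     # 40 TB fleet-wide
local_per_server_tb = vms_per_server * ephemeral_gb_per_vm / 1000  # 2.5 TB
replicated_tb = object_storage_tb * 3                 # 75 TB, 3 replicas
zoned_tb = object_storage_tb * 5                      # 125 TB, 5 zones

print(local_tb, persistent_tb, local_per_server_tb, replicated_tb, zoned_tb)
```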

Best practices

What about best practices? Are they just theory? Does anyone actually adhere to such formulas? Well, let's bring some best practices under the microscope by examining the OpenStack design flavor.

In a typical OpenStack production environment, the minimum requirement for disk space per compute node is 300 GB, with a minimum of 128 GB of RAM and dual 8-core CPUs.

Let's imagine a scenario where, due to budget limitations, you start your first compute node on costly hardware that has 600 GB of disk space, 16-core CPUs, and 256 GB of RAM.

Assuming that your OpenStack environment continues to grow, you may decide to purchase more hardware: a big box at an incredible price! A second compute node is then added to scale up.

Shortly after this, you may find that demand keeps increasing. You may start splitting requests across different compute nodes while continuing to scale up the hardware. At some point, you will be alerted that you have reached your budget limit!

There are certainly times when the best practices aren't in fact the best for your design. The previous example illustrated a commonly overlooked requirement for the OpenStack deployment.

If the minimal hardware requirement is strictly followed, it may result in an exponential cost with regard to the hardware expenses, especially for new project starters.

Thus, you may choose what exactly works for you and consider the constraints that exist in your environment.

Keep in mind that best practices are a user manual or a guideline, not a mandate; apply them once you have determined what needs to be deployed and how it should be set up.

On the other hand, do not stick to values; stick to rules. Let's bring the previous example under the microscope again: scaling up carries more risk of failure than scaling out horizontally. The rationale behind a scale-out design is that it allows transactions to scale quickly, at the cost of duplicated compute functionality, by using smaller systems at a lower cost.

Transactions and requests hitting a compute node may grow tremendously in a short time, to the point where a single big compute node with 16-core CPUs starts failing performance-wise, while a few small compute nodes with 4-core CPUs can complete the job successfully.

 

Summary


In this chapter, we learned about the design characteristics of OpenStack and the core components of its ecosystem. We also highlighted the design considerations around OpenStack and discussed the different possibilities for extending its functionality. We now have a good starting point for bringing the other incubated projects into production. You may notice that our first basic design covers most of the critical issues that one can face in production. It is also important to note that this first chapter can be considered a main guideline for the rest of this book: the next chapters will treat each concept and technology solution cited here in more detail by expanding on this first basic design. The next chapter will take you from this generic architecture overview to a practical stage. Basically, you will learn how to deploy and expand what was designed here by adopting an efficient infrastructure deployment approach: the DevOps style.

About the Author
  • Omar Khedher

    Omar Khedher is a systems and network engineer. He has been involved in several cloud-related projects based on AWS and OpenStack. He spent a few years as a cloud systems engineer with talented teams, architecting infrastructure in the public cloud at Fyber in Berlin. Omar wrote a few academic publications for his PhD on cloud performance, authored Mastering OpenStack and OpenStack Sahara Essentials, and co-authored the second edition of Mastering OpenStack for Packt.
