A trove is a treasure, or a collection of valuable things, which makes OpenStack Trove an apt name for the Database as a Service (DBaaS) component of OpenStack, especially for open source lovers like us. In this book, we shall see why this component shows so much potential and is on its way to becoming one of the crucial components in the OpenStack world.
In this chapter, we will cover the following:
DBaaS and its advantages
An introduction to OpenStack's Trove project and its components
Data is a key component in today's world, and what would applications do without data? Data is very critical, especially in the case of businesses such as the financial sector, social media, e-commerce, healthcare, and streaming media. Storing and retrieving data in a manageable way is absolutely key. Databases, as we all know, have been helping us manage data for quite some time now.
Databases form an integral part of any application. The data-handling needs of different types of applications also differ, which has led to an increase in the number of database types. As the overall complexity increases, it becomes increasingly difficult for database administrators (DBAs) to manage them.
DBaaS is a cloud-based, service-oriented approach to offering databases on demand for storing and managing data. It offers a flexible and scalable platform that is oriented towards self-service and easy management. In particular, a business can provision its environment with a database of its choice in a few clicks and in minutes, rather than waiting for days or even, in some cases, weeks.
The fundamental building block of any DBaaS is that it will be deployed over a cloud platform, be it public (AWS, Azure, and so on) or private (VMware, OpenStack, and so on). In our case, we are looking at a private cloud running OpenStack. So, to the extent necessary, you might come across references to OpenStack and its other services, on which Trove depends.
XaaS (short for Anything/Everything as a Service, of which DBaaS is one such service) is fast gaining momentum. In the cloud world, everything is offered as a service, be it infrastructure, software, or, in this case, databases. Amazon Web Services (AWS) offers various services around this: the Relational Database Service (RDS) for the RDBMS (short for relational database management system) kind of system; SimpleDB and DynamoDB for NoSQL databases; and Redshift for data warehousing needs.
The OpenStack world was not untouched by the growing demand for DBaaS, which came not just from users but also from DBAs. As a result, Trove made its debut with the OpenStack Icehouse release in April 2014 and has since been one of the most popular advanced services of OpenStack.
It supports several SQL and NoSQL databases and provides the full life cycle management of the databases.
In any organization, much of the DBAs' time is spent on mundane tasks such as creating databases and creating instances, which are done manually or with a bunch of scripts that have to be fired manually. This leaves them little time to concentrate on tasks such as fine-tuning SQL queries so that applications run faster, and in effect it wastes both developers' and DBAs' time. This waste can be significantly reduced using a DBaaS.
With DBaaS, databases that are provisioned by the system will be compliant with standards, as there is very little human intervention involved. This is especially helpful in heavily regulated industries. As an example, let's look at the healthcare industry. Its members are bound by regulations such as HIPAA (short for Health Insurance Portability and Accountability Act of 1996), which enforces certain controls on how data is to be stored and managed. In this scenario, DBaaS makes the database provisioning process easy and compliant: the organization only needs to qualify the process once, and every database that comes out of the automated provisioning system is then compliant with the standards or controls that were set.
Since DBaaS is cloud based, a lot of the work is automated, which makes administration much easier. Some important administration tasks are backup/recovery and software upgrade/downgrade management. As an example, with most databases, we should be able to push configuration modifications within minutes to all the database instances that have been spun up by the DBaaS system. This ensures that any new standards can easily be implemented.
Scaling (up or down) becomes immensely easy, and this reduces resource hogging, where developers reserve extra capacity as part of their planning for a rainy day that, in most cases, never comes. With DBaaS, since you don't commit resources upfront and only scale up or down as and when necessary, resource utilization will be highly efficient.
These are some of the advantages available to organizations that use DBaaS. Some of the concerns and roadblocks for organizations in adopting DBaaS, especially in a public cloud model, are as follows:
In contrast to public cloud-based DBaaS, concerns regarding data security, performance, and visibility reduce significantly in the case of private DBaaS systems such as Trove. In addition, the benefits of a cloud environment are not lost either.
OpenStack Trove, which was originally called Red Dwarf, is a project that was initiated by HP, and many others contributed to it later on, including Rackspace. The project was in incubation till the Havana release of OpenStack.
It was formally introduced in the Icehouse release in April 2014, and its mission is to provide scalable and reliable cloud DBaaS provisioning functionality for relational and non-relational database engines.
Big-tent is a new approach that allows projects to enter the OpenStack code namespace. In order for a service to be a big-tent service, it only needs to follow some basic rules, which are listed here. This allows the projects to have access to the shared teams in OpenStack, such as the infrastructure teams, release management teams, and documentation teams. The project should:
Align with the OpenStack mission
Subject itself to the rulings of the OpenStack Technical Committee
Support Keystone authentication
Be completely open source and open community based
At the time of writing this book, the adoption and maturity levels are as shown here:
The maturity index is 1 on a scale of 1 to 5. It is derived from the following five aspects:
The presence of an installation guide
Whether the adoption percentage is greater or less than 75
Stable branches of the project
Whether it supports seven or more SDKs
Corporate diversity in the team working on the project
Without further ado, let's take a look at the architecture that Trove implements in order to provide DBaaS.
The AMQP (short for Advanced Message Queuing Protocol) message bus brokers the interactions between the task manager, API, guest agent, and conductor. This component ensures that Trove can be installed and configured as a distributed system.
The API component is responsible for providing the RESTful API with JSON and XML support. It can be called the face of Trove to the external world, since everything outside Trove talks to Trove through it. It hands complex tasks over to the task manager, but it can also talk to the guest agent directly to perform simple tasks, such as retrieving users.
The task manager is the engine responsible for doing the majority of the work. It is responsible for provisioning instances, managing the life cycle, and performing different operations. The task manager normally sends common commands, which are of an abstract nature; it is the responsibility of the guest agent to read them and issue database-specific commands in order to execute them.
The guest agent runs inside the Nova instances that are used to run the database engines. The agent listens to the messaging bus for the topic and is responsible for actually translating and executing the commands that are sent to it by the task manager component for the particular datastore.
Let's also look at the different types of guest agents that are required depending on the database engine that needs to be supported. The different guest agents (for example, the MySQL and PostgreSQL guest agents) may even have different capabilities depending on what is supported on the particular database. This way, different datastores with different capabilities can be supported, and the system is kept extensible.
The conductor component is responsible for updating the Trove backend database with the information that the guest agent sends regarding the instances. It eliminates the need for every guest agent to have direct database access in order to update this information. Like the guest agent, the conductor listens to its topic on the messaging bus and performs its functions based on the messages it receives.
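To make the interaction between these components more concrete, the following is a minimal sketch of the message flow, assuming a toy in-memory bus in place of a real AMQP broker. The class, function, and topic names here (MessageBus, guestagent.<instance id>, conductor) are hypothetical and chosen only for illustration; they are not Trove's actual identifiers.

```python
import queue
from collections import defaultdict


class MessageBus:
    """Toy in-memory stand-in for an AMQP broker: one FIFO queue per topic."""

    def __init__(self):
        self._topics = defaultdict(queue.Queue)

    def publish(self, topic, message):
        self._topics[topic].put(message)

    def drain(self, topic):
        """Yield every message currently waiting on the given topic."""
        pending = self._topics[topic]
        while not pending.empty():
            yield pending.get()


def task_manager_send(bus, instance_id, command, **args):
    # The task manager publishes an abstract command to the instance's topic.
    bus.publish(f"guestagent.{instance_id}", {"command": command, "args": args})


def guest_agent_run(bus, instance_id):
    # The guest agent consumes its own topic, would execute the command against
    # the database engine, and reports the outcome to the conductor's topic.
    for message in bus.drain(f"guestagent.{instance_id}"):
        bus.publish("conductor", {
            "instance_id": instance_id,
            "status": "ACTIVE",  # hypothetical status value
            "last_command": message["command"],
        })


def conductor_run(bus, backend_db):
    # Only the conductor writes to the backend database; guest agents never do.
    for message in bus.drain("conductor"):
        backend_db[message["instance_id"]] = {
            "status": message["status"],
            "last_command": message["last_command"],
        }


bus = MessageBus()
backend_db = {}  # a plain dict stands in for the Trove backend database
task_manager_send(bus, "inst-1", "create_database", name="orders")
guest_agent_run(bus, "inst-1")
conductor_run(bus, backend_db)
print(backend_db)
```

Because every component only publishes to and reads from topics, each one could run on a different host, which is the property that lets Trove be deployed as a distributed system.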
The following diagram can be used to illustrate the different components of Trove and also their interaction with the dependent services:
Datastore is the term used for the RDBMS or NoSQL database that Trove can manage; it is nothing more than an abstraction of the underlying database engine, for example, MySQL, MongoDB, Percona, Couchbase, and so on.
A datastore version is linked to the datastore and defines a set of packages that are to be installed, or are already installed, on an image. As an example, let's take MySQL 5.5. The datastore version will also link to a base image (operating system) that is stored in Glance.
The configuration parameters that can be modified are also dependent on the datastore and the datastore version.
A configuration group is a set of options that you can define once and reuse. As an example, we can create a group and associate a number of instances with it, thereby keeping the configurations of those instances in sync.
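The idea of a configuration group can be sketched in a few lines of Python. This is an illustrative model only, not Trove's implementation: the ConfigurationGroup class, the effective_config helper, and the instance names are hypothetical, and the option values are sample numbers.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationGroup:
    """A named set of option overrides shared by several instances."""
    name: str
    values: dict
    instance_ids: set = field(default_factory=set)

    def attach(self, instance_id):
        self.instance_ids.add(instance_id)


def effective_config(engine_defaults, group):
    """Start from the engine defaults and let the group's values override them."""
    merged = dict(engine_defaults)
    merged.update(group.values)
    return merged


# Sample values, used here only for illustration.
engine_defaults = {"max_connections": 151, "wait_timeout": 28800}

web_tier = ConfigurationGroup("web-tier", {"max_connections": 500})
for instance_id in ("db-1", "db-2", "db-3"):
    web_tier.attach(instance_id)

# A single change to the group is seen by every attached instance.
web_tier.values["wait_timeout"] = 600
configs = {
    instance_id: effective_config(engine_defaults, web_tier)
    for instance_id in sorted(web_tier.instance_ids)
}
print(configs["db-1"])
```

Since every attached instance derives its settings from the same group, changing the group once is enough to keep all of them consistent.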
Normally, it's a good idea to have a high memory-to-CPU ratio as a flavor for running database instances.
The following diagram shows these different terms as a quick summary. Users or applications connect to databases, which reside in instances. The instances run in Nova but are instantiations of a datastore version belonging to a datastore. To explain this a little further, say we have two versions of MySQL that are being serviced. We will have one datastore but two datastore versions. Any instantiation of a datastore version will be called an instance, and the actual MySQL database that will be used by the application will be called the database (shown as DB in the diagram).
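The same relationships can be written down as a small data model. The sketch below is only an illustration of the terminology, assuming hypothetical class names, image IDs, and instance names; it does not reflect Trove's actual database schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datastore:
    name: str  # for example, "mysql"


@dataclass(frozen=True)
class DatastoreVersion:
    datastore: Datastore
    version: str
    image_id: str  # the Glance image that this version boots from


@dataclass
class Instance:
    name: str
    datastore_version: DatastoreVersion
    databases: list = field(default_factory=list)  # the DBs applications use


# One datastore, two datastore versions (the image IDs are placeholders).
mysql = Datastore("mysql")
mysql_55 = DatastoreVersion(mysql, "5.5", image_id="glance-image-a")
mysql_56 = DatastoreVersion(mysql, "5.6", image_id="glance-image-b")

# Each instance is an instantiation of exactly one datastore version.
billing = Instance("billing-db", mysql_55, databases=["invoices"])
catalog = Instance("catalog-db", mysql_56, databases=["products", "reviews"])

datastores = {i.datastore_version.datastore for i in (billing, catalog)}
versions = {i.datastore_version.version for i in (billing, catalog)}
print(len(datastores), sorted(versions))
```

Both instances resolve to the same datastore even though they run different datastore versions, which mirrors the two-versions-of-MySQL example in the text.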
In the following diagram, we have represented all the components of Trove except the guest agent (that is, the API, task manager, and conductor) as the Trove Controller. The guest agent code is different for every datastore that needs to be supported, and the guest agent for a particular datastore is installed on the corresponding image of the datastore version.
The guest agents by default have to implement some of the basic actions for the datastore, namely, create, resize, and delete, and individual guest agents have extensions that enable them to support additional features just for that datastore.
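One way to picture this split between mandatory actions and datastore-specific extensions is an abstract base class. The following is a minimal sketch under that assumption; the class names, the list_users extension, and the command strings are hypothetical and only indicative, and the real guest agents are organized differently.

```python
from abc import ABC, abstractmethod


class GuestAgent(ABC):
    """Every guest agent must implement the basic actions."""

    @abstractmethod
    def create(self, db_name): ...

    @abstractmethod
    def resize(self, size_gb): ...

    @abstractmethod
    def delete(self, db_name): ...

    def supports(self, action):
        return callable(getattr(self, action, None))


class MySQLGuestAgent(GuestAgent):
    def create(self, db_name):
        return f"CREATE DATABASE `{db_name}`;"

    def resize(self, size_gb):
        return f"grow data volume to {size_gb} GB"

    def delete(self, db_name):
        return f"DROP DATABASE `{db_name}`;"

    def list_users(self):  # extension offered only by this agent in the sketch
        return "SELECT User, Host FROM mysql.user;"


class PostgreSQLGuestAgent(GuestAgent):
    def create(self, db_name):
        return f'CREATE DATABASE "{db_name}";'

    def resize(self, size_gb):
        return f"grow data volume to {size_gb} GB"

    def delete(self, db_name):
        return f'DROP DATABASE "{db_name}";'


def dispatch(agent, action, *args):
    """Translate an abstract command into the agent's native command."""
    if not agent.supports(action):
        raise NotImplementedError(f"{type(agent).__name__} cannot {action}")
    return getattr(agent, action)(*args)


print(dispatch(MySQLGuestAgent(), "create", "orders"))
print(dispatch(PostgreSQLGuestAgent(), "create", "orders"))
```

The caller issues the same abstract command to either agent, and an agent that lacks an extension rejects it explicitly instead of failing silently.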
The following diagram should help us understand the command proxy function of the guest agent. Please note that the commands shown are only indicative, and the actual commands will vary.
At the time of writing this book, Trove's guest agents are installable only on Linux; hence, only databases on Linux systems are supported. Feature requests (https://blueprints.launchpad.net/trove/+spec/mssql-server-db-support) were created for a guest agent that runs on Windows and supports Microsoft SQL databases, but they have not yet been approved at the time of writing, so this support remains a remote possibility.
Trove supports various databases; the following table shows the databases supported by this service at the time of writing this. Automated installation is available for all the different databases, but there is some level of difference when it comes to the configuration capabilities of Trove with respect to different databases.
This has a lot to do with the lack of a common configuration base among the different databases. At the time of writing this book, MySQL and MariaDB have the most configuration options available, as shown in this list:
Horizon/Trove CLI requests a new database instance and passes the datastore name and version, along with the flavor ID and volume size as mandatory parameters. Optional parameters such as the configuration group, AZ, replica-of, and so on can also be passed.
The Trove API requests Nova for an instance with the particular image and a Cinder volume of a specific size to be added to the instance.
The Nova instance boots and follows these steps:
The cloud-init scripts are run (like all other Nova instances).
The configuration files (for example, trove-guestagent.conf) are copied down to the instance.
The guest agent is installed.
The Trove API will also have sent the request to the task manager, which will then send the prepare call to the message bus topic.
After booting, the guest agent listens to the message bus for any activities for it to do, and once it finds a message for itself, it processes the prepare command and performs the following functions:
Installing the database distribution (if not already installed on the image)
Creating the configuration file with the default configuration for the database engine (and any configuration from the configuration groups associated overriding the defaults)
Starting the database engine and enabling auto-start
Polling the database engine for availability (until the database engine is available or the timeout is reached)
Reporting the status back to the Trove backend using the Trove conductor
The Trove task manager reports back to the API, and the status of the instance is changed.
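The guest agent's part of this flow can be sketched as follows. This is a simplified illustration, not Trove's code: the function names, the FakeEngine class, the parameter keys, and the ACTIVE and FAILED status strings are all assumptions made for the example.

```python
import time

# Mandatory parameters of a create request, using hypothetical key names.
MANDATORY = ("datastore", "datastore_version", "flavor_id", "volume_size")


def validate_request(params):
    """Reject a create request that lacks any mandatory parameter."""
    missing = [key for key in MANDATORY if key not in params]
    if missing:
        raise ValueError(f"missing mandatory parameters: {missing}")
    return params


class FakeEngine:
    """Stand-in for a database engine that needs a few polls to come up."""

    def __init__(self, polls_until_ready):
        self.polls_left = polls_until_ready
        self.installed = False
        self.started = False
        self.config = None

    def install(self):
        self.installed = True  # a no-op if the image already ships the engine

    def write_config(self, config):
        self.config = config

    def start(self):
        self.started = True

    def is_available(self):
        self.polls_left -= 1
        return self.started and self.polls_left <= 0


def prepare(engine, defaults, overrides, report, timeout=1.0, interval=0.01):
    """Install, configure, start, then poll until available or timed out."""
    engine.install()
    engine.write_config({**defaults, **overrides})  # group values win
    engine.start()
    deadline = time.monotonic() + timeout
    status = "FAILED"
    while time.monotonic() < deadline:
        if engine.is_available():
            status = "ACTIVE"
            break
        time.sleep(interval)
    report(status)  # in Trove, this report travels through the conductor
    return status


request = validate_request({
    "datastore": "mysql", "datastore_version": "5.5",
    "flavor_id": "2", "volume_size": 1,
})
reports = []
engine = FakeEngine(polls_until_ready=3)
status = prepare(engine, {"max_connections": 151}, {"max_connections": 500},
                 report=reports.append)
print(status, engine.config)
```

The polling loop is bounded by a deadline, so an engine that never becomes available ends in a reported failure rather than an instance that stays in the building state forever.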
Dev/test databases are a killer feature, and most companies that start using Trove are likely to use it for their dev/test environments first. It gives developers the ability to create and dispose of database instances at will. This makes them more productive and removes the lag between the time they request a database and the time they get it.
The capability of being able to take a backup, run a database, and restore the backup to another server is especially key when it comes to these kinds of workloads.
Support for Neutron: Now we can use both Nova-network and Neutron for networking purposes.
Replication: MySQL master/slave replication was added. The API also allowed us to detach a slave for it to be promoted.
Clustering: MongoDB cluster support was added.
Configuration group improvements:
The functionality of using a default configuration group for a datastore version was added. This allows us to build the datastore version with a base configuration that matches our company standards.
Basic error checking was added to configuration groups.
Configuration groups for Redis and MongoDB
Cluster support for Redis and MongoDB
Percona XtraDB cluster support
Backup and restore for a single instance of MongoDB
User and database management for MongoDB
Horizon support for database clusters
A management API for datastores and versions
The ability to deploy Trove instances in a single admin tenant so that the Nova instances are hidden from the user
In this chapter, we were introduced to the basic concepts of DBaaS and how Trove can help with this. With several changes being introduced and a maturity score of one out of five, it might seem as if it is too early to adopt Trove. However, a lot of companies are giving Trove a go in their dev/test environments as well as for some web databases in production, which is why the adoption percentage is steadily on the rise.
Among the companies using Trove today are giants such as eBay, which runs its dev/test databases on Trove. HP Helion Cloud, Rackspace Cloud, and Tesora (which is also one of the biggest contributors to the project) have DBaaS offerings based on the Trove component.
Trove is increasingly being used in various companies, and it is helping in reducing DBAs' mundane work and improving standardization. In the next chapter, we will see how to quickly set up Trove using DevStack scripts.