In this article by Nilesh Nimkar, the author of the book Practical OneOps, we will look at a few steps that will help you manage both kinds of installations.

(For more resources related to this topic, see here.)

Upgrading OneOps with minimal downtime

As mentioned preceding you might be running a stand-alone instance of OneOps or an enterprise instance. For both types, you will have to use different strategies to update the OneOps code. In general, it is easier and straightforward to update a standalone instance rather than a enterprise instance. Your strategies to update and the branch or tag of code that you will use will also differ based on kind of system that you have.

Updating a standalone OneOps install

If you have a standalone installation, it's possible you created it in one of the several ways. You either installed it using Vagrant or you used the Amazon Machine Images (AMI). It is also possible that you built your own installation on another cloud like Google, Azure or Rackspace. Irrespective of the way your instance of OneOps was created, the steps to upgrade it remains the same and are very simple. When you setup OneOps two scripts are run by the setup process, oo-preqs.sh and oo_setup.sh. Once an instance is setup, both these scripts are also copied to /home/oneops directory on the server. Of these two scripts, oo_setup.sh can be used to update an OneOps standalone install at any time.

managing-your-oneops-img-0

You need an active internet connection to upgrade OneOps.

You can see the list of releases in the OneOps git repository for any of the OneOps components. For example releases for sensor can be seen at

https://github.com/oneops/sensor/releases

Release candidates have a RC1 at the end and stable releases have — STABLE at the end. If you want to install a particular release, like 16.09.29-RC1 then invoke the script and pass the release number as the argument. Passing master will build the master branch and will build and install the latest and greatest code. This is great to get all the latest features and bug fixes but this will also make your installation susceptible to new bugs.

./oo_setup.sh master

Ensure that the script is invoked as root. Instead of running as sudo, it helps if you are logged in as root with:

sudo su -

After the script is invoked, it will do a bunch of things to upgrade your OneOps.

First it sets three variables:

OO_HOME which is set to /home/oneops
BUILD_BASE which is set to /home/oneops/build
GITHUB_URL which is set to https://github.com/oneops

All the builds will take place under BUILD_BASE.

Under the BUILD_BASE, the script then checks if dev-tools exists. If it does, it updates it to the latest code by doing a git pull on it. If it does not, then it does a git clone and gets a latest copy from GitHub. The dev-tools repository has a set of tools for core OneOps developers. The most important of which are under the setupscripts sub directory.

The script then copies all the scripts from under the setupscripts sub directory to OO_HOME directory. Once done it invokes the oneops_build.sh script. If you passed any build tag to oo_setup.sh script, that tag is passed on to oneops_build.sh script as is.

The oneops_build.sh script is a control script so the speak. What it means is in turn it will invoke a bunch of other scripts which will shutdown services, pull and build the OneOps code, install the built code and then restart the services once done.

Most of the scripts that run henceforth set and export a few variables again. Namely OO_HOME, BUILD_BASE and GITHUB_URL. Another variable that is set is SEARCH_SITE whose value is always set to localhost.

The first thing the script does is to shutdown Cassandra on the server to conserve memory on the server and reduce the load during the build since the build itself is very memory and CPU intensive. It also marks the start time of the script. Next it runs the install_build_srvr.sh script by passing the build tag that was passed to the original script. This is a very innovative script which does a quick installation of Jenkins, installs various plugins to Jenkins, runs various jobs to do builds, monitors the jobs for either success or failure and then shuts down Jenkins all in an automated fashion.

If you have your own Jenkins installation, I highly recommend you read through this script as this will give you great ideas for your own automation of installation, monitoring and controlling of Jenkins.

As mentioned in preceding, the install_build_srvr.sh script sets a bunch of variables first. It then clones the git repository called build-wf from the BUILD_BASE if it does not already exist. If it does exist, it does a git pull to update the code. Outside of a Docker container, the build-wf is a most compact Jenkins installation you will find. You can check it out at the URL following:

https://github.com/oneops/build-wf

It consists of a Rakefile to download and install Jenkins and its associated plugins, a config.xml that configures it, a plugins.txt that provides a list of plugins and a jobs directory with all the associated jobs in it.

If the script detects a Jenkins server that is already present and a build is already in progress, it cleanly attempts to shut down the existing Jenkins server. It then attempts to install the latest Jenkins jar using the following command:

rake install

Once the installation is done a dist directory is created to store the resulting build packages. After setting the path to local maven, the server is brought up using the command following:

rake server

If you did not specify what revision to build, the last stable build is used.

managing-your-oneops-img-1

The actual release revision itself is hardcoded in this script. Every time a stable release is made, this file is manually changed and the release version is updated and the file is checked in. After the server comes up, it is available at port 3001 if you are running on any cloud. If you are running a Vagrant setup, it will be mapped to port 3003. If you connect to one of these ports on your machines via your browser, you should be able to see your Jenkins in action.

managing-your-oneops-img-2

The script calls the job oo-all-oss via curl using Jenkins REST API. The oo-all-oss is a master job that in turn builds all of OneOps components, including the database components. Even the installation of Jenkins plugins is done via a Jenkins job called Jenkins-plugin. The script then goes into an infinite loop and keeps checking the job status till the jobs are done. Once all jobs are finished or if an error is encountered, the server is shutdown using

rake stop

Once the build completes, the Cassandra sever is started again. Once it starts the Cassandra service the script start deploying all the built artifacts. The first artifact to be deployed is the database artifact. For that it runs the init_db.sh script. This script first creates the three main schemas, namely kloopzapp, kloopzdb and activitidb. Since you are upgrading an existing installation, this script may very well give an error. Next the script will run a bunch of database scripts which will create tables, partitions, functions and other ddl statements. Again since you are upgrading any errors here can be safely ignored.

Next to be installed, is the display. The script backs up the current display from /opt/oneops/app to /opt/oneops/~app in case a rollback is needed. It then copies and untars the newly built package. Using rake, it detects if the Rails database is setup. If the database version is returned as 0 then rake db:setup command is run to setup a brand new database. Or else rake db:migrate command is run to migrate and upgrade the database.

The next component to get installed is amq. This is done by calling the script deploy_amq.sh. The amq gets installed in the directory /opt/activemq. Before installation the activemq service is stopped. The script the copies over the amq-config and amqplugin-fat jar. It also takes a backup of the old configuration and overwrites it with new configuration. After that the service is started again.

After AMQ, the script installs all the webapps under Tomcat. Tomcat itself is installed under /usr/local/tomcat7 and all the webapps get installed under /usr/local/tomcat7/webapps. Before copying over all the war files, the tomcat service is stopped. The script also creates directories that the controller, publisher and transmitter rely on for successful operation. Once the wars are copied Tomcat service is started again. Tomcat, at this point will automatically deploy the services.

After the web services are deployed the script deploys the search service. Before deployment, the search-consumer service is stopped. The search.jar and the startup script is then copied to /opt/oneops-search directory and the search-consumer service is started again.

As a final step in deployment, the OneOps Admin gem is deployed. The OneOps Admin gem contains two commands that help administer OneOps from the backend. These commands are inductor and circuit. The script then either updates the circuit repository if it exists or clones it if it does not from https://github.com/oneops/circuit-oneops-1 and installs it. After successfully installing the circuit an inductor is created using the shared queue using the command following. This command is also a great reference for you should you wish to create your own inductors for testing.

inductor add --mqhost localhost 
--dns on 
--debug on 
--daq_enabled true 
--collector_domain localhost 
--tunnel_metrics on 
--perf_collector_cert /etc/pki/tls/logstash/certs/logstash-forwarder.crt 
--ip_attribute public_ip 
--queue shared 
--mgmt_url http://localhost:9090 
--logstash_cert_location /etc/pki/tls/logstash/certs/logstash-forwarder.crt 
--logstash_hosts vagrant.oo.com:5000 
--max_consumers 10 
--local_max_consumers 10 
--authkey superuser:amqpass 
--amq_truststore_location /opt/oneops/inductor/lib/client.ts 
--additional_java_args "" 
--env_vars ""

After installing the inductor, the display service is started and the standalone OneOps upgrade is complete.

Updating a Enterprise OneOps Install

Updating an enterprise OneOps install takes a different approach for a few different reasons. First of all in an enterprise install all the services get installed on their own instances. Secondly, since an enterprise install caters to an entire enterprise, stability, availability and scalability are always a issue. So here are a few things that you should remember before you upgrade your enterprise install.

Ensure you have your own Jenkins build server and it uploads the artifacts to your own Nexus repository. Ensure this Nexus repository is configured in the OneOps that manages your enterprise OneOps installation.
Ensure you use a stable build and not a Release Candidate or the master build. This way you will have a well tested build for your enterprise.
Make sure your backup server is configured and OneOps is being regularly backed up.
Although the downtime should be minimal to none, make sure you do the upgrade during the least busy time to avoid any unforeseen events.
If you have more than one OneOps installations, it is prudent to direct traffic to the second installation while one is being updated.

With these things in mind the sequence for updating the various components is pretty much the same as updating a standalone OneOps install. However the steps involved are a bit different. The first thing you need to do, as mentioned preceding, is to choose an appropriate stable release that you want to deploy. Once you choose that, go to OneOps that manages your enterprise installation and click on the OneOps assembly. Select Design from the left hand side menu and then select Variables from the center screen. From the numerous variables you see, the one that you want to modify is called Version. Click on it and then click Edit in the upper right hand corner.

managing-your-oneops-img-3

Click Save. Once the changes are saved, you can go ahead and commit your changes. You will notice that all the components derive their local version variable from the global version variable. At this point if you click on Transition, and attempt a deployment, OneOps will generate a deployment plan which will have the latest revision of all the components that need the upgrade. Go ahead and click Deploy. OneOps should do the rest.

Configuring database backups

As seen so far, OneOps has a complex architecture and relies on many databases to provide optimum functionality as we have seen before. Again as with deployment, for database backup the steps needed to backup a single machine install and an enterprise install are different.

Backup a standalone OneOps install

For a standalone install the three main postgres databases you need to backup are activitidb, kloopzapp, and kloopzdb. You can access these databases directly by logging in to your OneOps server and then doing a sudo as the postgres user.

# sudo su – postgres
-bash-4.2$ psql
Postgres=# l

Once you issue these commands you can see these databases listed along with the default postgres database. Now you can design chef recipes to take backups or install puppet or ansible and automate the backup process. However in accordance with the KISS principle the simplest way you can setup backups are to use the built in postgres command The pg_dump command for single database backup or pg_dumpall for a all database backups. You can add a cron job to run these commands nightly and another cron job to scp the dumped files and delete the local copies.

The KISS is an acronym coined by the US Navy in 1960 for a design principle that states that systems work best if the design is kept simple and unnecessary complexity is avoided. Please look it up online. Search for KISS Principle.

As time goes by your database size will also increase. To tackle that you can pipe your backup commands directly to a compression program.

pg_dumpall | gzip filename.gz

Similarly you can restore the database with using the exact reverse of that command.

gunzip filename.gz | pg_restore

Backup an enterprise OneOps Install

Again an enterprise OneOps install, as opposed to a standalone OneOps install comes with backups built in. To make the backups work, you have to setup a few things correctly to begin with. Firstly you have to setup the BACKUP-HOST global variable to point to a host that has plenty of storage attached to it.

managing-your-oneops-img-4

Once the variable is set, the value trickles down to the database components as local variables derived from the global variable. All backups taken are then copied to this host. For example, following is the screenshot for this variable for CMSDB:

managing-your-oneops-img-5

Once this is done, OneOps sets up automated jobs for database backups. These jobs are actually shell scripts which are wrappers over chef recipes for the database snapshot backup.

Summary

In this article we saw a few steps that will help you manage both kinds of installations. However as DevOps you will have to manage not only assemblies but the OneOps system itself. Depending on the size of the organization and the complexity of the deployments you handle, you may opt to choose either a single server installation or an enterprise install.