In this article by Nikhil Pathania, the author of the book Learning Continuous Integration with Jenkins, we'll learn how to achieve Continuous Integration. Implementing Continuous Integration involves using various DevOps tools, and ideally, a DevOps engineer is responsible for implementing it. This article introduces readers to the constituents of Continuous Integration and the means to achieve it.
DevOps stands for development operations, and the people who manage these operations are called DevOps engineers. All of the following tasks fall under development operations:
- Build and release management
- Deployment management
- Version control system administration
- Software configuration management
- All sorts of automation
- Implementing continuous integration
- Implementing continuous testing
- Implementing continuous delivery
- Implementing continuous deployment
- Cloud management and virtualization
I assume that the preceding tasks need no explanation. A DevOps engineer accomplishes them using a set of tools; these tools are loosely called DevOps tools (Continuous Integration tools, agile tools, team collaboration tools, defect tracking tools, continuous delivery tools, cloud management tools, and so on).
A DevOps engineer has the capability to install and configure the DevOps tools to facilitate development operations. Hence, the name DevOps.
Use a version control system
This is the most basic and the most important requirement for implementing Continuous Integration. A version control system, sometimes also called a revision control system, is a tool used to manage your code history. It can be centralized or distributed. Some famous centralized version control systems are SVN and IBM Rational ClearCase. In the distributed segment, we have tools such as Git. Ideally, everything required to build software must be version controlled. A version control tool offers many features, such as labeling, branching, and so on.
When using a version control system, keep branching to a minimum. A few companies have only one main branch, and all development activity happens on it. Nevertheless, most companies follow some branching strategy, because there is always a possibility that part of a team works on one release while others work on another. At other times, there is a need to support older release versions. Such scenarios lead companies to use multiple branches.
For example, imagine a project that has an Integration branch, a release branch, a hotfix branch, and a production branch. The development team works on the release branch, checking code out and in on it. There can be more than one release branch where development runs in parallel, say sprint 1 and sprint 2. Once sprint 2 is near completion (assuming that all the local builds on the sprint 2 branch were successful), it is merged into the Integration branch. Automated builds run when something is checked in on the Integration branch, and the code is then packaged and deployed in the testing environments. If the testing passes with flying colors and the business is ready to move the release to production, automated systems merge the code into the production branch.
Typical branching strategies
From here, the code is then deployed in production. The reason for maintaining a separate branch for production is the desire to keep the code neat, with fewer versions. The production branch is always in sync with the hotfix branch. Any instant fix required on the production code is developed on the hotfix branch; the hotfix changes are then merged into both the production and Integration branches. The moment sprint 1 is ready, it is first rebased on the Integration branch and then merged into it, and it follows the same steps thereafter.
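Assuming a git-based setup, the release-to-Integration-to-production flow described above can be sketched with a few commands; branch names, file contents, and commit messages here are illustrative only:

```shell
set -e
mkdir repo && cd repo
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# Integration branch with an initial commit
git checkout -q -b integration
echo "base" > app.txt
git add app.txt && git commit -q -m "initial code"

# Sprint 2 release branch: developers check code in here
git checkout -q -b release/sprint2
echo "sprint 2 feature" >> app.txt
git commit -q -am "sprint 2 work"

# Sprint 2 nears completion: merge it into the Integration branch
git checkout -q integration
git merge -q --no-ff -m "merge sprint 2" release/sprint2

# After tests pass, promote the Integration code to production
git checkout -q -b production
git log --oneline
```

In a real project, the merge into `production` would be performed by the automated system only after the testing environments report success.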
An example to understand VCS
Let's say I add a file named Profile.txt to version control with some initial details, such as the name, age, and employee ID.
To modify the file, I have to check it out. This is more like reserving the file for editing. Why reserve? In a development environment, a single file may be used by many developers. Hence, in order to facilitate organized use, we have the option to reserve a file using the check-out operation. Let's assume that I check out the file and modify it by adding another line.
After the modification, I perform a check-in operation. The new version contains the newly added line. Similarly, every time you or someone else modifies a file, a new version gets created.
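Assuming Git as the version control tool, the example above looks like this (the field values and commit messages are placeholders):

```shell
set -e
mkdir vcs-demo && cd vcs-demo
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# Version 1: initial details
printf 'Name: John\nAge: 30\nEmployee ID: 007\n' > Profile.txt
git add Profile.txt
git commit -q -m "Initial details"       # check-in: version 1

# The "check-out and edit" step: modify the file
echo "Location: Bangalore" >> Profile.txt
git commit -q -am "Added location"       # check-in: version 2

git log --oneline -- Profile.txt         # both versions are recorded
```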
Types of version control systems
We have already seen that a version control system is a tool used to record changes made to a file or set of files over time. The advantage is that you can recall specific versions of your file or a set of files. Almost every type of file can be version controlled. It's always good to use a Version Control System (VCS) and almost everyone uses it nowadays. You can revert an entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Using a VCS also generally means that if you screw things up or lose files, you can easily recover.
Looking back at the history of version control tools, we can observe that they can be divided into three categories:
- Local version control systems
- Centralized version control systems
- Distributed version control systems
Centralized version control systems
When VCSs came into existence some 40 years ago, they were mostly personal, such as the one that comes with Microsoft Office Word, wherein you can version control a file you are working on. The reason was that in those days, software development activity was minuscule in magnitude and mostly done by individuals. But with the arrival of large software development teams working in collaboration, the need for a centralized VCS was felt. Hence came centralized VCS tools, such as ClearCase and Perforce. Some of the advantages of a centralized VCS are as follows:
- All the code resides on a centralized server. Hence, it's easy to administrate and provides a greater degree of control.
- These VCSs also bring with them some new features, such as labeling, branching, and baselining, to name a few, which help people collaborate better.
- In a centralized VCS, the developers must always be connected to the network. As a result, the VCS at any given point in time always represents the updated code.
The following diagram illustrates a centralized VCS:
A centralized version control system
Distributed version control systems
Another type of VCS is the distributed VCS. Here, there is a central repository containing all the software solution code. Instead of creating a branch on the central repository, developers completely clone it onto their local machines and then create a branch out of the local clone. Once done with the work, a developer first merges the branch into the Integration branch and then syncs the local clone with the central repository. You can argue that it is a combination of a local VCS and a central VCS. An example of a distributed VCS is Git.
A distributed version control system
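A minimal sketch of this clone-branch-merge-sync workflow with git, using a local bare repository to stand in for the central server (all paths and branch names are assumptions):

```shell
set -e
git init -q --bare central.git        # the central repository
git clone -q central.git clone1       # the developer clones it fully
cd clone1
git config user.email "dev@example.com"
git config user.name "Dev"
git checkout -q -b integration
echo "solution code" > main.txt
git add main.txt && git commit -q -m "initial code"
git push -q -u origin integration     # seed the central repository

# Work happens on a branch cut from the local clone
git checkout -q -b feature
echo "feature work" >> main.txt
git commit -q -am "feature work"

# Merge locally into the Integration branch, then sync the clone
# with the central repository
git checkout -q integration
git merge -q --no-ff -m "merge feature" feature
git push -q origin integration
```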
Use a repository tool
As part of the software development life cycle, the source code is continuously built into binary artifacts using Continuous Integration. Therefore, there should be a place to store these built packages for later use. The answer is to use a repository tool. But, what is a repository tool?
It's a version control system for binary files. Do not confuse this with the version control system discussed in the previous sections. The former is responsible for versioning the source code and the latter for binary files, such as .jar, .war, .exe, .msi, and so on. For example, Maven always downloads the plugins required to build the code into a folder. Rather than downloading the plugins again and again, they can be managed using a repository tool.
As soon as a build gets created and passes all the checks, it should be uploaded to the repository tool. From there, developers and testers can manually pick, deploy, and test the packages, or, if automated deployment is in place, the build is automatically deployed in the respective test environment. So, what's the advantage of using a build repository?
A repository tool does the following:
- Every time a build gets generated, it is stored in a repository tool. There are many advantages of storing the build artifacts. One of the most important advantages is that the build artifacts are located in a centralized location from where they can be accessed when needed.
- It can store third-party binary plugins, modules that are required by the build tools. Hence, the build tool need not download the plugins every time a build runs. The repository tool is connected to the online source and keeps updating the plugin repository.
- It records what, when, and who created a build package.
- It creates a staging area to manage releases better. This also helps in speeding up the Continuous Integration process.
- In a Continuous Integration environment, each build generates a package and the frequency at which the build and packaging happen is high. As a result, there is a huge pile of packages. Using a repository tool makes it possible to store all the packages in one place. In this way, developers get the liberty to choose what to promote and what not to promote in higher environments.
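As a toy illustration of the what/when/who bookkeeping listed above, the following sketch stores a build artifact under a versioned path with a metadata file. Real repository managers such as Nexus or Artifactory do this properly; every name and path here is an assumption:

```shell
set -e
# Toy "repository": a directory tree plus a metadata file
mkdir -p repo-store
BUILD_NUM=42                          # assumed build number
ARTIFACT=myapp-1.0.war
echo "binary contents" > "$ARTIFACT"  # stand-in for a real build output

# Store the artifact under a unique, versioned path
DEST="repo-store/myapp/1.0/build-$BUILD_NUM"
mkdir -p "$DEST"
cp "$ARTIFACT" "$DEST/"

# Record what, when, and who created the package
cat > "$DEST/metadata.txt" <<EOF
artifact: $ARTIFACT
build: $BUILD_NUM
created: $(date -u +%Y-%m-%dT%H:%M:%SZ)
created-by: jenkins
EOF

ls -R repo-store
```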
Use a Continuous Integration tool
What is a Continuous Integration tool? It is nothing more than an orchestrator. A Continuous Integration tool sits at the center of the Continuous Integration system and is connected to the version control system, build tool, repository tool, testing and production environments, quality analysis tool, test automation tool, and so on. All it does is orchestrate these tools. There are many Continuous Integration tools: Build Forge, Bamboo, and TeamCity, to name a few. But the prime focus of our topic is Jenkins.
Basically, Continuous Integration tools consist of various pipelines. Each pipeline has its own purpose: some take care of Continuous Integration, some take care of testing, some take care of deployments, and so on. Technically, a pipeline is a flow of jobs, and each job is a set of tasks that run sequentially. Scripting is an integral part of a Continuous Integration tool, performing various kinds of tasks. A task may be as simple as copying a folder or file from one location to another, or it can be a complex Perl script used to monitor a machine for file modifications. Nevertheless, scripts are being replaced by the growing number of plugins available in Jenkins. Now you need not write a script to build Java code; there are plugins available for it. All you need to do is install and configure a plugin to get the job done. Technically, plugins are nothing but small modules written in Java, but they remove the burden of scripting from the developers' heads.
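The "pipeline as a flow of sequential jobs" idea can be sketched as a plain shell script; the job bodies below are placeholders for what a real CI tool would run:

```shell
set -e   # stop the pipeline as soon as any job fails

# Placeholder job bodies; a real pipeline would invoke the build
# tool, test runners, and archiving steps here
build()     { echo "compiling sources..."; }
unit_test() { echo "running unit tests..."; }
package()   { echo "archiving artifacts..."; }

# A pipeline is a flow of jobs that run in order
for job in build unit_test package; do
    echo "== job: $job =="
    "$job"
done | tee pipeline.log

echo "pipeline finished"
```

Because of `set -e`, a failure in any job stops the flow, which is exactly how a CI pipeline halts on the first broken stage.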
Creating a self-triggered build
The next important thing is the self-triggered automated build. Build automation is simply a series of automated steps that compile the code and generate executables. The build automation can take the help of build tools, such as Ant and Maven. Self-triggered automated builds are the most important part of a Continuous Integration system. There are two main factors that call for an automated build mechanism:
- Catch integration or code issues as early as possible
- Save time by removing manual intervention from the build process
There are projects where 100 to 200 builds happen per day. In such cases, speed plays an important role. If the builds are automated, a lot of time can be saved. Things become even more interesting if the triggering of the build is made self-driven, without any manual intervention. An auto-triggered build on every code change further saves time.
When builds are frequent and fast, errors (build errors, compilation errors, and integration errors) are found sooner and more often.
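One simple way to self-trigger a build on every check-in, assuming git, is a post-commit hook. Here the "build" is just a log entry; a real hook would notify a CI server such as Jenkins:

```shell
set -e
mkdir hook-demo && cd hook-demo
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# The hook fires automatically after every commit
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
echo "build triggered by commit $(git rev-parse --short HEAD)" >> build.log
EOF
chmod +x .git/hooks/post-commit

echo "change" > src.txt
git add src.txt
git commit -q -m "developer check-in"   # the hook runs here
cat build.log
```

No one asked for a build; the check-in itself triggered it, which is the self-driven behavior described above.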
Automate the packaging
There is a possibility that a build has many components. Take, for example, a build that has a .jar file as an output. Along with it, there are some UNIX configuration files, release notes, some executables, and also some database changes. All these different components need to go together. The task of creating a single archive or a single media out of many components is called packaging. This, again, can be automated using the Continuous Integration tools and can save a lot of time.
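The packaging step described above can be as simple as archiving all the components into one file; a hypothetical sketch with tar (all file names and contents are made up):

```shell
set -e
# Stand-ins for the components of one release
mkdir -p pkg
echo "binary"                      > pkg/myapp.jar
echo "unix config"                 > pkg/app.conf
echo "release notes"               > pkg/RELEASE_NOTES.txt
echo "ALTER TABLE t ADD c INT;"    > pkg/db_changes.sql

# One media out of many components
tar -czf myapp-release-1.0.tar.gz -C pkg .
tar -tzf myapp-release-1.0.tar.gz
```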
Using build tools
IT projects can be on various platforms, such as Java, .NET, Ruby on Rails, C, and C++, to name a few. Also, in a few places, you may see a collection of technologies. No matter what, every programming language, excluding the scripting languages, has compilers that compile the code from a high-level language to a machine-level language. Ant and Maven are the most common build tools used for projects based on Java. For the .NET lovers, there are MSBuild and TFS Build. Coming to the Unix and Linux world, you have make and omake, and also clearmake in case you are using IBM Rational ClearCase as the version control tool. Let's see the important ones.
Maven is a build tool used mostly to compile Java code. The output is mostly .jar files or, in some cases, .war files, depending on the requirements. It uses Java libraries and Maven plugins in order to compile the code. The project to be built is described using an XML file that contains information about the project, its dependencies, and so on.
Maven can be easily integrated into Continuous Integration tools, such as Jenkins, using plugins.
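For reference, here is a minimal sketch of the XML file Maven uses, called a POM; the coordinates below are placeholders:

```xml
<!-- Minimal illustrative POM; groupId/artifactId/version are made up -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>myapp</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging> <!-- or war, depending on the requirements -->
</project>
```

Running `mvn clean package` against such a file compiles the code and produces the .jar or .war under the project's target directory.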
MSBuild is a tool used to build Visual Studio projects. MSBuild is bundled with Visual Studio and is a functional replacement for nmake. MSBuild works on project files that have an XML syntax similar to that of Apache Ant. Its fundamental structure and operation are similar to those of the UNIX make utility: the user defines the input (the various source files) and the output (usually a .exe or .msi), and the utility itself decides what to do and the order in which to do it.
Automating the deployments
Imagine a mechanism where the automated packaging has produced a package that contains .war files, database scripts, and some UNIX configuration files. Now, the task is to deploy all three artifacts into their respective environments: the .war file must be deployed to the application server, the UNIX configuration files should sit on the respective UNIX machine, and the database scripts should be executed on the database server. The deployment of such packages containing multiple components is usually done manually in almost every organization that does not have automation in place. Manual deployment is slow and prone to human error. This is where an automated deployment mechanism is helpful.
Automated deployment goes hand in hand with the automated build process. The previous scenario can be achieved using an automated build and deployment solution that builds each component in parallel, packages them, and then deploys them in parallel. Using tools such as Jenkins, this is possible. But, the previous idea introduces some challenges, which are as follows:
- There is a considerable amount of scripting required to orchestrate the build, packaging, and deployment of a release containing multiple components. These scripts are themselves a large body of code to maintain, requiring time and resources.
- In most cases, deployment is not as simple as placing files in a directory. For example, imagine that the deployment requires you to install a Java-based application on a Linux machine. Before deploying the application, you must have the correct version of Java installed on the target machine; the machine may even be a new one without Java at all. The deployment automation mechanism should make sure that the target machine has the correct configuration before the artifacts are deployed.
Most of the preceding challenges can be handled using a Continuous Integration tool, such as Jenkins. The field of managing the configuration of any number of machines is called configuration management, and there are tools, such as Chef and Puppet, to do this.
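The "check the target's configuration before you deploy" idea can be sketched in shell. Here a local directory simulates the target machine and a marker file stands in for a correctly installed Java; a real setup would use ssh/scp or a configuration management tool such as Chef or Puppet:

```shell
set -e
# Simulated target machine: a local directory
mkdir -p target-machine
echo "application binary" > myapp.war

# Precondition check: is the target configured? The marker file
# stands in for "the correct version of Java is installed"
if [ ! -f target-machine/java-installed ]; then
    echo "target not ready: installing Java (simulated)"
    touch target-machine/java-installed
fi

# Deploy only after the configuration check passes
cp myapp.war target-machine/
echo "deployed myapp.war"
```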
Automating the testing
Testing is an important part of a software development life cycle. In order to maintain quality software, it is necessary that the software solution goes through various test scenarios. Giving less importance to testing can result in customer dissatisfaction and a delayed product.
Since testing is a manual, time-consuming, and repetitive task, automating the testing process can significantly increase the speed of software delivery. However, automating the testing process is a bit more difficult than automating the build, release, and deployment processes. It usually takes a lot of effort to automate nearly all the test cases used in a project. It is an activity that matures over time.
Hence, when we begin to automate testing, we need to take a few factors into consideration. Test cases that are of great value and easy to automate must be considered first. For example, automate tests where the steps are the same but run every time with different data, tests where a software functionality is checked on various platforms, and tests that involve a software application running on different configurations.
Previously, the world was mostly dominated by desktop applications, and automating the testing of a GUI-based system was quite difficult. This called for scripting languages, where manual mouse and keyboard entries were scripted and executed to test the GUI application. Today, however, the software world is dominated by web and mobile applications, which are easy to test through an automated approach using a test automation tool.
Once the code is built, packaged, and deployed, testing should run automatically to validate the software. Traditionally, the process is to have environments for SIT, UAT, PT, and pre-production. First, the release goes through SIT, which stands for system integration testing. Here, testing is performed on the integrated code to check its functionality as a whole. If it passes, the code is deployed in the next environment, that is, UAT, where it goes through user acceptance testing; lastly, it can be deployed in PT, where it goes through performance testing. In this way, the testing is prioritized.
It is not always possible to automate all testing, but the idea is to automate whatever testing is possible. The method discussed previously requires many environments, as well as a number of automated deployments into those environments. To avoid this, we can follow another method where there is only one environment in which the build is deployed; the basic tests are run automatically, and the long-running tests are then triggered manually.
Use static code analysis
Static code analysis, also commonly called white-box testing, is a form of software testing that looks for the structural qualities of the code. For example, it answers how robust or maintainable the code is. Static code analysis is performed without actually executing programs. It is different from the functional testing, which looks into the functional aspects of software and is dynamic.
Static code analysis is the evaluation of software's inner structures. For example, is there a piece of code used repetitively? Does the code contain lots of commented lines? How complex is the code? Using the metrics defined by a user, an analysis report can be generated that shows the code quality in terms of maintainability. It doesn't question the code functionality.
Some static code analysis tools, such as SonarQube, come with a dashboard that shows various metrics and statistics for each run. Usually, as part of Continuous Integration, static code analysis is triggered every time a build runs. As discussed in the previous sections, static code analysis can also be performed before a developer tries to check in code. Hence, low-quality code can be stopped right at the initial stage.
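To make the idea concrete, here is a toy static check that, without executing anything, counts commented-out lines in a source file, one of the questions raised above. Real analyzers such as SonarQube compute far richer metrics; the file below is a made-up sample:

```shell
set -e
cat > Sample.java <<'EOF'
// old code kept around for reference
// int unused = 1;
public class Sample {
    public static void main(String[] args) {
        System.out.println("hi");
    }
}
EOF

# Inspect the code without running it: commented-out lines and
# overly long lines are simple maintainability signals
COMMENTS=$(grep -c '^[[:space:]]*//' Sample.java)
LONG=$(awk 'length($0) > 80' Sample.java | wc -l)
echo "commented lines: $COMMENTS"
echo "lines over 80 chars: $LONG"
```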
Automate using scripting languages
Scripting languages are one of the most important parts, or shall we say the backbone, of Continuous Integration. Using them, we can reach where no tool reaches. In my own experience, there are many projects where build tools, such as Maven, Ant, and others, don't work. For example, the SAS Enterprise application has a GUI interface used to create packages and perform code promotions from environment to environment. It also offers a few APIs to do the same through the command line. If one has to automate the packaging and code promotion process, one ought to use a scripting language.
One of my favorites, Perl, is an open source scripting language mainly used for text manipulation. The main reasons for its popularity are as follows:
- It comes free and preinstalled with most Linux and Unix OSes
- It's also freely available for Windows
- It is simple and fast to script using Perl
- It works on Windows, Linux, and Unix platforms
Though it was meant to be just a scripting language for processing files, it has seen a wide range of usage in the areas of system administration; build, release, and deployment automation; and much more. One of the other reasons for its popularity is its impressive collection of third-party modules.
I would like to expand on the advantages of the multiple platform capabilities of Perl. There are situations where you will have Jenkins servers on a Windows machine, and the destination machines (where the code needs to be deployed) will be Linux machines. This is where Perl helps; a single script written on the Jenkins Master will run on both the Jenkins Master and the Jenkins Slaves.
However, there are various other popular scripting languages that you can use, such as Ruby, Python, and Shell to name a few.
Test in a production-like environment
Ideally, tests such as SIT, UAT, and PT, to name a few, are performed in an environment that is different from production. Hence, there is every possibility that code that has passed these quality checks may still fail in production. Therefore, it's advisable to perform end-to-end testing of the code in a production-like environment, commonly referred to as a pre-production environment. In this way, we can be best assured that the code won't fail in production.
However, there is a challenge to this. For example, consider an application that runs on various web browsers, both on mobiles and on PCs. To test such an application effectively, we would need to simulate the entire production environment used by the end users. This calls for multiple build configurations and complex deployments, which are manual. Continuous Integration systems need to take care of this: at the click of a button, various environments should be created, each reflecting an environment used by the customers, followed by deployment and testing.
If something fails, there should be an ability to see when, who, and what caused the failure. This is called backward traceability. How do we achieve it? Let's see:
- By introducing automated notifications after each build. The moment a build completes, the Continuous Integration tool automatically sends the development team its report card.
- As seen in the Scrum methodology, the software is developed in pieces called backlogs. Whenever developers check in code, they apply a label to the checked-in code. This label can be the backlog number. Hence, when a build or deployment fails, it can be traced back to the code that caused it using the backlog number.
- Labeling each build also helps in tracking back the failure.
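Assuming git, labeling a check-in with a backlog number and tagging the commit that produced a build might look like this (the backlog and build IDs are illustrative):

```shell
set -e
mkdir trace-demo && cd trace-demo
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
echo "code" > app.txt
git add app.txt
git commit -q -m "BACKLOG-101: implement feature"   # backlog label

# The CI tool tags the exact commit that produced build #57
git tag -a build-57 -m "Build 57, BACKLOG-101"
git tag -l
git log --oneline --decorate -1
```

If build 57 later fails in an environment, the tag leads straight back to the commit, and the commit message leads back to the backlog item.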
Using a defect tracking tool
Defect tracking tools are a means to track and manage bugs, issues, tasks, and so on. Earlier, projects mostly used Excel sheets to track their defects. However, as projects grew in terms of the number of test cycles and the number of developers, it became absolutely important to use a defect tracking tool. Some of the popular defect tracking tools are Atlassian JIRA and Bugzilla.
The quality analysis market has seen the emergence of various bug tracking systems or defect management tools over the years.
A defect tracking tool offers the following features:
- It allows you to raise or create defects and tasks, with various fields to describe the defect or task.
- It allows you to assign the defect to the concerned team or an individual responsible for the change.
- It allows a defect or task to progress through its life cycle stages (the workflow).
- It provides you with the feature to comment on a defect or a task, watch the progress of the defect, and so on.
- It provides metrics. For example, how many tickets were raised in a month? How much time was spent on resolving issues? All these metrics are of significant importance to the business.
- It allows you to attach a defect to a particular release or build for better traceability.
The previously mentioned features are a must for a bug tracking system. There may be many other features that a defect tracking tool may offer, such as voting, estimated time to resolve, and so on.
In this article, we learned how various DevOps tools go hand in hand to achieve Continuous Integration and, of course, help projects go agile. We can fairly conclude that Continuous Integration is an engineering practice where each chunk of code is immediately built and unit-tested locally, then integrated and again built and tested on the Integration branch.