Software delivery cycle has been getting shorter and shorter, while on the other hand, application size has been getting bigger and bigger. Software developers and IT operators are under the pressure to find a solution to this. There is a new role, called DevOps, which is dedicated to support software building and delivery.
This chapter covers the following topics:
- How has software delivery methodology changed?
- What is microservice, and why do people adopt this architecture?
- How does DevOps support to build and deliver the application to the user?
Building a computer application and delivering it to the customer has been discussed and has evolved over time. It is related to Software Development Life Cycle (SDLC); there are several types of processes, methodologies, and histories. In this section, we will describe its evolution.
Back in the 1990s, software delivery was adopted by a physical method, such as a floppy disk or a CD-ROM. Therefore, SDLC was a very long-term schedule, because it was not easy to (re)deliver to the customer.
In this case, we can't go back to the previous phase. For example, after starting or finishing the Implementation phase, it is not acceptable to go back to the Design phase (to find a technical expandability issue, for example). This is because it will impact the overall schedule and cost. The project tends to proceed and complete to release, then it goes to the next release cycle including a new design.
It perfectly matches the physical software delivery because it needs to coordinate with logistics management that press and deliver the floppy/CD-ROM to the user. Waterfall model and physical delivery used to take a year to several years.
A few years later, the internet became widely accepted, and then software delivery method also changed from physical to electrical, such as online download. Therefore, many software companies (also known as dot-com companies) tried to figure out how to shorten the SDLC process in order to deliver the software that can beat the competitors.
Many developers started to adopt new methodologies such as incremental, iterative, or agile models and then deliver to the customer faster. Even if new bugs are found, it is now easier to update and deliver to the customer as a patch by electrical delivery. Microsoft Windows update was also introduced since Windows 98.
In this case, the software developer writes only a small logic or module, instead of the entire application in one shot. Then, it delivers to the QA, and then the developer continues to add a new module and finally delivers it to the QA again.
When the desired modules or functions are ready, it will be released as shown in the following diagram:
This model makes the SDLC cycle and the software delivery faster and also easy to be adjust during the process, because the cycle is from a few weeks to a few months which is small enough to make a quick adjustment.
Although this model is currently favoured by the majority, at that moment, application software delivery meant software binary, such as EXE program which is designed to be installed and run on the customer's PC. On the other hand, the infrastructure (such as server and network) is very static and set up beforehand. Therefore, SDLC doesn't tend to include these infrastructures in the scope yet.
A few years later, smartphones (such as iPhone) and wireless technology (such as Wi-Fi and 4G network) became widely accepted, and software application also changed from binary to the online service. The web browser is the interface of the application software, which need not be installed anymore. On the other hand, infrastructure becomes dynamic, since the application requirement keeps changing and the capacity needs to grow as well.
Virtualization technology and Software Defined Network(SDN) make the server machine dynamic. Now, cloud services such as Amazon Web Services (AWS) and Google Cloud Platform (GCP) can be easy to create and manage dynamic infrastructures.
Now, infrastructure is one of the important components and being within a scope of Software Development Delivery Cycle, because the application is installed and runs on the server side, rather than a client PC. Therefore, software and service delivery cycle takes between a few days to a few weeks.
As discussed previously, the surrounding software delivery environment keeps changing; however, the delivery cycle is getting shorter and shorter. In order to achieve rapid delivery with higher quality, the developer and QA start to adopt some automation technologies. One of the popular automation technologies is Continuous Integration (CI). CI contains some combination of tools, such as Version Control Systems (VCS), build server, and testing automation tools.
VCS helps the developer to maintain program source code onto the central server. It prevents overwriting or conflict with other developers' code and also preserves the history. Therefore, it makes it easier to keep the source code consistent and deliver to the next cycle.
The same as VCS, there is a centralized build servers that connects VCS to retrieve the source code periodically or automatically when the developer updates the code to VCS, and then trigger a new build. If the build fails, it notifies the developer in a timely manner. Therefore, it helps the developer when someone commits the broken code into the VCS.
Testing automation tools are also integrated with build server that invoke the unit test program after the build succeeds, then notifies the result to the developer and QA. It helps to identify when somebody writes a buggy code and stores to VCS.
The entire flow of CI is as shown in the following diagram:
CI helps both the developer and the QA not only to increase the quality, but also to shorten archiving an application or module package cycle. In an age of electrical delivery to the customer, CI is more than enough. However, because delivery to the customer means to deploy to the server.
CI plus deployment automation is the ideal process for the server application to provide a service to customers. However, there are some technical challenges that need to be resolved. How to deliver a software to the server? How to gracefully shutdown the existing application? How to replace and rollback the application? How to upgrade or replace if the system library also needs to change? How to modify the user and group settings in OS if needed? and so on.
Because the infrastructure includes the server and network, it all depends on an environment such as Dev/QA/staging/production. Each environment has different server configuration and IP address.
The configuration management tool helps to configure an OS including the user, group, and system libraries, and also manages multiple servers that keep consistent with the desired state or configuration if we replace the server.
It is not a scripting language, because scripting language performs to execute a command based on the script line by line. If we execute the script twice, it may cause some error, for example, attempt to create the same user twice. On the other hand, configuration management looks at the state, so if user is created already, the configuration management tool doesn't do anything. But if we delete a user accidentally or intentionally, the configuration management tool will create the user again.
It also supports to deploy or install your application to the server. Because if you tell the configuration management tool to download your application, then set it up and run the application, it tries to do so.
In addition, if you tell the configuration management tool to shut down your application, then download and replace to a new package if available, and then restart the application, it keeps up to date with the latest version.
Of course, some of the users want to update the application only when it is required, such as blue-green deployments. The configuration management tool allows you to trigger to execute manually too.
Blue-green deployments is a technique that prepares the two sets of application stack, then only one environment (example: blue) is servicing to the production. Then when you need to deploy a new version of application, deploy to the other side (example: green) then perform the final test. Then if it works fine, change the load balancer or router setting to switch the network flow from blue to green. Then green becomes a production, while blue becomes dormant and waiting for the next version deployment.
The configuration management tool supports not only OS or Virtual Machine, but also cloud infrastructure. If you need to create and configure a network, storage, and Virtual Machine on the cloud, it requires some of the cloud operations.
Configuration management has some advantage against maintaining an operation manual Standard Operation Procedure (SOP). For example, maintaining a configuration file using VCS such as Git, you can trace the history of how the environment setting has changed.
It is also easy to duplicate the environment. For example, you need an additional environment on cloud. If you follow the traditional approach, (that is, to read the SOP document to operate the cloud), it always has a potential human error and operation error. On the other hand, we can execute the configuration management tool that creates an environment on cloud quickly and automatically.
The orchestration tool is also categorized as one of the configuration management tools. However its more intelligent and dynamic when configuring and allocating the cloud resources. For example, orchestration tool manages several server resources and networks, and then when the administrator wants to increase the application instances, orchestration tool can determine an available server and then deploy and configure the application and network automatically.
Although orchestration tool is beyond the SDLC, it helps Continuous Delivery when it needs to scale the application and refactor the infrastructure resource.
Overall, the SDLC has been evolved to achieve rapid delivery by several processes, tools, and methodologies. Eventually, software (service) delivery takes anywhere from a few hours to a day. While at the same time, software architecture and design has also evolved to achieve large and rich applications.
When the application size is getting bigger, developers tried to divide by several modules. Each module should be independent and reusable, and should be maintained by different developer teams. Then, when we start to implement an application, the application just initializes and uses these modules to build a larger application efficiently.
The following example shows what kind of library Nginx (https://www.nginx.com) uses on CentOS 7. It indicates that Nginx uses
POSIX thread library,
PCRE the regular expression library,
zlib the compression library,
GNU C library, and so on. So, Nginx didn't reinvent to implement SSL encryption, regular expression, and so on:
$ /usr/bin/ldd /usr/sbin/nginx linux-vdso.so.1 => (0x00007ffd96d79000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fd96d61c000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd96d400000) libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fd96d1c8000) libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fd96cf67000) libssl.so.10 => /lib64/libssl.so.10 (0x00007fd96ccf9000) libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fd96c90e000) libz.so.1 => /lib64/libz.so.1 (0x00007fd96c6f8000) libprofiler.so.0 => /lib64/libprofiler.so.0 (0x00007fd96c4e4000) libc.so.6 => /lib64/libc.so.6 (0x00007fd96c122000) ...
Package management tool allows you to register your module or package to the centralized or private repository, and also allows to download the necessary packages. The following screenshot shows Maven repository for AWS SDK:
When you add some particular dependencies to your application, Maven downloads the necessary packages. The following screenshot is the result you get when you add
aws-java-sdk dependency to your application:
Modular programming helps you to increase software development speed and reduce it to reinvent the wheel, so it is the most popular way to develop a software application now.
However, applications need more and more combination of modules, packages, and frameworks, as and when we keep adding a feature and logic. This makes the application more complex and larger, especially server-side applications. This is because it usually needs to connect to a database such as RDBMS, as well as an authentication server such as LDAP, and then return the result to the user by HTML with the appropriate design.
Therefore, developers have adopted some software design patterns in order to develop an application with a bunch of modules within an application.
One of the popular application design patterns is Model View and Controller (MVC). It defines three layers. View layer is in charge of user interface (UI) input output (I/O). Model layer is in charge of data query and persistency such as load and store to database. Then, the Controller layer is in charge of business logic that is halfway between View and Model:
There are some frameworks that help developers to make MVC easier, such as Struts (https://struts.apache.org/), SpringMVC (https://projects.spring.io/spring-framework/), Ruby on Rails (http://rubyonrails.org/), and Django (https://www.djangoproject.com/). MVC is one of the successful software design pattern that is used for the foundation of modern web applications and services.
MVC defines a border line between every layer which allows many developers to jointly develop the same application. However, it causes side effects. That is, the size of the source code within the application keeps getting bigger. This is because database code (Model), presentation code (View), and business logic (Controller) are all within the same VCS repository. It eventually makes impact on the software development cycle, which gets slower again! It is called monolithic, which contains a lot of code that builds a giant exe/war program.
There is no clear measurement of monolithic application definition, but it used to have more than 50 modules or packages, more than 50 database tables, and then it needs more than 30 minutes to build. When it needs to add or modify one module, it affects a lot of code, therefore developers try to minimize the application code change. This hesitation causes worse effects such that sometimes the application even dies because no one wants to maintain the code anymore.
Therefore, the developer starts to divide monolithic applications in to small pieces of application and connect via the network.
Actually, dividing an application in to small pieces and connecting via the network has been attempted back in the 1990s. Sun Microsystems introduced Sun RPC (Remote Procedure Call). It allows you to use the module remotely. One of popular Sun RPC implementers is Network File System (NFS). CPU OS versions are independent across NFS client and NFS server, because they are based on Sun RPC.
The programming language itself also supports RPC-style functionality. UNIX and C language have the
rpcgen tool. It helps the developer to generate a stub code, which is in charge of network communication code, so the developer can use the C function style and be relieved from difficult network layer programming.
Java has JavaRemote Method Invocation (RMI) which is similar to Sun RPC, but for Java, RMI compiler (rmic) generates the stub code that connects remote Java processes to invoke the method and get a result back. The following diagram shows Java RMI procedure flow:
Objective C also has distributed object and .NET has remoting, so most of the modern programming languages have the capability of Remote Procedure Call out of the box.
These Remote Procedure Call designs have the benefit to divide an application into multiple processes (programs). Individual programs can have separate source code repositories. It works well although machine resource (CPU, memory) was limited during 1990s and 2000s.
However, it was designed and intended to use the same programming language and also designed for client/server model architecture, instead of a distributed architecture. In addition, there was less security consideration; therefore, it is not recommended to use over a public network.
In the 2000s, there was an initiative web services that used SOAP (HTTP/SSL) as data transport, using XML as data presentation and service definition Web Services Description Language (WSDL), then used Universal Description, Discovery, and Integration (UDDI) as the service registry to look up a web services application. However, as the machine resources were not rich and due to the complexity of web services programming and maintainability, it is not widely accepted by developers.
Go to 2010s, now machine power and even the smartphone have plenty of CPU resource, in addition to network bandwidth of a few hundred Mbps everywhere. So, the developer starts to utilize these resources to make application code and system structure as easy as possible making the software development cycle quicker.
Based on hardware resources, it is a natural decision to use HTTP/SSL as RPC transport, but from having experience with web services difficulty, the developer makes it simple as follows:
It is called RESTful design, and that has been widely accepted by many developers and become de facto standard of distributed applications. RESTful application allows any programming language as it is HTTP-based, so the RESTful server is Java and client Python is very natural.
It brings freedom and opportunities to the developer that its easy to perform code refactoring, upgrade a library and even switch to another programming language. It also encourages the developer to build a distributed modular design by multiple RESTful applications, which is called microservices.
If you have multiple RESTful applications, there is a concern on how to manage multiple source code on VCS and how to deploy multiple RESTful servers. However, Continuous Integration, and Continuous Delivery automation makes a lower bar to build and deploy a multiple RESTful server application easier.
Therefore, microservices design is getting popular for web application developers.
Although the name is micro, it is actually heavy enough compared to the applications from 1990s or 2000s. It uses full stack of HTTP/SSL server and contains entire MVC layers. The microservices design should care about the following topics:
- Stateless: This doesn't store user session to the system, which helps to scale out easier.
- No shared datastore: The microservice should own the datastore such as database. It shouldn't share with the other application. It helps to encapsulate the backend database that is easy to refactor and update the database scheme within a single microservice.
- Versioning and compatibility: The microservice may change and update the API but should define a version and it should have backward compatibility. This helps to decouple between other microservices and applications.
- Integrate CI/CD: The microservice should adopt CI and CD process to eliminate management effort.
There are some frameworks that can help to build the microservice application such as Spring Boot (https://projects.spring.io/spring-boot/) and Flask (http://flask.pocoo.org). However, there are a lot of HTTP-based frameworks, so the developer can feel free to try and choose any preferred framework or even programming language. This is the beauty of the microservice design.
The following diagram is a comparison between monolithic application design and microservices design. It indicates that microservice (also MVC) design is the same as monolithic, which contains interface layer, business logic layer, model layer, and datastore.
But the difference is that the application (service) is constructed by multiple microservices and that different applications can share the same microservice underneath:
The developer can add the necessary microservice and modify an existing microservice with the rapid software delivery method that won't affect an existing application (service) anymore.
It is a breakthrough to an entire software development environment and methodology that is getting widely accepted by many developers now.
Although Continuous Integration and Continuous Delivery automation process helps to develop and deploy multiple microservices, the number of resources and complexity, such as Virtual Machine, OS, library, and disk volume and network can't compare with monolithic applications.
Therefore, there are some tools and roles that can support these large automation environments on the cloud.
As discussed previously, automation is the best practice to achieve rapid software delivery and solves the complexity to manage many microservices. However, automation tools are not an ordinary IT/infrastructure applications such as Active Directory, BIND (DNS), and Sendmail (MTA). In order to achieve automation, there is an engineer who should have both developer skill set to write a code, especially scripting language, and infrastructure operator skill set such as VM, network, and storage.
DevOps is a clipped compound of development and operations that can have an ability to make automation processes such as Continuous Integration, Infrastructure as code, and Continuous Delivery. DevOps uses some DevOps tools to make these automation processes.
One of the popular VCS tools is Git (https://git-scm.com). The developer uses Git to check-in and check-out the code all the time. There are some hosting Git service: GitHub (https://github.com) and Bitbucket (https://bitbucket.org). It allows you to create and save your Git repositories and collaborate with other users. The following screenshot is a sample pull request on GitHub:
The build server has a lot of variation. Jenkins (https://jenkins.io) is one of well-established applications, which is the same as TeamCity (https://www.jetbrains.com/teamcity/). In addition to build server, you also have hosted services, the Software as a Service (SaaS) such as Codeship (https://codeship.com) and Travis CI (https://travis-ci.org). SaaS has the strength to integrate with other SaaS tools.
The following screenshot is a sample build using Codeship; it checks out the code from GitHub and invokes Maven to build (
mvn compile) and unit testing (
There are a variety of configuration management tools such as Puppet (https://puppet.com), Chef (https://www.chef.io), and Ansible (https://www.ansible.com), which are the most popular in configuration management.
AWS OpsWorks (https://aws.amazon.com/opsworks/) provides a managed Chef platform. The following screenshot is a Chef recipe (configuration) of installation of Amazon CloudWatch Log agent using AWS OpsWorks. It automates to install CloudWatch Log agent when launching an EC2 instance:
AWS CloudFormation (https://aws.amazon.com/cloudformation/) helps to achieve infrastructure as code. It supports the automation for AWS operation, for example, to perform the following functions:
- Creating a VPC.
- Creating a subnet on VPC.
- Creating an internet gateway on VPC.
- Creating a routing table to associate a subnet to the internet gateway.
- Creating a security group.
- Creating a VM instance.
- Associating a security group to a VM instance.
The configuration of CloudFormation is written by JSON as shown in the following screenshot:
It supports parameterize, so it is easy to create an additional environment with different parameters (for example, VPC and CIDR) using a JSON file with the same configuration. In addition, it supports the update operation. So, if there is a need to change a part of the infrastructure, there's no need to recreate. CloudFormation can identify a delta of configuration and perform only the necessary infrastructure operations on behalf of you.
AWS CodeDeploy (https://aws.amazon.com/codedeploy/) is also a useful automation tool. But focus on software deployment. It allows the user to define. The following are some actions onto the YAML file:
- Where to download and install.
- How to stop the application.
- How to install the application.
- After installation, how to start and configure an application.
The following screenshot is an example of AWS CodeDeploy configuration file
AmazonCloudWatch is the built-in monitoring tool on AWS. No agent installation is needed; it automatically gathers some metrics from AWS instances and visualizes for DevOps. It also supports to set an alert based on the criteria that you set. The following screenshot is an Amazon CloudWatch metrics for EC2 instance:
Amazon CloudWatch also supports to gather an application log. It requires installing an agent on EC2 instance; however, centralized log management is useful when you need to start managing multiple microservice instances.
ELK is a popular combination of stack that stands for Elasticsearch (https://www.elastic.co/products/elasticsearch), Logstash (https://www.elastic.co/products/logstash), and Kibana (https://www.elastic.co/products/kibana). Logstash helps to aggregate the application log and transform to JSON format and then send to Elasticsearch.
Grafana (https://grafana.com) is another popular visualization tool. It used to be connected with time series database such as Graphite (https://graphiteapp.org) or InfluxDB (https://www.influxdata.com). Time series database is designed to store the data, which is flat and de-normalized numeric data such as CPU usage and network traffic. Unlike RDBMS, time series database has some optimization to save the data space and faster query for numeric data history. Most of DevOps monitoring tools are using time series database in the backend.
The following example is a Grafana that shows Message Queue Server statistics:
Once you start to use several DevOps tools as we saw earlier, you need to go back and forth to visit several consoles to check whether CI and CD pipelines work properly or not. For example, consider the following points:
- Merge the source code to GitHub.
- Trigger the new build on Jenkins.
- Trigger AWS CodeDeploy to deploy the new version of the application.
These events need to be tracked by time sequence, and if there are some troubles, DevOps needs to discuss it with the developer and QA to handle the cases. However, there are some over-communication needs, because DevOps needs to capture the event one by one and then explain, probably via e-mail. It is not efficient and in the meantime the issue is still going on.
There are some communication tools that help to integrate these DevOps tools and anyone can join to look at the event and comment to each other. Slack (https://slack.com) and HipChat (https://www.hipchat.com) are the most popular communication tools.
CI CD and automation work can be achieved easily when used with cloud technology. Especially public cloud API helps DevOps to come up with many CI CD tools. Public cloud such as Amazon Web Services (https://aws.amazon.com) and Google Cloud Platform (https://cloud.google.com) provides some APIs to DevOps to control the cloud infrastructure. DevOps can be a relief from capacity and resource limitation, just pay as you go whenever the resource is needed.
Public cloud will keep growing the same way as software development cycle and architecture design; these are best friends and the important key to achieve your application/service to success.
The following screenshot is a web console for Amazon Web Services:
Google Cloud Platform also has a web console as shown here:
Both cloud services have a free trial period that DevOps engineer can use to try and understand the benefits of cloud infrastructure.
In this chapter, we have discussed the history of software development methodology, programming evolution and DevOps tools. These methodologies and tools support quicker software delivery cycle. Microservice design also helps continuous software update. However, microservice makes complexity of environment management.
The next chapter will describe the Docker container technology, which helps to compose microservice application and manage it in a more efficient and automated way.