In the traditional approach to application development and maintenance, multiple stakeholders, departments, groups, and vendors are involved in the overall software development life cycle (SDLC). Most of us are familiar with the stages of application life cycle management: a business analyst gathers the business requirements; the development team (possibly outsourced) builds the application; and QA teams (also possibly outsourced) test it for functionality and fitness for purpose. Where applicable, appropriate groups also perform performance and stress testing with relevant tools. The organization's IT teams then manage the production deployment process, with a checklist and approvals, after which maintenance teams handle monitoring and support. Each stage of the life cycle, from functionality development to usability and maintenance, is managed in silos, by independent teams, departments, processes, and tools. This fragmentation of techniques, frameworks, processes, people, and tools affects the final product in terms of features, cost, schedule, quality, and performance, and adds administrative overheads such as interfacing and integration between vendors. This method also often overlooks maintenance and support costs and skill needs. Yet, from both an application life cycle and a business point of view, maintenance and support activities are key, and should be assessed, evaluated, and estimated well in advance.
In this lesson, we will cover the following topics:
Introduction to DevOps
Business application of DevOps
Business drivers/market trends
DevOps strategy
Benefits of DevOps
Technological innovations have challenged the traditional method of IT management in almost every segment. The technological advances and changes are profound, rapid, and often intermingled, encompassing multiple fields such as agile methodology, DevOps, big data, cloud, and so on. A comprehensive and holistic approach to these changes can be rewarding and help organizations derive maximum value from them. Many institutions have already embarked on this journey, adopting these technologies.
Software development before DevOps faced several challenges: resistance to changing systems; deployments fraught with risk; lack of consistency across environments (the "it works on my machine" syndrome); and silos, in which teams toss problems over the wall to one another, resulting in duplicated effort and skill sets, and in-fighting. DevOps emerged as a popular choice to mitigate these issues and bridge this gap.
DevOps (Development plus Operations) has recently taken center stage in the SDLC. DevOps offers process frameworks, augmented with open source tools, that integrate all the phases of the application life cycle and ensure they function as a cohesive unit. It helps to align and automate processes across the development, testing, deployment, and support phases. It includes best practices such as version-controlled code repositories, build automation, and continuous deployment.
Adopting DevOps for systems and projects, including big data systems, is a cultural shift compared to traditional development cycles. The purpose of this book is to put forth the concepts and an adoption strategy for an organization, covering the technology areas of DevOps, big data, cloud, data science, in-memory technology, and others. Adopting and adhering to DevOps practices can be rewarding for an organization and help it improve its performance and efficiency.
The acceptance, popularity, and versatility of open source tools for each segment of IT functionality are increasing day by day across the world. In fact, many new tool variants have been introduced to the market for each segment. Open source DevOps tools are major contributors to the success of DevOps adoption by institutions, as discussed in detail in the coming sections.
As we can see, across industries DevOps adoption has seen steady growth year on year:
DevOps penetration in enterprises shows a healthy trend, as per the following figure:
DevOps applies to many scenarios, each with its own benefits, as listed here:
Automation of the development cycle: Business needs are met with minimal manual intervention. A developer can run a build from the code repository with a choice of open source tools, and the QA team can create a QA system as a replica and deploy it to production seamlessly and quickly.
Single version of the truth (source code management): There are multiple versions of the code, and it is difficult to ascertain which one is appropriate for the purpose. We lack a single version of the truth. Code review feedback is exchanged by email and not recorded, leading to confusion and rework.
Consistent configuration management: We develop, test, and build source code on different systems. Validating the platforms and the versions of dependencies is manual and error-prone. It is challenging to ensure that all systems speak the same language and have the same versions of tools, compilers, and so on. Our code works fine on build systems but fails when moved to production systems, jeopardizing business deliverables and incurring cost overheads to react.
Product readiness for markets: We have a process to develop, test, and build code within defined timelines. However, there are many manual checks and validations in the process, and the integrations between different groups make our commitments and delivery dates unpredictable. We wish to know periodically how close our product is to delivery, and its quality, so that we can plan in advance rather than react.
Automation of manual processes: We follow manual processes, which are often error-prone, and wish to improve efficiency by automating them wherever applicable. Automating the testing cycle, testing incrementally, and integrating tests with the build cycle can improve product quality and shorten the release cycle, as can automating infrastructure services such as creating, starting, stopping, restarting, deleting, and terminating virtual or bare-metal machines.
Containers: Portability of code is the primary challenge. The code works in development and QA environments, but moving it to production systems causes multiple problems, such as code failing to compile due to dependency issues, broken builds, and so on. Building platform-agnostic code is a challenge, and maintaining multiple platform versions of development and QA environments is a huge overhead. Portable container code would alleviate these kinds of issues.
On-premise challenges: We have many on-premise systems, with multiple challenges ranging from capacity planning to turnaround time. Capital and operational expenses (capex and opex) are unpredictable. Cloud migration offers many choices and vendors, so an efficient adoption method is needed to ensure results.
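The consistent configuration management scenario above comes down to verifying that every environment runs the same tool and dependency versions. The following is a minimal Python sketch, assuming the versions have already been collected from each machine into a dictionary; the environment names, tools, and version numbers are hypothetical. It compares each environment against a chosen reference environment and reports the mismatches:

```python
from typing import Dict, List, Tuple

# Hypothetical inventory: environment name -> {tool name: installed version}.
Inventory = Dict[str, Dict[str, str]]


def find_drift(inventory: Inventory, reference: str) -> List[Tuple[str, str, str, str]]:
    """Return (environment, tool, expected, actual) for every tool of the
    reference environment whose version differs elsewhere.
    A tool absent from an environment is reported as "<missing>"."""
    expected = inventory[reference]
    drift: List[Tuple[str, str, str, str]] = []
    for env, tools in sorted(inventory.items()):
        if env == reference:
            continue  # the reference cannot drift from itself
        for tool, version in sorted(expected.items()):
            actual = tools.get(tool, "<missing>")
            if actual != version:
                drift.append((env, tool, version, actual))
    return drift


if __name__ == "__main__":
    inventory = {
        "build": {"java": "11.0.2", "maven": "3.6.0"},
        "qa": {"java": "11.0.2", "maven": "3.6.0"},
        "production": {"java": "1.8.0", "maven": "3.6.0"},
    }
    for env, tool, want, got in find_drift(inventory, reference="build"):
        # Prints: production: java is 1.8.0, expected 11.0.2
        print(f"{env}: {tool} is {got}, expected {want}")
```

Only tools present in the reference environment are checked; a fuller version would also flag extra tools and obtain the inventory from a configuration management tool rather than a hard-coded dictionary.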
The factors contributing to the wide-scale popularity and adoption of DevOps for big data systems are as follows.
Data is the new currency. Yes, you read that right: data is as valuable an asset as oil and gold. In the past decade, many companies have realized the potential of data as an invaluable asset for their growth and performance.
Let's understand how data is valuable. For any organization, data can take many forms, such as customer data, product data, employee data, and so on. Not having the right data on your employees, customers, or products could be devastating. It is basic knowledge and common sense that correct data is key to running a business effectively. There is hardly any business today that doesn't depend on data-driven decisions; CEOs rely more than ever on data for business decisions, such as which product is more successful in the market, how much demand exists in each area, which price is more competitive, and so on.
Data can be generated from multiple sources: internal, external, and even social media. Internal data is generated by internal systems and operations; in a bank, for example, this includes adding new customers or recording customer transactions through multiple channels such as ATMs, online payments, purchases, and so on. External sources could include gold rates and foreign exchange rates procured from the Reserve Bank of India (RBI). These days, social media data is widely used for marketing and for customer feedback on products. Harnessing data from all these avenues and using it intelligently is key to business success.
Going a step further, a few companies even monetize data, for example, Healthcare IQ, Owens & Minor, State Street Global Corporation, Ad Juggler, comScore, Verisk Analytics, Nielsen, and LexisNexis. These organizations buy raw data such as web analytics on online product sales, or online search records for each brand, reprocess the data into an organized format, and sell it to research analysts or organizations looking for competitor intelligence data to reposition their products in markets.
Let's analyze the factors fueling the growth of data and business. Fundamental changes in market and customer behavior have had a significant impact on the data explosion. Some of the key drivers of change are:
Customer preference: Today, customers have many means of interacting with businesses. For example, a bank provides multiple channels such as ATM withdrawals, online banking, mobile banking, card payments, in-branch banking, and so on. The same is true for purchases, which can be made in a shop, online, on mobile, and so on. Organizations have to maintain data for all these channels for their business operations, so multiple channels increase the volume of data to be managed.
Social media: Data is flooding in from social media such as Facebook, LinkedIn, and Twitter. On the one hand, these are sites for social interaction between individuals; on the other, companies rely on them to promote their products. The terabytes and petabytes of data posted there are, in turn, used by many organizations for data mining. This is contributing to the huge data explosion.
Regulations: Companies are required to maintain data in proper formats for a stipulated time, as required by regulatory bodies. For example, to combat money laundering, each organization dealing with finance is required to have clear customer records and credentials to share with regulatory authorities over extended periods of time, such as 10 to 15 years.
Digital world: As we move towards a paperless digital world, we keep generating more digital data, from e-books to ERP applications that automate many tasks and avoid paperwork. These innovations are generating much of the growth in digital data as well.
The next generation will be more data intensive, with the Internet of Things and data science at the forefront, driving business and customer priorities.
Acceptance of cloud platforms as the de facto service line has brought many changes to procuring and managing infrastructure. Provisioning hardware and other commodity work on the cloud improves efficiency and allows IT departments to shift their focus away from tasks such as patching operating systems. DevOps combined with cloud adoption is among the most widely implemented options. With cloud penetration, adding infrastructure or servers is just a click away. This, along with credible open source tools, has paved the way for DevOps.
In a fraction of the time, build, QA, and pre-production machines can be added as exact replicas, with the required configurations, using open source tools.
Big data is the term used to describe data with multiple dimensions, such as large volume, velocity, and variety, that delivers value to the business. Data comes from multiple sources and can be structured, semi-structured, or unstructured. It can arrive in batch mode, in real time from machine sensors or online server logs, or as continuous real-time streams. Volumes of data can reach terabytes or petabytes, typically stored on Hadoop-based storage and other open source platforms. Big data analytics extends to social media analytics, such as market sentiment analysis based on data from Twitter, LinkedIn, Facebook, and so on; this data is useful for understanding customer sentiment and for supporting marketing and customer service activities.
Data science as a field has many dimensions and applications. In science, we study features and behavior patterns to derive meaningful insights, which result in reusable, established formulas. In a similar way, data can be investigated through engineering and statistical methods to understand its behavior patterns and derive meaningful insights. Hence, data science can be viewed as data + science, or the science of data. A machine learning project combines data extraction; data preparation through extract, transform, load (ETL) or extract, load, transform (ELT); and prediction algorithms that derive meaningful patterns from data to generate business value. These projects have a development life cycle in line with that of a project or product development, so aligning them with DevOps methodologies can benefit how the program evolves.
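The difference between the two data preparation styles mentioned above is where the transformation happens. In this toy Python sketch, a plain dictionary stands in for the target data store, and the sales records and customer names are hypothetical: ETL loads only the transformed totals, whereas ELT loads the raw rows first and transforms them inside the target, keeping the raw data available for later reprocessing.

```python
from typing import Any, Dict, List

Record = Dict[str, str]


def extract() -> List[Record]:
    # Hypothetical raw sales rows; amounts arrive as text and may be malformed.
    return [
        {"customer": "alice", "amount": "120.50"},
        {"customer": "bob", "amount": "n/a"},
        {"customer": "alice", "amount": "79.50"},
    ]


def transform(rows: List[Record]) -> Dict[str, float]:
    # Drop rows whose amount is not numeric, then total the spend per customer.
    totals: Dict[str, float] = {}
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + amount
    return totals


def etl(warehouse: Dict[str, Any]) -> None:
    # ETL: transform inside the pipeline and load only the cleaned result.
    warehouse["customer_totals"] = transform(extract())


def elt(warehouse: Dict[str, Any]) -> None:
    # ELT: load the raw rows first, then transform them inside the target store.
    warehouse["raw_sales"] = extract()
    warehouse["customer_totals"] = transform(warehouse["raw_sales"])


if __name__ == "__main__":
    etl_store: Dict[str, Any] = {}
    elt_store: Dict[str, Any] = {}
    etl(etl_store)
    elt(elt_store)
    print(sorted(etl_store))  # ['customer_totals']
    print(sorted(elt_store))  # ['customer_totals', 'raw_sales']
```

Both paths produce the same totals; ELT trades extra storage in the target for the ability to rerun a changed transformation without extracting again.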
Traditional software architecture was based on disks as the primary data storage: data was moved from disk to main memory and the CPU to perform the aggregations required by business logic. Moving large volumes of data back and forth between disk and memory caused significant I/O overhead.
In-memory technology is based on hardware and software innovations that keep the complete business application data in main memory, so computations are very fast. Many underlying hardware and software advancements have contributed to enabling in-memory computing.
The software advancements include the following:
Partitioning of data
No aggregate tables
Insert only the delta
Data compression
Row plus column storage
The hardware advancements include the following:
Multi-core architecture allows massive parallel scaling
Multifold compression
Main memory has scalable capacity
Fast prefetch unlimited size
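Three of the software advancements listed above (row plus column storage, data compression, and computing results without aggregate tables) can be illustrated with a toy Python sketch. This is an illustration under simplifying assumptions, not how any particular in-memory database implements them, and the sample rows are hypothetical: rows are pivoted into columns, a low-cardinality column is dictionary-encoded into small integer codes, and totals are computed directly from the encoded column when needed.

```python
from typing import Any, Dict, List, Tuple


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # Pivot a row store into a column store: one list per attribute.
    columns: Dict[str, List[Any]] = {key: [] for key in rows[0]}
    for row in rows:
        for key in columns:
            columns[key].append(row[key])
    return columns


def dictionary_encode(values: List[Any]) -> Tuple[List[Any], List[int]]:
    # Compress a column by replacing each value with an integer code
    # that indexes into a sorted dictionary of the distinct values.
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def sum_by(dictionary: List[Any], codes: List[int], measures: List[float]) -> Dict[Any, float]:
    # Aggregate on the fly from the encoded column; no aggregate table is stored.
    totals = [0] * len(dictionary)
    for code, measure in zip(codes, measures):
        totals[code] += measure
    return dict(zip(dictionary, totals))


if __name__ == "__main__":
    rows = [
        {"region": "north", "amount": 10},
        {"region": "south", "amount": 25},
        {"region": "north", "amount": 5},
        {"region": "north", "amount": 40},
    ]
    columns = to_columns(rows)
    dictionary, codes = dictionary_encode(columns["region"])
    print(dictionary, codes)  # ['north', 'south'] [0, 1, 0, 0]
    print(sum_by(dictionary, codes, columns["amount"]))  # {'north': 55, 'south': 25}
```

Because an aggregate reads only the columns it needs, and repeated values shrink to small integers, more data fits in main memory and less of it has to be scanned.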
A good DevOps strategy, as discussed in this book, helps readers gain a deep and broad understanding of DevOps and its application to multiple technologies and interfaces. For an organization, it provides focus and creates a common (unbiased) view of current problems.
A holistic DevOps strategy, at the most basic level, must answer the following questions:
What are our business aims and goals?
How do we plan the roadmap? Where do we begin?
How should we channel our efforts?
What are we trying to accomplish?
What is the schedule for this?
What is the impact to the business?
How do our stakeholders see the value?
What are the benefits and costs of doing it?
A good DevOps strategy for an organization can bring multiple benefits: it channels energy towards high-impact problems, brings clarity for developing the future state, identifies growth opportunities, and paves the way for better business outputs.
A DevOps platform strategy is a unique and extensive program that covers every aspect of the software life cycle and integrates multiple technologies, platforms, and tools, posing numerous challenges that must be handled with skill, precision, and experience.
An organization can consider the introduction of DevOps to cater to specific purposes, such as the following:
Automating infrastructure and workflow configuration management
Automating code repositories, builds, testing, and workflows
Continuous integration and deployment
Virtualization, containerization, and load balancing
Big data and social media projects
Machine-learning projects
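Continuous integration and deployment, listed above, hinge on one rule: each stage runs only if the previous one passed, so a broken build never reaches deployment. The toy Python sketch below captures that fail-fast ordering. The stage names and the lambdas standing in for real work are hypothetical placeholders; a CI server such as Jenkins would run actual build and test commands instead.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.
    Returns (all stages passed, names of the stages that were run)."""
    executed: List[str] = []
    for name, step in stages:
        executed.append(name)
        if not step():
            return False, executed  # later stages never run
    return True, executed


if __name__ == "__main__":
    # Hypothetical stages: the unit tests fail, so deployment is skipped.
    stages: List[Stage] = [
        ("build", lambda: True),
        ("unit tests", lambda: False),
        ("deploy to QA", lambda: True),
    ]
    print(run_pipeline(stages))  # (False, ['build', 'unit tests'])
```

Returning the list of executed stages makes it easy to report exactly where a run stopped, which is the feedback developers need to fix the build quickly.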
There is a wide variety of open source tools to select from for adoption in specific segments of DevOps, such as the following:
Docker: A Docker container packages an application and its dependencies all up in a box. It runs as an isolated process on the host operating system, sharing the kernel with other containers. It enjoys resource isolation and allocation benefits similar to those of virtual machines (VMs), but is much more portable and efficient.
Kubernetes: Kubernetes is an open source orchestration system for Docker containers. It groups containers into logical units for easy management and discovery, handles scheduling on nodes, and actively manages workloads to ensure their state matches users' declared intentions.
Jenkins: Jenkins is a web-based tool for continuous build, deployment, and testing. It can run standalone or inside a web server such as Tomcat, and integrates with build tools such as Ant and Maven and with source code repositories such as Git. It uses a master/slave architecture (now called controller/agent) to distribute builds across nodes.
Ansible: Ansible automates software provisioning, configuration management, and application deployment. It is agentless, connecting to managed nodes over Secure Shell (SSH); playbooks, Ansible Tower, and modules such as yum are among its mechanisms.
Chef and Puppet: Chef and Puppet use agent-based pull mechanisms to automate the deployment of work units: an agent on each managed node pulls its configuration from a central server.
GitHub: GitHub is a web-based hosting service for Git repositories; Git itself is a popular open source version control system. GitHub allows you to host remote Git repositories, and has a wealth of community-based services that make it ideal for open source projects.
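The Kubernetes description above, managing workloads so that their state matches users' declared intentions, is an instance of a reconciliation loop: compare the desired state with the observed state and act only on the difference. Chef and Puppet agents apply a similar convergence idea to node configuration. The following simplified Python sketch uses hypothetical application names and replica counts and does not call any real Kubernetes API:

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Compare desired and observed replica counts, return the actions needed
    to converge, and update `actual` in place as if they had succeeded."""
    actions: List[str] = []
    for app in sorted(set(desired) | set(actual)):
        want = desired.get(app, 0)
        have = actual.get(app, 0)
        if have < want:
            actions.append(f"start {want - have} x {app}")
        elif have > want:
            actions.append(f"stop {have - want} x {app}")
        if want:
            actual[app] = want
        else:
            actual.pop(app, None)  # no longer declared: remove it entirely
    return actions


if __name__ == "__main__":
    desired = {"web": 3, "worker": 2}  # declared intention
    actual = {"web": 1, "legacy": 1}   # observed state
    print(reconcile(desired, actual))  # ['stop 1 x legacy', 'start 2 x web', 'start 2 x worker']
    print(reconcile(desired, actual))  # [] because the state already matches
```

Running the loop a second time produces no actions. This idempotence is what makes repeated, automated runs safe.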
There are also comprehensive frameworks readily available, such as Red Hat OpenShift, Microsoft Azure, and AWS container services, which come with pre-integrated and preconfigured tools.
A few popular open source tools are listed here:
Source code management: Git, GitHub, Subversion, and Bitbucket
Build management: Maven, Ant, Make, and MSBuild
Testing tools: JUnit, Selenium, Cucumber, and QUnit
Repository management: Nexus, Artifactory, and Docker Hub
Continuous integration: Jenkins, Bamboo, TeamCity, and Visual Studio
Configuration provisioning: Chef, Puppet, Ansible, and Salt
Release management: Visual Studio, Serena Release, and StackStorm
Cloud: AWS, Azure, OpenShift, and Rackspace
Deployment management: RapidDeploy, CodeDeploy, and ElasticBox
Collaboration: Jira, Team Foundation, and Slack
BI/Monitoring: Kibana, Elasticsearch, and Nagios
Logging: Splunk, Logentries, and Logstash
Container: Linux, Docker, Kubernetes, Swarm, AWS, and Azure
Not adhering to DevOps practices can be challenging for an organization, for the following reasons:
High deployment effort for each of the development, QA, and production systems
Complex manual installation procedures are cumbersome and expensive
Lack of a comprehensive operations manual makes the system difficult to operate
Insufficient trace or log file details make troubleshooting incomplete
The performance impact of application-specific issues on other applications is not assessed
SLA adherence, as required by the business application, would be challenging
Monitoring servers, filesystems, databases, and applications in isolation will have gaps
Business application redundancy for failover is expensive in isolation
DevOps adoption and maturity for big data systems can benefit organizations in the following ways:
DevOps processes can be implemented standalone or in combination with other processes
Automation frameworks will improve business efficiency
DevOps frameworks will help to build resilience into the application's code
DevOps processes incorporate SLAs for operational requirements
The operations manual (runbook) is prepared in development to aid operations
In mature DevOps processes, runbook-driven development is integrated
In DevOps processes, application-specific monitoring is part of the development process
DevOps planning considers high availability and disaster recovery technology
Resilience is built into the application code in-line with technology features
Fully scripted installation in DevOps enables fully automated deployment
In DevOps, operations teams and developers are familiar with using logging frameworks
The non-functional requirements of operability, maintenance, and monitoring get sufficient attention, along with system development specifications
Continuous integration and continuous delivery minimize human errors, reduce planned downtime for upgrades, and facilitate productivity improvements
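The logging frameworks mentioned above, and the trace detail whose absence makes troubleshooting hard, can be illustrated with Python's standard logging module. The sketch below is one possible setup, not a prescribed DevOps standard: every line carries a timestamp, level, component name, and request ID, so a single request can be traced across log files. The component name and the `request_id` field are hypothetical choices for this example.

```python
import io
import logging
from typing import TextIO

# Every line carries a timestamp, level, component, and request ID.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s [request=%(request_id)s] %(message)s"


class DefaultRequestId(logging.Filter):
    """Fill in a placeholder request ID so records without one still format."""

    def filter(self, record: logging.LogRecord) -> bool:
        if not hasattr(record, "request_id"):
            record.request_id = "-"
        return True


def make_logger(name: str, stream: TextIO) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.addFilter(DefaultRequestId())
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.handlers = [handler]  # replace, so reconfiguring never duplicates lines
    logger.propagate = False     # keep records out of the root logger
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = make_logger("payments", buffer)  # hypothetical component name
    log.info("payment accepted", extra={"request_id": "r-42"})
    log.warning("retrying settlement")
    print(buffer.getvalue(), end="")
```

In production, the in-memory buffer would typically be replaced by a file or a log shipper feeding tools such as Logstash or Splunk.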
In this lesson, we have learned about the concepts of DevOps and key market trends, along with the business drivers leading to DevOps adoption across systems such as big data, cloud, data science, and so on. We presented business scenarios, with ample examples, for the application of DevOps. Adopting DevOps with popular open source tools, as detailed in the coming lessons, can bring organizations multifold productivity benefits.
In the next lesson, we will discuss the concepts of DevOps frameworks and best practices.
Which among the following are software advancements for in-memory computing?
Multi-core architecture allows massive parallel scaling
Main memory has scalable capacity
Partitioning of data
Fast prefetch unlimited size
Which among the following are hardware advancements for in-memory computing?
Multifold compression
No aggregate tables
Data compression
Row plus column storage
______ consists of packaging the application and its dependencies all up in a box.
Jenkins
Docker
Ansible
Kubernetes
Which of the following are tools for source code management?
Splunk
ElasticBox
Rackspace
Subversion
Which of the following are tools for release management?
StackStorm
Nagios
Logentries
Chef