The tech industry is constantly changing. The internet was born only a quarter of a century ago but has already transformed the way we live. Every day, over a billion people visit Facebook (https://www.facebook.com/zuck/posts/10102329188394581); every hour, approximately 18,000 hours of videos are uploaded to YouTube (https://merchdope.com/youtube-stats/); and every second, Google processes approximately 40,000 search queries (http://www.internetlivestats.com/google-search-statistics). Handling such a staggering scale of activity isn't easy. This book is a practical guide to adopting philosophies, tooling, and best practices similar to those applied by the aforementioned companies. Through the use of Amazon Web Services (AWS), you will learn the key principles needed to manage and scale your infrastructure, your engineering processes, and your applications efficiently, with minimal cost and effort.
This first chapter will explain in detail the new paradigms of:
- Thinking in terms of the cloud and not infrastructure
- Adopting a DevOps culture
- Deploying in AWS following DevOps best practices
The following anecdote relates the story of how I discovered that noise can damage hard drives.
In December 2011, sometime between Christmas and New Year's Eve, I received dozens of alerts from the monitoring system at OnLive, my employer at the time. Apparently, we had just lost connectivity to our European data center in Luxembourg. I rushed to the network operations center (NOC) hoping that it was only a small glitch in our monitoring system, maybe just a joke; after all, with so much redundancy, how could everything go offline? Unfortunately, when I got into the room, the big monitors were all red, which was not a good sign. This was just the beginning of a very long nightmare. An electrician working in our data center had mistakenly triggered the fire alarm; within seconds, the fire suppression system went off and released its argonite on top of our server racks. Unfortunately, this kind of fire suppression system makes so much noise when it releases its gas that the sound waves instantly killed hundreds and hundreds of hard drives, effectively shutting down our only European facility. It took months for us to get back on our feet.
Where is the cloud when you need it?!
Infor's CEO said it best at the AWS 2014 San Francisco Summit (https://aws.amazon.com/blogs/apn/friends-dont-let-friends-build-data-centers/):
"Friends don't let friends build data centers."
– Charles Phillips
It wasn't long ago that tech companies, small and large, had to have a proper technical operations organization able to build out infrastructures.
The process went a little bit like this:
- Fly to the location where you want to set up your infrastructure to take a tour of different data centers and their facilities. Look at the floor considerations, power considerations, heating, ventilation, and air conditioning (HVAC), fire prevention systems, physical security, and so on.
- Shop for an internet provider; you will be dealing with servers and a lot more bandwidth than a typical connection requires, but ultimately the goal is the same: getting internet connectivity for your servers.
- Once this is done, it's time to buy your hardware. Make the right decisions, because you will probably spend a big portion of your company's money on buying servers, switches, routers, firewalls, storage, an uninterruptible power supply (UPS) for when you have a power outage, a KVM (keyboard, video, and mouse) switch, network cables, the labeler that is dear to every system administrator's heart, and a bunch of spare parts, hard drives, RAID controllers, memory, power cables, and much more.
- At this point, once the hardware is bought and shipped to the data center location, you can rack everything, wire all the servers, and power everything. Your network team can kick in and establish connectivity to the new data center using various links, configuring the edge routers, switches, top-of-rack switches, KVM, and firewalls (sometimes). Your storage team is next and will provide the much-needed network-attached storage (NAS) or storage area network (SAN); next, comes your sysops team, which will image the servers, sometimes upgrade the BIOS, configure hardware RAID, and finally put an OS on these servers.
Not only is this a full-time job for a big team, but it also takes a lot of time and money to even get the team and infrastructure in place.
As you will see in this book, getting new servers up and running with AWS will take us minutes. In fact, more than just providing a server within minutes, you will soon see how to deploy and run a service in minutes and just when you need it.
From a cost standpoint, deploying in a cloud infrastructure such as AWS usually ends up being a lot cheaper than buying your own hardware. If you want to deploy your own hardware, you have to pay upfront for all the hardware mentioned previously (servers, network equipment, and so on) and sometimes for licensed software as well. In a cloud environment, you pay as you go. You can add or remove servers in no time and will only be charged for the duration that the servers have been running. Also, if you take advantage of PaaS and SaaS applications, you usually end up saving even more money by lowering your operating costs, as you don't need as many staff to administer your database, storage, and so on. Most cloud providers, AWS included, also offer tiered pricing and volume discounts. As your service gets bigger and bigger, you end up paying less for each unit of storage, bandwidth, and so on.
As you just saw, when deploying in the cloud, you only pay for the resources you provision. Most cloud companies use this to their advantage to scale their infrastructure up or down as the traffic to their sites changes.
This ability to add or remove new servers and services in no time and on demand is one of the main differentiators of an effective cloud infrastructure. In the example that follows, we can see the amount of traffic hitting Amazon.com during the month of November. Thanks to Black Friday and Cyber Monday, the traffic triples at the end of the month:
If the company were hosting its service in the old-fashioned way, it would need enough servers provisioned to handle this peak traffic, meaning that only 24% of its infrastructure would be used on average during the month:
However, thanks to being able to scale dynamically, they are able to provide only what they really need and dynamically absorb the spikes in traffic that Black Friday and Cyber Monday trigger:
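The dynamic scaling just described boils down to a decision loop that adds or removes servers as load changes. The following Python sketch illustrates the idea; the thresholds, fleet limits, and function name are illustrative assumptions for this book, not AWS's actual Auto Scaling policy:

```python
def desired_capacity(current_servers, avg_cpu_percent,
                     scale_up_at=70, scale_down_at=30,
                     min_servers=2, max_servers=100):
    """Toy auto-scaling decision: add or remove one server based on
    the average CPU load across the fleet.

    Thresholds and limits are illustrative; a real Auto Scaling policy
    would be configured in AWS rather than coded by hand.
    """
    if avg_cpu_percent > scale_up_at and current_servers < max_servers:
        return current_servers + 1          # traffic spike: scale up
    if avg_cpu_percent < scale_down_at and current_servers > min_servers:
        return current_servers - 1          # traffic drained: scale down
    return current_servers                  # load is nominal: hold steady
```

Run periodically against fleet metrics, a loop like this is what lets the infrastructure track the Black Friday spike instead of being provisioned for it year-round.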
At Medium, we also see, on a very regular basis, the benefits of having fast auto-scaling capabilities. Very often, stories go viral and the amount of traffic going to Medium drastically changes. On January 21, 2015, to our surprise, the White House posted the transcript of the State of the Union address minutes before President Obama started his speech:
As you can see in the following graph, thanks to being in the cloud and having auto-scaling capabilities, our platform was able to absorb the instant fivefold spike in traffic that the announcement caused by doubling the number of servers our front-end service used. Later, as the traffic started to drain naturally, we automatically removed some hosts from our fleet:
Cloud computing is often broken down into three different types of services, as follows:
- Infrastructure as a Service (IaaS): This is the fundamental block on top of which everything cloud-based is built. IaaS is usually a computing resource in a virtualized environment. It offers a combination of processing power, memory, storage, and network. The most common IaaS entities you will find are virtual machines (VMs), network equipment, such as load balancers or virtual Ethernet interfaces, and storage such as block devices. This layer is very close to the hardware and gives you the full flexibility that you would get deploying your software outside of a cloud. If you have any physical knowledge about data centers, it will also mostly apply to this layer.
- Platform as a Service (PaaS): This layer is where things start to get really interesting with the cloud. When building an application, you will likely need a certain number of common components, such as a data store and a queue. The PaaS layer provides a number of ready-to-use applications to help you build your own services without worrying about administrating and operating those third-party services such as database servers.
- Software as a Service (SaaS): This layer is the icing on the cake. Similar to the PaaS layer, you get access to managed services, but this time these services are a complete solution dedicated to certain purposes, such as management or monitoring tools.
This book covers a fair amount of services of the PaaS and SaaS types. When building an application, relying on these services makes a big difference when compared to the more traditional environment outside of the cloud.
Another key element to success when deploying or migrating to a new infrastructure is to adopt a DevOps mindset.
Running a company with a DevOps culture is all about adopting the right culture so that developers and the operations team work together. To that end, DevOps culture advocates implementing several engineering best practices, relying on tools and technologies that you will discover throughout the book.
DevOps is a new movement that officially started in 2009 in Belgium, when a group of people met at the first DevOpsDays conference, organized by Patrick Debois, to talk about how to apply some agile concepts to infrastructure.
Agile methodologies transformed the way software is developed. In a traditional waterfall model illustrated in the following diagram, a Product team comes up with specifications, a Design team then creates and defines a certain user experience and user interface, the engineering team then starts implementing the requested product or feature and hands off the code to a QA team, which tests and makes sure that the code behaves correctly according to the design specifications. Once all the bugs are fixed, a Release team packages the final code that can be handed off to the Technical Operations Team, which deploys the code and monitors the service over time:
The increasing complexity of developing certain software and technologies showed some limitations with this traditional waterfall pipeline.
The agile transformation addressed some of these issues, allowing for more interaction between the designers, developers, and testers. This change increased the overall quality of the products as these teams now had the opportunity to iterate more on product development; but apart from this, you would still be in a very classical waterfall pipeline:
All the agility added by this new process didn't extend past the QA cycles, and it was time to modernize this aspect of the software development life cycle. This foundational change to the agile process, which allows for more collaboration between the designers, developers, and QA teams, is what DevOps was initially after, but very quickly the DevOps movement started rethinking how developers and operations teams could work together.
In a non-DevOps culture, developers are in charge of developing new products and features and maintaining the existing code, but ultimately they are rewarded when their code is shipped. The incentive is to deliver as quickly as possible.
On the other hand, operations teams, in general, have the responsibility to maintain the uptime of production environments. For these teams, change is evil. New features and services increase the risk of having an outage, and therefore it is important to move with caution.
To minimize the risk of outages, operations teams usually need to schedule deployments ahead of time so that they can stage and test any production change and maximize their chances of success. It is also very common for enterprise software companies to schedule maintenance windows; in these cases, production changes can only be made a few times a quarter.
Unfortunately, a lot of times deployments won't succeed, and there are many possible reasons for that.
There is a certain correlation that can be made between the size of the change and the risk of introducing critical bugs in the product, as the following diagram demonstrates:
It is often the case that code produced by developers works fine in a development environment but not in production. A lot of the time, that is because the production environment is very different from the other environments, and unforeseen errors occur there. Common causes include services being colocated on the same servers in development, or security rules that are looser in development, so that services can communicate with one another in development but not in production. Another issue is that the development environment might not run the same versions of a certain library, so the interface used to communicate with it might differ. The development environment may be running a newer version of a service with features that production doesn't have yet, or it may simply be a question of scale: the dataset used in development isn't as big as the one in production, and scaling issues crop up once the new code is out in production.
The last dilemma relates to bad communication.
As Melvin Conway wrote in How Do Committees Invent? (proposing what is now called Conway's law (http://www.melconway.com/research/committees.html)):
"Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations."
In other words, the product you are building reflects the communication of your organization. A lot of the time, problems don't come from the technology but from the people and organization surrounding the technology. If there is any dysfunction among your developers and operations in the organization, this will show.
In a DevOps culture, developers and operations have a different mindset. They help to break down the silos that surround those teams by sharing responsibilities and adopting similar methodologies to improve productivity. They automate everything and use metrics to measure their success.
As we just said, a DevOps culture relies on a certain number of principles: source control everything, automate everything, and measure everything.
Revision control software has been around for many decades now, but too often only the product code is checked in. When practicing DevOps, not only is the application code checked in, but so are its configuration, its tests, its documentation, and all the infrastructure automation needed to deploy the application in every environment; and everything goes through the regular review process.
Automated software testing predates the history of DevOps, but it is a good starting point. Too often, developers focus on implementing features and forget to add a test to their code. In a DevOps environment, developers are responsible for adding proper testing to their code. QA teams can still exist; however, similar to other engineering teams, they work on building automation around testing.
This topic could deserve its own book, but in a nutshell, when developing code, keep in mind that there are four levels of testing automation to focus on to successfully implement DevOps:
- Unit testing: This is to test the functionality of each code block and function.
- Integration testing: This is to make sure that services and components work together.
- User interface testing: This is often the most challenging one to implement successfully.
- System testing: This is end-to-end testing. Let's take an example of a photo-sharing application. Here, the end-to-end testing could involve opening the homepage, signing in, uploading a photo, adding a caption, publishing the photo, and then signing out.
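To make the first level concrete, here is a minimal unit test sketched in Python with the standard unittest module. The make_caption helper is a hypothetical function from the photo-sharing application mentioned above, not code from any real product:

```python
import unittest

def make_caption(text, max_length=40):
    """Normalize a photo caption: collapse whitespace and truncate.

    Hypothetical helper from the photo-sharing example above.
    """
    cleaned = " ".join(text.split())                     # collapse runs of whitespace
    if len(cleaned) > max_length:
        cleaned = cleaned[:max_length - 3].rstrip() + "..."  # leave room for ellipsis
    return cleaned

class MakeCaptionTest(unittest.TestCase):
    def test_collapses_whitespace(self):
        self.assertEqual(make_caption("hello   world"), "hello world")

    def test_truncates_long_captions(self):
        caption = make_caption("x" * 100, max_length=10)
        self.assertLessEqual(len(caption), 10)
        self.assertTrue(caption.endswith("..."))
```

Tests like these can be run locally with python -m unittest, and later by the continuous integration system on every commit.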
In the last few decades, the size of the average infrastructure and complexity of the stack has skyrocketed. Managing infrastructure on an ad hoc basis, as was once possible, is very error-prone. In a DevOps culture, the provisioning and configuration of servers, networks, and services in general are all done through automation. Configuration management is often what the DevOps movement is known for; however, as you all know now, it is just a small piece of a big puzzle.
As you may know, it is easier to write software in small chunks and deploy those chunks as soon as possible to make sure that they are working. To get there, companies practicing DevOps rely on continuous integration and continuous deployment pipelines.
Whenever a new chunk of code is ready, the continuous integration pipeline kicks off. Through an automated testing system, the new code is run through all the relevant tests available. If the new code shows no obvious regression, the code is considered valid and can be merged to the main code base. At that point, without further involvement from the developer, a new version of the service (or application) that includes those new changes will be created and handed off to a system called a continuous deployment system.
The continuous deployment system will take the new builds and automatically deploy them to the different environments available. Depending on the complexity of the deployment pipeline, this might include a staging environment, an integration environment, and sometimes a preproduction environment but ultimately, if everything goes as planned without any manual intervention, this new build will get deployed to production.
One misunderstood aspect about practicing continuous integration and continuous deployment is that new features don't have to be accessible to users as soon as they are developed. In this paradigm, developers rely heavily on feature flagging and dark launches. Essentially, whenever you develop new code and want to hide it from the end users, you set a flag in your service configuration to describe who gets access to the new feature and how. At the engineering level, by dark launching a new feature that way, you can send production traffic to the service but hide it from the UI to see the impact it has on your database, or on performance, for example. At the product level, you can decide to enable the new feature for only a small percentage of your users to see if the new feature is working correctly and if the users who have access to the new feature are more engaged than the control group, for example.
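A feature flag check of the kind described above can be very simple, as the following Python sketch shows. The flag names, rollout percentages, and hash-based bucketing scheme are illustrative assumptions, and in practice the configuration would live in a config store rather than in code:

```python
import hashlib

# Hypothetical flag configuration; in a real service this would be
# loaded from a configuration store, not hardcoded.
FEATURE_FLAGS = {
    "new_photo_editor": {"enabled": True, "rollout_percent": 10},
    "dark_mode":        {"enabled": False, "rollout_percent": 0},
}

def is_feature_enabled(flag_name, user_id):
    """Decide whether a given user sees a feature.

    Users are bucketed deterministically by hashing their ID, so each
    user keeps a stable experience as the rollout percentage grows.
    """
    flag = FEATURE_FLAGS.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_percent"]
```

Because the bucketing is deterministic, raising rollout_percent from 10 to 50 only adds users; nobody who already had the feature loses it, which is exactly what you want when comparing an exposed group against a control group.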
Measuring everything is the last major principle that DevOps-driven companies adopt. As W. Edwards Deming said, "If you can't measure it, you can't manage it." DevOps is an ever-evolving process that feeds off these metrics to assess and improve the overall quality of the product and of the team working on it.
From a tooling and operating standpoint, here are some of the metrics most organizations look at:
- The number of builds pushed to production each day
- How often you need to roll back changes in your production environment (an indication that your testing hasn't caught an important issue)
- The percentage of code coverage
- The frequency of alerts resulting in paging the on-call engineers for immediate attention
- The frequency of outages
- Application performance
- The mean time to resolution (MTTR), which is the speed at which an outage or a performance issue can be fixed
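To make the last metric concrete, here is a small Python sketch that computes MTTR from incident timestamps. The data structure is a simplifying assumption; real teams would pull these timestamps from their paging or incident-tracking system:

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Compute MTTR from a list of (detected_at, resolved_at) pairs.

    Returns the average resolution time as a timedelta. Hypothetical
    helper; in practice this data comes from your incident tracker.
    """
    if not incidents:
        return timedelta(0)
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(0),
    )
    return total / len(incidents)

# Example: two outages, resolved in 30 and 90 minutes respectively,
# giving an MTTR of one hour.
incidents = [
    (datetime(2018, 1, 5, 10, 0), datetime(2018, 1, 5, 10, 30)),
    (datetime(2018, 1, 12, 2, 0), datetime(2018, 1, 12, 3, 30)),
]
```

Tracking this number over time, rather than as a one-off, is what lets a team see whether its automation and runbooks are actually paying off.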
At the organizational level, it is also interesting to measure the impact of shifting to a DevOps culture. While it is a lot harder to measure, you can consider the following points:
- The amount of collaboration across teams
- Team autonomy
- Cross-functional work and team efforts
- Fluidity in the product
- Happiness among engineers
- Attitude toward automation
- Obsession with metrics
As you just saw, having a DevOps culture means, first of all, changing the traditional mindset that developers and operations are two separate silos, and making both teams collaborate more during all phases of the software development life cycle.
In addition to a new mindset, DevOps culture requires a specific set of tools geared toward automation, deployment, and monitoring:
Amazon with AWS offers a number of services of the PaaS and SaaS types that will let us do just that.
AWS is at the forefront of cloud providers. Launched in 2006 with Amazon Simple Queue Service (SQS) and Amazon Elastic Compute Cloud (EC2), Amazon quickly became the biggest IaaS provider.
They have the biggest infrastructure, the biggest ecosystem, and constantly add new features and release new services. In 2015, they passed 1 million active customers. Over the last few years, they have managed to change people's mindsets about the cloud, and now deploying new services to the cloud is the norm.
Using AWS's managed tools and services can drastically improve your productivity and keep your team lean.
Amazon continually listens to its customers' feedback and looks at the market trends. Therefore, as the DevOps movement started to get established, Amazon released a number of new services tailored toward implementing some DevOps best practices. In this book, you will also see how these services synergize with the DevOps culture.
Amazon's services are like Lego pieces. If you can picture your final product, then you can explore the different services and start combining them the way you would assemble a Lego kit, in order to build the supporting stack needed to ship your product quickly and efficiently. Of course, in this case, the "if" is a big one, and unlike with Lego, understanding what each piece can do is far less visual and intuitive. This is why this book is written in a very practical way; throughout the different chapters, we are going to take a web application and deploy it as if it were our core product. We will see how to scale the infrastructure supporting it so that millions of people can use it, and finally we will make it more secure. And, of course, we will do this following DevOps best practices.
By going through that exercise, you will learn how AWS provides a number of managed services and systems to perform a number of common tasks such as computing, networking, load balancing, storing data, monitoring, programmatically managing infrastructure and deployment, caching, and queueing.
As you saw earlier in this chapter, having a DevOps culture is about rethinking how engineering teams work together by breaking these development and operations silos and bringing a new set of tools to implement the best practices.
AWS helps in many different ways to accomplish this. For some developers, the world of operations can be scary and confusing, but if you want better cooperation between engineers, it is important to expose every aspect of running a service to the entire engineering organization. As an operations engineer, you can't have a gatekeeper mentality toward developers; instead, it's better to make them comfortable accessing production and working on the different components of the platform. A good way to get started with this is in the AWS console:
While it may be a bit overwhelming at first, navigating this web interface is still a much better experience for people not familiar with the world of operations than referring to constantly out-of-date documentation, or running SSH and ad hoc commands to discover the topology and configuration of the service.
Finally, as you have seen briefly in the previous section, AWS offers a number of services that fits DevOps methodologies and will ultimately allow us to implement complex solutions in no time.
Some of the major services you will use are, at the compute level, EC2, the service for creating virtual servers. Later, as you start looking into how to scale the infrastructure, you will discover Auto Scaling groups, a service that lets you scale pools of EC2 instances to handle traffic spikes and host failures. You will also explore the concept of containers with Docker via Amazon Elastic Container Service (ECS). Lastly, you will create serverless functions via Lambda to run custom code without having to host it on your own servers.
To implement our continuous integration and continuous deployment system, you will rely on four services: Amazon Simple Storage Service (S3), the object store service that will allow us to store our artifacts; CodeBuild, which will let us test our code; CodeDeploy, which will let us deploy artifacts to our EC2 instances; and finally CodePipeline, which will let you orchestrate how code is built, tested, and deployed across environments.
To monitor and measure everything, you will rely on CloudWatch and, later, Elasticsearch/Kibana to collect, index, and visualize metrics and logs. To stream some of our data to these services, you will rely on Amazon Kinesis. To send email and SMS alerts, you will use the Amazon Simple Notification Service (SNS).
For infrastructure management, you will rely heavily on CloudFormation, which provides the ability to create templates of infrastructure.
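To give a flavor of what such a template looks like, here is a minimal CloudFormation sketch in YAML that declares a single EC2 instance. The logical resource name is arbitrary, and the AMI ID is a placeholder, as real AMI IDs are region-specific:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal illustrative template declaring one EC2 instance.
Resources:
  WebServerInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      ImageId: ami-0123456789abcdef0   # placeholder; look up a valid AMI for your region
```

Because templates like this are plain text, they can be checked into source control and reviewed like any other code, which is exactly the "source control everything" principle discussed earlier.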
Finally, as you explore ways to better secure our infrastructure, you will encounter Amazon Inspector and AWS Trusted Advisor, and explore the AWS Identity and Access Management (IAM) and Amazon Virtual Private Cloud (VPC) services in more detail.
In this chapter, you learned that adopting a DevOps culture means first and foremost changing the way traditional engineering and operations teams operate. Instead of being two isolated teams with opposing goals and responsibilities, companies with a DevOps culture take advantage of the complementary domains of expertise to collaborate better through converging processes and using a new set of tools.
These new processes and tools include not only automating everything from testing to deployment through infrastructure management, but also measuring everything so that you can improve each process over time.
When it comes to cloud services, AWS is leading the effort, with more services than any other cloud provider. All these services are usable via APIs and SDKs, which is good for automation; in addition, AWS has tools and services for each key characteristic of the DevOps culture.
In Chapter 2, Deploying Your First Web Application, you are finally going to get your feet wet and start using AWS. The final goal of the chapter will be to have a Hello World application accessible to anyone on the internet.