Choosing the Right Linux Distribution
In this chapter, we will dive into the Linux world from the very beginning. We will briefly touch on Linux history, explain what a distribution is, and explain what to take into account when choosing one for production use. You are not expected to know anything about Linux, its administration, or the cloud. If you don’t understand some words that we use, worry not. There shouldn’t be a lot of confusing terminology in this chapter, and if there is, we will explain it in later chapters. When you finish reading this chapter, you should be able to understand why there are so many Linuxes out there, how much you should expect to pay for it, and how to think about choosing the right Linux for yourself.
In this chapter, we will cover the following main topics:
- What is Linux and what is a Linux distribution?
- What can you use to help you make the right decision?
- Several major Linux distributions that are quite popular today
This chapter doesn’t have any technical requirements. We won’t run any commands or install any software yet. This will come in the later chapters.
The code we’re presenting in the book is available in the public GitHub repository for your consideration at this address: https://github.com/PacktPublishing/The-Linux-DevOps-Handbook.
What exactly is a Linux distribution?
Linux is the standard operating system for cloud workloads. However, there is not a single Linux operating system that goes by that name. Its ecosystem is quite complex. This comes from how it came to be originally.
Long before Linux was conceived by its creator, Linus Torvalds, there was Unix. Unix source code was – for legal reasons – licensed to anyone who bought it, thus making it very popular among many institutions. This included universities. The code, however, was not entirely free. This didn’t sit well with many people, who believed that software should be free – as in speech or beer – including source code. In the 1980s, a completely free and open implementation of Unix was born under the aegis of the GNU Project. Its goal was to develop an operating system that gave complete control over a computer to the user. The project was successful in that it was able to produce all the software required to run an operating system, except one thing – the kernel.
In 1991, Finnish student Linus Torvalds famously announced his hobby kernel – Linux. He called it at that time “just a hobby – won’t be big and professional like GNU.” It wasn’t supposed to get big and popular. The rest is history. The Linux kernel became popular among developers and administrators and became the missing piece of the GNU Project. Linux is the kernel, the GNU tools are the so-called userland, and together they make the GNU/Linux operating system.
The preceding short story is important to us for two reasons:
- While the GNU userland and the Linux kernel is the most popular combination, you’ll see it is not the only one.
- Linux delivers a kernel and the GNU Project delivers userland tools, but they have to be somehow prepared for installation. Many people and teams had separate ideas on how to do it best. We will expand on this thought next.
The way that a team or a company delivers a GNU/Linux operating system to end users is called a distribution. It facilitates operating system installation, the means to manage software later on, and general notions on how an operating system and the running processes have to be managed.
What makes distributions different?
The open nature of Linux and the GNU Project made it possible for almost anyone to create their own distribution. One of the things that makes new users dizzy is the sheer number of operating system (OS) versions they can use. The surefire way to start a holy war between Linux users is to ask which distribution is the best.
One of the ways we can group Linux distributions is the format in which they deliver the software (packages) and additional software used to install and remove that software (package managers). There are a number of them, but the two most prevalent are RPM (RPM Package Manager) and DEB packages. Packages are more than just an archive with binaries. They contain scripts that set the software up for use – creating directories, users, permissions, log rules, and a number of other things that we will explain in later chapters.
The RPM family of distributions starts with Red Hat Enterprise Linux (RHEL), created and maintained by the Red Hat company. Closely related is Fedora (a free community distribution sponsored by Red Hat). It also includes CentOS Linux (a free version of RHEL) and Rocky Linux (another free version of RHEL).
The DEB family starts with Debian (where the DEB packages originate from), a community project governed as a technocracy. A number of distributions arose from Debian, using most of its core components. Most notable is Ubuntu, a server and desktop distribution sponsored by Canonical.
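To make the grouping by package format concrete, here is a minimal Python sketch that guesses a distribution's package family from the contents of an os-release file (most modern distributions ship one at /etc/os-release). The mapping table and the helper names parse_os_release and package_family are our own illustrative assumptions, and the table is far from exhaustive.

```python
# Map the ID / ID_LIKE fields of an os-release file to a package family.
# FAMILY_BY_ID is an illustrative assumption, not an exhaustive list.
FAMILY_BY_ID = {
    "debian": "deb",
    "ubuntu": "deb",
    "rhel": "rpm",
    "fedora": "rpm",
    "centos": "rpm",
    "rocky": "rpm",
}


def parse_os_release(text):
    """Parse KEY=value lines (as found in /etc/os-release) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def package_family(info):
    """Return 'deb', 'rpm', or 'unknown' for a parsed os-release dict."""
    # ID names the distribution itself; ID_LIKE lists the ones it derives from.
    candidates = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    for name in candidates:
        family = FAMILY_BY_ID.get(name.lower())
        if family:
            return family
    return "unknown"


# Hand-written sample text, not read from a real system.
sample = 'NAME="Ubuntu"\nID=ubuntu\nID_LIKE=debian\n'
print(package_family(parse_os_release(sample)))
```

Checking ID first and then falling back to ID_LIKE means that a derivative missing from the table can still be classified through the distribution it is based on.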
Another way of grouping distributions is by their acceptance of closed software – software that limits the distribution of source code, binaries, or both. This can mean hardware drivers, such as the ones for NVIDIA graphics cards, and user software, such as movie codecs that allow you to play streamed media and DVD and Blu-ray discs. Some distributions make it easy to install and use them, while some make it more difficult, arguing that we should strive for all software to be open source and free (as in speech and as in beer).
Yet another way to differentiate them is the security framework a given distribution uses. Two of the most notable ones are AppArmor, used mainly by Ubuntu, and SELinux (from the USA’s National Security Agency), used by, among others, Red Hat Enterprise Linux (and its derivatives) and Fedora Linux.
It’s also worth noting that while most Linux distributions use the GNU Project as the userland, Alpine Linux, a distribution popular in the cloud, uses its own set of software, written especially with minimum size in mind.
Looking at how a distribution is developed, it can be community-driven (without any commercial entity owning any part of the process or software, Debian being one prime example), commercial (wholly owned by a company, with RHEL and SUSE being two examples), or a mix of the two (Ubuntu and Fedora being examples of commercially owned distributions with a large body of independent contributors).
Finally, a way we can group distributions is by how well they facilitate the cloud workload. Here, we can look at different aspects:
- The server side: How well a given distribution works as an underlying OS for our infrastructure, virtual machines, and containers
- The service side: How well a given distribution is suited to run our software as a container or a virtual machine
To make things even more confusing and amusing for new adopters, each distribution can have many variants (called flavors or spins, depending on the distribution lingo) that offer different sets of software or default configurations.
And to finally confuse you, dear reader, for use on a desktop or laptop, Linux offers the best it can give you – a choice. The number of graphical interfaces for the Linux OS can spin the head of even the most experienced user – KDE Plasma, GNOME, Cinnamon Desktop, MATE, Unity Desktop (not related to the Unity 3D game engine), and Xfce. The list is non-exhaustive, subjective, and very limited. They all differ in the ease of use, configurability, the amount of memory and CPU they use, and many other aspects.
The number of distributions is staggering – the go-to site that tracks Linux distributions (https://distrowatch.com/dwres.php?resource=popularity) lists 265 various Linux distributions on its distribution popularity page at the time of writing. The sheer number of them makes it necessary to limit the book to three of our choosing. For the most part, it doesn’t make a difference which one you choose for yourself, except maybe in licensing and subscription if you choose a commercial one. Each time the choice of distribution makes a difference, especially a technical one, we will point it out.
Choosing a distribution is more than just a pragmatic choice. The Linux community is deeply driven by ideals. For some people, they are the most important ideals on which they build their lives. Harsh words have been traded countless times over which text editor is better, based on their user interface, the license they are published with, or the quality of the source code. The same level of emotion is displayed toward the choice of software to run the WWW server or how to accept new contributions. This will inevitably lead to the way the Linux distribution is installed, what tools there are for configuration and maintenance, and how big the selection of software installed on it out of the box is.
Having said that, we have to mention that even though they have strong beliefs, the open-source community, the Linux community included, is a friendly bunch. In most cases, you’ll be able to find help or advice on online forums, and the chances are quite high that you will be able to meet them in person.
With all of that in mind, here are some questions to ask yourself when choosing a distribution:
- Is the software you wish to run supported on the distribution? Some commercial software limits the number of distributions it publishes packages for. It may be possible to run them on unsupported versions of Linux, but this may be tricky and prone to disruptions.
- Which versions of the software you intend to run are available? Sometimes, the distribution of choice doesn’t update your desired packages often enough. In the world of the cloud, software a few months old may already be outdated and lack important features or security updates.
- What is the licensing for the distribution? Is it free to use or does it require a subscription plan?
- What are your support options? For community-driven free distributions, your options are limited to friendly Linux gurus online and in the vicinity. For commercial offerings, you can pay for various support offerings. Depending on your needs and budget, you can find a mix of support options that will suit your requirements and financial reserves.
- What is your level of comfort with editing configuration files and running long and complex commands? Some distributions offer tools (both command-line and graphical) that make the configuration tasks easier and less error-prone. However, those tools are mostly distribution-specific, and you won’t find them anywhere else.
- How well are cloud-related tools supported on a given distribution? This can be the ease of installation, the recency of the software itself, or the number of steps to configure for use.
- How well is this distribution supported by the cloud of your choosing? This means how many cloud operators offer virtual machines with this distribution, how easy it is to obtain a container image with this distribution to run your software in, and how easy you expect it to be to build for this distribution and deploy on it.
- How well is it documented on the internet? This will not only include the documentation written by distribution maintainers but also various blog posts and documentation (mainly tutorials and so-called how-to documents) written by its users.
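One hedged way to work through the questions above is a simple weighted decision matrix: give each criterion a weight, score every candidate against it, and compare the totals. The sketch below is only an illustration. The criterion names, the weights, the scores, and the candidates distro_a and distro_b are all made-up placeholders, so substitute your own evaluation.

```python
# Weighted decision matrix for comparing candidate distributions.
# All weights, scores, and candidate names are made-up placeholders.

def rank(weights, scores):
    """Return (name, weighted_total) pairs sorted from best to worst."""
    totals = {}
    for name, per_criterion in scores.items():
        # A criterion without a score counts as 0 for that candidate.
        totals[name] = sum(
            weights[criterion] * per_criterion.get(criterion, 0)
            for criterion in weights
        )
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Higher weight = the criterion matters more to you.
weights = {
    "software_support": 3,
    "licensing": 2,
    "support_options": 2,
    "documentation": 1,
}

# Scores on a 1-5 scale, filled in after your own research.
scores = {
    "distro_a": {"software_support": 4, "licensing": 5, "support_options": 2, "documentation": 4},
    "distro_b": {"software_support": 5, "licensing": 2, "support_options": 5, "documentation": 3},
}

for name, total in rank(weights, scores):
    print(name, total)
```

The numbers matter less than the exercise itself: writing the weights down forces you to decide which of the questions are actually important for your workload.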
So far, you’ve learned what a Linux distribution is, how distributions differ from one another, and what criteria you can use to actually choose one as the core of the system you will manage.
Introducing the distributions
After that bit of a lengthy but condensed history of the Linux OS, it is time to finally explore the few we have chosen to cover in this book. In this section, we will cover the factors we just listed, as we believe they are important in making a decision. Please remember though that while we strive to present you with objective facts and valuations, we cannot escape our own subjective views. Always evaluate on your own before you choose, as it’s highly possible that you will stick with this distribution for many years to come.
A point to note is that we won’t be covering distributions comprehensively. We will only try to create a foundation on which you, dear reader, must build through research.
Debian (https://www.debian.org/) is one of the oldest active Linux distributions. Its development is led by the community-supported Debian Project. It is known for two things – the sheer number of packages that the distribution provides and the slow release of stable versions. The latter has improved in recent years and stable releases are published every two years. Software is delivered in archives called packages. Debian packages’ names have a .deb file extension and are colloquially called debs. They are kept online in repositories and repositories are broken down into pools. Repositories offer almost 60,000 packages with software in the latest stable release.
Debian always has three versions available (so-called branches) – stable, testing, and unstable. The releases are named after characters from the Toy Story movie franchise. At the time of writing, the latest stable release – version 11 – is called Bullseye.
The unstable branch is the rolling branch for developers, people who like living on the edge, or those who require the newest software more than they require stability. Software is accepted into the unstable branch with minimal testing.
The testing branch is where, as the name implies, the testing happens. A lot of testing happens here, thanks to the end users. Packages come here from the unstable branch. The software here is still newer than in the stable branch but not as fresh as in the unstable branch. A few months before the new stable release, the testing branch is frozen. It means that no new software will be accepted, and new versions of the already accepted packages are allowed only if they fix bugs.
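The freeze rule described above can be written down as a tiny decision function. This is a deliberately simplified model of the text, not Debian's actual migration logic, and the function and parameter names are our own.

```python
def accepted_into_testing(frozen, already_in_testing, fixes_bugs):
    """Decide whether a package version may enter the testing branch.

    Simplified model of the rule in the text:
    - outside a freeze, packages flow in from unstable as usual;
    - during a freeze, no new software is accepted, and new versions
      of already accepted packages are allowed only if they fix bugs.
    """
    if not frozen:
        return True
    return already_in_testing and fixes_bugs


# A brand-new package arriving during the freeze is held back.
print(accepted_into_testing(frozen=True, already_in_testing=False, fixes_bugs=True))
# A bug-fix update to an already accepted package goes through.
print(accepted_into_testing(frozen=True, already_in_testing=True, fixes_bugs=True))
```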
Debian is widely viewed as one of the most stable distributions there is. It is used as a platform for various compute clusters, so it is often installed on bare-metal servers somewhere in a rack in a data center and intended for consistent use over many years.
According to W3Techs (https://w3techs.com/technologies/details/os-linux), Debian accounts for 16% of all servers running on the internet. Its derivative, Ubuntu, runs 33% of them. Together, they account for 49% of all servers. This makes administration skills related to Debian highly marketable.
The Ubuntu Linux distribution (https://ubuntu.com/) is widely credited for making Linux popular on personal computers, and rightly so. Sponsored by Canonical, its mission was to make Linux easily usable for most people. It was one of the first, if not the first, Linux distributions to ship non-free and non-open binary drivers and libraries that made desktop use simpler and more comfortable.
Famously, the first bug report opened for the Ubuntu distribution by Mark Shuttleworth (Canonical and Ubuntu founder) was, “Microsoft has a majority market share.”
The distribution itself is based on Debian Linux, and in the beginning, being fully binary-compatible was one of the major objectives. As the development has progressed, this has lost some of its importance.
This distribution is developed by the community and Canonical. The main source of income for the company is premium services related to Ubuntu Linux – support, training, and consultations.
Due to the very close-knit relationship between Debian Linux and Ubuntu Linux, many developers and maintainers for one distribution serve the same roles in the other one. This results in a lot of software being packaged for both distributions in parallel.
Ubuntu has three major flavors – Desktop, Server, and Core (for the internet of things). Desktop and Server may differ slightly in how services are configured out of the box, and Core differs a lot.
The software is distributed in .deb packages, the same as with Debian, and the sources are actually imported from the Debian unstable branch. However, this doesn’t mean you can install Debian packages on Ubuntu or vice versa, as they are not necessarily binary-compatible. It should be possible to rebuild and install your own version.
There are four package repositories per release – the free and non-free software supported officially by Canonical is called main and restricted, respectively. Free and non-free software delivered and maintained by the community is called universe and multiverse, respectively.
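The four repositories form a two-by-two grid: free versus non-free software, and Canonical-supported versus community-maintained. A minimal sketch of that grid, with a function name and parameters of our own choosing:

```python
def ubuntu_component(free, supported_by_canonical):
    """Return the repository name for a package, per the grid in the text."""
    if supported_by_canonical:
        return "main" if free else "restricted"
    return "universe" if free else "multiverse"


# Print the whole grid.
for free in (True, False):
    for supported in (True, False):
        print(f"free={free}, canonical={supported}: {ubuntu_component(free, supported)}")
```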
A word of advice – a widely accepted practice of system upgrades between major versions is to wait for the first sub-release. So, if the currently installed version of the distribution is 2.5 and the new version 3.0 is released, it is wise to wait until 3.1 or even 3.2 is released and upgrade then. This is applicable to all the distributions we list here.
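The advice above can be expressed as a small rule of thumb in code. The sketch assumes plain "major.minor" version strings like the ones in the example. Real distributions use their own numbering schemes, so treat the parsing as a hypothetical simplification.

```python
def parse_version(text):
    """Split a 'major.minor' string into a tuple of integers."""
    major, _, minor = text.partition(".")
    return int(major), int(minor or 0)


def should_upgrade(installed, available):
    """Apply the 'wait for the first sub-release' rule of thumb."""
    inst = parse_version(installed)
    avail = parse_version(available)
    if avail <= inst:
        return False  # nothing newer is on offer
    if avail[0] > inst[0]:
        # New major version: hold off until at least the .1 sub-release.
        return avail[1] >= 1
    return True  # same major version, newer minor: upgrade as usual


print(should_upgrade("2.5", "3.0"))
print(should_upgrade("2.5", "3.1"))
```

With version 2.5 installed, the function says no to 3.0 and yes to 3.1, matching the example in the text.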
The Long-Term Support (LTS) versions are supported for five years. A new LTS version is released every two years. It is possible to negotiate extended support. This gives a very good timeline to plan major upgrades. A new Ubuntu version is released every six months.
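To see why this cadence makes planning easier, here is a small date calculation based only on the figures above (five years of support, a new LTS every two years). The release date used is a placeholder, and the results are simple arithmetic rather than official end-of-life dates.

```python
from datetime import date

LTS_SUPPORT_YEARS = 5   # from the text: LTS versions are supported for five years
LTS_INTERVAL_YEARS = 2  # from the text: a new LTS is released every two years


def add_years(day, years):
    """Shift a date by whole years (assumes the day is not February 29)."""
    return day.replace(year=day.year + years)


def lts_timeline(release):
    """Return the key planning dates for an LTS released on the given day."""
    return {
        "next_lts": add_years(release, LTS_INTERVAL_YEARS),
        "end_of_support": add_years(release, LTS_SUPPORT_YEARS),
    }


# Placeholder release date, used for illustration only.
timeline = lts_timeline(date(2022, 4, 1))
migration_window = LTS_SUPPORT_YEARS - LTS_INTERVAL_YEARS

print("next LTS expected:", timeline["next_lts"])
print("support ends:", timeline["end_of_support"])
print("years to migrate once the next LTS is out:", migration_window)
```

Under these assumptions, there are about three years between the arrival of the next LTS and the end of support for the current one, which is the window for planning a major upgrade.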
Ubuntu Linux is widely adopted in education and government projects. Famously, the city of Munich, between 2004 and 2013, migrated over 14,000 municipal desktop computers to a variant of Ubuntu with the KDE desktop environment. While the migration saw disturbances politically – other operating system vendors lobbied strongly against this migration – it was considered a success technically.
Ubuntu is the Linux of choice for personal computers. Canonical works very closely with hardware vendors, notably Lenovo and Dell, but lately also with HP, to ensure full compatibility between the distribution and the computers. Dell sells its flagship laptops with Ubuntu preinstalled.
Several sources cite Ubuntu Linux as the most installed Linux distribution on servers and personal computers. The actual number can only be estimated, as Ubuntu doesn’t require any subscription or registration.
As a byproduct of Ubuntu Linux’s popularity, software vendors, more often than not, offer .deb packages of their software, if they release a Linux version. This is especially true for desktop software.
The number of unofficial versions, clones, or modified distributions based on Ubuntu is staggering.
Ubuntu has a very active community, both organized and unorganized. It’s quite easy to get a hold of a group of users near your city. This also directly translates to the number of tutorials and the amount of documentation on the internet.
Ubuntu Linux, especially under a support plan, is installed as a foundation for many cloud computing infrastructure deployments. Many telecoms, banking, and insurance companies have chosen Ubuntu Server as their foundation.
Red Hat Enterprise Linux (RHEL)
RHEL (https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux) is a spiritual successor of Red Hat Linux and is developed and maintained by Red Hat Inc. (https://www.redhat.com/). Its main target is the commercial entities market. It is possible to use RHEL for free for development or in production with up to 16 servers (at the time of writing). However, the main advantage of this distribution is the enormous pool of articles that help solve issues and the assistance of support engineers, although the latter can only be acquired through a paid support plan.
RHEL is considered a very stable and solid distribution. It is one of the main choices for banks, insurance companies, and financial markets. It lacks many popular desktop software packages, but on the server side of things, especially as an OS to run other commercial applications, it is a first-class citizen.
The software is distributed through online repositories in packages that end with .rpm, hence the name RPMs. The main tool to administer the packages is rpm, with more sophisticated tools – yum, and lately its successor, dnf – also available.
In the true spirit of an open source-based company, Red Hat makes sources for its distribution available. This has led to the creation of a famous free and open clone of RHEL – CentOS. Until fairly recently, it had been quite a popular choice for people who wanted to use RHEL but didn’t want to, or couldn’t, pay a subscription. In 2014, CentOS joined the Red Hat company, and in 2020, Red Hat announced that the versioned releases of CentOS would no longer be available; there would only be the so-called rolling release, which constantly updates packages and does not mirror the RHEL releases. This resulted in a very heated reaction from CentOS users. The original CentOS founder, Gregory Kurtzer, started another clone of RHEL called Rocky Linux. Its main objective is the same as the original CentOS – to deliver a free, open, and community-driven distribution, fully binary-compatible with RHEL.
The RHEL distribution delivers stable versions every few years and supports them for 10 years, starting from release 5. The full support, however, is offered only for a few years. For the rest of the time, Red Hat provides only security fixes and critical updates for your systems, with no new package versions being introduced. Still, this life cycle is what users with large installations or mission-critical systems came to like.
As with Ubuntu, it is possible to negotiate extended support time.
The Red Hat company has a turbulent relationship with the open source community. While the company mostly plays fair, there have been some decisions that the community didn’t like. Lately, it was Red Hat’s decision to change the CentOS release model to a rolling release (https://lists.centos.org/pipermail/centos-announce/2020-December/048208.html).
Fedora (https://fedoraproject.org/wiki/Fedora_Project_Wiki) is a distribution associated with the Red Hat company. While more than 50% of its developers and maintainers are community members not affiliated with Red Hat, the company holds full stewardship over the development. Fedora is the upstream of RHEL, which means that it is where the actual development of what later becomes RHEL happens. It doesn’t mean that everything from Fedora is included in the release of RHEL. However, following Fedora closely will yield insight into the current direction of the RHEL distribution.
Unlike RHEL, for which it is the foundation, Fedora publishes a new release every six months. It uses the same package type as RHEL, RPM.
CentOS (https://centos.org) used to be the go-to free version of RHEL. The name is an acronym for Community Enterprise Operating System. Its main goal was to be fully binary-compatible with RHEL and adhere to the same releases and numbering scheme. In 2014, CentOS joined Red Hat, but it was promised that the distribution would keep its independence from the company while benefiting from development and testing resources. Unfortunately, in 2020, Red Hat announced that CentOS 8 would be the last numbered release, and from then on, CentOS Stream would be the only variant. CentOS Stream is a midstream version. This means it sits in the middle between bleeding-edge and fast-paced Fedora and stable and production-ready RHEL. The difference between CentOS Stream and CentOS is that Stream is a development variant, while CentOS was simply a rebuilt and repackaged mirror of the actual final product, RHEL.
All the knowledge, skills, and experience gained when working with RHEL are 100% applicable to CentOS. Since CentOS is the third most-deployed Linux distribution on servers, according to W3Techs (https://w3techs.com/technologies/details/os-linux), the skills are very marketable.
As a response to the situation with the CentOS distribution, its founder announced the creation of Rocky Linux (https://rockylinux.org/). The goals are the same as those of the original CentOS. The release scheme and numbering follow RHEL. Shortly after the announcement, the GitHub repository of Rocky Linux became top trending (https://web.archive.org/web/20201212220049/https://github.com/trending). Rocky Linux is 100% binary-compatible with CentOS. The project has released a set of tools that make it easy to migrate from CentOS to Rocky Linux without reinstalling the system.
The distribution is quite young, having been founded in 2020, and its popularity is still to be determined. It has made a lot of noise in the community, and it seems that a steady stream of CentOS users have moved to Rocky Linux as their preferred choice.
A very important contribution to the open source world from the Rocky Linux project is the build system. It ensures that even if Rocky Linux shuts down, the community will be able to easily start up a new RHEL clone.
Alpine Linux (https://alpinelinux.org/) is an interesting one. The main programming library and most basic command-line tools are not from the GNU Project. The service management system also differs from systemd, which most distributions currently use. This makes some of the administration skills from other major distributions non-applicable. The strength of Alpine lies in its small size, its security-first mindset, and one of the fastest boot times among existing Linux distributions. Those characteristics, with the boot time being admittedly more important, make it the most popular choice for containers. If you run containerized software or build your own container images, it is very likely that they are based on Alpine Linux.
Alpine has its roots in the LEAF (Linux Embedded Appliance Framework; see: https://bering-uclibc.zetam.org/wiki/Main_Page) project – a Linux distribution that fits on a single floppy disk. LEAF is currently a popular choice for embedded markets, routers, and firewalls. Alpine is a bigger distribution, but that sacrifice had to be made, since developers wanted to include several useful but rather large software packages.
The package manager is called apk. The build system is borrowed from another distribution called Gentoo Linux. As Gentoo builds software as it installs it, its build system, Portage, contains a lot of logic around building software that is used as a part of an OS.
Alpine can run entirely from RAM. There’s even a special mechanism that allows you to initially load only a few required packages from the boot device; this can be achieved using Alpine’s Local Backup Utility (LBU).
As mentioned before, this is a preferred distribution for container images. You won’t see it running on a large server installation often, if at all. When we cross over to the cloud world, chances are you’ll see a lot of Alpine Linux.
Having said that, every single one of those distributions has a variant for the cloud as a container base image – a way to run your software in the true cloud way.
In this chapter, you learned the basics of popular Linux distributions and how they are different from one another. You should now have some understanding of what you can choose from and what consequences you will need to face – good and bad. To give you an even better idea of how to interact with some cherry-picked Linux distributions, we will look at how to interact with a system using your keyboard in Chapter 2.
The short list in this chapter is just a tiny portion of the Linux distributions available. The list is largely based on the popularity and marketability of skills, as well as our own experience and knowledge that we acquired over the years. They are not, by any means, the best or only choices that you have.
We tried to point out where the main strengths of each distribution lie and what the relationship between users and the respective distributions looks like.
It’s not likely we were able to answer all your questions. Each of the Linux distributions from our list has its own book out there, and there is even more knowledge on blogs, wikis, and YouTube tutorials.
In the next chapter, we will dive into the magical world of the command line.