You're reading from Mastering Ubuntu Server - Fourth Edition

Product type: Book
Published in: Sep 2022
Publisher: Packt
ISBN-13: 9781803234243
Edition: 4th Edition
Author: Jay LaCroix

Jeremy "Jay" LaCroix is a technologist and open-source enthusiast, specializing in Linux. He has 20 years of field experience as a Solutions Architect across different firms and holds a master's degree in Information Systems Technology Management from Capella University. Jay also runs an active Linux-focused YouTube channel, LearnLinuxTV, with over 250K followers and over 20M views, where he posts instructional tutorial videos and other Linux-related content. He has also written Linux Mint Essentials and Mastering Linux Network Administration, published by Packt Publishing.

Troubleshooting Ubuntu Servers

So far, we’ve covered many topics surrounding Ubuntu Server and worked on some really fun projects. We’ve set up web servers, built automation, and even created infrastructure in the cloud. As the applications and services you’ve implemented age, your organization may depend on them more and more. But what happens if something your organization relies on suddenly becomes unavailable? What do you do when things don’t quite go according to plan?

While it’s impossible for us to account for every possible problem that may come up, there are some common places to look for clues when you run into one. In this chapter, we’ll take a look at some common starting points and techniques you can use to troubleshoot issues with your servers. Building solid troubleshooting skills is an important focus, and with the concepts explored here, you’ll be well on your way.

In this chapter...

Evaluating the scope

When a problem occurs within your servers or network, your systems will exhibit one or more symptoms. Perhaps an application is much slower than normal, maybe users are unable to access the network, or a server suffers from total failure. There are many problems that can come up at any time, and it can be challenging to keep up.

Once you’ve identified the symptoms of the problem, the next goal is to identify the overall scope. Essentially, this means determining (as best you can) where the problem is most likely to reside, and how many systems and services are affected. Sometimes the root cause is obvious. For example, if none of your computers are receiving an IP address from your DHCP server, then you’ll know straight away to start investigating that server’s logs to find out why it isn’t doing its job. In other cases, the cause may not be so obvious. Perhaps you have an application that exhibits...
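One way to make "how many systems and services are affected" concrete is to tabulate health-check results per host. The following is a minimal Python sketch, not a tool from this book: it assumes you already have pass/fail results for each host, and the host names and check names (such as dhcp_lease) are hypothetical. It classifies the problem as isolated, partial, or widespread; a check that fails on every host hints at a shared dependency, like the DHCP server in the example above.

```python
from collections import Counter


def summarize_scope(results):
    """Classify scope from {host: {check_name: passed}} health-check results."""
    # Collect the names of the failing checks for each host.
    failed = {host: [c for c, ok in checks.items() if not ok] for host, checks in results.items()}
    affected = sorted(host for host, checks in failed.items() if checks)
    if not affected:
        return "no failures detected"
    if len(affected) == 1:
        host = affected[0]
        return f"isolated: only {host} is affected ({', '.join(failed[host])})"
    # A check failing on every host points at a shared dependency (e.g. a DHCP server).
    counts = Counter(c for checks in failed.values() for c in checks)
    common = sorted(c for c, n in counts.items() if n == len(results))
    if common:
        return f"widespread: {', '.join(common)} failing on all {len(results)} hosts"
    return f"partial: {len(affected)} of {len(results)} hosts affected"
```

For example, passing `{"web1": {"dhcp_lease": False}, "web2": {"dhcp_lease": False}}` reports the DHCP check as failing everywhere, which would steer you toward the DHCP server rather than the individual machines.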

Conducting a root cause analysis

Once you resolve a problem on your server or network, you’ll immediately revel in the awesomeness of your troubleshooting skills. It’s a wonderful feeling to have fixed an issue, becoming the hero within your technology department. But you’re not done yet. The next step is preventing the problem from happening again. It’s important to look at how the problem started, as well as the steps you can take to keep it from recurring. This is known as a root cause analysis. A root cause analysis may be a report you file with your manager or within your knowledge-base system, or it could just be a memo you document for yourself. Either way, it’s an important learning opportunity.

A good root cause analysis has several sides to the equation. First, it will describe the events that led to the problem occurring in the first place. Then, it will contain a list of steps that you’...

Viewing system logs

If you’re having trouble finding the root cause, or you just want more information regarding a problem that occurred, consider looking through log files. Linux has great logging capabilities, and many of the applications you run write log files as events happen. If there’s an issue, you may be able to find information about it within an application’s logs.

There are two primary methods of viewing logs. Historically, for most of Ubuntu’s life, you could simply inspect the log files that are stored within the /var/log directory. The contents of that directory are ordinary files and directories, so the same commands you’ve used in the past to view text files will work on the log files within /var/log as well. This method of viewing log files is slowly being aged out; however, the majority of applications still store their log files within that directory,...
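Because the files under /var/log are plain text (with older, rotated copies often compressed with gzip), they are also easy to search programmatically. The following is a minimal Python sketch, assuming the logs are UTF-8 text; the function name search_log is hypothetical, and the sketch simply yields each line that matches a regular expression along with its line number.

```python
import gzip
import re
from pathlib import Path


def search_log(path, pattern, ignore_case=True):
    """Yield (line_number, line) for each line in a log file that matches `pattern`.

    Works on plain-text logs and on gzip-compressed rotated logs (*.gz).
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    path = Path(path)
    # Rotated logs are often gzip-compressed; open those transparently.
    opener = gzip.open if path.suffix == ".gz" else open
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if regex.search(line):
                yield number, line.rstrip("\n")
```

For example, `for n, line in search_log("/var/log/syslog", r"error|fail"): print(n, line)` would scan the main system log. On Ubuntu, reading that file typically requires sudo or membership in the adm group.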

Tracing network issues

It’s amazing how important TCP/IP networking is to the world today. Of all the protocols in use in modern computing, it’s by far the most widespread. But when it isn’t working well, it can also be one of the most frustrating things to figure out. Thankfully, Ubuntu includes handy utilities you can use to pinpoint what’s going on.

First, let’s look at connectivity. After all, if you can’t connect to a network, your server is essentially useless. Ubuntu recognizes just about all network cards, and it will automatically connect your server or workstation to your network if a DHCP server is within reach.

While troubleshooting, get the obvious stuff out of the way first. The following may seem like a no-brainer, but you’d be surprised how often one can miss something obvious. I’m going to assume you’ve already checked to make sure network cables...
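Once the physical checks are out of the way, it helps to test one layer at a time: first whether a name resolves, then whether a TCP connection to the target succeeds. The following is a minimal Python sketch using only the standard socket module; the function names and the 3-second default timeout are assumptions, and a failed connection on its own can't tell you whether routing, a firewall, or the service itself is at fault.

```python
import socket


def resolves(name):
    """Return True if `name` resolves to at least one address (DNS or /etc/hosts)."""
    try:
        return len(socket.getaddrinfo(name, None)) > 0
    except socket.gaierror:
        return False


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable networks
        return False


def check(host, port):
    """Test name resolution first, then TCP connectivity, and describe the result."""
    if not resolves(host):
        return f"{host}: name does not resolve (check DNS settings)"
    if not can_connect(host, port):
        return f"{host}:{port}: unreachable (check routing, firewall, or the service)"
    return f"{host}:{port}: OK"
```

For example, `print(check("example.com", 443))` distinguishes a DNS failure from a connection failure, which narrows down where to look next.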

Troubleshooting resource issues

I don’t know about others, but most of my time troubleshooting servers seems to be spent pinpointing resource issues. By resources, I’m referring to CPU, memory, disk, input/output, and so on. Generally, issues come down to a user storing too many large files, a process going haywire and consuming a large amount of CPU, or a server running out of memory. In this section, we’ll go through some of the common things you’re likely to run into while administering Ubuntu servers.
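For a quick snapshot of CPU and memory pressure, Linux exposes the load average and the contents of /proc/meminfo. The following is a minimal Python sketch, assuming a Linux host whose kernel reports MemAvailable (3.14 or later); the function names are hypothetical, and any threshold you apply to the output is a rule of thumb rather than a hard limit.

```python
import os


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def resource_snapshot(meminfo_path="/proc/meminfo"):
    """Summarize CPU load and memory pressure on a Linux host."""
    load1, _load5, _load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    with open(meminfo_path) as fh:
        mem = parse_meminfo(fh.read())
    return {
        # Load per CPU persistently above about 1.0 suggests work is queuing up
        # (on Linux, the load average also counts tasks blocked on disk I/O).
        "load_per_cpu": round(load1 / cpus, 2),
        # MemAvailable estimates memory usable for new work without swapping.
        "mem_available_pct": round(100 * mem["MemAvailable"] / mem["MemTotal"], 1),
        "swap_used_kb": mem.get("SwapTotal", 0) - mem.get("SwapFree", 0),
    }
```

Calling `resource_snapshot()` with no arguments reads the live /proc/meminfo; the path parameter exists only so the parser can be pointed at a saved copy.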

First, let’s revisit topics related to storage. In Chapter 9, Managing Storage Volumes, we already went over these concepts, and many of them apply to troubleshooting as well. Therefore, I won’t spend too much time on them here, but a refresher on troubleshooting storage issues is worthwhile. First, whenever you have users who are complaining about being unable to write...
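When users can’t write files, two early questions are how full the filesystem is and what is taking up the space. The following is a minimal Python sketch built on shutil.disk_usage and os.walk from the standard library; the function names are hypothetical, the percentage is computed in approximately the way df computes its Use% column, and the scan skips symbolic links and anything it can’t read.

```python
import heapq
import os
import shutil


def percent_used(path="/"):
    """Percentage of the filesystem containing `path` that is in use.

    Divides used space by used plus available space (as df does), so blocks
    reserved for root are left out of the denominator.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    return 0.0 if denominator == 0 else round(100 * usage.used / denominator, 1)


def largest_files(top, count=10):
    """Return up to `count` (size_in_bytes, path) pairs for the biggest files under `top`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(top):  # unreadable dirs are skipped
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks so a target isn't counted twice
            try:
                found.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or can't be read
    return heapq.nlargest(count, found)
```

For example, `percent_used("/home")` followed by `largest_files("/home", 5)` shows whether the filesystem is nearly full and which files are the likeliest culprits. Walking a large tree can take a while, and without root privileges some directories will be skipped.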

Diagnosing defective RAM

All server and computing components can and will fail eventually, but there are a few pieces of hardware that seem to fail more often than others. Fans, power supplies, and hard disks definitely make the list of common things administrators will end up replacing, but defective memory is also a situation I’m sure you’ll run into eventually.

Although defective memory sticks do happen, I made this the last section in this chapter because, unfortunately, I can’t give you a definitive list of symptoms that indicate memory is the source of an issue. RAM issues are mysterious by nature, and each time I’ve run into one, I discovered that the memory was bad only after troubleshooting everything else. It’s for this reason that nowadays I’ll often test the memory on a server or workstation first since it’s very easy to do. Even if memory has nothing...

Summary

While Ubuntu is generally a very stable and secure platform, it’s important to be prepared for problems and to know how to deal with them. In this chapter, we discussed common troubleshooting steps we can perform when our servers stop behaving themselves. We started off by evaluating the scope, which gives us an understanding of how many users or servers are affected by the issue. Then, we looked into Ubuntu’s log files, which are a treasure trove of information that we can use to pinpoint issues and narrow down the problem. We also covered several networking issues that can come up, such as issues with DHCP, DNS, and routing. We certainly can’t predict problems before they occur, nor can we be prepared in advance for every type of problem that can possibly happen. However, applying sound logic and common sense to problems will go a long way in helping us figure out the root cause.

In the next chapter, we will take a look at preventing disasters...

Further reading

Join our community on Discord

Join our community’s Discord space for discussions with the author and other readers:

https://packt.link/LWaZ0
