Threat hunting is a concept that can bring to mind a myriad of different images and ideas. For some, it is shrouded in mystery, while others have honed it to a science, perhaps even applying their findings in new ways. What separates these two groups is an understanding that hunting is, in reality, a loosely defined concept that is molded to each unique situation, environment, and set of personnel involved.
If you have not encountered threat hunting before, it helps to understand that there is no single cookie-cutter cybersecurity solution for any network, enterprise, or incident. A single solution simply does not and cannot exist. There are millions of variables and conditions, both technical and organizational, that differentiate one organization's network from another. Even the mere appearance of security might deter some adversaries from a target while serving as a challenge to others.
Even if an organization takes all of the correct steps, such as architecting the network with proper layered defenses, thoroughly analyzing vulnerabilities, and minimizing risks, there are still important protections to enforce. A continual improvement process must be in place to review all previous findings and see how the environment has changed. Threat hunting is a critical part of that process for organizations looking to mature their cybersecurity posture and improve their resilience in the digital world.
Of the countless threat hunting events we have had the pleasure of taking part in or observing, no two were ever the same. Each hunt was tailored to the available technical resources, the enterprise in question, the perceived threat, the personnel assigned, and the business requirements of the client. The aim of this book is to provide you with the foundational concepts and requirements needed to take a generic threat hunting framework and mold it to a particular use case, in a form that a customer would accept based on what they are experiencing. This framework will help you build a threat hunting team, and define and conduct future hunts that meet business needs while minimizing wasted resources and non-value-added effort.
In this chapter, we will cover the following topics:
- Incident response life cycle
- Why is threat hunting important?
- Application of detection levels
- Book layout
By the end of this chapter, you will be able to do the following:
- Comprehend the difference between cyber threat hunting and other types of cyber defense functions.
- Discuss how threat hunting fits into the NIST incident response life cycle.
- Comprehend the importance of conducting effective threat hunting missions.
Incident response life cycle (hunting as proactive detection)
There are numerous incident response life cycles that a quick internet search will turn up. To keep things simple, whenever this book references the incident response life cycle, it refers to the NIST SP 800-61r2 incident response life cycle shown in the following diagram:
The cycle always starts with a Preparation phase, whether or not that preparation is done purposefully. The next two phases (Detection and Analysis; Containment, Eradication, and Recovery) are cycled between as new information is identified and cases expand. Once everything has been recovered, there is a Post-Incident Activity phase in which the events can be reviewed without the pressure of recovery. Good practices can be encouraged and bad practices pruned. Let's take a closer look at each of these phases.
Preparation
Plan for incidents, document assets and actions, architect secure solutions, baseline the network, and so on. This is where an organization prepares for the employment of cybersecurity resources. Even if it completely outsources its risk and response to another entity, the owning organization will take part in this phase. Some level of preparation is always completed, even if that level is a decision not to prepare at all.
Examples of activities in this phase include measuring baseline network activity, reviewing and documenting standard processes, and stress testing response scenarios. For example, if a virus were found on the network, how would the administrators respond? Preparation allows them to determine in advance the course of action that best minimizes risk to the organization in light of its business priorities. With inadequate preparation, the next few phases will be purely reactive, with a higher level of risk to the organization.
Detection and analysis
During this phase, the organization identifies what is perceived to be benign and what is potentially malicious. This includes detecting activity, analyzing that activity, and conducting a full-scope investigation as needed to determine the root cause and scope of the event. Cyber threat hunting is only one part of this phase, and hunting can be iterated over and over before a vulnerability or incident is identified that requires containment, eradication, and recovery. It helps to understand that this phase does not have to be completed by the organization that owns the network. Detection of an event can come from any number of places, including government agencies, hacktivists, underground hacking forums, and news sites.
Examples of activities in this phase include monitoring antivirus and firewall logs, comparing current network activity against the baseline, and threat hunting. Anything that brings a particular activity to the attention of a cyber defender could fall under this phase of the cycle.
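Comparing a baseline against current activity can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production detector: events are reduced to hashable keys (here, hypothetical destination hosts from two comparable time windows), and the function name `compare_to_baseline` and its `ratio` threshold are our own inventions. Anything it flags is a lead for a defender to investigate, not a verdict.

```python
from collections import Counter


def compare_to_baseline(baseline_events, current_events, ratio=3.0):
    """Flag event keys that are new, or whose count grew by more than `ratio` times.

    Both arguments are iterables of hashable keys, e.g. destination hosts
    seen in connection logs over two comparable time windows (an assumption).
    """
    baseline = Counter(baseline_events)
    current = Counter(current_events)
    findings = {}
    for key, count in current.items():
        if key not in baseline:
            findings[key] = "new"  # never seen during the baseline window
        elif count > ratio * baseline[key]:
            findings[key] = f"increased {baseline[key]} -> {count}"
    return findings


# Invented sample data for illustration only.
baseline = ["dns.internal"] * 10 + ["mail.example.com"] * 5
current = ["dns.internal"] * 11 + ["mail.example.com"] * 20 + ["203.0.113.50"] * 2
print(compare_to_baseline(baseline, current))
```

With this sample data, the previously unseen address 203.0.113.50 and the fourfold jump in mail.example.com traffic are flagged, while the small rise in dns.internal traffic stays under the threshold.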
Containment, eradication, and recovery
Slow down, remove, and recover from the exploitation of a vulnerability. The overarching goal is for the enterprise and organization to leave this phase operating at whatever was previously defined as normal. This phase depends heavily on the planning conducted during the first phase, because that planning outlines the methods by which recovery activities are executed. If these actions were not planned, or were planned poorly, during the first phase, then this phase will be an extreme struggle at a time of already heightened stress. Note that the middle phases of this life cycle are expected to loop back and forth as new information is identified and additional pieces of the adversary puzzle are put into place. The loop has a clear stopping point: either all key data points have been identified and recovered from, or all funds for the incident have been expended.
This phase depends on the thoroughness of the previous phase. Example activities include locking accounts, implementing additional firewall rules, and having users retake cybersecurity awareness training. Any activity that helps restore the network to its previous baseline, without the offending activity, could be included in this phase. Many of the organizational-level activities that occur in this phase will be outside the scope of a traditional threat hunting team.
Post-incident activity
This phase is intended to ensure that the risk has been removed and that the vulnerability is not exploited again. Within this phase, the organization attempts to learn from the incident that occurred and the recovery that took place. Unfortunately, by this point, the organization and its defenders are normally tired of the whole event and want to be done. This is the most overlooked and least often completed of the four phases, which helps explain why many organizations are repeatedly compromised in the same way. Everyone must learn from what was done correctly and incorrectly in order not to repeat the mistakes of the past. Failing to do so invites the same events to happen again, to the detriment of the organization.
Examples of activities in this phase include the incident response debrief for an intrusion and a review of patching policies. Many of the organizational-level activities that occur in this phase will be outside the scope of a traditional threat hunting team.
There are many activities that can occur in each phase of the incident response life cycle, with stakeholders taking part in some or all of the phases. The most important takeaway when working through this cycle is to understand which phase you are in and what you intend to accomplish. Follow the process and employ the correct teams and personnel as needed. If an adversary has just been discovered, do not jump ahead and begin removing the artifacts you find.
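To make the phase ordering concrete, the life cycle can be modeled as a small state machine that checks whether a sequence of phases follows the allowed transitions, which mirrors the advice to know which phase you are in and not jump ahead. This is a hypothetical sketch: the phase names follow NIST SP 800-61r2, but the `TRANSITIONS` table and the `is_valid_path` helper are our own illustration, not part of the standard.

```python
from enum import Enum


class Phase(Enum):
    PREPARATION = "Preparation"
    DETECTION_ANALYSIS = "Detection and Analysis"
    CONTAINMENT_ERADICATION_RECOVERY = "Containment, Eradication, and Recovery"
    POST_INCIDENT = "Post-Incident Activity"


# Hypothetical transition table: the two middle phases may loop back and
# forth as new information appears; post-incident lessons feed preparation.
TRANSITIONS = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT_ERADICATION_RECOVERY},
    Phase.CONTAINMENT_ERADICATION_RECOVERY: {
        Phase.DETECTION_ANALYSIS,  # new information reopens analysis
        Phase.POST_INCIDENT,       # everything has been recovered
    },
    Phase.POST_INCIDENT: {Phase.PREPARATION},
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


# Jumping straight from preparation to removal skips detection and analysis.
print(is_valid_path([Phase.PREPARATION, Phase.CONTAINMENT_ERADICATION_RECOVERY]))
```

Under these assumptions, a path such as Preparation, Detection and Analysis, Containment, Eradication, and Recovery, back to Detection and Analysis, and so on to Post-Incident Activity is accepted, while skipping a phase is rejected.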
Why is threat hunting important?
Reactive detection methods, such as using signatures of known malicious files (hashes) or monitoring for behaviors synonymous with an attack (heuristics), can fail for a number of reasons. Detection based on known hashes fails easily, because it is simple to change a known malicious file just enough to bypass the hash-based signatures used by standard and even advanced antivirus solutions. Any free hex editor can be used to change a single bit in a file and bypass this defense. Heuristics can also fail, because they rely on known bad behaviors while attempting to account for expected administrative behavior on the network. This does little against the new, unknown bad behaviors that are constantly evolving in threat actors' environments.
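The fragility of exact-hash signatures is easy to demonstrate: flipping a single bit in a file's contents produces a completely different SHA-256 digest, so a signature keyed on the original hash no longer matches. In this sketch, the sample bytes and the `known_bad_hashes` set are stand-ins; no real malware is involved.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for a "known malicious" file; the bytes are arbitrary.
original = b"MZ\x90\x00 pretend this is a known malicious executable"
known_bad_hashes = {sha256_hex(original)}  # a hash-based "signature database"

# Flip the lowest bit of the last byte, as a hex editor could.
modified = bytearray(original)
modified[-1] ^= 0x01
modified = bytes(modified)

print(sha256_hex(original) in known_bad_hashes)   # True: signature matches
print(sha256_hex(modified) in known_bad_hashes)   # False: one bit evades it
```

In practice, an attacker would change bytes that do not affect execution, such as padding or an unused string, but the effect on the hash is the same.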
Taking the opposite approach, whitelisting known good behavior and applications, is one method an enterprise can use to create a zero-trust environment. In practice, very few organizations can, or should, fully implement this type of construct. It is extremely resource-intensive to deploy across an enterprise and to keep up to date as software and people change. Even then, someone masquerading as a legitimate user and following that user's normal behavior could operate beneath the defense's thresholds.
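A deliberately simplified sketch of application whitelisting (allowlisting) shows where its overhead comes from: execution is permitted only if a file's hash appears in an approved set, so every legitimate update changes the hash and must be re-approved. The `approved` set and the `approve` and `may_execute` helpers are hypothetical; real products use richer rules (publisher certificates, paths), which this sketch does not model.

```python
import hashlib

# Hypothetical allowlist of SHA-256 digests for approved binaries.
approved = set()


def approve(data: bytes) -> None:
    """Add a binary's hash to the allowlist (an administrative action)."""
    approved.add(hashlib.sha256(data).hexdigest())


def may_execute(data: bytes) -> bool:
    """Default deny: only content whose hash was explicitly approved may run."""
    return hashlib.sha256(data).hexdigest() in approved


tool_v1 = b"approved admin tool, version 1"
tool_v2 = b"approved admin tool, version 2"  # a legitimate update

approve(tool_v1)
print(may_execute(tool_v1))  # True
print(may_execute(tool_v2))  # False until someone re-approves it
```

Note that the check says nothing about how an approved tool is used, which is why an adversary living off approved tools can still slip beneath this defense.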
A proactive detection method such as threat hunting doesn't wait for an alert and doesn't require the administrative overhead of whitelisting every approved action. Threat hunting takes into account the current vulnerabilities, environment, and processes in order to apply human expertise to the evidence. It allows an organization to apply a force multiplier to its cybersecurity processes by augmenting the automated and administered defenses.
Another reason threat hunting is important is that it brings a point of view (POV) to cybersecurity that is entirely different from the one normally found in a Security Operations Center (SOC). This POV sets aside the alarms and the tools that generate them. Threat hunting looks directly at the evidence on the endpoints to determine whether some activity was missed or is something the SOC tools have not yet been updated to detect.
While there are many different methods of detecting adversarial behavior on a network, they can all be put into one of two categories: reactive or proactive. Think of reactive detection like a building alarm that is triggered when a window is opened. Once it is triggered, security will investigate what happened and why that window was opened. Proactive detection, of which threat hunting is one method, does not wait for an alarm to go off. Using the same analogy, this would be a security guard who patrols the building looking for unlocked windows even though no alarms have gone off.
The following is a real-world example:
- Location: High-security facility.
- Reactive detection methods: Alarms on doors and windows; each door is automatically secured with a locking mechanism; entry is protected by a radio frequency identification (RFID) badge in/out system; motion detectors cover after-business hours and restricted/unoccupied spaces.
- Behavior (heuristics) tracking methods: Each individual is issued an RFID picture badge to scan into the facility and enter restricted spaces. Members have unique accounts to log in to systems that track what system or resource was accessed at a specific time.
- Proactive detection methods: Security guards will patrol the building and review access/personnel for abnormal or malicious activity and stop random individuals for security checks of bags and accesses. If anything appears out of the ordinary, the security guards have the authority to intervene and review the facts around the particular event before allowing it to continue further.
Without this proactive detection method employed across the building, any activity that mimics an insider or unknown threat would be almost impossible to detect.
True positive: An alert triggered by reactive defenses that is valid, in that it meets the intent of the signature or heuristic that triggered it; for example, an antivirus signature alert on a downloaded trojan.
False positive: An alert triggered by reactive defenses that is invalid, meaning that it does not meet the intent of the signature or heuristic that triggered it; for example, an intrusion prevention system firing on someone searching the internet for testmyids.com.
False negative: The absence of an alert from reactive defenses on abnormal or malicious system behavior or communications found during analysis; for example, an adversary emulating an administrator in order to successfully exfiltrate data from the network.
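These definitions depend on two facts about each event: whether a reactive defense alerted, and whether the underlying activity was actually malicious. A small sketch makes the mapping explicit, including the fourth combination, a true negative (no alert on benign activity), which the definitions above leave implicit. The `classify` function and its boolean inputs are our own illustration.

```python
def classify(alerted: bool, malicious: bool) -> str:
    """Map an (alert fired?, actually malicious?) pair onto the four detection outcomes."""
    if alerted and malicious:
        return "true positive"    # e.g., AV alert on a downloaded trojan
    if alerted and not malicious:
        return "false positive"   # e.g., IPS firing on a testmyids.com search
    if not alerted and malicious:
        return "false negative"   # e.g., adversary emulating an administrator
    return "true negative"        # benign activity, no alert


for alerted in (True, False):
    for malicious in (True, False):
        print(f"alerted={alerted}, malicious={malicious}: {classify(alerted, malicious)}")
```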
Application of detection levels
Incident response and SOC teams are usually concerned with keeping false positive rates low. Remember that false positives are alarms that are triggered even though nothing malicious actually occurred. A low false positive rate helps ensure that any alarm brought to a SOC analyst's attention is a true concern. This matters because evaluating and investigating false positives can be a massive drain on incident response or SOC resources, and investigating an alarm triggered by benign activity does nothing to improve network defenses. The trade-off for focusing on a low false positive rate is a higher rate of false negatives, due to the stricter requirements for alerts to trigger. This, in turn, means that a higher percentage of malicious activity will not trigger any alarms.
A threat hunter is concerned with the inverse of the SOC's requirements. When setting the bar for what is considered anomalous and requiring further investigation, the threat hunting team accepts a high false positive rate, which helps keep the corresponding false negative rate very low. A threat hunting team can accept a high false positive rate because the scope of its hunt is very narrow compared with the scope a SOC monitors on a day-to-day basis.
The preceding diagram depicts this trade-off between false negatives and false positives. For a business just getting into threat hunting, this could mean a paradigm shift for parts of the team in how they measure success on a daily basis. For example, consider an organization that uses the false positive rate as a measure of success. For daily defenses, this rate will normally be tuned low, enabling front-line cyber defenders to focus only on the things that truly matter and not waste time on dead ends. When the organization starts hunting and needs to measure its success, the false positive rate for the hunt team should be very high. Leadership looking at those statistics might have been trained to see this as a bad thing when, in fact, it is expected.
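The trade-off described in this section can be seen by applying different alert thresholds to the same set of scored events. In this sketch, each event carries a hypothetical anomaly score between 0 and 1 and a ground-truth label; the scores are invented for illustration, not measured data, and the `rates` function is our own. The false positive rate is taken as FP / (FP + TN), the share of benign events that alert, and the false negative rate as FN / (FN + TP), the share of malicious events that are missed.

```python
def rates(events, threshold):
    """Return (false_positive_rate, false_negative_rate) for an alert threshold.

    events: list of (score, is_malicious) pairs; an alert fires when score >= threshold.
    """
    fp = sum(1 for s, bad in events if s >= threshold and not bad)
    tn = sum(1 for s, bad in events if s < threshold and not bad)
    fn = sum(1 for s, bad in events if s < threshold and bad)
    tp = sum(1 for s, bad in events if s >= threshold and bad)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


# Invented example data: (anomaly score, actually malicious?)
events = [
    (0.05, False), (0.10, False), (0.20, False), (0.30, False), (0.35, True),
    (0.40, False), (0.55, False), (0.60, True), (0.70, False), (0.90, True),
]

for label, threshold in [("SOC-style (high threshold)", 0.8),
                         ("hunt-style (low threshold)", 0.25)]:
    fpr, fnr = rates(events, threshold)
    print(f"{label}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

With these made-up numbers, the high threshold yields a false positive rate of 0.00 but misses two of the three malicious events (FNR 0.67), while the low threshold flags four of the seven benign events (FPR 0.57) and misses none (FNR 0.00).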
Book layout
This book is laid out in a manner intended to help you better prepare for and understand the contents of each chapter. Each chapter will have five sections:
- Introduction and learning outcomes: This area will introduce you to the main focus of the chapter, as well as outlining the expected high-level areas that you should remember as you review the material. Each learning objective will start with one of the following three words:
  - If the objective starts with Identify, then the intention is just for you to have a higher-level understanding of the topic. You do not need to worry about having an expert-level understanding of that material.
  - If the objective starts with Comprehend, then the intention is for you to be able to apply the topic and extrapolate how it would fit into a given scenario.
  - If the objective starts with Discuss, then the intention is for you to be able to have an educated discussion with another knowledgeable person on the topic. Not only would you fully understand the concept, but you would also be able to apply it in real time to various scenarios.
- Topic focus: This area is the main focus of the chapter and will provide all of the details needed for you to understand the topic.
- Scenarios: This area is broken up into two fictional subscenarios, one focused on an internal hunt team and one focused on an external hunt team. The internal hunt team is one that exists full time within the scenario's organization. The external hunt team is one that was contracted by the scenario's organization to perform a specific threat hunt. These scenarios build upon the previous chapter's scenario.
- Summary: This area will provide you with a summary of the chapter and any higher-level takeaways that you should continue to focus on.
- Review questions: This area will provide you with a chance to test your understanding of the material through a few questions or scenarios aimed at reinforcing the learning objectives stated at the beginning of the chapter.
This structure should help you go through and understand the content of each chapter, and the book at large, in the most efficient manner.
Summary
In review, understanding the difference between threat hunting and other forms of cyber defense will be critical for your journey forward. Most cybersecurity defenses are reactive in nature, in that they act as an alarm that is triggered by a known bad event. Unlike many standard defense mechanisms found across networks, cyber threat hunting is proactive, in that it is executed without any warning or indication of malicious activity. Even so, cyber threat hunting can still be part of the incident response life cycle.
It does so by adding a layer of dynamic, proactive security on top of the standard reactive defense mechanisms commonly employed by enterprises. This proactive defense concept is not new and can be found in many organizations' physical security elements. One of the main differences is that day-to-day defenders thrive in an environment with a low false positive rate so as not to waste resources, whereas threat hunters want a low false negative rate to ensure that nothing slips past their investigation.
Without proactive defenses, there is a distinct limit to what can be achieved in the realm of security. Many advanced techniques and adversaries could easily slip past reactive defenses and wreak havoc before being detected.
Now that we know what cyber threat hunting is, in the next chapter we will look at the whys and hows of identifying what is needed for a cyber threat hunt.
Questions
Answer the following questions to check your knowledge of this chapter:
- (True or false) Cyber threat hunting is reactive in nature.
- The NIST incident response life cycle is made up of which four stages?
  A. Preparation; Detection and Analysis; Re-Baselining Systems; Policy Alignment
  B. Planning; Preparation; Detection; Recovery
  C. Preparation; Detection and Analysis; Containment, Eradication, and Recovery; Post-Incident Activity
  D. Planning; Detection; Containment; Post-Incident Activity
- Threat hunting is mainly a part of which phase of the NIST incident response life cycle?
- (True or false) Threat hunting is unique to cyber defense.
- (Insert the correct answer) Steady-state defenses such as incident response will normally want low ______ ______ rates. Threat hunters will normally want low ______ ______ rates.
- False positive
- True positive
- False negative
- True negative
The answers to the review questions are as follows:
- False. Cyber threat hunting is proactive as the hunter does not wait for an alarm or alert before searching for malicious behavior.
- C. See NIST SP 800-61r2 incident response life cycle.
- Detection and Analysis. See NIST SP 800-61r2 incident response life cycle.
- False. The threat hunting concept is used in many different fields.
- False positive; False negative. See the Application of detection levels section of this chapter.