In order to perform threat hunting, it is especially important to have at least a basic understanding of the main cyber threat intelligence concepts. The objective of this chapter is to help you become familiar with the concepts and terminology that are going to be used throughout this book.
In this chapter, we are going to cover the following topics:
- Cyber threat intelligence
- The intelligence cycle
- Defining your IR
- The collection process
- Processing and exploitation
- Bias and analysis
Let's get started!
Cyber threat intelligence
It is not the goal of this book to deep dive into complex issues surrounding the different definitions of intelligence and the multiple aspects of intelligence theory. This chapter is meant to be an introduction to the intelligence process so that you understand what cyber threat intelligence (CTI) is and how it is done, before we cover CTI-driven and data-driven threat hunting. If you think you are well-versed in this matter, you can proceed straight to the next chapter.
If we want to discuss the roots of intelligence discipline, we could probably go back as far as the 19th century, when the first military intelligence departments were founded. We could even argue that the practice of intelligence is as old as warfare itself, and that the history of humanity is full of espionage stories as a result of needing to have the upper hand over the enemy.
It has been stated over and over that in order to have a military advantage, we must be capable not only of understanding ourselves, but also the enemy: how do they think? How many resources do they have? What forces do they have? What is their ultimate goal?
This military need, especially during the two World Wars, led to the growth and evolution of the intelligence field as we know it. Several books and papers have been written about the craft of intelligence, and I sincerely encourage anyone interested in the matter to visit the Intelligence Literature section of the CIA Library (https://www.cia.gov/library/intelligence-literature) where you can find several interesting lectures on the subject.
The definition of intelligence has been under academic discussion among people better-versed in the matter than me for more than two decades. Unfortunately, there is no consensus over the definition of the intelligence practice. In fact, there are those who defend the craft of intelligence as something that can be described, but not defined. In this book, we are going to detach ourselves from such pessimistic views and offer the definition proposed by Alan Breakspear in his paper A New Definition of Intelligence (2012) as a reference:
Based on this, we are going to define CTI as a cybersecurity discipline that attempts to be a proactive measure of computer and network security, which nourishes itself from the traditional intelligence theory.
CTI focuses on data collection and information analysis so that we can gain a better understanding of the threats facing an organization. This helps us protect its assets. The objective of any CTI analyst is to produce and deliver relevant, accurate, and timely curated information – that is, intelligence – so that the recipient organization can learn how to protect itself from a potential threat.
The sum of related data generates information that, through analysis, is transformed into intelligence. However, as we stated previously, intelligence only has value if it is relevant, accurate, and, most importantly, if it is delivered on time. The purpose of intelligence is to serve those responsible for making decisions so they can do so in an informed way. There is no use for this if it is not delivered before the decision must be made.
This means that when we talk about intelligence, we are not only referring to the product itself, but also to all the processes that make the product possible. We will cover this in great detail in this chapter.
Finally, we can classify intelligence according to the time that's been dedicated to studying a specific subject, either by distinguishing between long-term and short-term intelligence, or according to its form; that is, strategic, tactical, or operational intelligence. In this case, the intelligence that's delivered will vary, depending on which recipients are going to receive it.
Strategic intelligence informs the top decision makers – usually called the C-suite: CEO, CFO, COO, CIO, CSO, CISO – and any other chief executive to whom the information could be relevant. The intelligence that's delivered at this level must help the decision makers understand the threat they are up against. The decision makers should get a proper sense of what the main threat capabilities and motivations are (disruption, theft of proprietary information, financial gain, and so on), their probability of being a target, and the possible consequences of this.
Operational intelligence is given to those making day-to-day decisions; that is, those who are in charge of defining priorities and allocating resources. To complete these tasks more efficiently, the intelligence team should provide them with information regarding which groups may target the organization and which ones have been the most recently active.
The deliverable might include CVEs and information regarding the tactic used by, as well as the techniques of, the possible threat. For example, this could be used to assess the urgency to patch certain systems or to add new security layers that will hinder access to them, among other things.
Tactical intelligence should be delivered to those in need of instantaneous information. The recipients should have a complete understanding of what adversary behaviors they should be paying attention to in order to identify the threats that could target the organization.
In this case, the deliverable may include IP addresses, domains and URLs, hashes, registry keys, email artifacts, and more. For example, these could be used to provide context to an alert and evaluate if it is worth involving the incident response (IR) team.
So far, we have defined the concepts surrounding intelligence, CTI, and intelligence levels, but what do we understand by the term threat in the cyber realm?
We define a threat as any circumstance or event that has the potential to exploit vulnerabilities and negatively impact operations, assets (including information and information systems), individuals, and other organizations or societies of an entity.
We could say that the main areas of interest for cyber threat intelligence are cybercrime, cyberterrorism, hacktivism, and cyberespionage. All of these can be roughly defined as organized groups that use technology to infiltrate public and private organizations and governments to steal proprietary information or cause damage to their assets. However, this doesn't mean that other types of threats, such as criminals or insiders, are outside the scope of interest.
Sometimes, the terms threat actor and advanced persistent threat (APT) are used interchangeably, but the truth is that although we can say that every APT is a threat actor, not every threat actor is advanced or persistent. What distinguishes an APT from a threat actor is their high level of operational security (OPSEC), combined with a low detection rate and a high level of success. Keep in mind that this might not apply perfectly to all APT groups. For example, there are some groups that feed on the propaganda from the attack, so they put less effort into not being identified.
In order to generate valuable intelligence, it is important to work with clear and defined concepts so that you can structure the data and generate information. It is not mandatory to choose an existing terminology, but the MITRE Corporation has developed the Structured Threat Information Expression (STIX) (https://oasis-open.github.io/cti-documentation/) in order to facilitate the standardization and sharing of threat intelligence.
So, if we follow the STIX definition (https://stixproject.github.io/data-model/), threat actors are "actual individuals, groups, or organizations believed to be operating with malicious intent." Any threat actor can be defined by any of the following:
- Its type (https://stixproject.github.io/data-model/1.1/stixVocabs/ThreatActorTypeVocab-1.0/)
- Its motivations (https://stixproject.github.io/data-model/1.1/stixVocabs/MotivationVocab-1.1/)
- Its sophistication level (https://stixproject.github.io/data-model/1.1/stixVocabs/ThreatActorSophisticationVocab-1.0/)
- Its intended effect (https://stixproject.github.io/data-model/1.1/stixVocabs/IntendedEffectVocab-1.0/)
- The campaigns it was involved in
- Its Tactics, Techniques, and Procedures (TTPs): https://stixproject.github.io/data-model/1.2/ttp/TTPType/)
In summary, cyber threat intelligence is a tool that should be used to gain better insight into a threat actor's interests and capabilities. It should be used to inform all the teams involved in securing and directing the organization.
To generate good intelligence, it is necessary to define the right set of requirements for understanding the needs of the organization. Once this first step has been accomplished, we can prioritize the threats the team should be focusing on and start monitoring those threat actors that might have the organization among its desired targets. Avoiding the collection of unnecessary data will help us allocate more time and resources, as well as set our primary focus on the threats that are more imminent to the organization.
As Katie Nickels stated in her talk The Cycle of Cyber Threat Intelligence (2019, https://www.youtube.com/watch?v=J7e74QLVxCk), the CTI team is going to be influenced by where they've been placed, so having them at a central position in the structure of the organization will help the team actually support different functions. This can be visualized as follows:
We will now have a look at the intelligence cycle.
The intelligence cycle
Before we dive into the theory of the intelligence cycle, I believe it is worth showing the relationship between data, knowledge, and intelligence practice through what is known as a knowledge pyramid. In it, we can see how the facts, through measurement, are transformed into data that we can extract information from when processing it. When analyzed together, it can be transformed into knowledge. This knowledge interacts with our own experience and forms the basis of what we call wisdom. It is this ultimate wisdom that we rely on for decision-making.
As shown in the following pyramid, we can intertwine this knowledge pyramid with the processes that are part of what is widely known as the intelligence cycle:
In short, here, we can deduce that an intelligence analyst must process data to transform it into wisdom (intelligence), which in the last instance will lead to an action (decision).
Traditionally, the intelligence process is understood as a six-phase cycle: planning and targeting, preparation and collection, processing and exploitation, analysis and production, dissemination and integration, and evaluation and feedback. Each of these phases presents its own particularities and challenges:
We will now look at each of these phases in detail.
Planning and targeting
In this stage of the process, it is important to identify the key assets of the organization, why the organization might be an interesting target, and what the security concerns of those in charge of making decisions are.
It's also important to identify the potential threats that exist and what mitigations can be prioritized (through a process known as threat modeling), as well as establishing a collection framework and collection priorities.
Preparation and collection
It is important to keep in mind that it's impossible to answer all the questions we may have and meet all our IR.
Processing and exploitation
Once the planned data has been collected, the next step is to process it to generate information. The processing method is usually not perfect, and the amount of data that the intelligence team is able to process is always lower than the amount of data that has been gathered. All data that does not get processed is the same as data not collected at all. It's lost intelligence.
Analysis and production
The information that's been gathered so far must be analyzed in order to generate intelligence. There are several techniques that are used for intelligence analysis and to prevent the analyst's bias. The cyber threat intelligence analyst must learn how to filter their personal views and opinions to carry out the analysis.
Dissemination and integration
In this stage, the intelligence that's been produced is distributed to the necessary sectors. Before distribution, the analysts have to consider a variety of things, such as what the most pressing issues are among the intelligence that's been collected, who should receive the report, how urgent the intelligence is or how much detail the recipient needs, if the report should include preventive recommendations, and so on. Sometimes, different reports may need to be created and directed to different audiences.
Evaluation and feedback
This is the final stage of the process and probably the most difficult to achieve, mainly due to the usual lack of feedback from intelligence recipients. Establishing good mechanisms to get feedback helps intelligence producers evaluate the effectiveness of the intelligence that's been generated before they repeat the process over and over, without making the necessary adjustments that will make the intelligence that's produced more relevant to the recipients. As intelligence producers, we want our intelligence to be relevant – we want our intelligence to help the decision makers to make informed decisions. Without gathering the appropriate feedback, we won't know if we are achieving our goal, and we won't know which steps to take to improve our product.
This model has been widely accepted and adopted, especially in the United States of America and among those who follow their academic discussions in an attempt to replicate its methods. Despite this wide acceptance, there have been some vocal criticisms against this model.
Some have pointed out that the current model depends excessively on the data that's been collected, and also that technological advances have allowed us to collect massive amounts of it. This endless harvesting process and the capacity to better represent the data that's been collected leads us to believe that this process is enough for us to understand what is happening.
There have been alternative proposals for the intelligence cycle. For anyone interested in studying more on this matter, there is a particularly interesting contribution that's been published by Davies, Gustafson and Ridgen (2013) titled The Intelligence Cycle is Dead, Long Live the Intelligence Cycle: Rethinking Intelligence Fundamentals for a New Intelligence Doctrine (https://bura.brunel.ac.uk/bitstream/2438/11901/3/Fulltext.pdf), in which what has been labeled the UK Intelligence Cycle is described in detail:
Defining your IR
The first stage in the intelligence cycle is to identify the information that the decision-maker needs. These requirements should be the driving factor in the intelligence team's collection, processing, and analysis phases.
The main problem that occurs when identifying these IRs is that, usually, the decision makers do not know what information they want until they need it. Moreover, other issues, such as resource and budget shortcuts or sociopolitical events, may arise, as well as the difficult task of identifying and satisfying the IRs.
Posing and trying to answer a series of questions, not only the ones stated here as examples, could be a good starting point when you're trying to identify the PIRs (P for priority, referring to those that are more critical) and the IRs of an organization.
When working out your IR, ask yourself the following questions:
What's the mission of my organization?
What threat actors are interested in my organization's industry?
What threat actors are known for targeting my area of operation?
What threat actors could target my organization in order to reach another company I supply a service for?
Has my organization been targeted previously? If so, what type of threat actor did it? What were its motivations?
What asset does my organization need to protect?
What type of exploits should my organization be looking out for?
There are four criteria to keep in mind when validating a PIR: the specificity and the necessity of the question, the feasibility of the collection, and the timeliness of the intelligence that would be generated from it. If the requirement meets all these criteria, we can start the collection process around it. In the next section, we will cover this in detail.
The collection process
Once the IR have been defined, we can proceed with collecting the raw data we need to fulfill them. For this process, we can consult two types of sources: internal sources (such as networks and endpoints) and external sources (such as blogs, threat intelligence feeds, threat reports, public databases, forums, and so on).
The most effective way to carry on the collection process is to use a collection management framework (CMF). Using a CMF allows you to identify data sources and easily track the type of information you are gathering for each. It can also be of use to rate the data that's been obtained from the source, including how long that data has been stored and to track how trustworthy and complete the source is. It is advised that you use the CMF to track not only the external sources, but also the internal ones. Here's an example of what one would look like:
Dragos analysts Lee, Miller, and Stacey wrote an interesting paper (https://dragos.com/wp-content/uploads/CMF_For_ICS.pdf?hsCtaTracking=1b2b0c29-2196-4ebd-a68c-5099dea41ff6|27c19e1c-0374-490d-92f9-b9dcf071f9b5) about using a CMF to explore different methodologies and examples. Another great resource available that can be used to design an advanced collection process is the Collection Management Implementation Framework (https://studylib.net/doc/13115770/collection-management-implementation-framework-what-does-...), designed by the Software Engineering Institute.
Indicators of compromise
So far, we've talked about finding the IR and how to use a CMF. But what data are we going to collect?
An indicator of compromise (IOC), as the name suggests, is an artifact that's been observed in a network or in an operating system that, with high confidence, indicates that it has been compromised. This forensic data is used to understand what happened, but if collected properly, it can also be used to prevent or detect ongoing breaches.
Typical IOCs may include hashes of malicious files, URLs, domains, IPs, paths, filenames, Registry keys, and malware files themselves.
It is important to remember that, in order to be really useful, it is necessary to provide context for the IOCs that have been collected. Here, we can follow the mantra quality over quantity – a huge amount of IOCs does not always mean better data.
Malware, short for malicious software, is not everything, but it can be an incredibly valuable source of information. Before we look at the different types of malware, it is important for us to understand how malware typically works. Here, we need to introduce two concepts: the dropper and the Command and Control (C2 or C2C).
A dropper is a special type of software designed to install a piece of malware. We will sometimes talk about single-staged and two-stage droppers, depending on whether or not the malware code is contained in the dropper. When the malicious code is not contained within the dropper, it will be downloaded to the victim's device from an external source. Some security researchers may call this two-stage type of dropper a downloader, while referring to a two-stage dropper as the one that requires further steps to put different pieces of code together (by decompressing or executing different pieces of code) to build a final piece of malware.
The Command and Control (C2) is an attacker-controlled computer server that's used to send commands to the malware running in the victim's systems. It's the way the malware communicates with its "owner." There are multiple ways that a C2 can be established and, depending on the malware's capabilities, the complexity of the commands and the communication that can be established may vary. For example, threat actors have been seen using cloud-based services, emails, blog comments, GitHub repositories, and DNS queries, among other things, for C2 communication.
There are different types of malware according to their capabilities, and sometimes, one malware piece can be classified as more than one type. The following is a list of the most common ones:
- Worm: An autonomous program capable of replicating and propagating itself through the network.
- Trojan: A program that appears to serve a designated purpose, but also has a hidden malicious capability to bypass security mechanisms, thus abusing the authorization that's been given to it.
- Rootkit: A set of software tools with administrator privileges, designed to hide the presence of other tools and hide their activities.
- Ransomware: A computer program designed to deny access to a system or its information until a ransom has been paid.
- Keylogger: Software or hardware that records keyboard events without the user's knowledge.
- Adware: Malware that offers the user specific advertising.
- Spyware: Software that has been installed onto a system without the knowledge of the owner or the user, with the intention of gathering information about him/her and monitoring his/her activity.
- Scareware: Malware that tricks computer users into visiting compromised websites.
- Backdoor: The method by which someone can obtain administrator user access in a computer system, a network, or a software application.
- Wiper: Malware that erases the hard drive of the computer it infects.
- Exploit kit: A package that's used to manage a collection of exploits that could use malware as a payload. When a victim visits a compromised website, it evaluates the vulnerabilities in the victim's system in order to exploit certain vulnerabilities.
A malware family references a group of malicious software with common characteristics and, most likely, the same author. Sometimes, a malware family can be directly related to a specific threat actor. Sometimes, malware (or a tool) is shared among different groups. This happens a lot with open source malware tools that are publicly available. Leveraging them helps the adversary disguise its identity.
Now let's take a quick look to how we can collect data around pieces of malware.
Using public sources for collection – OSINT
Open Source Intelligence (OSINT) is the process of collecting publicly available data. The most common sources that come to mind when talking about OSINT are social media, blogs, news, and the dark web. Essentially, any data that's made publicly available can be used for OSINT purposes.
There are many great resources for someone looking to start collecting information: VirusTotal (https://www.virustotal.com/), CCSS Forum (https://www.ccssforum.org/), and URLHaus (https://urlhaus.abuse.ch/) are great places to get started with the collection process.
Also, take a look at OSINTCurio.us (https://osintcurio.us/) to learn more about OSINT resources and techniques.
A honeypot is a decoy system that imitates possible targets of attacks. A honeypot can be set up to detect, deflect, or counteract an attacker. All traffic that's received is considered malicious and every interaction with the honeypot can be used to study the attacker's techniques.
There are many types of honeypots (an interesting list can be found here: https://hack2interesting.com/honeypots-lets-collect-it-all/), but they are mostly divided into three categories: low interaction, medium interaction, and high interaction.
Low interaction honeypots simulate the transport layer and provide very limited access to the operating system. Medium interaction honeypots simulate the application layer in order to lure the attacker into sending the payload. Finally, high interaction honeypots usually involve real operating systems and applications. These ones are better for uncovering the abuse of unknown vulnerabilities.
Malware analysis and sandboxing
Static malware analysis refers to analyzing the software that's used without executing it. Reverse engineering or reversing is a form of static malware analysis and is performed using a disassembler such as IDA or the more recent NSA tool, Ghidra, among others.
Dynamic malware analysis is performed by observing the behavior of the malware piece once it's been executed. This type of analysis is usually performed in a controlled environment to avoid infecting production systems.
In the context of malware analysis, a sandbox is an isolated and controlled environment used to dynamically analyze pieces of malware automatically. In a sandbox, the suspected malware piece is executed and its behavior is recorded.
Of course, things are not always this simple, and malware developers implement techniques to prevent the malware from being sandboxed. At the same time, security researchers develop their own techniques to bypass the threat actor's antisandbox techniques. Despite this chase of cat and mouse, sandboxing systems are still a crucial part of the malware analysis process.
There are some great online sandboxing solutions, such as Any Run (any.run) and Hybrid Analysis (https://www.hybrid-analysis.com/). Cuckoo Sandbox (https://cuckoosandbox.org/) is an open source and offline sandboxing system for Windows, Linux, macOS, and Android.
Processing and exploitation
Once the data has been collected, it must be processed and exploited so that it can be converted into intelligence. The IOCs must be provided with context, and their relevance and reliability must be assessed.
We are going to quickly review three of the most commonly used intelligence frameworks: the Cyber Kill Chain®, the Diamond Model, and the MITRE ATT&CK™ Framework. The latter has a full chapter dedicated to it, Chapter 4, Mapping the Adversary.
The Cyber Kill Chain®
There are seven different steps:
- Reconnaissance: Getting to know the victim using non-invasive techniques.
- Weaponization: Generating the malicious payload that is going to be delivered.
- Delivery: Delivering the weaponized artifact.
- Exploitation: Achieving code execution on the victim's system through the exploitation of a vulnerability.
- Installation: Installing the final malware piece.
- Command and Control (C2): Establishing a channel to communicate with the malware on the victim's system.
- Actions on objectives: With full access and communication, the attacker achieves their goal.
This model has been criticized for not being good enough to describe the way some modern attacks work, but at the same time, it has been praised for delimiting the points at which an attack can be stopped:
The Diamond Model
The Diamond Model provides us with a simple way to track breach intrusions since it helps us establish the atomic elements involved in them. It comprises four main features: adversary, infrastructure, capability, and victim. These features are connected by the sociopolitical and technical axes:
We will now have a look at the MITRE ATT&CK™ Framework.
MITRE ATT&CK™ Framework
The MITRE ATT&CK™ Framework is a descriptive model used to label and study the activities that a threat actor is capable of carrying out in order to get a foothold and operate inside an enterprise environment, a cloud environment, smartphones, or even industrial control systems.
The magic behind the ATT&CK™ Framework is that it provides a common taxonomy for the cybersecurity community to describe the adversary's behavior. It works as a common language that both offensive and defensive researchers can use to better understand each other and to better communicate with people not specialized in the field.
On top of that, you not only can use it as you see fit, but you can also build on top of it, creating your own set of tactics, techniques, and procedures (TTPs).
12 tactics are used to encompass different sets of techniques. Each tactic represents a tactical goal; that is, the reason why the threat actor is showing a specific behavior. Each of these tactics is composed of a set of techniques and sub-techniques that describe specific threat actor behaviors.
The procedure is the specific way in which a threat actor implements a specific technique or sub-technique. One procedure can be expanded into multiple techniques and sub-techniques:
We will now have a look at bias and analysis.
Bias and analysis
Once all the necessary information has been processed, it is time to make sense of it; that is, search for the security issues and deliver this intelligence to the different strategic levels meeting the IR that were identified during the planning step.
A lot has been written about how intelligence analysis should be done, especially in excellent books such as Structured Analytic Techniques for Intelligence Analysis (Heuer and Pherson, 2014), Critical Thinking for Strategic Intelligence (Pherson and Pherson, 2016), and Psychology of Intelligence Analysis (Heuer, 1999), among many others. These books employ many metaphors to describe the process of intelligence analysis.
My personal favorite is the one that compares the art of intelligence analysis with the art of mosaics: intelligence analysis is like trying to put the pieces of a mosaic together in which the pattern is not clear and the pieces continue to change in size, shape, and color.
One thing that an intelligence analyst cannot forget is that part of the practice is to challenge their own preconceptions and prejudices ceaselessly. Avoid confirmation bias, not to merely transmit the collected data, but to not fall for mirror imaging, clientelism, layering, and linear thinking. You should never influence the analysis so that it suits your needs or views. There are many techniques that can be used to mitigate analyst bias.
Some common traits are used to define a good intelligence analyst: he or she must have specific knowledge in more than one field; he or she must have a good spoken and written expression; and, most important of all, he or she must have the ability to synthesize the background of a situation almost intuitively.
In conclusion, we can close this chapter with the asseveration that in order to generate effective and relevant intelligence, there has to be a continuous intelligence process in place, with information from both internal and external sources being continually collected, processed, and analyzed.
This analysis must be tackled from different angles and by people with different perspectives and backgrounds in order to minimize the risk of falling into our own cognitive biases.
In addition, establishing good mechanisms for both disseminating quality and relevant intelligence reports, as well as getting feedback from the recipients, is key to enriching and improving this process.
In this chapter, we covered the definitions of cyber threat intelligence (CTI) and advanced persistent threats (APTs). We reviewed each of the steps involved in the intelligence cycle and provided an overview of how to carry out data collection and processing. Finally, we closed this chapter by looking at one of the main challenges that intelligence analysts face: analyst bias.
In the next chapter, we will introduce the concept of threat hunting and the different methodologies and approaches we can follow.