Incident Response Process
In the last chapter, you learned about the three pillars that sustain your security posture, and two of them (detection and response) are directly correlated with the incident response (IR) process. To enhance the foundation of your security posture, you need to have a solid incident response process. This process will dictate how to handle security incidents and rapidly respond to them. Many companies do have an incident response process in place, but they fail to constantly review it to incorporate lessons learned from previous incidents, and on top of that, many are not prepared to handle security incidents in a cloud environment.
In this chapter, we’re going to be covering the following topics:
- The incident response process
- Handling an incident
- Post-incident activity
- Considerations regarding IR in the cloud
First, we will cover the incident response process.
The incident response process
There are many industry standards, recommendations, and best practices that can help you to create your own incident response process. You can still use those as a reference to make sure you cover all the relevant phases for your type of business. The one that we are going to use as a reference in this book is the Computer Security Incident Handling Guide, NIST Special Publication 800-61 Revision 2. Regardless of the one you select to use as a reference, make sure to adapt it to your own business requirements. Most of the time, in security, the concept of “one size fits all” doesn’t apply; the intent is always to leverage well-known standards and best practices and apply them to your own context. It is important to retain the flexibility to accommodate your business needs in order to provide a better experience when operationalizing the process.
While flexibility is key for adapting incident responses to suit individual needs and requirements, it is still invaluable to understand the commonalities between different responses. There are a number of reasons to have an IR process in place, and there are certain steps that will help with both creating an incident response process and putting together an effective incident response team. Additionally, every incident has an incident life cycle, which can be examined to better understand why the incident has occurred, and how to prevent similar issues in the future. We will discuss each of these in more depth to give you a deeper understanding of how to form your own incident response.
Reasons to have an IR process in place
Before we dive into more details about the process itself, it is important to be aware of the terminology that is used, and what the final goal is when using IR as part of enhancing your security posture. Let’s use a fictitious company to illustrate why this is important.
The following diagram has a timeline of events. These events lead the help desk to escalate the issue and start the incident response process:
Figure 2.1: Events timeline leading to escalation and the beginning of the incident response process
1. While the diagram says that the system was working properly, it is important to learn from this event. What is considered normal? Do you have a baseline that can give you evidence that the system was running properly? Are you sure there is no evidence of compromise before the email?
2. Phishing emails are still one of the most common methods used by cybercriminals to entice users to click on a link that leads to a malicious/compromised site. While technical security controls must be in place to detect and filter these types of attacks, users must be taught how to identify a phishing email.
3. Many of the traditional sensors (IDS/IPS) used nowadays are not able to identify infiltration and lateral movement. To enhance your security posture, you will need to improve your technical security controls and reduce the gap between infection and detection.
4. This is already part of the collateral damage done by this attack. Credentials were compromised, and the user was having trouble authenticating. This sometimes happens because the attacker has already changed the user’s password. There should be technical security controls in place that enable IT to reset the user’s password and, at the same time, enforce multifactor authentication.
5. Not every single incident is security-related; it is important for the help desk to perform their initial troubleshooting to isolate the issue. If the technical security controls in place (step 3) were able to identify the attack, or at least provide some evidence of suspicious activity, the help desk wouldn’t have to troubleshoot the issue; it could just directly follow the incident response process.
6. At this point in time, the help desk is doing what it is supposed to do: collecting evidence that the system was compromised and escalating the issue. The help desk should obtain as much information as possible about the suspicious activity to justify the reason why they believe that this is a security-related incident.
7. At this point, the IR process takes over and follows its own path, which may vary according to the company, industry segment, and standard. It is important to document every single step of the process and, after the incident is resolved, incorporate the lessons learned with the aim of enhancing the overall security posture.
Table 2.1: Security considerations for different steps in an events timeline
While there is much room for improvement in the previous scenario, there is something that exists in this fictitious company that many other companies around the world are missing: the incident response process itself. If it were not for the incident response process in place, support professionals would exhaust their troubleshooting efforts by focusing on infrastructure-related issues. Companies that have a good security posture will have an incident response process in place. They will also ensure that the following guidelines are adhered to:
- All IT personnel should be trained to know how to handle a security incident.
- All users should be trained to know the core fundamentals of security in order to perform their job more safely, which will help avoid getting infected.
- There should be integration between their help desk system and the incident response team for data sharing.
This scenario could have some variations that could introduce different challenges to overcome. One variation would be if no indicator of compromise (IoC) was found in step 6. In this case, the help desk could easily continue troubleshooting the issue. What if at some point “things” started to work normally again? Is this even possible? Yes, it is! When an IoC is not found, it doesn’t mean the environment is clean; now you need to switch gears and start looking for an indicator of attack (IoA), which involves looking for evidence that can show the intent of an attacker. When investigating a case, you may find many IoAs, which may or may not lead to an IoC. The point is, understanding the IoA will lead you to better understand how an attack was executed, and how you can protect against it.
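The IoC/IoA distinction above can be made concrete with a small sketch. The following is an illustrative (not production) classifier: the indicator values, field names, and behavioral patterns are all made up for the example, and a real deployment would pull indicators from a threat intelligence feed.

```python
# Hypothetical sketch: matching observed telemetry against known IoCs,
# and falling back to IoA-style behavioral clues when no IoC matches.
# All indicator values below are fabricated for illustration.

KNOWN_IOCS = {
    "file_hash": {"2e65d1f1d2bbd3f32cbbca52727c9e0c"},  # known-bad MD5 (made up)
    "c2_ip": {"203.0.113.45"},                          # known C2 address (made up)
}

IOA_PATTERNS = [
    "powershell.exe -enc",   # encoded PowerShell command line
    "net user /add",         # local account creation
]

def classify_event(event: dict) -> str:
    """Return 'ioc', 'ioa', or 'clean' for a single telemetry event."""
    if event.get("file_hash") in KNOWN_IOCS["file_hash"]:
        return "ioc"
    if event.get("dest_ip") in KNOWN_IOCS["c2_ip"]:
        return "ioc"
    cmdline = event.get("cmdline", "").lower()
    if any(pattern in cmdline for pattern in IOA_PATTERNS):
        return "ioa"
    return "clean"

events = [
    {"cmdline": "PowerShell.exe -enc SQBFAFgA", "dest_ip": "198.51.100.7"},
    {"file_hash": "2e65d1f1d2bbd3f32cbbca52727c9e0c"},
    {"cmdline": "notepad.exe report.txt"},
]
print([classify_event(e) for e in events])  # ['ioa', 'ioc', 'clean']
```

The second event shows the point made in the text: an IoA (the encoded PowerShell command) can be present even when no IoC matches, and it is what reveals the attacker's intent.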
When an attacker infiltrates the network, they usually want to stay invisible, moving laterally from one host to another, compromising multiple systems, and trying to escalate privileges by compromising an account with administrative-level privileges. That’s the reason why it is so important to have good sensors not only in the network but also in the host itself. With good sensors in place, you would be able to not only detect the attack quickly but also identify potential scenarios that could lead to an imminent threat of violation.
In addition to all the factors that were just mentioned, some companies will soon realize that they must have an incident response process in place to be compliant with regulations applicable to the industry in which they operate. For example, the Federal Information Security Management Act (FISMA) of 2002 requires federal agencies to have procedures in place to detect, report, and respond to security incidents.
Creating an incident response process
The following diagram shows the foundational areas of the incident response process:
Figure 2.2: The incident response process and its foundational areas of Objective, Scope, Definition/Terminology, Roles and responsibilities, and Priorities/Severity Level
The first step to create your incident response process is to establish the objective—in other words, to answer the question: what’s the purpose of this process? While this might appear redundant as the name seems to be self-explanatory, it is important that you are very clear as to the purpose of the process so that everyone is aware of what this process is trying to accomplish.
Although the incident response process usually has a company-wide scope, it can also have a departmental scope in some scenarios. For this reason, it is important that you define whether this is a company-wide process or not.
Along with the definition, companies must create their own glossary with definitions of the terminology used. Different industries will have different sets of terminologies, and if these terminologies are relevant to a security incident, they must be documented.
In an incident response process, the roles and responsibilities are critical. Without the proper level of authority, the entire process is at risk. The importance of the level of authority in an incident response is evident when you consider the question: Who has the authority to confiscate a computer in order to perform further investigation? By defining the users or groups that have this level of authority, you are ensuring that the entire company is aware of this, and if an incident occurs, they will not question the group that is enforcing the policy.
Another important question to answer is regarding the severity of an incident. What defines a critical incident? The criticality will lead to resource distribution, which brings another question: How are you going to distribute your manpower when an incident occurs? Should you allocate more resources to incident “A” or to incident “B”? Why? These are only some examples of questions that should be answered in order to define the priorities and severity level. To determine the priorities and severity level, you will need to also take into consideration the following aspects of the business:
- Functional impact of the incident on the business: The importance of the affected system for the business will have a direct effect on the incident’s priority. All stakeholders for the affected system should be aware of the issue and will have their input in the determination of priorities.
- Type of information affected by the incident: Every time you deal with personally identifiable information (PII), your incident will have high priority; therefore, this is one of the first elements to verify during an incident. Another factor that can influence the severity is the type of data that was compromised based on the compliance standard your company is using. For example, if your company needs to be HIPAA compliant, you would need to raise the severity level if the data compromised was governed by the HIPAA standards.
- Recoverability: After the initial assessment, it is possible to give an estimate of how long it will take to recover from an incident. Depending on the amount of time to recover, combined with the criticality of the system, this could drive the priority of the incident to high severity.
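The three factors above can be combined into a simple severity calculation. The following sketch is an assumption, not a standard formula: the scoring scale, thresholds, and the rule that regulated data (such as PII or HIPAA-governed records) forces high severity are all illustrative choices that a real organization would tune to its own policy.

```python
# Illustrative severity model combining the three business aspects from the
# text: functional impact, type of information affected, and recoverability.
# Scores and thresholds are assumptions for the example.

def severity(functional_impact: int, data_sensitivity: int, recovery_hours: float) -> str:
    """functional_impact and data_sensitivity: 0 (none) to 3 (high).
    recovery_hours: estimated time to recover from the incident."""
    if data_sensitivity >= 3:          # e.g., PII or HIPAA-regulated data leaked
        return "high"
    score = functional_impact + data_sensitivity
    if recovery_hours > 24:            # long recovery raises the priority
        score += 2
    elif recovery_hours > 4:
        score += 1
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

print(severity(functional_impact=1, data_sensitivity=3, recovery_hours=2))   # high (PII involved)
print(severity(functional_impact=2, data_sensitivity=1, recovery_hours=30))  # high
print(severity(functional_impact=1, data_sensitivity=0, recovery_hours=1))   # low
```

Note how the data-sensitivity rule short-circuits the rest of the calculation, mirroring the text's point that compromised PII is one of the first elements to verify.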
Communication is another area that must be planned in advance. For example, if an incident occurs and during the investigation process it is identified that a customer’s PII was leaked, how will the company communicate this to the media? In the incident response process, communication with the media should be aligned with the company’s security policy for data disclosure. The legal department should also be involved prior to the press release to ensure that there is no legal issue with the statement. Procedures to engage law enforcement must also be documented in the incident response process. When documenting this, take into consideration the physical location: where the incident took place, where the server is located (if appropriate), and the state. By collecting this information, it will be easier to identify the jurisdiction and avoid conflicts.
Incident response team
Now that you have the fundamental areas covered, you need to put the incident response team together. The format of the team will vary according to the company size, budget, and purpose. A large company may want to use a distributed model, where there are multiple incident response teams with each one having specific attributes and responsibilities. This model can be very useful for organizations that are geo-dispersed, with computing resources located in multiple areas. Other companies may want to centralize the entire incident response team in a single entity. This team will handle incidents regardless of the location. After choosing the model that will be used, the company will start recruiting employees to be part of the team.
The incident response process requires personnel with technically broad knowledge while also requiring deep knowledge in some other areas. The challenge is to find people with depth and breadth in this area, which sometimes leads to the conclusion that you need to hire external people to fill some positions, or even outsource part of the incident response team to a different company.
The budget for the incident response team must also cover continuous improvement via education, and the acquisition of proper tools, software, and hardware. As new threats arise, security professionals working with incident response must be ready and trained to respond well. Many companies fail to keep their workforce up to date, which may expose the company to risk. When outsourcing the incident response process, make sure the company that you are hiring is accountable for constantly training their employees in this field.
If you plan to outsource your incident response operations, make sure you have a well-defined service-level agreement (SLA) that meets the severity levels that were established previously. During this phase, you should also define the team coverage, assuming the need for 24-hour operations.
In this phase you will define:
- Shifts: How many shifts will be necessary for 24-hour coverage?
- Team allocation: Based on these shifts, who is going to work on each shift, including full-time employees and contractors?
- On-call process: It is recommended that you have on-call rotation for technical and management roles in case the issue needs to be escalated.
Defining these areas during this phase is particularly useful as it will allow you to more clearly see the work that the team needs to cover, and thus allocate time and resources accordingly.
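As a small illustration of the shift-planning step, the allocation above can be sketched as a round-robin rotation across three 8-hour shifts. The analyst names, shift boundaries, and team size are all placeholders; a real schedule would also account for weekends, time off, and the on-call escalation tier.

```python
# Hypothetical sketch: round-robin allocation of analysts to three 8-hour
# shifts, giving 24-hour coverage. Names and shift times are placeholders.
from itertools import cycle

analysts = ["Ana", "Bruno", "Carla", "Dmitri", "Elena", "Farid"]
shifts = ["00:00-08:00", "08:00-16:00", "16:00-24:00"]

def build_rotation(days: int) -> list:
    """Return one dict per day mapping each shift to the assigned analyst."""
    pool = cycle(analysts)  # wrap around when the team list is exhausted
    return [{shift: next(pool) for shift in shifts} for _ in range(days)]

for day, assignment in enumerate(build_rotation(2), start=1):
    print(f"Day {day}: {assignment}")
# Day 1: {'00:00-08:00': 'Ana', '08:00-16:00': 'Bruno', '16:00-24:00': 'Carla'}
# Day 2: {'00:00-08:00': 'Dmitri', '08:00-16:00': 'Elena', '16:00-24:00': 'Farid'}
```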
Incident life cycle
Every incident that starts must have an end, and what happens in between the beginning and the end are different phases that will determine the outcome of the response process. This is an ongoing process that we call the incident life cycle. What we have described so far can be considered the preparation phase. However, this phase is broader than that—it also has the partial implementation of security controls that were created based on the initial risk assessment (this was supposedly done even before creating the incident response process).
Also included in the preparation phase is the implementation of other security controls, such as:
- Endpoint protection
- Malware protection
- Network security
The preparation phase is not static, and you can see in the following diagram that this phase will receive input from post-incident activity. The post-incident activity is critical to improve the level of preparation for future attacks, because here is where you will perform a postmortem analysis to understand the root cause and see how you can improve your defense to avoid the same type of attack happening in the future. The other phases of the life cycle and how they interact are also shown in this diagram:
Figure 2.3: Phases of the incident life cycle
The detection and containment phases could have multiple interactions within the same incident. Once the loop is over, you will move on to the post-incident activity phase. The sections that follow will cover these last three phases in more detail.
Handling an incident
In order to detect a threat, your detection system must be aware of the attack vectors, and since the threat landscape changes so rapidly, the detection system must be able to dynamically learn more about new threats and new behaviors and trigger an alert if suspicious activity is encountered.
While many attacks will be automatically detected by the detection system, the end user has an important role in identifying and reporting the issue if they find suspicious activity.
For this reason, the end user should also be aware of the different types of attacks and learn how to manually create an incident ticket to address such behaviors. This is something that should be part of the security awareness training.
Even with users being diligent by closely watching for suspicious activities, and with sensors configured to send alerts when an attempt to compromise is detected, the most challenging part of an IR process is still the accuracy of detecting what is truly a security incident.
Oftentimes, you will need to manually gather information from different sources to see if the alert that you received really reflects an attempt to exploit a vulnerability in the system. Keep in mind that data gathering must be done in compliance with the company’s policy. In scenarios where you need to bring the data to a court of law, you need to guarantee the data’s integrity.
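One common way to support the integrity requirement just mentioned is to hash each artifact at collection time and record who collected it and when, so the data can later be shown to be unaltered. The sketch below is illustrative; file names, field names, and the record format are assumptions, and a real chain-of-custody process would follow the company's forensic procedures.

```python
# Minimal sketch of evidence integrity: record a SHA-256 digest and
# collection metadata for each artifact at the time it is gathered.
import hashlib
import json
from datetime import datetime, timezone

def record_evidence(name: str, content: bytes, collector: str) -> dict:
    """Return a chain-of-custody entry with a SHA-256 digest of the artifact."""
    return {
        "artifact": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "collected_by": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }

entry = record_evidence("memory_dump.bin", b"...raw capture bytes...", "analyst01")
print(json.dumps(entry, indent=2))

# Later, re-hash the stored artifact and compare it with the recorded digest:
assert hashlib.sha256(b"...raw capture bytes...").hexdigest() == entry["sha256"]
```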
The following diagram shows an example where the combination and correlation of multiple logs is necessary in order to identify the attacker’s ultimate intent:
Figure 2.4: The necessity of multiple logs in identifying an attacker’s ultimate intent
In this example, we have many IoCs, and when we put all the pieces together, we can validate the attack. Keep in mind that depending on the level of information that you are collecting in each one of those phases, and how conclusive it is, you may not have evidence of compromise, but you will have evidence of an attack, which is the IoA for this case.
The following table explains the diagram in more detail, assuming that there is enough evidence to determine that the system was compromised:
| Threat actor operation | Logs that can help determine the IoC |
| --- | --- |
| Lateral movement followed by privilege escalation | Endpoint protection and operating system logs |
| Unauthorized or malicious processes could read or modify the data | Server logs and network captures |
| Data extraction and submission to command and control | Firewall logs and network captures (assuming there is a firewall in between the cloud and on-premises resources) |
Table 2.2: Logs used to identify the attacks/operations of a threat actor
As you can see, there are many security controls in place that can help to determine the indication of compromise. However, putting them all together in an attack timeline and cross-referencing the data can be even more powerful.
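The cross-referencing just described can be sketched as a merge of events from different log sources into one timestamp-ordered timeline. The event messages, source names, and timestamps below are fabricated for illustration; real correlation would normally be done in a SIEM, but the underlying idea is the same.

```python
# Hypothetical sketch: merging endpoint, server, and firewall events into a
# single attack timeline ordered by timestamp. All events are made up.
from datetime import datetime

endpoint_log = [("2024-05-12T10:01:00", "powershell.exe spawned by winword.exe")]
server_log   = [("2024-05-12T10:07:30", "logon with admin credentials from WS042")]
firewall_log = [("2024-05-12T10:15:10", "outbound 443 to 203.0.113.45 (unknown host)")]

def build_timeline(*sources):
    """Each source is a (name, events) pair; events are (iso_timestamp, message)."""
    merged = [(datetime.fromisoformat(ts), name, msg)
              for name, events in sources
              for ts, msg in events]
    return sorted(merged)  # chronological order across all sources

timeline = build_timeline(("endpoint", endpoint_log),
                          ("server", server_log),
                          ("firewall", firewall_log))
for ts, source, msg in timeline:
    print(f"{ts.isoformat()} [{source:8}] {msg}")
```

Read top to bottom, the merged output tells the story that no single log could: initial execution on the endpoint, then lateral movement to a server, then exfiltration through the firewall.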
This brings back a topic that we discussed in the previous chapter: that detection is becoming one of the most important security controls for a company. Sensors that are located across the network (on-premises and in the cloud) will play a big role in identifying suspicious activity and raising alerts. A growing trend in cybersecurity is the leveraging of security intelligence and advanced analytics to detect threats more quickly and reduce false positives. This can save time and enhance the overall accuracy.
Ideally, the monitoring system will be integrated with the sensors to allow you to visualize all events on a single dashboard. This might not be the case if you are using different platforms that don’t allow interaction between one another.
In a scenario like the one presented in Figure 2.4, the integration between the detection and monitoring system can help to connect the dots of multiple malicious actions that were performed in order to achieve the final mission—data extraction and submission to command and control.
Once the incident is detected and confirmed as a true positive, you need to either collect more data or analyze what you already have. If this is an ongoing issue, where the attack is taking place at that exact moment, you need to obtain live data from the attack and rapidly provide remediation to stop the attack. For this reason, detection and analysis are sometimes done almost in parallel to save time, and this time is then used to rapidly respond.
The biggest problem arises when you don’t have enough evidence that there is a security incident taking place, and you need to keep capturing data in order to validate its veracity. Sometimes the incident is not detected by the detection system. Perhaps it is reported by an end user, but they can’t reproduce the issue at that exact moment. There is no tangible data to analyze, and the issue is not happening at the time you arrive. In scenarios like this, you will need to set up the environment to capture data, and instruct the user to contact support when the issue is actually happening.
You can’t determine what’s abnormal if you don’t know what’s normal. In other words, if a user opens a new incident saying that the server’s performance is slow, you must know all the variables before you jump to a conclusion. To know if the server is slow, you must first know what’s considered to be a normal speed. This also applies to networks, appliances, and other devices. In order to establish this understanding, make sure you have the following in place:
- System profile
- Network profile/baseline
- Log-retention policy
- Clock synchronization across all systems
Based on this, you will be able to establish what’s normal across all systems and networks. This will be very useful when an incident occurs, and you need to determine what’s normal before starting to troubleshoot the issue from a security perspective.
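The baseline idea can be shown with a toy example. Below, "slow" is judged against recorded normal response times using a three-standard-deviation threshold; the numbers and the threshold are assumptions for illustration, and a real system or network profile would track many more metrics over a longer window.

```python
# Illustrative sketch: using a performance baseline to decide whether a
# reported "the server is slow" sample is actually abnormal.
from statistics import mean, stdev

# Response times (ms) collected during normal operation: the baseline.
baseline = [110, 95, 120, 105, 98, 115, 102, 108]

def is_abnormal(sample_ms: float, readings, n_sigmas: float = 3.0) -> bool:
    """Flag the sample if it falls more than n_sigmas standard deviations
    from the baseline mean (threshold choice is an assumption)."""
    mu, sigma = mean(readings), stdev(readings)
    return abs(sample_ms - mu) > n_sigmas * sigma

print(is_abnormal(460, baseline))  # True: well outside the normal range
print(is_abnormal(112, baseline))  # False: consistent with the baseline
```

Without the baseline list, neither answer could be justified, which is exactly the point: you can't determine what's abnormal if you don't know what's normal.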
Incident handling checklist
Many times, the “simple” makes a big difference when it comes time to determine what to do now and what to do next. That’s why having a simple checklist to go through is very important to keep everyone on the same page. The list below is not definitive; it is only a suggestion that you can use as a foundation to build your own checklist:
- Determine if an incident has actually occurred and start the investigation:
1.1 Analyze the data and potential indicators (IoA and IoC).
1.2 Review potential correlation with other data sources.
1.3 Once you determine that the incident has occurred, document your findings and prioritize the handling of the incident based on the criticality of the incident. Take into consideration the impact and the recoverability effort.
1.4 Report the incident to the appropriate channels.
- Make sure you gather and preserve evidence.
- Perform incident containment, which may include:
3.1 Quarantining the affected resource
3.2 Resetting the password for the compromised credential
- Eradicate the incident using the following steps:
4.1 Ensure that all vulnerabilities that were exploited are mitigated.
4.2 Remove any malware from the compromised system and evaluate the level of trustworthiness of that system. In some cases, it will be necessary to fully reformat the system, as you may not be able to trust that system anymore.
- Recover from the incident.
5.1 There might be multiple steps to recover from an incident, mainly because it depends on the incident. Generally speaking, the steps here may include:
5.1.1 Restoring files from backup
5.1.2 Ensuring that all affected systems are fully functional again
- Perform a post-incident analysis.
6.1 Create a follow-up report with all lessons learned.
6.2 Ensure that you are implementing actions to enhance your security posture based on those lessons learned.
As mentioned previously, this list is not exhaustive, and these steps should be tailored to suit specific needs. However, this checklist provides a solid baseline to build on for your own incident response requirements.
The incident priority may dictate the containment strategy. For example, if you are dealing with a DDoS attack that was opened as a high-priority incident, the containment strategy must be treated with the same level of criticality. It is rare for an incident that is opened as high severity to be given medium-priority containment measures, unless the issue was somehow resolved between phases.
Let’s have a look at two real-world scenarios to see how containment strategies, and the lessons learned from a particular incident, may differ depending on incident priority.
Real-world scenario 1
On May 12, 2017, some users called the help desk saying that they were receiving the following screen:
Figure 2.5: A screen from the WannaCry outbreak
After an initial assessment and confirmation of the issue (detection phase), the security team was engaged, and an incident was created. Since many systems were experiencing the same issue, they raised the severity of this incident to high. They used their threat intelligence to rapidly identify that this was a ransomware outbreak, and to prevent other systems from getting infected, they had to apply the MS17-010 patch.
At this point, the incident response team was working on three different fronts: one to try to break the ransomware encryption, another to try to identify other systems that were vulnerable to this type of attack, and another one working to communicate the issue to the press.
They consulted their vulnerability management system and identified many other systems that were missing this update. They started the change management process and raised the priority of this change to critical. The management system team deployed this patch to the remaining systems.
The incident response team worked with their anti-malware vendor to break the encryption and gain access to the data again. At this point, all other systems were patched and running without any problems. This concluded the containment, eradication, and recovery phase.
Lessons learned from scenario 1
After reading this scenario, you can see examples of many areas that were covered throughout this chapter and that will come together during an incident. But an incident is not finished when the issue is resolved. In fact, this is just the beginning of a whole different level of work that needs to be done for every single incident—documenting the lessons learned.
One of the most valuable pieces of information that you have in the post-incident activity phase is the lessons learned. This will help you to keep refining the process through the identification of gaps in the process and areas of improvement. When an incident is fully closed, it will be documented. This documentation must be very detailed, with the full timeline of the incident, the steps that were taken to resolve the problem, what happened during each step, and how the issue was finally resolved outlined in depth.
This documentation will be used as a base to answer the following questions:
- Who identified the security issue, a user or the detection system?
- Was the incident opened with the right priority?
- Did the security operations team perform the initial assessment correctly?
- Is there anything that could be improved at this point?
- Was the data analysis done correctly?
- Was the containment done correctly?
- Is there anything that could be improved at this point?
- How long did it take to resolve this incident?
The answers to these questions will help refine the incident response process and enrich the incident database. The incident management system should have all incidents fully documented and searchable. The goal is to create a knowledge base that can be used for future incidents. Oftentimes, an incident can be resolved using the same steps that were used in a similar previous incident.
Another important point to cover is evidence retention. All the artifacts that were captured during the incident should be stored according to the company’s retention policy unless there are specific guidelines for evidence retention. Keep in mind that if the attacker needs to be prosecuted, the evidence must be kept intact until legal actions are completely settled.
When organizations start to migrate to the cloud and have a hybrid environment (on-premises and connectivity to the cloud), their IR process may need to pass through some revisions to include some deltas that are related to cloud computing. You will learn more about IR in the cloud later in this chapter.
Real-world scenario 2
Sometimes you don’t have a very well-established incident, only clues that you are starting to put together to understand what is happening. In this scenario, the case started with support, because it was initiated by a user who said that their machine was very slow, mainly when accessing the internet.
The support engineer that handled the case did a good job isolating the issue and identified that the process powershell.exe was downloading content from a suspicious site. When the IR team received the case, they reviewed the notes to understand what had been done. Then they started tracking the IP address from which the PowerShell command was downloading information. To do that, they used the VirusTotal website and got the result below:
Figure 2.6: VirusTotal scan result
Figure 2.7: VirusTotal scan details tab
Now things are starting to come together, as this IP seems to be correlated with Cobalt Strike. At this point, the IR team didn’t have much knowledge about Cobalt Strike, and they needed to learn more about it. The best place to research threat actors, the software they use, and the techniques they leverage is the MITRE ATT&CK website (attack.mitre.org).
By accessing this page, you can simply click the Search button (located in the upper-right corner) and type in the keywords, in this case, cobalt strike, and the result appears as shown below:
Figure 2.8: Searching on the MITRE ATT&CK website
Once you open the Cobalt Strike page, you can read more about what Cobalt Strike is, the platforms that it targets, the techniques that it uses, and the threat actor groups that are associated with this software. By simply searching PowerShell on this page, you will see the following statement:
Figure 2.9: A technique used by Cobalt Strike
Notice that this usage of PowerShell maps to technique T1059 (https://attack.mitre.org/techniques/T1059). If you open this page, you will learn more about how this technique is used and the intent behind it.
OK, now things are clearer, and you know that you are dealing with Cobalt Strike. While this is a good start, it is imperative to understand how the system got compromised in the first place, because PowerShell was not making a call to that IP address out of nowhere, something triggered that action.
This is the type of case where you will have to trace it back to understand how everything started. The good news is that you have plenty of information on the MITRE ATT&CK website that explains how Cobalt Strike works.
The IR team started looking at different data sources to better understand the entire scenario, and they found that the employee who initially opened the case with support, complaining about the computer’s performance, had opened a suspicious document (RTF) that same week. The file was considered suspicious because of its name and hash:
- File name: once.rtf
- MD5: 2e0cc6890fbf7a469d6c0ae70b5859e7
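During an investigation like this, the hash is typically computed locally so that the hash, rather than the file itself, can be searched on VirusTotal. A minimal sketch, assuming the suspicious file is still available on disk (the path below is illustrative):

```python
# Hedged sketch: compute the MD5 and SHA-256 of a file by streaming it in
# chunks, so large artifacts don't have to be loaded into memory at once.
import hashlib

def file_hashes(path: str) -> dict:
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}

# Usage (the path is a hypothetical example):
# print(file_hashes("C:/Users/victim/Downloads/once.rtf"))
```

Searching by hash also sidesteps any concern about uploading a potentially sensitive document to a third-party service.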
Figure 2.10: Searching for a file hash
This raises many red flags, but to better correlate this with the PowerShell activity, we need more evidence. If you click on the BEHAVIOR tab, you will find that evidence, as shown below:
Figure 2.11: More evidence of malicious use of PowerShell
With this evidence, it is possible to conclude that the initial access vector was a phishing email (see https://attack.mitre.org/techniques/T1566), and that the attached file exploited CVE-2017-11882 to execute PowerShell.
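Once the chain of events is reconstructed, it is good practice to record it as structured data for the incident report, mapped to MITRE ATT&CK technique IDs. The following is a minimal sketch; the T1203 (Exploitation for Client Execution) mapping for the CVE step is my own assumption about how this chain would typically be categorized:

```python
# Sketch: recording the reconstructed attack chain as structured data
# for the incident report, mapped to MITRE ATT&CK technique IDs.
attack_chain = [
    {"step": 1, "technique": "T1566", "name": "Phishing",
     "evidence": "Suspicious RTF attachment (once.rtf) opened by the user"},
    {"step": 2, "technique": "T1203", "name": "Exploitation for Client Execution",
     "evidence": "RTF file exploits CVE-2017-11882"},  # T1203 mapping is an assumption
    {"step": 3, "technique": "T1059", "name": "Command and Scripting Interpreter",
     "evidence": "PowerShell contacts the Cobalt Strike C2 address"},
]

def summarize(chain):
    """Produce a one-line-per-step summary for the incident timeline."""
    return [f"{s['step']}. [{s['technique']}] {s['name']}" for s in chain]
```

Keeping the chain in this form makes it easy to generate timelines and to cross-reference each step against the corresponding ATT&CK page during post-incident review.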
Lessons learned from scenario 2
This scenario shows that all it takes is a single click to get compromised, and that social engineering remains one of the predominant attack vectors, as it exploits human behavior to entice a user into taking an action. From here, the recommendations were:
- Improve security awareness training for all users to cover this type of scenario
- Reduce the level of privileges for the user on their own workstations
- Implement AppLocker to block unwanted applications
- Implement EDR on all endpoints to ensure that this type of attack can be caught in its initial phase
- Implement a host-based firewall to block access to suspicious external addresses
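The last recommendation, blocking access to suspicious external addresses, is usually enforced by the firewall itself, but the same check often appears in detection tooling that flags outbound connections. The following is a minimal sketch using Python's standard `ipaddress` module; the blocklist entry is a hypothetical stand-in (a documentation-only range), not a real C2 netblock:

```python
import ipaddress

# Hypothetical blocklist of suspicious external networks identified during the IR.
# 203.0.113.0/24 is a documentation-only range, used here as a stand-in.
SUSPICIOUS_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
]

def is_suspicious(destination_ip):
    """Return True if an outbound destination falls inside a blocked network."""
    addr = ipaddress.ip_address(destination_ip)
    return any(addr in net for net in SUSPICIOUS_NETWORKS)
```

In practice, the blocklist would be fed from your threat intelligence sources and kept in sync with the host-based firewall rules.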
There is a lot to learn from a case like this, mainly from the security hygiene perspective, about how things can be improved. Never lose the opportunity to learn from an incident and improve your incident response plan.
Considerations for incident response in the cloud
When we speak about cloud computing, we are talking about a shared responsibility between the cloud provider and the company that is contracting the service. The level of responsibility will vary according to the service model, as shown in the following diagram:
Figure 2.12: Shared responsibility in the cloud
For Software as a Service (SaaS), most of the responsibility is on the cloud provider; in fact, the customer’s responsibility is basically to keep their infrastructure on-premises protected (including the endpoint that is accessing the cloud resource). For Infrastructure as a Service (IaaS), most of the responsibility lies on the customer’s side, including vulnerability and patch management.
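The responsibility split described above can be summarized as a simple matrix, which is useful when planning which logs your IR team can actually obtain. The following is a simplified sketch; the layer names and the PaaS column are illustrative, and real provider documentation is more granular:

```python
# Simplified sketch of the shared responsibility split by service model.
# Layer names are illustrative; provider documentation is more granular.
RESPONSIBILITY = {
    "SaaS": {"data": "customer", "endpoints": "customer",
             "application": "provider", "os": "provider", "infrastructure": "provider"},
    "PaaS": {"data": "customer", "endpoints": "customer",
             "application": "customer", "os": "provider", "infrastructure": "provider"},
    "IaaS": {"data": "customer", "endpoints": "customer",
             "application": "customer", "os": "customer", "infrastructure": "provider"},
}

def customer_owns(model):
    """List the layers the customer must cover in their IR data-gathering plan."""
    return [layer for layer, owner in RESPONSIBILITY[model].items() if owner == "customer"]
```

For example, `customer_owns("IaaS")` includes the OS layer, which is why OS-level logs are fully available to you in that model, while in SaaS they are not.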
Understanding these responsibilities is important in order to understand the data-gathering boundaries for incident response purposes. In an IaaS environment, you have full control of the virtual machine and complete access to all logs provided by the operating system. The only information missing in this model is logs from the underlying network infrastructure and hypervisor.
Each cloud provider will have its own policy regarding data gathering for incident response purposes, so make sure that you review the cloud provider policy before requesting any data.
For the SaaS model, the vast majority of the information relevant to an incident response is in the possession of the cloud provider. If suspicious activities are identified in a SaaS service, you should contact the cloud provider directly, or open an incident via the portal. Make sure that you review your SLA to better understand the rules of engagement in an incident response scenario.
However, regardless of your service model, there are a number of key issues to bear in mind when migrating to the cloud—such as adjusting your overall IR process to accommodate cloud-based incidents (including making sure you have the necessary tools to deal with cloud-based issues) and investigating your cloud service provider to ensure they have sufficient IR policies in place.
Updating your IR process to include the cloud
Ideally, you should have one single incident response process that covers both major scenarios—on-premises and cloud. This means you will need to update your current process to include all relevant information related to the cloud.
Make sure that you review the entire IR life cycle to include cloud computing-related aspects. For example, during the preparation phase, you need to update the contact list to include the cloud provider's contact information, on-call process, and so on. The same applies to other phases, such as:
- Detection: Depending on the cloud model that you are using, you want to include the cloud provider solution for detection in order to assist you during the investigation.
- Containment: Revisit the cloud provider's capabilities for isolating a compromised resource in the event of an incident; these will also vary according to the cloud model that you are using. For example, if you have a compromised VM in the cloud, you may want to isolate it from other VMs by moving it to a different virtual network and temporarily blocking access from outside.
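The containment example above can be captured as a small decision routine in a runbook, so that responders follow the same order of operations every time. The following is a minimal sketch; the action names and incident fields are hypothetical and not tied to any provider's SDK:

```python
# Minimal sketch of a containment runbook step for a compromised cloud VM.
# Action names and incident fields are hypothetical, not a provider API.
def containment_plan(incident):
    """Return the ordered containment actions for a compromised-VM incident."""
    actions = ["snapshot_disks_for_forensics"]          # preserve evidence first
    if incident.get("externally_reachable"):
        actions.append("block_inbound_from_internet")   # cut off attacker access
    actions.append("move_vm_to_quarantine_vnet")        # isolate from other workloads
    if incident.get("lateral_movement_suspected"):
        actions.append("rotate_credentials")            # contain credential theft
    return actions
```

The key design point is ordering: evidence preservation comes before isolation, because isolation steps can alter or destroy volatile state that the investigation may need.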
For more information about incident response in the cloud, we recommend that you read Domain 9 of the Cloud Security Alliance Guidance.
Another important aspect of IR in the cloud is to have the appropriate toolset in place. Using on-premises-related tools may not be feasible in the cloud environment, and worse, may give you the false impression that you are doing the right thing.
The reality is that with cloud computing, many security-related tools that were used in the past are no longer effective for collecting data and detecting threats. When planning your IR process, you must revise your current toolset and identify potential gaps in coverage for your cloud workloads.
In Chapter 12, Active Sensors, we will cover some cloud-based tools that can be used in the IR process, such as Microsoft Defender for Cloud and Microsoft Sentinel.
IR process from the Cloud Solution Provider (CSP) perspective
When planning your migration to the cloud and comparing the different CSPs' solutions, make sure to understand their own incident response process. What if another tenant in their cloud starts sending attacks against your workloads that reside on the same cloud? How will they respond to that? These are just a couple of examples of the questions that you need to think about when planning which CSP will host your workloads.
The following diagram has an example of how a CSP could detect a suspicious event, leverage their IR process to perform the initial response, and notify their customer about the event:
Figure 2.13: How a CSP might detect a potential threat, form an initial response, and notify the customer
The handover between the CSP and the customer must be very well synchronized, and this should be settled during the planning phase for cloud adoption. If this handover is well coordinated with the CSP, and you ensure that cloud-based incidents are accounted for in both your own IR process and the CSP's, then you should be far better prepared for these incidents when they arise.
Summary
In this chapter, you learned about the incident response process and how it fits into the overall purpose of enhancing your security posture.
You also learned about the importance of having an incident response process in place to rapidly identify and respond to security incidents. By planning each phase of the incident response life cycle, you create a cohesive process that can be applied to the entire organization. The foundation of the incident response plan is the same for different industries and, on top of this foundation, you can include the customized areas that are relevant to your own business. You also came across the key aspects of handling an incident, and the importance of post-incident activity—which includes full documentation of the lessons learned—and how to use this information as input to improve the overall process. Lastly, you learned the basics of incident response in the cloud and how this can affect your current process.
In the next chapter, you will gain an understanding of the mindset of an attacker, the different stages of an attack, and what usually takes place in each one of these phases. This is an important concept for the rest of the book, considering that the attack and defense exercises will be using the cybersecurity kill chain as a foundation.
References
- You can download the CSIR publication 800-61R2 from NIST at http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf
- Microsoft Security Response Center: https://technet.microsoft.com/en-us/library/security/ms17-010.aspx
- More information about shared responsibilities for cloud security at https://blog.cloudsecurityalliance.org/2014/11/24/shared-responsibilities-for-security-in-the-cloud-part-1/
- For Microsoft Azure, read this paper for more information about incident response in the cloud: https://gallery.technet.microsoft.com/Azure-Security-Response-in-dd18c678
- For Microsoft Online Services, you can use this form to report abuse originating from Microsoft-hosted services: https://cert.microsoft.com/report.aspx
- Watch the author Yuri Diogenes demonstrating how to use Azure Security Center to investigate a cloud incident: https://channel9.msdn.com/Blogs/Azure-Security-Videos/Azure-Security-Center-in-Incident-Response
- You can download Security Guidance for Critical Areas of Focus in Cloud Computing v4.0 from https://cloudsecurityalliance.org/document/incident-response/
Join our community on Discord
Join our community’s Discord space for discussions with the author and other readers: