When examining the threats to today's information technology, it can seem overwhelming. From simple script kiddies using off-the-shelf code to nation state adversary tools, it is critical to be prepared. For example, an internal employee can download a single instance of ransomware and can have a significant impact on an organization. More complex attacks such as a network exploitation attempt or targeted data breach increases the chaos that a security incident causes. Technical personnel will have their hands full attempting to determine the systems that have been impacted and how they are being manipulated. They will also have to contend with addressing the possible loss of data through compromised systems. Adding to this chaotic situation are senior managers haranguing them for updates and an answer to the all-important questions: How did this happen? and How bad is it?
Having the ability to properly respond to security incidents in an orderly and efficient manner allows organizations to both limit the damage of a potential cyber attack, and also recover from the associated damage that is caused. To facilitate this orderly response, organizations of all sizes have looked at adding an incident response capability to their existing policies, procedures, and processes.
In order to build this capability within the organization, several key components must be addressed. First, organizations need to have a working knowledge of the incident response process. This process outlines the general flow of an incident and the general actions that are taken at each stage. Second, organizations need to have access to personnel who form the nucleus of any incident response capability. Once a team is organized, a formalized plan and associated processes need to be created. This written plan and processes form the orderly structure that an organization can follow during an incident. Finally, with this framework in place, the plan must be continually evaluated, tested, and improved as new threats emerge. Utilizing this framework will position organizations to be prepared for the unfortunate reality that many organizations have already faced, an incident that compromises their security.
We will be covering the following topics in this chapter:
- The incident response process
- The incident response framework
- The incident response plan
- The incident response playbook
- Testing the incident response framework
There is a general path that cyber security incidents follow during their lifetime. If the organization has a mature incident response capability, they will have taken measures to ensure they are prepared to address an incident at each stage of the process. Each incident starts with the first time the organization becomes aware of an event or series of events indicative of malicious activity. This detection can come in the form of a security control alert or external party informing the organization of a potential security issue. Once alerted, the organization moves through analyzing the incident through containment measures to bring the information system back to normal operations. The following diagram shows how these flow in a cycle with Preparation as the starting point. Closer examination reveals that every incident is used to better prepare the organization for future incidents as the Post-Incident Activity, and is utilized in the preparation for the next incident:
The incident response process can be broken down into six distinct phases, each with a set of actions the organization can take to address the incident:
- Preparation: Without good preparation, any subsequent incident response is going to be disorganized and has the potential to make the incident worse. One of the critical components of preparation is the creation of an incident response plan. Once a plan is in place with the necessary staffing, ensure that personnel detailed with incident response duties are properly trained. This includes processes, procedures, and any additional tools necessary for the investigation of an incident. In addition to the plan, tools such as forensics hardware and software should be acquired and incorporated into the overall process. Finally, regular exercises should be conducted to ensure that the organization is trained and familiar with the process.
- Detection: The detection of potential incidents is a complex endeavor. Depending on the size of the organization, they may have over 100 million separate events per day. These events can be records of legitimate actions taken during the normal course of business or be indicators of potentially malicious activity. Couple this mountain of event data with other security controls constantly alerting to activity and you have a situation where analysts are inundated with data and must subsequently sift out the valuable pieces of signal from the vastness of network noise. Even today's cutting-edge Security Incident and Event Management (SIEM) tools lose their effectiveness if they are not properly maintained with regular updates of rule sets that identify what events qualify as a potential incident. The detection phase is that part of the incident response process where the organization first becomes aware of a set of events that possibly indicates malicious activity. This event, or events, that have been detected and are indicative of malicious behavior are then classified as an incident. For example, a security analyst may receive an alert that a specific administrator account was in use during the time where the administrator was on vacation. Detection may also come from external sources. An ISP or law enforcement agency may detect malicious activity originating in an organization's network and contact them and advise them of the situation.
In other instances, users may be the first to indicate a potential security incident. This may be as simple as an employee contacting the help desk and informing a help desk technician that they received an Excel spreadsheet from an unknown source and opened it. They are now complaining that their files on the local system are being encrypted. In each case, an organization would have to escalate each of these events to the level of an incident (which we will cover a little later in this chapter) and begin the reactive process to investigate and remediate.
- Analysis: Once an incident has been detected, personnel from the organization or a trusted third party will begin the analysis phase. In this phase, personnel begin the task of collecting evidence from systems such as running memory, log files, network connections, and running software processes. Depending on the type of incident, this collection can take as little as a few hours to several days.
Once the evidence is collected, it then needs be examined. There are a variety of tools to conduct this analysis, many of which are explored in this book. With these tools, analysts are attempting to ascertain what happened, what it affected, whether any other systems were involved, and whether any confidential data was removed. The ultimate goal of the analysis is to determine the root cause of the incident and reconstruct the actions of the threat actor from initial compromise to detection.
- Containment: Once there is a solid understanding of what the incident is and what systems are involved, organizations can then move into the containment phase. In this phase, organizations take measures to limit the ability for threat actors to continue compromising other network resources, communicating with command and control infrastructures, or exfiltrating confidential data. Containment strategies can range from locking down ports and IP addresses on a firewall to simply removing the network cable from the back of an infected machine. Each type of incident involves its own containment strategy, but having several options allows personnel to stop the bleeding at the source if they are able to detect a security incident before or during the time when threat actors are pilfering data.
- Eradication and recovery: During the eradication phase, the organization removes the threat actor from the impacted network. In the case of a malware infection, the organization may run an enhanced anti-malware solution. Other times, infected machines must be wiped and reimaged. Other activities include removing or changing compromised user accounts. If an organization has identified a vulnerability that was exploited, vendor patches are applied, or software updates are made. Recovery activities are very closely aligned with those that may be found in an organization's business continuity or disaster recovery plans. In this phase of the process, organizations reinstall fresh operating systems or applications. They will also restore data on local systems from backups. As a due diligence step, organizations will also audit their existing user and administrator accounts to ensure that there are no accounts that have been enabled by threat actors. Finally, a comprehensive vulnerability scan is conducted so that the organization is confident that any exploitable vulnerabilities have been removed.
- Post-incident activity: At the conclusion of the incident process is a complete review of the incident with all the principle stakeholders. Post-incident activity includes a complete review of all the actions taken during the incident. What worked, and more importantly, what did not work, are important topics for discussion. These reviews are important because they may highlight specific tasks and actions that had either a positive or negative impact on the outcome of the incident response. It is during this phase of the process that a written report is completed. Documenting the actions taken during the incident is critical to capture both what occurred and whether the incident will ever see the inside of a courtroom. For documentation to be effective, it should be detailed and show a clear chain of events with a focus on the root cause, if it was determined. Personnel involved in the preparation of this report should realize that stakeholders outside of information technology might read this report. As a result, technical jargon or concepts should be explained.
Finally, the organizational personnel should update their own incident response processes with any new information developed during the post-incident debrief and reporting. This incorporation of lessons learned is important as it makes future responses to incidents more effective.
There is a misconception that is often held by people unfamiliar with the realm of incident response. This misconception is that incident response is merely a digital forensics issue. As a result, they will often conflate the two terms. While digital forensics is a critical component of incident response (and for this reason we have included a number of chapters in this book to address digital forensics), there is more to addressing an incident than examining hard drives. It is best to think of forensics as a supporting function of the overall incident response process. Digital forensics serves as the mechanism for understanding the technical aspects of the incident, potentially identifying the root cause, and discovering unidentified access or other malicious activity. For example, some incidents such as Denial of Service (DoS) attacks will require little to no forensic work. On the other hand, a network intrusion involving the compromise of an internal server and Command and Control (C2) traffic leaving the network will require extensive examination of logs, traffic analysis, and examination of memory. From this analysis the root cause may be derived. In both cases, the impacted organization would be able to connect with the incident, but forensics plays a much more important role in the latter case.
Incident response is an information security function that uses the methodologies, tools, and techniques of digital forensics but goes beyond what digital forensics provides by addressing additional elements. These elements include containing possible malware or other exploits, identifying and remediating vulnerabilities, and managing various technical and non-technical personnel. Some incidents may require the analysis of host-based evidence or memory, others may only require a firewall log review but, in each, the responders will follow the incident response process.
Responding to a data breach, ransomware attack, or other security incident should never be an ad hoc process. Undefined processes or procedures will leave an organization unable to both identify the extent of the incident and be able to stop the bleeding in sufficient time to limit damage. Further, attempting to craft plans during an incident may in fact destroy critical evidence, or worse, create more problems.
Having a solid understanding of the incident response process is just the first step to building this capability within an organization. What organizations need is a framework that puts that processes to work utilizing the organization's available resources. The incident response framework describes the components of a functional incident response capability within an organization. This framework is made up of elements such as personnel, policies, and procedures. It is through these elements that an organization builds its capability to respond to incidents.
The first step to building this capability is the decision by senior leadership that the risk to the organization is too significant not to address the possibility of a potential security incident. Once that point is reached, a senior member of the organization will serve as a project sponsor and craft the incident response charter. This charter outlines key elements that will drive the creation of a Computer Security Incident Response Team (CSIRT).
The incident response charter should be a written document that addresses the following:
- Obtain senior leadership support: In order to be a viable part of the organization, the CSIRT requires the support of the senior leadership within the organization. In a private sector institution, it may be difficult to obtain the necessary support and funding, as the CSIRT itself does not provide value in the same way marketing or sales does. What should be understood is that the CSIRT acts as an insurance policy in the event the worst happens. In this manner, a CSRIT can justify its existence by reducing the impact of incidents and thereby reducing the costs associated with a security breach or other malicious activity.
- Define the constituency: The constituency clearly defines which organizational elements and domains the CSIRT has responsibility for. Some organizations have several divisions or subsidiaries that for whatever reason may not be part of the CSIRT's responsibility. The constituency can be defined either as a domain such as local.example.com or an organization name such as ACME Inc. and associated subsidiary organizations.
- Create a mission statement: Mission creep or the gradual expansion of the CSIRT's responsibilities can occur without clear definition of what the defined purpose of the CSIRT is. In order to counter this, a clearly defined mission statement should be included with the written information security plan. For example, the mission of the ACME Inc. CSIRT is to provide timely analysis and actions for security incidents that impact the confidentiality, integrity, and availability of ACME Inc. information systems and personnel.
- Determine service delivery: Along with a mission statement, a clearly defined list of services can also counter the risk of mission creep of the CSIRT. Services are usually divided into two separate categories, proactive and reactive services:
- Proactive services: These include providing training for non-CSIRT staff, providing summaries on emerging security threats, testing and deployment of security tools such as endpoint detection and response tools, and assisting security operations by crafting IDS/IPS alerting rules.
- Reactive services: These primarily revolve around responding to incidents as they occur. For the most part, reactive services address the entire incident response process. This includes the acquisition and examination of evidence, assisting in containment, eradication, and recovery efforts, and finally documenting the incident.
Another critical benefit of an expressly stated charter is to socialize the CSIRT with the entire organization. This is done to remove any rumors or innuendo about the purpose of the team. Employees of the organization may hear words such as digital investigations or incident response team and believe the organization is preparing a secret police specifically designed to ferret out employee misconduct. To counter this, a short statement that includes the mission statement of the CSIRT can be made available to all employees. The CSIRT can also provide periodic updates to senior leadership on incidents handled to demonstrate the purpose of the team.
Once the incident response charter is completed, the next stage is to start staffing the CSIRT. Larger organizations with sufficient resources may be able to task personnel with incident response duties full-time. More often than not though, organizations will have to utilize personnel who have other duties outside incident response. Personnel who comprise the internal CSIRT can be divided into three categories: core team, technical support, and organizational support. Each individual within the CSIRT fulfills a specific task. Building this capability into an organization takes more than just assigning personnel and creating a policy and procedure document. Like any major project initiative, there is a good deal of effort required in order to create a functional CSIRT.
For each of the CSIRT categories, there are specific roles and responsibilities. This wide range of personnel is designed to provide guidance and support through a wide range of incidents ranging from the minor to the catastrophic.
The CSIRT core team consists of personnel who have incident response duties as their full- time job or assume incident response activities when needed. In many instances, the core team is often made up of personnel assigned to the information security team. Other organizations can leverage personnel with expertise in incident response activities. The following are some of the roles that can be incorporated into the core team:
- Incident response coordinator: This is a critical component of any CSIRT. Without clear leadership, the response to a potential incident may be disorganized or with multiple individuals vying for control during an incident, a chaotic situation that can make the incident worse. In many instances, the incident response coordinator is often the Chief Security Officer (CSO), Chief Information Security Officer (CISO), or the Information Security Officer (ISO) as that individual often has overall responsibility for the security of the organization's information. Other organizations may name a single individual who serves as the incident response coordinator.
The incident response coordinator is responsible for management of the CSIRT prior to, during, and after an incident. In terms of preparation, the incident response coordinator will ensure that any plans or policies concerning the CSIRT are reviewed periodically and updated as needed. In addition, the incident response coordinator is responsible for ensuring that the CSIRT team is appropriately trained and oversees testing and training for CSIRT personnel. During an incident, the incident response coordinator is responsible for ensuring the proper response and remediation of an incident and guides the team through the entire incident response process. One of the most important of these tasks during an incident is coordination of the CSIRT with senior leadership. With the stakes of a data breach being high, senior leadership such as the Chief Executive Officer (CEO) will want to be informed of the critical information concerning an incident. It is the responsibility of the incident response coordinator to ensure that the senior leadership is fully informed of the activities associated with an incident using clear and concise language. One stumbling block is that senior leaders within an organization may not have the acumen to understand the technical aspects of an incident, so it is important to speak in language they will understand.
Finally, at the conclusion of an incident, the incident response coordinator is responsible for ensuring that the incident is properly documented and that reports of the CSIRT activity are delivered to the appropriate internal and external stakeholders. In addition, a full debrief of all CSIRT activities is conducted and lessons learned are incorporated into the CSIRT plan.
- CSIRT senior analyst(s): CSIRT senior analysts are personnel with extensive training and experience in incident response and associated skills such as digital forensics or network data examination. They often have several years of experience conducting incident response activities as either a consultant or as part of an enterprise CSIRT.
During the preparation phase of the incident response process, they are involved in ensuring that they have the necessary skills and training to address their specific role in the CSIRT. They are also often directed to assist in the incident response plan review and modification. Finally, senior analysts will often take part in training junior members of the team.
Once an incident has been identified, the senior analysts will engage with other CSIRT members to acquire and analyze evidence, direct containment activities, and assist other personnel with remediation.
At the conclusion of an incident, the senior analysts will ensure that both they and other personnel appropriately document the incident. This will include the preparation of reports to internal and external stakeholders. They will also ensure that any evidence is appropriately archived or destroyed depending on the incident response plan.
- CSIRT analyst(s): The CSIRT analysts are personnel with CSIRT responsibilities that have less exposure or experience in incident response activities. Oftentimes, they have only one or two years' experience of responding to incidents. As a result, they can perform a variety of activities with some of those under the direction of senior analysts.
In terms of preparation phase activities, analysts will develop their skills via training and exercises. They may also take part in reviews and updates to the incident response plan. During an incident, they will be tasked with gathering evidence from potentially compromised hosts, from network devices, or from various log sources. Analysts will also take part in the analysis of evidence and assist other team members in remediation activities.
- Security operations center analyst: Larger enterprises may have an in-house or contracted 24/7 Security Operations Center (SOC) monitoring capability. Analysts assigned to the SOC will often serve as the point person when it comes to incident detection and alerting. As a result, having an SOC analyst as part of the team allows them to be trained in incident identification and response techniques and serve as an almost immediate response to a potential security incident.
- IT security engineer/analyst(s): Depending on the size of the organization, there may be personnel specifically tasked with the deployment, maintenance, and monitoring of security-related software such as anti-virus or hardware such as firewalls or SIEM systems. Having direct access to these devices is critical when an incident has been identified. The personnel assigned these duties will often have a direct role in the entire incident response process.
The IT security engineer or analyst will be responsible for the preparation component of the incident response process. They will be the primary resource to ensure that security applications and devices are properly configured to alert to possible incidents and to ensure that the devices properly log events so that a reconstruction of events can take place.
During an incident, they will be tasked with monitoring security systems for other indicators of malicious behavior. They will also assist the other CSIRT personnel with obtaining evidence from the security devices. Finally, after an incident, these personnel will be tasked with configuring security devices to monitor for suspected behavior to ensure that remediation activities have eradicated the malicious activity on impacted systems.
Technical support personnel are those individuals within the organization who do not have CSIRT activities as part of their day-to-day operations, but rather have expertise or access to systems and processes that may be affected by an incident. For example, the CSIRT may need to engage a server administrator to assist the core team with acquiring evidence from servers such as memory captures, acquiring virtual systems, or offloading log files. Once completed, the server administrator's role is completed and they may have no further involvement in the incident. The following are some of the personnel that can be of assistance to the CSIRT during an incident:
- Network architect/administrator: Often, incidents involve the network infrastructure. This includes attacks on routers, switches, and other network hardware and software. The network architect or administrator is vital for insight into what is normal and abnormal behavior for these devices as well as identifying anomalous network traffic. In incidents where the network infrastructure is involved, these support personnel can assist with obtaining network evidence such as access logs or packet captures.
- Server administrator: Threat actors often target systems within the network where critical or sensitive data is stored. These high-value targets often include domain controllers, file servers, or database servers. Server administrators can aid in acquiring log files from these systems. If the server administrator(s) are also responsible for the maintenance of the active directory structure, they may be able to assist with identifying new user accounts or changes to existing user or administrator accounts.
- Application support: Web applications are a prime target for threat actors. Flaws in coding that allow attacks such as SQL injection or security misconfigurations are responsible for some security breaches. As a result, having application support personnel as part of the CSIRT facilitates the finding of information directly related to application attacks. These individuals will often be able to identify code changes or to confirm vulnerabilities discovered during an investigation into a potential attack against an application.
- Desktop support: Desktop support personnel are often involved in maintaining controls such as data loss prevention and anti-virus on desktop systems. In the event of an incident, they can assist in providing the CSIRT with log files and other evidence. They may also be responsible for cleaning up infected systems during the remediation phase of an incident.
- Help desk: Depending on the organization, help desk personnel are the proverbial canary in the coal mine when it comes to identifying an incident. They are often the first individuals contacted when a user experiences the first signs of a malware infection or other malicious activity. Thus, help desk personnel should be involved in the training of the CSIRT responses and their role in the incident identification and escalation procedures. They may also assist with identifying additional affected personnel in the event of a widespread incident.
Outside of the technical realm, there are still other organizational members that should be included within the CSIRT. Organizational personnel can assist with a variety of non-technical issues that fall outside those that are addressed by the CSIRT core and technical support personnel. These include navigating the internal and external legal environment, assisting with customer communications, or supporting CSIRT personnel while onsite.
The following are some of the organizational support personnel that should be included in a CSIRT plan:
- Legal: Data breaches and other incidents carry a variety of legal issues along with them. Many countries now have breach notification laws where organizations are required to notify customers that their information was put at risk. Other compliance requirements such as HIPAA and the PCI DSS require the impacted organization to contact various external bodies and notify them of a suspected breach. Including legal representation early in the incident response process will ensure that these notifications and any other legal requirements are addressed in a timely fashion. In the event that a breach has been caused by an internal source such as an employee or contractor, the impacted organization may want to recoup losses through civil action. Including legal representation early in the process will allow a more informed decision as to what legal process should be followed.
- Human resources: A good deal of incidents that occur in organizations are perpetrated by employees or contractors. The investigation of actions such as fraud all the way to massive data theft may have to be investigated by the CSIRT. In the event that the target of the investigation is an employee or contractor, the human resources department can assist with ensuring that the CSIRT's actions are in compliance with applicable labor laws and company policies. If an employee or contractor is to be terminated, the CSIRT can coordinate with the human resources personnel so that all proper documentation concerning the incident is complete to reduce the potential of a wrongful termination suit.
- Marketing/communications: If external clients or customers may be adversely impacted by an incident such as a DoS attack or data breach, the marketing or communications department can assist in crafting the appropriate message to assuage fears and ensure that those external entities are receiving the best information possible. When looking back at past data breaches where organizations attempted to keep the details to themselves and customers were not informed, there was a backlash against those organizations. Having a solid communications plan that is put into action early will go a long way to soothing any potential customer or client adverse reactions.
- Facilities: The CSIRT may need access to areas after hours or for a prolonged time. The facilities department can assist the CSIRT in obtaining the necessary access in a timely manner. Facilities also may have access to additional meeting spaces for the CSIRT to utilize in the event of a prolonged incident that requires dedicated workspace and infrastructure.
- Corporate security: The CSIRT may be called in to deal with the theft of network resources or other technology from the organization. Laptop and digital media theft are very common. Corporate security will often have access to surveillance footage from entrances and exits. They may also maintain access badges and visitor logs for the CSIRT to track movement of employees and other personnel within the facility. This can allow a reconstruction of events leading up to a theft or other circumstances that led up to the incident.
Many industries have professional organizations where practitioners, regardless of their employer, can come together to share information. CSIRT personnel may also be tasked with interfacing with law enforcement and government agencies at times, especially if they are targeted as part of a larger attack perpetrated against a number of similar organizations. Having relationships with external organizations and agencies can assist the CSIRT with intelligence sharing and resources in the event of an incident. These resources include the following:
- High Technology Crime Investigation Association (HTCIA): The HTCIA is an international group of professionals and students with a focus on high-tech crime. Resources include everything from digital forensic techniques to wider enterprise-level information that could aid CSIRT personnel with new techniques and methods. For more information, visit the official website: https://htcia.org/.
- InfraGard: For those CSIRT and information security practitioners in the United States, the Federal Bureau of Investigation has created a private-public partnership geared toward networking and information sharing. This partnership allows CSIRT members to share information about trends or discuss past investigations. We can find more information on the website: https://www.infragard.org/.
- Law enforcement: Law enforcement has seen an explosive growth in cyber-related criminal activity. In response, a great many law enforcement organizations have increased their capacity to investigate cyber crime. CSIRT leadership should cultivate a relationship with agencies that have cyber crime investigative capabilities. Law enforcement agencies can provide insight into specific threats or crimes being committed and provide CSIRTs with any specific information that concerns them.
- Vendors: External vendors can be leveraged in the event of an incident and what they can provide is often dependent on the specific line of business the organization has engaged them in. For example, an organization's IPS/IDS solution provider could assist with crafting custom alerting and blocking rules to assist in the detection and containment of malicious activity. Vendors with threat intelligence capability can also provide guidance on malicious activity indicators. Finally, some organizations will need to engage vendors who have an incident response specialty such as reverse engineering malware when those skills fall outside an organization's capability.
Depending on the size of the organization, it is easy to see how the CSIRT can involve a number of people. It is critical to putting together the entire CSIRT that each member is aware of their roles and responsibilities. Each member should also be asked for specific guidance on what expertise can be leveraged during the entire incident response process. This becomes more important in the next part of the incident response framework, which is the creation of an incident response plan.
With the incident response charter written and the CSIRT formed, the next step is to craft the incident response plan. The incident response plan is the document that outlines the high-level structure of an organization's response capability. This is a high-level document that serves as the foundation of the CSIRT. The major components of the incident response plan are as follows:
- Incident response charter: The incident response plan should include the mission statement and constituency from the incident response charter. This gives the plan continuity between the inception of the incident response capability and the incident response plan.
- Expanded services catalog: The initial incident response charter had general service categories with no real detail. The incident response plan should include specific details of what services the CSIRT will be offering. For example, if forensic services are listed as part of the service offering, the incident response plan may state that forensic services include the evidence recovery from hard drives, memory forensics, and reverse engineering potentially malicious code in support of an incident. This allows the CSIRT to clearly delineate between a normal request, say for the searching of a hard drive for an accidentally deleted document not related to an incident, and the imaging of a hard drive in connection with a declared incident.
- CSIRT personnel: As outlined before, there are a great many individuals who comprise the CSIRT. The incident response plan will clearly define these roles and responsibilities. Organizations should expand out from just a name and title and define exactly the roles and responsibilities of each individual. It is not advisable to have a turf war during an incident, and having the roles and responsibilities of the CSIRT personnel clearly defined goes a long way to reducing this possibility.
- Contact list: An up-to-date contact list should be part of the incident response plan. Depending on the organization, the CSIRT may have to respond to an incident 24 hours a day. In this case, the incident response plan should have primary and secondary contact information. Organizations can also make use of a rotating on-call CSIRT member who could serve as the first contact in the event of an incident.
- Internal communication plan: Incidents can produce a good deal of chaos as personnel attempt to ascertain what is happening, what resources they need, and who to engage to address the incident. The incident response plan internal communication guidance can address this chaos. This portion of the plan addresses the flow of information upward and downward between senior leadership and the CSIRT. Communication sideways between the CSIRT core and support personnel should also be addressed. This limits the individuals who are communicating with each other and cuts down on potentially conflicting instructions.
- Training: The incident response plan should also indicate the frequency of training for CSIRT personnel. At a minimum, the entire CSIRT should be put through a tabletop exercise at least annually. In the event that an incident post-mortem analysis indicates a gap in training, that should also be addressed within a reasonable time after conclusion of the incident.
- Maintenance: Organizations of every size continually change. This can include changes to infrastructure, threats, and personnel. The incident response plan should address the frequency of reviews and updates to the incident response plan. For example, if the organization acquires another organization, the CSIRT may have to adjust service offerings or incorporate specific individuals and their roles. At a minimum, the incident response plan should be updated at least annually. Individual team members should also supplement their skills through individual training and certifications through such organizations as SANS or on specific digital forensic tools. Organizations can incorporate lessons learned from any exercises conducted into this update.
Not all incidents are equal in their severity and threat to the organization. For example, a virus that infects several computers in a support area of the organization will dictate a different level of response than an active compromise of a critical server. Treating each incident the same will quickly burn out a CSIRT as they will have to respond the same way to even minor incidents.
As a result, it is important to define within the incident response plan an incident classification schema. By classifying incidents and gauging the response, organizations make better use of the CSIRT and ensure that they are not all engaged in minor issues. The following is a sample classification schema:
- High-level incident: A high-level incident is an incident that is expected to cause significant damage, corruption, or loss of critical and/or strategic company or customer information. A high-level incident may involve widespread or extended loss of system or network resources. The event can potentially cause damage to the organization and its corporate public image and result in the organization being liable. Examples of high-level incidents include, but are not limited to, the following:
- Network intrusion
- Physical compromise of information systems
- Compromise of critical information
- Loss of computer system or removable media containing un-encrypted confidential information
- Widespread and growing malware infection (more than 25% of hosts)
- Targeted attacks against the IT infrastructure
- Phishing attacks using the organization's domain and branding
- Moderate-level incident: A moderate-level incident is an incident that may cause damage, corruption, or loss of replaceable information without compromise (there has been no misuse of sensitive customer information). A moderate-level event may involve significant disruption to a system or network resource. It also may have an impact on the mission of a business unit within the corporation:
- Anticipated or ongoing DoS attack.
- Loss of computer system or removable media containing un-encrypted confidential information.
- Misuse or abuse of authorized access.
- Automated intrusion.
- Confined malware infection.
- Unusual system performance or behavior.
- Installation of malicious software.
- Suspicious changes or computer activity.
- Playbooks can be configured in a number of ways. For example, a written document can be added to the incident response plan for specific types of incidents. Other times, organizations can use a flow diagram utilizing software such as iStudio or Visio. Depending on how the organization chooses to document the playbook, they should create 10-20 that address the range of potential incidents.
- Low-level incident: A low-level incident is an incident that causes inconvenience and/or unintentional damage or loss of recoverable information. The incident will have little impact on the corporation:
- Policy or procedural violations detected through compliance reviews or log reviews
- Lost or stolen laptop or other mobile equipment containing encrypted confidential information
- Installation of unauthorized software
- Malware infection of a single PC
- Incident tracking: Tracking incidents is a critical responsibility of the CSIRT. During an incident, all actions taken by the CSIRT and other personnel during an incident should be noted. These actions should be recorded under a unique incident identifier.
One key aspect of the incident response plan is the use of playbooks. An incident response playbook is a set of instructions and actions to be performed at every step in the incident response process. The playbooks are created to give organizations a clear path through the process, but with a degree of flexibility in the event that the incident under investigation does not fit neatly into the box.
A good indicator of which playbooks are critical is the organization's risk assessment. Examining the risk assessment for any threat rated critical or high will indicate which scenarios need to be addressed via an incident response playbook. Most organizations would identify a number of threats, such as a network intrusion via a zero-day exploit, ransomware, or phishing as critical, requiring preventive and detective controls. As the risk assessment has identified those as critical risks, it is best to start the playbooks with those threats.
For example, let's examine the breakdown of a playbook for a common threat, social engineering. For this playbook, we are going to divide it out into the incident response process that was previously discussed.
- Preparation: In this section, the organization will highlight the preparation that is undertaken. In the case of phishing, this can include employee awareness to identify potential phishing email or the use of an email appliance that scans attachments for malware.
- Detection: For phishing attacks, organizations are often alerted by aware employees or through email security controls. Organizations should also plan on receiving alerts via malware prevention or Host Intrusion Prevention System (HIPS) controls.
- Analysis: If an event is detected, analyzing any evidence available will be critical to classifying and appropriately responding to an incident. In this case, analysis may include examining the compromised host's memory, examining event logs for suspicious entries, and reviewing any network traffic going to and from the host.
- Containment: If a host has been identified as compromised, it should be isolated from the network.
- Eradication: In the event that malware has been identified, it should be removed. If not, the playbook should have an alternative such as reimaging with a known good image.
- Recovery: The recovery stage includes scanning the host for potential vulnerabilities and monitoring the system for any anomalous traffic.
- Post-incident activity: The playbook should also give guidance on what actions should take place after an incident. Many of these actions will be the same across the catalog of playbooks, but are important to include, ensuring that they are completed in full.
The following diagram is a sample playbook for a phishing attack. Note that each phase of the incident response cycle is addressed as well as specific actions that should be taken as part of the response. Additionally, organizations can break specific actions down, such as through log analysis for a certain playbook, for greater detail:
Playbooks are designed to give the CSIRT and any other personnel a set of instructions to follow in an incident. This allows less time to be wasted if a course of action is planned out. Playbooks serve as a guide and they should be updated regularly, especially if they are used in an incident and any key pieces or steps are identified. It should be noted that playbooks are not written in stone and are not a checklist. CSIRT personnel are not bound to the playbook in terms of actions and should be free to undertake additional actions if the incident requires it.
A critical component of the incident response plan is the escalation procedures. Escalation procedures outline who is responsible for moving an event or series of events from just anomalies in the information system to an incident. The CSIRT will become burned out if they are sent to investigate too many false positives. The escalation procedures ensure that the CSIRT is effectively utilized and that personnel are only contacted if their expertise is required.
The procedures start with the parties who are most likely to observe anomalies or events in the system that may be indicative of a larger incident. For example, the help desk may receive a number of calls that indicate a potential malware infection. The escalation procedures may indicate that if malware is detected and cannot be removed via malware prevention controls, they are to contact the CSIRT member on call. That CSIRT member will then take control. If they are able to contain the malware to that single system and identify the infection vector, they will attempt to remove the malware and, barring that, have the system reimaged and redeployed. At that point, the incident has been successfully concluded. The CSIRT member can document the incident and close it out without having to engage any other resources.
Another example where the escalation moves farther up into an all-out CSIRT response can start very simply with an audit of active directory credentials. In this case, a server administrator with access management responsibilities is conducting a semi-annual audit of administrator credentials. During the audit, they identify three new administrator user accounts that do not tie to any known access rights. After further digging, they determine that these user accounts were created within several hours of each other and were created over a weekend. The server administrator contacts the CSIRT for investigation.
The CSIRT analyst looks at the situation and determines that a compromise may have happened. The CSIRT member directs the server administrator to check event logs for any logins using those administrator accounts. The server administrator identifies two logins, one on a database server and another on a web server in the DMZ. The CSIRT analyst then directs the network administrator assigned to the CSIRT to examine network traffic between the SQL database and the web server. Also, based on the circumstances, the CSIRT analyst escalates this to the CSIRT coordinator and informs them of the situation. The CSIRT coordinator then begins the process of engaging other CSIRT core team and technical support members to assist.
After examining the network traffic, it is determined that an external threat actor has compromised both systems and is in the process of exfiltrating the customer database from the internal network. At this point, the CSIRT coordinator identifies this as a high-level incident and begins the process of bringing support personnel into a briefing. As this incident has involved the compromise of customer data, the CSIRT support personnel such as marketing or communications and legal need to become involved. If more resources are required, the CSIRT coordinator will take the lead on making that decision.
The escalation procedures are created to ensure that the appropriate individuals have the proper authority and training to call upon resources when needed. The escalation procedures should also address the involvement of other personnel outside the core CSIRT members based on the severity of the incident. One of the critical functions of the escalation procedures is to clearly define what individuals have the authority to declare anomalous activity an incident. The escalation procedures should also address the involvement of other personnel outside the core CSIRT members, based on the severity of the incident.
So far, there have been a number of areas that have been addressed in terms of preparing for an incident. From an initial understanding of the process involved in incident response, we moved through the creation of an incident response plan and associated playbooks.
Once the capability has been created, it should be run through a table-top exercise to flush out any gaps or deficiencies. This exercise should include a high-level incident scenario that involves the entire team and one of the associated playbooks. A report that details the results of the table-top exercise and any gaps, corrections, or modifications should also be prepared and forwarded to the senior leadership. Once leadership has been informed and acknowledges that the CSIRT is ready to deploy, it is now operational.
As the CSIRT becomes comfortable executing the plan under a structured scenario, they may want to try more complex testing measures. Another option that is available is the Red/Blue or Purple Team exercise. This is where the CSIRT is tasked with responding to an authorized penetration test. Here, the team is able to execute against a live adversary and test the plans and playbooks. This significantly increases the value of the penetration test as it provides both insight into the security of the infrastructure as well as the ability for the organization to respond appropriately.
Regardless of the makeup of the team, another key component of CSIRT deployment is the inclusion of regular training. For CSIRT core members, specific training on emerging threats, forensic techniques, and tools should be ongoing. This can be facilitated through third-party training providers or, if available, in-house training. The technical support members of the CSIRT should receive regular training on techniques and tools available. This is especially important if these members may be called upon during an incident to assist with evidence collection or remediation activities. Finally, the other support members should be included in the annual test of the incident response plan. Just as with the inaugural test, the organization should pick a high-level incident and work through it using a table-top exercise. Another option for the organization is to marry up the test of their incident response plan with a penetration test. If the organization is able to detect the presence of the penetration test, they have the ability to run through the first phases of the incident and craft a tabletop for the remaining portions.
One final component of the ongoing maintenance of the incident response plan is a complete annual review. This annual review is conducted to ensure that any changes in personnel, constituency, or mission that may impact other components of the plan are addressed. In addition to a review of the plan, a complete review of the playbooks is conducted as well. As threats change, it may be necessary to change existing playbooks or add new ones. The CSIRT personnel should also feel free to create a new playbook in the event that a new threat emerges. In this way, the CSIRT will be in a better position to address incidents that may impact their organization. Any major changes or additions should also trigger another table-top exercise to validate the additional plans and playbooks.
Benjamin Franklin is quoted as saying,
By failing to prepare, you are preparing to fail." In many ways, this sentiment is quite accurate when it comes to organizations and the threat of cyber attacks. Preparing for a cyber attack is a critical function that must be taken as seriously as any other aspect of cyber security. Having a solid understanding of the incident response process to build on with an incident response capability can provide organizations with a measure of preparation, so that in the event of an incident, they can respond. Keep in mind as we move forward that forensic techniques, threat intelligence, and reverse engineering are there to assist an organization to get to the end, that is, back up and running.
This chapter explored some of the preparation that goes into building an incident response capability. Selecting a team, creating a plan, and building the playbooks and the escalation procedures allows a CSIRT to effectively address an incident. The CSIRT and associated plans give structure to the digital forensic techniques to be discussed. This discussion begins with the next chapter, where proper evidence handling and documentation is the critical first step in investigating an incident.
- A table-top exercise should be conducted after changes are made to the incident response plan and/or playbooks.
- Which of the following roles would not be a member of the CSIRT core team?
A) Incident coordinator
B) CSIRT analyst
- It is not important to have technical resources available as part of the incident response framework to aid during an incident.
- The risk assessment is a valid data source for identifying high-risk incidents for playbook creation.
- Computer Security Incident Handling Guide, NIST SP 800-61 Rev 2 : https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final
- ENISA Incident Handling in Live Role Play Handbook: https://www.enisa.europa.eu/topics/trainings-for-cybersecurity-specialists/online-training-material/documents/incident-handling-in-live-role-playing-handbook/view
- Incident Handler's Handbook by Patrick Kral, SANS Reading Room: https://www.sans.org/reading-room/whitepapers/incident/incident-handlers-handbook-33901