If you purchased, borrowed, or otherwise picked up this book, there is a good chance you are concerned about Industrial Control System (ICS) security in some way. Along with regular cyber security, ICS security is a hot topic these days. Not a day goes by without some company getting compromised, critical infrastructure control systems getting infiltrated, or our personal information getting splattered all over the internet. As a matter of fact, while I was writing this book, the following major security events occurred, and some even influenced its material:
- In May of 2017 the WannaCry ransomware severely impacted the National Health Service (NHS) and locked hospital workers out of critical healthcare patient data:
- In June of 2017 it was discovered that a sophisticated piece of malware, named Crash Override, targeted infrastructure companies in the United States and Europe in 2014 and brought down the Ukraine electric utilities in 2015. At first it was believed the attacks were random acts of aggression with limited intelligence. Research performed by Dragos unveiled malicious code that sets a new level of sophistication in ICS-targeted malware:
- In July of 2017 the NotPetya WiperWorm caused major downtime and revenue loss for companies like the Oreo cookie maker Mondelez, drug maker Merck, and car manufacturer Honda:
- Not directly related to ICS security but well worth mentioning here, the Equifax breach of September 2017 is a great example of how flawed security can lead to a devastating compromise of customers' personal information. With some due diligence and common security practices, this disaster could have been prevented:
With this book, I set out to educate the reader in the process of securing an Industrial control system by applying industry-wide adopted best practice methods and technologies. The book will use a fictitious company as a common thread throughout the learning process. The company isn't directly based on any real-life business but is more a cumulative set of security postures and situations I have encountered over time.
Before we can dive into any security discussions, with this first chapter, we will discuss exactly what an Industrial control system (ICS) is and what it does. We will look at the different parts that make up an Industrial control system. From an architectural perspective, we will examine the individual parts that can be found in modern-day ICSes and look at how they work together to accomplish a common task. We will end the chapter with an examination of the various industrial communication protocols that are used to connect all the parts, systems, and devices in an ICS. This includes a high-level explanation of the Purdue model, a reference model commonly used to explain Industrial control systems.
From the traffic lights on your drive to work, or the collision avoidance system of the train or metro, to the delivery of electricity that powers the light you use to read this book, to the processing and packaging that went into creating the jug of milk in your fridge, to the coffee grinds for that cup of joe that fuels your day; what all these things have in common are the Industrial control systems driving the measurements, decisions, corrections, and actions that result in the end products and services that we take for granted each day.
The following diagram shows the architecture of a properly designed, modern ICS. The intent of this book is to educate you on the methodologies and considerations that went into the design of an architecture, such as the one shown here:
Technically speaking, the Industrial control system lives in the area marked Industrial Zone of the preceding diagram. However, as we will discuss later in this book, because most ICSes interact with the Enterprise Zone, in order to effectively secure the system as a whole, consideration must also be given to the systems in the Enterprise Zone.
An ICS is a variety of control systems and associated instrumentation used in industrial production technology to achieve a common goal, such as creating a product or delivering a service. From a high-level perspective, ICSes can be categorized by their function. They can have one or several of the functions discussed in the following sections.
The view function encompasses the ability to watch the current state of the automation system in real time. This data can be used by operators, supervisors, maintenance engineers, or other personnel to make business decisions or perform corrective actions. For example, when the operator sees that the temperature of cooker 1 is getting low, they might decide to increase the steam supply of the cooker to compensate for this. The view process is passive in nature, merely providing the information or view for a human to act on:
From a security perspective, if an attacker can manipulate the operator's view of the status of the control system or, in other words, can change the values the operator bases their decisions on, the attacker effectively controls the reaction and, therefore, the complete process. For example, by manipulating the displayed value for the temperature of cooker 1, an attacker can make the operator think the temperature is too low or too high and have him or her act upon the manipulated data.
The monitor function is often part of a control loop, such as the automation behind keeping a steady level in a tank. The monitor function will keep an eye on a critical value, such as pressure, temperature, level, and so on, and compare the current value against predefined threshold values, and alarm or interact depending on the setup of the monitoring function. The key difference between the view function and the monitor function is in the determination of deviation. With monitoring functions, this determination is an automated process, whereas with a view function, this determination is made by a human looking at the values. The reaction of the monitor function can range from a pop-up alarm screen to a fully automated system shutdown procedure.
From a security perspective, if an attacker can control the value that the monitor function is looking at, the reaction of the function can be triggered or prevented. Take, for example, a monitoring system that watches the temperature of cooker 1 to prevent it from exceeding 300 degrees Fahrenheit. If an attacker feeds a value of less than 300°F into the system, the system is tricked into believing all is well while, in actuality, the process could be in meltdown.
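The cooker example can be sketched in a few lines of Python. This is only an illustration, not code from any real monitoring product: the function name `check_temperature`, the constant `HIGH_LIMIT_F`, and the sample readings are assumptions made for this sketch; only the 300°F limit comes from the text.

```python
HIGH_LIMIT_F = 300.0  # threshold from the cooker 1 example


def check_temperature(reported_f: float) -> str:
    """Return the monitor's reaction to the value it is given."""
    if reported_f > HIGH_LIMIT_F:
        return "ALARM"
    return "OK"


actual_f = 450.0   # hypothetical real temperature of the cooker
spoofed_f = 295.0  # hypothetical value injected by an attacker

print(check_temperature(actual_f))   # reaction to the true reading
print(check_temperature(spoofed_f))  # reaction to the forged reading
```

The monitor can only react to the number it receives, so the forged reading suppresses the alarm even though the process is far past its limit.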
The following diagram illustrates the control function:
The control function is where things are controlled, moved, activated, and initiated. The control system is what makes actuators engage, valves open, and motors run. Control actions can either be initiated by an operator pushing a button or changing a set point on an HMI screen, or be an automated response as part of the process control.
From a security perspective, if an attacker can manipulate the values (the input) the control system reacts to or if the attacker can change or manipulate the control function itself (the control program), the system can be tricked into doing things it wasn't designed to do or intended for.
Now I can hear you all say that manipulating values is all nice and dandy, but surely that cannot be done with modern switched networks and encrypted network protocols. That would be true if those technologies were implemented and used. The sad state of affairs is that on most, if not all, ICS networks, the confidentiality and integrity parts of the CIA security triad are of less importance than availability. Even worse, for most Industrial control systems, availability ends up being the only design consideration when architecting the system. Combine that with the fact that the ICS communication protocols that run on these networks were never designed with security in mind, and one can start to see the feasibility of the scenarios mentioned.
More about all this will be discussed in later chapters, when we dive deeper into the vulnerabilities mentioned and look at how they can be exploited.
Industrial control system is an all-encompassing term used for various automation systems and their devices, such as Programmable Logic Controllers (PLC), Human Machine Interfaces (HMI), Supervisory Control and Data Acquisition (SCADA) systems, Distributed Control Systems (DCS), Safety Instrumented Systems (SIS), and many others:
Programmable logic controllers, or PLCs, are at the heart of just about every Industrial control system. These are the devices that take data from sensors via input channels and control actuators via output channels. A typical PLC consists of a microcontroller (the brains) and an array of input and output channels. Input and output channels can be analog, digital, or network-exposed values. These I/O channels often come as add-on cards that attach to the backplane of a PLC. This way, a PLC can be customized to fit many different functions and implementations.
The programming of a PLC can be done via a dedicated USB or serial interface on the device or via the network communications bus that is built into the device or comes as an add-on card. Common networking types in use are Modbus, Ethernet, ControlNet, PROFINET, and others.
PLCs can be deployed as standalone devices, controlling a certain part of the manufacturing process, such as a single machine, or they can be deployed as distributed systems, spanning multiple plants in dispersed locations with thousands of I/O points and numerous interconnecting parts.
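To make the input-logic-output role of a PLC more concrete, here is a minimal sketch of one pass of a simplified scan cycle, written in Python purely for illustration. Real PLCs are programmed in languages such as ladder logic, not Python, and the tag names (`tank_level_pct`, `fill_pump`), the thresholds, and the function `scan_cycle` are all assumptions made for this example.

```python
def scan_cycle(inputs: dict, state: dict) -> dict:
    """One pass of a simplified PLC scan: read inputs, run logic, return outputs."""
    level = inputs["tank_level_pct"]  # 1. read the input channel

    # 2. execute the control logic (simple on/off control with hysteresis)
    if level <= state["low_pct"]:
        state["pump_on"] = True
    elif level >= state["high_pct"]:
        state["pump_on"] = False

    return {"fill_pump": state["pump_on"]}  # 3. drive the output channel


# Hypothetical thresholds and a short series of level readings
state = {"low_pct": 20.0, "high_pct": 80.0, "pump_on": False}
for level in (50.0, 15.0, 50.0, 85.0):
    outputs = scan_cycle({"tank_level_pct": level}, state)
    print(level, outputs)
```

The logic trusts whatever value arrives on the input channel, which is why the integrity of those values matters so much in the security discussions in this chapter.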
The HMI is the window into the control system. It visualizes the running process, allowing inspection and manipulation of process values, the showing of alarms, and trending of control values. In its simplest form, an HMI is a standalone touch-enabled device that communicates via a serial or Ethernet encapsulated protocol. More advanced HMI systems can use distributed servers to offer a redundant supply of HMI screens and data:
The Supervisory Control and Data Acquisition system is a term used to describe a combined use of ICS types and devices, all working together on a common task. The following diagram illustrates an example SCADA network. Here, the SCADA network is comprised of all the equipment and components that together form the overall system. SCADA systems are often spread out over a wide geographical area as a result of being applied to power grids, water utilities, pipeline operations, and other control systems that use remote operational stations:
Closely related to the SCADA system is the distributed control system. The differences between a SCADA system and a DCS are very small, and the two have become almost indistinguishable over time. Traditionally, though, SCADA systems were used for automation tasks that cover a larger geographical area, meaning that parts of the SCADA system are located in separate buildings or facilities, whereas a DCS is more often confined to a single plant or facility. A DCS is often a large-scale, highly engineered system with a very specific task. It uses a centralized supervisory unit that can control thousands of I/O points. The system is built to last, with redundancy applied to all levels of the installation, from redundant networks and network interfaces attached to redundant server sets to redundant controllers and sensors, all with the goal of creating a rigid and solid automation platform.
DCS systems are most commonly found in water management systems, paper and pulp mills, sugar refinery plants, and so on:
Safety instrumented systems, or SIS, are dedicated safety monitoring systems. They are there to safely and gracefully shut down the monitored system or bring that system to a predefined safe state in case of a hardware malfunction. An SIS uses a set of voting systems to determine whether a system is performing normally:
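The text does not say which voting scheme an SIS uses. As one illustration, the sketch below assumes a two-out-of-three (2oo3) arrangement, in which three independent channels each report whether they demand a trip and the system shuts down when at least two agree. The function name `vote_2oo3` is hypothetical.

```python
def vote_2oo3(trip_a: bool, trip_b: bool, trip_c: bool) -> bool:
    """Return True (go to the safe state) when at least two of three channels demand a trip."""
    return (trip_a + trip_b + trip_c) >= 2


# A single faulty channel neither causes nor blocks a shutdown
print(vote_2oo3(True, False, False))
print(vote_2oo3(True, True, False))
```

The point of voting is that one failed sensor can neither trigger a spurious shutdown nor prevent a needed one.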
So how does all this tie together? What makes for a solid ICS architecture? To answer this question, we should first discuss the Purdue reference model, or Purdue model for short. As shown in the following figure, the Purdue model was adopted from the Purdue Enterprise Reference Architecture (PERA) model by ISA-99 and used as a concept model for ICS network segmentation. It is an industry-adopted reference model that shows the interconnections and interdependencies of all the main components of a typical ICS.
The model is a great resource to start the process of figuring out a typical modern ICS architecture:
The Purdue model will be discussed in more detail in a later chapter, but for now, to support our architecture discussion, let's look at a high-level overview. The following sections are based on the complete ICS architecture shown at the beginning of the chapter.
The Purdue model divides this ICS architecture into three zones and six levels. Starting from the top, these are:
- Enterprise zone:
  - Level 5: Enterprise network
  - Level 4: Site business and logistics
- Industrial Demilitarized Zone
- Manufacturing zone (also called the Industrial zone):
  - Level 3: Site operations
  - Level 2: Area supervisory control
  - Level 1: Basic control
  - Level 0: The process
The enterprise zone is the part of the ICS where business systems such as ERP and SAP typically live. Here, tasks such as scheduling and supply chain management are performed.
The enterprise zone can be subdivided into two levels:
- Level 5: Enterprise network
- Level 4: Site business and logistics
The systems on the enterprise network normally sit at a corporate level and span multiple facilities or plants. They take data from subordinate systems out in the individual plants and use the accumulated data to report on the overall production status, inventory, and demand. Technically not part of the ICS, the enterprise zone relies on connectivity with the ICS networks to feed the data that drives the business decisions.
Level 4 is home to all the Information Technology (IT) systems that support the production process in a plant or facility. These systems report production statistics such as uptime and units produced to corporate systems and take orders and business data from the corporate systems to be distributed among the Operational Technology (OT) or ICS systems.
Systems typically found in level 4 include database servers, application servers (web, report, MES), file servers, email clients, supervisor desktops, and so on.
The following figure explains the Industrial Demilitarized Zone in detail:
Between the enterprise zone and the Industrial zone lies the Industrial Demilitarized Zone, or IDMZ. Much like a traditional (IT) DMZ, the OT-oriented IDMZ allows you to securely connect networks with different security requirements.
The IDMZ is the result of the efforts taken to create security standards such as the NIST Cybersecurity Framework and NERC CIP. The IDMZ is an information sharing layer between the business or IT systems in levels 4 and 5 and the production or OT systems in levels 3 and lower. By preventing direct communication between IT and OT systems and having a broker service in the IDMZ relay the communications, an extra layer of separation and inspection is added to the overall architecture. Systems in the lower layers are not directly exposed to attacks or compromise. If something were to compromise a system at some point in the IDMZ, the IDMZ could be shut down, the compromise could be contained, and production could continue.
Systems typically found in the Industrial Demilitarized Zone include (web) proxy servers, database replication servers, Microsoft domain controllers, and so on.
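The brokering rule described above can be expressed as a small lookup. The following Python sketch is an assumption-laden simplification: the names `ZONE_OF_LEVEL`, `IDMZ`, and `direct_flow_allowed` are invented for this example, and a real policy would be enforced by firewalls at the level of individual hosts, ports, and protocols rather than whole levels.

```python
IDMZ = "IDMZ"

# Purdue levels mapped to their zone, as listed earlier in this chapter
ZONE_OF_LEVEL = {
    5: "enterprise", 4: "enterprise",
    3: "industrial", 2: "industrial", 1: "industrial", 0: "industrial",
}


def direct_flow_allowed(src, dst) -> bool:
    """Allow a direct connection only if it never crosses the IT/OT boundary.

    src and dst are either a Purdue level (0-5) or the IDMZ constant.
    """
    if src == IDMZ or dst == IDMZ:
        return True  # both sides may talk to a broker service in the IDMZ
    return ZONE_OF_LEVEL[src] == ZONE_OF_LEVEL[dst]


print(direct_flow_allowed(4, 3))     # IT system straight to OT system
print(direct_flow_allowed(4, IDMZ))  # IT system to the broker
print(direct_flow_allowed(IDMZ, 3))  # broker to OT system
```

A level 4 system that needs level 3 data therefore makes two hops, one to the broker and one from the broker, and each hop gives the IDMZ a point at which traffic can be inspected or cut off.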
The following figure explains the various manufacturing zones:
The manufacturing zone is where the action is; it is the zone where the process lives and, by all means, the core of the ICS. The manufacturing zone is subdivided into four levels:
- Level 3: Site operations
- Level 2: Area supervisory control
- Level 1: Basic control
- Level 0: The process
Level 3 is where systems that support plant-wide control and monitoring functions reside. At this level, the operator is interacting with the overall production systems. Think of centralized control rooms with HMIs and operator terminals that provide an overview of all the systems that run the processes in a plant or facility. The operator uses these HMI systems to perform tasks such as quality control checks, managing uptime, and monitoring alarms, events, and trends.
Level 3, site operations, is also where the OT systems that report back up to IT systems in level 4 live. Systems in lower levels send production data to data collection and aggregation servers in this level, which can then send the data to higher levels or can be queried by systems in higher levels (push versus pull operations).
Systems typically found in level 3 include database servers, application servers (web and report), file servers, Microsoft domain controllers, HMI servers, engineering workstations, and so on.
Many of the functions and systems in level 2 are the same as for level 3 but targeted more toward a smaller part or area of the overall system. In this level, specific parts of the system are monitored and managed with HMI systems. Think along the lines of a single machine or skid with a touch screen HMI to start or stop the machine or skid and see some basic running values and manipulate machine or skid-specific thresholds and set points.
Systems typically found in level 2 include HMIs (standalone or system clients), supervisory control systems such as a line control PLC, engineering workstations, and so on.
Level 1 is where all the controlling equipment lives. The main purpose of the devices in this level is to open valves, move actuators, start motors, and so on. Typically found in level 1 are PLCs, Variable Frequency Drives (VFDs), dedicated proportional-integral-derivative (PID) controllers, and so on. Although you could find a PLC in level 2, its function there is of a supervisory nature instead of a controlling one.
Level 0 is where the actual process equipment that we are controlling and monitoring from the higher levels lives. Also known as Equipment Under Control (EUC), level 0 is where we can find devices such as motors, pumps, valves, and sensors that measure speed, temperature, or pressure. As level 0 is where the actual process is performed and where the product is made, it is imperative that things run smoothly and uninterrupted. The slightest disruption in a single device can cause mayhem for all operations.
How do all these parts of an ICS communicate? Traditionally, ICS systems used several distinct and proprietary communication media and protocols. The recent trend has been to adapt many of these proprietary protocols to work on a common medium, Ethernet, and a common communications protocol suite, Internet Protocol (IP). Therefore, you will find technologies such as PROFIBUS, traditionally run over serial cables, converted into PROFINET, which runs on Ethernet and IP. Modbus, which traditionally runs on serial lines, was converted into Modbus TCP/IP, which supports Ethernet and IP. The Common Industrial Protocol (CIP), traditionally run on a coax medium via the ControlNet protocol or on a Controller Area Network (CAN) medium via the DeviceNet protocol, now runs on Ethernet via EtherNet/IP (IP stands for Industrial Protocol in this case).
Chapter 2, Insecure by Inheritance, will provide a detailed explanation on all the aforementioned protocols and point out security concerns for them. For now, we are sticking to the explanation of how these individual protocols and media are used to connect all the parts and systems of a modern-day ICS.
The communication protocols found in a typical Industrial control system can be divided into the following categories; keep in mind that these run within the IP suite.
Regular information technology or IT protocols are the protocols that are in use on everyday IT networks. Some examples of those protocols include HTTP, HTTPS, SMTP, FTP, TFTP, SMB, and SNMP. This doesn't mean these protocols are used exclusively for IT purposes. Many OT devices, for example, will incorporate a diagnostic web page or use FTP to receive application or firmware updates:
Traditionally, these protocols were used only outside of the plant floor and ICS networks, in levels 4 and 5 of the Purdue model. With the trend of converging OT and IT network technologies, many of these protocols can now be found all the way down to level 1, and with them the vulnerabilities that have plagued regular IT networks for years.
Process automation protocols include PROFIBUS, DeviceNet, ControlNet, Modbus, and CIP. These protocols are used to connect control devices together, be it a PLC to a sensor, a PLC to a PLC, or an engineering workstation to a control device to configure or program the device:
These protocols tend to be found mostly in levels 3 and lower of the Purdue model. A properly configured IDMZ should block any process automation protocol from leaving the Industrial zone.
From a security perspective, these protocols were never designed with security in mind. They forgo using encryption or implementing integrity checks to provide higher performance, stability, or compatibility. This opens them up to replay attacks, modification of the payload, and others. Chapter 2, Insecure by Inheritance, will expose the vulnerabilities per protocol in more depth.
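As an illustration of how little stands between an attacker and a control device, the following Python sketch builds a Modbus TCP "Read Holding Registers" request by hand. The field layout follows the published Modbus TCP framing (an MBAP header followed by the function code and its arguments); the function name `build_read_holding_registers` and the example values are assumptions made for this sketch, which only constructs bytes and sends nothing on the network.

```python
import struct


def build_read_holding_registers(transaction_id: int, unit_id: int,
                                 start_address: int, quantity: int) -> bytes:
    """Build a Modbus TCP 'Read Holding Registers' (function 0x03) request."""
    function_code = 0x03
    # PDU: function code, starting register address, number of registers
    pdu = struct.pack(">BHH", function_code, start_address, quantity)
    # MBAP header: transaction id, protocol id (0 = Modbus),
    # length of the remaining bytes (unit id + PDU), unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


frame = build_read_holding_registers(transaction_id=1, unit_id=1,
                                     start_address=0, quantity=2)
print(frame.hex())  # 000100000006010300000002
```

Every byte of the 12-byte request is accounted for by addressing and payload fields. There is no field for credentials, a signature, or a session key, which is why a captured frame can simply be replayed or altered in transit.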
Industrial control system protocols are mainly used for interconnecting devices and systems from different vendors, such as using a generic HMI solution to connect to a Siemens or Rockwell Automation PLC:
The main protocol in this category is OLE for Process Control or OPC. OPC is a series of standards and applications for industrial communications based on OLE, COM, and DCOM technologies developed by Microsoft.
From a security perspective, OPC is a nightmare. The protocol is easy to implement, flexible, and forgiving and provides the programmer with direct access to data registers from a large array of devices from all major vendors, all without any regard for authentication, data confidentiality, or integrity. Even more, the areas where OPC services are implemented ensure that this unprotected data needs to traverse from level 1 all the way to level 4. Someone once told me this joke: only two things can survive a nuclear bomb, cockroaches and OPC servers. The joke refers to the fact that OPC servers can be found anywhere and even though you can kill a bunch in a sweep, you can't kill them all.
The OPC foundation has made great efforts in addressing many security concerns and has developed a more security-oriented architecture, OPC Unified Architecture (OPC UA). The highlights of the OPC Unified Architecture are as follows:
- Functional equivalence, all COM OPC classic specifications are mapped to UA
- Platform independence, from an embedded micro-controller to a cloud-based infrastructure
- Secure encryption, authentication, and auditing
- Extensible, the ability to add new features without affecting existing applications
- Comprehensive information modeling for defining complex information
Building automation protocols allow communication between the parts of the control systems that run applications such as heating, ventilation, and air-conditioning. Protocols in use in this category include BACnet, C-Bus, Modbus, ZigBee, and Z-Wave.
From a security perspective, these protocols tend to be unencrypted and without integrity checking applied, which leaves them open to replay and manipulation attacks. What makes things particularly dangerous is the fact that most of the installed systems are connected to the internet or at least accessible via a modem for the vendor to supply remote support. Oftentimes, the authentication isn't very solid on the boundary of the system and breaking into it is a simple exercise. None other than tech giant Google had its building automation network breached by researchers back in 2013 (https://www.wired.com/2013/05/googles-control-system-hacked/). A breach of the building network system can be a direct entry into the rest of the network if the two are linked, or it can give an attacker a means to enter the facility by providing the ability to open doors or disable alarm systems:
Can you remember the last time the meter guy stopped at your home to take your meter reading? Lots of research and development have been put into creating more convenient ways of getting customers' meter readings for gas, electricity, cooling, and so on. The solutions range from a Radio Frequency (RF)-enabled meter that can be read by proximity to a radio mesh of smart meters covering a city block, each solution with its own security challenges:
Protocols typically used for automatic meter reading include AMR, AMI, WiSmart (Wi-Fi), GSM, and Power Line Communication (PLC).
The following diagram shows where these protocols are typically found within the ICS architecture:
The enterprise zone network will see web traffic using the HTTP or HTTPS protocols, email in the form of IMAP, POP3, and SMTP, file transfer and sharing protocols such as FTP and SMB, and many others. All these protocols come with their own security challenges and vulnerabilities. If your ICS network (Industrial zone) and the business network (enterprise zone) use the same physical network, these vulnerabilities can directly affect your production system. Having a common network for business systems and production systems is an insecure practice that is seen all too often still. More on this topic will be discussed in a later chapter.
The enterprise zone is where a plant or facility is connected to the Internet, typically via a setup such as the one shown in the following figure:
- The enterprise network is typically connected to the internet via an Edge Router and some form of a modem that converts an ISP provided service such as a T1 or optical carrier (OC1) medium into an Ethernet that is used throughout the rest of the enterprise network. Dedicated firewalls will securely connect the business network to the ISP network by use of port blocking and traffic monitoring. A common practice for enterprise internet policies is to use a proxy firewall for outbound traffic while highly restricting incoming traffic. Any necessary publicly facing services would be guarded off with a Demilitarized Zone or DMZ.
- Typical services in the Enterprise DMZ are publicly facing web servers, the company's public DNS servers, and the like. The DMZ provides a landing area for public traffic. If a service in the DMZ were to get compromised, the compromise would be contained in the DMZ. Further pivoting attempts would be caught by the enterprise DMZ firewalls.
- The enterprise internal network consists of switches, routers, Layer 3 switches, and end devices such as servers and client computers. Most companies will segment their internal network by means of VLANs. Inter-VLAN traffic needs to go through some sort of routing device, such as a Layer 3 switch, a firewall, or a router, at which point Access Control Lists (ACLs) or firewall rules are applied.
In recent years, the Industrial zone has seen a shift from using proprietary OT protocols such as PROFIBUS, DeviceNet, ControlNet, and Modbus to using common IT technologies such as Ethernet and the IP suite protocols. However, a few proprietary protocols and network media can still be found in the lower levels of the ICS systems. The figure below shows where some of these can be found in the ICS architecture, followed by a short description:
- A - Hardwired devices: These are the sensors, actuators, and other devices that use a discrete signal, such as 24 VDC or an analog signal such as 4-20 mA or 0-10 VDC, to operate. These devices are wired directly into a PLC add-on IO card or a remote communication rack with IO cards.
- B - Fieldbus protocols: These are mainly proprietary protocols such as DeviceNet, ControlNet, PROFIBUS, and Modbus, which deliver real-time control and monitoring. These protocols can connect end devices such as sensors and actuators directly to a PLC without the need for an IO module. They can also be used to connect PLCs or connect a remote rack to a PLC. Most, if not all, fieldbus protocols have been adapted to work on Ethernet and on top of IP.
- C - Nested Ethernet: Though it's not technically a different protocol, nesting Ethernet is a way to hide or obfuscate parts of the control network. The nested parts of the network are only visible to or through the device that they are connected to.
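For the hardwired analog signals mentioned under item A, the controller has to convert the loop current into an engineering value. The sketch below shows the usual linear scaling for a 4-20 mA signal. The function name `scale_4_20ma` and the 0 to 300°F sensor range are assumptions chosen for this example, and rejecting every out-of-range current is a simplification.

```python
def scale_4_20ma(current_ma: float, range_low: float, range_high: float) -> float:
    """Linearly map a 4-20 mA loop current onto the sensor's engineering range."""
    if not 4.0 <= current_ma <= 20.0:
        # A current outside the live range usually points to a wiring or sensor fault
        raise ValueError(f"signal out of range: {current_ma} mA")
    fraction = (current_ma - 4.0) / 16.0  # 0.0 at 4 mA, 1.0 at 20 mA
    return range_low + fraction * (range_high - range_low)


# Hypothetical temperature transmitter ranged 0 to 300 degrees Fahrenheit
print(scale_4_20ma(12.0, 0.0, 300.0))  # 150.0
```

Because 4 mA rather than 0 mA represents the bottom of the range, a broken wire reads as an invalid signal instead of a valid low value.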
In this chapter, we went over what an Industrial control system is, what it does, and what parts make up an ICS. You learned about some of the common communication protocols and media used to interconnect the parts of an ICS. In the next chapter, we will start looking at some vulnerabilities and weaknesses of ICSes and, more specifically, the communication protocols in use.