We often hear about security, but very often we do not receive a clear definition of what this is, since it's taken for granted. Even if we know what security in general is, sometimes we can miss some pieces of what security means in that specific field. I, personally, like to use this definition of information security—preservation of confidentiality, integrity, and availability of information.
The ISO/IEC 27000:2009 affirms that "In addition, other properties, such as authenticity, accountability, nonrepudiation, and reliability can also be involved."
This highlights the fact that security is a very wide sector, including two very different realms:
Data protection from unauthorized access (confidentiality)
Data integrity and availability
Before we dive into the security realm, we need to look at some important concepts of security.
Access control is the selective restriction of access to some kind of resource (a folder, a file, or a device). There are different approaches to access control. The first one is Discretionary Access Control (DAC), in which every user can decide who can access his/her files and with which permissions.
An example of this is the Unix permission system where, if you create a file, you can choose who will be able to read or change it.
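As a minimal sketch of the discretionary idea, assuming a POSIX system, the owner of a file can set its permission bits directly. The helper names below are made up for this illustration; only `os.chmod`, `os.stat`, and the `stat` constants come from the Python standard library.

```python
import os
import stat
import tempfile


def restrict_to_owner_and_group(path: str) -> int:
    """Owner may read and write, group may read, everyone else gets nothing (0o640)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
    # Return only the permission bits of the resulting mode.
    return stat.S_IMODE(os.stat(path).st_mode)


def demo() -> int:
    # Create a throwaway file that we own, then decide its permissions ourselves.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        path = handle.name
    try:
        return restrict_to_owner_and_group(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(oct(demo()))
```

The point is that no administrator is involved: the creator of the file makes the decision, which is exactly what makes the model discretionary.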
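The second approach is Mandatory Access Control (MAC), in which the permissions are decided by an administrator rather than by the creator of the resource. An example of this is a public archive (for example, a tax archive), where even if you are the creator of a document, you are not allowed to choose who is able to read it. Only the archive owner will be able to make such decisions.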
An evolution of DAC and MAC is Role-based Access Control (RBAC). In RBAC, the permissions are not granted per user, but according to role. This allows big organizations to assign permissions to roles and roles to users, making it easier to create, modify, or delete users.
Examples of this type of access control are pretty common in day-to-day life. A typical use of RBAC in real life is the authorized-personnel-only area, where usually all people with certain characteristics (for example, being an employee of a specific company or working for a specific department) are allowed to enter.
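The role indirection can be sketched in a few lines. This is only an illustration: the role names, permission names, and users below are hypothetical and are not taken from any real system.

```python
from dataclasses import dataclass, field

# Hypothetical roles and permissions, invented for this illustration.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "engineer": {"read_blueprints", "write_blueprints"},
    "auditor": {"read_blueprints", "read_logs"},
}


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)


def is_allowed(user: User, permission: str) -> bool:
    """A user holds a permission only through one of their roles."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles
    )


alice = User("alice", {"engineer"})
bob = User("bob", {"auditor"})
```

With this layout, onboarding a new auditor means adding one role to one user, and changing what auditors may do means editing a single entry in the role table.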
An evolution of RBAC and MAC is Multi Level Security (MLS). In MLS systems, each user has a trust level and each item has a confidentiality level. The administrator is still the one who is in charge of creating the security policies, as in MAC systems, but the system will ensure that each user only sees the items whose confidentiality level is permitted for that user, based on some system configurations and the user's trust level.
As we have seen in the ISO 27000 definition, there are three words that are very important when speaking of security: Confidentiality, Integrity, and Availability. Even though many other models have been proposed over the years, the CIA model is still the one that is most used. Let's see the various parts of it.
Confidentiality is the first part of the CIA model and is usually the first thing that people consider when they think about security. Many models have been created to guarantee the confidentiality of information, but the most famous and most used by far is the Bell-LaPadula model. Implementing this model means dividing the users into multiple levels and allowing all users of the nth level to read all documents located at any level lower than or equal to n and to write documents at any level higher than or equal to n. This is often characterized by the phrase no read up, no write down.
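The two Bell-LaPadula rules can be written as a pair of comparisons. The following is a minimal sketch; the level names are illustrative assumptions, and a real implementation would also handle categories and trusted subjects.

```python
from enum import IntEnum


class Level(IntEnum):
    # Illustrative confidentiality levels; higher means more sensitive.
    PUBLIC = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def blp_can_read(subject: Level, document: Level) -> bool:
    """No read up: a subject reads only at its own level or below."""
    return document <= subject


def blp_can_write(subject: Level, document: Level) -> bool:
    """No write down: a subject writes only at its own level or above."""
    return document >= subject
```

For example, a user cleared for `SECRET` can read a `CONFIDENTIAL` document but cannot write to it, which prevents secret content from being copied into a less protected document.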
A lot of security attacks try to break the confidentiality of the data, mainly because it is a very lucrative job. Today companies and governments are willing to pay thousands or even millions of dollars to get information about their competitor's future products or a rival nation's secrets.
One of the easiest ways to guarantee confidentiality is by using encryption. Encryption cannot solve all confidentiality problems, since we have to be sure that the keys to decrypt the data are not stored with the data; otherwise, the encryption is pointless. Encryption also has a cost, since encrypting a data set will decrease the performance of any operation over it (read/write). Finally, encryption brings a possible problem of its own: if the encryption key is lost, access to the data set is lost with it, so encryption can become a hazard to the availability of the data.
You can think of confidentiality as a chain. A chain is as strong as its weakest link. I believe this is one of the most important things to remember about confidentiality, because very often we do a lot of work and spend a lot of money hardening a specific part of the chain leaving other parts very weak, nullifying all our work and the money spent.
I once had a client who engineers and designs products in a sector where the average R&D expense for a single product is well beyond a million USD. When I met them, they were very concerned about the confidentiality of one of their not yet released products, since it involved several years of research and they believed it was more advanced than their competitors' projects. They knew that if one of their competitors could obtain that information, that competitor would be able to close the gap in less than 6 months. The main focus of this company was the confidentiality of the data; therefore, we created a solution that was based on a single platform (hardware, software, and configurations) and with limited replication to maximize its confidentiality, even at the cost of reducing its availability. The data was divided into four levels based on its importance, using, for the sake of clarity, names inspired by the US Department of Defense system, and for each level we assigned different kinds of requirements, in addition to the authorization:
Public: All the information at this level was public for all, including people inside the company and outsiders, such as reporters. This information was something the company wanted to be public about. No security clearance or requirements were required.
Confidential: All information at this level was available to people working on the project. It was used mainly for manuals and generic documentation, such as user manuals, repair manuals, and so on. People needed to be authorized by their manager.
Secret: All information at this level was available only to selected people working on the project and was divided into multiple categories to allow fine-grained permissions. This was used mainly for low-risk economic evaluations and noncritical blueprints. People needed to be authorized directly by the project manager and to use two-factor authentication.
Top Secret: The information at this level was available only to a handful of people working on the project and was divided into multiple categories to allow fine-grained permissions. It was used for encryption keys and all critical blueprints and economic and legal evaluations. People needed to be authorized directly by the project manager, to use three-factor authentication, and to be in specific high-security parts of the building.
All the information was stored on a single cluster, and encrypted backups, made daily, were shipped to three secure locations. As you can see, Top Secret data could not leave the building unless it was heavily encrypted. This helped the company to keep their advantage over competitors.
By integrity we mean maintaining and assuring the accuracy and the consistency of the data during its entire lifecycle. The Biba integrity model is the best-known integrity model and works in exactly the opposite way to the Bell-LaPadula model. In fact, it is characterized by the phrase no read down, no write up.
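A minimal sketch of the Biba rules follows. The integrity level names are assumptions chosen for illustration; the comparisons are simply the mirror image of the confidentiality rules.

```python
from enum import IntEnum


class Integrity(IntEnum):
    # Illustrative integrity levels; higher means more trustworthy.
    UNTRUSTED = 0
    USER = 1
    SYSTEM = 2


def biba_can_read(subject: Integrity, item: Integrity) -> bool:
    """No read down: a subject must not consume less trustworthy data."""
    return item >= subject


def biba_can_write(subject: Integrity, item: Integrity) -> bool:
    """No write up: a subject must not modify more trustworthy data."""
    return item <= subject
```

Here a `USER` process may read `SYSTEM` data but may not modify it, so low-integrity input cannot contaminate high-integrity data.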
There are some attacks that are structured to destroy integrity. There are two possible reasons why a hacker would be interested in doing this:
A lot of data has legal value only if its integrity has been maintained for the entire life span of the data. An example of this is forensic evidence. So, an attacker could be interested in creating reasonable doubt on the integrity of the data to make it unusable.
Sometimes an attacker would like to change a small element of data that will affect future decisions that are based on that bit of data. An example can be an attacker who wants to edit the value of some stocks, so that an automatic trading program would think that selling at a very low price is a good idea. As soon as the automatic trading program completes this transaction, the company (or bank) owning it will have lost a huge amount of money, and the loss will be very hard to trace back to the attacker.
An example of integrity is the Internet DNS service, which is a very critical service and has a core composed of a few clusters that have to guarantee integrity and availability. Availability is really important here because otherwise the Internet would be down for many users. However, its integrity is much more important, because otherwise an attacker could change a DNS value for a big website or a bank and create a perfectly undetectable phishing attack, also known as pharming, at a global scale. Each one of these clusters is managed by a different company or organization, with different hardware, different software, and different configurations. Availability has been implemented using multiple hardware, software, and configurations to avoid the possibility of a single faulty or hackable component bringing down the whole system. Confidentiality is not the focus of this system since the DNS service does not contain any sensitive data (or, at least, it shouldn't). Integrity is guaranteed by a pyramidal system in which the top DNS (root DNS) is trusted by all other DNS servers. Also, lately, DNS programs have been adding support for encryption and for distrusting unknown DNS servers to prevent DNS cache poisoning attacks, which have now become more frequent.
Availability simply means that, at any given moment, a document that should be available has to be available. This means that no matter what has happened to your server or to the main server farm, the data has to be available.
You can think of availability as a wire rope. A wire rope holds as long as at least one wire holds, so we can say that a wire rope is as strong as its strongest wire. Naturally, the fewer wires still in place, the more load they will have to carry, so they will be more susceptible to failures.
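The wire rope analogy can be put into numbers. The sketch below assumes that replicas fail independently of each other, which is an idealization (as noted above, the surviving wires carry more load); the function name and the figures in the usage note are hypothetical.

```python
import math


def combined_availability(replica_availabilities: list[float]) -> float:
    """Probability that at least one replica is up, assuming independent failures."""
    for availability in replica_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
    # The service is down only when every single replica is down at once.
    probability_all_down = math.prod(1.0 - a for a in replica_availabilities)
    return 1.0 - probability_all_down
```

Under this assumption, two replicas that are each available 90 percent of the time give `combined_availability([0.9, 0.9])`, which is about 0.99, and a third replica brings it to about 0.999.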
There is a type of attack that tries to reduce or put out availability, the Denial of Service attack. This family of attacks, also known as DoS or DDoS (if it's Distributed), has become very popular thanks to some groups such as Anonymous, and could create huge losses if the target system creates profits for the company. Also, often, these attacks are combined with attacks to steal the confidential information, since DoS attacks create a huge amount of traffic and could easily be used as a diversion.
In February 2014, CloudFlare, a big content delivery network and distributed DNS company, was hit by a massive 400 Gb/s DDoS attack that caused a huge slowdown in CloudFlare services. This was the single biggest DDoS attack in history (until the end of 2014, when this book is being written). Lately, huge DDoS attacks are becoming more frequent. In fact, from 2013 to 2014, the number of DDoS attacks over 20 Gb/s doubled.
An interesting case I would like to relay here is the Feedly DDoS attack, which happened between June 11, 2014 and June 13, 2014. During this attack, Feedly servers were attacked and a person claiming to be the attacker asked the company to pay some money to end the attack, which Feedly affirms it did not pay. I think this case gives us a lot to think about. Many companies are now moving towards complete reliance on computers, so new forms of extortion could become popular, and you should start to think about how to defend yourself and your company.
Another type of DoS attack that is becoming more popular with the coming of public clouds, where you can scale up your infrastructure virtually without limit, is the Economic Denial of Sustainability (EDoS). In this kind of attack, the goal is not to max out the resources, since that would be pretty difficult, but to make the service economically unsustainable for the company under attack. This kind of attack could even be a persistent attack where the attacker increases a company's cloud bill by 10-20 percent without creating any income for the company. In the long run, this could make a company fail.
As you can imagine, based on the CIA model, there is no way a system can meet 100 percent of the requirements, because confidentiality, availability, and integrity are in contradiction. For instance, to decrease the probability of a leak (also known as loss of confidentiality), we can decide to use a single platform (hardware, software, and configuration) to be able to spend 100 percent of our efforts towards the hardening of this single platform. However, to grant better availability we should try to create many different platforms, as different as possible, to be sure that at least one would survive the attack or failure. How can we handle this? We simply have to understand our system needs and design the perfect mix of the two. I will go over a real-life example here that will give you a better understanding of mixing and matching your resources to your needs.
Recently, I helped a client to figure out how to store files safely. The company was an international company owning more than 10 different buildings in as many countries. The company had been through a few unhappy situations that led it to give more importance to keeping its data safe. Specifically, the following things had happened in the previous months:
Many employees wanted to have an easy way to share their documents between their devices and with colleagues, so they often used unauthorized third-party services
Some employees had been stopped at security controls in airports and the airport security had copied their entire hard drive
Some employees had lost their phones, tablets, and computers full of company information
Some employees had reported data loss after their computer hard drive failed and the IT team had to replace it
An employee left the company without revealing his passwords, locking the company out of his data
As often happens, companies decide to change their current system when multiple problems occur, and they prefer to change to a solution that solves all their problems together.
The solution we came up with was to create a multiregional cluster with Ceph, which provided the object storage we needed to put all the employees' data into. This allowed us to have multizone redundancy, which was necessary to guarantee availability. It also allowed us to create all backups in only two places instead of forcing us to have backups at all places. This increased the availability of backups and decreased their cost.
Also, client applications for computers, tablets, and phones were created to allow users to manage their files and automatically synchronize all files in the system. A nice feature of these clients is that they encrypt all the data with a password that is dynamically generated for each file and stored on another system (in a different data center), encrypted with the user's GNU Privacy Guard (GPG) key. The user's GPG key is also kept on a Hardware Security Module in a different data center to give the company the possibility to decrypt a user's data if they leave. This provided a very high level of security and still allowed a document to be shared between two or more colleagues.
To provide better security against the loss or copying of computers, all of the company's computers have their hard drives completely encrypted with a key known only to the employer.
This solved all technical problems. To be sure that the people were trained enough to keep the system safe, the company decided to give a five-day security course to all their employees and to add one day every year of mandatory security update training.
No further accidents happened in the company.
I have called this principle the Principle of Insecurity because I have not yet found a better name for it. This principle states that no matter what you do, who you are, and how much money you spend, you will never have a 100 percent secure environment.
An example of this happened on April 7, 2014, when a new version of OpenSSL was published with the announcement that the Heartbleed bug had been fixed. This bug allowed an attacker to read chunks of memory (RAM) from any machine that was running an unpatched version of OpenSSL. OpenSSL was considered safe, and therefore the majority of companies worldwide had used it and embedded it in their products, to the point that in April 2014 there was close to no alternative to it. But even if something is very standard and widely used, it does not mean it's 100 percent secure.
Something that is always important to remember when we speak about security, is that money is limited, and it is often hard to evaluate how much money we can spend on security. To evaluate how much money it makes sense to spend on security, a mathematical economic model called the Gordon-Loeb model was developed in 2002, which tells us that it makes sense to spend up to 37 percent of the expected losses that would occur from a security breach. This model is widely used and is a well-accepted analytical model in the economics of cyber/information security literature.
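As a rough worked example of that ceiling: the 37 percent figure is 1/e rounded, applied to the expected loss (the potential loss multiplied by the probability of a breach). The function name and the amounts in the usage note are hypothetical, and the result is an upper bound, not a recommended budget.

```python
import math


def gordon_loeb_ceiling(potential_loss: float, breach_probability: float) -> float:
    """Upper bound on sensible security spending: expected loss divided by e."""
    if potential_loss < 0 or not 0.0 <= breach_probability <= 1.0:
        raise ValueError("loss must be non-negative and probability within [0, 1]")
    expected_loss = potential_loss * breach_probability
    return expected_loss / math.e
```

For instance, with a hypothetical potential loss of 1,000,000 USD and a 50 percent breach probability, the expected loss is 500,000 USD and `gordon_loeb_ceiling(1_000_000, 0.5)` comes out at roughly 184,000 USD.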
Security is a journey, not a destination. Security is always an ongoing process.
The Principle of Least Privilege (also known as the Principle of Minimal Privilege or the Principle of Least Authority) requires that any user, process, or system has all, but only, the permissions required to complete the assigned tasks. This is one of the most important principles of security and usually the one that is least considered.
I can write about many examples I have seen where the violation of this principle brought about very bad situations. Not very long ago, I saw a simple process that only needed to access (in read/write) one folder and to read from a database, wiping a machine and the multiple remote disks that were mounted in that moment, because the process was running as root instead of a limited user, as it should have.
What happened was that the process was removing all the files in a subdirectory with the bash command:
rm -rf $VAR/*
The $VAR variable was set by reading a field in the database. The database did not respond (because it was down) and therefore the variable was empty, causing the process to run the following:
rm -rf /*
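Running as a limited user would have contained the damage, and a defensive check on the variable would have avoided it. In bash, writing `${VAR:?}` makes the shell abort when the variable is unset or empty. The same idea is sketched below in Python; the function names and the notion of an allowed root directory are assumptions made for this illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def clear_directory(raw_path: Optional[str], allowed_root: Path) -> int:
    """Delete the contents of raw_path, but only if it lies strictly inside allowed_root."""
    if not raw_path or not raw_path.strip():
        raise ValueError("refusing to delete: target directory is empty or missing")
    target = Path(raw_path).resolve()
    root = allowed_root.resolve()
    if root not in target.parents:
        raise ValueError(f"refusing to delete outside {root}: {target}")
    removed = 0
    for entry in target.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


def demo() -> tuple[int, list[str]]:
    """Clear a scratch directory, then show that unsafe targets are rejected."""
    rejected = []
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        work = root / "work"
        (work / "sub").mkdir(parents=True)
        (work / "a.txt").write_text("x")
        (work / "sub" / "b.txt").write_text("y")
        removed = clear_directory(str(work), root)
        for bad in ("", None, "/"):
            try:
                clear_directory(bad, root)
            except ValueError:
                rejected.append(repr(bad))
    return removed, rejected
```

An empty value, as in the incident above, now raises an error instead of silently turning into a request to wipe the root of the filesystem.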
The Principle of Separation of Duties (also known as Principle of Segregation of Duties) requires that a complete task cannot be done by a person alone or that a person cannot perform all actions on a system. The basic idea of this principle is that completely trusting people could be unsafe for these reasons:
People can make mistakes
People can be malicious
People can be corrupted or threatened
People can take advantage of their position
This is always hard to accept for companies, but we have to face the fact that people are not perfect if we want to create a secure environment. The separation of duties (and powers, due to the Principle of Least Privilege) helps the people too, since they will be less prone to take advantage of their position and they will also be less attractive to those who want to bribe or threaten them.
A world-famous example of the consequences of failing to keep up with this principle is what happened at the National Security Agency (NSA) in 2013. On June 10, 2013 Edward Joseph Snowden, a private contractor working at NSA, leaked thousands of classified files from the NSA. This was possible because he was allowed to copy (and bring out of the facility) that data without the involvement of other people in the process.
People are often the weakest link of the security chain, so never underestimate people when thinking about security.
The Principle of Internal Security requires that a system is defended by multiple layers of security, each one protecting it from a particular type of attack. Often this principle is stated not as a principle but as a technique, under the names Defense in Depth or Castle Approach. Data center designers should study a castle's fortification structure, since castles are very good examples of this principle. Very often, I see data centers with only one level of security, and once you are able to violate it, you are free to go wherever you want. Castles, on the other hand, have multiple layers of security, and even when you pass a security layer, you are still being watched. Also, the defenders in the towers will have a better position than you because they are in enhanced security facilities, and there are no blind spots where you can hide.
Put in multiple (different) security layers
Monitor in and around the security area, leaving no blind spot
Train your people to react immediately to breaches
Don't create strict reaction schemas, because if leaked, these could be used against you
If breaches occur, study them and study countermeasures
Run frequent tests to be sure all systems are active and your people are ready to react
IT security is as much about limiting the damage from breaches as it is about preventing them.
Let's start with some things to remember when we design or verify the compliance of a data center. Very often, data centers are reused over the years for different kinds of data, so it's critically important to check every time that the data center is able to deliver enough security for the kind of data we are putting into it. Also, if we are designing a brand new data center, it would make sense to build it to be more secure than the current needs require (if it makes sense to spend the budget this way), so in the future it will be able to house more data without major work.
Many things that are very cheap or come free when you build something could become very expensive to fix later.
When I have to give my opinion on the location of a data center, I always try to consider any possible disaster that could happen in that location. For this reason, I strongly suggest never building a data center in areas with a high risk of earthquakes, floods, tornadoes, avalanches, or any other natural disaster you can think of. Also, I would suggest avoiding places where accidents can happen, such as places close to airports, highways, dangerous curved roads, power plants, oil refineries, chemical facilities, ammunition factories, and so on. These things are very important for the availability aspect of the CIA model, since those events could destroy your data center and cause huge economic losses for the company as well as huge data loss. Also, those kinds of places are often more expensive to protect with insurance, since they are more dangerous.
First of all, we need a fence (or wall); this will be our first line of defense. This fence has to have one or two entry points (having more would cost much more and would not be very useful). Each of these entry points has to be guarded and have some hard security measures, such as retractable crash barriers. A bomb detector system could be put in place at any entrance if bombs are a possible risk.
The second line of defense should be a buffer zone between your facility and the fence. This area could be small (10 meters) or very big (100 meters) based on the security needs of the facility, the country you are building in, and your budget. This buffer zone has to be completely free, should offer no blind spot, and should be under complete surveillance. This will allow security to spot any attempt to bypass our fence. In case of fire, it will also prevent the fire from moving from your facility to the outside and from the outside to your facility, and it can be used as an assembly point. A parking space can be housed in this area, if it's distant enough from the building and placed in a way that does not confuse the security personnel.
The third line of defense will be the walls of our building. I usually consider the area delimited by this line of defense as the secure zone. Thick concrete walls are cheap and effective barriers against explosive devices and the elements. There are other materials that give you a better level of security, but they can be far more costly. This wall should have as few openings as possible. One or two entrances will be enough. Those entrances have to be guarded and need surveillance cameras. Windows are not needed, and are usually dangerous. Fire doors have to be exit only, so install doors that do not have handles on the outside. Also, when any of these doors is opened, a loud alarm should sound and trigger a response from the security command center.
A fourth line of defense should be in place inside the building. This area will be designated as high security zone. This allows a third level of authorization, reducing the possibilities of unauthorized access. In this area, no food or liquids should be allowed.
A fifth line of defense could be in place, with another authorization point segmenting the server floor into multiple areas, where only people that have reasons to be in that particular area should be allowed to enter (following the Principle of Least Privilege).
As you can see, a lot of authorization points have to be put in place. How can we make an authorization point secure? We can deploy man traps and use multifactor authentication. These measures can be used in one or more authorization points. Remember that all authorization points should be filmed and all accesses should be logged (in and out), both for the record and to be able to check, in case of an emergency, whether everyone has left the building or there are people still trapped inside it.
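Logging both directions is what makes the emergency check possible. Below is a minimal sketch of that check; the event structure and badge identifiers are hypothetical, and a real system would also carry timestamps and the authorization point used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    badge_id: str
    direction: str  # either "in" or "out"


def people_still_inside(events: list[AccessEvent]) -> set[str]:
    """Replay the access log in order and return the badges that never left."""
    inside: set[str] = set()
    for event in events:
        if event.direction == "in":
            inside.add(event.badge_id)
        elif event.direction == "out":
            inside.discard(event.badge_id)
        else:
            raise ValueError(f"unknown direction: {event.direction!r}")
    return inside
```

If only entries were logged, this function could not be written at all, which is a practical reason to insist on logging exits too.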
Even if a data center is more about computers than humans, people will have to be present in the data center for server maintenance, maintenance of the building, and security reasons. Make sure their lives and health are always protected by providing places for them to stay that are safe and that give them a sense of security. Another thing that could be useful is a system that allows you to recirculate air rather than drawing in air from the outside. This could help protect people and equipment if there was some kind of biological or chemical attack or heavy smoke spreading from a nearby fire. For added security, it is possible to put devices in place to monitor the air for chemical, biological, or radiological contaminants.
A data center has multiple support systems that have to be secured properly, such as power systems, air conditioning, and so on. These systems should stay inside the secure zone or could have their own secure zone (another building within the buffer zone). Always remember that some of these systems can be dangerous themselves, so there has to be protection between them and the servers.
My father always says, "never let the thieves think you have something to steal"; this is a suggestion I always give my clients. If you start telling people that you have a data center at this location (or if you even paint something like "[Company XYZ] Data Center" on the walls), don't be surprised if some thief comes to take a look.
Consider that you may put worthless, completely encrypted data in the data center, but the thieves will not know what data there is until they steal and analyze one or more disks. Furthermore, they might be interested in the servers themselves: even if carrying out hundreds of racks is not easy, they might be worth millions of dollars on the market.
Have you noticed how much attention the big companies (such as Amazon, Facebook, and Google) put on this? They do not allow people in their data centers unless they are invited. Some of these data centers have been filmed to create documentaries, but even those documentaries do not provide enough information on the data center's location and its security measures, so as to be sure that no one is too attracted by their data centers. Also, very often, the people who are not directly involved in the data center, will not know its exact position.
A hedge or some trees (outside the first fence zone) could help prevent curious people from snooping on your site. This also prevents people from seeing your security measures, which will decrease the probability of being the subject of casual attacks.
"Never let thieves think you have something to steal."
Use high-end hardware that is failure proof
The high-end hardware is usually very expensive, includes redundancy, and is not as failure proof as it's usually sold as. Today, companies usually prefer redundancy of common hardware because it is cheaper, is able to grant better availability, and is easier to deploy and maintain.
When I was starting in the IT field, it was not really clear to me which degree of redundancy was right and which was not. Luckily for me, after a few months of field work, I had a very interesting conversation about this with a senior technician, who explained it to me very clearly:
"A system has enough redundancy if I can unplug and replug all cables, one cable at a time, and no user complains."
I have already said this about some specific areas, but it's true for all areas. There should be no blind spot in the camera system and each camera should be in the visual field of at least one other camera.
Also, the recording should be kept in case of a break in, in order to be analyzed to prevent the success of future attempts using the same method.
The legend goes that the pharaohs of Egypt killed the pyramid architects to be sure that the blueprint remained a secret. No matter whether this is true or not, the concept that this legend underlines is surely true: the pharaohs did not want the blueprints of their pyramid in the hands of the thieves.
The same thing should be done by companies too. Inviting visitors to see the high level of security can be counterproductive because an observant visitor could spot some security flaws. Also, this removes the surprise aspect. In fact, if an attacker has passed the second layer of security and has no idea how many other levels there could be, he might be less willing to go forward. Furthermore, an attacker could be able to open the first door of a man-trap (for example, because he stole a badge) but then fail the biometric authentication needed to open the second door because he was not expecting it, ending up locked in the man-trap with no possibility to exit.
Often, people ask me what I think about dedicating a room in the office as a data center. I believe this kind of approach is less safe even if it is well implemented, and very often it is also implemented poorly from a security standpoint. I can understand that sometimes the need for security is far lower than what a dedicated facility provides (always remember the Gordon-Loeb model). In these cases, I strongly suggest implementing it as well as possible and extending some security policies to the whole building.
Often, I have seen data centers in offices implemented as racks in the CTO office, or even as racks in the lobby. Do not do this, as it will make any other efforts to secure your environment useless and a huge waste of resources.
An example of a good implementation of a data center in an office would be:
A hedge to protect the property
A fence (with guarded entrance)
The parking lot
A 10-meter buffer zone
A building (with guarded entrance)
A secure zone that can be accessed by employees and escorted visitors (with man-trap access)
A secure elevator requiring an authorized badge to go to the data center floor (this will be the high security zone)
A man-trap entrance to the data center with multifactor authentication
Optional doors in the data center for granular access
This way you are able to keep multiple authorization points without having to use a different facility. This is still less secure than a dedicated facility, but can be a good balance between security and cost. Also, this will make the whole office more secure.
As we have already mentioned in the preceding paragraphs, we can split the servers with secure doors for more granular access. Why should we do this? Isn't it enough to be sure that all people entering the data center are authorized? Very often this is not enough, because all the people who are authorized to enter the data center will be allowed to touch every single device in it, so we are still not compliant with the Principle of Least Privilege.
Some companies solve this problem with locked racks, while others solve it with segmented data centers, or even with both approaches. Both approaches have ups and downs. For instance, you might prefer a segmented data center approach because:
Rack doors are often uncomfortable and require a wider aisle
Open racks have a better air flow than locked racks (this is not always true)
Open racks are way cheaper than locked racks
On the other hand, the segmented approach has some disadvantages:
It is less flexible (a person either has access to a whole group of racks or has none)
Walls and doors have to be placed during the data center construction and cannot be moved later
A combined solution can solve some of these disadvantages. Another mixed option is locking cages, which are easier to install than walls but are often easier to break into.
To further implement the Separation of Duties principle, it is possible to require two authorized people to be present at the same time to unlock a door, or to require a badge of type A to unlock the doors in the data center and a badge of type B to unlock the racks.
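The two variants just described can be sketched in a few lines of Python. This is only an illustration of the logic, not the interface of any real access control product: the function names, the badge type letters, and the `"door"`/`"rack"` target names are assumptions made for the example.

```python
# Hypothetical mapping: which badge type opens which kind of lock.
REQUIRED_BADGE = {"door": "A", "rack": "B"}


def two_person_unlock(present: set[str], authorized: set[str]) -> bool:
    """Two-person rule: at least two distinct authorized people must be present."""
    return len(present & authorized) >= 2


def badge_opens(badge_type: str, target: str) -> bool:
    """Badge separation: a door badge (A) does not open racks (B), and vice versa."""
    return REQUIRED_BADGE.get(target) == badge_type
```

With this split, someone holding only a type A badge can walk through the data center doors but cannot open a rack, and a single authorized person cannot open a door that is protected by the two-person rule.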
Often my clients ask me what they should log and what they should not log. My usual answer is: "What would you like to know if an accident or a data leak had just happened?" I think this is the whole point: you have to think, across the various scenarios, about which kind of data you would like to have, and then start collecting it immediately. The same answer is valid for "For how long should I keep this log?"
Logs are important because they are the only traces that can help you understand exactly what happened and why.
There are several options for where to store them:
Files on filesystems
Files on SAN or other replicated infrastructure
Rows in a relational and/or transactional database
Lines in a NoSQL database
The first option seems very good because hard drives are pretty cheap and you only need a server with a lot of hard drives to make it work. This option has several downsides:
Scalability: How will you handle the case in which all your drives are full?
Read performance: How much time will you need to scan all your logs? (consider that data-center-grade hard drives can usually read between 100 MB/s and 200 MB/s)
Usability: How will you find the exact data you need?
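To get a feeling for the read performance point, here is a small back-of-the-envelope calculation. The 10 TB log volume is an assumption chosen only for illustration, and the function name is hypothetical; the read speeds are the ones quoted in the list.

```python
def scan_time_hours(log_size_tb: float, read_mb_per_s: float) -> float:
    """Hours needed to read log_size_tb terabytes sequentially from one drive."""
    size_mb = log_size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / read_mb_per_s / 3600


# Assumed example: 10 TB of logs at the two read speeds mentioned above.
for speed in (100, 200):
    print(f"{speed} MB/s: {scan_time_hours(10, speed):.1f} hours")
```

Under these assumptions, a full sequential scan takes between roughly 14 and 28 hours on a single drive, which is far too slow when you are in the middle of investigating an incident.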
The second option does solve the first two disadvantages of the first option, but still has the usability issue and can be very costly.
The third option does solve the usability problem but, depending on whether you have one node or more, it can suffer from reliability and read performance problems. No matter how you design the node or cluster, you will have huge scalability problems and also some constraints created by the rigid structure of tables.
The last option does solve all problems in my opinion. Even though it is technically a very good option, it comes with some aspects to consider:
You will need someone with NoSQL/Big Data experience
You will have a high initial cost because NoSQL databases usually need more than three nodes to create a cluster.
Speaking of OpenStack, the best option to store logs is the OpenStack Data Processing Service (Sahara), as it has been part of OpenStack since October 2014.
The more information you log, and the more detailed it is, the harder it is to store and retrieve. In fact, if you only store one type of data (for example, the time and the person logging in to a machine), you will probably have a few megabytes of data every month and, therefore, it will be very easy to put it in a relational database (such as MariaDB or PostgreSQL) that you already have in place. This is also possible because there is only one kind of data, so you know exactly how each log entry will be presented to your log system. When you start logging thousands of lines per hour, coming from tens or hundreds of sources, and with tens of different formats, NoSQL storage seems to be the only option that works.
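The single-format case is simple enough to sketch. The example below uses Python's built-in `sqlite3` module as a stand-in for the MariaDB or PostgreSQL server mentioned above, purely so that it is self-contained; the table name, column names, and helper functions are all assumptions made for illustration.

```python
import sqlite3


def open_log_db() -> sqlite3.Connection:
    """Create an in-memory database with one table for login events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE login_log (
               logged_at TEXT NOT NULL,  -- ISO 8601 timestamp
               username  TEXT NOT NULL,
               host      TEXT NOT NULL
           )"""
    )
    return conn


def record_login(conn, logged_at: str, username: str, host: str) -> None:
    """Store one login event; every entry has exactly the same shape."""
    conn.execute(
        "INSERT INTO login_log (logged_at, username, host) VALUES (?, ?, ?)",
        (logged_at, username, host),
    )


def logins_for(conn, username: str) -> list:
    """Return (timestamp, host) pairs for one user, oldest first."""
    cursor = conn.execute(
        "SELECT logged_at, host FROM login_log "
        "WHERE username = ? ORDER BY logged_at",
        (username,),
    )
    return cursor.fetchall()
```

Because every entry has the same three fields, a fixed table schema is enough. As soon as sources with different formats are added, each one would need its own table or a set of mostly empty columns, which is where the rigid structure of tables starts to hurt.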
Door access (both entering and exiting)
Server access (SSH, Database, and so on)
All server logs
Data center environmental metrics (temperature, humidity, and so on)
It's really important to make a considered decision here, so that you have all the logs you need without saving a huge amount of logs that you will never use.
Another important thing to decide is how long to keep the logs. Some countries have specific laws on the minimum time to keep some kinds of logs, while others do not. In my opinion, it depends a lot on the company, but I usually suggest keeping them for at least one year.
A whole year seems to be a lot of time, but it's not; it's the very minimum in my opinion. This is because if you suspect that a person has been behaving strangely lately, you will want to look at the logs for at least one year to confirm a pattern or a change of pattern.
The best option of all is to keep logs indefinitely, so that you can go back as far as you need and still have full information.
I have seen, in my life, many more security problems caused by humans than by machines. By the people aspect of security, I mean all human actions that can increase or decrease security. Humans take part in the vast majority of company processes, and can be the weak link of the chain on multiple occasions, such as in the following examples:
A system administrator disables a firewall (or allows all by default) to speed up a process
A system administrator sends a PEM certificate/PGP private key by e-mail
A user creates a weak password to remember it better
A user writes his password on a piece of paper stuck to the monitor
A user gives his password to a colleague over the phone
As you can see, some of these actions are committed by system administrators, while others are committed by users, but at the end of the day, they can have a huge impact no matter who committed them. Some of these actions can be prevented using automatic systems, such as a password grader that runs before a password is accepted. Other actions can be prevented only by informing your users and system administrators and teaching them to act properly for your company's security and their own.
Lack of information
Malicious actions under threats
Malicious actions for own advantage
A user creates a weak password to remember it better
A user writes his password on a piece of paper stuck to the monitor
A user gives his password to a colleague over the phone
As you can see, I have listed only user actions, because very often those errors are committed by users, not system administrators. The good news for you is that these kinds of errors are usually easy to spot, fix, and prevent.
No dictionary word
At least one uppercase letter, one lowercase letter, one number, and one special character, excluding '!', '#', '@', '&', and '$', which are the most common special characters
At least thirteen characters long
Using these three rules, we removed the weak password problem, reaching about 80 bits of entropy for each password.
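The three rules above can be turned into an automatic check. The following is a minimal sketch, not the system that was actually deployed: the names are hypothetical, the word list is supplied by the caller, "no dictionary word" is interpreted as "the password must not contain any word from the list", and the five excluded characters are interpreted as simply not counting toward the special character requirement. The entropy figure is an upper bound that only holds for passwords picked uniformly at random from the allowed alphabet.

```python
import math
import string

MIN_LENGTH = 13
EXCLUDED_SPECIALS = set("!#@&$")  # most common specials, do not count
ALLOWED_SPECIALS = set(string.punctuation) - EXCLUDED_SPECIALS


def meets_policy(password: str, dictionary: set[str]) -> bool:
    """Check a candidate password against the three rules listed above."""
    if len(password) < MIN_LENGTH:
        return False
    lowered = password.lower()
    if any(word in lowered for word in dictionary):
        return False  # assumption: substring match against the word list
    return (
        any(c in string.ascii_uppercase for c in password)
        and any(c in string.ascii_lowercase for c in password)
        and any(c in string.digits for c in password)
        and any(c in ALLOWED_SPECIALS for c in password)
    )


def max_entropy_bits(length: int = MIN_LENGTH) -> float:
    """Upper bound on entropy for a uniformly random password of this length."""
    alphabet = len(string.ascii_letters) + len(string.digits) + len(ALLOWED_SPECIALS)
    return length * math.log2(alphabet)
```

For a thirteen-character password, `max_entropy_bits()` comes out a little above 80 bits, which is in line with the figure quoted above. Passwords chosen by people are less random than that, so the real entropy is lower.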
To be sure that people followed the password management instructions given during the password course, we identified a few people in the company who had been most successful during the course and asked them to help look for colleagues who were handling passwords unsafely. People caught handling passwords unsafely were signed up for another course (4 hours, this time), which was more focused on explaining why people should follow the rules, rather than simply teaching them the rules (which had already been discussed in the previous course).
As for password sharing and other similar practices, a system was put in place to make sure that no more than one IP address could use a certain username and password at any given moment. If a second connection arrived from a different address, the account was locked automatically and the user (the owner of the account) had to call the IT department directly to ask them to unlock it. Within a few months, these kinds of actions no longer happened. We did not solve the password-over-the-telephone problem directly (because it is not possible to enforce this kind of rule unless someone listens to all phone calls, which is practically impossible), but we made it very noticeable to the IT department.
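The locking logic described above can be sketched as follows. This is an in-memory illustration only, with hypothetical class and method names; a real deployment would hook into the authentication system and keep this state somewhere persistent.

```python
class SessionGuard:
    """Allow each account to be used from only one IP address at a time."""

    def __init__(self) -> None:
        self.active_ip: dict[str, str] = {}  # username -> IP of current session
        self.locked: set[str] = set()        # accounts waiting for IT to unlock

    def login(self, username: str, ip: str) -> bool:
        """Return True if the login is accepted, False if refused or locked."""
        if username in self.locked:
            return False
        current = self.active_ip.get(username)
        if current is not None and current != ip:
            # Second address while a session is active: lock the account.
            self.locked.add(username)
            del self.active_ip[username]
            return False
        self.active_ip[username] = ip
        return True

    def logout(self, username: str) -> None:
        """End the current session so another address can be used later."""
        self.active_ip.pop(username, None)

    def unlock(self, username: str) -> None:
        """Called by the IT department after the owner phones in."""
        self.locked.discard(username)
```

The important design choice is that the lock can only be lifted by a manual step, so every shared password ends in a phone call to the IT department.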
People are lazy and will try to use any possible shortcut that they can think of. I know this is a huge generalization, but it's true more often than not. If you ask people to follow a complex process and they see the possibility of getting similar results with a much simpler process, the majority of them will use the simpler process, and this is even more true when the same person has to repeat the same process multiple times.
How can you defend your company from this? The first thing to do is to keep the processes as simple as possible, so that people have less incentive to take a shortcut. The second thing to do is to explain to everyone who takes part in each process why that process is done in that way and what the consequences of doing it differently can be.
To explain this best, I'd like to bring you a very famous example from a different field, aviation. British Airways Flight 5390 became famous because, on June 10, 1990, a windscreen blew out because a panel had been improperly installed. In the process, the captain of the plane, Tim Lancaster, was ejected halfway out of the aircraft. The body of the captain (still alive) was firmly pressed against the window frame, where it stayed until the first officer managed to perform an emergency landing in Southampton with no loss of life.
The reason this accident is of such importance is that it shows what can happen when not enough information about a process is given to the people who are executing it. In this case, the problem was that the windscreen had been replaced about a day before the flight and the wrong bolts had been used. In fact, 84 of the 90 windscreen retention bolts were 0.026 inches (0.66 mm) too small in diameter, while the remaining six were 0.1 inches (2.5 mm) too short. This was possible because the engineer who changed the windscreen used a like-for-like method to select the new bolts instead of looking them up in the maintenance documentation, even though the official British Airways policies required referring to the maintenance documentation for each component being replaced on the planes.
Three out of the five recommendations of the Civil Aviation Authority following this accident aimed to improve the probability that people execute the procedures correctly, mainly through training and testing, including on the possible consequences of shortcuts during the processes. The remaining two recommendations were about examining the continued viability of self-certification with regard to safety-critical tasks on aircraft and about recognizing the need for the use of corrective glasses, if prescribed, in association with aircraft engineering tasks.
Human error implies that the person doing the action knows what he/she should do, but does it differently because there are external factors acting on them, such as pressure or tiredness.
I have not seen a single office in my life that was not susceptible to pressure or tiredness. Obviously, good management can help, but it cannot prevent it. What you can do is document everything when you are calm and rested, so that when pressure or tiredness grow, it is possible to follow the documentation.
I have seen this in multiple companies' IT departments with no documentation. I know this is pretty common (at least in southern Europe) because multiple colleagues of mine have told me that they have had similar experiences. I do remember a specific case in which I went to a company to create an active-active cluster.
In my experience, the presence of documentation for certain procedures puts less pressure on the people executing them; therefore, the simple fact of having a procedure can remove one of the causes of errors.
After a few days on the job, the main MySQL database went down and the manager asked me to fix it. After a little bit of analysis, I put a workaround in place by promoting the slave to master, so that the company was able to work again. This was obviously a dirty workaround that had to be fixed very soon. So, after working for hours, when it was safe to shut down the system for long enough, we created a new slave to restore the initial situation. I asked the manager whether this had ever happened before and how they had fixed it the previous times. He responded that it had already happened a few times, but the person who fixed it the previous time had left the company months earlier, leaving no documentation, since the company had never forced him to write any. Having all the data on a SAN, we chose to do a SAN copy to speed up the recovery. The result was a huge mess with duplicated LVM IDs that took more than 2 hours to clear.
Obviously, I cannot blame the previous technician for the LVM issue, but if he/she had written documentation for that procedure, we would have followed it without creating the mess, considering that all of it happened because a single LVM command had been forgotten while planning the work.
As we have seen, human errors and shortcuts are often caused by a lack of information. Sometimes, the lack of information does not result in human errors or shortcuts, but ends in disaster because the person carrying out the procedure does not know something relevant to it, or has no real idea about the environment they are working on.
The solution is to create the documentation and to update it constantly. Obviously, it is important to read all the documentation too. In my experience, it is really important to have a good tool for documentation. Some companies use Word documents or similar kinds of programs. I think this is wrong, mainly for the following reasons:
It's not possible, or very hard, to link each document or section. Every time a system or procedure is mentioned, it should be linked. Each system should have a page with all its procedures and configuration linked, and vice versa.
It requires specific software or other kind of not-so-friendly interfaces (such as Google Drive)
It does not support versioning (or supports it only in a limited way)
I think the best way to provide documentation is with a wiki installation or a Git repository containing human-readable documentation in Markdown or a similar format. If you go for the Git repository option, remember to export the documentation to HTML too, to make it more accessible. In either case, remember to back up your documentation frequently because it's a very important asset.
"The information security industry defines social engineering as an attack that breaches an organization's security defenses by manipulating people and the human tendency to trust."—SysAdmin Audit Networking and Security Institute (SANS Institute)
Humans are in pretty much all processes or can enter into them if they feel the urgency to do so. Humans, also, are very often the weakest link of a security chain since they are flexible, while computers are not.
Humans are flexible and usually try to meet other people's expectations, often accepting a rule violation to do so.
Today, it's possible to create a secure system for a small amount of money that costs many times more to break into. This is the reason why attackers use people inside the company to drastically reduce the effort needed to break into the system. Most of the time, the attacker exploits the employee's willingness to meet the other person's expectations to get the information they need.
Lately, social engineering has been split into tens of fields based on the vehicle of attack and the goal. We will not go deeper into this topic at the moment.
I would like to bring you an example of social engineering I carried out myself, because I did not believe it would be so easy in that company.
I was placed in a big company to help them increase their security. The manager was willing to undertake a lot of actions in this sense, but thought that social engineering was only a commercial thing used by salespeople to sell more useless services, and therefore was not willing to implement any social engineering countermeasure. To demonstrate to him the importance of social engineering countermeasures, I pulled out my phone and called the company front desk, hiding my number. I told the person who answered that I had problems with an invoice calculation and therefore had to speak with someone in the accounting department. Soon after, a person from the accounting department answered. I told him that I was calling from the Microsoft helpdesk and that I had to run some tests with him because of a new update that had been rolled out that morning. The man was really happy about my call because he also had a problem with a scanner that he could not get to work properly. I said that part of the procedure required his company password and a lot of other data to verify that everything worked. The incredible part was that he gave me all the information without doubting my intentions. While I was on the phone, the manager was shocked that an employee had shared so much information with an unknown person over the phone.
Sometimes the attacker is not able to deceive anyone in the company, so he/she might want to identify a person who has enough clearance and is easy to threaten in order to obtain what he/she is looking for. In movies, usually, the villain kidnaps a person from the hero's family to obtain what he is looking for. Luckily, in reality, this is not common and usually the threats are much smaller, but they still work for the attacker's purpose.
My point of view is that it is really important for the company to ensure that its employees work in a secure environment, mainly by limiting their powers. Threatening someone is very dangerous and, legally speaking, very serious in the majority of countries; therefore, the attacker will want to target a single person with enough power. If no single person has this power, the attacker may try a different approach to the problem that leaves the employees aside.
Sometimes people commit malicious actions and you have to be prepared for this. This kind of inside attack is usually very dangerous because insiders are able to ask their colleagues for favors with legitimacy. If they do not have direct access to the resource they need, they can use social engineering while using their real credentials to gain more trust and to be able to ask for bigger favors or more confidential information.
For this reason, you have to segment the process and have very strict rules that don't allow a person to know more than they are meant to know. Also, it is important to inform your employees and make them aware of this kind of risk.
In this chapter, we have seen an introduction to security as well as a number of best practices to use. These best practices will help you to have a safer environment.
Often, people focus so strongly on securing a system from a specific kind of attack that the system seems impregnable from that point of view, but they forget to secure the system from other perspectives too, making their work worthless.
In the next chapter, we will dive into some security challenges you may be facing and into the OpenStack structure.