
VMware vRealize Operations Performance and Capacity Management

By Iwan 'e1' Rahabok
  1. Free Chapter
    Virtual Data Center – It's Not a Physical Data Center, Virtualized
About this book
Publication date:
December 2014
Publisher
Packt
Pages
276
ISBN
9781783551682

 

Chapter 1. Virtual Data Center – It's Not a Physical Data Center, Virtualized

In this chapter, we will dive into why a seemingly simple technology, a hypervisor and its management console, has such large ramifications for the IT industry. In fact, it is turning many things upside down and breaking down silos that have existed for decades in large IT organizations. We will cover the following topics:

  • Why virtualization is not what we think it is

  • A comparison between a physical server and a Virtual Machine (VM)

  • What exactly is a Software-Defined Data Center?

  • A comparison between a physical data center and a virtual data center

  • The impact on how we manage a data center once it is virtualized

 

Our journey into the virtual world


The change caused by virtualization is much larger than the changes brought about by previous technologies. In the past two decades or more, we transitioned from mainframes to the client/server model and then to the web-based model. These are commonly agreed upon as the main evolutions in IT architecture. However, all of these were just technology changes. They changed the architecture, yes, but they did not change operations in a fundamental way. Neither the client/server nor the web shift was described as a "journey"; there was no journey to the client/server model. With virtualization, however, we talk about the virtualization journey, because the changes are massive and involve a lot of people.

Gartner correctly predicted the impact of virtualization in 2007 (http://www.gartner.com/newsroom/id/505040). More than 7 years later, we are still in the midst of the journey. To show how pervasive the change is, here is a summary of the Gartner article:

"Virtualization will be the most impactful trend in infrastructure and operations through 2010, changing:

  • How you plan

  • How, what and when you buy

  • How and how quickly you deploy

  • How you manage

  • How you charge

  • Technology, process, culture"

Notice how Gartner talks about a change in culture. Virtualization has a cultural impact too. In fact, I think that if your virtualization journey is not moving fast enough, you should look at your organization's structure and culture. Have you broken the silos? Do you empower your people to take risks and do things that have never been done before? Are you willing to flatten the organization chart?

So why exactly is virtualization causing such a fundamental shift? To understand this, we need to go back to the very basics: what exactly is virtualization? It is quite common for Chief Information Officers (CIOs) to have misconceptions about what it is.

Take a look at the following comments. Have you seen them in your organization?

  • "VM is just Physical Machine virtualized. Even VMware said the Guest OS is not aware it's virtualized and it does not run differently."

  • "It is still about monitoring CPU, RAM, Disk, Network. No difference."

  • "It is a technology change. Our management process does not have to change."

  • "All of these VMs must still feed into our main Enterprise IT Management system. This is how we have run our business for decades and it works."

If only life were that simple, we would all be 100 percent virtualized and have no headaches! Virtualization has been around for years, and yet most organizations have not mastered it.

 

Not all "virtualizations" are equal


There are plenty of misconceptions about the topic of virtualization, especially among nontechnical IT folk. The CIOs who have not yet felt the strategic impact of virtualization (be it a good or a bad experience) tend to carry these misconceptions. Although virtualization looks similar to the physical world on the surface, it is completely re-architected under the hood.

So let's take a look at the first misconception: what exactly is virtualization? Because it is an industry trend, virtualization is often generalized to include other technologies that are not actually virtualization. This is a typical strategy of IT vendors with similar technologies. A popular technology often branded as virtualization is partitioning; once it is parked under the umbrella of virtualization, the claim follows that both should be managed in the same way. Since the two are actually different, customers who try to manage both with a single piece of management software struggle to do either well.

Partitioning and virtualization are two very different architectures in computer engineering, resulting in major differences in functionalities. They are shown in the following figure:

Virtualization versus Partitioning

With partitioning, there is no hypervisor that virtualizes the underlying hardware, and no software layer separating the Virtual Machine (VM) from the physical motherboard. There is, in fact, no VM. This is why some technical manuals for partitioning technologies do not even use the term VM; they use the term domain or partition instead.

There are two variants in the partitioning technology, the hardware level and the OS level, which are covered in the following bullet points:

  • In hardware-level partitioning, each partition runs directly on the hardware. It is not virtualized. This is why it is more scalable and incurs less of a performance hit. Because it is not virtualized, the partition has to be aware of the underlying hardware, and as a result it is not fully portable. You cannot move a partition from one hardware model to another; the hardware has to be purpose-built to support that specific version of the partition. The partitioned OS still needs all the hardware drivers and will not work on other hardware if the compatibility matrix does not match. As a result, even the version of the OS matters, just as with a physical server.

  • In OS-level partitioning, a parent OS runs directly on the server motherboard. This OS then creates an OS partition, where another "OS" can run. I use the quotes because it is not exactly a full OS that runs inside that partition. The OS has to be modified and qualified to be able to run as a Zone or Container, and because of this, application compatibility is affected. This is very different from a VM, where there is no application compatibility issue because the hypervisor is transparent to the Guest OS.

We covered the difference from an engineering point of view. Does it translate into different data center architecture and operations? Take availability, for example. With virtualization, all VMs become protected by High Availability (HA): 100 percent protection, achieved without any VM awareness. Nothing needs to be done at the VM layer: no shared or quorum disk and no heartbeat network. With partitioning, the protection has to be configured manually, one by one for each LPAR or LDOM; the underlying platform does not provide it. With virtualization, you can even go beyond five 9s and move to 100 percent with Fault Tolerance. This is not possible in the partitioning approach, as there is no hypervisor to replay the CPU instructions. Also, because it is virtualized and transparent to the VM, you can turn the Fault Tolerance capability on and off on demand. Fault Tolerance is defined entirely in software.
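To put "five 9s" in perspective, an availability percentage maps directly to allowed downtime per year. A small back-of-the-envelope sketch (plain arithmetic, not tied to any VMware tool):

```python
def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per 365-day year for a given availability percentage."""
    minutes_per_year = 365 * 24 * 60          # 525,600 minutes
    return minutes_per_year * (1 - availability_percent / 100)

for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% availability -> {downtime_minutes_per_year(nines):.2f} minutes/year")
```

Five 9s (99.999 percent) still allows roughly five minutes of downtime per year; Fault Tolerance aims to remove even that for the protected VM.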

Another area of difference between partitioning and virtualization is Disaster Recovery (DR). With partitioning technology, the DR site requires another instance to protect the production instance. It is a different instance, with its own OS image, hostname, and IP address. Yes, we can do a SAN boot, but that means another LUN to manage, zone, replicate, and so on. This style of DR does not scale to thousands of servers; to make it scalable, it has to be simpler. Virtualization takes a very different approach. The entire VM fits inside a folder; it becomes like a document, and we migrate the entire folder as if it were one object. This is what vSphere Replication in Site Recovery Manager does: it replicates per VM, with no need to worry about SAN boot. The entire DR exercise, which can cover thousands of virtual servers, is completely automated, with audit logs generated automatically. Many large enterprises have automated their DR with virtualization; there is probably no company that has automated DR for its entire LPAR or LDOM estate.
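The "entire VM fits inside a folder" point can be illustrated with a toy sketch: replicating a VM then amounts to copying one directory. This is not vSphere Replication itself (which is incremental and works at the hypervisor layer); the paths and file names below are invented for the demo:

```python
import shutil
import tempfile
from pathlib import Path

def replicate_vm_folder(vm_folder: Path, dr_site: Path) -> Path:
    """Copy an entire VM folder (VMX, VMDKs, logs) to a DR location as one object."""
    target = dr_site / vm_folder.name
    if target.exists():
        shutil.rmtree(target)          # naive full resync; real replication is incremental
    shutil.copytree(vm_folder, target)
    return target

# Demo with throwaway directories standing in for datastores
prod = Path(tempfile.mkdtemp()) / "web01"
prod.mkdir()
(prod / "web01.vmx").write_text("config")
(prod / "web01.vmdk").write_text("disk")
dr_datastore = Path(tempfile.mkdtemp())
copy = replicate_vm_folder(prod, dr_datastore)
print(sorted(p.name for p in copy.iterdir()))
```

Because the VM is just files, the copy carries the whole machine: configuration, disks, and identity travel together.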

I'm not saying partitioning is an inferior technology. Every technology has its advantages and disadvantages, and addresses different use cases. Before I joined VMware, I was a Sun Microsystems SE for five years, so I'm aware of the benefits of UNIX partitioning. I'm just trying to dispel the myth that partitioning equals virtualization.

As both technologies evolve, the gap gets wider. As a result, managing a partition is different from managing a VM. Be careful when opting for a management solution that claims to manage both; you will probably end up with the lowest common denominator.

 

Virtual Machine – it is not what you think!


VM is not just a physical server virtualized. Yes, there is a P2V process. However, once it is virtualized, it takes on a new shape. That shape has many new and changed properties, and some old properties are no longer applicable or available. My apologies if the following is not the best analogy:

"We P2V the soul, not the body."

On the surface, a VM looks like a physical server. So let's actually look at the VM's properties. The following screenshot shows a VM's settings in vSphere 5.5. It looks familiar, as it has a CPU, Memory, Hard disk, Network adapter, and so on. However, look at it closely. Do you see any property that you don't usually see in a physical server?

VM property in vSphere 5.5

Let me highlight some of the properties that do not exist in a physical server. I'll focus on those properties that have an impact on management, as management is the topic of this book.

At the top of the dialog box, there are four tabs:

  • Virtual Hardware

  • VM Options

  • SDRS Rules

  • vApp Options

The Virtual Hardware tab is the only tab whose properties resemble those of a physical server. The other three tabs have no equivalent in a physical server. For example, SDRS Rules pertains to Storage DRS, which means the VM's storage can be moved automatically by vCenter; its location in the data center is not static. This includes the drive where the OS resides (the C:\ drive in Windows). This directly impacts your server management tool: it has to be aware of Storage DRS, and it can no longer assume that a VM is always located in the same datastore or LUN. Compare this with a physical server, whose OS typically resides on a local disk that is part of the server. You don't want your physical server's OS drive being moved around the data center, do you?

In the Virtual Hardware tab, notice the New device option at the bottom of the screen. Yes, you can add devices, some of them on the fly while Windows or Linux is running. All of a VM's devices are defined in software. This is a major difference from a physical server, whose hardware is fixed and cannot be changed. With virtualization, the ESXi host can have two sockets while the VM has five. Your server management tool needs to be aware of this and recognize that the new Configuration Management Database (CMDB) is now vCenter.
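Because the VM's hardware is just data, it can be modeled as such. The toy class below (invented names, no VMware API involved) shows why a 2-socket host can present a 5-vCPU VM and why devices can be added by editing state rather than opening a chassis:

```python
from dataclasses import dataclass, field

@dataclass
class ToyVM:
    """Toy model: VM 'hardware' is editable configuration data, not physical parts."""
    name: str
    vcpus: int
    memory_gb: int
    devices: list = field(default_factory=list)

    def hot_add(self, device: str) -> None:
        # In vSphere, some devices can be added while the guest OS is running.
        self.devices.append(device)

host_sockets = 2                                   # the physical ESXi host
vm = ToyVM("app01", vcpus=5, memory_gb=16)         # VM config exceeds host socket count
vm.hot_add("vNIC")
print(vm.vcpus > host_sockets, vm.devices)
```

The point of the sketch: nothing in the VM definition is bound to the physical parts underneath it, which is exactly why vCenter, not a hardware inventory, is the source of truth.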

The next screenshot shows a bit more detail. I've expanded the CPU device. Again, what do you see that does not exist in a physical server?

VM CPU and Network property tab in vSphere 5.5

Let me highlight some of the options. Look at Reservation, Limit, and Shares. None of them exist in a physical server, as a physical server is standalone by default. It does not share any resource on the motherboard (CPU and RAM) with another server. With these three levers, you can perform Quality of Service (QoS) in a virtual data center. Another point: QoS is actually built into the platform. This has an impact on management, as the platform is able to do some of the management by itself. There is no need to get another console to do what the platform provides you out of the box.
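A rough sketch of how Shares behave under contention: when demand exceeds capacity, each VM receives resources in proportion to its shares. This simplification ignores Reservation and Limit, which add floors and ceilings on top of the proportional split, and the numbers are made up:

```python
def allocate_mhz(capacity_mhz: float, shares: dict) -> dict:
    """Divide contended CPU capacity proportionally to each VM's shares."""
    total = sum(shares.values())
    return {vm: capacity_mhz * s / total for vm, s in shares.items()}

# Two VMs contend for 2000 MHz; 'gold' holds twice the shares of 'silver'
allocation = allocate_mhz(2000, {"gold": 2000, "silver": 1000})
print(allocation)
```

With twice the shares, "gold" receives twice the CPU of "silver" whenever both demand more than the host can supply; when there is no contention, shares have no effect.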

Other properties in the previous screenshot, such as Hardware virtualization, Performance counters, HT Sharing, and CPU/MMU Virtualization also do not exist in the physical server. It is beyond the scope of this book to explain every feature, and there are many blogs and technical papers freely available on the Internet that explain them. Some of my favorites are http://blogs.vmware.com/performance/ and http://www.vmware.com/vmtn/resources/.

The next screenshot shows the VM Options tab. Again, what properties do you see that do not exist in a physical server?

VM Options tab in vSphere 5.5

I'd like to highlight a few of the properties present in the VM Options tab. The VMware Tools property is a key and highly recommended component. It provides you with drivers and improves manageability. VMware Tools has no counterpart in a physical server: a physical server has drivers, but none of them come from VMware. A VM is different. Its motherboard (a virtual motherboard, naturally) is defined and supplied by VMware, so the drivers are supplied by VMware too, and VMware Tools is the mechanism that supplies them. VMware Tools comes in different versions, so it becomes another component you need to be aware of and manage.

I've just covered a few VM properties from the VM settings dialog box. There are literally hundreds of VM properties that do not exist in the physical world, and even the shared properties are implemented differently. For example, although vSphere supports N_Port ID Virtualization (NPIV), the Guest OS does not see the World Wide Name (WWN). This means data center management tools have to be aware of the specific implementation in vSphere, and these properties change with every vSphere release. Notice the sentence right at the bottom. It says Compatibility: ESXi 5.5 and later (VM version 10). This is your VM motherboard. It depends on the ESXi version, and yes, it becomes yet another new thing to manage.

Every vSphere release typically adds new properties too, making a VM more manageable than a physical machine and differentiating it further from a physical server.

Hopefully, I've driven home the point that a VM is very different from a physical server. I'll now list the differences from the management point of view. The following table shows the differences that impact how you manage your infrastructure. Let's begin with the core properties:

Property: BIOS

Physical server: A unique BIOS exists for every brand and model. Even the same model (for example, HP DL 380 Generation 7) can have multiple versions of the BIOS. The BIOS needs updates and management, often with physical access to the data center, which requires downtime.

Virtual machine: The BIOS is standardized. There is only one type, the VMware motherboard, and it is independent of the ESXi host's hardware. The VM BIOS needs far fewer updates and far less management; the inventory management system no longer needs a BIOS management module.

Property: Virtual hardware

Physical server: Not applicable.

Virtual machine: This is a new layer below the BIOS. It needs an update with every vSphere release, and a data center management system needs to be aware of it, which requires deep knowledge of vSphere. For example, to upgrade the virtual hardware, the VM has to be powered off.

Property: Drivers

Physical server: Many drivers are loaded and bundled with the OS, and all of them need to be managed. This is a big area in the physical world, as drivers vary from model to model and brand to brand. The management tool has rich functionality, such as checking compatibility, rolling out drivers, rolling back when there is an issue, and so on.

Virtual machine: Almost no hardware-specific drivers are loaded with the OS; drivers are replaced by VMware Tools. VMware Tools is the new "driver", replacing all the others. Even with NPIV, the VM does not need an FC HBA driver. VMware Tools itself needs to be managed, with vCenter being the most common management tool.

Property: Hardware upgrade

Physical server: This is done offline and is complex. OS reinstallation and updates are often required, so it is a complex project in the physical world. Sometimes a hardware upgrade is not even possible without upgrading the application.

Virtual machine: This is done online and is simple. Virtualization decouples the application from hardware dependency, so a VM can be upgraded from 5-year-old hardware to brand-new hardware, moving from local SCSI disk to 10 Gb FCoE, from dual-core to 15-core CPUs. So yes, MS-DOS can run on 10 Gb FCoE, accessing SSD storage via the PCIe lane. You just need to perform a vMotion to the new hardware. As a result, the operation is drastically simplified.

In the preceding table, we compared the core properties of a physical server with a VM. Let's now compare the surrounding properties. The difference is also striking when we compare the area related to the physical server or VM:

Property: Storage

Physical server: Servers connected to a SAN can see the SAN and FC fabric. They need HBA drivers, carry FC PCI cards, and have multipathing software installed. They normally need an advanced filesystem or volume manager to RAID the local disks.

Virtual machine: No VM is connected to the FC fabric or the SAN; a VM sees only local disks. Even with NPIV, the VM does not send FC frames. Multipathing is provided by vSphere, transparent to the VM. There is no need to RAID the local disks: the VM sees one virtual disk, not two, as availability is provided at the hardware layer.

Property: Backup

Physical server: A backup agent and a backup LAN are needed in the majority of cases.

Virtual machine: These are not needed in the majority of cases, as backup is done via the vSphere VADP API. An agent is only required for application-level backup.

Property: Network

Physical server: NIC teaming is common and typically needs two cables per server. The Guest OS is VLAN aware; the VLAN is configured inside the OS, so moving to another VLAN requires reconfiguration.

Virtual machine: NIC teaming is provided by ESXi; the VM is not aware of it and sees only one vNIC. The VLAN is provided by vSphere, transparent to the VM, and a VM can be moved from one VLAN to another live.

Property: Antivirus (AV)

Physical server: The AV agent is installed in the Guest. AV consumes OS resources and can be seen by an attacker, and AV signature updates cause high storage throughput.

Virtual machine: An AV agent runs on the ESXi host as a VM (one per ESXi host). AV does not consume Guest OS resources and cannot be seen by an attacker from inside the Guest OS. AV signature updates do not require high IOPS inside the Guest OS, and the total IOPS at the ESXi host level is also lower, as updates are not done per VM.

Lastly, let's take a look at the impact on management and monitoring. As can be seen next, even the way we manage the servers changes once they are converted into VMs:

Property: Monitoring

Physical server: An agent is commonly deployed; it is typical for a server to have multiple agents. The in-Guest counters are accurate. A physical server averages around 5 percent CPU utilization due to multicore chips, so there is little need to monitor it closely.

Virtual machine: An agent is typically not deployed, although certain areas, such as application and Guest OS monitoring, are still best served by an agent. The key in-Guest counters are not accurate. A VM averages around 50 percent CPU utilization because it is right-sized; that is 10 times higher than a physical server. As a result, it needs to be monitored closely, especially when physical resources are oversubscribed. Capacity management becomes a discipline in itself.

Property: Availability

Physical server: HA is provided by clusterware such as MSCS and Veritas Cluster. Cloning a physical server is a complex task that requires the boot drive to be on the SAN or LAN, which is not typical. Snapshots are rarely taken, due to cost and complexity.

Virtual machine: HA is a built-in core component of vSphere. Most clustered physical servers end up as a single VM, as vSphere HA is good enough. Cloning can be done easily, even live; the drawback is that clones become a new area of management. Snapshots can also be done easily; in fact, one is taken every time as part of the backup process, so snapshots become a new area of management too.

Property: Asset

Physical server: A physical server is an asset and has book value. It needs proper asset management, as components vary among servers, so a stock-take process is required.

Virtual machine: A VM is not an asset, as it has no accounting value. A VM is like a document; it is technically a folder with files in it. Stock-takes are no longer required, as a VM cannot exist outside vSphere.

 

Software-Defined Data Center


We covered how a VM differs drastically from a physical server. Now let's take a look at the big picture, at the data center level. A data center consists of three functions: compute, network, and storage. I use the term compute because we are entering the converged infrastructure era, where the server provides storage too and both are physically in one box. There is no longer a separation; we cannot say where the server stops and the storage starts.

VMware is moving to virtualize the network and storage functions as well, resulting in a data center that is fully virtualized and defined in the software. The software is the data center. We no longer prepare the architecture in the physical layer. The physical layer is just there to provide resources. These resources are not aware of one another. The stickiness is reduced and they become a commodity. In many cases, the hardware can even be replaced without incurring downtime to the VM.

The next diagram shows one possibility of a data center that is defined in the software. I have drawn the diagram to state a point, so don't take this as the best practice for SDDC architecture. Also, the technology is still evolving, so expect changes in the next several years. In the diagram, there are two physical data centers. Large enterprises will have more physical data centers. The physical data centers are completely independent. Personally, I believe this is a good thing. Ivan Pepelnjak, someone I respect highly on data center networking architecture, states that:

Interconnected things tend to fail at the same time

Note

This specific sentence can be found at http://blog.ipspace.net/2012/10/if-something-can-fail-it-will.html. I also found the following article to be very useful: http://blog.ipspace.net/2013/02/hot-and-cold-vm-mobility.html.

Each of these physical functions (compute, network, and storage) is supported, or shall I say instantiated, in the physical world by the respective hardware vendors. For the servers, you might have vendors (for example, Nutanix, HP, Lenovo, Dell, and so on) that you trust and know. I have drawn two vendors to make the point that the vendors do not define the architecture; they are there to support the function of that layer (for example, the compute function). So, you can have 10 vSphere clusters: 3 clusters could be from Vendor A, and 7 clusters from Vendor B.

The same approach is then implemented in Physical Data Center 2, without the mindset that both data centers have to use the same vendor. Take the storage function as an example. You might have Vendor A in data center 1 and Vendor B in data center 2. You are no longer bound by hardware compatibility, even though storage array replication normally requires the same model and protocol. You can do this because the physical data centers are completely independent of each other: they are neither connected nor stretched, and the replication is done at the hypervisor layer. vSphere 5.5 has built-in host-based replication via TCP/IP. It can replicate individual VMs, providing finer granularity than LUN-based replication, and replication can be done independently of the storage protocol (FC, iSCSI, or NFS) and VMDK type (thick or thin). You might decide to keep the same storage vendor, but that's your choice, not something forced upon you.

On top of these physical data centers, you can define and deploy your virtual data centers. A virtual data center is no longer contained in a single building bound by a physical boundary. Although bandwidth and latency are still limiting factors, the main point is that you can architect your physical data centers as one or more logical data centers. You should be able to move thousands of servers from data center A to data center B automatically, with just one click in SRM 5.5; alternatively, you can perform DR from four branch sites into a common HQ data center.

You are not bound to have one virtual data center per site, although it is easier to map them one-to-one with the current release of vSphere. For example, it is easier if you just have one vCenter per physical data center.

An example of SDDC

The next screenshot shows what a vCenter looks like in vSphere 5.5, the foundation of vCloud Suite. VMware continues integrating and enhancing vCloud Suite, and I would not be surprised to see its capability widening in future releases.

vCenter 5.5

I will zoom in to a part of the screenshot as it's rather small. The left part of the screenshot, shown next, shows that there are three vCenter Servers, and I've expanded each of them to show their data centers, clusters, hosts, and VMs:

From here, we can tell that we no longer need separate inventory management software, as we can see all the objects, their configurations, and how they relate to one another. It is clear how many data centers, clusters, ESXi hosts, and VMs we have.

We also get more than static configuration information. Can you see what live or dynamic information is presented here? These are not the types of information you get from CMDB or the inventory management system.

You will notice from the preceding screenshot that I get warnings and alerts, so this is a live environment. I also get information on the capacity and health. At the corner of the screen, you can see the data center CPU, memory, storage capacity, and usage. In the vSphere Replication box, you can see the VM replication status. For example, you can see that it has 7 outgoing replications and 3 incoming replications. In the middle of the screen, you can see Health State, which, by the way, comes from vRealize Operations. In the Infrastructure Navigator box, you get to see what applications are running, such as Application Server and Database Server. This information also comes from vRealize Operations. So, many of the management functions are provided out of the box. These functions are an integral part of vCloud Suite.

The compute function

As a virtualization engineer, I see a cluster as the smallest logical building block in vSphere. I treat it as one computer. You should also perform your capacity management at the cluster level and not at the host level. This is because a VM moves around within a cluster with DRS and Storage DRS. In the virtual data center, you think in terms of a cluster and not a server.

Let's take a look at the cluster called SDDC-Mgmt-Cluster, shown in the next screenshot. We can tell that it has 3 hosts, 24 processors (that's cores, not sockets or threads), and 140 GB of RAM (about 4 GB of which is used by the three instances of VMkernel). We can also tell that it has EVC Mode enabled, based on the Intel Nehalem generation. This means I can add an ESXi host running a newer Intel processor (for example, Westmere) live into the cluster, and perform vMotion across CPU generations. In the top-right corner, we can see the capacity used, just as we can at the vCenter level. In a sense, we can drill down from the vCenter level to the cluster level.
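Treating the cluster as one computer boils down to simple aggregation. The following sketch mirrors the cluster above; the even per-host split of RAM is my assumption for illustration:

```python
# Sketch: aggregate ESXi host resources into one logical cluster "computer".
# The even RAM split per host is a hypothetical assumption; the ~4 GB
# VMkernel overhead matches the figure mentioned in the text.
hosts = 3
cores_per_host = 8               # 24 cores across the cluster
ram_per_host_gb = 140 / hosts    # assume the 140 GB is evenly spread
vmkernel_total_gb = 4            # used by the three VMkernel instances

total_cores = hosts * cores_per_host
total_ram_gb = hosts * ram_per_host_gb
usable_ram_gb = total_ram_gb - vmkernel_total_gb

print(total_cores)            # 24
print(round(total_ram_gb))    # 140
print(round(usable_ram_gb))   # 136 GB left for VMs
```

Capacity planning then works against these cluster-level totals, not against any individual host.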

We can also see that HA and DRS are turned on. DRS is set to fully automated, which is what I recommend, as you do not want to manage the ESXi hosts manually one by one. There is a whole book on vSphere clustering, as these features have many settings. My favorite is by Duncan Epping and Frank Denneman, available at http://www.yellow-bricks.com/my-bookstore/.

The ramification of this is that the data center management software needs to understand vSphere well. It has to keep up with the enhancements in vSphere and vCloud Suite. A case in point: vSphere 5.5 Update 1 added Virtual SAN, a software-defined storage solution integrated into vSphere.

Notice Health State. Again, this information comes from vRealize Operations. If you click on it, it will take you to a more detailed page, showing charts. If you drill down further, it will take you to vRealize Operations.

The Infrastructure Navigator box is useful because it tells you what applications are running in your cluster. For example, if you have a dedicated cluster for Microsoft SQL Server (to optimize licensing) and you see SQL running in a different cluster (one that is not supposed to run the database), you know you need to move that VM. This is important because, as the infrastructure team, you sometimes do not have access to go inside the VM. You do not know what's running on top of Windows or Linux.

vSphere 5.5 cluster

The network function

We covered compute. Let's move on to network. The next screenshot shows a distributed virtual switch. As you can see, the distributed switch is an object at the data center level. So it extends across clusters. In some environments, this can result in a very large switch with more than 1,000 ports. In the physical world, this would be a huge switch indeed!

A VM is connected to either a standard switch or a distributed switch. It is not connected to the physical NIC in your ESXi host. The ESXi host physical NICs become the switch's uplinks instead, and generally you have 2 x 10 GE ports. This means that the traditional top-of-rack switch has been entirely virtualized. It runs completely as software, and the following screenshot is where you create, define, and manage it. This means the management software needs to understand the distributed vSwitch and its features. As you will see later, vRealize Operations understands virtual switches and treats networking as a first-class object.

vSphere 5.5 Distributed vSwitch

The previous screenshot shows that the switch has six port groups and two uplinks. Let's drill down into one of the port groups, as shown in the next screenshot. A port group is a capability that is optional in physical switches but mandatory in a virtual switch. It lets you group a number of switch ports and give them common properties. You can also set policies; as shown in the Policies box, there are many properties you can set. Port groups are essential in managing all the ports connected to the switch.

In the top-right corner, you see the CAPACITY information, so you know how many ports you configured and how many are used. This is where virtual networking differs from virtual compute and virtual storage. For compute and storage, you need underlying physical resources to back them up. You cannot create a VM with 32 vCPUs if the underlying ESXi host has fewer than 32 physical threads. A virtual network is different. A network is an interconnection; it is not a "node" like compute and storage. It is not backed by physical ports. You can increase the number of ports to basically any number you want. The entire switch lives in memory! Power off the ESXi host, and there is no more switch.

In the Infrastructure Navigator box, you will again see the list of applications. vRealize Operations is deeply embedded into vSphere, making it feel like a single application, a single pane of glass. Over the past several releases, VMware products have been converging into one integrated suite, and this trend is set to continue.

vSphere 5.5 Distributed vSwitch Port Group

The storage function

Let's now move to storage. The next screenshot shows a vSphere 5.5 datastore cluster. The idea behind a datastore cluster is similar to that of a compute cluster. Let's use an example, as it's easier to understand. Say you have a cluster of 8 ESXi hosts, with each host sporting 2 sockets, 24 cores, and 48 threads. In this cluster, you run 160 VMs, giving you a 20:1 consolidation ratio. This is reasonable from a performance management view, as the entire cluster has 192 physical cores and 384 physical threads. Based on the general guideline that Intel Hyper-Threading gives around a 50 percent performance boost, you can count 288 effective cores. This gives you around 1.8 cores per VM, which is reasonable, as most VMs have 2 vCPUs and around 50 percent utilization. These 160 VMs are stored in 8 datastores, or around 20 VMs per datastore.
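The sizing arithmetic in this example can be checked with a few lines of code. All the numbers come from the example above; the 50 percent Hyper-Threading boost is the general guideline, not a guarantee:

```python
# Sketch: verify the consolidation arithmetic for the 8-host example cluster.
hosts = 8
cores_per_host = 24          # 2 sockets per host, 24 cores total per host
ht_boost = 0.5               # general guideline: ~50% uplift from Hyper-Threading
vms = 160
datastores = 8

physical_cores = hosts * cores_per_host            # 192 physical cores
effective_cores = physical_cores * (1 + ht_boost)  # 288 effective cores
cores_per_vm = effective_cores / vms               # ~1.8 cores per VM
vms_per_datastore = vms / datastores               # 20 VMs per datastore

print(physical_cores, effective_cores, cores_per_vm, vms_per_datastore)
```

The same arithmetic scales to your own environment by swapping in your host count, core count, and VM count.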

With the compute cluster, you need not worry about where a VM is running. When you provision a new VM, you do not specify which host will run it. You let DRS decide. As the workload goes up and down, you do not want to manage the placement of 160 VMs on individual ESXi hosts. You let DRS do the load balancing, and it will vMotion the VMs automatically. You treat the entire cluster as if it were a single giant box.

With the datastore cluster, you can do the same thing. When you provision a new VM, you do not specify a datastore for it. If you did want to specify it manually, you would need to check which datastore has the largest amount of free space and the lowest IOPS load. The first piece of information is quite easy to check, but the second one is not. This is the first value of the datastore cluster: it picks a datastore based on both capacity and performance. The second value comes from ongoing operations. As time passes, VMs grow at different rates in terms of both capacity and IOPS. Storage DRS monitors this and makes recommendations for you. The major difference here is the amount of data to be migrated. In a vMotion, we normally migrate somewhere between 1 GB and 10 GB of RAM, as the kernel copies only the used RAM (not the configured RAM). In a storage vMotion, we potentially copy 100 GB of data. This takes a lot longer and hence has a greater performance impact, so Storage DRS migrations should happen a lot less frequently, perhaps once a month.
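The initial-placement decision described above can be sketched as a simple scoring function that weighs free capacity against observed IOPS. This is a hypothetical illustration of the idea; Storage DRS uses its own internal algorithm, and the datastore names and numbers here are made up:

```python
# Sketch: pick a datastore by balancing free capacity against IOPS load.
# Hypothetical data and scoring, not the actual Storage DRS algorithm.
datastores = [
    {"name": "ds01", "free_gb": 800, "avg_iops": 1200},
    {"name": "ds02", "free_gb": 500, "avg_iops": 300},
    {"name": "ds03", "free_gb": 900, "avg_iops": 2500},
]

def placement_score(ds, max_free, max_iops):
    # More free space is better; a higher IOPS load is worse.
    return ds["free_gb"] / max_free - ds["avg_iops"] / max_iops

max_free = max(d["free_gb"] for d in datastores)
max_iops = max(d["avg_iops"] for d in datastores)
best = max(datastores, key=lambda d: placement_score(d, max_free, max_iops))
print(best["name"])  # ds02: not the most space, but by far the lightest load
```

Note that the winner is not the datastore with the most free space; performance data changes the answer, which is exactly why checking this manually is hard.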

Datastore cluster helps in capacity management, as you basically treat all the datastores as one. You can easily check key information about the datastore cluster, such as the number of VMs, total storage, capacity used, and largest free space you have.

As usual, vRealize Operations provides information about what applications are running in the datastore cluster. This is handy information in a large environment, where you have specific datastores for specific applications.

vSphere 5.5 Datastore Cluster

All together now

We covered all the three elements—compute, storage, and network. How are they related? The next screenshot shows the relationship of the key objects managed by vCenter.

It's handy information in a small environment. In a large environment, maps such as the one shown in the next screenshot become much more complex! In this map, I have only 3 ESXi hosts and 7 datastores, and I already had to hide some relationships. Notice that I did not select the Host to VM and VM to datastore relationship options, because the map got way too complicated when I did.

The point of sharing the screenshot is to show that you indeed have your data center in software, with the following characteristics:

  • You have your VM as the consumer. You can show both powered-on and powered-off VMs.

  • You have your compute (ESXi), network (port group), and storage (datastore) as the providers. You can show the relationship of your compute to your network and storage.

  • You have the information about the network, storage, and compute your VM is connected to.

Think about it: how difficult would it be to map these relationships in a physical data center? I've personally heard customers say that they do not know exactly how many servers they have, which networks they are connected to, and what applications run on each box. A powered-off server is even harder to find! Even if you implement a data center management system that can give you the map, one or two years later you cannot be sure the map is still up to date. The management system has to be embedded into the platform. In fact, it's the only point of entry to the virtual platform. It cannot be a separate, detached system.

vSphere Maps

The last point I'd like to bring up is that SDDC is a world in itself. It's not simply your data center virtualized. Look at the following table. It lists some of the objects in vSphere. I have not included NSX, Virtual SAN, or vRealize Suite objects here. These objects do not have physical equivalents; where they do, they have different properties, generate different events, and are measured by different counters. Plus, all these objects have relationships with one another. You need to look at vCloud Suite in its entirety to understand it well.

vSphere Objects and their relationships

The downside of this SDDC is that the upgrade of this "giant machine" is a new project for IT. It has to be planned and implemented carefully because it is as good as upgrading the data center while servers, storage, and network are all still running. Using a physical world analogy, it's like renovating your home while living in it.

 

A virtual data center versus a physical data center


We covered SDDC to a certain depth. We can now summarize the key differences between a physical data center and a virtual one. To highlight the differences, I'm assuming in this comparison that the physical data center is 0 percent virtualized and the virtual data center is 100 percent virtualized. For the virtual data center, I'm also assuming you have adjusted your operations, because operating a virtual data center with a physical-operations mindset results in a lot of frustration and suboptimal virtualization. This means your processes and organization chart have been adapted to a virtual data center.

Data center

The following table compares the data center in its physical and virtual forms:

  • Physical: This is bounded by one physical site. Data center migration is a major and expensive project.
    Virtual: This is not bound to any physical site. Multiple virtual data centers can exist in one physical data center, and a single virtual data center can span multiple physical data centers. The entire DC can be replicated and migrated.

Server

The following table summarizes servers in physical and virtual data centers:

  • Physical: 1,000 physical servers (just an example, so we can provide a comparison).
    Virtual: It may have 2,000 VMs. The number of VMs is higher for several reasons: VM sprawl; a physical server tends to run multiple applications or instances, whereas a VM runs only one; and DR is much easier, hence more VMs are protected.

  • Physical: Growth is relatively static and predictable, and normally it is one way only (adding more servers).
    Virtual: The number of VMs can go up and down due to dynamic provisioning.

  • Physical: Downtime for hardware maintenance or a technology refresh is a common job in a large environment due to component failure.
    Virtual: Planned downtime is eliminated with vMotion and storage vMotion.

  • Physical: 5 to 10 percent average CPU utilization, especially on CPUs with a high core count.
    Virtual: 50 to 80 percent utilization for both the VM and ESXi.

  • Physical: Racks of physical boxes, often with a top-of-rack access switch and UPS. The data center is a large consumer of power.
    Virtual: Rack space requirements shrink drastically as servers are consolidated and the infrastructure is converged. There is a drastic reduction in space and power.

  • Physical: Low complexity. Lots of repetitive and coordination work, but not a lot of expertise required.
    Virtual: High complexity. Less quantity, but deep expertise required. Far fewer people, but each one is an expert.

  • Physical: Availability and performance are monitored by management tools, which normally use an agent. It is typical for a server to have many agents.
    Virtual: Availability and performance monitoring happens via vCenter, and it's agentless for the infrastructure. All other management tools get their data from vCenter, not from individual ESXi hosts or VMs. Application-level monitoring is typically still done with agents.

  • Physical: The word cluster means two servers joined with a heartbeat and shared storage, which is typically SAN.
    Virtual: The word cluster has a very different meaning: a group of ESXi hosts sharing the workload. Normally 8 to 12 hosts, not 2.

  • Physical: High Availability (HA) is provided by clusterware such as MSCS and Veritas. Every cluster pair needs shared storage, typically SAN. Typically, one service needs two physical servers with a physical network heartbeat; hence, most servers are not clustered, as the cost and complexity are high.
    Virtual: HA is provided by vSphere HA, including service monitoring via Application HA. All VMs are protected, not just a small percentage. The need for traditional clustering software drops significantly, and a new kind of clustering tool develops. Clustering tools for VMware integrate with vSphere and use the vSphere API.

  • Physical: Fault tolerance is rarely used due to cost and complexity. You need specialized hardware, such as Stratus ftServer.
    Virtual: Fault tolerance is an on-demand feature, as it is software-based. For example, you can temporarily turn it on during a batch job run.

  • Physical: Anti-virus is installed on every server. Management is harder in a large environment.
    Virtual: Anti-virus runs at the hypervisor level. It is agentless and hence no longer visible to malware.

Storage

The following table summarizes storage in physical and virtual data centers:

  • Physical: 1,000 physical servers (just an example, so we can provide a comparison), where IOPS and capacity do not impact each other. A relatively static environment from a storage point of view, because normally only 10 percent of these machines are on SAN/NAS due to cost.
    Virtual: It may have 2,000 interdependent VMs, which impact one another. A very dynamic environment where management becomes critical, because almost all VMs are on shared storage, including distributed storage.

  • Physical: Every server on SAN has its own dedicated LUN. Some servers, such as database servers, may have multiple LUNs.
    Virtual: Most VMs do not use RDM. They use VMDK files and share the VMFS or NFS datastore. The VMDK files may reside in different datastores.

  • Physical: Storage migration means major downtime, even within the same array. A lot of manual work is required.
    Virtual: Storage migration is live with storage vMotion. Intra-array migration is faster due to the VAAI API.

  • Physical: Backup, especially in the x64 architecture, is done with backup agents. As SAN is relatively expensive and SAN boot is complex at scale, backup is done via the backup LAN with an agent installed. This creates its own problems, as the backup agents have to be deployed, patched, upgraded, and managed. The backup process also creates high disk I/O, impacting application performance. Because backup traffic is network intensive and carries sensitive data, an entire network is born for backup purposes.
    Virtual: The backup service is provided by the hypervisor. It is LAN-free and agentless. Most backup software uses the VMware VADP API to back up VMs. No, it does not apply to databases or other applications, but it is good enough for 90 percent of the VM population. Because backup is performed outside the VM, there is no performance impact on the application or Guest OS. There is also no security risk, as the Guest OS admin cannot see the backup network.

  • Physical: Storage QoS is taken care of by the array, although the array has no control over the IOPS demand coming from servers.
    Virtual: Storage QoS is taken care of by vSphere, which has full control over every VM.

Network

The following table summarizes the network in physical and virtual data centers:

  • Physical: The access network is typically 1 GE, as that is sufficient for most servers. Typically a top-of-rack, entry-level switch.
    Virtual: The top-of-rack switch is generally replaced with an end-of-row distribution switch, as the access switch is completely virtualized. ESXi typically uses 10 GE.

  • Physical: VLANs are normally used for segregation. This results in VLAN complexity.
    Virtual: With NSX, VLANs are not required for segregation (traffic within the same VLAN can be blocked).

  • Physical: Impacted by spanning tree.
    Virtual: No spanning tree.

  • Physical: A switch must learn the MAC address as it comes with the server.
    Virtual: No need to learn the MAC address, as it is given by vSphere.

  • Physical: Network QoS is provided by core switches.
    Virtual: Network QoS is provided by vSphere and NSX.

  • Physical: The DMZ zone is physically separate. Separation is done at the IP layer. IDS/IPS deployment is normally limited to the DMZ due to cost and complexity.
    Virtual: The DMZ zone is logically separate. Separation is not limited to IP and is done at the hypervisor layer. IDS/IPS can be deployed in all zones, as it is also hypervisor-based.

  • Physical: No DR test network exists. As a result, the same hostname cannot exist on the DR site, making a true DR test impossible without shutting down production servers.
    Virtual: A DR test network is provided. As a result, the same hostname can exist on any site. This means a DR test can be done anytime, as it does not impact production.

  • Physical: The firewall is not part of the server. It is typically centrally located, and it is not aware of the servers, as it is completely independent of them.
    Virtual: The firewall becomes a built-in property of the VM. The rules follow the VM; when a VM is vMotion-ed to another host, the rules follow it and are enforced by the hypervisor.

  • Physical: The firewall scales vertically and independently of the workload (the demand from servers). This makes sizing difficult. IT ends up buying the biggest firewall it can afford, increasing the cost.
    Virtual: The firewall scales horizontally. It grows with demand, since it is deployed as part of the hypervisor (using NSX). The upfront cost is lower, as there is no need to buy a pair of high-end firewalls upfront.

  • Physical: Traffic has to be deliberately directed to the firewall. Without that, the traffic "escapes" the firewall.
    Virtual: All traffic passes the firewall, as it is embedded into the VM and the hypervisor. Traffic cannot "escape" the firewall.

  • Physical: Firewall rules are typically based on IP addresses. Changing an IP address means changing the rules. This results in a long and complicated rule base. After a while, the firewall admin dares not delete any rules, as the rule base has become huge and unmanageable.
    Virtual: Rules are not tied to the IP address or hostname. This makes rules much simpler. For example, we can say that all VMs in the Contractor Desktop pool cannot talk to each other. That is just one rule; when a VM is added to the pool, the rule is applied to it automatically.

  • Physical: The load balancer is typically centrally located. Just like the firewall, sizing becomes difficult and the cost goes up.
    Virtual: The load balancer is distributed. It scales with demand.
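The pool-based firewall rule described above can be illustrated with a tiny model. This sketch uses hypothetical names and logic (it is not the NSX rule engine or API); the point is that one membership-based rule covers every VM added to the pool, with no per-IP rules to maintain:

```python
# Sketch: one pool-based rule ("Contractor Desktop VMs cannot talk to each
# other") evaluated by group membership instead of IP address.
# Hypothetical model, not the NSX rule engine.
contractor_pool = {"ctr-vm-01", "ctr-vm-02"}

def allowed(src_vm, dst_vm):
    # Single rule: deny traffic between any two members of the pool.
    if src_vm in contractor_pool and dst_vm in contractor_pool:
        return False
    return True

print(allowed("ctr-vm-01", "ctr-vm-02"))  # False: both are in the pool

# A VM added to the pool later is covered automatically; no rule changes.
contractor_pool.add("ctr-vm-03")
print(allowed("ctr-vm-03", "ctr-vm-01"))  # False
print(allowed("ctr-vm-03", "app-vm-01"))  # True: destination is outside the pool
```

Contrast this with an IP-based rule base, where adding "ctr-vm-03" would require new deny rules for every existing pool member.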

Disaster Recovery

The following table summarizes Disaster Recovery (DR) in physical and virtual data centers:

  • Physical: Architecturally, DR is done on a per-application basis. Every application has its own bespoke solution.
    Virtual: DR is provided as a service by the platform. It is one solution for all applications. This enables data center-wide DR.

  • Physical: A standby server is required on the DR site. This increases the cost. Because the server has to be compatible with the associated production server, it also increases complexity in a large environment.
    Virtual: No standby server is needed. The ESXi cluster on the DR site typically runs the non-production workload, which can be suspended (hibernated) during DR. The DR site can use a different server brand and CPU.

  • Physical: DR is a manual process, relying on a manually written run book. It also requires all hands on deck. Unavailability of key IT resources when disaster strikes can impact the organization's ability to recover.
    Virtual: The DR steps are entirely automated. Once management decides to trigger DR, all that needs to be done is to execute the right recovery process in VMware Site Recovery Manager. No manual intervention is required.

  • Physical: A complete DR dry run is rarely done, as it is time consuming and requires production to be down.
    Virtual: A DR dry run can be done frequently, as it does not impact the production system. It can even be done on the day before the actual planned DR.

  • Physical: The report produced after a DR exercise is manually typed. It is not possible to prove that what is documented in the Microsoft Word or Excel document is what actually happened in the data center.
    Virtual: The report is automatically generated, with no human intervention. It timestamps every step and records whether each was successful. The report can be used as audit proof.

Application

The following table summarizes the application in physical and virtual data centers:

  • Physical: Licensing is bound to the physical server. It is a relatively simple thing to manage.
    Virtual: Licensing is bound to an entire cluster or per VM. It can be more expensive or cheaper; either way, it is complex from a management point of view.

  • Physical: All applications are supported.
    Virtual: Most applications are supported. The ones that are not are mostly held back by outdated perceptions on the part of the ISV. As more apps are developed in virtual environments, this perception will go away.

Infrastructure team

The following table summarizes the infrastructure team in physical and virtual data centers:

  • Physical: There is a clear silo between the compute, storage, and network teams. In organizations where the IT team is big, the DR, Windows, and Linux teams could also be separate. There is also a separation between the engineering, integration (projects), and operations (business as usual) teams. The teams, in turn, need layers of management. This results in rigidity in IT.
    Virtual: With virtualization, IT is taking the game to the next level. It's a lot more powerful than the previous architecture. When you take the game to the next level, the enemy is also stronger. In this case, the expertise required is deeper and the experience requirement is more extensive. Earlier, you may have needed 10 people to manage 1,000 physical servers. With virtualization, you might only need three people to manage 2,000 VMs on 100 ESXi hosts. However, those 3 people need deeper expertise and more experience than the 10 combined.

 

Management disciplines impacted by virtualization


We have covered the changes introduced by virtualization. Virtualization changes the architecture of IT, turning business-as-usual operations from best practice into dated practice. The following table summarizes the impact by management pillar, so that we can see how each specific discipline is affected:

Area impacted

Why is it impacted?

Performance management

This gets harder as the performance of ESXi/VM/Datastore can impact one another. The entire environment is no longer static. VM activities such as vMotion, Storage vMotion, provisioning, power on, and so on also add to the workload. So, there is VM workload and infrastructure workload. Performance issues can originate from any component.

Troubleshooting something that is dynamic is difficult. Unlike in a physical data center, the first thing we need to check is the overall health, because of the interdependency. Only when we are satisfied that the problem is not widespread do we zoom in to a specific object (for example, a VM, ESXi host, or datastore).

Performance degradation can also be caused by configuration changes. These occur more frequently than in a physical data center, as many of them can be done live.

QoS becomes mandatory due to shared resources.

A new requirement is application visibility. We can no longer troubleshoot in isolation without knowing which applications run inside that VM.

Availability management

vCloud Suite relies heavily on shared storage. The availability of this storage becomes critical. Enterprises should consider storage an integral part of the platform, not a subsystem managed by a different team.

Clustering software is mostly replaced with vSphere.

Backup is mostly agentless and LAN-free.

DR becomes a service provided by the platform.

Capacity management

Capacity management becomes a complex process. You need a tool that understands the dynamic nature of vCloud Suite.

Compliance management

Compliance becomes more complex due to the lack of physical segregation.

vCloud Suite itself is a big area that needs to be in compliance.

Security

Access to vCloud Suite needs to be properly controlled.

Configuration management (related to Change management)

vCloud Suite becomes the new source of truth, displacing the CMDB (which is detached from the environment it manages). The need for another database to manage the virtual environment has to be weighed, as there is already a de facto database: vCenter. For example, if vCenter shows a VM running but there is no record of it in the CMDB, do you power off and delete the VM? Certainly not. As a result, the CMDB becomes less important, as vCloud Suite itself provides the data.

VM configuration changes need to be tracked. Changes happen more often and faster.

vSphere becomes another area where configuration management needs to be applied.

Patch management

The data center itself becomes the software, which needs to be patched and upgraded. This can be automated to a large extent.

Because it is software, it needs to have a non-production copy.

Financial management

Chargeback (or showback at a minimum) becomes mandatory, as the infrastructure is no longer owned by the application team. Shared resources mean users do not expect to pay the full price.

Asset management

Drastically simplified as the VM is not an asset. Most network and storage appliances become software.

ESXi is the new asset, but it can't be changed without central management (vCenter) being alerted. The configuration is also standardized.

Stock-taking is no longer applicable for VMs and the top-of-rack access switch. Inventory is built into vSphere and NSX.

Operations management

Although ITIL principles do not change, the details of a lot of processes change drastically. We covered some of them previously.

 

Summary


I hope you enjoyed the comparison and found it useful. We covered, to a great extent, the impact caused by virtualization and the changes it introduces. We started by clarifying that virtualization is a different technology compared to partitioning. We then explained that once a physical server is converted into a virtual machine, it takes on a very different form and has radically different properties. The changes range from the core property of the server itself to how we manage it. This, in turn, creates a ripple effect in the bigger picture. The entire data center changes once we virtualize it.

In the next chapter, we will cover capacity management in greater depth, as it is an area that is made more complex once you virtualize your data center.

About the Author
  • Iwan 'e1' Rahabok

    Iwan 'e1' Rahabok was the first VMware SE for strategic accounts in ASEAN. Joining VMware in 2008 from Sun Microsystems, he has seen how enterprises adopt virtualization and cloud computing and reap the benefits while overcoming the challenges. It is a journey that is very much ongoing, and this book reflects a subset of that undertaking. Iwan was one of the first globally to achieve the VCAP-DCD certification and has since helped others achieve the same through his participation in the community. He started the user community in ASEAN, and today the group is one of the largest VMware communities on Facebook. Iwan has been a member of the VMware CTO Ambassadors program since 2014, representing the Asia Pacific region at the global level and representing the product team and CTO office to Asia Pacific customers. He has been a vExpert since 2013, and helps others achieve this global recognition for their contributions to the VMware community. After graduating from Bond University, Australia, Iwan moved to Singapore in 1994, where he has lived ever since.
