vSphere High Performance Cookbook

By Prasenjit Sarkar

About this book

VMware vSphere is the key virtualization technology in today’s market. vSphere is a complex tool and incorrect design and deployment can create performance-related problems. vSphere High Performance Cookbook is focused on solving those problems as well as providing best practices and performance-enhancing techniques.

vSphere High Performance Cookbook offers a comprehensive understanding of the different components of vSphere and the interaction of these components with the physical layer which includes the CPU, memory, network, and storage. If you want to improve or troubleshoot vSphere performance then this book is for you!

vSphere High Performance Cookbook will teach you how to tune and grow a VMware vSphere 5 infrastructure. This book focuses on tuning, optimizing, and scaling the infrastructure using the vSphere Client graphical user interface. It will equip the reader with the knowledge, skills, and abilities to build and run a high-performing VMware vSphere virtual infrastructure.

You will learn how to configure and manage ESXi CPU, memory, networking, and storage for sophisticated, enterprise-scale environments. You will also learn how to manage changes to the vSphere environment and optimize the performance of all vSphere components.

This book also focuses on high value and often overlooked performance-related topics such as NUMA Aware CPU Scheduler, VMM Scheduler, Core Sharing, the Virtual Memory Reclamation technique, Checksum offloading, VM DirectPath I/O, queuing on storage array, command queuing, vCenter Server design, and virtual machine and application tuning.

By the end of this book you will be able to identify, diagnose, and troubleshoot operational faults and critical performance issues in vSphere.

Publication date: July 2013
Publisher: Packt
Pages: 240
ISBN: 9781782170006

 

Chapter 1. CPU Performance Design

In this chapter, we will cover the tasks related to CPU performance design. You will learn the following aspects of CPU performance design:

  • Critical performance consideration – VMM scheduler

  • CPU scheduler – processor topology/cache aware

  • Ready time – warning sign

  • Hyperthreaded core sharing

  • Spotting CPU overcommitment

  • Fighting guest CPU saturation in SMP VMs

  • Controlling CPU resources using resource settings

  • What is most important to monitor in CPU performance

  • CPU performance best practices

 

Introduction


Ideally, a performance problem should be defined within the context of an ongoing performance management process. Performance management refers to the process of establishing performance requirements for applications, in the form of a service-level agreement (SLA), and then tracking and analyzing the achieved performance to ensure that those requirements are met. A complete performance management methodology includes collecting and maintaining baseline performance data for applications, systems, and subsystems, for example, storage and network.

In the context of performance management, a performance problem exists when an application fails to meet its predetermined SLA. Depending on the specific SLA, the failure might be in the form of excessively long response times or throughput below some defined threshold.

ESX/ESXi and virtual machine performance tuning is complicated because virtual machines share underlying physical resources, and in particular the CPU.

Finally, configuration issues or inadvertent user errors might lead to poor performance. For example, a user might use a symmetric multiprocessing (SMP) virtual machine when a single processor virtual machine would work well. You might also see a situation where a user sets shares but then forgets about resetting them, resulting in poor performance because of the changing characteristics of other virtual machines in the system.

If you overcommit any of these resources, you might see performance bottlenecks. For example, if too many virtual machines are CPU intensive, you might see slow performance because all of the virtual machines need to share the underlying physical CPU.

 

Critical performance consideration – VMM scheduler


The virtual machine monitor (VMM) is a thin layer that provides a virtual x86 hardware environment to the guest operating system on a virtual machine. This hardware includes a virtual CPU, virtual I/O devices, and timers. The VMM leverages key technologies in the VMkernel, such as scheduling, memory management, and the network and storage stacks.

Each VMM is devoted to one virtual machine. To run multiple virtual machines, the VMkernel starts multiple VMM instances, also known as worlds. Each VMM instance partitions and shares the CPU, memory, and I/O devices to successfully virtualize the system. The VMM can be implemented by using hardware virtualization, software virtualization (binary translation), or paravirtualization (which is deprecated) techniques.

Paravirtualization refers to the communication between the guest operating system and the hypervisor to improve performance and efficiency. The value proposition of paravirtualization is its lower virtualization overhead, but the performance advantage of paravirtualization over hardware or software virtualization can vary greatly depending on the workload. Because paravirtualization cannot support unmodified operating systems (for example, Windows 2000/XP), its compatibility and portability are poor.

Paravirtualization can also introduce significant support and maintainability issues in production environments because it requires deep modifications to the operating system kernel; for this reason, it was most widely deployed on Linux-based operating systems.

Getting ready

To step through this recipe, you need a running ESXi Server, a Virtual Machine, vCenter Server, and a working installation of the vSphere Client. No other prerequisites are required.

How to do it...

Let's get started:

  1. Open up VMware vSphere Client.

  2. Log in to the vCenter Server.

  3. In the virtual machine inventory, right-click on the virtual machine, and then click on Edit Settings. The Virtual Machine Properties dialog box appears.

  4. Click on the Options tab.

  5. Change the CPU/MMU Virtualization option under Advanced to one of the following options:

    • Automatic

    • Use software for instruction set and MMU virtualization

    • Use Intel VT-x/AMD-V for instruction set virtualization and software for MMU virtualization

    • Use Intel VT-x/AMD-V for instruction set virtualization and Intel EPT/AMD RVI for MMU virtualization

  6. Click on OK to save your changes.

  7. For the change to take effect, perform one of these actions:

    • Reset the virtual machine

    • Suspend and then resume the virtual machine

    • vMotion the virtual machine
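
For reference, the choice made in this dialog box is persisted in the virtual machine's VMX file. The following fragment is an illustrative sketch; the key names and values are taken from vSphere 5-era documentation and should be verified against your build before use:

```
# Illustrative VMX fragment (verify key names against your vSphere version).
# Each key accepts "automatic", "software", or "hardware".
monitor.virtual_exec = "hardware"   # instruction set virtualization
monitor.virtual_mmu  = "hardware"   # MMU virtualization
```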

How it works...

The VMM determines a set of possible monitor modes to use, and then picks one to use as the default monitor mode, unless something other than Automatic has been specified. The decision is based on:

  • The physical CPU's features and guest operating system type

  • Configuration file settings

There are three valid combinations for the monitor mode, as follows:

  • BT: Binary translation and shadow page tables

  • HV: AMD-V or Intel VT-x and shadow page tables

  • HWMMU: AMD-V with RVI, or Intel VT-x with EPT (RVI is inseparable from AMD-V, and EPT is inseparable from Intel VT-x)

BT, HV, and HWMMU are abbreviations used by ESXi to identify each combination.

When a virtual machine is powering on, the VMM inspects the physical CPU's features and the guest operating system type to determine the set of possible execution modes. The VMM first finds the set of modes allowed. Then it restricts the allowed modes by configuration file settings. Finally, among the remaining candidates, it chooses the preferred mode, which is the default monitor mode. This default mode is then used if you have left Automatic selected.

For the majority of workloads, the default monitor mode chosen by the VMM works best. The default monitor mode for each guest operating system on each CPU has been carefully selected after a performance evaluation of available choices. However, some applications have special characteristics that can result in better performance when using a non-default monitor mode. These should be treated as exceptions, not the rule.

The chosen settings are honored by the VMM only if they are supported on the intended hardware. For example, if you select Use software for instruction set and MMU virtualization for a 64-bit guest operating system running on a 64-bit Intel processor, the VMM will choose Intel VT-x for CPU virtualization instead of BT, because BT is not supported for 64-bit guest operating systems on this processor.
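
The selection process can be sketched in Python. This is a conceptual illustration only; the real VMM's decision tables are internal to ESXi, and the feature names used here are invented for the sketch:

```python
# Conceptual sketch of monitor-mode selection: allowed modes are derived
# from CPU features and guest type, then filtered by configuration, and
# finally the preferred remaining mode wins.

def select_monitor_mode(cpu_features, guest_is_64bit, configured=None):
    # Determine which modes the hardware/guest combination permits.
    allowed = ["BT", "HV", "HWMMU"]
    if "ept_or_rvi" not in cpu_features:
        allowed.remove("HWMMU")
    if "vt_x_or_amd_v" not in cpu_features:
        allowed.remove("HV")
    if guest_is_64bit:
        # BT is not supported for 64-bit guests on these processors.
        allowed.remove("BT")

    # Honor the configuration file setting only if it is supported.
    if configured in allowed:
        return configured

    # Otherwise fall back to the preferred (default) mode:
    # hardware MMU first, then hardware CPU virtualization, then BT.
    for preferred in ["HWMMU", "HV", "BT"]:
        if preferred in allowed:
            return preferred
    raise ValueError("no valid monitor mode")
```

For instance, a 64-bit guest on a VT-x processor without EPT that is configured for software virtualization still ends up in HV, mirroring the example in the text.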

There's more...

The virtual CPU consists of the virtual instruction set and the virtual memory management unit (MMU). An instruction set is a list of instructions that a CPU executes. The MMU is the hardware that maintains the mapping between the virtual addresses and the physical addresses in the memory.

The combination of techniques used to virtualize the instruction set and memory determines the monitor execution mode (also called the monitor mode). The VMM identifies the VMware ESXi hardware platform and its available CPU features, and then chooses a monitor mode for a particular guest operating system on that hardware platform. The VMM might choose a monitor mode that uses hardware virtualization techniques, software virtualization techniques, or a combination of hardware and software techniques.

Hardware virtualization of x86 has always been challenging. x86 operating systems are designed to run directly on bare-metal hardware, so they assume full control of the computer hardware. The x86 architecture offers four levels of privilege to operating systems and applications for managing access to the hardware: ring 0, ring 1, ring 2, and ring 3. User-level applications typically run in ring 3, while the operating system, which needs direct access to the memory and hardware, must execute its privileged instructions in ring 0.

Binary translation allows the VMM to run in ring 0 for isolation and performance, while moving the guest operating system to ring 1. Ring 1 is a higher privilege level than ring 3 and a lower privilege level than ring 0.

VMware can virtualize any x86 operating systems by using a combination of binary translation and direct execution techniques. With binary translation, the VMM dynamically translates all guest operating system instructions and caches the results for future use. The translator in the VMM does not perform a mapping from one architecture to another; that would be emulation not translation. Instead, it translates from the full unrestricted x86 instruction set issued by the guest operating system to a subset that is safe to execute inside the VMM. In particular, the binary translator replaces privileged instructions with sequences of instructions that perform the privileged operations in the virtual machine rather than on the physical machine. This translation enforces encapsulation of the virtual machine while preserving the x86 semantics as seen from the perspective of the virtual machine.
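
As a loose analogy (not actual VMM code; the instruction names and "safe sequences" here are invented for illustration, and real binary translation operates on x86 machine code), the translate-and-cache idea looks like this:

```python
# Toy analogy of binary translation: privileged "instructions" are replaced
# by safe sequences that update virtual-machine state instead of touching
# the physical machine; translations are cached for future reuse.

SAFE_SEQUENCES = {
    "CLI": ["vm_state.interrupts = off"],   # disable interrupts: VM-local only
    "HLT": ["vm_state.halted = yes"],       # halt the vCPU, not the pCPU
}

translation_cache = {}

def translate(block):
    """Translate a tuple of instructions, caching the result."""
    if block in translation_cache:
        return translation_cache[block]
    out = []
    for insn in block:
        # Privileged instructions get rewritten; user-level instructions
        # pass through and run directly on the processor.
        out.extend(SAFE_SEQUENCES.get(insn, [insn]))
    translation_cache[block] = out
    return out
```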

Meanwhile, user-level code is directly executed on the processor for high-performance virtualization. Each VMM provides each virtual machine with all of the services of the physical system, including a virtual BIOS, virtual devices, and virtualized memory management.

In addition to software virtualization, there is support for hardware virtualization. This allows some of the work of running virtual CPU instructions to be offloaded onto the physical hardware. Intel has the Intel Virtualization Technology (Intel VT-x) feature. AMD has the AMD Virtualization (AMD-V) feature. Intel VT-x and AMD-V are similar in aim but different in detail. Both designs aim to simplify virtualization techniques.

 

CPU scheduler – processor topology/cache aware


ESXi Server has an advanced CPU scheduler geared towards providing high performance, fairness, and isolation of virtual machines running on Intel/AMD x86 architectures.

The ESXi CPU scheduler is designed with the following objectives:

  • Performance isolation: Multi-VM fairness.

  • Co-scheduling: The illusion that all vCPUs are concurrently online.

  • Performance: High throughput, low latency, high scalability, and low overhead.

  • Power efficiency: Saving power without losing performance.

  • Wide adoption: Enabling all the optimizations on diverse processor architectures.

There can be only one active process per CPU at any given instant (multiple vCPUs can run on the same pCPU, just not at the same instant), and there are often more processes than CPUs. Therefore, queuing will occur, and the scheduler is responsible for controlling the queue, handling priorities, and preempting the use of the CPU.

The main task of the CPU scheduler is to choose which world is to be scheduled to a processor. In order to give each world a chance to run, the scheduler allocates each world a time slice (the duration for which a world can execute; usually 10-20 ms, and 50 ms for the VMkernel by default) and then migrates the world between the run, wait, costop, and ready states.

ESXi implements the proportional share-based algorithm. It associates each world with a share of CPU resource across all virtual machines. This is called entitlement and is calculated from the user-provided resource specifications, such as shares, reservations, and limits.
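
The share-based entitlement calculation can be sketched as follows. This is a deliberately simplified model assuming only static shares, reservations, and limits; the real ESXi entitlement computation is considerably more involved:

```python
# Simplified sketch of proportional share-based entitlement: each world
# receives capacity in proportion to its shares, clamped by its
# reservation (floor) and limit (ceiling).

def entitlements(capacity_mhz, worlds):
    """worlds: dict of name -> (shares, reservation_mhz, limit_mhz)."""
    total_shares = sum(s for s, _, _ in worlds.values())
    result = {}
    for name, (shares, reservation, limit) in worlds.items():
        proportional = capacity_mhz * shares / total_shares
        # The reservation guarantees a floor; the limit caps the ceiling.
        result[name] = max(reservation, min(proportional, limit))
    return result
```

For example, on a 6,000 MHz host, a world with 2,000 shares against a world with 1,000 shares is entitled to twice the CPU, unless a reservation or limit overrides the proportion.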

Getting ready

To step through this recipe, you need a running ESXi Server, a Virtual Machine, and a working installation of vSphere Client. No other prerequisites are required.

How to do it...

Let's get started:

  1. Log in to the VMware vSphere Client.

  2. In the virtual machine inventory, right-click on the virtual machine, and click on Edit Settings. The Virtual Machine Properties dialog box appears.

  3. Click on the Options tab.

  4. Under the Advanced section, click on the General row.

  5. On the right-hand side, click on the Configuration Parameters button.

  6. Click on the Add Row button at the bottom, add the parameter sched.cpu.vsmpConsolidate, and in the Value column type TRUE.

  7. The final screen should look like the following screenshot. Click on OK to save the setting.

How it works...

The CPU scheduler uses processor topology information to optimize the placement of vCPUs onto different sockets.

The CPU scheduler spreads the load across all the sockets to maximize the aggregate amount of cache available.

Cores within a single socket typically use a shared last-level cache. Use of a shared last-level cache can improve vCPU performance if the CPU is running memory-intensive workloads.

By default, the CPU scheduler spreads the load across all sockets in under-committed systems, which improves performance by maximizing the aggregate amount of cache available to the running vCPUs. However, for memory-intensive workloads that benefit from sharing a last-level cache, it can be beneficial to schedule all of the vCPUs on the same socket, even when the ESXi host is under-committed. In such scenarios, you can override the default behavior of spreading vCPUs across packages by including the following configuration option in the virtual machine's VMX configuration file: sched.cpu.vsmpConsolidate=TRUE. However, it is usually better to stick with the default behavior.

 

Ready time – warning sign


To achieve the best performance in a consolidated environment, you must consider ready time.

Ready time is the time that a vCPU waits in the queue for the pCPU (or physical core) to be ready to execute its instructions. The scheduler handles the queue, and when there is contention and the processing resources are stressed, the queue might become long.

The ready time describes how much of the last observation period a specific world (for example, a vCPU) spent waiting in the queue to get access to a pCPU. Ready time can be expressed as a percentage per vCPU over the observation time, and statistically it can't be zero on average.

The value of the ready time, therefore, is an indicator of how long the VM was denied access to the pCPU resources which it wanted to use. This makes it a good indicator of performance.

When multiple processes are trying to use the same physical CPU, that CPU might not be immediately available, and a process must wait before the ESXi host can allocate a CPU to it.

The CPU scheduler manages access to the physical CPUs on the host system. A short spike in CPU used or CPU ready indicates that you are making the best use of the host resources. However, if both values are constantly high, the hosts are probably overloaded and performance is likely poor.

Generally, if the CPU used value for a virtual machine is above 90 percent and the CPU ready value is above 20 percent per vCPU, performance is negatively affected.

This latency may impact the performance of the guest operating system and the running applications within a virtual machine.
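
The rule of thumb above can be expressed as a small helper. The thresholds are the guideline values quoted in the text, not hard limits:

```python
# Sketch of the rule of thumb: flag a VM when CPU used is above 90 percent
# and per-vCPU ready time is above 20 percent. Less CPU-sensitive
# applications may tolerate higher ready values.

def cpu_pressure(used_pct, ready_pct_per_vcpu,
                 used_threshold=90.0, ready_threshold=20.0):
    """Return True when both used and ready exceed their thresholds."""
    return used_pct > used_threshold and ready_pct_per_vcpu > ready_threshold
```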

Getting ready

To step through this recipe, you need a running ESXi Server, a couple of CPU-hungry virtual machines, VMware vCenter Server, and a working installation of vSphere Client. No other prerequisites are required.

How to do it...

Let's get started:

  1. Open up vSphere Client.

  2. Log in to the VMware vCenter Server.

  3. On the home screen, navigate to Hosts and Clusters.

  4. Expand the left-hand navigation list.

  5. Navigate to one of the CPU-hungry virtual machines.

  6. Navigate to the Performance screen.

  7. Navigate to the Advanced view.

  8. Click on Chart Options.

  9. Navigate to CPU from the Chart metrics.

  10. Navigate to the VM object.

    1. Select only Demand, Ready, and Usage in MHz.

      The key metrics when investigating a potential CPU issue are:

    • Demand: Amount of CPU that the virtual machine is trying to use.

    • Usage: Amount of CPU that the virtual machine is actually being allowed to use.

    • Ready: Amount of time the virtual machine was ready to run (had work it wanted to do) but was unable to run because vSphere could not find physical resources to run the virtual machine on.

  11. Click on OK.

In the following screenshot you will see the high ready time for the virtual machine:

Notice the amount of CPU this virtual machine is demanding and compare it to the amount of CPU the virtual machine is actually allowed to use (usage in MHz). The virtual machine is demanding more than it is currently being allowed to use.

Notice that the virtual machine is also seeing a large amount of ready time.

Note

Ready time greater than 10 percent could be a performance concern. However, some less CPU-sensitive applications and virtual machines can have much higher values of ready time and still perform satisfactorily.

How it works...

Bad performance is when the users are unhappy, but that's subjective and hard to measure. We can measure other metrics easily, but they don't correlate perfectly with whether users' expectations are met. We want to find metrics that correlate well (though never perfectly) with user satisfaction. The final answer to "Is there a performance problem?" is always subjective, but we can use objective metrics to make reasonable bets and decide when it's worth asking the users whether they're satisfied with the performance.

A vCPU is in the ready state when it is ready to run (that is, it has a task it wants to execute) but is unable to run because the vSphere scheduler cannot find physical host CPU resources to run the virtual machine on. One potential reason for elevated ready time is that the virtual machine is constrained by a user-set CPU limit or resource pool limit; the amount of CPU denied because of a limit is measured by the metric max limited (MLMTD).

Ready time is reported in two different values between resxtop/esxtop and vCenter Server. In resxtop/esxtop, it is reported in an easily-understood percentage format. A figure of 5 percent means that the virtual machine spent 5 percent of its last sample period waiting for available CPU resources (only true for 1-vCPU VMs). In vCenter Server, ready time is reported as a time measurement. For example, in vCenter Server's real-time data, which produces sample values every 20,000 milliseconds, a figure of 1,000 milliseconds is reported for a 5 percent ready time. A figure of 2,000 milliseconds is reported for a 10 percent ready time.

Tip

vCenter reports ready time in milliseconds (ms). Use the following formula to convert the ms value to a percentage:

Metric Value (in percent) = (Metric Value in ms / Total Time of Sample Period in ms) x 100

The sample period is 20,000 ms by default for vCenter real-time graphs.
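
The conversion can be written as a one-line helper, using vCenter's default 20,000 ms real-time sample period:

```python
# Convert a vCenter ready-time value (in milliseconds) to a percentage
# of the sample period (20,000 ms by default for real-time graphs).

def ready_ms_to_percent(ready_ms, sample_period_ms=20000):
    return ready_ms * 100 / sample_period_ms
```

This reproduces the examples in the text: 1,000 ms corresponds to a 5 percent ready time, and 2,000 ms corresponds to 10 percent.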

Although high ready time typically signifies CPU contention, the condition does not always warrant corrective action. If the value for ready time is close in value to the amount of time used on the CPU, and if the increased ready time occurs with occasional spikes in CPU activity but does not persist for extended periods of time, this might not indicate a performance problem. The brief performance hit is often within the accepted performance variance and does not require any action on the part of the administrator.

 

Hyperthreaded core sharing


The Hyperthreaded (HT) core sharing option lets us define how a virtual machine's virtual CPUs share the physical cores of a Hyperthreaded system.

A Hyperthreaded processor (or logical CPU) has the same number of function units as an older, non-Hyperthreaded processor. HT offers two execution contexts, so it can achieve better function-unit utilization by letting more than one thread execute concurrently. On the other hand, if you're running two programs that compete for the same function units, there is no advantage at all in having both run concurrently: when one is running, the other is necessarily waiting on the same function units.

A dual-core processor has twice as many function units as a single-core processor and can really run two programs concurrently with no competition for function units. A CPU socket can contain multiple cores, and each core can do CPU-type work; twice as many cores will be able to do roughly twice as much work. If a core also has Hyperthreading enabled, then it presents two logical processors; however, two lCPUs cannot do twice as much work as one core.

Getting ready

To step through this recipe, you need a running ESXi Server, a running virtual machine, VMware vCenter Server, and a working installation of vSphere Client. No other prerequisites are required.

How to do it...

Let's get started:

  1. Open up VMware vSphere Client.

  2. Log in to the vCenter Server.

  3. From the home screen, navigate to Hosts and Clusters.

  4. Expand the left-hand navigation list.

  5. Navigate to any one of the virtual machines.

  6. Right-click on the virtual machine and select Edit Settings.

  7. Click on the Resources tab.

  8. Click on Advanced CPU.

  9. Under Hyperthreaded Core Sharing, use the drop-down list to select any one of the available options.

There are three different HT sharing methods, as follows:

  • Any

  • None

  • Internal

How it works...

The three methods of core sharing work as follows:

  • Any: The default for all virtual machines on a Hyperthreaded system. The virtual CPUs of a virtual machine with this setting can freely share cores with other virtual CPUs from this or any other virtual machine at any time.

  • None: Virtual CPUs of this virtual machine should not share cores with each other or with virtual CPUs from other virtual machines. That is, each virtual CPU from this virtual machine should always get a whole core to itself, with the other logical CPUs on that core being placed into the halted state.

  • Internal: This option is similar to None. Virtual CPUs from this virtual machine cannot share cores with virtual CPUs from other virtual machines, but they can share cores with the other virtual CPUs from the same virtual machine. You can select this option only for SMP virtual machines; if applied to a uniprocessor virtual machine, the system changes this option to None.

These options have no effect on the fairness or CPU time allocation. Regardless of a virtual machine's hyperthreading settings, it still receives CPU time proportional to its CPU shares, and constrained by its CPU reservation and CPU limit values.

There's more...

If VMs with different numbers of vCPUs (for example, one vCPU and two vCPUs) are running in the same virtual infrastructure cluster, there is a good chance that one vCPU of your dual-vCPU VM can work alone on one physical CPU while the other vCPU has to share a physical CPU with another VM. This causes significant synchronization overhead between the two vCPUs (which you don't see in physical multi-CPU machines, because there the synchronization is hardware based) and can drive the system process within the VM from 50 percent up to 100 percent CPU load.

 

Spotting CPU overcommitment


Provisioning more vCPUs to running virtual machines than there are physical cores on a host is called CPU overcommitment.

CPU overcommitment is a normal practice in many situations because it increases the consolidation ratio; however, you need to monitor it closely.

CPU overcommitment is not recommended when you must satisfy or guarantee the workload of a tier-1 application with a tight SLA. It may, however, be successfully leveraged to highly consolidate light workloads and reduce power consumption on modern, multi-core systems.

Getting ready

To step through this recipe, you need a running ESXi Server, a couple of running CPU-hungry virtual machines, an SSH client (such as PuTTY), vCenter Server, and a working installation of vSphere Client. No other prerequisites are required.

The following are the esxtop CPU performance metrics to watch:

  • %RDY: Percentage of time a vCPU in the run queue waited for the CPU scheduler to let it run on a physical CPU. A high %RDY time (use 20 percent as a starting point) may indicate that the virtual machine is under resource contention. Monitor this; if the application speed is OK, a higher threshold may be tolerated.

  • %USED: Percentage of possible CPU processing cycles that were actually used for work during the time interval. The %USED value alone does not necessarily indicate that the CPUs are overcommitted. However, high %RDY values plus high %USED values are a sure indicator that your CPU resources are overcommitted.

How to do it...

To spot CPU overcommitment, monitor the CPU resource metrics described in the preceding table:

  1. Log in to the ESXi Server through the SSH client.

  2. Type esxtop and hit enter.

  3. Monitor the preceding values to understand CPU overcommitment.

This example uses esxtop to detect CPU overcommitment. Looking at the pCPU line near the top of the screen, you can determine that this host's two CPUs are 100 percent utilized. Four active virtual machines are shown, Res-Hungry-1 to Res-Hungry-4. These virtual machines are active because they have relatively high values in the %USED column. The values in the %USED column alone do not necessarily indicate that the CPUs are overcommitted. In the %RDY column, you see that the active virtual machines also have relatively high values. High %RDY values, plus high %USED values, are a sure indicator that your CPU resources are overcommitted.
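
The check described above can be sketched against esxtop-style data. The rows below are hypothetical sample values (the VM names follow the example, but the numbers are invented), and the thresholds are illustrative starting points rather than VMware-defined limits:

```python
# Hypothetical esxtop-style rows: (world name, %USED, %RDY).
# High %RDY plus high %USED across active worlds points at
# CPU overcommitment, as described in the text.

ROWS = [
    ("Res-Hungry-1", 45.0, 28.0),
    ("Res-Hungry-2", 43.0, 26.0),
    ("Res-Hungry-3", 44.0, 30.0),
    ("Res-Hungry-4", 42.0, 24.0),
]

def overcommitted(rows, rdy_threshold=20.0, used_threshold=40.0):
    """Return the worlds whose %USED and %RDY both exceed the thresholds."""
    return [name for name, used, rdy in rows
            if used > used_threshold and rdy > rdy_threshold]
```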

From the CPU view, navigate to a VM and press the E key to expand the view; it gives a detailed per-vCPU view for that VM. This is important because CPU ready as a metric is best interpreted across the host rather than for a single VM. If a high ready percentage is noted, contention could be an issue, particularly if other VMs show high utilization while more vCPUs than physical cores are present; in that case, other VMs could be driving up ready time on a relatively idle VM. In short, if CPU ready time is high on the VMs on a host, verify that no other VMs are seeing performance issues.

You can also use vCenter performance chart to spot the CPU overcommitment, as follows:

  1. Log in to the vCenter Server using vSphere Client.

  2. On the home screen, navigate to Hosts and Clusters.

  3. Go to the ESXi host.

  4. Click on the Performance tab.

  5. Navigate to the CPU from the Switch To drop-down menu on the right-hand side.

  6. Navigate to the Advanced tab and click on the Chart Options.

  7. Navigate to the ESXi host in the Objects section.

  8. Select only Used and Ready in the Counters section and click on OK.

Now you will see the ready time and the used time in the graph and you can spot the overcommitment. The following screenshot is an example output:

The following example shows that the host has high used time.

How it works...

Although high ready time typically signifies CPU contention, the condition does not always warrant corrective action. If high ready time is accompanied by high used time, it might signify that the host is overcommitted.

So high used time and ready time for a host might signal contention. However, the host might not be overcommitted at all times, depending on when its workloads are active.

There might be periods of activity and idle periods, so the CPU is not overcommitted all the time. Another very common source of high ready time for VMs, even when pCPU utilization is low, is slow storage. A vCPU that occupies a pCPU can issue a storage I/O and then sit in the WAIT state on the pCPU, blocking other vCPUs. The other vCPUs accumulate ready time, while this vCPU and this pCPU accumulate wait time (which is not part of the used or utilized time).

 

Fighting guest CPU saturation in SMP VMs


Guest CPU saturation happens when the application and operating system running in a virtual machine use all of the CPU resources that the ESXi host is providing for that virtual machine. However, this guest CPU saturation does not necessarily indicate that a performance problem exists.

Compute-intensive applications commonly use all of the available CPU resources, but this is expected and might be acceptable (as long as the end user thinks that the job is completing quickly enough). Even less-intensive applications might experience periods of high CPU demand without experiencing performance problems. However, if a performance problem exists when guest CPU saturation is occurring, steps should be taken to eliminate the condition.

When a virtual machine is configured with more than one vCPU but actively uses only one of those vCPUs, resources that could be used to perform useful work are being wasted. In this situation you may see a potential performance problem, at least from the perspective of the most active vCPU.

Getting ready

To step through this recipe, you need a running ESXi Server, a couple of running CPU-hungry virtual machines, vCenter Server, and a working installation of vSphere Client. No other prerequisites are required.

How to do it...

To spot guest CPU saturation, there are two CPU resource parameters which you should monitor closely:

  • The ready time

  • The usage percentage

  1. Log in to the vCenter Server using vSphere Client.

  2. From the home screen, navigate to Hosts and Clusters.

  3. Expand the ESXi host and go to the CPU hungry VM.

  4. Click on the Performance tab.

  5. Navigate to the CPU from the Switch To drop-down menu on the right-hand side.

  6. Navigate to the Advanced tab and click on the Chart Options.

  7. Select only Usage Average in Percentage, Ready, and Used in the Counters section and click on OK.

The preceding example shows high usage and used values; both are at 100 percent.

The preceding example shows that after increasing the VM's CPU resources, the CPU usage percentage dropped to 52 percent.
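Note that the vSphere Client reports the Ready counter as a summation in milliseconds per sampling interval, while the usual rules of thumb are quoted in percent. A small conversion sketch (the 20 s interval is the real-time chart default; historical charts use longer intervals):

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a vSphere 'CPU Ready' summation (milliseconds) into a
    percentage of the chart's sampling interval. Real-time charts
    sample every 20 s; pass a different interval for historical charts."""
    return 100.0 * ready_ms / (interval_s * 1000.0)

# 2,000 ms of ready time in one 20 s real-time sample is 10% ready time,
# a level that is usually worth investigating.
print(cpu_ready_percent(2000))         # 10.0
print(cpu_ready_percent(1000, 300.0))  # ~0.33 on a 5-minute historical chart
```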

How it works...

So if an SMP VM shows high CPU demand but does all of its work on one vCPU, either the application is single-threaded or the guest operating system is configured with a uniprocessor HAL.

Many applications are written with only a single thread of control. These applications cannot take advantage of more than one processor core.

In order for a virtual machine to take advantage of multiple vCPUs, the guest operating system running on the virtual machine must be able to recognize and use multiple processor cores. If the virtual machine is doing all of its work on vCPU0, the guest operating system might be configured with a kernel or a HAL that can recognize only a single processor core.
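Amdahl's law makes this concrete: if only a fraction of the application's work can run in parallel, extra vCPUs speed up only that fraction, and a fully single-threaded job gains nothing. A quick sketch:

```python
def amdahl_speedup(n_vcpus: int, parallel_fraction: float) -> float:
    """Amdahl's law: the speedup from n processors when only
    `parallel_fraction` of the work can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_vcpus)

# A fully single-threaded application gains nothing from extra vCPUs:
print(amdahl_speedup(4, 0.0))              # 1.0
# A well-threaded application (90% parallel) on 4 vCPUs:
print(round(amdahl_speedup(4, 0.9), 2))    # 3.08
```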

You have two possible approaches to solving performance problems related to guest CPU saturation:

  • Increase the CPU resources provided to the application.

  • Increase the efficiency with which the virtual machine uses CPU resources.

Adding CPU resources is often the easiest choice, particularly in a virtualized environment. If a virtual machine continues to experience CPU saturation even after adding CPU resources, the tuning and behavior of the application and operating system should be investigated.

 

Controlling CPU resources using resource settings


If you cannot rebalance the CPU load or increase processor efficiency even after applying all of the recipes discussed earlier, then something else might be keeping the host CPU saturated.

That something could be a resource pool and its allocation of resources to the virtual machine.

Many applications, such as batch jobs, respond to a lack of CPU resources by taking longer to complete but still produce correct and useful results. Other applications might experience failure or might be unable to meet the critical business requirements when denied sufficient CPU resources.

The resource controls available in vSphere can be used to ensure that the resource-sensitive applications always get sufficient CPU resources, even when host CPU saturation exists. You need to make sure that you understand how shares, reservations, and limits work when applied to resource pools or to individual VMs. The default values ensure that ESXi will be efficient and fair to all VMs. Change from the default settings only when you understand the consequences.
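As a rough sketch of how proportional shares behave under contention (simplified: the real scheduler also honors reservations and limits, and only VMs that actually demand CPU compete), using the default per-vCPU share values of Low = 500, Normal = 1000, and High = 2000:

```python
def entitlements_mhz(capacity_mhz: float, shares: dict) -> dict:
    """Split contended CPU capacity among competing VMs in proportion
    to their share values. Simplified: reservations, limits, and
    actual demand are ignored here."""
    total = sum(shares.values())
    return {vm: capacity_mhz * s / total for vm, s in shares.items()}

# Three busy VMs at the default Low/Normal/High share values,
# competing for 7,000 MHz of host CPU:
print(entitlements_mhz(7000.0, {"vm_low": 500, "vm_normal": 1000, "vm_high": 2000}))
# {'vm_low': 1000.0, 'vm_normal': 2000.0, 'vm_high': 4000.0}
```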

Getting ready

To step through this recipe, you need a running ESXi Server, a couple of running CPU hungry virtual machines, vCenter Server, and a working installation of vSphere Client. No other prerequisites are required.

How to do it...

Let's get started:

  1. Log in to the vCenter Server using vSphere Client.

  2. From the home screen, navigate to Hosts and Clusters.

  3. Expand the ESXi host and navigate to the CPU hungry virtual machine.

  4. Click on the Performance tab.

  5. Go to CPU from the Switch To drop-down menu on the right-hand side.

  6. Go to Advanced tab and click on the Chart Options.

  7. Select only Ready and Used in the Counters section and click on OK.

Now if a low limit is configured on the VM while it is demanding more CPU resources, you will see a high ready time and a low used metric. An example of what this may look like is given in the following image:

As the preceding example shows, when a VM demands more CPU resources than a limit placed on it allows, it experiences a high ready time and a low used time. In this example, the VM is set with a limit of 500 MHz.

Now to rectify this, we can raise the limit value, and the VM should then perform better, with a low ready time and a high used value.

  1. Right-click on the CPU-hungry virtual machine and select Edit Settings.

  2. Click on the Resources tab.

  3. Click on CPU.

  4. Change the Share Value to High (2000 Shares).

  5. Change the Limit value to 2000MHz and Reservation to 2000MHz.

  6. Click on OK.

Now the VM should look and perform as shown in the following screenshot:
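The arithmetic behind this before-and-after behavior can be sketched as follows (illustrative only, not scheduler-accurate): demand above the limit cannot be scheduled, so it shows up as limited/ready time instead of used time. The 2,000 MHz core speed is an assumed figure.

```python
def limit_effect(demand_mhz: float, limit_mhz: float, core_mhz: float = 2000.0) -> dict:
    """Illustrative split of a VM's CPU demand into used time and
    demand blocked by a limit, as percentages of one physical core.
    Not scheduler-accurate; it just shows the shape of the problem."""
    used_mhz = min(demand_mhz, limit_mhz)
    blocked_mhz = max(demand_mhz - limit_mhz, 0.0)
    return {"used_pct": 100.0 * used_mhz / core_mhz,
            "limited_pct": 100.0 * blocked_mhz / core_mhz}

# A VM demanding 1,500 MHz under a 500 MHz limit: low used, high limited time.
print(limit_effect(1500.0, 500.0))   # {'used_pct': 25.0, 'limited_pct': 50.0}
# After raising the limit to 2,000 MHz, the demand is fully satisfied.
print(limit_effect(1500.0, 2000.0))  # {'used_pct': 75.0, 'limited_pct': 0.0}
```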

 

What is most important to monitor in CPU performance


Before you jump to conclusions about what to monitor for CPU performance, you need to make sure you know what affects it. Things that can affect CPU performance include:

  • CPU affinity: Pinning a virtual CPU to a physical CPU can leave your resources imbalanced, so it is not advised unless you have a strong reason to do it.

  • CPU prioritization: When CPU contention happens, the CPU scheduler will be forced to prioritize VMs based on entitlement and queue requests.

  • SMP VMs: If your application is not multithreaded, there is no benefit in adding more vCPUs to a VM. In fact, the extra idle vCPUs add overhead that prevents some more useful work from being done.

  • Idle VMs: You may have many idle VMs, which you would think should not eat up resources. In reality, however, idle VMs still consume some CPU for timer interrupts, and shares, reservations, and especially limit settings still apply to those VMs if they were changed from their default settings.

So, now you know what affects the CPU performance. You can now look at what it takes to monitor the CPU performance.

You can categorize the factors that should be monitored for the CPU performance into three main sections:

  • Host CPU usage

  • VM CPU usage

  • VM CPU ready time

To monitor these sections you need to know the esxtop counters and those are:

  • PCPU Used (%)

  • Per group statistics

    • %Used

    • %Sys

    • %RDY

    • %Wait

    • %CSTP

    • %MLMTD

Getting ready

To step through this recipe, you need a running ESXi Server, a couple of running CPU-hungry virtual machines, and a SSH Client (for example, Putty). No other prerequisites are required.

How to do it...

Let's get started:

  1. Log in to the ESXi host using SSH client (Putty).

  2. Run esxtop and monitor the statistics. The following screenshot is an example output:

  3. Now look at the performance counters as mentioned previously. In the following example output, look at the different metrics.

In the preceding example, you can see that pCPU 0 and pCPU 1 are heavily used (100 percent and 73 percent UTIL, respectively), as the following figure shows:

Now in the preceding example, you can see that the %Used values for the four CPU-hungry virtual machines are quite high.

Also look at the %RDY screen, and you will see a high ready time, which indicates a performance problem.

The following list is a quick explanation for each of these metrics:

  • PCPU USED (%): This is the CPU utilization per physical CPU.

  • %USED: This is the physical CPU usage per group.

  • %SYS: This is the VMkernel system activity time.

  • %RDY: This is the ready time. It refers to the amount of time the group spent ready to run but waiting for a CPU to become available. Note that this is not adjusted for the number of vCPUs. You should expand the group to see %Ready for each vCPU, or at least divide the value by the number of vCPUs to get an average per vCPU.

  • %WAIT: This is the percentage of time spent in a blocked or busy-wait state. It includes idle time as well as time spent waiting for I/O from the disk or network.

  • %CSTP: This is the co-stop time. %CSTP for a vCPU is how much time the vCPU spent not running in order to allow the other vCPUs in the same VM to catch up. High values suggest that the VM has more vCPUs than it needs and that performance might be suffering.

  • %MLMTD: This is the amount of time spent ready to run, but not scheduled because of a CPU limit.
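To watch these counters without sitting in the interactive screen, esxtop can log batch output (for example, `esxtop -b -n 3 > stats.csv`) in a perfmon-style CSV. The following is a hedged parsing sketch (the exact header pattern and the 10 percent per-vCPU threshold are assumptions; adjust both for your environment) that also applies the per-vCPU normalization of %RDY described above:

```python
import csv
import re

# Matches perfmon-style esxtop batch headers such as:
#   \\esxi01\Group Cpu(123:hungryvm)\% Ready
# (header format assumed; adjust the regex if your build differs)
READY_RE = re.compile(r"\\Group Cpu\((\d+):([^)]*)\)\\% Ready$")

def high_ready_groups(rows, vcpus_by_group, threshold_pct=10.0):
    """Flag groups whose average %RDY per vCPU exceeds the threshold.

    rows: list of dicts, e.g. list(csv.DictReader(open("stats.csv"))).
    vcpus_by_group: {'vmname': vcpu_count} -- %RDY is summed across
    vCPUs, so it must be divided by the vCPU count before comparing.
    """
    flagged = {}
    for header in rows[0]:
        match = READY_RE.search(header)
        if not match:
            continue
        name = match.group(2)
        if name not in vcpus_by_group:
            continue
        avg = sum(float(row[header]) for row in rows) / len(rows)
        per_vcpu = avg / vcpus_by_group[name]
        if per_vcpu > threshold_pct:
            flagged[name] = round(per_vcpu, 1)
    return flagged
```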

 

CPU performance best practices


CPU virtualization adds a varying amount of overhead. Because of this, you may need to fine-tune CPU performance, and you need to know the standard best practices.

The following are the standard CPU performance best practices:

  • You need to avoid using SMP VMs unless required by the application running inside the guest OS. That is, if the application is not multithreaded, there is no benefit in using an SMP VM.

  • You should prioritize VM CPU usage with the proportional-share algorithm.

  • Use DRS (Distributed Resource Scheduler) and vMotion to redistribute VMs and reduce contention.

  • Use the latest available virtual hardware for the VMs.

  • Reduce the number of VMs running on a single host. This way, you not only reduce contention, but also reduce the size of the fault domain.

  • You should leverage the application tuning guide from the vendor to tune your VMs for best performance.

Getting ready

To step through this recipe, you need a running ESXi Server, a couple of running virtual machines, and a working installation of vSphere Client. No other prerequisites are required.

How to do it…

Let's get started:

  1. For the first best practice, you need to check whether the application is single-threaded or multithreaded. If it is single-threaded, then avoid running it in an SMP VM.

  2. You need to log in to vCenter using vSphere Client, then go to the Home tab. Once there, go to the VM and look at the Summary tab.

  3. Now you can see whether the VM has one vCPU or multiple vCPUs. You can see whether it is using them by looking at %Utilization or a similar metric for each vCPU. The Summary tab alone does not tell you whether the application is single-threaded or multithreaded.

  4. For the second best practice, you need to prioritize the VM CPU using shares and reservation. Depending on the customer SLA, this has to be defined.

  5. You need to log in to the vCenter using vSphere Client, then go to the Home tab. Once there, go to the VM, right-click on it, and then select Edit Settings.

  6. Now go to the Resources tab and select CPU. Here, you need to define the Shares and Reservation values depending on your SLA and performance factors. By default, ESXi is efficient and fair. It does not waste physical resources. If all demands can be met, all will be. If not all demands can be satisfied, the deprivation is shared equitably among VMs by default.

    Monitor what the VMs actually use, and then adjust the shares, reservations, or limit settings. But be sure that you know how they work first.

  7. For the third best practice, you need to have a vSphere Cluster and have DRS enabled for this. DRS would load balance the VMs across the ESXi hosts using vMotion.

    The first screenshot shows that the DRS is enabled on this vSphere Cluster:

    The second screenshot shows the automation level and migration threshold.

  8. For the fourth best practice, you first need to see what virtual hardware the VM is running on, and if it is not current then you need to upgrade that. Virtual hardware version can limit the number of vCPUs.

  9. You need to log in to the vCenter using vSphere Client, then go to the Home tab. Once there, go to VM and look at the Summary tab.

  10. In the following example, the VM is at hardware Version 8, which is old; we can upgrade it to hardware Version 9.

  11. Now, to upgrade the virtual hardware of a VM, it has to be powered off. Then right-click on the VM and go to Upgrade Virtual Hardware. It should give you a warning.

    Tip

    Take a snapshot prior to upgrading in order to mitigate the rare occurrence of a failure to boot the Guest Operating System after upgrading.

  12. Once you click on OK, the virtual hardware version will be upgraded.

  13. For the fifth recommendation, you need to balance the number of vCPUs required by the VMs that will run on a host against the number of sockets/cores available in that physical host. Remember the golden rule of "Don't keep all your eggs in one basket"; how far you take it depends on fault domain tolerance and the customer SLA. There is no simple answer to this. Monitor the VMs for performance and adjust as necessary.

  14. For the last recommendation, you need to get the vendor's application tuning guide and follow it to tune your virtual environment. A typical example is the Exchange 2010 Best Practices guide on VMware.

About the Author

  • Prasenjit Sarkar

    Prasenjit Sarkar is a product manager at Oracle for their public cloud, with a focus on cloud strategy, Oracle Ravello, cloud-native applications, and the API platform. His primary focus is driving Oracle's cloud computing business with commercial and public sector customers, helping to shape and deliver a strategy to build broad use of Oracle's Infrastructure as a Service offerings, such as Compute, Storage, and Database as a Service. He is also responsible for developing public/private cloud integration strategies, customers' cloud computing architecture visions, future state architectures, and implementable architecture roadmaps in the context of the public, private, and hybrid cloud computing solutions that Oracle can offer.

    He has also authored six industry-leading books on virtualization, SDN, and physical compute, among others.

    He has six successful patents and six more patents pending at the US PTO. He has also authored numerous research articles.
