- VMware Performance and Capacity Management (Second Edition)
- Iwan 'e1' Rahabok
- 2021-07-09 20:00:28
Performance versus capacity
Now that we know what IaaS performance is, what metric should we use to measure it?
Many customers mistake capacity for performance. They associate low utilization with high performance, for example:
- Low ESXi CPU utilization means its performance is good (it is fast)
- Low VM CPU utilization means its performance is good (it is fast)
If you ponder these points, you will see the flaw in the logic. There are several reasons why utilization is the wrong measure of IaaS performance:
- Whether the ESXi has high or low utilization has nothing to do with its performance. An ESXi does not become slower as its utilization goes from 5 percent to 50 percent. It's still running at the same speed!
- An ESXi with low utilization has a better chance of serving all its VMs well than an ESXi with high utilization. There is certainly a correlation. The question is, how do you quantify that correlation? You cannot say that 25 percent ESXi CPU utilization means none of its VMs has CPU contention. Defending your IaaS by citing your ESXi utilization is not a good idea.
- VM utilization can certainly impact its own performance. While that specific issue is about performance, it is not about your IaaS performance. If a VM has four vCPUs, and it's using all four at 100 percent, your IaaS role is to provide the resources as per your SLA. Whether that's enough to meet the VM's business requirement is between the VM owner and the business owner. A VM can certainly have performance issues while your IaaS has excellent performance.
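The points above can be illustrated with a minimal Python sketch. It is not taken from any VMware tool: the `VMSample` type, the counter names, the 2 percent threshold, and all sample values are assumptions made up for illustration. The idea it shows is that the IaaS check looks only at per-VM contention and ignores both host and VM utilization.

```python
from dataclasses import dataclass


@dataclass
class VMSample:
    """One hypothetical measurement for a VM (names are illustrative)."""
    name: str
    cpu_utilization_pct: float  # how busy the VM is: the VM owner's concern
    cpu_contention_pct: float   # how long the VM waited for CPU: the IaaS concern


# Hypothetical SLA threshold, chosen for illustration only.
CPU_CONTENTION_SLA_PCT = 2.0


def vms_breaching_sla(samples, threshold=CPU_CONTENTION_SLA_PCT):
    """Return the names of VMs whose CPU contention exceeds the threshold.

    Utilization is deliberately not consulted: a VM running at 100 percent
    with no contention is still being served well by the IaaS.
    """
    return [s.name for s in samples if s.cpu_contention_pct > threshold]


# Made-up samples from an ESXi host that is only 25 percent utilized.
host_cpu_utilization_pct = 25.0
samples = [
    VMSample("app-01", cpu_utilization_pct=100.0, cpu_contention_pct=0.5),
    VMSample("db-01", cpu_utilization_pct=15.0, cpu_contention_pct=6.0),
    VMSample("web-01", cpu_utilization_pct=40.0, cpu_contention_pct=1.0),
]

print(f"Host CPU utilization: {host_cpu_utilization_pct} percent")
print("VMs breaching the contention SLA:", vms_breaching_sla(samples))
```

In this made-up data, `app-01` runs at 100 percent utilization yet is served within the SLA, while `db-01` is mostly idle but experiences contention on a lightly loaded host. Neither utilization figure predicts the result.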
Let's use a table to compare performance and capacity. It helps to disassociate VM performance from IaaS performance:

Now that we know their differences, we can expect that they use different counters:

Can you notice which counters are not included in the preceding table?
Commonly used counters such as VM CPU utilization and VM RAM utilization have been excluded. As discussed, you cannot use these counters to quantify your IaaS performance or IaaS capacity. They are counters for individual VMs.
Bonus question! Can you notice what else is missing?
There are counters that are specific to SDDC that give you a clue about the performance of your IaaS. A poor value in any of these counters will hurt performance. Examples include:
- VMkernel network latency, network dropped packets, and packet retransmits
- vMotion stunned time and vMotion downtime
- VSAN SSD cache hit rate
- NSX Edge VM performance (as it is in the datapath)
- F5 load balancer (as it is in the datapath)
- Horizon View Security Server performance (as it is in the datapath)
- Trend Micro Deep Security virtual appliance (as it is in the datapath)
Do you then monitor them all?
No. They are secondary counters. All these counters are required for performance troubleshooting. They are not required for performance monitoring. If you need more details, review Chapter 3, SDDC Management, where we covered the difference between monitoring and troubleshooting.
I think by now it should be clear to you. To help you explain it to your management and customers, I've created the following diagram:

Performance and Capacity