This post was written by Sander Rodenhuis and posted on 3 November 2017

Some time ago I had to write an RFP to select a new hardware platform for a new private cloud solution based on VMware VSAN 6.2. In this post I will explain how we approached this and the challenges we faced, and I will share some best practices based on my experience.

Calculating the 10% cache size

I wanted to keep the requirements in the RFP as compact as possible. Therefore we decided to define the required VM capacity (based on the environment requirements of the VSAN TCO calculator) and stated that the sizing needed to comply with the VSAN Storage Design Considerations, as documented in the VSAN 6.2 Design and Sizing Guide. We were aiming for an all-flash solution. Because space saving was a high priority, this would let us take advantage of deduplication, compression and VSAN erasure coding (RAID-5/6).

When we received the quotations, we had to check whether the offered cache capacity would meet our requirements. Because we had shared sizing inputs based on the TCO calculator, we thought we could verify each offer by customizing the Ready Node configuration in the calculator with the offered specifications. But at that time the new VSAN TCO calculator (for all-flash solutions) wasn’t yet available, and the outcome of the calculator differed from the offered specifications. This meant we had to check the calculations ourselves.

Remarkably, it turned out that every vendor had used a different approach to calculate the 10% cache size. One vendor calculated the cache size by taking the total capacity of two RAID-5/6 disk groups, without taking FTT=1 into account. Another vendor just took 10% of the total capacity per host, again without FTT=1.

I calculated it based on the required VM disk capacity with FTT=1, including the RAID-5/6 overhead (1.33), which (of course) resulted in a different cache size. Here is a good post on the 10% rule for VSAN caching by Duncan Epping.
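To show how far these approaches can drift apart, here is a minimal sketch in Python. The function names and all input figures (100 TB of VM disks, 8 hosts with 20 TB raw capacity each) are hypothetical and only serve as an illustration. The formulas are my reading of the approaches described above, not an official VSAN sizing method.

```python
CACHE_FRACTION = 0.10   # the "10% rule"
RAID5_OVERHEAD = 1.33   # RAID-5/6 overhead factor at FTT=1, as used in this post


def cache_from_vm_disks(vm_disk_tb, overhead=RAID5_OVERHEAD,
                        fraction=CACHE_FRACTION):
    """10% of the required VM disk capacity, including RAID-5/6 overhead."""
    return vm_disk_tb * overhead * fraction


def cache_from_raw_capacity(raw_tb, fraction=CACHE_FRACTION):
    """10% of raw capacity, ignoring FTT and erasure coding overhead."""
    return raw_tb * fraction


# Hypothetical inputs, for illustration only.
vm_disk_tb = 100.0       # total required VM disk capacity
hosts = 8                # number of hosts in the offer
raw_per_host_tb = 20.0   # raw capacity tier per host

from_vm_disks = cache_from_vm_disks(vm_disk_tb)
from_raw = cache_from_raw_capacity(raw_per_host_tb) * hosts

print(f"Cache based on VM disks + RAID-5/6 overhead: {from_vm_disks:.1f} TB")
print(f"Cache based on raw capacity per host:        {from_raw:.1f} TB")
```

With the same offer, the two methods give different cache sizes, which is exactly why the RFP should state which method applies.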

Best practice 1

So if you are planning to write an RFP to select a new hardware platform, be sure to specify exactly how the cache size needs to be calculated (or even better, calculate it yourself). If you are not sure how to calculate the exact storage requirements, ask someone who can. Note that there is also an update on how to calculate the cache size in an all-flash scenario. Read the full update here. There is also a new version of the VSAN TCO Calculator, specifically for an all-flash scenario. You can find it here.

Specifying a vCPU:pCPU ratio

Now this is a topic that always seems to lead to some discussion: what is a vCPU:pCPU ratio? A vCPU:pCPU ratio defines how many vCPUs can be assigned to a single physical CPU. For example: vSphere will see each physical core as a single logical CPU (or two when hyperthreading is enabled). If we say that the vCPU:pCPU ratio is 5:1, then this means that for every physical core vSphere sees, 5 vCPUs can be assigned to VMs (or 10 if hyperthreading is enabled). In the VSAN TCO calculator it isn’t possible to specify this. Now I must say that in our case the workloads were basically idle. In this case defining a ratio was relevant for capacity planning because the amount of required storage capacity was relatively high. If we didn’t define a ratio, we would end up with too much storage capacity, because an unnecessarily high CPU requirement would lead to a lot of hosts. Also keep in mind that VSAN has some overhead on CPU. VSAN has been designed to consume no more than 10% of CPU resources.
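The effect of the ratio on the number of hosts can be sketched as follows. This is a minimal Python example with hypothetical numbers (1000 vCPUs, 24 cores per host) and a hypothetical function name. Reserving 10% of the CPU for VSAN is my own assumption, based on the design limit mentioned above, and the doubling for hyperthreading follows the interpretation used in this post.

```python
import math


def hosts_needed_for_cpu(total_vcpus, cores_per_host, ratio,
                         hyperthreading=False, vsan_cpu_overhead=0.10):
    """Number of hosts needed to place total_vcpus at a given vCPU:pCPU ratio.

    Assumption: vsan_cpu_overhead is set aside for VSAN itself and is
    not available to VMs.
    """
    # Each core counts as one logical CPU, or two with hyperthreading.
    logical_cpus = cores_per_host * (2 if hyperthreading else 1)
    usable_cpus = logical_cpus * (1 - vsan_cpu_overhead)
    vcpus_per_host = usable_cpus * ratio
    return math.ceil(total_vcpus / vcpus_per_host)


# Hypothetical environment, for illustration only.
total_vcpus = 1000
cores_per_host = 24

for ratio in (1, 3, 5):
    hosts = hosts_needed_for_cpu(total_vcpus, cores_per_host, ratio)
    print(f"ratio {ratio}:1 -> {hosts} hosts")
```

Because every extra host also brings its own disks, a low ratio with mostly idle workloads pushes up the host count and, with it, the amount of storage capacity you buy without needing it.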

Best practice 2

When storage capacity is more important than CPU capacity, specify a vCPU:pCPU ratio. There has been a lot of discussion on what a preferred ratio would be (and whether you would really want to define one). In this VMware blog, Mark Achtemichuk explains why he thinks we need to move away from static ratios and instead provide value by ensuring efficient consumption of hardware investments and by supporting the ever-increasing dynamic nature of the business. But try to avoid ending up with an overprovisioned hardware environment! That’s why in some scenarios defining a static ratio as a requirement is still appropriate.