Sizer Thresholds – What has changed and Why?
What are thresholds in Sizer?
Sizer has a feature called thresholds. They are defined individually for each sizing resource: cores, memory, SSDs, HDDs and GPUs (wherever applicable). Thresholds ensure that the total resources available in the nodes (cluster) are sufficient to meet the workload requirements, while leaving some buffer for unforeseen surges in workload demand.
What has changed in thresholds?
Up until July 2021, the default threshold for cores, memory, SSD and HDD was 95%, as can be seen (and modified) on the policy screen shown below.
Note that the default was set to 95%, which is also the maximum allowed. Users can choose a lower threshold (a more conservative sizing with more buffer for future spikes). However, under no circumstances did Sizer allow a threshold above 95%, so that a 5% margin always remained to accommodate sizing errors, estimation inaccuracies and uncertainty in workload usage.
Starting August 2021, Sizer is changing the default for these thresholds to 85% across ALL resources (cores/memory/SSDs/HDDs), as shown below.
Note that the defaults have moved left to 85%; however, the maximum allowable utilization of the cluster resources remains at 95%.
Why the change?
With both the maximum allowable value and the default at 95%, there was at times not enough margin for sizing estimate errors or unforeseen workload usage and spikes, as only 5% was left. Given that making accurate estimates is hard, we felt it was prudent to provide more slack with an 85% threshold.
To be clear, many sizings have been done successfully at the old 95% level. The move was also supported by Sizer users doing manual sizings, who often opted for more slack. The change was made to err on the side of caution against sizing issues.
When is it best to leave it at the 85% threshold?
We feel this is the more prudent level for most sizings. It allows more room for estimate errors and, for that matter, customer growth.
When might it be fine to go to the 95% threshold?
Certainly, numerous sizings have been done with a 95% threshold and customers were happy, and we still allow 95% as the threshold. These are N+0 thresholds, so at N+1 there is a lot more slack. The 95% level is reached only when one node is taken offline, for example for upgrades. If the customer does upgrades during off-hours, their core and RAM requirements are much lower than normal and do not hit the higher threshold anyway. Again, we feel it is more prudent to leave it at 85%; going higher means you need to be comfortable with your sizing estimates, especially when the cluster is at N+0 (during an upgrade).
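The N+0 versus N+1 point can be made concrete with a little arithmetic. The sketch below is only an illustration, not Sizer's actual calculation: it assumes identical nodes, an evenly spread workload, and made-up figures (5 nodes of 100 capacity units each, a demand of 380 units). The function name `utilization_pct` is hypothetical.

```python
def utilization_pct(demand: float, nodes: int, per_node: float) -> float:
    """Percent of raw capacity used when `nodes` identical nodes are online."""
    return 100 * demand / (nodes * per_node)


# Hypothetical N+1 cluster: 5 identical nodes, 100 capacity units each.
demand = 380        # illustrative workload requirement, same units as per_node
total_nodes = 5
per_node = 100

n_plus_1 = utilization_pct(demand, total_nodes, per_node)      # all nodes online
n_plus_0 = utilization_pct(demand, total_nodes - 1, per_node)  # one node offline

print(f"all nodes online: {n_plus_1:.1f}%")  # 76.0%
print(f"one node offline: {n_plus_0:.1f}%")  # 95.0%
```

In this made-up case the cluster only reaches the 95% level while a node is offline; with every node online it runs at 76%, which is the extra slack referred to above.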
What are the implications to existing sizings?
First, the new sizings:
All new sizings (effective 9th August 2021) will have default thresholds of 85%. Since this is a significant change that impacts ALL new sizings and ALL users (internal/partners/customers), a BANNER will be displayed prominently for two weeks for general awareness.
Implications to existing sizings:
There will be NO impact on sizings created before 9th August 2021. Existing sizings will continue with the default threshold of 95% and will calculate utilisation percentages, N+0/N+1, etc. based on that previous default. Thus, there won't be any resizing or new recommendation for existing sizings; those sizings and their recommendations hold good for that scenario.
Cloning an existing scenario:
Cloning an existing sizing will be treated as a new sizing created after 9th August 2021, and thus the new sizing rules and default thresholds will apply.
One implication is that utilisation percentages across the cluster resources will increase. This is because only 85% of the resources are now considered available for running the workload, as against 95% earlier. This reservation of an additional 10% of resources may drive a higher node count (or turn an existing N+1 solution into N+0) in some edge cases.
The user can choose to resize for the new defaults, which may lead to a higher node or core count; as explained above, that is for the better, since it provides margin for estimate errors and spikes. Alternatively, since the clone is of an existing sizing that may already have been sold to the customer, the user can go to the threshold setting and move it to the right, back to 95%, which then gives the same recommendation as the original sizing.
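The effect of cloning described above can be sketched with a simplified model. This is not Sizer's actual algorithm: it assumes that usable capacity is simply raw capacity multiplied by the threshold, that all nodes are identical, and that only one resource (cores) drives the node count. The function names and the figures (300 cores required, 32 cores per node) are hypothetical.

```python
import math


def nodes_needed(required: float, per_node: float, threshold_pct: float) -> int:
    """Smallest node count whose usable capacity covers the requirement."""
    usable_per_node = per_node * threshold_pct / 100
    return math.ceil(required / usable_per_node)


def utilization_of_usable_pct(required: float, nodes: int,
                              per_node: float, threshold_pct: float) -> float:
    """Requirement as a percentage of the capacity the threshold leaves usable."""
    usable = nodes * per_node * threshold_pct / 100
    return 100 * required / usable


required_cores = 300   # illustrative workload requirement
cores_per_node = 32    # illustrative node size

original = nodes_needed(required_cores, cores_per_node, 95)  # pre-change default
cloned = nodes_needed(required_cores, cores_per_node, 85)    # new default on clone
restored = nodes_needed(required_cores, cores_per_node, 95)  # slider moved back

# Same node count as the original, re-evaluated at 85%: goes above 100%,
# which is why the clone may recommend more nodes in this model.
on_clone = utilization_of_usable_pct(required_cores, original, cores_per_node, 85)

print(original, cloned, restored)  # 10 12 10
print(f"{on_clone:.1f}%")          # about 110.3%
```

With these made-up numbers the clone asks for two more nodes at 85%, and moving the threshold back to 95% returns the original node count.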