Feb 2024 release

New workload: Prism Central

  • Prism Central is now available as a sizing option in the workload dropdown.
  • Sizes for Single VM and Scale-out deployment options.
  • Considers resource requirements for add-ons: Self Service, Flow, and Prism Operations.
  • Please note that this is an additional option that lets users manually add Prism Central to the cluster; selecting Prism Central from the Solutions option continues to exist.

Split large workloads into multiple clusters:

  • Earlier, extremely large workloads returned a "no optimal solution" message when they did not fit in a single cluster.
  • With split workload functionality, Sizer automatically creates multiple clusters and splits workloads across the clusters optimally.
  • Currently supported for Cluster (raw) sizing only.
  • This should be especially helpful in two scenarios: large imports from Collector, and sizing for NC2 on AWS/Azure where the maximum node limit per cluster is smaller.

Insights imports: Auto create existing cluster

  • When importing from Insights, there is an option to auto-recreate the existing cluster.
  • This creates the cluster with the existing HW configuration (by default, the import uses the latest generation models and users must manually configure the existing HW).
  • Only additional nodes need to be configured manually when considering an expansion.

Proposals:

  • Latest HW spec on NX G9 and HPE DX G11

Usability enhancements:

  • View workload:
    • Currently it captures only the essential workload inputs.
    • Now there is a 'View All details' option that opens the workload page in view mode to show all inputs.
    • No need to clone a workload just to see its inputs.

Platforms: Dell 16G (Intel SPR)

[Demo: splitting workloads into multiple clusters]

Storage calculation for clusters with mixed capacity nodes

This article explains the logic behind storage calculations for clusters having nodes with different storage capacities.

What has changed?

Previously, the capacity calculations were based on the aggregate capacity across the nodes in the cluster. This total capacity was the basis for calculating the usable and effective capacity of the cluster.

For example: consider 3 nodes, N1 = 20TB, N2 = 20TB and N3 = 10TB.

Based on the above, the total capacity available is 20 + 20 + 10 = 50TB, and assuming N+1, the capacity available is N2 + N3 = 30TB. Thus, 15TB can be used for data and 15TB for RF (assuming RF2).

With the new update, Sizer also ensures that the RF copy of the data and the data itself do not share the same node.

In the above example, after N+1, two nodes are available: N2 = 20TB and N3 = 10TB.

If we allowed writing 15TB of data (and 15TB for RF), part of the data and its RF copy would have to land on the same node, since N3 is only 10TB. So, to ensure the RF copy and the data are on separate nodes, the usable storage in this case would be 20TB (10TB of data on N2 and its RF copy on N3, or vice versa).

Note: Although the same logic is used for both homogeneous and mixed capacity clusters, the difference is seen primarily for the mixed capacity clusters.
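To make the arithmetic concrete, here is a minimal sketch (illustrative only, not Sizer's code; the variable names are assumptions) of the old versus new calculation for the 20TB/20TB/10TB example:

```python
# Illustrative sketch of the old vs. new usable-storage logic for the
# 20TB/20TB/10TB example (RF2, N+1). Not Sizer's actual code.

nodes_tb = [20, 20, 10]

# N+1: set aside the largest node, since losing it is the worst case.
surviving = sorted(nodes_tb)[:-1]                # [10, 20]

# Old logic: data = aggregate surviving capacity / RF
old_data_tb = sum(surviving) / 2                 # 30 / 2 = 15 TB of data

# New logic: data and its RF copy must sit on different nodes, so the data
# is also capped by the sum of the surviving nodes excluding the largest.
new_data_tb = min(sum(surviving) / 2, sum(surviving) - max(surviving))

print(old_data_tb, new_data_tb)                  # 15.0 vs 10 TB of data
print(2 * new_data_tb)                           # 20 TB usable (data + RF copy)
```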

Here is a detailed write-up on how the usable storage is calculated for clusters with mixed capacity nodes, covering different scenarios for RF2 and RF3.

Algorithm for RF2

If there is only one node with a non-zero capacity Cx, then with RF2 the replication is done between the different disks of that node, and the extent store in this case is Cx / 2 (RF). Otherwise, one of the cases below applies. Let us say we have nodes with capacities C1, C2, C3, …, C[n], sorted in ascending order of capacity. There are 2 cases to consider for RF2 when computing the effective raw storage capacity:

Case-1: C1 + C2 + C3 + …. + C[n-1] <= C[n] 
If this is the case, then the total amount of storage that can be replicated with a factor of 2 is  ∑(C1, C2, C3, …., C[n-1])

Case-2: C1 + C2 + C3 + …. + C[n-1] > C[n]
If this is the case, then the (total storage capacity) / 2 (RF) can be replicated among the available nodes. In other words, half the total capacity can be replicated.
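Expressed as code, the two cases above reduce to a short function. This is a minimal sketch, not Sizer's source; the function name and units are illustrative assumptions:

```python
# Sketch of the RF2 rule described above: how much data (one copy's worth)
# can be replicated twice across nodes with the given capacities.

def rf2_replicable_capacity(capacities):
    caps = sorted(c for c in capacities if c > 0)
    if not caps:
        return 0.0
    if len(caps) == 1:
        # Single node: RF2 replication happens across disks of that node.
        return caps[0] / 2
    largest, rest = caps[-1], caps[:-1]
    if sum(rest) <= largest:
        # Case-1: every byte on the smaller nodes can pair with a copy
        # on the largest node, so the smaller nodes bound the total.
        return float(sum(rest))
    # Case-2: half of the aggregate capacity can be replicated.
    return sum(caps) / 2

print(rf2_replicable_capacity([20, 20, 10]))  # 25.0 (Case-2)
print(rf2_replicable_capacity([20, 10]))      # 10.0 (Case-1, the N+1 example above)
```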

Algorithm for RF3

Let us say, we have nodes with capacities C1, C2, C3, …., C[n] which are in sorted order according to their capacities. Algorithm for RF3 is slightly different from that of RF2 because we need to accommodate the replica of data on 2 nodes, as opposed to a single node on RF2.

  1. Since there are 3 replicas to place, we calculate the capacity difference between the 2nd largest (C[n-1]) and the 3rd largest (C[n-2]) entities as 'diff'. In the optimal placement scenario, the first replica is placed on the entity with the smallest capacity, the second replica on the entity with the largest capacity (C[n]), and the third replica on the entity with the 2nd largest capacity (C[n-1]). The 'diff' value (C[n-1] – C[n-2]) lets us quickly deduce when the 2nd largest entity will become equal to the 3rd largest entity by virtue of the space consumed on the former via replica placement.
  2. By repeatedly deducting the smaller of the 'diff' calculated above and the capacity of the smallest entity, and simulating RF3 placement until C[n-2] and C[n-1] become equal (note that the difference between C[n] and C[n-1] remains constant during this, since the same capacity is deducted from both of them), in O(N) we arrive at one of the following possibilities (a code sketch of this simulation follows the case analysis below):
    • Case-1: Only 3 entities remain with non-zero capacities, in which case the amount of data that can be accommodated among these 3 nodes with an RF of 3 (one copy of the data and 2 replicas) is the smallest remaining capacity, which is C[n-2].
    • Case-2: There is capacity left in C[n-3] (i.e., the 4th largest entity) and in any number of entities before it (C[n-4], C[n-5], etc.), and C[n-2] == C[n-1] (i.e., the capacities remaining on the 3rd and 2nd largest entities have become equal). This happens because at this point the capacity remaining on the smallest non-zero entity before C[n-2] is greater than C[n-1] – C[n-2], indicating that after placing one replica on C[n] and another on C[n-1], the point has been reached where C[n-1] == C[n-2]. From here, for the next bytes of data, the second replica will go to C[n] while the third replica is round-robined between at least 2 (or more) entities. In this scenario, 2 further cases can arise:
      • Case-2(a): (C1 + C2 + … + C[n-1]) / 2 <= C[n]
        If C[n]'s capacity is high enough that, for every pair of 1st and 3rd replicas placed on the lower-capacity nodes up to C[n-1], the second replica always finds space on C[n], i.e., if (C1 + C2 + … + C[n-1]) / 2 <= C[n], then the amount of storage that can be accommodated on the available nodes with an RF of 3 is the smaller side of the inequality, i.e., (C1 + C2 + … + C[n-1]) / 2, as we cannot consume the full space on C[n].
      • Case-2(b): (C1 + C2 + … + C[n-1]) / 2 > C[n]
        If C[n]'s capacity is not as high as in case (a), i.e., (C1 + C2 + … + C[n-1]) / 2 > C[n], then one of the replicas will be placed on the largest entity C[n], while the other two replicas will round-robin amongst the other largest-capacity entities (since the capacities remaining on at least 2 entities, C[n-2] and C[n-1], are already equal). This continues until C[n] becomes equal to C[n-1], which is guaranteed to happen eventually because the replicas consume space on C[n] at a rate at least double that of C[n-1], C[n-2], …. From that point, both the second and the third replicas continue being round-robined across all the remaining entities, and thus all the capacities remaining at that point can be fully consumed. Hence, in this case, the amount of storage that can be accommodated is the sum of all remaining (non-zero) entities divided by 3 (RF).
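Here is a minimal sketch of the simulation described above. It is illustrative only, not Sizer's source; the function name and the assumption of whole-unit capacities are mine:

```python
# Sketch of the RF3 case analysis: walk capacities smallest-first, placing
# the three replicas on the smallest, the largest and the 2nd-largest
# entities, until only 3 entities remain or the 2nd and 3rd largest
# remaining capacities become equal.

def rf3_replicable_capacity(capacities):
    caps = sorted(c for c in capacities if c > 0)
    if len(caps) < 3:
        return 0.0                      # the write-up assumes >= 3 entities
    data = 0.0
    while True:
        if len(caps) == 3:
            # Case-1: three entities left; the smallest bounds the data.
            return data + caps[0]
        diff = caps[-2] - caps[-3]      # gap between 2nd and 3rd largest
        if diff == 0:
            # Case-2: 2nd and 3rd largest equal, smaller entities remain.
            if sum(caps[:-1]) / 2 <= caps[-1]:
                return data + sum(caps[:-1]) / 2    # Case-2(a)
            return data + sum(caps) / 3             # Case-2(b)
        # Simulation step: place `step` units of data, with replicas on the
        # smallest, the largest and the 2nd-largest entities.
        step = min(caps[0], diff)
        data += step
        caps[0] -= step
        caps[-1] -= step
        caps[-2] -= step
        caps = sorted(c for c in caps if c > 0)

print(rf3_replicable_capacity([20, 20, 10]))         # 10 (Case-1)
print(rf3_replicable_capacity([5, 10, 10, 20, 20]))  # ~21.67 (ends in Case-2(b))
```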

Terminologies

Effective Usable Capacity = 95% of (Raw capacity – failover capacity based on RF)

95% because AOS stops writing to the disk when the cluster utilization reaches 95%.

Effective Capacity = Effective Usable Capacity – CVM

Extent Store Capacity = Effective Capacity / RF

Effective Capacity without Saving = Extent Store Capacity

Effective Capacity with Savings = Extent Store Capacity + Savings (Storage Efficiency & Erasure Coding)
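As a worked illustration of how these terms chain together, here is a minimal sketch; the input values are illustrative placeholders, not numbers produced by Sizer:

```python
# Sketch of the capacity definitions above, applied in order.

def capacity_breakdown(raw_tb, failover_tb, cvm_tb, rf, savings_tb=0.0):
    # 95% because AOS stops writing when cluster utilization reaches 95%.
    effective_usable = 0.95 * (raw_tb - failover_tb)
    effective = effective_usable - cvm_tb       # subtract CVM overhead
    extent_store = effective / rf               # one copy's worth of data
    return {
        "Effective Usable Capacity": effective_usable,
        "Effective Capacity": effective,
        "Extent Store Capacity": extent_store,
        "Effective Capacity without Savings": extent_store,
        "Effective Capacity with Savings": extent_store + savings_tb,
    }

# Example: 3 x 20TB raw, one 20TB node reserved for failover (N+1, RF2),
# 2TB of CVM overhead, no storage-efficiency savings.
print(capacity_breakdown(raw_tb=60, failover_tb=20, cvm_tb=2, rf=2))
```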

Adjusting Nutanix Files node count in Manual

Right now, Sizer's automatic sizing does not take Erasure Coding savings into account when it determines the recommendation. Fortunately, the savings are taken into account in the HDD utilization.

So the usable capacity and the utilization are accurate, but the recommended node count could be high, depending on the settings for compression, among other things.

This is easy to check and adjust if needed.

Go to manual sizing and decrement the node count.

 

Two things to keep in mind:

  1. The minimum number of nodes for Files is 4, as there is an FSVM on three nodes and a 4th node is needed for N+1 (one node can be taken offline and there are still 3 nodes to run the 3 FSVMs). So 4 nodes are needed regardless of capacity.
  2. Second, like any Nutanix cluster, you want to make sure you are still at N+1. Here is a table that shows the maximum HDD utilization (Files is an HDD-heavy workload) at which N+1 is assured. For example, if you have 6 nodes and the HDD utilization is under 75%, you can be assured that you are at N+1. Here the N+0 target (utilization after losing a node) is 90%, meaning that with a node offline the utilization is 90% or less. The derivation of these thresholds is sketched after the table.
Nodes   N+0 Utilization Target   Max Threshold for N+1
4       90%                      67.50%
5       90%                      72.00%
6       90%                      75.00%
7       90%                      77.14%
8       90%                      78.75%
9       90%                      80.00%
10      90%                      81.00%
11      90%                      81.82%
12      90%                      82.50%
13      90%                      83.08%
14      90%                      83.57%
15      90%                      84.00%
16      90%                      84.38%
17      90%                      84.71%
18      90%                      85.00%
19      90%                      85.26%
20      90%                      85.50%
21      90%                      85.71%
22      90%                      85.91%
23      90%                      86.09%
24      90%                      86.25%
25      90%                      86.40%
26      90%                      86.54%
27      90%                      86.67%
28      90%                      86.79%
29      90%                      86.90%
30      90%                      87.00%
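The thresholds above follow from a simple rule: with one node offline, the surviving n-1 nodes must stay at or below the 90% N+0 target, so the n-node ceiling is 90% * (n-1)/n. A minimal sketch of that arithmetic (illustrative, not Sizer code):

```python
# Max HDD utilization on an n-node Files cluster that still leaves the
# cluster at or below the N+0 target (90%) after one node is lost.

def max_hdd_utilization_for_n_plus_1(nodes, n0_target=0.90):
    return n0_target * (nodes - 1) / nodes

for nodes in (4, 6, 19, 30):
    print(nodes, f"{max_hdd_utilization_for_n_plus_1(nodes):.2%}")
# 4 67.50%, 6 75.00%, 19 85.26%, 30 87.00%
```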