Cold tier data adjusted in Hot tier

Sizer always tries to propose the optimal solution in terms of resources and cost.

As part of this effort, in certain cases Sizer moves workload data that would otherwise sit on cold tier storage (HDDs) onto hot tier storage (SSDs).

This happens when there is a surplus of unutilized SSD capacity in the BOM. The unutilized flash capacity is used for cold tier data if doing so reduces the overall number of nodes, or if it avoids adding disks to meet a large HDD requirement that the extra SSD capacity can satisfy. The defined threshold levels are maintained and do not change for this adjustment.
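
The decision can be pictured as a minimal sketch (hypothetical helper and inputs; Sizer's actual heuristics are internal and also weigh node-count reduction):

```python
def place_cold_data(cold_tib, surplus_ssd_tib, extra_hdd_tib_needed):
    """Hypothetical sketch of the cold-tier-on-SSD decision.

    If the unutilized flash in the BOM can absorb the cold tier data and
    HDDs (or nodes) would otherwise have to be added, serve the cold data
    from the hot tier. Threshold levels are assumed to be honored elsewhere.
    """
    if extra_hdd_tib_needed > 0 and surplus_ssd_tib >= cold_tib:
        return "hot-tier (SSD)"
    return "cold-tier (HDD)"
```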

The adjustment appears as a separate row in the calculation table and in the Sizer BOM. Sample below:

Backup Sizing

All Nutanix workloads now support backups. This does the following:

  • For any workload, you can define a backup policy in terms of the number of full and incremental backups.
  • When invoked, Sizer computes the backup storage that is needed and puts it in a standalone cluster. Only backup workloads can be included in the backup standalone cluster(s).
  • Sizer also allocates cores, RAM, and storage for third-party backup software in the backup cluster.
  • In the future, you will be able to specify the backup hardware to be used in the backup cluster(s).
  • Alternatively, we offer Files Pro and Buckets Pro standalone clusters as targets.

The inputs are as follows:

  • Recovery Point Objective – the maximum time between backups (incremental or full). This represents the point in time to which you can recover data.
    • For example, suppose you want to recover some information with a 24-hour RPO. The last backup will have occurred at most 24 hours ago.
  • Backup cycles – the number of cycles you want retained.
  • Full backups in a cycle – typically 1 but can be more. Here all the data in the workload is backed up.
  • Incremental backups in a cycle – typically several; the amount of data is the percent change * the workload data.
  • Retention in Days – Backup cycles * (Full backups per cycle + Incremental backups per cycle).
  • Rate of change – percent change expected between incremental backups.
  • Backup Target – options for holding the data, such as Files Pro.
  • Standalone Cluster – name of the cluster that will hold the backups.
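
As a rough illustration of how these inputs combine (hypothetical helper; it assumes a 24-hour RPO for the retention figure and ignores RF and storage savings on the backup cluster):

```python
def backup_sizing(data_tib, cycles, fulls_per_cycle, incrs_per_cycle, rate_change):
    """Approximate backup storage and retention from the inputs above.

    Each full backup stores the entire workload data; each incremental
    stores rate_change * data (e.g. 0.05 for a 5% rate of change).
    """
    per_cycle_tib = fulls_per_cycle * data_tib + incrs_per_cycle * rate_change * data_tib
    retention_days = cycles * (fulls_per_cycle + incrs_per_cycle)  # with a 24h RPO
    return cycles * per_cycle_tib, retention_days

# 10 TiB workload, 4 cycles of 1 full + 6 incrementals, 5% rate of change
print(backup_sizing(10, 4, 1, 6, 0.05))  # (52.0 TiB, 28 days)
```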

ROBO VM Solution Sizing

Robo VM Solution

The idea of the Robo VM Solution is to combine the sizing of Robo Models with Decoupled Quoting (separate license and hardware).

So in the end you pay for the Robo VM licenses and the ROBO hardware, but NOT the AOS cores or SSD TiB capacity.

Here I defined a couple of workloads with a total VM count of 100. You can have as many as you want.

Then, in the sizing panel, I selected Robo Models.

The resulting budgetary quote shows you pay for the Robo VM licenses and the decoupled hardware.

OK, there are some limits.

No user VM can be more than:

  • 32 GB RAM (** this will be enforced in AOS)
  • 2 TiB total HDD and SSD storage per VM

And no standalone cluster can exceed:

  • 50 TiB total HDD and SSD storage

You can have multiple workloads assigned to a cluster. The cluster limit, though, is 50 TiB.

You can have multiple standalone clusters, which can represent different sites. So you could have two clusters of 40 TiB each, and that is fine.

  • No limit on cores

If any user VM exceeds those constraints, Sizer presents the following error message:

“This exceeds the Robo VM limits and so please select Data Center Models”
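
Expressed as a quick validation sketch (hypothetical helper; the limits and error text are taken from the rules above):

```python
ROBO_VM_LIMITS = {"vm_ram_gb": 32, "vm_storage_tib": 2, "cluster_storage_tib": 50}

def check_robo_vm(ram_gb, vm_storage_tib, cluster_total_tib):
    """Validate a user VM and its cluster against the ROBO limits (no core limit)."""
    if (ram_gb > ROBO_VM_LIMITS["vm_ram_gb"]
            or vm_storage_tib > ROBO_VM_LIMITS["vm_storage_tib"]
            or cluster_total_tib > ROBO_VM_LIMITS["cluster_storage_tib"]):
        raise ValueError(
            "This exceeds the Robo VM limits and so please select Data Center Models")

check_robo_vm(ram_gb=32, vm_storage_tib=2, cluster_total_tib=40)  # passes
```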

 

N+0, N+1, N+2 Failover Indicator

This is a BIG sizing improvement: Sizer will always tell you whether you are at the N+0, N+1, or N+2 failover level for every resource (CPU, RAM, HDD, SSD) in each cluster.

Now, as you make changes in automatic or manual sizing, you always know whether you have adequate failover. Best practice is N+1, so you can take down any one node (e.g., take one node offline for an upgrade) and customer workloads can still run.

This can be very hard to figure out on your own. ECX savings, for example, vary by node count. Heterogeneous clusters mean you have to find the largest node for each resource. Multiple clusters mean you have to look at each separately. Sizer does this for you!
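
A sketch of the underlying check (hypothetical helper and figures; Sizer's real calculation also folds in ECX savings and CVM overhead): for each resource, remove the largest nodes first and see how many can fail before the remaining capacity no longer covers the demand.

```python
def failover_level(node_caps, demand):
    """Max number of nodes that can fail (removed largest-first, the worst
    case in a heterogeneous cluster) with capacity still covering demand."""
    nodes = sorted(node_caps, reverse=True)
    k = 0
    while k < len(nodes) - 1 and sum(nodes[k + 1:]) >= demand:
        k += 1
    return k

# per-resource demand vs per-node capacity; the cluster indicator is the
# worst resource, and All Clusters shows the worst cluster
cluster = {"CPU": ([44, 44, 44], 80), "RAM": ([512, 512, 512], 900),
           "HDD": ([60, 60, 60], 100), "SSD": ([8, 8, 8], 12)}
levels = {res: failover_level(caps, need) for res, (caps, need) in cluster.items()}
print(f"N+{min(levels.values())}", levels)  # N+1 for every resource here
```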

Here is what you need to know. 

Let’s take a two-cluster scenario. One cluster, called Cluster-1, is a Nutanix cluster running 900 VDI users and the Files deployment to support those users. The other is a standalone cluster for Files Pro with 100 TB of user data.

All clusters:

In a multi-cluster scenario, All Clusters just provides a summary. Here it shows the two clusters and the hardware for each. The N+1 indicator on the lower left shows the worst cluster: both are N+1, so you see N+1. Had either cluster been N+0, then N+0 would be shown. It is a great indicator that there is an issue with one of the clusters.

File cluster

This is the standalone cluster for Files. You see the hardware used in the cluster and the failover level for each resource (CPU, RAM, HDD, SSD). N+2 indicates you could possibly have less of that resource, though product options often force more anyway. This is a cold-storage-intensive workload, so HDD is the worst case.

Cluster-1

This is the Nutanix cluster for the VDI users. You see the hardware used in the cluster and the failover level for each resource (CPU, RAM, HDD, SSD). This is a core-intensive workload, so CPU is the worst case.

Usable Capacity

Usable Remaining Capacity is the amount of storage that is available to the customer AFTER workloads, RF overhead, and storage savings are applied. It represents what they should have remaining once deployed.

Sizer presents the values in both RF2 and RF3.

Usable Remaining Capacity (Assuming RF2)

  • HDD Usable Remaining Capacity = (Raw + Compression Savings + Dedupe Savings + ECX Savings – Workload – RF Overhead – CVM Overhead) / 2
  • SSD Usable Remaining Capacity = (Raw + Compression Savings + Dedupe Savings + ECX Savings – Workload – RF Overhead – CVM Overhead + Oplog) / 2
  • Notes:
    • Usable capacity is basically RAW, plus the storage savings from data reduction techniques like compression, less workload, RF overhead, and CVM overhead.
    • If All Flash, the Compression Savings, Dedupe Savings, ECX Savings, RF Overhead, and CVM overhead that would be attributed to HDDs are applied to SSDs.
    • For SSD capacity, Oplog is included as part of CVM overhead but is also added back, as it is a write log and so is available for user data.
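
The two formulas as a small sketch (hypothetical helper; all values in TiB, with savings expressed as capacity gained back):

```python
def usable_remaining_rf2(raw, compression, dedupe, ecx, workload,
                         rf_overhead, cvm_overhead, oplog=0.0):
    """Usable Remaining Capacity at RF2, per the formulas above.

    Pass oplog > 0 only for the SSD tier: Oplog sits inside CVM overhead
    but, being a write log, is added back as available for user data.
    """
    return (raw + compression + dedupe + ecx
            - workload - rf_overhead - cvm_overhead + oplog) / 2
```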

Extent Store and Effective Capacity

Extent Store

This is a concept used in the Nutanix Bible. It is RAW capacity less CVM overhead. It represents the capacity that is available to a customer.

Effective Capacity

Used in the Storage Calculator or DesignBrewz. This is the Extent Store * the Storage Efficiency setting in the Storage Calculator. So if the Extent Store is 10 TiB and the Storage Efficiency factor is set to 1.5:1, then the Effective Capacity is 15 TiB. The Storage Efficiency factor is the expected benefit of storage reduction approaches like compression, dedupe, and ECX. Effective Capacity, then, is what is hoped to be available with these reduction techniques.
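
The arithmetic is simple enough to show directly (hypothetical helper reproducing the 10 TiB example):

```python
def effective_capacity(raw_tib, cvm_tib, efficiency=1.5):
    """Extent Store = RAW - CVM overhead; Effective = Extent Store * efficiency."""
    extent_store = raw_tib - cvm_tib
    return extent_store * efficiency

print(effective_capacity(raw_tib=12, cvm_tib=2))  # 10 TiB Extent Store -> 15.0 TiB
```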

Cores (Actual Cores, Adjusted Weight, Memory Adjustments like Unbalanced DIMMs)

In Sizing Details you may see an odd number, like 40.27 cores for RAW cores, as shown below.

Actual Core Capacity

This is the total number of cores in the recommendation.

By clicking on the tooltip by the node, you get this information.

So in this recommendation we have 3 nodes, where each node has 2 CPUs and each CPU has 8 cores. So the Actual Core Capacity is 3 nodes * 2 CPUs/node * 8 cores/CPU = 48 cores.

Applied Weight

Intel designs a wide range of CPUs to meet different market needs. Core count certainly varies, but the speed of a core is not the same across all CPUs.

We need a benchmark to adjust for the core speed differences. We use SPECint 2006. It is the best benchmark in terms of being an industry standard: vendors who publish numbers have to use a standard testing process and publicly publish the results, and we see consistency for a given CPU across all the vendors. Thus it is a good benchmark for us to use to adjust for the differences.

So Applied Weight is where we have adjusted the cores to the baseline processor, which runs at 42.31 SPECints.

Review the Processor Table page for the core counts, SPECints, and adjusted cores.

Using this example, we have a recommendation of 3 nodes, each with two 2620 v4 processors. The table (the calculation is shown on that page too) shows the 2620 v4 adjusted core count is 14.91 cores for nodes with 2 CPUs.

Thus, in this recommendation, total effective cores = 14.91 cores/node * 3 nodes = 44.73 cores. We take an Applied Weight adjustment of -3.26.
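
In code, the adjustment looks like this (hypothetical helper; the -3.26 in the text comes from unrounded per-core figures, so the rounded inputs here give -3.27):

```python
def applied_weight(nodes, cores_per_node, adjusted_cores_per_node):
    """Applied Weight = baseline-equivalent (adjusted) cores minus actual cores."""
    actual = nodes * cores_per_node
    effective = nodes * adjusted_cores_per_node
    return actual, effective, round(effective - actual, 2)

# 3 nodes, two 2620 v4 per node: 16 actual cores but 14.91 baseline-equivalent
print(applied_weight(3, 16, 14.91))  # (48, 44.73, -3.27)
```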

 

Memory Adjustments

Broadwell Processors

With Broadwell processors, an “unbalanced DIMM” configuration depends on how the DIMMs are laid out on the motherboard. When it occurs, there is a 10% increase in access latency.

To determine whether to take a discount, Sizer takes the total count of DIMMs in a node and divides it by 4. If the result is odd, the configuration is unbalanced and Sizer applies the discount. If even, no reduction is needed.

Example:

  • 12x32GB in a node: 12 DIMMs / 4 = 3 (odd), so unbalanced
  • 8x32GB in a node: 8 DIMMs / 4 = 2 (even), so balanced

If unbalanced, core capacity is reduced:

– Actual Core Capacity = Cores/Node * Node count
– Applied Weight = extra or fewer cores vs the baseline
– Adjustment due to Memory Issues = -10% * (RAW Cores + Applied Weight)

Note that if it is a single-processor system, NO adjustment is needed.
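
A minimal sketch of the rule, using the example figures above (hypothetical helper; DIMM counts are assumed divisible by 4, as in the examples):

```python
def broadwell_memory_adjustment(total_dimms, sockets, raw_cores, applied_weight):
    """Broadwell rule: total DIMMs / 4 odd -> unbalanced -> -10% of
    (RAW cores + Applied Weight). Single-processor systems are exempt."""
    if sockets == 1:
        return 0.0
    unbalanced = (total_dimms // 4) % 2 == 1
    return -0.10 * (raw_cores + applied_weight) if unbalanced else 0.0

# 12 DIMMs: 12/4 = 3 (odd) -> unbalanced; 48 RAW cores with a -3.27 weight
print(broadwell_memory_adjustment(12, 2, 48, -3.27))  # about -4.47 cores
```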

Skylake Processors

Skylake processors are more complex than Broadwell in terms of whether a system has unbalanced DIMMs.

We now test for the following:

  • CPU – Skylake
  • Model – Balanced_Motherboard – true (described below)
  • Memory bandwidth – go with the slower figure of either the memory or the CPU. If 2133 MHz, we take a -10% memory adjustment. If 2400 MHz or 2666 MHz (most common with Skylake models), we take a 0% adjustment.

Like before, we find the DIMM count per socket. There are typically 2 sockets (CPUs), but there can be 1, and 4-socket models are starting to be introduced.

Using the quantity of DIMMs per socket, we apply the following rules.

If the CPU is Skylake:

  • If the DIMM count per socket is 5, 7, 9, 10, or 11, the model is considered unbalanced and we take a -50% memory adjustment
  • If the DIMM count per socket is 2, 3, 4, or 12, it is balanced and the memory adjustment = 0%
  • If the model is balanced and the DIMM count per socket is 6 or 8, it is balanced and the memory adjustment = 0%
  • If the model is unbalanced and the DIMM count per socket is 6 or 8, it is unbalanced and the memory adjustment = -50%

After determining the adjustment percent, we make the adjustment as we do currently:

  • Actual core capacity = total cores in the cluster
  • Applied weight = adjustment vs the baseline SPECint
  • Adjustment = Adjustment Percent * (Actual core capacity – Applied weight)
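
The DIMM rules can be condensed into a small sketch (hypothetical helper; how the DIMM test and the bandwidth test combine when both apply is not spelled out above, so the worse adjustment is assumed here):

```python
def skylake_memory_adjustment(dimms_per_socket, balanced_motherboard, memory_mhz):
    """Skylake memory adjustment fraction, per the rules above."""
    if dimms_per_socket in (5, 7, 9, 10, 11):
        dimm_adj = -0.50                      # always unbalanced
    elif dimms_per_socket in (2, 3, 4, 12):
        dimm_adj = 0.0                        # always balanced
    elif dimms_per_socket in (6, 8):
        dimm_adj = 0.0 if balanced_motherboard else -0.50
    else:
        dimm_adj = 0.0                        # counts not covered by the rules
    bandwidth_adj = -0.10 if memory_mhz == 2133 else 0.0  # 2400/2666 MHz -> 0%
    return min(dimm_adj, bandwidth_adj)

print(skylake_memory_adjustment(6, True, 2666))   # 0.0  (balanced)
print(skylake_memory_adjustment(5, True, 2666))   # -0.5 (unbalanced)
```
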
With Skylake, it can matter how the DIMMs are arranged on the motherboard. We have PM review that, and so far all models are laid out in a balanced fashion. Here is a doc that shows the options:

Processor Table

Here is the table of processors. For the SPEC 2017 Integer Ratings, refer to this Google Sheet.

The data is based on the results published at:

https://www.spec.org/cgi-bin/osgresults

SPECint Adjusted Cores is simply the core count adjusted against a baseline CPU (Intel E5 2680 v2) at 4.44 SPEC 2017 Integer Ratings per core.

For example, the 2620 v4 has 16 cores but only 4.14 SPEC 2017 Integer Ratings per core:

  • SPEC 2017 Integer Rating adjusted cores = 16 * SPEC 2017 Integer Rating per core / Baseline = 16 * 4.14/4.44 = 14.91
  • Basically, this is saying the 2620 v4 has 16 cores, but it is equivalent to 14.91 baseline cores in 2-CPU nodes
  • For a single CPU, it would be just 14.91/2 = 7.455

Looking at a high-speed CPU, the 6128 has just 12 cores but screams at 6.89 SPEC 2017 Integer Ratings per core:

  • SPECint Adjusted Cores = 12 * SPECint per core / baseline = 12 * 6.89/4.44 = 18.62
  • Basically, this is saying the 6128 has 12 cores, but it is equivalent to 18.62 baseline cores

For the complete list of CPUs supported by Sizer and their SPEC 2017 Integer Ratings, refer to this Google Sheet.


CVM (Cores, Memory, HDD, SSD)

CVM Cores & Memory Overheads

The CVM (Controller VM) CPU core and memory requirements in Nutanix environments are determined by a couple of key parameters:

Total Capacity

  • The overall storage capacity of the node influences the number of CPU cores and the amount of memory allocated to the CVM.
  • Higher capacity nodes require more resources to manage storage operations efficiently.

Recovery Point Objective (RPO)

  • The RPO, which defines the acceptable amount of data loss in case of a failure, impacts the CVM resource allocation.
  • Lower RPOs require more CPU cores and memory to ensure data replication is quick and accurate.

For the actual resource requirements, refer to HCI Node Capacity & CVM Resource Requirements.

CVM Storage Overheads

Key Parameters for CVM Storage Overhead:

  • Nutanix Boot + Home: 280 GB across 2 drives, always on Tier-1
  • OpLog: 600 GB, Tier-1
  • Hades Reservation: 100 GB per disk, irrespective of disk type
  • Cassandra: 3% of all drives, always on Tier-1 storage
  • Curator: 2% of all drives, always on Tier-2 storage
  • Ext4 Formatting: varies by storage type (HDD, SSD, NVMe)

CVM Storage Overhead Summary:

| Component           | Overhead                                        | Tier                       |
|---------------------|-------------------------------------------------|----------------------------|
| Nutanix Boot + Home | 280 GB across 2 drives                          | Always Tier-1              |
| OpLog               | 600 GB                                          | Always Tier-1              |
| Hades Reservation   | 100 GB per disk                                 | Irrespective of disk type  |
| Cassandra           | 3% of all drives                                | Always Tier-1              |
| Curator             | 2% of all drives                                | Always Tier-2              |
| Ext4 Formatting     | HDD: 1.78% of drive size; SSD & NVMe: 2.6%      | Per drive, by storage type |
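
Taken together, and purely as an illustration (hypothetical helper; Sizer's exact rounding and tier placement may differ), the per-node storage overhead can be sketched as:

```python
def cvm_storage_overhead_gb(hdd_sizes_gb, ssd_sizes_gb):
    """Per-node CVM storage overhead from the table above.

    Boot+Home and OpLog are fixed amounts; Hades, Cassandra, Curator, and
    ext4 formatting scale with the drives in the node.
    """
    all_drives = hdd_sizes_gb + ssd_sizes_gb
    overhead = 280 + 600                       # Boot + Home, OpLog (both Tier-1)
    overhead += 100 * len(all_drives)          # Hades: 100 GB per disk
    overhead += 0.03 * sum(all_drives)         # Cassandra: 3% of all drives (Tier-1)
    overhead += 0.02 * sum(all_drives)         # Curator: 2% of all drives (Tier-2)
    overhead += 0.0178 * sum(hdd_sizes_gb)     # ext4 on HDD: 1.78%
    overhead += 0.026 * sum(ssd_sizes_gb)      # ext4 on SSD/NVMe: 2.6%
    return overhead

# e.g. a node with 4x 8 TB HDDs and 2x 1.92 TB SSDs
print(cvm_storage_overhead_gb([8000] * 4, [1920] * 2))
```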

The overheads can be seen in Sizer by hovering over the info icon as seen below:


Getting 1 node or 2 node clusters

New rules in Sizer for Regular Models and ROBO Models (October 2018)

Regular Models

Rules

  • All models are included
  • All use cases are allowed – main cluster application, remote cluster application, and remote snapshots
  • 3+ nodes are recommended

Summary

  • This is the default in Sizer and is used most of the time
  • Fits the best practice for a data center of having 3 or more nodes
  • Huge benefit, as a Sizer user can stay in this mode to size for the 1175S or other vendors’ small models if they want 3+ nodes anyhow; no need to go to Robo mode
  • Note: this removes a previous Sizer user headache, where they wanted to size these models for 3+ nodes and got confused about where to go

What changes

  • Smaller nodes such as the 1175S are included in the list for running main cluster applications, versus just remote applications and remote snapshots

ROBO Models

Rules

  • All models are included, but only some can be sized for 1 or 2 nodes
  • All use cases – main cluster application, remote cluster application, and remote snapshots
  • All models can be sized for 3+ nodes, depending on sizing requirements
  • ONLY certain models (aka ROBO models) can be 1 or 2 nodes
    • Note there is no CPU restriction. Basically, PM decides which models are ROBO, and they can have 1 or 2 CPUs

Summary

  • A user would ONLY need to go to ROBO if they feel the solution fits in 1 or 2 nodes
    • If the size of the workloads requires 3+ nodes, Sizer simply reports the required nodes; the recommendation would be no different than in Regular
    • The 1 or 2 node restrictions must be acceptable:
      • The list of ROBO models is fine for the customer
      • RF for 1 node is at the disk level, not the node level
      • Some workloads, like AFS, require 3 nodes and so are not available

What changes

  • All models can be used in ROBO, where before it was just the ROBO models

No quoting in Sizer for ROBO

Currently there is a minimum number of units or deal size when quoting ROBO. Sizer will size the opportunity and tell you that you should quote X units. Given that it takes 10 or more units, and that you may want to club together multiple projects, we disabled quoting from Sizer when the configuration includes the 1175S.