ECX savings

What is ECX (Erasure Coding) in Nutanix?

The Nutanix platform uses a replication factor (RF) for data protection and availability. This method provides the highest degree of availability because, on failure, it does not require reading from more than one storage location or recomputing data. However, it comes at the cost of storage capacity, because full copies of the data are required.

To balance availability against the amount of storage required, DSF (the Distributed Storage Fabric) can encode data using erasure codes (EC). Similar to RAID (levels 4, 5, 6, etc.), where parity is calculated, EC encodes a strip of data blocks stored on different nodes and calculates parity for it. If a host and/or disk fails, the parity can be used to calculate any missing data blocks (decoding).
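To make the parity idea concrete, here is a minimal Python sketch of the simplest case: a single-parity strip that uses byte-wise XOR, as RAID 5 does. This is an illustration only. The source does not describe the encoding Nutanix actually uses, and the helper names `xor_blocks`, `encode`, and `decode` are hypothetical. XOR parity can rebuild exactly one missing block per strip; tolerating two failures (for example a 4/2 strip) requires a stronger code such as Reed-Solomon.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    # zip(*blocks) walks the blocks column by column; each column is a tuple of ints.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> bytes:
    """Compute the single parity block for a strip of data blocks."""
    return xor_blocks(data_blocks)


def decode(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data block from the surviving blocks plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# A 4/1 strip: four data blocks (each on a different node) plus one parity block.
strip = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = encode(strip)

# Simulate losing the node that held the third data block.
lost_index = 2
survivors = strip[:lost_index] + strip[lost_index + 1:]
rebuilt = decode(survivors, parity)
print(rebuilt)  # b'CCCC'
```

Because XOR is its own inverse, XOR-ing the surviving blocks with the parity cancels them out and leaves only the missing block.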

The number of data and parity blocks in a strip is configurable based on the desired number of failures to tolerate. The configuration is commonly written as <number of data blocks>/<number of parity blocks>.

How are ECX savings calculated in Sizer?

Sizer follows the Nutanix Bible and its guidelines for ECX savings.

The table below shows the ECX overhead compared with RF2/RF3 for different cluster sizes (number of nodes):

The expected overhead can be calculated as <# parity blocks> / <# data blocks>. For example, a 4/1 strip has a 25% overhead (1.25X) compared to the 2X of RF2, and a 4/2 strip has a 50% overhead (1.5X) compared to the 3X of RF3.
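The overhead formula can be checked with a short sketch. It assumes a strip of `d` data blocks and `p` parity blocks, so parity adds p/d on top of the data and the total footprint is (d + p)/d. The function names are hypothetical; `Fraction` keeps the ratios exact.

```python
from fractions import Fraction


def ec_overhead(data_blocks: int, parity_blocks: int) -> Fraction:
    """Extra capacity as a fraction of the data: parity / data."""
    return Fraction(parity_blocks, data_blocks)


def ec_footprint(data_blocks: int, parity_blocks: int) -> Fraction:
    """Total capacity consumed per unit of data: (data + parity) / data."""
    return 1 + ec_overhead(data_blocks, parity_blocks)


# The two examples from the text: 4/1 vs RF2 and 4/2 vs RF3.
for d, p, rf in [(4, 1, 2), (4, 2, 3)]:
    print(f"{d}/{p}: overhead {float(ec_overhead(d, p)):.0%}, "
          f"footprint {float(ec_footprint(d, p)):.2f}X vs {rf}X for RF{rf}")
# 4/1: overhead 25%, footprint 1.25X vs 2X for RF2
# 4/2: overhead 50%, footprint 1.50X vs 3X for RF3
```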

 

How does Sizer calculate ECX savings from the above?

Let's take an example where the cold data for the workload is 100 TiB.

We will also assume RF2 is the replication setting chosen for the workload.

Depending on the size of the workload, suppose the total recommended node count comes to 4 nodes. As per the above table, the data/parity strip for 4 nodes is 2/1. ECX therefore has a 1.5X overhead versus 2X for RF2, a saving of 0.5X, i.e., 50% of the size of the data it is applied to.
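The 50% figure in this step is relative to the logical data size: RF2 stores 2X, a 2/1 strip stores 1.5X, and the difference is 0.5X. The hypothetical helper below sketches that comparison. It also shows that the same saving is 25% when measured against the RF2 raw footprint, which is worth keeping in mind when comparing percentages from different sources.

```python
from fractions import Fraction


def ecx_savings_ratio(rf: int, data_blocks: int, parity_blocks: int) -> Fraction:
    """Capacity saved per unit of logical data when EC replaces RF.

    RF stores `rf` full copies (rf X); an EC strip stores
    (data + parity) / data X. The difference is the saving,
    expressed as a fraction of the logical data size.
    """
    ec_footprint = Fraction(data_blocks + parity_blocks, data_blocks)
    return rf - ec_footprint


# 4-node cluster with RF2: the strip is 2/1.
saving = ecx_savings_ratio(rf=2, data_blocks=2, parity_blocks=1)
print(saving)      # 1/2 -> 50% of the logical data size
print(saving / 2)  # 1/4 -> 25% of the RF2 raw footprint (2X)
```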

To be conservative, Sizer applies ECX to only 90% of the cold data.

Data that ECX is applied to: 90% of 100 TiB = 90 TiB

ECX savings: 50% of 90 TiB = 45 TiB
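Putting the steps together, the whole calculation can be sketched as one function. This is an illustration under stated assumptions, not Sizer's actual code: the function name is hypothetical, `ecx_fraction=0.9` encodes the conservative 90% rule described above, and the strip size (2/1 for a 4-node cluster) has to be looked up from the table for the recommended node count.

```python
def ecx_savings_tib(cold_data_tib: float, rf: int, data_blocks: int,
                    parity_blocks: int, ecx_fraction: float = 0.9) -> float:
    """Raw capacity (TiB) saved by applying ECX to part of the cold data.

    ecx_fraction models the conservative rule of applying ECX to only
    90% of the cold data; the remaining data is assumed to stay at full RF,
    so it contributes no saving.
    """
    ecx_data = cold_data_tib * ecx_fraction               # 90% of 100 TiB
    ec_footprint = (data_blocks + parity_blocks) / data_blocks  # 1.5X for 2/1
    return ecx_data * (rf - ec_footprint)                 # saving per TiB is rf - footprint


# 100 TiB of cold data, RF2, 4-node cluster (2/1 strip)
saved = ecx_savings_tib(100, rf=2, data_blocks=2, parity_blocks=1)
print(f"{saved:.1f} TiB saved")  # 45.0 TiB saved
```

As a cross-check, 90 TiB occupies 180 TiB of raw capacity at RF2 but 135 TiB as 2/1 ECX, a difference of 45 TiB.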