Compression Sizing

Compression Settings

  • Each workload has the following compression settings:
    • Disable compression for pre-compressed data.
      • This turns off compression in Sizer. It is a good idea if the customer has mostly pre-compressed data for that workload. Though it may be tempting to turn off compression all the time to be conservative, it is hard to build large All Flash solutions economically without any compression, and it is unrealistic to assume that no data compression is possible. Use this setting sparingly.
    • Enable Compression
      • This is always ON for All Flash, because post-process compression is turned ON for All Flash as it comes out of the factory.
      • By default it is ON for Hybrid, but the user can turn it OFF.
    • Container Compression
      • There is a slider that can go from 1:1 (0% savings) to 2:1 (50% savings).
      • The range will vary by workload. We review Pulse data on various workloads, and savings are typically 30% to 50%. For Splunk, the maximum is 15%, as the application does a fair amount of pre-compression before the data is stored in Acropolis.
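
The slider endpoints above (1:1 = 0% savings, 2:1 = 50% savings) follow the usual relation between a compression ratio and a savings fraction. The following is a minimal sketch of that conversion in Python; the function names are hypothetical and are not part of Sizer.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of capacity saved for a compression ratio of `ratio`:1.

    Hypothetical helper for illustration; not a Sizer API.
    """
    if ratio < 1.0:
        raise ValueError("compression ratio must be at least 1:1")
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings: float) -> float:
    """Inverse mapping: the N in N:1 for a given savings fraction."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


print(savings_from_ratio(1.0))             # 0.0 (1:1 slider position)
print(savings_from_ratio(2.0))             # 0.5 (2:1 slider position)
print(round(ratio_from_savings(0.30), 2))  # 1.43 (30% savings is about 1.43:1)
```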

What Sizer will do if Compression is turned ON

  • Sizer sizes for post-process compression. The compression algorithm in Acropolis is LZ4, which runs about every 6 hours. Occasionally LZ4-HC goes through cold-tier data that is over a day old and can compress it further.
  • First, the workload HDD and SSD requirements are computed without compression. This includes the workload and the RF overhead.
  • Compression is then applied.
  • Example: the workload requires 4.39 TiB (be it SSD or HDD), RF3 is used for Replication Factor, and Compression is set to 30%
    • Workload Total in Sizing Details = 4.39 TiB
    • RF Overhead in Sizing Details = 4.39 * 2 = 8.79 TiB (with RF3 there are 2 extra copies, while with RF2 there is just one extra copy)
    • Compression Savings in Sizing Details = 30% * (Workload + RF Overhead) = 30% * (4.39 + 8.79) = 3.96 TiB
      • The displayed figures are rounded, which is presumably why hand arithmetic on the rounded inputs can differ in the last digit
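
The order of operations above (RF overhead first, then compression on the total) can be sketched in Python as follows. The function and key names are hypothetical and only mirror the Sizing Details labels; this is not Sizer's implementation. Because the sketch starts from the rounded 4.39 TiB input, its results can differ in the last digit from the figures quoted above (8.79 and 3.96), which presumably come from unrounded values.

```python
def compression_savings(workload_tib: float, rf: int, compression: float) -> dict:
    """Apply RF overhead first, then compression, as in the steps above.

    Hypothetical helper for illustration; not a Sizer API.
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    if not 0.0 <= compression < 1.0:
        raise ValueError("compression must be a fraction in [0, 1)")

    rf_overhead = workload_tib * (rf - 1)  # RF3 -> 2 extra copies, RF2 -> 1
    total_before = workload_tib + rf_overhead
    savings = compression * total_before
    return {
        "workload": workload_tib,
        "rf_overhead": rf_overhead,
        "compression_savings": savings,
        "net_capacity": total_before - savings,
    }


result = compression_savings(4.39, rf=3, compression=0.30)
for name, value in result.items():
    print(f"{name}: {value:.2f} TiB")
```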

Deduplication

  • Deduplication does not affect the compression sizing

Local Snapshots

  • First, the local snapshots are computed using what the user enters for the daily change rate and the number of snapshots retained (hourly, daily, weekly)
  • RF is applied to the local snapshots as extra copies need to be made.
  • Compression is applied
  • Example
    •  Workload requires 4.39 TiB HDD, RF3 is used for Replication Factor, and Compression is set to 30%
    • Daily change rate = 1% with 24 hourly snapshots, 7 daily snapshots, 4 weekly snapshots
    • Local Snapshot Overhead in Sizing Details = 1.76 TiB (explained in a separate section)
    • Snapshots RF Overhead in Sizing Details = 2 * 1.76 TiB = 3.52 TiB (with RF3 there are 2 extra copies, while with RF2 there is just one extra copy)
    • Compression Savings in Sizing Details = 30% * (Workload + RF Overhead + Local Snapshot Overhead + Snapshots RF Overhead) = 30% * (4.39 + 8.79 + 1.76 + 3.52) = 30% * 18.46 = 5.54 TiB
      • Though there are a lot of numbers, the point is that compression is applied to all the cold user data (not the CVM)
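
As a check on the arithmetic in this example, the sketch below sums the four line items as displayed in Sizing Details and applies the 30% compression setting. The variable names are illustrative only, and the snapshot overhead itself is taken as given rather than derived from the change rate.

```python
COMPRESSION = 0.30  # compression setting from the example

# Line items as displayed in Sizing Details for the example above (TiB).
line_items = {
    "Workload": 4.39,
    "RF Overhead": 8.79,
    "Local Snapshot Overhead": 1.76,
    "Snapshots RF Overhead": 3.52,
}

# Compression is applied to the sum of all cold user data line items.
compressible_total = sum(line_items.values())
local_savings = COMPRESSION * compressible_total

print(f"Compressible total:  {compressible_total:.2f} TiB")  # 18.46 TiB
print(f"Compression savings: {local_savings:.2f} TiB")       # 5.54 TiB
```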

Remote Snapshots

  • This uses the same example as in local snapshots, but adds remote snapshots stored on a different cluster
  • Remote Snapshot Overhead in Sizing Details = 6.64 TiB (note this is just for the remote cluster; it is also explained in a separate section)
  • Snapshots RF Overhead in Sizing Details = 13.28 TiB (note this is just for the remote cluster, and remember it is RF3)
  • Compression Savings in Sizing Details = 30% * (6.64 + 13.28) = 5.98 TiB
    • Though there are a lot of numbers, the point is that compression is applied to all the cold user data (not the CVM)
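
The remote cluster figures can be reproduced the same way. This short sketch assumes, as the example states, that the remote cluster also uses RF3 and that only the remote snapshot data lives there; the variable names are illustrative.

```python
COMPRESSION = 0.30   # same compression setting as the local example
EXTRA_COPIES_RF3 = 2  # RF3 keeps 2 extra copies of the data

# Remote snapshot overhead from the example above (TiB), taken as given.
remote_snapshot_overhead = 6.64
remote_rf_overhead = EXTRA_COPIES_RF3 * remote_snapshot_overhead

# Savings are computed for the remote cluster only.
remote_savings = COMPRESSION * (remote_snapshot_overhead + remote_rf_overhead)

print(f"Remote snapshots RF overhead: {remote_rf_overhead:.2f} TiB")  # 13.28 TiB
print(f"Remote compression savings:   {remote_savings:.2f} TiB")      # 5.98 TiB
```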

Misc

  • If compression is ON, then only the Pro or Ultimate license applies in the financial assumptions and in the financial analysis section of the BOM
