Sizing Recommendations for Objects 

General Information on Objects 

Understanding how Nutanix Objects works is useful context for any sizing. To read about the architecture check out the Objects Tech Note: https://portal.nutanix.com/page/documents/solutions/details?targetId=TN-2106-Nutanix-Objects:TN-2106-Nutanix-Objects   

To understand the current maximums visit: https://portal.nutanix.com/page/documents/configuration-maximum/list?software=Nutanix%20Objects 

Nutanix Objects falls under Nutanix Unified Storage (NUS) licensing. For an overview of NUS licensing visit: https://www.nutanix.com/products/cloud-platform/software-options#nus  

Performance vs. Capacity Workloads 

In the past, object storage solutions were concerned almost entirely with capacity; performance was barely a consideration. However, modern workloads such as AI/ML and data analytics engines leverage S3-compatible storage, and these very often have significant performance demands. Nutanix Objects has been internally benchmarked with high-intensity workloads on both hybrid and all-flash systems (see https://portal.nutanix.com/page/documents/solutions/details?targetId=TN-2098-Nutanix-Objects-Performance-INTERNAL-ONLY:TN-2098-Nutanix-Objects-Performance-INTERNAL-ONLY for details), so we have a good understanding of Objects’ performance capabilities across a variety of workload profiles. Extrapolations can reliably be taken from these results to model performance scaling (Objects I/O performance scales linearly). The data gleaned from the benchmark testing is used by Sizer to determine the minimum number of Objects workers – and therefore nodes – needed to deliver a certain level of performance.

It should be noted that there are factors outside the object store (and therefore outside of Sizer’s purview) that may also be relevant to attaining a certain level of performance, such as network throughput and the number of client connections.

Perhaps more commonly, node count will be driven by capacity requirements. Even in these cases, however, the minimum Objects worker count needed for performance should still be noted, especially in mixed deployments (discussed further below).

Whether a sizing’s ultimate driving factor is capacity or performance, Sizer adds a ‘+1’ node to ensure the required capacity / performance remains available even in the event of a node failure. 
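As a rough illustration of how a performance-driven node count follows from linear scaling plus the ‘+1’ node, here is a minimal sketch. The per-worker throughput figure is a hypothetical placeholder, not an official benchmark number; Sizer derives the real value from the internal benchmark data for the chosen workload profile.

```python
import math

def objects_nodes_for_throughput(required_mb_s: float,
                                 per_worker_mb_s: float) -> int:
    """Estimate the node count for a throughput target, assuming one Objects
    worker per node and linear I/O scaling as described above.

    per_worker_mb_s is an illustrative placeholder; Sizer uses its own
    benchmark-derived figure for the selected workload profile.
    """
    workers = math.ceil(required_mb_s / per_worker_mb_s)
    return workers + 1  # the '+1' node so performance survives a node failure

# Example: a 2,000 MB/s target against a hypothetical 400 MB/s per worker
print(objects_nodes_for_throughput(2000, 400))  # -> 6 (5 workers + 1)
```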

Configurations 

While there is no difference in NUS licensing between dedicated deployments (where the AOS cluster is dedicated solely to NUS) and mixed deployments (where NUS co-exists on the AOS cluster alongside applications/user VMs), sizing considerations in each scenario vary to a degree. These are discussed below.  

More information about suitable hardware models can be found at: https://www.nutanix.com/products/hardware-platforms/specsheet?platformProvider=Nutanix&useCase=Files%20and%20Objects. The link points to Nutanix NX models, but you can easily change the hardware vendor as required. At the time of writing, HPE provides the nodes with the highest storage density. Ensure that Files and Objects is selected as the use case. 

For an Objects dedicated configuration 

Objects is supported on ALL models and ALL platforms. If you’re sizing a dedicated deployment of 50TiB or above, however, and hybrid nodes are preferred, we recommend the HPE DX4200, NX-8155, or equivalent for the best performance. Such models are ideal due to their high HDD spindle count, though any model will work fine as long as it matches the minimum configurations listed below (a minimal configuration-check sketch follows the list).

  • CPU: dual-socket 12-core CPU (minimum) for hybrid configs with 4 or more HDDs 
    • A dual-socket 10-core CPU is acceptable for hybrid configs with fewer than 4 HDDs 
  • Memory: 128GB per node (minimum) 
  • Disk: 
    • Avoid hybrid configurations that have only 2 HDDs per node. 
    • For hybrid configurations, systems with 10 HDDs (or more) are highly recommended. For cost and performance reasons use as many HDDs as possible (see the explanation in the section Why use 10+ HDDs in a dedicated hybrid config? below). On an NX-8155, for example, ideally 2*SSD + 10*HDD rather than 4*SSD + 8*HDD. 
    • If a system with 12 or more disk bays is not available, configure the system with the highest number of HDDs possible. 
    • For all-flash configurations, any node with 3 or more SSDs/NVMes is fine. 
  • Erasure Coding: Inline enabled (set by default during deployment) 
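The sketch below simply encodes the hybrid minimums from the list above as a quick sanity check for a proposed node spec. It is an illustration only, not a Sizer feature, and Sizer and the official documentation remain the source of truth.

```python
def check_dedicated_hybrid_node(sockets: int, cores_per_socket: int,
                                memory_gb: int, hdds: int) -> list:
    """Check a hybrid node against the dedicated-Objects minimums listed above.
    Illustrative sketch only -- always confirm against Sizer and the docs."""
    issues = []
    if hdds >= 4 and (sockets < 2 or cores_per_socket < 12):
        issues.append("4+ HDDs: dual-socket 12-core CPU minimum")
    if hdds < 4 and (sockets < 2 or cores_per_socket < 10):
        issues.append("<4 HDDs: dual-socket 10-core CPU minimum")
    if memory_gb < 128:
        issues.append("128GB memory per node minimum")
    if hdds == 2:
        issues.append("avoid hybrid nodes with only 2 HDDs")
    elif hdds < 10:
        issues.append("10+ HDDs highly recommended for cost and performance")
    return issues or ["OK"]

# Example: an NX-8155-style node with 2 SSDs + 10 HDDs
print(check_dedicated_hybrid_node(sockets=2, cores_per_socket=12,
                                  memory_gb=128, hdds=10))  # -> ['OK']
```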

For a mixed configuration (Objects co-exists with User VMs) 

Objects is supported on ALL models and ALL platforms, as long as the node matches the minimum configurations listed below.

  • CPU: at least 12 vCPUs available per node 
    • All node types with dual-socket CPUs are supported and preferred, though single-socket CPUs with at least 22 cores are also supported 
  • Memory: at least 36GB available per node 
  • Disk: avoid hybrid configurations with only 2 HDDs per node, and bear in mind that more HDD spindles mean better performance. 
  • Erasure Coding: Inline enabled (set by default during deployment) 

NUS licensing allows one user VM (UVM) per node. If taking advantage of this, ensure that there are enough CPU cores and memory on each node to cater for both an Objects worker and the UVM – and potentially also a Prism Central (PC) VM (unless PC is to be located on a different cluster). It’s important to understand that Nutanix Objects cannot be deployed without there being a Prism Central somewhere in the environment.  
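To make the per-node budgeting concrete, here is a minimal sketch that checks whether a node’s available resources (i.e. what remains after CVM and hypervisor overhead) can host an Objects worker at the 12 vCPU / 36GB minimums above plus the licensed UVM and, optionally, a PC VM. The UVM and PC VM figures in the example are hypothetical and should be replaced with the actual VM specs.

```python
OBJECTS_WORKER_VCPUS = 12   # per-node minimum from the list above
OBJECTS_WORKER_MEM_GB = 36  # per-node minimum from the list above

def mixed_node_fits(available_vcpus: int, available_mem_gb: int,
                    uvm_vcpus: int, uvm_mem_gb: int,
                    pc_vcpus: int = 0, pc_mem_gb: int = 0) -> bool:
    """Check whether a node's available resources can host an Objects worker,
    the licensed UVM and (optionally) a Prism Central VM. Illustrative only."""
    need_vcpus = OBJECTS_WORKER_VCPUS + uvm_vcpus + pc_vcpus
    need_mem_gb = OBJECTS_WORKER_MEM_GB + uvm_mem_gb + pc_mem_gb
    return available_vcpus >= need_vcpus and available_mem_gb >= need_mem_gb

# Example with hypothetical UVM and PC VM sizes
print(mixed_node_fits(available_vcpus=40, available_mem_gb=160,
                      uvm_vcpus=8, uvm_mem_gb=32,
                      pc_vcpus=10, pc_mem_gb=44))  # -> True
```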

Why use 10+ HDDs in a dedicated hybrid config? 

In the majority of today’s use cases, objects tend to be large (>1.5MiB), meaning they result in sequential I/O on the Nutanix cluster. In response to this, the Objects architecture is tuned to take advantage of the lowest tier. If there are any HDDs in a node, Objects will automatically use them for sequential data, while leveraging the SSDs purely for metadata.

There are three reasons for this:

  1. Excellent sequential I/O performance can be achieved with HDDs, assuming there are enough of them. 
  2. Objects deployments can be up to petabytes in size. At that sort of scale, cache or SSD hits are unlikely, so using SSDs in the hope of achieving accelerated performance through caching would provide little return on the additional cost. To keep the solution cost-effective, Objects minimizes SSD requirements by using SSDs for metadata, and only using them for data if required. 
  3. Since we recommend a dual-socket 10-core CPU configuration, fewer SSDs also helps to avoid system work that would otherwise be incurred by frequently moving data between tiers; the result is less stress on the lower CPU count. 

It should however be noted that if the workload is made up of mostly small objects, all-flash systems are significantly better at catering for the resulting random I/O, particularly if the workload is performance intensive. In an all-flash environment, even a partially populated one, both data and metadata will be placed on the SSDs/NVMes.

The key takeaways: in a hybrid configuration, which is the best fit for large-object workloads, the more HDD spindles there are, the better the performance. For small-object workloads, an all-flash configuration is generally the better way to meet performance demands.

  

Sizing Use Cases 

Use Case: Backup 

Below is a Backup workload in Objects Sizer. In this scenario Nutanix Objects is used as a target to store backups sent from backup clients (i.e. the backup app). This is in essence what we refer to as a Mine solution, except with Mine the backup app VM runs directly on the Objects cluster.  

Note that the source data (i.e. the data being backed up) will not be located on the same physical cluster as Nutanix Objects; Objects is used purely as the backup disk target/repository.

Considerations when sizing a backup workload 

  • Initial capacity – the estimated initial capacity that will be consumed by backups stored on Nutanix Objects. 
  • Capacity growth – % growth of the backup data per time unit (e.g. years) over an overall specified length of time. 
    • In the above example we estimate 5% growth for each year and want to make sure we size for 3 years of growth (a worked growth calculation follows this list). 
    • Be cautious and do not attempt to cater for too long a growth period, otherwise the amount of capacity required due to growth could dwarf the amount of storage required on day one. Specifying a (for example) 10-year growth period contravenes our fundamental pay-as-you-grow value, and of course growth predictions may not be entirely accurate in any case. 3 years is a typical growth period to size for. 
  • Do not enable deduplication on any Objects workloads. 
  • Profiles 
    • Write (PUT) traffic usually dominates these environments as backups occur more regularly than restores (GETs) are performed. Furthermore, when restores do occur they usually read only a small subset of the backup. 
    • Backups usually result in sequential I/O, so the requirement is expressed as MB/s throughput (with the exception of Veeam – discussed below). 
    • Backups usually consist of large objects (with the exception of Veeam – discussed further below). 
    • All values can be customized as required. 
  • Replication Factor 
    • When using nodes with large HDDs (12TB+) to achieve high storage density, you should consider RF3 once you get to around 100 HDDs in a single fault domain. This provides a higher level of resilience against disk failure, which matters because larger disks take longer to rebuild (there is more data to rebuild) and a larger number of disks increases the probability that one of them fails (law of probabilities). If you wish to mitigate this risk while sticking with RF2, consider proposing multiple Objects clusters in a single Objects federation. 
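For the growth calculation itself, a simple compound-growth estimate is usually sufficient. The sketch below reproduces the 5% per year over 3 years scenario; the 100TiB starting point is purely illustrative, and this is not necessarily the exact formula Sizer applies internally.

```python
def grown_capacity(initial_tib: float, growth_pct_per_year: float, years: int) -> float:
    """Compound the initial backup capacity by growth_pct_per_year over the
    sizing period. Illustrative only; Sizer performs its own calculation."""
    return initial_tib * (1 + growth_pct_per_year / 100) ** years

# A hypothetical 100TiB initial backup capacity growing 5% per year for 3 years
print(round(grown_capacity(100, 5, 3), 1))  # -> 115.8 (TiB to size for)
```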

An exception to the norm: Veeam 

Veeam is different from other backup apps in that it does not write out large objects. With Veeam the object size is 768KB, about a tenth of the size of objects generated by other backup apps. Therefore, for Veeam opportunities the Backup profile in Sizer should be adjusted from the default 8MB object size and the requirement expressed in Ops/sec rather than MB/sec (these contrasting I/O gauges are discussed in the cloud-native apps section). 
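If the customer can only provide a throughput figure, a rough unit conversion helps express the requirement in Ops/sec for Sizer. The sketch below simply divides throughput by the 768KB object size; it is a back-of-the-envelope approximation, not an official Sizer formula.

```python
def mb_s_to_ops_per_sec(throughput_mb_s: float, object_size_kb: float = 768) -> float:
    """Approximate Ops/sec for a given throughput and object size (768KB being
    Veeam's object size, as noted above). Rough estimate only."""
    return throughput_mb_s * 1024 / object_size_kb

# Example: a hypothetical 300 MB/s Veeam backup stream
print(round(mb_s_to_ops_per_sec(300)))  # -> 400 Ops/sec
```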

Use Case: Archive 

Archive is very similar to Backup and so the same advice applies. The only difference is that the profile values are different, as you can see below. As with Backup though, these can be customized to the customer’s specific workload needs. 

Use Case: Cloud-Native Apps 

Cloud-native apps is a broad category covering a wide range of workload profiles. The I/O profile depends entirely on what the app in question has been designed to do – and that could be literally anything. However, this category includes, among other things, containerized big data applications and query engines, which tend to have highly intensive I/O requirements. For this reason, the default profile in Sizer (shown below) reflects a fairly performance-intensive workload. Object size can also vary greatly in this category, but with many cloud-native workloads the object size will be much smaller than with backup and archive workloads, so the profile contains a small object size. Smaller objects result in random I/O rather than sequential, and when this is the case all-flash nodes are a far better choice than hybrid. Note that this random I/O value is expressed in Sizer in Ops/sec, rather than the MB/sec throughput metric used for large-object sequential I/O. These metrics are chosen in keeping with how random and sequential I/O are generally gauged in the industry.

When sizing Objects for a cloud-native app it’s important to try to find out from the customer what the app’s I/O profile is, so that you can edit the I/O profile settings accordingly. This is especially important given the wide variance of cloud-native workload types out there.

There is also a “Number of Objects (in millions)” field – this is typically most relevant to cloud-native workloads, which can result in billions of objects needing to be stored and addressed. This value is used to determine how many Objects workers are needed, from a metadata perspective, to address the number of objects that will be stored. Thus, it could be that an Objects cluster sizing is constrained not by performance or capacity, but by metadata requirements.
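Conceptually, the metadata constraint reduces to a simple division: the total object count over however many objects a single worker can address. The per-worker figure in the sketch below is a hypothetical placeholder, not a published limit; Sizer applies the real value internally.

```python
import math

def workers_for_object_count(total_objects_millions: float,
                             objects_per_worker_millions: float) -> int:
    """Minimum worker count needed purely from a metadata/addressing standpoint.
    objects_per_worker_millions is a hypothetical placeholder; the real
    per-worker limit is applied by Sizer."""
    return math.ceil(total_objects_millions / objects_per_worker_millions)

# Example: 5,000 million (5 billion) objects against a hypothetical 1,000M per worker
print(workers_for_object_count(5000, 1000))  # -> 5
```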

If you have any doubts or difficulties sizing Objects, don’t hesitate to contact your local NUS Solution Architect (SA) for assistance. The SAs are listed here – https://ntnx-intranet–simpplr.vf.force.com/apex/simpplr__app?u=/site/a0xf4000004zeZ7AAI/dashboard  

January 2019 Sprint

January Sprint 1

Key enhancements

  • We heard from the field and partners that getting either budgetary quotes or real quotes created is really hard (I can’t repeat what we heard in interviews as I want to keep the channel G rated). Partners were saying it can take a week to get either a budgetary quote or a real quote going. Distributors were saying they retype what is in the Sizer BOM to create a quote. Nutanix SEs were saying this is really hard with CBL.

We knew we could help, so we took it on in a big way over the last three months. In summary, a partner or NTNX field person can now create either a budgetary or real quote for any business model, be it appliance, disaggregated Nutanix, XC Core, or CBL SW sale. Attached is the matrix with details. I believe this is about improving sales velocity.

  • We now have the new Files licenses working with a Standalone cluster. So for Files Pro you can create the Files cluster with ONLY the Files SKUs attached (no AOS licenses). More changes are coming, but this is a big step.
  • Data Center and ROBO Solutions. These are add-ons for your HCI recommendation where we add the right amount of things like Prism Pro or Flow.

Product Updates

  • We always pull the latest from SFDC for Nutanix products
  • Implemented several Nutanix product rules, such as the allowed CPUs for the 3070 if a GPU is desired.
  • We always pull the latest from the HCL for SW-only vendors

January Sprint 2

Key enhancements

  • Include ECX in auto sizing. We always factored in the savings after the recommendation was determined, so the HDD utilization was accurate. We hadn’t, however, taken those savings into account when determining the recommendation itself. Now that we have really large workloads for Files, and soon Buckets, this became an issue. The Sizer recommendation is now accurate.
  • Clone Workload feature. This is cool. Define a workload, clone it, and then just modify what you want. For example, if you want five Server Virtualization workloads that are all similar but slightly different, define one and clone/edit the rest.
  • Thick VM sizing logic improved when uploading Collector or RVTools outputs. No compression savings are now assumed. A subtle but important sizing improvement.

UX improvements

  • Budgetary Quote – Added Hardware Support quote line
  • Made List view the default dashboard view instead of grid.
  • Implemented Open/Closed Opportunity filter – UX

Product Updates

  • New model – Fujitsu XF8050 HY/AF
  • HPE models are DL380/360 again instead of DX. There was a problem with the HCL, but it has been addressed.
  • We always pull the latest from SFDC for Nutanix products
  • Implemented several Nutanix product rules like allowed CPUs for 3070 if GPU is desired.
  • We always pull the latest from HCL for SW only vendors

 

Coming soon

  • Buckets!! It should come out this week. I’ll announce it later, but you can start thinking about Buckets by sizing different opportunities.