Storage calculation for clusters with mixed capacity nodes

This article explains the logic behind storage calculations for clusters that have nodes with different storage capacities.

What has changed?

Previously, capacity calculations were based on the aggregate capacity across all nodes in the cluster. This total capacity was the basis for calculating the usable and effective capacity of the cluster.

For example, consider 3 nodes: N1 = 20TB, N2 = 20TB and N3 = 10TB.

Based on the above, the total capacity available is 20 + 20 + 10 = 50TB. Assuming N+1, the available nodes are N2 + N3 = 30TB. Thus, 15TB can be used for data and 15TB for the RF copy (assuming RF2).

With the new update, Sizer also ensures that the RF copy of the data and the data itself do not share the same node.

In the above example, after N+1 two nodes are available: N2 = 20TB and N3 = 10TB.

If we allowed 15TB of data (and 15TB for RF), part of the data and its RF copy would have to land on the same node, since N3 is only 10TB. To ensure the RF copy and the data are on separate nodes, the usable storage in this case is 20TB (10TB of data on N2 and its RF copy on N3, or vice versa).

Note: Although the same logic is used for both homogeneous and mixed-capacity clusters, the difference is seen primarily in mixed-capacity clusters.

Below is a detailed write-up on how usable storage is calculated for clusters with mixed-capacity nodes, for different scenarios across RF2 and RF3.

Algorithm for RF2

If only one node has a non-zero capacity Cx, then under RF2 the replication is done between the different disks of the same node, and the extent store in this case is Cx / 2 (RF); otherwise, one of the cases below applies. Say we have nodes with capacities C1, C2, C3, …, C[n], sorted in ascending order of capacity. There are 2 cases to consider for RF2 when computing the effective raw storage capacity:

Case-1: C1 + C2 + C3 + … + C[n-1] <= C[n]
If this is the case, then the total amount of storage that can be replicated with a factor of 2 is C1 + C2 + C3 + … + C[n-1]: every byte written to a smaller node pairs with a copy on the largest node, and the surplus on the largest node remains unusable.

Case-2: C1 + C2 + C3 + …. + C[n-1] > C[n]
If this is the case, then the (total storage capacity) / 2 (RF) can be replicated among the available nodes. In other words, half the total capacity can be replicated.
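These two cases, together with the single-node rule, can be condensed into a short sketch. The Python below is illustrative only (not the actual Sizer code); it returns the usable data capacity, so the raw capacity consumed is twice the returned value:

    def rf2_usable_data(capacities):
        """Usable data capacity under RF2, per the two cases above."""
        caps = sorted(c for c in capacities if c > 0)
        if len(caps) == 1:
            # Single node: replication happens between disks of the same node.
            return caps[0] / 2
        largest, rest_sum = caps[-1], sum(caps[:-1])
        if rest_sum <= largest:
            # Case-1: each byte on a smaller node pairs with its copy on the
            # largest node; the surplus on the largest node is unusable.
            return rest_sum
        # Case-2: half of the total capacity holds data, half the RF copies.
        return sum(caps) / 2

    # The earlier example: N2 = 20TB and N3 = 10TB remain after N+1.
    print(rf2_usable_data([20, 10]))  # -> 10 (i.e., 20TB of usable raw storage)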

Algorithm for RF3

Say we have nodes with capacities C1, C2, C3, …, C[n], sorted in ascending order of capacity. The algorithm for RF3 is slightly different from that of RF2 because we need to accommodate the replicas of the data on 2 nodes, as opposed to a single node with RF2.

  1. Since there are 3 replicas to place, we calculate the capacity difference between the 2nd largest (C[n-1]) and the 3rd largest (C[n-2]) entities as ‘diff’. In the optimal placement scenario, the first replica is placed on the entity with the smallest capacity, the second replica on the entity with the largest capacity (C[n]), and the third replica on the entity with the 2nd largest capacity (C[n-1]). The difference (C[n-1]) – (C[n-2]) therefore tells us how quickly the 2nd largest entity will shrink to match the 3rd largest as replica placement consumes its space.
  2. We deduct the smaller of the ‘diff’ calculated above and the capacity of the smallest entity, simulating RF3 placement until C[n-2] and C[n-1] become equal (note that the difference between C[n] and C[n-1] remains constant during this, since the same capacity is deducted from both). In O(N) we arrive at one of the following possibilities:
    • Case-1: Only 3 entities remain with non-zero capacities, in which case the amount of data that can be accommodated among these 3 nodes with an RF of 3 (one data copy and 2 replicas) is the smallest remaining capacity, which is C[n-2].
    • Case-2: There is capacity left in C[n-3] (the 4th largest entity) and any number of entities before it (C[n-4], C[n-5], etc.), and C[n-2] == C[n-1] (the capacities remaining on the 3rd and 2nd largest entities have become equal). This happens because at this point the capacity remaining on the smallest non-zero entity before C[n-2] is greater than C[n-1] – C[n-2], meaning that after placing the first replica on C[n] and the second replica on C[n-1], the capacity on C[n-1] has caught up with C[n-2]. From here, for the next bytes of data, the second replica goes to C[n] while the third replica is round-robined between at least 2 (or more) entities. Two sub-cases can now arise:
      • Case-2(a): (C1 + C2 + … + C[n-1]) / 2 <= C[n]
        Here C[n]’s capacity is so large that for every pair of replicas placed on the lower-capacity nodes up to C[n-1], the remaining replica always finds space on C[n]. The amount of storage that can be accommodated with an RF of 3 is therefore the smaller side of the inequality, i.e., (C1 + C2 + … + C[n-1]) / 2, as we cannot consume the full space on C[n].
      • Case-2(b): (C1 + C2 + … + C[n-1]) / 2 > C[n]
        If C[n]’s capacity is not as high as in case (a), then one replica is placed on the largest entity C[n] while the other two round-robin amongst the other largest-capacity entities (since the capacities remaining on at least 2 entities, C[n-2] and C[n-1], are already equal). This continues until C[n] becomes equal to C[n-1], which is guaranteed to happen eventually because replicas consume space on C[n] at least twice as fast as on C[n-1], C[n-2], and so on. From that point, both the second and the third replicas keep round-robining across all the remaining entities, so all the capacity remaining at that point can be fully consumed. Hence, in this case, the amount of storage that can be accommodated is the sum of all remaining (non-zero) capacities divided by 3 (RF).

Terminology

We will explain the terms with an example. To keep it simple, let us take a homogeneous cluster with 6 identical nodes.

Effective Raw Capacity is the total raw storage available across the clusters within the scenario. In this case, it is 6 nodes with 6 x 1.92TB drives each, amounting to 69.12 TB or 62.86 TiB, as shown in the first row of Figure 2.

Failover Capacity Overhead is the storage that is discarded according to the failover plan selected. In the above case, the failover plan is set to “Standard (N+1),” so we arrive at this value by discarding all the storage available within one node. The solution in the above example is an all-flash node with 6 x 1.92TB drives per node, so the Failover Capacity Overhead amounts to 11.52TB or 10.48 TiB per node. The same can be seen in the second row of Figure 2.

Effective Raw Capacity After Failover = Effective Raw Capacity – Failover Capacity Overhead

In the above example, that is 62.86 TiB – 10.48 TiB = ~52.39 TiB

Effective Usable Capacity After Failover = ~95% of Effective Raw Capacity After Failover as AOS stops writing to the disk when the cluster utilization reaches 95%.

In the above example, that would be 52.39 * ~0.95 = 50.22 TiB

Extent Store = (Effective Usable Capacity After Failover – CVM) / Replication Factor

In the above example, with RF set to 2, it would be (50.22 TiB – 9.14 TiB)/2 = 20.54 TiB

Effective Capacity = Extent Store + Savings (Storage Efficiency & Erasure Coding)

In the above example, without savings, the Effective Capacity is 20.54 TiB.

If we set the Storage Efficiency to 18%, the Savings amount to 4.52 TiB, and hence the Effective Capacity would be 20.54 TiB + 4.52 TiB = 25.06 TiB. The same can be seen in the bottom part of Figure 2.
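Putting the chain together, here is a small worked sketch of the figures above. Note the last step: the 18% Storage Efficiency appears to be applied as extent store / (1 – 0.18), which is inferred from the numbers above rather than a documented formula:

    # All values in TiB, taken from the example above.
    effective_raw = 62.86                 # 6 nodes x 6 x 1.92TB drives
    failover_overhead = 10.48             # one node discarded under N+1
    raw_after_failover = effective_raw - failover_overhead   # ~52.39
    usable_after_failover = 50.22         # ~95% of the above (AOS 95% write limit)
    cvm = 9.14                            # CVM reservation
    extent_store = (usable_after_failover - cvm) / 2         # RF2 -> 20.54
    effective_capacity = extent_store / (1 - 0.18)           # matches the ~25.06 TiB
    print(round(extent_store, 2), round(effective_capacity, 2))  # 20.54 25.05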

Please note that the numbers will vary for heterogeneous clusters, where nodes of different capacities exist within a cluster. The larger the variance in capacity across nodes, the more the Extent Store, and hence the Effective Capacity, is reduced to ensure there is enough storage across nodes to maintain copies of data as per the desired Replication Factor.

SAP HANA sizing guidelines

Sizing for SAP HANA  

Introductory video of SAP HANA in the sizer (non-ST VPN connection required).

https://nutanixinc.sharepoint.com/:v:/t/solperf/solperf_library/ETb67HebgR9Kg32r7jxTO3wBdXoHuOCCjvK2oTasctOgIQ?e=z4Epr2

Note:

  • Only use for SAP HANA based applications, not legacy SAP applications that use MSSQL, Oracle, etc.
  • Multi-node SAP scale-out is not supported (used for larger SAP BW instances)
  • This sizing process does not vary for scale-up between AHV & VMware implementations
  • No spinning disks are used within a Nutanix cluster being used for SAP HANA
  • Any questions, support, or areas not covered – please use the SAP Slack channel

Supported Platforms:

  • Only Dell, HPE, Fujitsu and Lenovo are supported for SAP HANA, not NX.
  • If another OEM is selected, SAP HANA will not be shown as an available workload

 

Defaults

  • RF2 is used (RF3 is under testing, so not selectable in Sizer)
  • Compression is disabled, and not typically of value for SAP HANA
  • A higher default CVM resource reservation is used

HANA Inputs

  • NVMe can be added for higher IO loads, such as a high usage SAP S4/HANA
  • Cost/performance largely drives CPU choice. Ideally an implementation’s potential compute load in SAPS would be known. Please reach out for support in estimating and reviewing such information.

Environments

There would typically be two environments within a Nutanix cluster where production and non-production are mixed. Production rules should be applied to all production instances and to any other instances that should be treated as production. This might apply to a QAS/Test environment and will typically apply to any DR cluster.

Production:

  • For most SAP applications (e.g., production S4) there is an SAP HANA database and one or more application server instances. Some uses of SAP HANA do not use an application server; in that case just use a small one in the sizing exercise.
  • In addition to the application server instances and the SAP HANA database, a small VM called the ASCS is often called for. This ASCS would be around 2c/24GB RAM/100GB disk.
  • Generally, production has two or more application server instances, typically 2 – 6 cores with around 24GB/core, with multiple instances for larger loads. They have a small storage space requirement for the OS and application image.
  • For a downtime requirement of less than 20 minutes, a pair of SAP HANA instances should be sized.
  • There is no overcommit of CPU or memory
  • Servers must have all memory channels filled and balanced, so 6 or 12 DIMMs per CPU. Sizer’s auto recommendation enforces this consideration.
  • L-suffix CPUs are required for the largest memory instances
  • Available storage for SAP HANA should be around 2.5x to 3x memory (3x is used in Sizer); see the quick example after this list
  • Production rules – SAP HANA instances are on whole dedicated CPUs and so cannot be allocated to the CVM CPU
  • HANA System Replication (HSR) is an exact copy of the HANA VM. In Sizer, add another HANA VM if implementing HSR.
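A quick worked example of the storage rule of thumb above, using a hypothetical instance size:

    # Hypothetical example: a 1.5TB-memory HANA instance.
    hana_memory_tb = 1.5
    storage_multiplier = 3        # Sizer uses 3x memory (range is 2.5x - 3x)
    hana_storage_tb = hana_memory_tb * storage_multiplier
    print(hana_storage_tb)        # -> 4.5 TB of available storage to size for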

 

Non-Production:


  • QAS/Test landscape tends to match nonPRD for size of instance
  • If an operating system HA cluster is used in production, there is typically at least one such cluster outside of production also – used as a testbed.
  • Each SAP solution would normally have two or three non-production landscapes
  • Solution Manager (SolMan) is often overlooked, and not asked for. It is a required instance in the overall deployment and would be sized in PRD with one SAP HANA instance and an application server instance. Another such pair for QAS/test. No HA clustering would be required.
  • DEV, SBX etc. are usually subsets in memory size.

 

Feb 8, 2021

First release of SAP HANA in the sizer.

 

Business Continuity and Disaster Recovery Discovery Guidance (Revised 4/7/25)

Introduction – Please Read First

These questions are here to assist with ensuring that you’re gathering necessary information from a customer/prospect in order to put together an appropriate solution to meet their requirements in addition to capturing specific metrics from tools like Collector or RVTools. 

This list is not exhaustive, but should be used as a guide to make sure you’ve done proper and thorough discovery.  Also, it is imperative that you don’t just ask a question without understanding the reason why it is being asked.  We’ve structured these questions with not only the question that should be asked, but why we are asking the customer to provide an answer to that question and why it matters to provide an optimal solution. 

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity. 

This is a live document, and questions will be expanded and updated over time.

REVISION HISTORY
4/7/25 – 1st Revision – Kevin Laine, Mike Umphreys
1/5/21 – 1st Publish – Laine Leverett


BCDR Discovery

 1.  Has your organization completed a Business Impact Analysis (BIA) for your applications and workloads? If so, could you share the findings with us?

Why ask? A Business Impact Analysis (BIA) helps prioritize applications based on their criticality to the business. It provides key insights into acceptable levels of data and time loss, as well as the potential revenue and operational impacts of disruptions. Understanding the systems, dependencies, and business processes tied to critical applications is essential for designing a tailored Disaster Recovery (DR) solution that meets their organization’s specific continuity needs. This information will ensure the DR strategy aligns with their business’s tolerance for risk and recovery objectives, enabling us to propose the most effective solution. This information would be HIGHLY beneficial to have in designing a DR solution for your customer. 

2.  If a Business Impact Analysis (BIA) is not available, can you provide a breakdown of your applications along with their corresponding Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs)? Additionally, please specify the retention requirements for both on-premises and the recovery site. 

Why ask? RPOs help us identify the appropriate Nutanix products to meet your data recovery needs, while RTOs are crucial in determining how quickly applications need to be restored. The design of the Disaster Recovery (DR) solution may vary significantly based on these time-sensitive requirements. It’s also important that these objectives are driven by business priorities rather than IT assumptions, ensuring that the recovery strategy aligns with the organization’s operational needs. Additionally, it’s key to distinguish between DR and backup: DR retention should be long enough to meet the RTO, while longer RTOs may be more cost-effectively addressed with backup solutions instead of full DR. Understanding both retention and recovery needs allows us to design a solution that balances cost with business continuity requirements effectively.

 3.  For the applications listed, can you provide the change rate for each? You can calculate this using existing backups and the deltas between them. For applications with aggressive RPOs that may require near-sync, please also include the write throughput for each application.

Why ask? The change rate is crucial for understanding how much data is modified over time, which helps determine the appropriate data protection approach. For applications with aggressive RPOs requiring near-sync recovery, knowing the write throughput is essential, as it directly impacts the network bandwidth requirements. This information allows us to design a DR solution that ensures optimal performance and meets the required RPOs without overloading the network or the infrastructure.

 4.  What backup and/or DR products are you currently using?

Why ask? Understanding the backup and DR products in use helps identify potential synergies and opportunities for transitioning to Nutanix solutions that align more closely with your business continuity goals. For example, if you’re using products like Zerto, you may be a good candidate for Nutanix’s near-sync capabilities, which can enhance your RPO and RTO requirements. Also, if a customer is using Zerto today with VMware (ESXi), this can’t be used with AHV but may be replaced with inherent Nutanix DR Orchestration. This insight allows us to tailor the solution to better integrate with your existing environment while optimizing recovery processes. 

 5.  In the event of a failure, who is on call to help with the restore? Will all relevant personnel be trained on recovery plans and procedures?

Why ask? This question helps the customer consider the operational aspects of disaster recovery and highlights the importance of having a clear, well-communicated recovery process. Ensuring that all relevant staff are trained and confident in executing recovery procedures is vital for minimizing downtime and reducing the stress of the situation. Simplifying and streamlining recovery workflows can make the process more efficient, allowing everyone to feel empowered and capable of contributing to the restoration efforts when a disruption occurs. 

 6.  What Disaster Recovery (DR) solutions are you currently using, and have you tested the solution to verify it functions as expected?  How well have the solutions met or not met your requirements and expectations?

Why ask? This question helps uncover the current DR topology and provides insight into its performance. Understanding whether the existing solution(s) meets the customer’s business continuity needs, and whether it has been properly tested, allows us to identify gaps or areas for improvement. It’s essential to assess the current state to determine if Nutanix can offer a more reliable, efficient, and scalable solution that aligns better with the customer’s recovery objectives and expectations. 

 7.  Are there any other systems such as physical servers, mainframes, or other physical devices that should be considered as part of your disaster recovery plan that are outside of the Nutanix Infrastructure? 

Why ask? This question ensures that the customer has thoroughly considered the full scope of their disaster recovery needs. It’s important to account for all systems and devices, not just those within the Nutanix infrastructure, to create a comprehensive DR plan that addresses all critical components of the IT environment. This helps prevent any overlooked systems from becoming a potential point of failure during recovery. 

8.  What types of disasters are you planning for in your business continuity and disaster recovery strategy? 

Why ask? The type of disaster you’re planning for will influence the products and operational procedures required for an effective recovery plan. For example, a pandemic might require a different approach (such as enabling remote work) compared to a geographic event like an explosion or earthquake, or a localized issue like a flooded ROBO office server closet. Additionally, cybersecurity events like a ransomware attack will require distinct recovery strategies. By helping the customer consider various disaster scenarios, we can guide them toward a comprehensive “recovery in depth” strategy—whether it involves snapshots, backups, remote disaster recovery sites, stretch clusters, cloud-based disaster recovery (Cloud Provider DR), or DRaaS. This ensures they are fully prepared for a broad range of potential disruptions. 

9.  Can your sites operate independently in the event of a failure? If so, for how long? 

Why ask? This question helps assess the criticality of each site and their ability to function without reliance on other locations. Understanding how long each site can remain operational independently is essential for determining the appropriate disaster recovery strategy, ensuring that business continuity is maintained during disruptions. It also helps us identify which sites may require additional resources or support to ensure they can continue operating effectively in the event of a failure at another site. 

10.  What is your data retention policy, and who is responsible for defining it (e.g., regulatory requirements or internal guidelines)? 

Why ask? Understanding the retention policy helps us identify the factors driving data storage requirements, whether they are mandated by external regulations or internally set business needs. This insight allows us to ensure the disaster recovery solution aligns with both compliance obligations and the organization’s operational requirements. It also enables us to design a solution that optimizes data storage, retention, and recovery processes based on those specific needs. 

11.  Do you have any regulatory requirements such as HIPAA, PCI, SEC, SOX?

Why ask? Regulatory requirements significantly impact the design of a disaster recovery solution. Understanding which regulations apply will help us determine the necessary features and controls, such as encryption, data sovereignty, role-based access control (RBAC), and logging, to ensure compliance. This ensures that the recovery solution meets both legal obligations and security standards, while minimizing risk to the organization. 

12.  Do you need immutable copies of the data or VMs?

Why ask? This question helps assess the customer’s data protection needs, particularly in relation to the retention period and the sophistication of their requirements. For example, if the customer has a long-term retention policy (such as seven years), they may need a third-party backup tool to tier data to an object store that supports WORM (Write Once Read Many) to ensure data immutability. Additionally, this requirement could be driven by the need to protect against ransomware attacks, as immutable data copies provide an added layer of security by preventing data tampering or deletion. Understanding these needs ensures we can design a solution that meets both compliance and security objectives. 

13.  Is this a new disaster recovery solution, or are you looking to enhance or replace an existing one?

Why ask? Understanding whether this is a completely new DR solution or an enhancement to an existing setup allows us to determine the flexibility we have in designing the most effective strategy. If it’s a net new solution, we can build it from the ground up to meet the customer’s current needs and future growth. If it’s an enhancement or replacement, we need to consider integrating with the existing infrastructure while addressing any gaps or limitations in the current DR approach.

14.  Will the source and destination clusters both be Nutanix clusters?

Why ask? If both the source and destination clusters are not Nutanix-based, it could introduce constraints in designing the disaster recovery solution. For example, a non-Nutanix source or destination may dictate the choice of hypervisor and prevent array-based replication, requiring additional software like SRM or Zerto for replication. It may also necessitate adjustments in licensing or configuration. Understanding this early ensures that we can design the most efficient and cost-effective DR solution while accounting for any potential complexities. 

15.  Do you have a requirement for your backups and disaster recovery solution to be hosted on separate hardware from your primary infrastructure? 

Why ask? This question helps determine how to architect the infrastructure for both backup and disaster recovery targets. If the backups or DR solution must be on separate hardware, it will influence the design and selection of infrastructure, ensuring that redundancy and isolation requirements are met for business continuity. This also ensures that potential risks, such as hardware failure, are mitigated by keeping backup and DR systems independent from the primary infrastructure. 

16.  What is your desired replication topology for disaster recovery?

Why ask? Understanding the desired replication topology helps us determine whether you need Professional or Ultimate AOS licenses, as well as how to map out the replication setup (e.g., A->B->C, A->B, A->B and A->C). This also provides insight into whether your sites are active/passive and clarifies the specific definitions of “active” and “passive” in your environment (e.g., a data center with power, VMs ready to power on, or VMs already powered on and capable of an immediate switchover). Knowing this allows us to design a replication strategy that aligns with your recovery objectives and infrastructure requirements. 

17.  Does your disaster recovery solution need to provide the same performance as your production environment?

Why ask? This question helps determine whether disaster recovery is a critical business requirement or more of a compliance checklist item. If the DR solution needs to match production performance, it indicates a higher priority for minimizing downtime and avoiding significant business loss during outages. Understanding this will help size the solution appropriately, ensuring it meets the required performance levels. If the customer decides to accept a lower-performing DR setup, it’s important to get their sign-off to ensure alignment with business expectations and risk tolerance. 

18.  Do you need to have replicated copies of all the VMs, but only plan to restore a subset in the event of a disaster?

Why ask? This question helps identify potential licensing requirements and the size of the target cluster needed for disaster recovery. If all VMs are being replicated but only a subset will be restored, it may impact the resources required for replication and storage, as well as the licensing model. This insight allows us to design a more efficient DR solution that aligns with your actual recovery needs and avoids unnecessary resource allocation. 

19.  How frequently do you test your disaster recovery plan, and what does the plan entail?

Why ask? This question helps determine whether the customer has actively validated their disaster recovery plan to ensure it will perform as expected during an actual disaster. Many organizations may be optimistic about the effectiveness of their plans, but testing provides a reality check and reveals any gaps or areas for improvement. Understanding the testing frequency and details allows us to assess the maturity of their DR strategy and offer guidance on how to enhance it, ensuring better preparedness and faster recovery in the event of an incident. 

20.  Do you have existing runbooks in place for disaster recovery, and how are they used during an incident?

Why ask? This question helps us understand the customer’s current disaster recovery processes and whether they have a structured, documented plan (runbook) for responding to an incident. If they don’t have runbooks in place, it highlights an opportunity for us to assist in developing a comprehensive plan that can be tested and refined. A well-documented and regularly tested runbook ensures a faster, more organized response during a disaster, reducing downtime and minimizing business disruption. 

21.  Databases: How are you currently backing up and protecting your database environments?

Why ask? Databases are often among an organization’s most critical assets, so understanding how they are backed up and protected gives us insight into their data protection strategy. They may already be using database-level replication or clustering technologies that offer more granular control than traditional storage-based replication for VMs. This information is key when discussing RPO and RTO, as databases typically require higher levels of availability and more tailored recovery strategies than other parts of the infrastructure. It helps us design a solution that meets the unique needs of their most critical systems. Databases can use native BC/DR continuity, meaning the application itself can handle the protection, further enhanced with snapshot technology.

22.  What hypervisor(s) are you using, and are you open to alternatives even if just for DR? 

Why ask?  This question helps us understand which replication products and solutions are compatible with your environment, especially if the source or destination cluster is non-Nutanix. If you’re using ESXi on your primary cluster, this could open the possibility of leveraging Nutanix AHV for the target cluster, optimizing the disaster recovery setup. Understanding their willingness to explore cross-hypervisor DR options enables us to propose the most flexible and efficient solution while maximizing your existing infrastructure investments. If they are using ESXi, this is a good time to also ask about renewal dates and whether they have received their renewal costs.

NOTE- This might have been asked in 1st call type contacts.  There is no reason to ask again.

23. Do you have a dedicated network for disaster recovery (DR) traffic, and do you require encryption on those links? 

Why ask? This question helps us determine if network segmentation is needed in Prism for DR replication traffic, ensuring that DR traffic is isolated for security and performance reasons. It also helps us understand if the customer requires encryption in transit for their DR network, which is critical for safeguarding sensitive data during replication. This insight allows us to design a DR solution that meets both security and network performance requirements.  

24.  What is the current bandwidth between sites that you plan to use for DR replication, and what is the latency between those sites?

Why ask? This question helps us assess whether the bandwidth between sites is sufficient to meet the customer’s defined RPO (Recovery Point Objective), based on the rate of change in their environment. Additionally, it ensures that the latency between sites meets the minimum requirements for solutions like Metro Availability or Metro Witness. Understanding these factors allows us to design a DR solution that can meet the customer’s recovery objectives efficiently and reliably. 

25.  What is the current rate of utilization of the network links between the sites you plan to use for replication traffic?

Why ask? These links may be used for other traffic and could impact the available bandwidth that you assume you will have access to for replication traffic.  See if you can get utilization over a 30-day period, and if possible over several months, to see any trends of increase or decrease in utilization.

26.  How do you handle IP addresses on the recovery site for VMs that have failed to the recovery site?

Why ask? This question helps us understand the customer’s preferred networking failover approach and any potential challenges associated with IP address management during a disaster recovery event. It provides insight into whether they use overlay networks, a stretched Layer 2 subnet between sites, a full subnet failover by updating routes to point to the recovery site, or allow IP addresses to change during failover. Understanding this will help us design a solution that minimizes issues such as broken applications due to hard-coded IP addresses, DNS updates, or cache entry changes, ensuring a smoother failover process. 

27.  What is your current approach to data consistency and application consistency during disaster recovery?

Why ask? Understanding how data consistency is ensured—whether through application-consistent snapshots, crash-consistent backups, or other methods—helps ensure that the recovery process doesn’t result in data corruption. This is especially important for transactional systems (like databases) or applications requiring high availability. Knowing how these are addressed can guide the design of the solution for higher data integrity during DR. 

28.  What is your approach to multi-cloud disaster recovery or cloud-based backup?

Why ask? This will help assess the customer’s strategy if they are leveraging or considering a multi-cloud or hybrid-cloud approach to business continuity. It’s crucial to know if the customer is looking to integrate public cloud resources (e.g., AWS, Azure) into their DR plan. This insight helps us integrate Nutanix solutions like Nutanix Cloud Clusters (NC2) for seamless multi-cloud disaster recovery. 

Nutanix Unified Storage – Files Discovery Guidance (Revised 4/24/25)

Introduction – Please Read First

These questions are here to assist with ensuring that you’re gathering necessary information from a customer/prospect in order to put together an appropriate solution to meet their requirements in addition to capturing specific metrics from tools like Collector or RVTools.

This list is not exhaustive, but should be used as a guide to make sure you’ve done proper and thorough discovery.  Also, it is imperative that you don’t just ask a question without understanding the reason why it is being asked.  We’ve structured these questions with not only the question that should be asked, but why we are asking the customer to provide an answer to that question and why it matters to provide an optimal solution.

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity.

This is a live document, and questions will be expanded and updated over time.

REVISION HISTORY
4/21/25 – 1st Revision – Mike McGhee
1/5/21 – 1st Publish – Matt Bator


Files

1.  Is this replacing a current solution, or is this a net new project?
     a.  What’s the current solution?

Why ask? This question helps us understand the use case, any current expectations and what the competitive landscape may look like as well as an initial idea of the size / scale of the current solution.

2.  Is there a requirement to use an existing Nutanix cluster (with existing workload) or net new Nutanix cluster?

Why ask? If we’re sizing into an existing cluster, we need to understand the current hardware and current workload.  For licensing purposes, adding Files to an existing cluster means the Unified Storage Pro license. A common scenario has been to add storage-only nodes to an existing cluster to support the new Files capacity.  If sizing into a new cluster, we can potentially dedicate this cluster to Files and Unified Storage.

3.  Is this for NFS, SMB or both? Which protocol versions (SMB 3.0, NFSv4, etc)?

Why ask?  We need to understand protocol to first validate they are using supported clients.  Supported clients are documented in the release notes of each version of Files.  Concurrent SMB connections also impact sizing with respect to the compute resources we need for the FSVMs to handle those clients.  Max concurrent connections are also documented in the release notes of each version. 

It also helps us validate supported authentication methods.  For SMB, we require Active Directory where we support 2008 domain functional level or higher.  There is limited local user support for Files but the file server must still be registered with a domain.  For NFS v4 we support AD with Kerberos, LDAP and Unmanaged (no auth) shares.  For NFS v3 we support LDAP and Unmanaged. 

4.  Is there any explicit performance requirement for the customer? Do they require specific IOPS or performance targets? 

Why ask?  Every FSVM has an expected performance envelope.  There is a sizing guide and performance tech note on the Nutanix Portal which give a relative expectation on the max read and write throughput per FSVM and max read or write IOPs per FSVM. 

Throughput based on reads and writes is integrated into Nutanix Sizer and will impact the recommended number of FSVMs.  This may also impact the hardware configuration, including the choice of NICs, leveraging RDMA between the CVMs, or iSER (supported since the Files 5.0 release via a performance profile), as well as the choice of all-flash vs. hybrid.

5.  Do they have any current performance collection from their existing environment?
      a.  Windows File Server = Perfmon
      b.  Netapp = perfstat
      c.  Dell DPACK, Live Optics

Why ask?  Seeing data from an existing solution can help validate the performance numbers so that we size accurately for performance. 

6.  What are the specific applications using the shares?
       a.  VDI (Home Shares)
       b.  PACS (Imaging)
       c.  Video (Streaming)
       d.  Backup (Streaming)

Why ask?  When sizing for storage space utilization the application performing the writes could impact storage efficiency.  Backup, Video and Image data are most commonly compressed by the application.  For those applications we should not include compression savings when sizing, only Erasure Coding.  For general purpose shares with various document types assume some level of compression savings.  

7.  Are they happy with performance or looking to improve performance?

Why ask?  If the customer has existing performance data, it’s good to understand if they are expecting equivalent or better performance from Files.  This could impact sizing, including going from a hybrid to an all flash cluster. 

 8.  How many expected concurrent user connections?

Why ask? Concurrent SMB connections are a required sizing parameter.  Each FSVM needs enough memory assigned to support a given number of users.  A Standard share is owned by one FSVM.  A distributed share is owned by all FSVMs and is load balanced based on top level directories.  We need to ensure any one FSVM can support all concurrent clients to the standard share or top level directory with the highest expected connections. We should also be ensuring that the sizing for concurrent connections is taking into account N-1 redundancy for node maintenance/failure/etc.
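As a rough, illustrative sketch of that N-1 consideration (the per-FSVM connection limit below is a placeholder; the real limit depends on FSVM memory and is documented in the release notes of each Files version):

    import math

    # Placeholder assumption: concurrent SMB connections one FSVM can serve
    # at its configured memory; check the Files release notes for real limits.
    conns_per_fsvm = 750

    peak_connections = 2000
    # Size so the cluster still serves all clients with one FSVM down (N-1).
    fsvms_needed = math.ceil(peak_connections / conns_per_fsvm) + 1
    print(fsvms_needed)  # -> 4 (3 to carry the load, plus 1 for redundancy)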

 9.  What is your current share configuration including number of shares?

Why ask?  Files has a soft (recommended) limit of 100 shares per FSVM. We can also leverage nested shares to match an existing environment if more shares are needed.  Files supports 5,000 nested shares since the 4.4 release.

10.  Does their directory structure have a large number of folders in share root?

Why ask?  This indicates a large number of top level directories making a distributed share a good choice for load balancing and data distribution.

11. Are there Files in the share root?

Why ask?  Distributed shares cannot store files in the share root.  If an application must store files in the root then you should plan for sizing using standard shares.  Alternatively, a nested share can be used. 

 12. What is the largest number of files/folders in a single folder?

Why ask?  Nutanix Files is designed to store millions of files within a single share and billions of files across a multi-node cluster with multiple shares.  To achieve speedy response times in high file and directory count environments, it’s necessary to give some thought to directory design. Placing millions of files or directories into a single directory makes the file enumeration that must occur before file access very slow.  The optimal approach is to branch out from the root share with leaf directories up to a width (directory or file count in a single directory) no greater than 100,000.  Subdirectories should have similar directory width.  If file or directory counts get very wide within a single directory, this can cause slow data response times for clients and applications.  Increasing FSVM memory up to 96 GB to cache metadata, and increasing the number of vCPUs, can help improve performance for these environments, especially if the directory and file designs listed above are followed.

13. What is the total size of largest single directories?

Why ask?  Nutanix supports standard shares up to 1PiB starting with the Files 5.0 release (prior to compression), as well as top-level directories in a distributed share up to 1PiB.  These limits are based on the volume group supporting the standard share or top-level directory.  We need to ensure no single folder or share (if using a standard share) surpasses 1PiB.

14.  What are the total storage and compute requirements, including future growth?

Why ask?  Core sizing question to ensure adequate storage space is available with the initial purchase and over the expected timeframe. 

15.  What percent of data is considered to be active/hot?

 Why ask?  Understanding the expected active dataset can help with sizing the SSD tier for a hybrid solution.  Performance and statistical collection from an existing environment may help with this determination.

 16.  What is your storage change rate?

Why ask?  Change rate influences snapshot overheads based on retention schedules.  Nutanix Sizer will ask what the change rate is for the dataset to help with determining the storage space impact of snapshot retention.
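A simplified first-order model of that impact (an illustrative assumption, not Sizer’s exact formula):

    # Hypothetical figures for illustration only.
    dataset_tib = 100.0
    daily_change_rate = 0.02          # 2% of the dataset changes per day
    retained_days = 14                # daily snapshots kept for two weeks
    snapshot_overhead_tib = dataset_tib * daily_change_rate * retained_days
    print(snapshot_overhead_tib)      # -> 28.0 TiB held by snapshots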

 17.  Do you have any storage efficiency details from the current environment (dedup, compression, etc.)?

Why ask?  Helps to determine if data reduction techniques like dedup and compression are effective against the customers data.  Files does not support the use of deduplication today, so any dedup savings should not be taken into account when sizing for Files.  If the data is compressible in the existing environment it should also be compressible with Nutanix compression.

 18.  What is the block size of the current solution (if known)?

Why ask?  Block size can impact storage efficiency.  A solution which has many small files with a fixed block size may show different space consumption when migrated to Files, which uses variable block lengths based on file size.  For files over 64KB in size, Files uses a 64KB block size.  In some cases a large number of large files have been slightly less efficient when moved to Nutanix Files.  Understanding this up front can help explain differences following migrations.
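For intuition, a simplified sketch of the 64KB rounding for larger files (small-file handling uses variable block lengths and is not modeled here):

    import math

    BLOCK = 64 * 1024   # 64KB block size for files over 64KB

    def consumed_bytes(file_size):
        """Space consumed when a file is rounded up to whole 64KB blocks."""
        if file_size <= BLOCK:
            return file_size   # simplification: variable blocks below 64KB
        return math.ceil(file_size / BLOCK) * BLOCK

    # A 1,000,100-byte file consumes 16 blocks = 1,048,576 bytes (~4.8% overhead).
    print(consumed_bytes(1_000_100))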

19.  Is there a requirement for Self Service Restore (SSR)?

Why ask?  Nutanix Files uses two levels of snapshots, SSR snapshots occur at the file share level via ZFS.  These snapshots have their own schedule and Sizer asks for their frequency and change rate under “Nutanix Files Snapshots.”  The schedule associated with SSR and retention periods will impact overall storage consumption. Nutanix Files Snapshots increase both the amount of licensing required and total storage required, so it’s important to get it right during the sizing process.

 20.  What are the customer’s Data Protection/Disaster Recovery requirements and what is their expected snapshot frequency and retention schedule (hourly, daily, weekly, etc.)?

Why ask? Data Protection snapshots occur at the AOS (protection domain) level via the NDSF.  The schedule and retention policy are managed against the protection domain for the file server instance and will impact overall storage consumption.  Sizer asks for the local and remote snapshot retention under “Data Protection.”
Files supports 1hr RPO today and will support near-sync in the AOS 5.11.1 release in conjunction with Files 3.6.  Keep in mind node density (raw storage) when determining RPO.  Both 1hr and near-sync RPO require hybrid nodes with 40TB or less raw or all flash nodes with 48TB or less raw.  Denser configurations can only support 6hr RPO.  These requirements will likely change so double check the latest guidance when sizing dense storage nodes. Confirm that underlying nodes and configs support NearSync per latest AOS requirements if NearSync will be used.

 21. Does the customer have an Active/Active requirement?

Why ask?  If the customer needs active/active file shares in different sites which represent the same data, we need to position a third party called Peer Software.  Peer performs near real-time replication of data between heterogeneous file servers.  Peer utilizes Windows VMs which consume some CPU and memory that you may want to size into the Nutanix clusters intended for Files.

Files 5.0 introduced an active/active solution called VDI sync, specific for user profile data.  The solution supports activity against user specific profile data within one site at a time.  If the user moves to another site, the VDI session can follow and localize access for that user.

 22. Is there an auditing requirement? If so, which vendor or vendors?

Why ask?  Nutanix is working to integrate with three main third-party auditing vendors today: Netwrix (supported and integrated with Files), Varonis (working on integration) and Stealthbits (not yet integrated).  Nutanix Files also has a native auditing solution in File Analytics.

Along with ensuring audit vendor support, a given solution may require a certain amount of CPU, memory and storage (to hold auditing events).  Be sure to include any vendor-specific sizing in the configuration.  File Analytics, for example, could require 8 vCPU, 48GB of memory and 3TB of storage.

Data Lens is a SaaS offering in the public cloud, so you will need to ensure the customer is comfortable with a cloud solution. 

23. Is there an Antivirus requirement? If so, which vendors?

Why ask? Files supports specific Antivirus vendors today with respect to ICAP integration.  For a list of supported vendors see the software compatibility matrix on the Nutanix Portal and sort by Nutanix Files:

https://portal.nutanix.com/page/documents/compatibility-interoperability-matrix/software

If centralized virus scan servers are to be used you will want to include their compute requirements into sizing the overall solution.

 24. Is there a backup requirement? If so, which vendor or vendors?

Why ask?  Files has full change file tracking (CFT) support with HYCU, Commvault, Veeam, Veritas and Storware.  There are also vendors like Rubrik who are validated but do not use CFT.  If including a backup vendor on the same platform, you may need to size for any virtual appliance which may also run on Nutanix. 

25. Is the customer using DFS (Distributed File System) Namespace (DFS-N)?

Why ask?  Less about sizing and more about implementation.  Prior to Files 3.5.1, Files could only support distributed shares with DFS-N.  Starting with 3.5.1, both distributed and standard shares are fully supported as folder targets with DFS-N.

Files 5.1 introduced a native unified namespace to combine different file servers into a common namespace.  

 26.  Does the customer have tiering requirements?

Why ask?  Files supports tiering which means automatically moving data off Nutanix Files and to an S3 compliant object service either on-premises or in the cloud.  In scoping future requirements, customers may size for a given amount of on-premises storage and a larger amount of tiered storage for longer term retention.

Server Virtualization Discovery Guidance (Revised 4/15/2025)

Introduction – Please Read First

The questions here are to assist with ensuring that you’re gathering the necessary information from a customer/prospect to provide an appropriate solution to meet their requirements.  This is in addition to capturing specific metrics from tools such as Nutanix Collector or RVTools.   

The list is not exhaustive and will need to be adapted to the appropriate audience.  It should be used as a guide to make sure you’ve conducted a thorough discovery.  It is important that you don’t just ask a question without understanding the reason why and why it matters – this leads to providing an optimal solution. 

Always ask open questions (“Tell me more…”) and, where possible, avoid talking about Nutanix products and capabilities so as not to derail the information gathering.  If asked, suggest this will form part of a follow-up workshop. 

Questions marked with an asterisk (*) may require the assistance of a Portfolio Specialist or Solution Architect to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new scenario.  The questions will help guide you to completing the customer’s requirements, constraints, assumptions, and risks for your opportunity.   

This is a live document, and questions will be expanded and updated periodically.
 

Revision History 

Revision 2025.1 Darren Woollard March 19, 2025

Initial Publication Lane Leverett November 2020
 


Server Virtualization

Generic & State of The Union Questions


1. What does your server virtualization environment look like today? 

       a. Hardware – single or multiple vendors?  By choice, risk mitigation or from mergers? 

       b. Software – the hypervisor and the eco-system that surrounds it (not just the hypervisor components, think of self-service, build automation, micro-segmentation, ticketing workflow, backup, etc…) 

2. What do you find most challenging about your current virtualization environment (examples below)? 

      a. Management? 

      b. Business/people process is manual (causing slow turnaround, bad perception of I.T.) 

       c.  Upgrading/patching across multiple sites 

        d.  Needing 3rd party software to complete certain tasks 

3. In your virtualized environment, what keeps you up at night? 

4. What is working well in your current virtualization environment that you want to ensure is continued? 

5. What does your cloud strategy currently look like?
      a. Private Cloud, Hybrid Cloud, Public Cloud, Multi-Cloud, “We’re Cloud First”
            i.   Why has this strategy been chosen?
            ii.  Who is directing/championing this strategy? 

6. What is the desired position when it comes to utilizing Public Cloud provider services in the next, say, 1-3 years?
      a. Is a distributed multi-cloud operating model perceived to be the best way to deliver the services of the business?        

       b. Is there a preference of a specific Public Cloud provider? 

      c. Will some services remain on-premises and some within the Public Cloud? 

       d. Will some services transform to SaaS offerings during a digital transformation project removing the need for the on-prem application? 

       e. What are the top 3 concerns about operating in a distributed ‘Cloud’ model? 

 Architecture/Solution Specific Questions

1. Do you have a preferred  x86 server vendor standard?
     a. Are you happy with this vendor?
     b.  If so, what do you enjoy/appreciate the most?
     c. If not, what do you find the most challenging? 

2. What is your preferred storage vendor for virtualization?
     a. Are you leveraging RDM (Raw Device Mappings) for your workloads?
     b. If so, can you please provide some workload examples?
           i.  Oracle, MS CSV’s, SCSI-3 Shared Devices 

3. What is your preferred storage vendor for physical workloads? 

       a. How do the physical servers connect to the storage presentation? 

       b.  What is the storage presentation protocol to these devices (iSCSI, NFS, etc…) 

4. How do you currently connect your storage to your x86 servers?
     a.  NFS, FC, FCOE, iSCSI 

 5.  What SAN/Storage hardware is in place today? 

       a. HDD/Hybrid/All Flash/etc…? 

       b. How many spindles of each? 

       c. How many Controllers/Storage Processors? 

6. What does the logical disk layout look like? 

       a. RAID Level? 

       b. Number of disks per RAID Group? 

7. Who is your preferred hypervisor vendor and what version(s) are deployed? 

       a. If multiple vendors are used, is this due to architectural reasons? 

8. How open would you be to considering other hypervisors? 

9. Who is your preferred networking vendor? 

10. Are you using traditional 3-Tier networking or Leaf-Spine networking? 

11. What does your networking architecture/rack design look like? 

12. Are you integrating hypervisor networking and what are your current networking standards?
        a.  Cisco ACI / VMware NSX / Arista / Cumulus 

13. A collection of data from your current environment is preferred so that a point-in-time capture can be reviewed.  Can we use Nutanix Collector, RVTools (VMware estates), Dell LiveOptics, Microsoft MAP, Oracle AWR, or any other inventory collection you may have used, to gather at a minimum the following information?
      a.  # of Virtual Machines
      b.  # of vCPUs
      c.  Current vCPU to Physical Core oversubscription
      d.  Current Physical CPU Model In hosts (for SpecInt Sizing/Comparison- http://ewams.net/?view=How_to_Size_the_CPUs_of_New_Systems_Using_my_Specint_Rated_Tool )
      e.  Allocated memory
      f.  Provisioned storage
      g.  Consumed storage
      h.  Largest vCPU allocation (for NUMA design)
      i.  Largest Memory allocation (for NUMA design)
      j.  Working set size (what will sit in SSD for Hybrid)-can be determined from daily incremental backups (https://www.joshodgers.com/2014/09/25/rule-of-thumb-sizing-for-storage-performance-in-the-new-world/ ) 

Backup & Data Protection Questions

See BCDR Discovery Guidance Doc

Automation

Introduction – Please Read First

These questions are here to assist with ensuring that you’re gathering necessary information from a customer/prospect in order to put together an appropriate solution to meet their requirements in addition to capturing specific metrics from tools like Collector or RVTools. 

This list is not exhaustive, but should be used as a guide to make sure you’ve done proper and thorough discovery.  Also, it is imperative that you don’t just ask a question without understanding the reason why it is being asked.  We’ve structured these questions with not only the question that should be asked, but why we are asking the customer to provide an answer to that question and why it matters to provide an optimal solution. 

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity. 

This is a live document, and questions will be expanded and updated over time.


Calm

Discovery Questions

1.  How are you currently automating IT Service Delivery today? Do you have any:
     a.  IAAS – Infrastructure as a service
     b.  PaaS – Platform as a service
     c.  SaaS – Software as a service

Why ask?  It helps us understand the customer’s maturity level when it comes to application deployment and could uncover some of the competitive infrastructure, as well as other products we may be able to work with or integrate with.

2.  Are standardization and compliance important to you in your IT Automation Delivery Strategy?
      a.  Do you currently use a business intake or self-service request process via a solution such as ServiceNow, Cherwell, Remedy, etc. to automate IT service delivery?

Why ask? Gives us the opportunity to discuss our SNOW plugin. Also helps understand which front end they will use for the Calm implementation.

3. Do you have any contracts with the cloud providers (AWS, Azure or GCP)?
a.  What are the specific use cases or workload profiles consumed from the cloud providers?

Why ask? Helps us understand which providers they may consume with Calm. Helps us understand which services are still on-prem and available as a target for AOS. May help position Beam. Also helps us understand if they have a Microsoft EA which may force their spend to go to Azure.

4.  Can you describe the process, and do you have any documentation, for VM, OS, or Database Deployment and Management?

Why ask? It helps uncover their current pain points and possibly the competitive landscape.  (This would typically be asked when talking to the Infrastructure Team.) If the process is already well documented/defined, the hardest part of the implementation is already done.

5.  What tools do you leverage to automate your Windows or Linux Server builds beyond the imaging / template / cloning process?
      a.  vRA
      b.  Terraform
      c.  Puppet
      d.  Chef
      e.  Ansible
      f.  Salt
      g.  SCCM

Why ask? Helps us understand the competitive landscape, as well as integration points that will need to be addressed.

6.  How many VMs are under management today?

Why ask? It helps us estimate the size of the deal for licensing

7.  What does your infrastructure footprint for managing/running containers look like?
      a.  What tools are you using?
      b.  How many containers?
      c.  If Kubernetes, how many pods and containers?
      d.  Which Kubernetes distribution? (AKS/EKS/Anthos/OpenShift/Tanzu/etc.)

Why ask? Helps understand their current place on the journey to cloud native apps.  If they are still investigating, we have an option to position Karbon. If they are using another product already, we may be able to provide the infrastructure for that environment. 

8.  In your application development organization, is Continuous Integration/Continuous Delivery (CI/CD) an operating principle?
      a.  What tools do you leverage in your current/targeted pipeline?
            i.    Jenkins
            ii.   Atlassian Bamboo
            iii.  CircleCI
            iv.  GitLab CI/CD
            v.   Azure DevOps

Why ask? Helps us understand the integrations needed for a successful implementation.

Resources:

Glossary of Terms: https://github.com/nutanixworkshops/calmbootcamp/blob/master/appendix/glossary.rst 

xPert Automation team page: http://ntnx.tips/xPertAutomation (Internal Only)

LinkedIn Learning – DevOps Foundations Learning Plan: https://www.nutanixuniversity.com//lms/index.php?r=coursepath/deeplink&id_path=79&hash=2ce3cb1f946cc3770bd466853e68ee36ddbcf5e1&generated_by=19794

Udacity+Nutanix: Hybrid Cloud Engineer Nanodegree

Calls to action/next steps:

1.  Create a SFDC opportunity, quote a Calm+Services bundle, add a DevOps resource request
2.  Test Drive: Automation
3.  Calm bootcamps (+Karbon, +CI/CD, etc.) (Internal Only)

End User Computing Discovery Guidance (Revised 4/15/25)

Introduction – Please Read First

These questions are here to help ensure you gather the necessary information from a customer/prospect to put together an appropriate solution that meets their requirements, in addition to capturing specific metrics from tools like Collector or RVTools.

This list is not exhaustive, but should be used as a guide to make sure you’ve done proper and thorough discovery.  Also, it is imperative that you don’t just ask a question without understanding the reason why it is being asked.  We’ve structured these questions with not only the question that should be asked, but why we are asking the customer to provide an answer to that question and why it matters to provide an optimal solution.

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity.  

This is a live document, and the questions will be expanded and updated over time.

Revision History
2025.1  1st Revision – Kees Baggerman, Thomas Brown – 4-15-25
Nov 2020 Initial Publication – Lane Leverett


Basic discovery

1. What is the expected type of EUC workload?

Why Ask? Are we talking about VDI (Full Desktop), RDSH (Shared Desktop), or Application Virtualization (like MSIX, App-V, or Horizon App Volumes)? Please keep in mind that you need to ask this question for every different workload the customer needs. In most EUC projects there is not just one type of user requirement. You will find a lot of mixed workloads, like persistent and non-persistent desktops as well as application virtualization.
NCI-VDI licensing can help if the customer wants to run resource-intensive VDI workloads, like developers or VDI with vGPUs and DR scenarios.  

For more information about NCI-VDI: https://www.nutanix.com/library/datasheets/nci-vdi  

2. Which vendor is used to manage and broker Desktops and Apps?
 

Why ask? The main vendors are Citrix and Omnissa. Each vendor has its own display protocol, which makes a difference in CPU usage: Citrix = HDX, Omnissa (formerly VMware) = Blast Extreme.

3. What is the expected Operating System Version? 

Why ask? Every new version of Windows has higher CPU and Memory requirements. Comparing an older Windows version to the latest version can make a big difference.  

Windows 10 Performance Impact Analysis: https://portal.nutanix.com/page/documents/solutions/details?targetId=TN-2113-Windows-10-Performance-Impact:TN-2113-Windows-10-Performance-Impact 

Overall Performance Impact Analysis: 

End-User Computing Performance Impact Analysis 


4.  What Office version is used? 

Why ask? For Microsoft Office, it is the same issue as Windows versions. Newer versions need more resources.  Office 2019 Performance Impact Results from LoginVSI: https://www.loginvsi.com/login-vsi-blog/98-login-vsi/907-office-2019-performance-impact 

Overall Performance Impact Analysis: 

End-User Computing Performance Impact Analysis 


5. What other applications will be used? 

Why ask? The applications used have a strong impact on the CPU. There might be single-threaded applications that need high clock speeds, or multi-threaded applications that need a higher core count. Think Microsoft Teams or Zoom, but also specific LOB applications like the Epic client, CAD/CAM apps, or Bloomberg Terminal.

6. What is the expected type of user? 

Why ask? In Sizer, we ask for user types: task, knowledge, power user, or developer. Every user type comes with a specific workload profile – memory, # vCPUs, vCPU:pCPU ratio, disk size. Sometimes the customer can give you details on the VM sizing, but not on the expected vCPU:pCPU ratio. Depending on the workload expected, you can set the ratio.

For more information read here:  


7. How many concurrent users will you have on that workload? 

Why ask? Concurrent users defines the number of active users: how many VMs need to run at the same time? Our VDI licensing is based on active VMs. You can have more users in an environment, but they can share resources if they don’t work on the platform at the same time.  This impacts how you size compute and memory, but remember that storage may be needed for all possible users.
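
To show how these answers combine, here is a minimal sizing sketch in Python. It is not Sizer's actual algorithm; the example numbers (3 vCPUs per user, a 4:1 vCPU:pCPU ratio, 32-core nodes) are assumptions for illustration.

import math

# Estimate node count from concurrent users and an assumed per-user profile.
def nodes_needed(concurrent_users, vcpus_per_user, vcpu_pcpu_ratio,
                 cores_per_node, ha_spares=1):
    pcores = concurrent_users * vcpus_per_user / vcpu_pcpu_ratio
    return math.ceil(pcores / cores_per_node) + ha_spares  # N+1 by default

# 500 concurrent knowledge workers, 3 vCPUs each, 4:1 ratio, 32-core nodes.
# Compute/memory follow concurrency; storage may be needed for ALL users.
print(nodes_needed(500, 3, 4, 32))  # -> 13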

8. What provisioning method is used? 

Why ask? Depending on the workload, the VMs can be persistent or non-persistent. Persistent desktops are treated like normal VMs. Non-persistent VMs have a different storage footprint since they share a single boot disk and have additional write-cache disks, which are deleted after a VM reboots. Citrix uses MCS (Machine Creation Services) or PVS (Provisioning Services). Omnissa uses Instant Clones.

9. Where are the user profiles stored? 

Why ask? Using our own Files solution, we can provide storage for user profiles. Today, you will mostly encounter FSLogix profile containers or Citrix Profile Management, which still need an SMB share to be stored on and loaded during user logon. 

For more information: 


10. Do you need additional GPU support?  * (This may warrant engaging with a Solutions Architect or EUC Specialist for proper sizing and configurations) 

Why ask? To accommodate applications like CAD, or requirements for multiple monitors and high resolutions, you need to add NVIDIA GPUs. An overview of vGPU profiles can be found here:

11. Are there any other special requirements?

Why ask? Does the customer need RF3, block awareness, rack awareness, storage encryption, or replication?

VM Details

1. How many vCPUs? 

Why ask? The number of vCPUs impacts the performance of the VM and the density of the host. Solution Engineering found the sweet spot to be 3 vCPUs for a Windows 10 desktop, 4 vCPUs for Windows 11, or 8 vCPUs for Windows Server-based desktops.

2. What is the ratio between vCPU to pCPU? 

Why ask? See question 6 

3. What is the requested CPU size per User? 

Why ask? How many MHz does every user need to run the workload? It is very rare that a customer can answer this. It is more common in application virtualization environments, where a number of users share the same VM and its resources. 
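
When a customer can supply a per-user MHz figure, a back-of-the-envelope density estimate looks like the sketch below; the 20% headroom reserved for the hypervisor/CVM is an assumption, not a fixed rule.

# Rough users-per-host estimate from a per-user CPU (MHz) requirement.
def users_per_host(cores, clock_ghz, mhz_per_user, headroom=0.2):
    usable_mhz = cores * clock_ghz * 1000 * (1 - headroom)  # reserve headroom
    return int(usable_mhz / mhz_per_user)

print(users_per_host(cores=32, clock_ghz=2.5, mhz_per_user=300))  # -> 213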

4. What CPU is currently in use? 

Why ask? If the CPU currently used can handle it, you could choose the same clock speed. But keep in mind that in a virtualized environment, resources are shared, and you might have additional tasks running, like Files, which can impact the CPU. 

5.  How much Memory is required per VM? 

6. What is the disk size? 

Why ask? Depending on the provisioning method used, you need the size of the master image (also called the parent disk or sandbox) and the write cache per VM. The write cache stores the temporary files written while the VM is active. For persistent VMs you need the disk size of the master image, which is then cloned into separate VMs.
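
The footprint difference can be sketched as follows (illustrative only; real environments add snapshots, profile storage, and RF overhead on top of these raw numbers):

# Illustrative storage footprint (GiB) per provisioning model.
def storage_gib(n_vms, master_gib, write_cache_gib, persistent):
    if persistent:
        # Every VM is a full clone of the master image.
        return n_vms * master_gib
    # Non-persistent: one shared boot disk plus a write-cache disk per VM.
    return master_gib + n_vms * write_cache_gib

print(storage_gib(500, 80, 20, persistent=False))  # -> 10080
print(storage_gib(500, 80, 20, persistent=True))   # -> 40000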

7. Are you planning to use microsegmentation to secure your VMs? If yes, what solution will be used?  * (This may warrant engaging with a Flow or Networking specialist or Solutions Architect) 

Why ask? Position Flow on AHV, or remember to size for an NSX appliance on every host.

For more information: 


8. Are you planning on using an App layering solution? 

Why ask? This saves the customer the need to manage many different master images. With App layering, you have one master image, and when a user logs in, the system will automatically attach additional disks containing the required applications. We can use shadow cloning to make those disks available locally. 

Read: 


General supporting Infrastructure

1. What Hypervisor are you planning to use?

Why ask? Different Hypervisors have different needs. If the customer chooses VMware, we may need to accommodate vCenter. 

2. Where will generic required services run? 

Why ask? By generic services, we mean AD, DHCP, DNS, printing, licensing, or application backend services. Some of them might be running in the cloud or on existing infrastructure. If they will run on the Nutanix cluster, take note of their size and see the Server Virtualization questions.

3. Where will user profiles, home shares or App disks (if used) be stored?

Why ask? This is an opportunity to position Files. Today, customers usually use a profile container to store user profiles. FSLogix is the most common solution used by customers since it is included in their licensing. Please be aware that Files Services running on the same cluster do have a performance penalty during login times. 

4. What is your DR strategy? 

Why ask? Every customer needs a DR strategy for their EUC environment. A great question to position NC2, replication, and our unique VDI licensing approach. You also need to calculate additional resources in your sizing, depending on the customer’s strategy.

More information: 


Citrix Infrastructure

1. Where do you plan on running your Citrix services? 

Why ask? Customers can choose to run all Citrix-related services (like Studio, databases, StoreFront) as a service in the cloud, managed by Citrix, or on-premises. If the customer chooses to run the services on premises, they can still run them on different infrastructure. If they choose to run them on the same cluster, please size additional server virtualization VMs. Guidelines on VM requirements can be found here: https://docs.citrix.com/en-us/citrix-virtual-apps-desktops/system-requirements.html
If the customer chooses the Citrix Virtual Apps and Desktops (CVAD) service, you still need an additional Windows server as a Cloud Connector.

A typical on-premises implementation would need the following servers:
SQL deployment. What type of HA? (Always On, SQL Clustering with WSFC)
StoreFront Servers (HA, N+1)
Citrix Desktop Studio and Director (N+1)
Optional Provisioning Server (PVS) (N+1)
Network Load Balancer
Global Server Load Balancing
Profile Management Infrastructure (File services)
AppLayering Infrastructure
Any Endpoint Management technologies?
Failure domain sizes (Prism Central sizing)
Dedicated Infra Management Cluster or part of the Citrix cluster 

VMware Infrastructure

1. Where do you plan on running your Omnissa Horizon Services? 

Why ask? Customers can choose to run all Horizon-related services as a service in the cloud, managed by Omnissa, or on-premises. If the customer chooses to run the services on premises, they can still run them on different infrastructure. If they choose to run them on the same cluster, please size additional server virtualization VMs.

Guidelines on VM requirements can be found here: https://docs.vmware.com/en/VMware-Horizon-7/7.12/horizon-installation/GUID-858D1E0E-C566-4813-9D53-975AF4432195.html 

A typical on-premises implementation would need the following servers:
SQL deployment. What type of HA? (Always On, SQL Clustering with WSFC)
Unified Access Gateway Appliances (N+1)
vCenter (N+1)
Horizon Connection Server (N+1)
Optional View Composer (N+1)
Profile Management Infrastructure (File services)
AppLayering Infrastructure
Any Endpoint Management technologies?
Failure domain sizes (Prism Central sizing) 

Advanced Discovery

1. How do you optimize your image? 

Why ask? Image optimization is crucial in all EUC environments. Optimizing the VM using tools provided by Citrix, VMware, or independent third-party vendors increases host density and improves user experience.
Citrix: https://support.citrix.com/article/CTX224676
VMware: https://flings.vmware.com/vmware-os-optimization-tool 

2. What is your Antivirus Strategy? 

Why ask? The right AV solution can also have a massive impact on user experience and host density. If not done correctly, all file operations lead to file scans, which increase CPU and I/O on the host.

Databases

Introduction – Please Read First

These questions are here to help ensure you gather the necessary information from a customer/prospect to put together an appropriate solution that meets their requirements, in addition to capturing specific metrics from tools like Collector or RVTools.

This list is not exhaustive, but should be used as a guide to make sure you’ve done proper and thorough discovery.  Also, it is imperative that you don’t just ask a question without understanding the reason why it is being asked.  We’ve structured these questions with not only the question that should be asked, but why we are asking the customer to provide an answer to that question and why it matters to provide an optimal solution. 

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity. 

This is a live document, and questions will be expanded and updated over time.


Databases

Generic

1.  Is this replacing a current solution, or is this a net new project?  What’s the current solution?

Why ask? This question helps us understand the use case, any current expectations, and what the competitive landscape may look like.

2.  Is the current environment coming to the end of a contract and due for contract renewal/hardware refresh?  How soon?

Why ask?  It helps us understand how serious the customer is about migrating and the drivers (usually cost), and it helps create a pipeline.

3.  Is the current infrastructure solution bare metal/3-tier, virtualized, or an engineered appliance?  Please provide details.
      a.  e.g. AIX, Solaris SPARC, VMware virtualized, OVM/KVM, Exadata/ODA (Oracle)
      b.  FC SAN/speed/10GbE Ethernet/iSCSI; storage array: vendor, all-flash/hybrid
      c.  If possible use automated means to capture configuration and performance information to help with capturing as much information as possible (RVTools, Collector, AWR, LiveOptics, MAP, etc.)

Why ask?  To determine which environment is easier to go after as a starting point.

4.  What are the 3 major pain points in the current environment(s) (other than end of life/contract)?  Examples:
      a.  License Consolidation
      b.  Managing multiple GUIs (need for a single pane of glass)
      c.  Life Cycle Management/Patching
      d.  Performance
      e.  Storage Sprawl due to multiple copies
      f.  Provisioning

Why ask?  Helps us articulate the Nutanix value for relational database workloads.

5.  How many sites/environments (PROD/DR/QA/DEV/Test)?

Why ask?  Helps us articulate a disaster recovery/backup strategy.

6.  How are backups done today: native or 3rd-party tools, leveraging snaps/clones?

Why ask?  Determines whether they use third-party DR tools (Zerto/Actifio/SRM) or native database replication, and whether they use third-party backup (Commvault/Veeam/Veritas) or native tools.

7.  Workload types: OLTP (Online Transactional Processing) / OLAP (Online Analytical Processing) / DWH (Data Warehouse)?

Why ask?  Helps us distinguish transactional (OLTP) from analytical (OLAP/DWH) workloads and their latency sensitivity.

8.  Largest Database size?

Why ask?  Beyond 30 TB, virtualizing on hyperconverged infrastructure may not be beneficial; we need to understand the use case.

9.  Performance characteristics desired: bandwidth / IOPS / latency.  These can be given directly by the customer if known, or gathered using local operating system metrics (perfmon/top), a discovery tool or script like AWR for Oracle, or a tool like LiveOptics, SolarWinds, etc.

Why ask?  Accurate sizing

10.  Type of database clustering used, if any?

Why ask?  Determines if there are potentially any mission-critical workloads.

MSSQL

SQL Server Inventory Questions:

1.  Number of SQL Server Instances in the environment?

Why ask?  Inventory purposes; also, Era only supports a single SQL Server instance per host.

2.  Number of SQL Server databases in the environment?

Why ask?  Inventory purposes; it also helps identify which databases are considered critical for AGs (Always On Availability Groups), etc. Databases reside within an instance.

3.  Total size of SQL Server databases in the environment?

Why ask?  Inventory sizing purposes.

4.  SQL Server versions used in the environment?

Why ask?  Different SQL Server versions have different features, limitations, and cumulative update (CU) levels. Microsoft stopped issuing service packs with SQL Server 2016; everything now ships as CUs. See the external SQL Server Edition and Version Comparison.

5.  Windows versions used in the environment?

Why ask?  Different Windows versions have different features, limitations, and update levels that may affect SQL Server, as well as different driver versions, etc.

6.  SQL Server licensing model used in the environment: Core or Server/CAL?

Why ask?  This can help determine which licensing model the customer is using and why.

7.  SQL Server High Availability and Disaster Recovery being used in the environment?   *(Depending on the complexity for HA or DR, this would warrant further discussion with a Database Specialist/Solutions Architect)

Why ask?  This can help determine if shared storage is used, as with a SQL Server Failover Cluster Instance (FCI), or a SQL Server Always On Availability Group (AG), which does not require shared storage.  It also determines whether any multi-site replication is used, either at the physical storage layer or the logical SQL Server layer.

8.  CPU model, type, speed allocated for current/existing SQL Server hosts?

Why ask?  Inventory sizing purposes for baseline.

9.  Number of CPU/Cores allocated for SQL Server hosts?

Why ask?  Inventory sizing purposes for baseline.

10.  Amount of Memory allocated for SQL Server hosts?

Why ask?  Inventory sizing purposes for baseline.

11.  Amount of storage allocated for SQL Server hosts?

Why ask?  Inventory sizing purposes for baseline.

12.  Storage type used for SQL Server hosts (flash, HDD, DAS, SAN, etc.)?

Why ask?  Inventory sizing purposes for baseline; helpful in determining expectations with regard to latency.

13.  Network allocation (speed, number of NICs) for SQL Server hosts?

Why ask?  Inventory sizing purposes for baseline.

SQL Server Performance Questions:

1.  What is the total max IOPS required for all SQL Server Instances?

Why ask?  The number of I/O service requests to use as a baseline for their current workload.

2.  What is the latency requirement for SQL Server?

Why ask?  The response time requirement to use as a baseline for their current workload.

3.  What is the bandwidth requirement for SQL Server both read/write?

Why ask?  The throughput requirement to use as a baseline for their current workload.

4.  What is the current SQL Server workload profile read/write ratio?

Why ask?  This helps determine what their workload profile is like and how it will affect our platform (reads are local, writes incur node replication cost) as a baseline.

5.  What is the SQL Server average IO size?

Why ask?  This helps determine their workload profile I/O size as it relates to bandwidth.

6.  Top current SQL Server wait statistics during peak workload?

Why ask?  This helps determine what SQL Server is waiting on to process transactions, where there may be a bottleneck.

7.  Current customer Microsoft SQL Server pain points?

Why ask?  This helps narrow the focus and develop a relationship with the customer.  It also assists in focusing on how Nutanix can help alleviate those specific pain points and gives information about how the solution can be shown to resolve those particular pain points.

Oracle

1.  License entitlement (Cores/NUPs/ELA/ULA/bundled licensing)?

Why ask?  Oracle licensing is expensive, and customers want to make the best use of their entitlement when replatforming rather than spend more on new licensing for a new solution.  Customers are also looking to reduce their Oracle license overhead.

2.  Type of licensing used: Standard/Enterprise, plus other options (RAC, Partitioning, etc.)?  Each option is a paid item.

Why ask?  There may be possibilities to eliminate some options by using Nutanix features such as compression, encryption, and replication.

3.  Is the customer ready to run a “SQL script” or provide details of the environment using RVTools/Collector?

Why ask? When inventorying an Oracle DB environment, you can use the Automatic Workload Repository (AWR) report to gather detailed inventory and performance statistics for an Oracle Database.  Nutanix has an AWR script that can be run to capture the necessary information and is able to be downloaded from within the Sizer Tool.  When adding a Workload select Import, then click the AWR tab and you will see the AWR SQL Script download link.  Once run, you can then upload the output using the Upload File option.

4.  What are the main pain points in the current environment?

5.  When moving to Nutanix would you consider AHV as a hypervisor?

6.  Have you been introduced to Era?

Era

1.  How do you do DB provisioning today and how long does it take to provision a multi-node database cluster?

Why ask?  To find out the customer’s operational efficiency for provisioning. Era can help improve this from weeks to hours.

2.  How many dev/test copies of databases do you have for your PROD instance(s)?

Why ask?  Customers make multiple full copies of PROD for dev/test and use up to 5-10 times the space they need. Era helps create space-optimized clones of databases rapidly.

3.  What is your typical clone refresh interval and time it takes to refresh a DB clone?

Why ask?  Refreshing a copy of a database from an RMAN backup using traditional techniques takes multiple hours and is usually done once a month.  With Era, they can clone every day, or multiple times a day, in minutes.

4.  How do you do your Database Patching (Oracle)?

Why ask?  Oracle patching is a huge pain point in large Oracle environments. Era provides a unique way to do “fleet patching”, which helps save hundreds of man-hours spent on traditional patching.

5.  How do you migrate databases when required (Oracle)?

Why ask?  Migration is an involved process that requires significant planning and time. Era provides an easy method to replicate and migrate databases (same version) between same-endian formats (Linux->Linux or Windows->Linux).

6.  What is your choice of  Database Replication  (infrastructure/database/hypervisor based)?  Please elaborate.  * (Depending on the complexity of the environment, this would warrant further discussion with a Database Specialist/Solutions Architect)

Why ask?  Customers are looking to reduce the software licensing cost of database replication and will look for opportunities to replicate using infrastructure (Nutanix replication). Era enables cross-cluster replication, including replication to a Nutanix cluster in the AWS cloud in an upcoming 2.0 release.

7.  What database engines do you currently use?

Server Virtualization

What is a Server Virtualization sizing? 

This is the most common workload along with VDI. It can be used for any web app that needs to be sized. Each workload or application to be migrated to the Nutanix software stack is a VM with its own CPU/RAM/capacity requirements. To simplify things for users, Sizer has set profiles (small, medium, large) for the VMs, which are customizable to the actual application needs.


What are profiles in Server Virtualization in Sizer?

Profiles are fixed templates with pre-assigned resources (vCPUs, RAM, SSD, HDD). Broadly, the small, medium, and large profiles have different allocations of these resources.

The idea is to provide users with the details of a workload (that is, a VM) so they can quickly fill in the number of VMs and Sizer will do the necessary sizing.
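
As a rough sketch of what this multiplication looks like, the snippet below totals per-profile resources across VM counts. The profile values here are illustrative placeholders only, not the actual Sizer templates (which are shown below):

# Hypothetical profile values for illustration; actual Sizer templates differ.
PROFILES = {
    "small":  {"vcpus": 2, "ram_gib": 4,  "ssd_gib": 50,  "hdd_gib": 100},
    "medium": {"vcpus": 4, "ram_gib": 8,  "ssd_gib": 100, "hdd_gib": 200},
    "large":  {"vcpus": 8, "ram_gib": 16, "ssd_gib": 200, "hdd_gib": 400},
}

def workload_totals(vm_counts):
    # vm_counts example: {"small": 40, "medium": 10, "large": 5}
    totals = {"vcpus": 0, "ram_gib": 0, "ssd_gib": 0, "hdd_gib": 0}
    for profile, count in vm_counts.items():
        for resource, value in PROFILES[profile].items():
            totals[resource] += count * value
    return totals

print(workload_totals({"small": 40, "medium": 10, "large": 5}))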

Small VM profile template:

Medium VM profile template:

Large VM profile template:


What if my VMs are different and have different values?

While these templates and their values are general guidelines, they are customizable.

Clicking Customize opens a pop-up for user-entered values:

ECX savings

What is ECX (Erasure Coding) in Nutanix?

The Nutanix platform leverages a replication factor (RF) for data protection and availability.  This method provides the highest degree of availability because it does not require reading from more than one storage location or data re-computation on failure.  However, this does come at the cost of storage resources as full copies are required.

To provide a balance between availability while reducing the amount of storage required, DSF provides the ability to encode data using erasure codes (EC).  Similar to the concept of RAID (levels 4, 5, 6, etc.) where parity is calculated, EC encodes a strip of data blocks on different nodes and calculates parity.  In the event of a host and/or disk failure, the parity can be leveraged to calculate any missing data blocks (decoding).

The number of data and parity blocks in a strip is configurable based upon the desired failures to tolerate.  The configuration is commonly referred to as the number of <data blocks>/<number of parity blocks>.

How is ECX savings calculated in Sizer?

Sizer follows the Nutanix Bible and its guidelines for ECX savings.

The table below shows the ECX overhead vs. RF2/RF3 for different node counts:

The expected overhead can be calculated as <# parity blocks> / <# data blocks>.  For example, a 4/1 strip has a 25% overhead or 1.25X compared to the 2X of RF2.  A 4/2 strip has a 50% overhead or 1.5X compared to the 3X of RF3.


How does Sizer calculate ECX savings from the above?

Let’s take an example where the cold data for a workload is 100 TiB.

Also, we will use RF2 as the setting chosen for the workload.

Depending on the size of the workload, suppose the total recommendation came to 4 nodes; as per the above table, data/parity is 2/1.  That is a 1.5x overhead for ECX versus 2x for RF2, thus a savings of 0.5x the data size.

For a conservative approach and to be on the safe side, we only consider ECX for 90% of the cold data.

ECX applied to 90% of 100 TiB = 90 TiB

ECX savings: 0.5 × 90 TiB = 45 TiB
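
The same arithmetic as a small Python sketch, reproducing the worked example under the stated assumptions (RF2, a 2/1 strip, 90% of cold data eligible):

# Reproduce the worked example: 100 TiB cold data, RF2, 2/1 EC strip.
def ecx_savings_tib(cold_tib, rf, data_blocks, parity_blocks, eligible=0.9):
    rf_multiplier = float(rf)                        # RF2 -> 2.0x
    ec_multiplier = 1 + parity_blocks / data_blocks  # 2/1 strip -> 1.5x
    return cold_tib * eligible * (rf_multiplier - ec_multiplier)

print(ecx_savings_tib(100, rf=2, data_blocks=2, parity_blocks=1))  # -> 45.0 TiB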