July 2021 sprints (Thresholds)

Sizer Thresholds – What has changed and Why?

 

What are thresholds in Sizer? 

Sizer has a feature called thresholds. These are defined individually for each of the sizing resources: cores, memory, SSDs, HDDs and GPUs (wherever applicable). These thresholds ensure that the total available resources in the nodes (cluster) are sufficient to meet the workload requirements, and also leave some buffer for unforeseen surges in workload usage.

What has changed in thresholds?

Up until July 2021, the default threshold for cores, memory, SSD and HDD was 95%, as can be seen (and modified) on the policy screen shown below.

Note that the default was set to 95%, which is also the maximum allowed. Users can choose a lower threshold (more conservative sizing with more buffer for future spikes). However, under no circumstances did Sizer allow a threshold higher than 95%, so that a 5% margin always remained to accommodate sizing estimate errors and workload usage uncertainties.

Starting August 2021, Sizer changes the defaults for these thresholds to 85% across ALL resources (cores/memory/SSDs/HDDs), as shown below.

Note that the defaults have moved left to 85%; however, the maximum allowable utilization of the cluster resources remains at 95%.

Why?

Why the change?

Having both the maximum allowable and the default at 95% at times did not provide enough margin for sizing estimate errors or unforeseen workload usage and spikes, as only 5% was left. Given that making accurate estimates is hard, we felt it was prudent to provide more slack with an 85% threshold.

To be clear though, many sizings have been done successfully at the old 95% level. This move was also supported by Sizer users doing manual sizings, who often opted for more slack. The change was made to be more prudent and reduce the risk of any sizing issue.

When is it best to leave it at the 85% threshold?

We feel that for most sizings this is the more prudent level. It allows more room for estimate errors and, for that matter, customer growth.

When might it be fine to go to the 95% threshold?

Certainly, numerous sizings have been done with the 95% threshold and customers were happy, and we still allow 95% as the threshold. These are N+0 thresholds, so at N+1 there is a lot more slack. The 95% level is only reached when one node is taken offline, for example for upgrades. If the customer does upgrades during off-hours, their core and RAM requirements are a lot lower than normal and do not hit the higher threshold anyway. Again, we feel it is more prudent to leave it at 85%. Going higher just means you need to be comfortable with your sizing estimates, especially when the cluster is at N+0 (during an upgrade).

What are the implications to existing sizings? 

First, the new sizings:

All new sizings (effective 9th August 2021) will have default thresholds at 85%. Since this is a significant change that impacts ALL new sizings and ALL users (internal/partners/customers), a BANNER will be displayed prominently for two weeks for general awareness.

 

Implications to existing sizings:

There will be NO impact on sizings created before 9th August 2021. Existing sizings continue with the default threshold of 95%, and their utilisation percentages, N+0/N+1 status, etc. are calculated based on that previous default. Thus, there won’t be any resizing or new recommendation for existing sizings; those sizings and their recommendations hold good for that scenario.

Cloning an existing scenario:

Cloning an existing sizing will be treated as a new sizing created after 9th August 2021, and thus the new sizing rules and default thresholds will apply.

One implication is that utilisation percentages across the cluster resources will increase. This is because only 85% of the resources are now considered available for running the workload, as against 95% earlier. This reservation of an additional 10% of resources may drive a higher node count (or turn an existing N+1 solution into N+0) in some edge cases.
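The effect of the lower default on a cloned sizing can be illustrated with some simple arithmetic. The sketch below is only an illustration under stated assumptions: it treats the threshold as a plain multiplier on raw node capacity, and the workload demand (300 cores) and node size (64 cores) are hypothetical numbers. It is not Sizer's actual sizing algorithm, which also accounts for CVM overhead, resiliency and other factors.

```python
import math

def nodes_required(demand, per_node_capacity, threshold):
    """Smallest node count whose usable capacity (capacity * threshold) covers the demand."""
    usable_per_node = per_node_capacity * threshold
    return math.ceil(demand / usable_per_node)

def utilisation_pct(demand, per_node_capacity, nodes, threshold):
    """Demand as a percentage of the capacity considered available under the threshold."""
    return 100.0 * demand / (per_node_capacity * nodes * threshold)

demand_cores = 300    # hypothetical workload requirement
cores_per_node = 64   # hypothetical node size

# Original sizing at the old 95% default.
original_nodes = nodes_required(demand_cores, cores_per_node, 0.95)

# The same node count evaluated at both thresholds, as happens on a clone.
for threshold in (0.95, 0.85):
    pct = utilisation_pct(demand_cores, cores_per_node, original_nodes, threshold)
    print(f"{original_nodes} nodes at {threshold:.0%} threshold: {pct:.1f}% utilised")

# Node count if the clone is resized at the new 85% default.
print("nodes at 85%:", nodes_required(demand_cores, cores_per_node, 0.85))
```

With these hypothetical numbers the original node count no longer fits under the 85% threshold, which is the edge case described above where a clone ends up with one more node.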

The user can choose to resize for the new defaults, which may lead to a higher node or core count; that is for the better, as explained above, since it provides margin for estimate errors and spikes. Alternatively, since the clone is of an existing sizing which may already have been sold to the customer, the user can go to the threshold setting and move it to the right, back to 95%, which gives the same recommendation as the original sizing.

Collector 3.4.1

Nutanix Collector 3.4.1 is a version of Collector that fixes a few critical issues. Along with Collector 3.4.1, there have been continuous updates to Collector Portal as well. The details of the resolved issues are captured below:

RESOLVED ISSUES

Resolved an issue where the collection fails in Hyper-V due to hosts with missing fields.

Resolved an issue with the vCenter collection where the IOPS data is missing.

Resolved an issue in the computation of Median, Average, and the 95th percentile values of the VM Summary Performance Chart.

WHAT’S NEW

vCPU:pCore ratio now available in the VM Summary tab

PORTAL ENHANCEMENTS

Enabled sorting and filtering for all the fields in the VM list tab – both in config view & performance view.

Ability to customize the criteria of VM Provisioning Status

vCPU:pCore ratio now available in the VM Summary tab

Other Resources

For more information please refer to Nutanix Collector 3.4.1 User Guide, Security Guide, Release Notes & Nutanix Collector Portal User Guide

For any other queries related to Nutanix Collector, please refer to Collector FAQs

Download URLs

Public URL: Collector Download Link for Prospects
MyNutanix URL: Collector Download Link for MyNutanix Users 

Collector 3.4

Nutanix Collector 3.4 is an updated version that provides the provisioning status of the VMs and contains multiple bug fixes as well as security enhancements. Along with Collector 3.4, there have been continuous updates to Collector Portal as well. The details of the new functionalities and benefits are captured below:

WHAT’S NEW

VM Provisioning Status in VM Summary & VM List

A new donut chart providing a breakdown of VMs by Provisioning Status is now available in the VMs Summary tab.

VM Provisioning Status column is also added to the VM List tab. The column lists the provisioning status of all the VMs.

Sorting of VM List

The VM List tab can now be sorted by column values, in addition to the existing filtering capability.

Collector output exported as XLSX now includes the VM List sheet

Exported XLSX includes an additional sheet for VM List that allows you to look at individual VMs and their attributes.

VM level IOPS value available for vCenter Server

In the case of vCenter environments, the VM level IOPS values are available.

Updated column headers in XLSX export

Improved the column headers in exported XLSX by adding relevant units and renaming a few column names.

Collector output exported as XLSX now includes a Metadata sheet

Exported XLSX includes a Metadata sheet that captures information relating to the version of the Collector used to capture the data.

SECURITY ENHANCEMENTS

Windows Bits are now signed

The Collector bundle for Windows is now digitally signed to confirm the software author and guarantee that the code has not been altered or corrupted.

AES asymmetric encryption for enhanced security

AES asymmetric encryption is used for enhanced security while uploading the Collector data to the Collector Portal.

Collector download bundle now includes the Collector Security Guide

The Collector Security Guide is now included in the Collector download bundle.

RESOLVED ISSUES

Resolved an issue that caused login failures when non-ASCII characters were used in the password.

Resolved the drive capacity and quantity issues in the “Host Summary” screen in the case of Prism environments.

The column header units in the XLSX export are updated from MB to MiB for accuracy.

Other Resources

For more information please refer to Nutanix Collector 3.4 User Guide, Security Guide, Release Notes & Nutanix Collector Portal User Guide.

For any other queries related to Nutanix Collector, please refer to Collector FAQs.

Download URLs

Public URL: Collector Download Link for Prospects
MyNutanix URL: Collector Download Link for MyNutanix Users 

Working in Sizer with Collector Data

Once you have gathered the workload requirements using Collector, you can generate the Nutanix solutions that are best suited for your workload needs by either:

  1. Exporting the Collector data to Sizer via the “Export to Sizer” option within the Collector Portal, or
  2. Importing the Collector data in Sizer via the “Import Workloads” option within the Sizer Scenario

Note: Apart from Nutanix Collector, Sizer supports importing output from RVTools and Nutanix Insights. The concepts and parameters remain the same irrespective of the import source. 

There are various options while feeding the Collector data to Sizer. This help page will describe how and when to use the different available options.

In the following section, we will be exporting the data from the Collector Portal to Sizer. To export the data to Sizer, navigate to the desired Project in the Collector Portal, click Export, and select the Export to Sizer option as shown in Figure 1 below.

Figure 1: Exporting Collector Project to Sizer

On clicking “Export to Sizer” you will be shown a pop-up dialog asking for Sizer scenario details as seen in Figure 2 below.

Figure 2: Specifying Scenario Details

In the first step, you can specify the Scenario Name and Account name. Optionally, you can specify the Opportunity details as well. The default name of the scenario is the same as the Collector Project name.

Note: Account and Opportunity detail fields are limited to Nutanix Employees and not visible to partners. 

In the second step, you can specify all the “Sizing Preferences” as seen in Figure 3 below.

Figure 3: Specifying Sizing Preferences

At a very high level, the sizing preferences are based on the number of VMs and their requirements – the compute, memory & storage.

Now, let’s start with the “VMs” option: Powered ON only vs Both powered ON and powered OFF. Nutanix Collector gathers configuration and performance data of all the VMs managed by the IP address specified while establishing the connection. If there are VMs that you don’t wish to size, you can optionally turn them off using the “VM List” tab before exporting to Sizer. In most scenarios, we recommend choosing the “Powered ON only” option, but you can select “Both powered ON and powered OFF” VMs if you plan to run all the discovered VMs on the Nutanix solution.

Figure 4: Export VMs – Powered ON only vs Both powered ON and powered OFF

Warning: When the data gathered is that of a VDI workload, we highly recommend carefully looking at the CPU utilization values of the powered-off VMs under the “VM List” tab. The VMs may have been powered off when Collector gathered the data, but they may have been running intermittently and consuming resources. If you leave these VMs out of the export, you might end up under-sizing the solution.

Next, let us look at the “Capacity” option – Consumed vs Provisioned. This option lets the users decide if they would like to size for storage based on the actual consumed storage or the storage that is provisioned for each VM. The selection purely depends on the actual storage utilization value and the future capacity needs.

Figure 5: Export Storage – Consumed vs Provisioned

If the current environment is over-provisioned on storage, we recommend going with the “Consumed” option. Don’t worry about running out of storage if you go with “Consumed”: Sizer takes care of it and also ensures the recommended solution is capable of handling node failure. Alternatively, you can always go with the “Provisioned” storage values if you are sure that the allocated storage is required or you want to take a more conservative approach.

Note: When sizing only powered-ON VMs, make sure you also size the storage of powered-OFF VMs. Although those VMs are powered OFF, their data persists and needs to be accounted for during sizing.

Now for the most critical part: sizing based on Configuration vs Performance. This option decides the compute, or cores, within the solution. Again, the selection depends on the CPU utilization values. When we look at the data collected, we see that a lot of existing solutions are oversized on compute. This is usually due to imprecise requirement gathering or a very conservative approach taken to avoid CSAT issues.

Figure 6: CPU Utilization under Cluster Summary

The cluster summary gives an overall CPU utilization of the cluster and the “Provisioning Status” donut under VMs Summary helps to identify the VMs that are over-provisioned or under-provisioned.

Figure 7: Provisioning Status under VMs Summary

The above two charts should give the users a fair idea and help users to decide on which option to go with when it comes to sizing CPU cores or compute. If the CPU utilization value is very low, we recommend going with “Performance” based sizing. When using the performance-based option, we highly recommend using 95th percentile CPU utilization values to ensure the workload demands are met all the time.

Figure 8: Export Workload CPU options – Configuration vs Performance

Warning: Selecting performance based on Median or Average may result in under-sizing the solution.

When using 95th percentile-based CPU utilization values, Sizer factors in an additional 20% of compute requirements to ensure the workload spikes are met. The resulting Sizer solution has enough cores to ensure you are not short on CPU and can handle node failures as per the desired resiliency levels (which can be tweaked in Sizer).
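To see why the choice of statistic matters, the sketch below compares a median-based and a 95th percentile-based estimate for a single VM. It is a rough illustration, not Sizer's implementation: the nearest-rank percentile method, the way the 20% is applied (as a simple multiplier), the function names and the sample utilization values are all assumptions made for this example.

```python
def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]

def cores_to_size(provisioned_cores, cpu_util_samples, pct=95, headroom=0.20):
    """Cores implied by the chosen utilization percentile, plus a headroom factor (assumed)."""
    util = percentile_nearest_rank(cpu_util_samples, pct) / 100
    return provisioned_cores * util * (1 + headroom)

# Hypothetical CPU utilization samples (%) for one VM with 16 provisioned cores.
samples = [12, 15, 11, 40, 18, 22, 14, 13, 35, 16,
           17, 19, 21, 20, 10, 9, 25, 30, 28, 55]

print("median-based cores:", round(cores_to_size(16, samples, pct=50), 2))
print("95th percentile-based cores:", round(cores_to_size(16, samples, pct=95), 2))
```

The median-based figure comes out far lower because it ignores the busy periods, which is why sizing on Median or Average risks under-sizing.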

The next preference is the “Create Workloads by” option, where you can choose either Profile or VM-based workloads. Ever since the launch of “Export to Sizer” we have supported “Profile” based workloads. This approach categorizes each Collector-discovered VM under predefined VM buckets in Sizer: XS, S, M, L, XL.

Figure 9: Export Workload – Profile-based vs VMs based

Beginning in 2021, we started supporting “per VM” based workloads, which result in a much more accurately defined workload feed to Sizer. We highly recommend the “per VM” option. This option currently supports only up to 500 VMs; if you are trying to export more than 500 VMs, you will have to go with “Profile” based workloads.
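The rule of thumb above can be written as a small helper. This is a hypothetical function for illustration only, not part of any Collector or Sizer API; the only fact taken from this page is the 500 VM limit.

```python
PER_VM_LIMIT = 500  # limit stated on this page for "per VM" based workloads

def workload_mode(vm_count: int) -> str:
    """Return "per VM" when the export fits the per-VM limit, otherwise "Profile"."""
    if vm_count < 0:
        raise ValueError("vm_count must be non-negative")
    return "per VM" if vm_count <= PER_VM_LIMIT else "Profile"

print(workload_mode(320))   # fits within the limit
print(workload_mode(1200))  # exceeds the limit
```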

Retain VM to Cluster Mapping is another handy preference; the option is unchecked by default. By default, exporting to Sizer consolidates all the VMs in one cluster (Cluster-1), which is good if the customer wants to consolidate the VMs within one cluster. But if you want to recreate the same cluster and VM mappings in the Nutanix solution, use “Retain VM to Cluster Mapping” to ensure the AS-IS VM to cluster mapping is retained in the Sizer solution.

The final option is the Flash percentage, which allows the user to control the Flash storage requirements.

Figure 10: Specifying workload Flash Requirement

By default, the storage requirements are split 90:10, that is, 90% HDD and 10% flash. Depending on the nature of the workload, users can tweak this percentage as desired. If the users prefer an all-flash solution, the slider value can be set to 100%.
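As a quick illustration of what the slider does to the requirement, the sketch below splits a storage figure by flash percentage. The function name and the 50 TiB figure are hypothetical, and the sketch assumes the percentage is applied directly to the total storage requirement.

```python
def split_storage(total_tib, flash_pct=10):
    """Split a storage requirement into flash and HDD parts by flash percentage."""
    if not 0 <= flash_pct <= 100:
        raise ValueError("flash_pct must be between 0 and 100")
    flash = total_tib * flash_pct / 100
    return {"flash_tib": flash, "hdd_tib": total_tib - flash}

print(split_storage(50))                 # default 90:10 split
print(split_storage(50, flash_pct=100))  # all-flash
```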

Now, we are all set to hit the “Done” button to export to Sizer. On successfully exporting to Sizer, you can see the newly created Scenario URL in the banner as shown in Figure 11 below.

Figure 11: Banner showing Sizer Scenario URL

Successful export will also generate a CollectorToolsSummaryReport.xlsx with all the details of the VM data.

Hopefully, this help page provides you with more clarity on the options to choose when exporting Collector data to Sizer. By the way, you will also see the same options when you are trying to “Import” workloads in Sizer either via Collector or RVTools data.

If you have any queries, feel free to reach out to Collector Support via collector@nutanix.com. Alternatively, you can reach the Collector team via the Nutanix Community Page – Sizer Configuration Estimator.

For more information please refer to Nutanix Collector User Guide, Security Guide, Release Notes & Nutanix Collector Portal User Guide

For any other queries related to Nutanix Collector, please refer to Collector FAQs

Sizer 5.0

Sizer 5.0 is the latest version of Sizer going live on 24 Feb 2021

What’s New?

Three major features in Sizer 5.0:

1. Multi recommendation

Sizer now has an option to recommend more than one solution for a given workload, depending on the price range.

2. Sizer Policy

These are the recommended cluster settings based on the deployment environment for the cluster being sized. Sizer strongly recommends going with the default settings for the chosen environment; however, it allows you to make modifications to adjust to a given requirement.

3. Advanced Cluster settings

These are advanced filters to narrow down the sizings to a more specific solution, providing greater flexibility and the ability to accommodate specific customer requests.

 

Sizer journey to 5.0:

From single workload to multi workload to multi cluster to finally multi recommendation with Sizer 5.0

 

Multi-era  for Sizer: 

 

  • Multiple Workloads
      • Bulk Edits – Ability to update, delete, disable, enable many workloads at once
      • Enable our next move towards Collector-driven sizing where Collector feeds Sizer with 100s of workloads to create the  most precise sizing

 

  • Multiple Clusters
      • Cluster Settings – Ability for each cluster to have its own settings for common characteristics
        • CPU speed, NIC, Max nodes, thresholds, etc
        • This allows each cluster to be optimized for specific workloads
      • Sizer Policy – Apply best practices defined by experts for different environments
        • Settings for Test/Dev, Production or Mission Critical Environments
  • Multiple Recommendations
      • Cluster Settings gives the user control so Automatic Sizing gives desired recommendation
      • Multiple recommendations then allow user to play with the results
        • Cost Optimized, Compute Optimized, Storage Optimized solutions are provided
        • Each can be further tweaked by the user

 

Sizer 5.0 – Multiple Recommendations

 

Toggle between multiple recommendations that fit your cost tolerance

  • Cost Optimized – lowest cost (default option)
  • Compute Optimized – most cores within cost tolerance
  • Storage Optimized – most HDD/SSD within cost tolerance

  • Cost Tolerance
      • This is an advanced setting in the cluster settings
      • Allows you to select a price delta (from the cheapest solution)
      • Triggers multiple recommendations within the price range
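The selection logic described above can be sketched as follows. This is a simplified illustration under assumptions, not Sizer's actual recommendation engine: the `Solution` class, the candidate list and its prices are hypothetical, and the cost tolerance is assumed to be a percentage above the cheapest solution.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Solution:
    name: str
    price: float
    cores: int
    capacity_tib: float

def recommend(candidates, cost_tolerance_pct):
    """Pick cost, compute and storage optimized options within the price range."""
    cheapest = min(candidates, key=lambda s: s.price)
    ceiling = cheapest.price * (1 + cost_tolerance_pct / 100)
    in_range = [s for s in candidates if s.price <= ceiling]
    return {
        "cost_optimized": cheapest,
        "compute_optimized": max(in_range, key=lambda s: s.cores),
        "storage_optimized": max(in_range, key=lambda s: s.capacity_tib),
    }

# Hypothetical candidate solutions for one workload.
candidates = [
    Solution("A", 100_000, 96, 40.0),
    Solution("B", 108_000, 128, 42.0),
    Solution("C", 109_000, 96, 60.0),
    Solution("D", 130_000, 192, 80.0),  # outside a 10% tolerance
]

picks = recommend(candidates, cost_tolerance_pct=10)
for label, solution in picks.items():
    print(label, "->", solution.name)
```

With a tolerance of 0 all three picks collapse to the cheapest solution, which matches the single default recommendation.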

 

Sizer 5.0 – Policy

Why..

Different deployment environments might have different needs in terms of availability/resiliency and performance

What..

These are the recommended cluster settings based on the deployment environment. Brings consistency of sizings for a given environment.  

How.. 

Each cluster has one of the policies below

  • Test/Dev, Production, Mission Critical
  • You are allowed to edit the policy settings

Apply your own Sizer Policy for cluster characteristics

  • Maintenance Window requirements
  • Network speed
  • Minimum Compression
  • Other settings

Each cluster can then follow operational policy

  • Test/Dev, Production, Mission Critical
  • You can edit the policy to better meet customer needs

Sizer 5.0 – “Customize” the auto 

  • Tweak the auto recommendation through “Customize” option 
  • Allows incremental increase or decrease of the selected resource
  • Checks for valid/qualified combinations when tweaking (for example, a tweak that violates the SSD/HDD rule or the balanced memory configuration is not allowed)
  • Shows the cost delta for the customized solution
  • Stay on the Solutions page while playing with options.

Sizer 5.0 – Advanced Cluster settings

Minimum CPU frequency:

  • This will ensure the sizing recommends only the processors above the quoted frequency
  • Helpful if customer is keen on certain range of processors for performance reasons 

CVM/node

  • The values here will override the default CVM overhead applied by Sizer
  • Allows customer to provision more cores/ram to CVM in case of performance sensitive workloads

Cost Tolerance

  • Allows for a price delta (from the lowest cost)
  • Recommends more than one solution (the default is one)
  • Cost optimized: the default, lowest cost solution
  • Compute optimized: most core heavy solution in the price range
  • Storage optimized: most capacity heavy solution in the price range

 

Short demos on Sizer 5.0 features:

Sizer 5.0/5.1 overview:

Multi Recommendation:

Maintenance Window:

Sizer Policy : 

SAP HANA sizing guidelines

Sizing for SAP HANA  

Introductory video of SAP HANA in the sizer (non-ST VPN connection required).

https://nutanixinc.sharepoint.com/:v:/t/solperf/solperf_library/ETb67HebgR9Kg32r7jxTO3wBdXoHuOCCjvK2oTasctOgIQ?e=z4Epr2

Note:

  • Only use for SAP HANA based applications, not legacy SAP applications which make use of MSSQL, Oracle etc.
  • Multi-node SAP scale-out is not supported (used for larger SAP BW instances)
  • This sizing process does not vary for scale-up between AHV & VMware implementations
  • No spinning disks are used within a Nutanix cluster being used for SAP HANA
  • Any questions, support, or areas not covered – please use the SAP Slack channel

Supported Platforms:

  • Only Dell, HPE, Fujitsu and Lenovo are supported for SAP HANA, not NX.
  • If another OEM is selected, SAP HANA will not be shown as an available workload

 

Defaults

  • RF2 is used (RF3 is under testing, so not selectable in Sizer)
  • Compression is disabled, and not typically of value for SAP HANA
  • Higher default CVM resource is reserved

HANA Inputs

  • NVMe can be added for higher IO loads, such as a high usage SAP S4/HANA
  • Cost/Performance largely drives cpu choice. Ideally an implementation’s potential compute load in SAPS would be known. Please reach out for support in estimating and reviewing such information.

Environments

There would typically be two environments within a Nutanix cluster where production and non-production are mixed. Production rules should be applied to all production instances and to any other instances that should be treated as production. This might apply to a QAS/Test environment and will typically apply to any DR cluster.

Production:

  • For most SAP applications (e.g., production S4) there is an SAP HANA database, and one or more application server instances. Some uses of SAP HANA do not use an application server, in which case just use a small one in the sizing exercise.
  • In addition to the Application Server instances, and the SAP HANA database, a small VM called the ASCS is often called for. This ASCS would be around 2c/24GB RAM/100GB disk.
  • Generally, production has two or more application server instances, typically with 2 – 6 cores each and around 24GB/core, with multiple instances for larger loads. They have a small storage space requirement for the OS and application image.
  • For a downtime requirement of less than 20 minutes, a pair of SAP HANA instances should be sized.
  • There is no over commit of cpu or memory
  • Servers must have all memory channels filled and balanced, so 6 or 12 DIMMs per cpu. – Sizer auto recommendation enforces this consideration
  • L suffix cpus are required for largest memory instances
  • Available storage for SAP HANA should be around 2.5x to 3x memory (3x is used in Sizer)
  • Production rules – SAP HANA instances are on whole dedicated cpus and so cannot be allocated to the CVM cpu
  • HANA System Replication (HSR) is an exact copy of the HANA VM. In Sizer, add another HANA VM if implementing HSR.
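The production rules of thumb above can be combined into a rough tally of what a production landscape asks for. The sketch below is illustrative only and assumes the figures stated in the list: storage at 3x HANA memory, about 24GB of RAM per application server core, an ASCS VM of 2 cores/24GB RAM/100GB disk, and a second identical HANA VM for HSR. The function name and the input values (1024GB HANA memory, 48 HANA cores) are hypothetical, and the small application server disk is left out.

```python
def hana_production_requirements(hana_memory_gb, hana_cores, app_servers,
                                 app_cores_each, hsr=False, include_ascs=True):
    """Tally cores, memory and storage for a production SAP HANA landscape (no overcommit)."""
    hana_instances = 2 if hsr else 1               # HSR is an exact copy of the HANA VM
    cores = hana_cores * hana_instances
    memory_gb = hana_memory_gb * hana_instances
    storage_gb = 3 * hana_memory_gb * hana_instances  # 3x memory, as used in Sizer

    app_cores = app_servers * app_cores_each
    cores += app_cores
    memory_gb += app_cores * 24                    # around 24GB per application server core

    if include_ascs:                               # ASCS: about 2c / 24GB RAM / 100GB disk
        cores += 2
        memory_gb += 24
        storage_gb += 100

    return {"cores": cores, "memory_gb": memory_gb, "storage_gb": storage_gb}

# Hypothetical example: 1TB HANA VM with HSR, two 4-core application servers.
print(hana_production_requirements(1024, 48, app_servers=2, app_cores_each=4, hsr=True))
```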

 

Non Production:

 

 

  • QAS/Test landscape tends to match nonPRD for size of instance
  • If an operating system HA cluster is used in production, there is typically at least one such cluster outside of production also – used as a testbed.
  • Each SAP solution would normally have two or three non-production landscapes
  • Solution Manager (SolMan) is often overlooked, and not asked for. It is a required instance in the overall deployment and would be sized in PRD with one SAP HANA instance and an application server instance. Another such pair for QAS/test. No HA clustering would be required.
  • DEV, SBX etc. are usually subsets in memory size.

 

Feb 8, 2021

First release of SAP HANA in the sizer:

 

Business Continuity and Disaster Recovery

Introduction – Please Read First

These questions are here to assist with ensuring that you’re gathering necessary information from a customer/prospect in order to put together an appropriate solution to meet their requirements in addition to capturing specific metrics from tools like Collector or RVTools. 

This list is not exhaustive, but should be used as a guide to make sure you’ve done proper and thorough discovery.  Also, it is imperative that you don’t just ask a question without understanding the reason why it is being asked.  We’ve structured these questions with not only the question that should be asked, but why we are asking the customer to provide an answer to that question and why it matters to provide an optimal solution. 

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity. 

This is a live document, and the questions will be expanded and updated over time.


BCDR

 1.  Has your organization done a Business Impact Analysis of your applications and workloads, and would you be able to share that with us?

Why ask? A business impact analysis should have rated the customer’s applications based on business criticality and may have informed them as to which are the most important applications and what level of data loss and time loss would be acceptable, what the impact in both revenue and non-revenue would be, what the systems and dependencies of those critical applications and processes are.  This information would be HIGHLY beneficial to have in designing a DR solution for your customer.

2.  If a Business Impact Analysis is not available, please break down applications and their corresponding recovery point objectives and associated recovery time objectives.  Also break down the different retention requirements for on-prem and the recovery site.

Why ask? Recovery point objectives (RPOs) let us know which Nutanix products can be used to meet the objective, and the RTO (Recovery Time Objective) may cause the design to shift depending on how fast the application needs to be back up and running. It is also good to have this backed by the business rather than decided by the IT department alone. We should also mention that DR and backup are different. The retention for DR should be as long as is needed to meet the RTO. If you have a very long RTO, it is probably best served by a more cost-effective backup solution.

 3.  For the applications listed, please provide the change rate. You can use existing backups and the deltas between them to find out the change rate. For any aggressive RPO that may require near-sync, you will also want to know the write throughput for the application.

Why ask? Write throughput is the best indicator for network bandwidth throughput with near-sync.
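For a first-pass estimate, the change rate and the related bandwidth figures can be worked out with simple arithmetic. The sketch below is a back-of-the-envelope illustration, not a Nutanix tool: the function names and the sample backup sizes are hypothetical, decimal units are assumed (1 GB = 8000 megabits), and compression, bursts and protocol overhead are ignored.

```python
def daily_change_rate_pct(protected_gb, incremental_backup_gb):
    """Average daily change as a percentage of protected data, from daily incremental sizes."""
    average_delta = sum(incremental_backup_gb) / len(incremental_backup_gb)
    return 100 * average_delta / protected_gb

def replication_mbps(change_gb, window_hours):
    """Average bandwidth (megabits/s) needed to ship change_gb within window_hours."""
    return change_gb * 8000 / (window_hours * 3600)

def sustained_write_mbps(write_mb_per_s):
    """Convert application write throughput from MB/s to megabits/s (relevant for near-sync)."""
    return write_mb_per_s * 8

# Hypothetical example: 10 TB protected, five daily incremental backups (GB).
deltas = [180, 220, 200, 190, 210]
print("change rate %:", daily_change_rate_pct(10_000, deltas))
print("Mbps to replicate 200 GB in 24h:", round(replication_mbps(200, 24), 1))
print("Mbps for 25 MB/s sustained writes:", sustained_write_mbps(25))
```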

 4.  What backup and/or DR products do you currently use?

Why ask? Helps to find any synergies or to help move them to products which are more Nutanix friendly.  Example: if they are using Zerto, they may be great candidates for near-sync.

 5.  In the event of a failure, who is on-call to help with the restore? Will everyone be trained on recovery plans and procedures?

Why ask? This should help the customer to consider the operational implications of DR and help them see the need to simplify the recovery process. When bad events happen everyone needs to feel comfortable and empowered to help with the restoration.

 6.  What are you using today for Disaster Recovery and have you tested the solution to verify it works as expected?  How has the solution met or not met your requirements and expectations?

Why ask? Discovery to uncover the current topology and how it is currently performing.

 7.  Are there any other systems that need to be taken into account for a proper disaster recovery plan that would be outside of the Nutanix infrastructure? (physical servers, mainframe, other physical devices)

Why ask? This is to make sure that the customer has thought through the entire scope necessary for a DR based on their requirements.

8.  What type of disasters are you planning for?

Why ask? Different disaster scenarios could involve different products or operational procedures (for example, think of the difference between a disaster like a pandemic, where all your systems are working but your workforce can’t come into the office, a geographic event like an explosion or earthquake, a ROBO office server closet being flooded, or even a ransomware attack).  Help the customer walk through different scenarios and possible reasons for needing to invoke a recovery plan, and how we can help give them a recovery-in-depth strategy (snapshots, backup, remote DC DR, stretch clusters, Cloud Provider DR, DRaaS with Xi Leap).

9.  Can your sites run independently? If so, for how long?

Why ask? This helps to determine the criticality of each site

10.  What is your Retention Policy, and who made that decision (regulatory or self-imposed)?

Why ask? This helps to understand what is driving the need for retention and if it is internal or externally mandated.

11.  Do you have any regulatory requirements such as HIPAA, PCI, SEC, SOX?

Why ask? This will impact the particular design of the recovery solution, what features and controls will need to be in place (encryption, data sovereignty, RBAC, Logging, etc.)

12.  Do you need immutable copies of the data/VMs?

Why ask? This question allows the SE to gauge the sophistication of the customer as well as how long they need the data. If there is a 7 year policy, customers will most likely need a 3rd party backup tool in order to tier the data to an Object Store that supports WORM (Write Once Read Many).  This could also stem from requirements to help mitigate the risk of ransomware.

13.  Is this going to be a net new DR Solution?

Why ask? Do we have the flexibility to design a net new DR strategy?

14.  Will the source and destination clusters both be Nutanix clusters?

Why ask? If Nutanix is not the source and destination, then it will cause constraints for the design of the DR Solution.  (i.e. it will force hypervisor choice, replication can’t be array based but will need to be done with a separate software product like SRM or Zerto).  It may also require licensing changes

15.  Do you have a requirement for the backups/DR to be on separate hardware?

Why ask? Understand how to architect and select the infrastructure for the backup and DR targets

16.  What does your desired replication topology look like?

Why ask? Gives us the information on whether we need Professional or Ultimate AOS licenses and helps to map out the topology for replication (i.e. A->B->C; A->B; A->B and A->C).  This also allows us to discover if sites are active/passive and exactly what the definitions for active and passive mean (i.e. a data center with power, or VMs ready to power on, or VMs already powered on and able to switchover immediately, etc.)

17.  Does DR need to be the same performance as production?

Why ask? This helps clarify whether DR is a checklist item for them or a significant business requirement (i.e. they'll lose significant money in the event of any downtime of the production site), and allows you to size the solution appropriately. Be sure to get sign-off if they do decide to allow DR to be undersized.

18.  Do you need to have replicated copies of all the VMs but only plan to restore a subset?

Why ask? Helps determine potential licensing requirements and size of target cluster.

19.  How often do you test your disaster recovery plan and what does the plan look like?

Why ask? This will help with understanding if the customer has actually validated any disaster recovery plan that they have implemented to ensure that it will actually work.  Oftentimes folks are optimistic with regards to how well their plan will actually work, so having tested it brings a sense of reality to the plan and can help them course correct.

20.  Do you have existing runbooks that are used in the event of a disaster?

Why ask? Knowing this will help us understand their processes in the event of a disaster.  Also, if they don’t have any runbooks this will let us know we may need to help them out in putting a plan together that they can use and test.

21.  Databases: How are you currently backing up and protecting your database environments?

Why ask? This can give us information about how they are protecting their most critical assets.  They may already be using or licensed for database level replication or clustering technology which would give them more granular levels of control than a traditional storage replicated VM.  This can also help as part of the discussions around RPO and RTO for these more critical systems which may need higher levels of availability than other parts of the infrastructure. 

22.  What hypervisor(s) are you using? Are you open to cross-hypervisor DR?

Why ask? This will let us know what replication products we can use (if the source or destination cluster is non-Nutanix) and can open up the possibility of leveraging AHV for the target cluster if they are using ESXi on Nutanix as the source/primary cluster. 

23.  Do you have a separate network for DR traffic? Do you require encryption on those links?

Why ask? This helps with understanding if network segmentation is necessary to be configured in Prism for DR Replication traffic and whether or not the customer needs to supply encryption in flight for that network.  

24.  What is the current bandwidth between sites that you plan to use for DR replication? Also, what is the latency between those sites?

Why ask? We need to know how big the pipes are between sites so that we can ensure the RPO the customer has defined can be met given the rate of change. We also need to ensure that the latency between sites meets the requirements listed for Metro Availability or the Metro Witness.
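
As a rough illustration of the bandwidth-versus-RPO check, the sketch below estimates whether a link can ship one RPO interval's worth of changed data within that interval. The function name, the 0.7 usable-link fraction and the sample figures are assumptions made for illustration, not Nutanix guidance, and the model ignores compression, deduplication and bursty change rates.

```python
def replication_fits_rpo(change_gb_per_hour, rpo_minutes, link_mbps, usable_fraction=0.7):
    """Return (fits, transfer_minutes) for one RPO interval of changed data.

    usable_fraction is an assumed share of the link left for replication
    after other traffic; measure real utilization before relying on it.
    """
    # Data that changes during one RPO interval.
    changed_gb = change_gb_per_hour * rpo_minutes / 60.0
    # Convert decimal gigabytes to megabits.
    changed_megabits = changed_gb * 8 * 1000
    usable_mbps = link_mbps * usable_fraction
    transfer_minutes = changed_megabits / usable_mbps / 60.0
    return transfer_minutes <= rpo_minutes, transfer_minutes


# Hypothetical customer: 50 GB/hour rate of change, 60-minute RPO.
fits_1g, minutes_1g = replication_fits_rpo(50, 60, link_mbps=1000)
fits_100m, minutes_100m = replication_fits_rpo(50, 60, link_mbps=100)
print(fits_1g, round(minutes_1g, 1))
print(fits_100m, round(minutes_100m, 1))
```

With these sample inputs the 1 Gbps link clears the interval comfortably, while the 100 Mbps link needs longer than the RPO itself, which is the kind of gap this question is meant to surface.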

25.  What is the current rate of utilization of the network links between the sites you plan to use for replication traffic?

Why ask? These links may be used for other traffic and could impact the available bandwidth that you assume you will have access to for replication traffic.  See if you can get utilization over a 30 day period, and if possible over several months to see any trends of increase or decrease in utilization.

26.  How do you handle IP addresses on the recovery site for VMs that have failed over?

Why ask? This allows us to discover what type of networking failover scenario(s) the customer would prefer to use: Overlay Networks; Stretched Layer 2 subnet between sites; perform a full subnet failover from the primary to secondary site by updating routes to point to the new recovery site; allow IP addresses to change when failed over (this can cause obvious challenges of broken applications that hard code IP addresses, updating of DNS and cache entries, etc.).

November 2020 Sprints

November 24  – Collector 3.2

Hi everyone, we just went live with Collector 3.2. The major highlight is being able to run the tool in local and remote mode for Hyper-V environments.
Hyper-V local support:
  • Collector now supports running the tool against a Hyper-V cluster directly from the Hyper-V hosts locally. UI has option to choose from Hyper-V (local) and Hyper-V (remote)
  • Collection can be done by downloading the tool in any of the hosts which are part of the cluster we wish to collect data from and choosing Hyper-V local in the drop down menu.
  • Having both remote and local collection options gives greater flexibility to switch modes in case of connectivity/access issues with a remote setup (particularly for Hyper-V, as Collector connects directly to the cluster hosts rather than to management APIs as it does with vCenter).
  • This version supports Hyper-V clusters in both local and remote mode. Support for standalone Hyper-V hosts (not part of a cluster) is planned.
Precheck  :
  • A precheck script is bundled within the tool which can run a few checks to see if the expected services are available and if other prerequisites are satisfied.
  • Upon running into the error screen, the tool will redirect to the script location which can be run on the host to get the relevant data
Usability :
  • The login page now has a drop-down to choose the flow (vCenter, Prism, Hyper-V (remote) or Hyper-V (local)), and the default ports are populated upon selection.
  • VM Summary table – shows both consumed and provisioned storage across all the cluster VMs
  • The tool now accepts a hostname (in addition to host IPs) for connecting to the Hyper-V host instance. The previous limitations have been removed.
  • Improved error messages/log enhancements
We now have a dedicated Collector page with the latest 3.2 bits and documents (User Guides, Release Notes) here:
https://portal.nutanix.com/page/downloads?product=collector
We went live with the latest sprint; below are the major highlights.
Proposals:
  • Updated slides on quarterly financials w/Q4
  • Now include the Backup cluster / DR cluster details along with the primary workload cluster, including the config details and utilization dials
  • HW spec slide added for NX-Mine specific appliance : a subset of the standard NX HW spec
Sizing enhancement:
  • SQL workload supported on Nutanix clusters on AWS
Usability:
  • Bulk Edit: I/O input fields option added for bulk edits for Server Virtualization and Cluster sizing(Raw)
  • Storage calculator updates including new drive options – 16TB HDDs [support for 320TB nodes]
  • Validator support for new NEC and KTNF platforms
  • Changes to Solutions summary UI – Cluster in a separate row/ consistent with workload summary UI
  • New partner roles added for partner specific HW vendor visibility
Product updates:
  • HPE DX: New AMD platform support – DX325 Gen10 8SFF
  • mCPU/lCPU-DIMMs rule update across vendors
  • Dell XC:  GPU with NVMe restrictions removed, now both can be in same config

November 3

Hi everyone, we went live yesterday with the current sprint; below are the major highlights:
GPU Dials:
  • You will see a 5th set of dials, for the GPU, on nodes/workloads requiring GPUs.
  • The dials show the utilization percentage and cluster failover considerations, just like for cores, RAM etc.
  • The additional dial will feature in the BOM as well for GPU workloads
320 TB node support
  • For Objects and Files Dedicated workload, the node limits now go up to 320TB(HDD)/node
  • The total capacity(including the SSDs) can go up to 350TB [16TB x 20 + 7.68TB x 4]
  • HPE DX4200 supports this configuration currently and is supported in Sizer
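
The per-node capacity figures in the bullets above can be checked with a few lines of arithmetic. This is only a worked example of the bracketed drive mix; the variable and function names are made up for illustration.

```python
# Drive mix quoted above: 20 x 16TB HDDs plus 4 x 7.68TB SSDs per node.
HDD_TB, HDD_COUNT = 16.0, 20
SSD_TB, SSD_COUNT = 7.68, 4


def node_raw_capacity_tb(hdd_tb, hdd_count, ssd_tb, ssd_count):
    """Return (hdd_total, ssd_total, node_total) raw capacity in TB."""
    hdd_total = hdd_tb * hdd_count
    ssd_total = ssd_tb * ssd_count
    return hdd_total, ssd_total, hdd_total + ssd_total


hdd_total, ssd_total, node_total = node_raw_capacity_tb(HDD_TB, HDD_COUNT, SSD_TB, SSD_COUNT)
print(hdd_total, round(ssd_total, 2), round(node_total, 2))
```

The HDD tier comes to 320TB and the SSDs add roughly 30.72TB, which is where the approximately 350TB total comes from.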
Collector/RVTool import filter
  • During import, Sizer will filter out the CVM VMs while running Collector or RVTool against a vCenter managed Nutanix cluster.
  • CVM resources are added by Sizer anyway, so this helps avoid double counting.
  • For Prism managed Nutanix clusters, the CVMs are filtered out by Collector itself.
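
To make the import filter concrete, here is a minimal sketch of dropping CVM rows from an imported VM list. It is not Sizer's actual implementation: the function name and the row layout are hypothetical, and it assumes CVMs can be recognized by the usual `NTNX-...-CVM` naming pattern.

```python
import re

# Assumption: CVM VM names look like "NTNX-<serial>-<position>-CVM".
CVM_NAME = re.compile(r"^NTNX-.*-CVM$", re.IGNORECASE)


def drop_cvms(vms):
    """Return the VM rows whose name does not look like a Nutanix CVM."""
    return [vm for vm in vms if not CVM_NAME.match(vm["name"])]


# Hypothetical rows as they might come out of a Collector/RVTools export.
imported = [
    {"name": "NTNX-16SM6B1234-A-CVM", "vcpus": 8},
    {"name": "sql-prod-01", "vcpus": 4},
    {"name": "web-frontend-02", "vcpus": 2},
]
workload_vms = drop_cvms(imported)
print([vm["name"] for vm in workload_vms])
```

Only the two workload VMs remain, so the CVM's cores and memory are not sized twice once Sizer adds its own CVM overhead.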
Platform updates:
  • Two new NEC platforms: NEC Express R120h-1M & R120h-2M
  • A new vendor got added this release – KTNF with their server model: KR580S1-308N
  • New server platform for Inspur  : InMerge1000M5S
  • Updates to Fujitsu,Dell XC and Lenovo platforms

October 2020 Sprints

Oct 19

hi everyone
We went live last night with the latest. Both big and small changes.
On small but good things
  • Updated Oracle sizing to match the recent changes in SQL Server production cluster sizing. We already had a dedicated cluster as a requirement for Oracle, but now it is also 1:1 CVM with a total of 12 physical cores (yep, we want a lot of I/O capability) and a minimum of 14 cores and 52 specints.
  • Align the VCPU:pcore ratio when doing either configuration or performance sizing with Collector
  • Bulk edit can now be done for XenApp or SQL Server workloads
  • Robo model addition: NX3060-G7
  •  DX Mine appliance: 1.92TB SSDs – RI
On BIG things
  • Sizer FINALLY has I/O !!  Well, technically we had it for Files Application sizing but not for general-purpose use. We now have I/O performance sizing for both Cluster sizing (Raw) and Server Virtualization workloads. Where historically we would size for capacity only, we can now size for I/O and capacity (whichever is the greater requirement).
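
The "whichever is greater" rule can be sketched as taking the larger of two node counts. This is only an illustration of the idea, not Sizer's algorithm: the function name and the per-node capacity and IOPS figures are hypothetical, and real sizing also accounts for thresholds, failover and CVM overhead.

```python
import math


def nodes_required(capacity_tb, iops, node_capacity_tb, node_iops):
    """Size for capacity and for I/O, then keep the larger node count."""
    by_capacity = math.ceil(capacity_tb / node_capacity_tb)
    by_io = math.ceil(iops / node_iops)
    return max(by_capacity, by_io)


# Hypothetical node: 40TB usable and 30,000 IOPS.
print(nodes_required(capacity_tb=100, iops=200_000, node_capacity_tb=40, node_iops=30_000))
print(nodes_required(capacity_tb=400, iops=50_000, node_capacity_tb=40, node_iops=30_000))
```

In the first case I/O drives the node count; in the second, capacity does.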
Want to thank both  and  for all their hard work in getting the I/O effort going. There was a lot of testing and analysis to get this scoped. They both worked very hard and it is excellent work. This is what I love to see in Sizer, so it is better for you all.

Here is the I/O panel in the workloads

Oct 6

Hi everyone, we went live with the current sprint; below are the major highlights:

SQL sizing enhancements: Major changes to this one.

  • Changes to the Business Critical ON/OFF option and its default settings. SQL sizing now defaults to Business Critical.
  • Sizer allocates additional CVM cores (1:1 vCPU to pCore ratio) to aid performance for the Business Critical option.
  • A Business Critical SQL workload must be in a dedicated cluster with only other SQL workloads. VDI, Server Virt etc. are not allowed in the SQL dedicated cluster.
  • All Flash or NVMe models only, with high-frequency processors for higher performance

Budgetary quote : HPE-DX

  • You can now generate a budgetary quote for sizings on DX. Earlier, the budgetary quote would show only the SW/CBL quote, but now HW BOM price estimates are also included.
  • The HW BOM quote covers the complete BOM, including PSU, transceivers, chassis etc., as well as HW support prices.

Files changes:

  • New File SKUs with tiered pricing are now supported, including generating a Frontline quote through Sizer. Sizer's budgetary quote for Files is also updated with the newer SKUs and pricing approach.
  • Application storage – updated with latest performance numbers across hybrid and AF nodes.
  • With increased throughput/IO per node, fewer nodes are needed than before for the same workload.
  • Defaults to 1x25GbE NIC for smaller nodes, 2x25GbE for larger nodes.

Collector/RVTools

  • You can now choose to size for storage based on the VMs' consumed or provisioned capacity during import.

Usability

  • Era quoting in Frontline supported through Sizer
  • Bulk edit – now also supported for Oracle, Backup workloads
  • HPE-DX default NIC recommendation -FLOM/OCP in both auto and manual
  • Updates to XC models, updated list of CPUs and SSDs across models

Thanks Ratan.  In regards to SQL, it has grown up in Sizer, so to speak. If you are looking to add a small SQL database to a cluster of, say, Server Virt, then go with Business Critical off. The SQL workload can then be in a cluster with other workload types; we take 2:1 CVM and there is no minimum CPU in terms of cores and specints. Turn Business Critical on and it is a dedicated cluster, 1:1 CVM with a total of 12 physical cores (yep, we want a lot of I/O capability) and a minimum of 14 cores and 52 specints. The config is also AF or NVMe. We will be making the same changes for Oracle in the current sprint.
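
The two SQL modes described above can be summarized as a small lookup. This is just a sketch of the rules as stated in this post, not Sizer code; the class and field names are hypothetical, and reading "min of 14 cores and 52 specints" as a minimum for the CPU is an interpretation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SqlSizingRules:
    dedicated_cluster: bool                      # only SQL workloads allowed in the cluster
    cvm_vcpu_per_pcore: int                      # 1 means 1:1, 2 means 2:1
    cvm_physical_cores: Optional[int]            # None = not stated in the post
    min_cpu_cores: Optional[int]                 # None = no minimum
    min_cpu_specints: Optional[int]              # None = no minimum
    allowed_storage: Optional[Tuple[str, ...]]   # None = no restriction stated


def sql_rules(business_critical: bool) -> SqlSizingRules:
    """Return the sizing rules for a SQL workload as described in the post."""
    if business_critical:
        return SqlSizingRules(True, 1, 12, 14, 52, ("All Flash", "NVMe"))
    return SqlSizingRules(False, 2, None, None, None, None)


for mode in (True, False):
    print(mode, sql_rules(mode))
```

Laid out this way, it is easy to see that turning Business Critical off relaxes every constraint at once, which is why it suits a small database sharing a Server Virt cluster.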