Business Continuity and Disaster Recovery

Introduction – Please Read First

These questions are here to help ensure that you are gathering the necessary information from a customer/prospect to put together an appropriate solution that meets their requirements, in addition to capturing specific metrics from tools like Collector or RVTools.

This list is not exhaustive, but it should be used as a guide to make sure you have done proper and thorough discovery.  It is also imperative that you understand the reason a question is being asked before you ask it.  For each item we include not only the question itself, but also why we are asking the customer to answer it and why it matters for providing an optimal solution.

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity. 

This is a live document, and questions will be expanded and updated over time.


BCDR

1.  Has your organization done a Business Impact Analysis of your applications and workloads, and would you be able to share that with us?

Why ask? A business impact analysis should have rated the customer’s applications by business criticality and may have told them which applications are the most important, what level of data loss and time loss would be acceptable, what the impact in both revenue and non-revenue terms would be, and what the systems and dependencies of those critical applications and processes are.  This information is HIGHLY beneficial to have when designing a DR solution for your customer.

2.  If a Business Impact Analysis is not available, please break down the applications and their corresponding recovery point objectives and associated recovery time objectives.  Also break down the different retention requirements for on-prem and the recovery site.

Why ask? The recovery point objective tells us which Nutanix products can be used to meet it, and the RTO (Recovery Time Objective) may shift the design depending on how fast the application needs to be back up and running. It is also good to have these figures backed by the business rather than decided solely by the IT department. We should also mention that DR and backup are different: the retention for DR only needs to be as long as required to meet the RTO. A very long RTO is probably best served by a more cost-effective backup solution.
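To help frame the RPO conversation, a rough mapping from RPO to a candidate replication approach can be sketched as below. The thresholds here are simplified assumptions for illustration only, not official product limits; always verify against current documentation.

```python
def replication_mode(rpo_seconds: float) -> str:
    """Map a required RPO to a candidate replication approach.

    Thresholds are illustrative assumptions only, not official limits.
    """
    if rpo_seconds == 0:
        return "synchronous replication (zero data loss)"
    if rpo_seconds <= 15 * 60:        # aggressive, minutes-level RPO
        return "NearSync replication"
    if rpo_seconds <= 24 * 60 * 60:   # hourly-class RPO
        return "asynchronous replication"
    return "backup (a long RPO is usually better served by backup)"

print(replication_mode(300))   # a 5-minute RPO lands in the NearSync tier
print(replication_mode(3600))  # an hourly RPO lands in the async tier
```

The point of the sketch is the shape of the decision, not the exact cutoffs: zero data loss means synchronous, aggressive RPOs mean NearSync, and very long RPOs are a backup conversation.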

3.  For the applications listed, please provide the change rate. You can use existing backups and the deltas between them to determine the change rate. For any aggressive RPO that may require near-sync, you will also want to know the application's write throughput.

Why ask? Write throughput is the best indicator of the network bandwidth required for near-sync replication.
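For instance, the change rate can be approximated from incremental backup sizes, and sustained write throughput gives a floor for the replication link speed. A minimal sketch, assuming daily incrementals and a 20% protocol-overhead pad (both figures are assumptions, not measured values):

```python
def daily_change_rate_gb(incremental_backup_sizes_gb):
    """Average daily change rate, taken from daily incremental backup sizes."""
    return sum(incremental_backup_sizes_gb) / len(incremental_backup_sizes_gb)

def min_link_mbps(sustained_write_mb_per_s, overhead_factor=1.2):
    """Minimum link speed for near-sync to keep up with sustained writes.

    The 1.2 overhead factor is an illustrative pad for protocol and
    replication overhead, not a measured figure.
    """
    return sustained_write_mb_per_s * 8 * overhead_factor  # MB/s -> Mbps

print(round(daily_change_rate_gb([40, 55, 45]), 1))  # 46.7 (GB/day)
print(round(min_link_mbps(25)))                      # 240 (Mbps)
```

In other words, three daily incrementals of 40, 55, and 45 GB suggest roughly 47 GB/day of change, and a sustained 25 MB/s of writes needs at least a ~240 Mbps link for near-sync to keep up.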

4.  What backup and/or DR products do you currently use?

Why ask? This helps to find synergies or to aid in moving them to products that are more Nutanix-friendly.  Example: if they are using Zerto, they may be great candidates for near-sync.

5.  In the event of a failure, who is on call to help with the restore? Will everyone be trained on recovery plans and procedures?

Why ask? This should help the customer to consider the operational implications of DR and help them see the need to simplify the recovery process. When bad events happen everyone needs to feel comfortable and empowered to help with the restoration.

6.  What are you using today for Disaster Recovery, and have you tested the solution to verify it works as expected?  How has the solution met or not met your requirements and expectations?

Why ask? Discovery to uncover the current topology and how it is currently performing.

7.  Are there any other systems that need to be taken into account for a proper disaster recovery plan that would be outside of the Nutanix infrastructure? (physical servers, mainframe, other physical devices)

Why ask? This is to make sure that the customer has thought through the entire scope necessary for a DR based on their requirements.

8.  What type of disasters are you planning for?

Why ask? Different disaster scenarios can call for different products or operational procedures. For example, consider the difference between a pandemic, where all of your systems are working but your workforce can’t come into the office; a geographic event like an explosion or earthquake; a ROBO office server closet being flooded; or a ransomware attack. Help the customer walk through different scenarios and the possible reasons for needing to invoke a recovery plan, and show how we can give them a recovery-in-depth strategy (snapshots, backup, remote DC DR, stretch clusters, Cloud Provider DR, DRaaS with Xi Leap).

9.  Can your sites run independently? If so, for how long?

Why ask? This helps to determine the criticality of each site.

10.  What is your Retention Policy, and who made that decision (regulatory or self-imposed)?

Why ask? This helps to understand what is driving the need for retention and if it is internal or externally mandated.

11.  Do you have any regulatory requirements such as HIPAA, PCI, SEC, SOX?

Why ask? This will impact the design of the recovery solution and which features and controls will need to be in place (encryption, data sovereignty, RBAC, logging, etc.).

12.  Do you need immutable copies of the data/VMs?

Why ask? These questions allow the SE to determine the customer's sophistication as well as how long they need to keep the data. If there is a 7-year policy, the customer will most likely need a 3rd-party backup tool in order to tier the data to an object store that supports WORM (Write Once Read Many).  This could also stem from requirements to help mitigate the risk of ransomware.

13.  Is this going to be a net new DR Solution?

Why ask? Do we have the flexibility to design a net new DR strategy?

14.  Will the source and destination clusters both be Nutanix clusters?

Why ask? If Nutanix is not both the source and the destination, that constrains the design of the DR solution (i.e., it forces the hypervisor choice, and replication can’t be array-based but will need to be done with a separate software product like SRM or Zerto).  It may also require licensing changes.

15.  Do you have a requirement for the backups/DR to be on separate hardware?

Why ask? Understand how to architect and select the infrastructure for the backup and DR targets.

16.  What does your desired replication topology look like?

Why ask? Gives us the information on whether we need Professional or Ultimate AOS licenses and helps to map out the topology for replication (i.e. A->B->C; A->B; A->B and A->C).  This also allows us to discover if sites are active/passive and exactly what the definitions for active and passive mean (i.e. a data center with power, or VMs ready to power on, or VMs already powered on and able to switchover immediately, etc.)

17.  Does DR need to be the same performance as production?

Why ask? This helps you understand whether DR is a checklist item for them or a significant business requirement (i.e., they’ll lose significant money in the event of any downtime at the production site), and it allows you to size the solution appropriately.  Be sure to get sign-off if they do decide to allow DR to be undersized.

18.  Do you need to have replicated copies of all the VMs but only plan to restore a subset?

Why ask? Helps determine potential licensing requirements and size of target cluster.

19.  How often do you test your disaster recovery plan and what does the plan look like?

Why ask? This will help with understanding if the customer has actually validated any disaster recovery plan that they have implemented to ensure that it will actually work.  Oftentimes folks are optimistic with regards to how well their plan will actually work, so having tested it brings a sense of reality to the plan and can help them course correct.

20.  Do you have existing runbooks that are used in the event of a disaster?

Why ask? Knowing this will help us understand their processes in the event of a disaster.  Also, if they don’t have any runbooks this will let us know we may need to help them out in putting a plan together that they can use and test.

21.  Databases: How are you currently backing up and protecting your database environments?

Why ask? This can give us information about how they are protecting their most critical assets.  They may already be using or licensed for database level replication or clustering technology which would give them more granular levels of control than a traditional storage replicated VM.  This can also help as part of the discussions around RPO and RTO for these more critical systems which may need higher levels of availability than other parts of the infrastructure. 

22.  What hypervisor(s) are you using? Are you open to cross-hypervisor DR?

Why ask? This will let us know what replication products we can use (if the source or destination cluster is non-Nutanix) and can open up the possibility of leveraging AHV for the target cluster if they are using ESXi on Nutanix as the source/primary cluster. 

23.  Do you have a separate network for DR traffic? Do you require encryption on those links?

Why ask? This helps with understanding if network segmentation is necessary to be configured in Prism for DR Replication traffic and whether or not the customer needs to supply encryption in flight for that network.  

24.  What is the current bandwidth between sites that you plan to use for DR replication? Also, what is the latency between those sites?

Why ask? We need to know how big the pipes are between sites so that we can ensure the RPO the customer has defined as their requirement can be met given the rate of change.  Also ensure that the latency between sites meets the minimum requirements listed for Metro Availability or the Metro Witness.
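A quick way to sanity-check the pipe: the data that changes within one RPO window must replicate within that same window, using only the share of the link not consumed by other traffic. A sketch of the math (the 70% usable-fraction default is an assumption; replace it with measured utilization from the next question):

```python
def rpo_feasible(change_gb_per_day, rpo_minutes, link_mbps, usable_fraction=0.7):
    """Return (feasible, required_mbps): can one RPO window's worth of change
    replicate within the window? usable_fraction reserves headroom for other
    traffic on shared links (illustrative assumption)."""
    change_per_window_gb = change_gb_per_day * rpo_minutes / (24 * 60)
    required_mbps = change_per_window_gb * 8e9 / (rpo_minutes * 60) / 1e6
    return required_mbps <= link_mbps * usable_fraction, round(required_mbps, 1)

# 500 GB/day of change, a 1-hour RPO, and a 200 Mbps link:
ok, needed = rpo_feasible(change_gb_per_day=500, rpo_minutes=60, link_mbps=200)
print(ok, needed)  # True 46.3
```

Note that for steady change rates the window length cancels out, so the real driver is the daily change rate versus usable bandwidth; bursty workloads and aggressive RPOs are where the window math starts to matter.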

25.  What is the current rate of utilization of the network links between the sites you plan to use for replication traffic?

Why ask? These links may carry other traffic, which can reduce the bandwidth you assume will be available for replication.  See if you can get utilization over a 30-day period, and if possible over several months, to spot any trends of increasing or decreasing utilization.

26.  How do you handle IP addresses on the recovery site for VMs that have failed over?

Why ask? This allows us to discover what type of networking failover scenario(s) the customer would prefer to use: Overlay Networks; Stretched Layer 2 subnet between sites; perform a full subnet failover from the primary to secondary site by updating routes to point to the new recovery site; allow IP addresses to change when failed over (this can cause obvious challenges of broken applications that hard code IP addresses, updating of DNS and cache entries, etc.).

November 2020 Sprints

November 24  – Collector 3.2

Hi everyone, we just went live with Collector 3.2. The major highlight is the ability to run the tool in local and remote mode for Hyper-V environments. Hyper-V local support:
  • Collector now supports running the tool against a Hyper-V cluster directly from the Hyper-V hosts locally. The UI has an option to choose between Hyper-V (local) and Hyper-V (remote).
  • Collection can be done by downloading the tool on any of the hosts that are part of the cluster we wish to collect data from and choosing Hyper-V (local) in the drop-down menu.
  • With both remote and now local collection options, there is greater flexibility to switch modes in case of connectivity/access issues with the remote setup (particularly for Hyper-V, as it connects directly to cluster hosts rather than management APIs, unlike vCenter).
  • This version supports Hyper-V clusters in both local and remote mode. Support for standalone Hyper-V hosts (not part of a cluster) is planned.
Precheck:
  • A precheck script is bundled with the tool; it runs a few checks to see whether the expected services are available and other prerequisites are satisfied.
  • Upon hitting the error screen, the tool will point to the script location; the script can be run on the host to get the relevant data.
Usability:
  • The login page now has a drop-down to choose the flow (vCenter, Prism, Hyper-V (remote), or Hyper-V (local)), and the default ports are populated upon selection.
  • VM Summary table: shows both consumed and provisioned storage across all the cluster VMs.
  • The tool now accepts hostnames (in addition to host IPs) for connecting to the Hyper-V host instance; the previous limitation has been removed.
  • Improved error messages and log enhancements.
We now have a dedicated Collector page with the latest 3.2 bits and documents (User Guides, Release Notes) here:
https://portal.nutanix.com/page/downloads?product=collector
We went live with the latest sprint; below are the major highlights. Proposals:
  • Updated slides on quarterly financials with Q4
  • Now includes the backup cluster / DR cluster details along with the primary workload cluster, including the config details and utilization dials
  • HW spec slide added for the NX Mine-specific appliance: a subset of the standard NX HW spec
Sizing enhancement:
  • SQL workload supported on Nutanix clusters on AWS
Usability:
  • Bulk Edit: I/O input field options added for bulk edits for Server Virtualization and Cluster sizing (Raw)
  • Storage calculator updates including new drive options – 16TB HDDs [support for 320TB nodes]
  • Validator support for new NEC and KTNF platforms
  • Changes to Solutions summary UI – Cluster in a separate row/ consistent with workload summary UI
  • New partner roles added for partner specific HW vendor visibility
Product updates:
  • HPE DX: New AMD platform support – DX325 Gen10 8SFF
  • mCPU/lCPU-DIMMs rule update for across vendors
  • Dell XC:  GPU with NVMe restrictions removed, now both can be in same config

November 3

Hi everyone, we went live yesterday with the current sprint; below are the major highlights. GPU Dials:
  • You will see a 5th set of dials, for the GPU, for nodes/workloads requiring GPUs, of course.
  • The dials show the utilization percentage and cluster failover considerations just like those for cores, RAM, etc.
  • The additional dial will feature in the BOM as well for GPU workloads.
320 TB node support
  • For Objects and Files Dedicated workloads, the node limits now go up to 320TB (HDD) per node.
  • The total capacity (including the SSDs) can go up to 350TB [16TB x 20 + 7.68TB x 4].
  • The HPE DX4200 currently supports this configuration and is supported in Sizer.
Collector/RVTool import filter
  • During import, Sizer will filter out the CVM VMs when Collector or RVTools is run against a vCenter-managed Nutanix cluster.
  • CVM resources are added by Sizer anyway, so this helps avoid double counting.
  • For Prism-managed Nutanix clusters, the CVMs are filtered out by Collector itself.
Platform updates:
  • Two new NEC platforms: NEC Express R120h-1M & R120h-2M
  • A new vendor was added this release: KTNF, with their server model KR580S1-308N
  • New server platform for Inspur: InMerge1000M5S
  • Updates to Fujitsu, Dell XC, and Lenovo platforms

October 2020 Sprints

Oct 19

Hi everyone,
We went live last night with the latest.  Both big and small changes. On the small but good things:
  • Updated Oracle sizing to match the recent changes in SQL Server production cluster sizing.  We already had a dedicated cluster as a requirement for Oracle, but now 1:1 CVM with a total of 12 physical cores (yep, we want a lot of I/O capability) and a minimum of 14 cores and 52 SPECints.
  • Align the VCPU:pcore ratio when doing either configuration or performance sizing with Collector
  • Bulk edit can now be done for XenApp or SQL Server workloads
  • Robo model addition: NX3060-G7
  •  DX Mine appliance: 1.92TB SSDs – RI
On BIG things
  • Sizer FINALLY has I/O!!  Well, technically we had it for Files Application sizing, but not for general-purpose use.  We now have I/O performance sizing for both Cluster sizing (Raw) and Server Virtualization workloads.  Where historically we would size for capacity, we can now size for both I/O and capacity (whichever is the greater requirement).
Want to thank both  and  for all their hard work in getting the I/O effort going.  There was a lot of testing and analysis to get this scoped.  They both worked very hard, and it is excellent work.  This is what I love to see in Sizer, so it is better for you all.

Here is the I/O panel in the workloads

Oct 6

Hi everyone, we went live with the current sprint; below are the major highlights. SQL sizing enhancements (major changes to this one):

  • Changes to the Business Critical ON/OFF options and default settings. SQL sizing now defaults to business critical.
  • Sizer allocates additional CVM cores (1:1 vCPU to pCore ratio) to aid performance for the business critical option.
  • A business critical SQL workload must be in a dedicated cluster with only other SQL workloads; VDI, Server Virt, etc. are not allowed in the SQL dedicated cluster.
  • All-Flash or NVMe models only, with high-frequency processors for higher performance.

Budgetary quote: HPE-DX

  • You can now generate a budgetary quote for sizings on DX. Earlier, the budgetary quote showed only the SW/CBL quote, but now HW BOM price estimates are also included.
  • The HW BOM quote covers the complete BOM, including PSUs, transceivers, chassis, etc., as well as HW support prices.

Files changes:

  • New Files SKUs with tiered pricing are now supported, including generating a Frontline quote through Sizer. Sizer’s budgetary quote for Files is also updated with the newer SKUs and pricing approach.
  • Application storage: updated with the latest performance numbers across hybrid and AF nodes.
  • With increased throughput/IO per node, you would need fewer nodes than before for the same workload.
  • Defaults to 1x25GbE NIC for smaller nodes, 2x25GbE for larger nodes.

Collector/RVTools

  • You can now choose to size for storage based on the VMs' consumed or provisioned capacity during import.

Usability

  • Era quoting in Frontline is supported through Sizer
  • Bulk edit: now also supported for Oracle and Backup workloads
  • HPE-DX default NIC recommendation: FLOM/OCP in both auto and manual
  • Updates to XC models, with an updated list of CPUs and SSDs across models

Thanks Ratan.  In regards to SQL, it has grown up in Sizer, so to speak.  If you are looking at adding a small SQL database to a cluster of, say, Server Virt, then go with Business Critical off. The SQL workload can then be in a cluster with other types of workloads; we take a 2:1 CVM and no CPU minimums in terms of cores and SPECints. Go to Business Critical and it becomes a dedicated cluster, 1:1 CVM with a total of 12 physical cores (yep, we want a lot of I/O capability) and a minimum of 14 cores and 52 SPECints.  The config is also AF or NVMe.  We will be making the same changes for Oracle in the current sprint.

September 2020 Sprints

Sept 21

We just went live with the current sprint, and there are some cool features in this release.

Compare with one less node

  • A second set of dials with utilization % appears on selecting the ‘compare’ checkbox (screenshot below)
  • Helps compare and analyze the scenario and its state (N+1/N+0) with one less node than the optimal recommendation
  • Avoids the extra steps of going to manual sizing to replicate the above

Bulk edits – Workload section

  • Make bulk edits/changes to workload attributes like vCPU:pCore ratio or user type (in VDI), particularly helpful for imported workloads
  • A couple of weeks ago, we went live with bulk edit for the common section [RF, compression, ECX, snapshot, etc.]
  • With this, all inputs to a workload can be edited in bulk
  • Currently most major workloads are supported for bulk edit: Server Virtualization, VDI, Files, and Cluster/Raw

Encryption changes

  • Overall enhancement to encryption support in Sizer with the latest encryption licenses and add-ons
  • Option to choose SW or HW encryption; Sizer adds the appropriate encryption license
  • Add-on encryption license support for non-AOS [Files/Objects/VDI Core/ROBO]

Other enhancements / Platforms

  • Sizing stats : Usable remaining capacity adjusted for RF and N+1
  • HPE DX: Power calculation and checks for x170r/x190r nodes on DX2200/2600

Hi everyone

I’m super excited about the 2nd set of dials.  Very often SEs go to the pain of changing the node count to see what it is like with one less node (N+1 vs N+0). Then you want to change something in the original sizing and have to go back and see the impact on N+0. We make it easy.  By the way, that is an official sizing at N+0: we don’t just take a percentage difference but do a real sizing and apply all the rules, just with one less node.

 

August 2020 Sprints

Aug 26

We just went live with the current sprint, and we are excited to share that we went live with Sizer Basic!!

A quick introduction to Sizer Basic:

This flavor of Sizer is aimed at a slightly different set of users/personas, for example sales reps, AMs, and customers. It is designed to reduce the friction and time spent between gathering workload requirements and getting to a solution and quote; the idea is to drive volume sales. It asks a few workload-related questions and fills in the defaults around cluster properties/settings, avoiding complexity for the user, and comes up with a solution rather quickly. Currently, Basic includes all major workloads with the highest sizing momentum and covers all platforms. One major highlight of Basic is the built-in self-help capability: illustrations, a guided tour, a context-based help panel, and triggers, all of which educate first-time or repeat users about the tool, its workflow, and the specifics of sizing.  Also note that Sizer Basic is role-based access, so it is only for users who are assigned the Basic role. Existing users continue on the current Sizer (and will see Advanced/Basic tags).

Here’s a detailed 45 minute demo video on Sizer Basic:
https://nutanix.zoom.us/rec/share/4s5kDZPexkRLb4HGyGaYU79mAaK_eaa81yUY8_YJxBtyFKu2rdfJ38WGdBrB8ePt

Other changes as part of this sprint include:

  • Support for sync rep (for metro availability)

Now you can choose synchronous replication in Sizer, and it comes out with a primary and a secondary cluster. It is somewhat similar to the DR cluster, but without the additional snapshots.

  • Async/Near sync enhancements

Changes related to async/near-sync for 5.17, such as moving the limit for hourly snapshots from 80TB to 92TB for all-flash, and the related config rules.

  • Sizing stats table

A usable remaining capacity row has been added to the sizing stats; it gives details on the resources available in the cluster after workload requirements are met. The numbers are adjusted for RF.

  • Platforms – AMD /  HPE DX

HPE DX came out with support for the AMD platform – HPE DX385. The platform is listed and can be selected for sizing by choosing AMD under auto settings. Sizer’s default is Intel processor based models.

Hi everyone

Wanted to provide some color on why Sizer Basic and what we will do in future for full Sizer

 

Sizer Basic – As a company we cover a very wide range of use cases and scale. However, about 45% of all scenarios were in the top workloads like VDI, Files, Server Virtualization, etc. and stayed with the defaults. That gives us an opportunity to offer Basic with these defaults and let many more people do either the initial sizing or just go with the defaults. There are two benefits. First, it enables collaborative sales. We have about 700 customer users of full Sizer now, and it has been quite successful; often they do sizings and then share them with their SE. Basic will allow us to get Sizer out to many more customers and continue this trend. The second benefit is that SEs can focus more on the complex sizings. I would envision the initial sizing often being done in Basic and then shared with you for enhancements.  So with Basic you have more opportunity to collaborate.

 

Sizer Advanced – With the introduction of Basic, we are working on Advanced. Here we know it is an SE or advanced user, and we plan to add a lot more dials and options to allow you to create awesome, complex multi-cluster solutions. We will still offer this to customers as we offer Sizer today, but it does require SE support.  Stay tuned.

 

Aug 11

What the heck, a double-header day!! Well, this is big, as it will be a game changer in how you sell.  An SE team was created to work with me to finally get a GREAT proposal out of Sizer.  So now you can do all your edits, get to the final sizing, and Sizer will automatically create a super presentation complete with the sizing dials in PPT, pictures of all the hardware, a corporate overview, and slides for any product you selected.  A real proposal created by real SEs. Easy to do:

  1. Go to Create Proposals.
  2. Any product in the sizing is automatically added, but you get a nice selection panel for any products you want to include.  You might notice this looks like Frontline (we are all the same team and like this UI).
  3. It takes some time, but you get a zip; open it up and you get slides CUSTOMIZED for your presentation.

Why is this important? Well, enterprise SEs often create lots of sizings, and so each needs a presentation.  Commercial SEs may find their time is so limited that this lets them present a good presentation to the customer.  You are also assured it is all current.  What we found is that SEs wasted a lot of time creating PPTs, and everyone had their own version.

 

Let me give a sample of what this does for you. First, every cluster has its own slide with the configuration summary and the dials.  Working with the SEs, they often want to show the N+1 and N+0 levels (what the customer should expect during an upgrade, for example).  Affectionately this was called the Justin slide, as this is what Justin Bell presents, and everyone said YES.  We also show all the hardware pics too.

 

 

 

hi everyone

Welcome to the new Fiscal Year, and hope everyone had a good break.  The Sizer team is coming out with a big bang with the new sprint launched today. Sizing:

  • HPE DX Mine support: we now have the HPE DX Mine appliance in Sizer
  • Improvements in Splunk Smartstore

Usability

  • Bulk edits.  So you’ve got a bunch of workloads and you say, darn, I need to change the compression, or RF level, or ECX, etc.  In the old days you had to go in one by one and change the workloads; now you can make bulk edits!!  You can still go in one by one if that is part of your Zen practice.
  • Extent Store chart.  There has been a lot of confusion with all our charts on the storage that is available.  Heck, I get confused.  We did some cleanup in the Sizing details already, and now you see a nice interactive panel below those details to get to extent store (raw less CVM) and effective capacity (extent store with storage efficiencies).  On the left you can play with RF, compression, N+1, and ECX, and in real time you get an update on the right.  If you don’t like that complex TiB stuff, there’s a switch for you to go to TB.
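As a mental model, the extent store and effective capacity arithmetic behind that panel can be sketched roughly as below. The CVM reservation per node and the savings ratio are illustrative assumptions, and real sizings apply many more rules, so treat this only as a sketch:

```python
def cluster_capacities_tib(raw_tib_per_node, nodes, rf=2,
                           cvm_reserve_tib_per_node=2.0,
                           savings_ratio=1.5, spare_nodes=1):
    """Sketch of the Extent Store chart math: extent store is raw capacity
    less the CVM reservation; effective capacity divides by RF and applies
    storage-efficiency savings; spare_nodes models N+1."""
    usable_nodes = nodes - spare_nodes
    extent_store = (raw_tib_per_node - cvm_reserve_tib_per_node) * usable_nodes
    effective = extent_store / rf * savings_ratio
    return round(extent_store, 1), round(effective, 1)

# Four 40 TiB nodes, RF2, N+1, assumed 2 TiB/node CVM reserve, 1.5x savings:
print(cluster_capacities_tib(raw_tib_per_node=40, nodes=4))  # (114.0, 85.5)
```

The interactivity described above (toggling RF, compression, N+1, ECX) is just re-running this kind of arithmetic with different parameters.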

July 2020 Sprints

July 27

Hope all is going well.  We did go live with a sprint last night. Some cool things

  • HPE DX BOM – support SKUs and structure
  • XenApp / RDSH profile edit
  • Era VCPU configurable license sku

Some very cool things

  • Customize the thresholds used in manual and auto sizing
  • A much better summary of the sizing details.  Now we show the capacity, any savings, and then in red the consumption items

July 13

Hi everyone, we went live with the current sprint, below are the major highlights

Sizing Improvements:

  • Splunk Smartstore: The new Splunk Smartstore, which decouples compute and storage, is now supported in Sizer. Sizer recommends a compute cluster and a storage (object) cluster, respectively, for the indexer and cold/frozen data.
  • RVTools host info in sizing: Bringing parity with Collector, Sizer reads RVTools host information for the VMs and factors it into sizing. The existing server’s CPU is normalized against the baseline.
  • Era platforms: This went live mid-sprint. Now you can select Era Platform licensing, and Sizer generates Era Platform licenses for the total cores in the database workload cluster, including the child/accounting SKUs.
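The host normalization mentioned above can be pictured as scaling by relative per-core SPECint ratings. A minimal sketch with made-up numbers (the function and figures are illustrative assumptions, not Sizer's actual internals):

```python
def normalized_pcores(vm_vcpus, vcpu_pcore_ratio,
                      host_specint_per_core, baseline_specint_per_core):
    """Convert a VM's vCPU demand on its current host into baseline-equivalent
    physical cores: apply the vCPU:pCore ratio, then scale by the host's
    per-core SPECint rating relative to the baseline processor."""
    pcores = vm_vcpus / vcpu_pcore_ratio
    return pcores * host_specint_per_core / baseline_specint_per_core

# An 8-vCPU VM at a 2:1 ratio on a host 10% slower per core than baseline
# needs fewer baseline-equivalent cores than its raw pCore count:
print(normalized_pcores(8, 2.0, 45.0, 50.0))  # 3.6
```

The idea is simply that the same vCPU count represents less (or more) work depending on how the source host's per-core performance compares to the sizing baseline.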

Usability:

  • HPE DX BOM, transceivers: As a continuation of the effort to provide a complete BOM for HPE DX, Sizer now also recommends the appropriate type and quantity of transceivers to go with the selected NIC, depending on the NIC type and number of ports. Sizer already recommends the required PSUs, GPU cables, chassis, etc. as part of the complete-BOM initiative.
  • Storage efficiency slider: Similar to the workloads section, the storage efficiency in the Storage Capacity Calculator and Extent Store charts now has a slider to choose from a range of values.
  • HPE Arrow models [BTO configs]: This went out mid-sprint, enabling HPE Arrow models for SFDC/internal users.

Platforms:

  • Dell XC: Dell XC is the second vendor to come out with AMD models. The XC6515 AMD (XC Core only) is now live in Sizer (under the AMD option in settings).
  • Regular platform updates across NX and OEMs, keeping up to date with the config changes in product meta.

July 6

Hi everyone

We delivered a few things mid-sprint last night:
  • Era Platform licensing is now in Sizer for Oracle or SQL. You can specify Era Platform licensing, and the cluster the database workload is in then uses that licensing, which covers the total cores in the cluster.
  • All Nutanix users have access to the Arrow models for HPE DX scenarios.
  • RVTools support: Sizer will pick up the host info and SPECint numbers for each host from the RVTools spreadsheet. This is already supported in Collector and can help get more sizing precision.

June 2020 Sprints

June 30

we went out with the release for the current sprint. Below are the highlights :

  • AWS EC2 sizing –
    •  Sizer can map AWS EC2 instances to equivalent Nutanix nodes. This is helpful in sizing for migrating workloads from AWS to Nutanix. Currently compute-optimized and storage-optimized EC2 instances are supported.  In Beta currently.
  • Change in the N+0 thresholds
    •  The N+0 defaults remain 95% for compute and memory. The SSD and HDD thresholds moved from 90% to 95% for better utilization. The N+1 yellow indicator within a 5% range of the N+0 threshold makes a good case for the shift.
  • RVTools enhancements
    •  Sizer now applies the derived vCPU:pCore ratio based on the Excel file instead of using the Sizer default. Additionally, the host processor for the workload is factored in while sizing the imported VMs. These are already supported for Collector-imported workloads. Also supported with this release is the latest RVTools version, 4.0.4.
  • Scenario number as permalink
    •  To help identify and share scenarios more easily, as a usability enhancement, the scenario URL now includes the scenario number, for example S-123456.
  • HPE DX enhancements: Rules around certain processor/memory combinations for some scenarios involving Cascade Lake.
  • Recurring platform updates across NX/OEM/SWO vendors.

Thanks Ratan.  I want to call out a key innovation with AWS EC2 sizing.  The Sizer Council suggested it as a “hot” opportunity in the current environment, as they have customers anxious to pull at least some of their AWS deployments back onto Nutanix given AWS costs.

Here you just specify the number of each instance type they want to move and get a precise recommendation. This is a case of excellent collaboration with the Sizer Council: this came up just about 8 weeks ago and is now live. I do want to thank Ratan, who got all the detailed requirements defined.

 

June 15

Hi Everyone
We went live with the current sprint. Below are the highlights:

Workload updates:

  • Files: 240TB node support. Sizer can now recommend denser nodes up to a 240TB capacity tier. This is supported for Files Dedicated, with a few prerequisites such as minimum cores/RAM/flash for the dense nodes. Files is the second workload, after Objects, to support higher-capacity nodes.
  • Files licenses for VDI core: Selecting VDI core (dedicated VDI cluster) and opting for Files to store user data generates a Files (for AOS) license for the required capacity.
  • VDI/Frame licenses in quotes: If Frame is chosen in VDI, the Sizer budgetary/SFDC quote will now include the required Frame subscription licenses along with the regular license for the cluster.

Usability:

  • NX Mine appliance: NX Mine XSmall – a new extra-small form factor for Mine on the NX platform is now supported, with the required licenses and quote.
  • Mine enhancement: Mine can now be disabled for appliance/non-decoupled scenarios.
  • ECX update in storage calculator and extent store chart: We revisited the approach to storage calculator ECX calculations and made some updates around effective capacity. ECX is now also considered/applied on usable remaining; earlier, usable remaining only considered RF.
  • UI changes for the Workload tab: A lot of new capabilities are coming to Sizer – bulk edit/delete, import each VM as a workload, move workloads between clusters, etc. – and there are UI changes for these. We had filters in the Workload tab; now a few columns are rearranged. Cluster is now a separate row, followed by all workloads in that cluster underneath. This gives a lot of space for the workload name, leaving room for basic/advanced tags and a few checkboxes for bulk edits.

Platform updates:

  • HPE DX: New platform: DX8000 DX910 – a new HPE DX NVMe platform
  • Inspur SWO/InMerge (OEM): GPU made non-mandatory. The GPU models can now be selected without a GPU as well.
  • Dell XC: 640-4 and 4i / processor update – New revised list of supported processors for these XC models.

June 1

Hi everyone.

Some big things came out today

Frontline Quoting – Frontline is our new quote tool that will replace the existing Steelbrick quoting tool, with a much nicer UX.  It allows tighter integration with Sizer, in our goal to offer an excellent E2E presales experience: Collector/Collector Portal for gathering customer requirements, Sizer to design the right solution to meet customer needs, and finally Frontline to create the quote.

So now you have the option to quote in Frontline if you are a Frontline user.  In Quote options we still have the options to create a SFDC quote and a budgetary quote; this is a third option.  At this time about 1,200 users in the company are set up for Frontline – most of Americas, some in EMEA, and some in APAC.  Don’t fret though; we envision getting everyone on it in a couple of months.

Dashboard Filters – Ever get frustrated that you can’t find or filter out different sizings?  We had ways to hide things with Customize View.  Now we have Dashboard Filters, and you can get just what you want in a couple of filters.  Attached is the pulldown.  You can have multiple filters as an AND condition – so, for example, two filters allow you to select a certain customer and certain workloads.  This is great for those that are getting into 100s of scenarios.

We also made various product updates including

  • GPU None option for XF8055 , XF8050
  • DX: New platform: DX360-10-G10-NVMe
  • Dell XC: LCPUs
  • Lenovo: HX7820-24

 

May 2020 Sprints

May 19

 

Hi everyone. We went live with our latest sprint last night.

Sizing Improvements:

Arrow DX models – With our new focus on adjusting for the virus economy, we added pre-built DX models from Arrow for USA Commercial reps, SEs, and managers. Today there are supply chain challenges that are causing delays when customers try to get HPE DX models. These are pre-built and available at Arrow…TODAY.  So in either manual or Auto sizing you can select Arrow models and size and quote them. At this point you do have to be in the US Commercial group.  We hope to expand it in the future.

Usability:

  • Frontline integration – Frontline is our cool new quoting system and we want to get Sizer tied to it. We are working hard on it and it is coming soon.
  • Streamlined the input processor options on workloads to make them more intuitive. Typical power value added to the BOM and UI for Nutanix.

Product Alignment

  • Dell XC product updates
  • Dell XC: New processor: Xeon Gold 6246 / XC740xd-12

 

 

Files

Introduction – Please Read First

These questions are here to assist with ensuring that you’re gathering necessary information from a customer/prospect in order to put together an appropriate solution to meet their requirements in addition to capturing specific metrics from tools like Collector or RVTools. 

This list is not exhaustive, but should be used as a guide to make sure you’ve done proper and thorough discovery.  Also, it is imperative that you don’t just ask a question without understanding the reason why it is being asked.  We’ve structured these questions with not only the question that should be asked, but why we are asking the customer to provide an answer to that question and why it matters to provide an optimal solution. 

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity. 

This is a live document, and questions will be expanded and updated over time.


Files

1.  Is this replacing a current solution, or is this a net new project?
     a.  What’s the current solution?

Why ask? This question helps us understand the use case, any current expectations and what the competitive landscape may look like.

2.  Using an existing Nutanix cluster (with existing workload) or net new Nutanix cluster?

Why ask?  If we’re sizing into an existing cluster we need to understand current hardware and current workload.  For licensing purposes adding Files to an existing cluster means the Files for AOS license. A common scenario has been to add storage only nodes to an existing cluster to support the new Files capacity.  If sizing into a new cluster we can potentially dedicate this cluster to Files and use Files Dedicated licensing.

3.  Is this for NFS, SMB or both?
     a.  Which protocol versions (SMB 3.0, NFSv4, etc)?

Why ask?  We need to understand protocol to first validate they are using supported clients.  Supported clients are documented in the release notes of each version of Files.  Concurrent SMB connections also impact sizing with respect to the compute resources we need for the FSVMs to handle those clients.  Max concurrent connections are also documented in the release notes of each version.  It also helps us validate supported authentication methods.  For SMB, we require Active Directory where we support 2008 domain functional level or higher (there is no local user or group support for Files).  For NFS v4 we support AD with Kerberos, LDAP and Unmanaged (no auth) shares.  For NFS v3 we support LDAP and Unmanaged.

4.  Is there an explicit performance requirement from the customer?  Do they require specific IOPS, throughput, or latency numbers?

Why ask?  Every FSVM has an expected performance envelope.  There is a sizing guide and performance tech note on the Nutanix Portal which give a relative expectation of the max read and write throughput per FSVM and the max read and write IOPS per FSVM.  Read and write throughput is integrated into Nutanix Sizer and will impact the recommended number of FSVMs. https://portal.nutanix.com/page/documents/solutions/details?targetId=TN-2117-Nutanix-Files-Performance:TN-2117-Nutanix-Files-Performance
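The FSVM-count arithmetic behind this can be sketched roughly as follows. The per-FSVM throughput maxima and the workload numbers below are hypothetical placeholders, not published limits – pull the real envelope from the TN-2117 performance tech note for the Files version being sized.

```python
import math

def fsvms_needed(read_mbps, write_mbps, max_read_mbps, max_write_mbps, min_fsvms=3):
    """Estimate the FSVM count from throughput requirements.

    The per-FSVM maxima are inputs because the real envelope depends on
    the Files release and hardware -- see TN-2117 for current numbers.
    A Files deployment starts at 3 FSVMs, hence the floor.
    """
    by_read = math.ceil(read_mbps / max_read_mbps)
    by_write = math.ceil(write_mbps / max_write_mbps)
    return max(min_fsvms, by_read, by_write)

# Hypothetical: 3,000 MB/s reads and 1,200 MB/s writes against an assumed
# envelope of 1,000 MB/s read / 500 MB/s write per FSVM.
print(fsvms_needed(3000, 1200, 1000, 500))  # → 3
```

Sizer does this (and more) automatically; the sketch only shows why stated throughput requirements translate directly into FSVM count.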

5.  Do they have any current performance collection from their existing environment?
      a.  Windows File Server = Perfmon
      b.  Netapp = perfstat
      c.  Dell DPACK, Live Optics

Why ask?  Seeing data from an existing solution can help validate the performance numbers so that we size accurately for performance. 

6.  What are the specific applications using the shares?
       a.  VDI (Home Shares)
       b.  PACS (Imaging)
       c.  Video (Streaming)
       d.  Backup (Streaming)

Why ask?  When sizing for storage space utilization the application performing the writes could impact storage efficiency.  Backup, Video and Image data are most commonly compressed by the application.  For those applications we should not include compression savings when sizing, only Erasure Coding.  For general purpose shares with various document types assume some level of compression savings.  

7.  Are they happy with performance or looking to improve performance?

Why ask?  If the customer has existing performance data, it’s good to understand if they are expecting equivalent or better performance from Files.  This could impact sizing, including going from a hybrid to an all flash cluster. 

 8.  How many expected concurrent user connections?

Why ask? Concurrent SMB connections are a required sizing parameter.  Each FSVM needs enough memory assigned to support a given number of users.  A standard share is owned by one FSVM.  A distributed share is owned by all FSVMs and is load balanced based on top-level directories.  We need to ensure any one FSVM can support all concurrent clients to the standard share or top-level directory with the highest expected connections. We should also ensure that sizing for concurrent connections takes N-1 redundancy (node maintenance, failure, etc.) into account.
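A minimal sketch of the N-1 connection check described above, assuming connections redistribute evenly across the surviving FSVMs (a simplification – real distribution depends on share ownership and top-level directory balance):

```python
import math

def connections_per_fsvm(total_connections, num_fsvms):
    """Worst-case concurrent connections any single FSVM must absorb,
    assuming N-1 (one FSVM down for maintenance or failure) and even
    redistribution across the survivors."""
    assert num_fsvms >= 2, "need at least 2 FSVMs to tolerate losing one"
    return math.ceil(total_connections / (num_fsvms - 1))

# Hypothetical: 4,000 concurrent SMB users on a 4-FSVM file server.
print(connections_per_fsvm(4000, 4))  # → 1334
```

The resulting per-FSVM connection count is what you then map to required FSVM memory using the connection limits documented in the Files release notes.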

9.  Will the underlying hardware config support larger or more FSVMs if additional throughput or performance is required?

Why ask? Files is a scale-out and scale-up workload so you need to know what growth in the environment can look like.

 10.  Current share configuration including number of shares?

Why ask?  Files has a soft (recommended) limit of 100 shares per FSVM.

11.  Directory structure:
       a.  Large number of folders in share root?

Why ask?  This indicates a large number of top level directories making a distributed share a good choice for load balancing and data distribution.

       b.  Files in share root?

Why ask?  Distributed shares cannot store files in the share root.  If an application must store files in the root then you should plan for sizing using standard shares.  Alternatively, a nested share can be used. 

       c.  Total size of largest single directories?

Why ask?  Nutanix supports standard shares up to 140TB, and top-level directories in a distributed share up to 140TB.  These limits are based on the volume group supporting the standard share or top-level directory.  We need to ensure no single folder or share (if using a standard share) surpasses 140TB. Files compression can yield more usable storage per share as well. Nutanix Files – Deployment and Upgrade FAQ https://portal.nutanix.com/page/documents/kbs/details?targetId=kA00e000000LMXpCAO

       d.  Largest number of files/folders in a single folder?

Why ask?  Nutanix Files is designed to store millions of files within a single share and billions of files across a multi-node cluster with multiple shares.  To achieve speedy response time for high file and directory count environments it’s necessary to give some thought to directory design. Placing millions of files or directories into a single directory is going to be very slow in file enumeration that must occur before file access.  The optimal approach is to branch out from the root share with leaf directories up to a width (directory or file count in a single directory) no greater than 100,000.  Subdirectories should have similar directory width.  If file or directory counts get very wide within a single directory, this can cause slow data response time to client and application.  Increasing FSVM memory up to 96 GB to cache metadata can help improve performance for these environments especially if designs for directory and files listed above are followed.
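To sanity-check a proposed directory design against the ~100,000-entry width guidance above, a balanced-tree estimate of the required directory depth can be sketched like this (an idealization – real trees are rarely balanced, so treat the result as a lower bound):

```python
import math

MAX_WIDTH = 100_000  # recommended max entries (files or dirs) per directory

def levels_needed(total_files):
    """Directory depth below the share root needed so that no single
    directory exceeds MAX_WIDTH entries, assuming a balanced tree."""
    return max(1, math.ceil(math.log(total_files, MAX_WIDTH)))

print(levels_needed(50_000))         # → 1 (fits in one level of leaf dirs)
print(levels_needed(2_000_000_000))  # → 2 (billions need a second level)
```

The takeaway matches the guidance above: even billions of files only need a couple of well-branched levels, whereas dumping them into one directory makes enumeration very slow.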

12.  Total storage and compute requirements including future growth?

Why ask?  Core sizing question to ensure adequate storage space is available with the initial purchase and over the expected timeframe. 

13.  Does your sizing include the resources to run File Analytics?

Why ask?  FA is a key differentiator for Files, and drives a lot of customer delight and insight into their data. Every SE should assume that any Files customer will want to run FA as well – don’t present it as an optional component.

14.  Percent of data considered to be active/hot?

 Why ask?  Understanding the expected active dataset can help with sizing the SSD tier for a hybrid solution.  Performance and statistical collection from an existing environment may help with this determination.

 15.  Storage change rate?

Why ask?  Change rate influences snapshot overheads based on retention schedules.  Nutanix Sizer will ask what the change rate is for the dataset to help with determining the storage space impact of snapshot retention.
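As a rough back-of-envelope for what Sizer computes here, snapshot overhead can be approximated as daily change multiplied by the number of retained snapshots. This simplified model ignores overlap between successive snapshots, so it overestimates – treat it as an upper-bound sanity check, not a replacement for Sizer:

```python
def snapshot_overhead_tib(dataset_tib, daily_change_pct, snapshots_retained):
    """Rough snapshot space overhead: each retained snapshot pins roughly
    one day's worth of changed data. Ignores block overlap between
    snapshots, so this is a conservative (high) estimate."""
    daily_change = dataset_tib * daily_change_pct / 100
    return daily_change * snapshots_retained

# Hypothetical: 100 TiB share, 2% daily change rate, 7 daily snapshots retained.
print(snapshot_overhead_tib(100, 2, 7))  # → 14.0
```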

 16.  Any storage efficiency details from the current environment (dedup, compression, etc.)?

Why ask?  Helps to determine if data reduction techniques like dedup and compression are effective against the customer’s data.  Files does not support the use of deduplication today, so any dedup savings should not be taken into account when sizing for Files.  If the data is compressible in the existing environment it should also be compressible with Nutanix compression.

 17.  Block size of current solution (if known)?

Why ask?  Block size can impact storage efficiency.  A solution which has many small files with a fixed block size may show different space consumption when migrated to Files, which uses variable block lengths based on file size.  For files over 64KB in size, Files uses a 64KB block size.  In some cases a large number of large files have been slightly less efficient when moved to Nutanix Files.  Understanding this up front can help explain differences following migrations.
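The rounding effect described above can be illustrated with a simplified allocation model. The exact-fit treatment of small files here is an assumption for illustration only; the point is the per-file round-up to 64KB blocks for larger files, which is why many large files can consume slightly more space after migration:

```python
BLOCK = 64 * 1024  # 64 KiB block size for files larger than 64 KiB

def allocated_bytes(file_size):
    """Space a file of `file_size` bytes occupies when rounded up to whole
    64 KiB blocks. Files at or below 64 KiB are modeled as exact-fit,
    standing in for the variable block lengths used for small files."""
    if file_size <= BLOCK:
        return file_size
    return -(-file_size // BLOCK) * BLOCK  # ceiling division

# Hypothetical: a file one byte over 1 MiB pins almost a full extra block.
print(allocated_bytes(1024 * 1024 + 1))  # → 1114112 (17 blocks)
```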

18.  Self Service Restore (SSR) requirements (share level snapshots)?

Why ask?  Nutanix Files uses two levels of snapshots. SSR snapshots occur at the file share level via ZFS.  These snapshots have their own schedule, and Sizer asks for their frequency and change rate under “Nutanix Files Snapshots.”  The schedule and retention periods associated with SSR will impact overall storage consumption. Nutanix Files snapshots increase both the amount of licensing required and the total storage required, so it’s important to get this right during the sizing process.

 19.  Data Protection/Disaster Recovery requirements (File Server Instance snapshots):
         a.  Expected snapshot frequency and retention schedule (hourly, daily, weekly, etc.)?

Why ask? Data Protection snapshots occur at the AOS (protection domain) level via the NDSF.  The schedule and retention policy are managed against the protection domain for the file server instance and will impact overall storage consumption.  Sizer asks for the local and remote snapshot retention under “Data Protection.”
Files supports 1hr RPO today and will support NearSync in the AOS 5.11.1 release in conjunction with Files 3.6.  Keep node density (raw storage) in mind when determining RPO.  Both 1hr and NearSync RPO require hybrid nodes with 40TB or less raw, or all-flash nodes with 48TB or less raw.  Denser configurations can only support a 6hr RPO.  These requirements will likely change, so double check the latest guidance when sizing dense storage nodes, and confirm that the underlying nodes and configs support NearSync per the latest AOS requirements if NearSync will be used.
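The density rule quoted above can be captured in a tiny check. These thresholds come straight from this document and are expected to change, so verify against the current AOS/Files requirements before relying on them:

```python
def max_supported_rpo_hours(raw_tb, all_flash):
    """Coarse RPO-by-density check per the guidance in this document
    (subject to change): 1hr (and NearSync) RPO needs hybrid nodes with
    <= 40TB raw or all-flash nodes with <= 48TB raw; denser nodes
    support 6hr RPO only."""
    limit_tb = 48 if all_flash else 40
    return 1 if raw_tb <= limit_tb else 6

print(max_supported_rpo_hours(40, all_flash=False))  # → 1
print(max_supported_rpo_hours(64, all_flash=True))   # → 6
```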

        b.  Active/Active requirements (Peer Software)?

Why ask?  If the customer needs active/active file shares in different sites which represent the same data, we need to position a third party called Peer Software.  Peer performs near real time replication of data between heterogenous file servers.  Peer utilizes Windows VMs which consume some CPU and memory you may want to size into the Nutanix clusters intended for Files.

 20.  Feature Requirements:
         a.  Auditing? Which vendors?

Why ask?  Nutanix is working to integrate with three main third-party auditing vendors today: Netwrix (supported and integrated with Files), Varonis (integration in progress), and Stealthbits (not yet integrated).  Nutanix Files also has a native auditing solution in File Analytics.
Along with ensuring audit vendor support, a given solution may require a certain amount of CPU, memory, and storage (to hold auditing events).  Be sure to include any vendor-specific sizing in the configuration.  File Analytics, for example, could require 8 vCPU, 48GB of memory, and 3TB of storage.

         b.  Antivirus? Which vendors?

Why ask? Files supports five main antivirus vendors today with respect to ICAP integration: McAfee, Symantec, Kaspersky, Sophos, and Bitdefender.  If centralized virus scan servers are to be used, you will want to include their compute requirements in sizing the overall solution.

         c.  Backup? Which vendors?

Why ask?  Files has full change file tracking (CFT) support with HYCU and Commvault.  Veritas, Rubrik and Veeam are or will soon be working on integration.  Other vendors can also be supported outside of CFT support.  If including a backup vendor on the same platform, you may need to size for any virtual appliance which may also run on Nutanix.

         d.  Multiprotocol? SMB + NFS? * (Engage with a Files Specialist/Solutions Architect if this is a customer requirement)

Why ask?  Multiprotocol is challenging, and often behaves differently than a customer imagines it will. One protocol is defined as authoritative and the other protocol maps onto it. If the customer does not already use multiprotocol shares and have a strong command of the technology, engage your SA to assist on the design to ensure success.

21.  Using DFS (Distributed File Server) Namespace (DFS-N)?

Why ask?  This is less about sizing and more about implementation.  Prior to Files 3.5.1, Files could only support distributed shares with DFS-N.  Starting with 3.5.1, both distributed and standard shares are fully supported as folder targets with DFS-N.

 22.  Tiering requirements?

Why ask?  Files is targeting support for tiering in the 1H of CY21.  Tiering in this context means automatically moving data off Nutanix Files and to an S3 compliant object service either on-premises or in the cloud.  In scoping future requirements, customers may size for a given amount of on-premises storage and a larger amount of tiered storage for longer term archive.

23.  Access-Based Enumeration (ABE) Requirements?

Why ask?  Nutanix Files supports Access-based Enumeration (ABE). Is it a requirement to hide objects (files and folders) from users who don’t have NTFS permissions (Read or List) on a network shared folder in order to access them?  If so, we fully support it. 

24.  Reality Check: Files Mixed vs Dedicated clusters

Why ask?  Always double check the cost of Dedicated vs Mixed clusters. Dedicated can often be more cost effective, and it accommodates larger FSVM sizes since the FSVMs are capable of using the full amount of compute resources available to the cluster.

25.  Reality Check: Dedicated Cluster hardware minimums

Why ask?  Remember that Files is still a virtualized workload, so don’t assume the minimum possible hardware spec. Use 12 core 4214 CPUs as a reasonable minimum, or 14 cores if NearSync requirements dictate. 128GB memory per node will not allow for AHV + CVM + maximum FSVM size deployments, so consider 192GB, or nodes that can expand to 192GB after deployment.

26.  Reality Check: Implementation

Why ask?  Have a high level design of how you’ve designed/sized Files in your solution and communicate the design to the installer. Poor implementation, and implementation that doesn’t match the planned design, is one of the leading causes of customer satisfaction issues for Files.

27.  Reality Check: Files Prerequisites

Why ask?  Ensure that you’ve reviewed the relevant prerequisites and shared them with the customer before deploying (Active Directory if using SMB; AHV/ESXi only – no Hyper-V; a second VLAN if the customer wants iSCSI isolation; backup clients like Rubrik can end up deployed in the wrong subnet if using two networks).

28.  Reality Check: Clients

Why ask?  Review the list of supported Files clients and share with the customer. Laptops and desktops are rarely a problem, but document senders/multifunction printers that are used to scan paper and convert to PDFs on a file share can often be capped at only SMBv1 support, which Files does not and will never support.

Resources:

Xpert Storage Team Page:  https://sites.google.com/nutanix.com/americas-xpert-program/storage?authuser=1

Files Sales Enablement Page: https://sites.google.com/nutanix.com/files/home?authuser=1

Calls to action/next steps:

For a peer review of a sizing or to request meeting support after the Files first call is completed: create a SFDC opportunity and request a Storage solutions architect on the opportunity

Test Drive – Storage: https://www.nutanix.com/one-platform?type=tddata

Files Bootcamps: https://confluence.eng.nutanix.com:8443/display/SEW/Bootcamps (Internal Only)