Business Continuity and Disaster Recovery Discovery Guidance (Revised 4/7/25)

Introduction – Please Read First

These questions help ensure that you gather the information needed from a customer or prospect to put together a solution that meets their requirements, in addition to capturing specific metrics from tools like Collector or RVTools.

This list is not exhaustive, but should be used as a guide to make sure you’ve done proper and thorough discovery.  Also, it is imperative that you don’t just ask a question without understanding the reason why it is being asked.  We’ve structured these questions with not only the question that should be asked, but why we are asking the customer to provide an answer to that question and why it matters to provide an optimal solution. 

Questions marked with an asterisk (*) will likely require reaching out to a specialist/Solution Architect resource at Nutanix to go deeper with the customer on that topic/question.  Make sure you use the answers to these questions in the Scenario Objectives in Sizer when you create a new Scenario.  These questions should help guide you as to what the customer requirements, constraints, assumptions, and risks are for your opportunity. 

This is a living document, and the questions will be expanded and updated over time.

REVISION HISTORY
4/7/25 – 1st Revision – Kevin Laine, Mike Umphreys
1/5/21 – 1st Publish – Laine Leverett


BCDR Discovery

 1.  Has your organization completed a Business Impact Analysis (BIA) for your applications and workloads? If so, could you share the findings with us?

Why ask? A Business Impact Analysis (BIA) helps prioritize applications based on their criticality to the business. It provides key insights into acceptable levels of data and time loss, as well as the potential revenue and operational impacts of disruptions. Understanding the systems, dependencies, and business processes tied to critical applications is essential for designing a tailored Disaster Recovery (DR) solution that meets their organization’s specific continuity needs. This information will ensure the DR strategy aligns with their business’s tolerance for risk and recovery objectives, enabling us to propose the most effective solution. This information would be HIGHLY beneficial to have in designing a DR solution for your customer. 

2.  If a Business Impact Analysis (BIA) is not available, can you provide a breakdown of your applications along with their corresponding Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs)? Additionally, please specify the retention requirements for both on-premises and the recovery site. 

Why ask? RPOs help us identify the appropriate Nutanix products to meet your data recovery needs, while RTOs are crucial in determining how quickly applications need to be restored. The design of the Disaster Recovery (DR) solution may vary significantly based on these time-sensitive requirements. It’s also important that these objectives are driven by business priorities rather than IT assumptions, ensuring that the recovery strategy aligns with the organization’s operational needs. Additionally, it’s key to distinguish between DR and backup: DR retention should be long enough to meet the RTO, while longer RTOs may be more cost-effectively addressed with backup solutions instead of full DR. Understanding both retention and recovery needs allows us to design a solution that balances cost with business continuity requirements effectively.

 3.  For the applications listed, can you provide the change rate for each? You can calculate this using existing backups and the deltas between them. For applications with aggressive RPOs that may require near-sync, please also include the write throughput for each application.

Why ask? The change rate is crucial for understanding how much data is modified over time, which helps determine the appropriate data protection approach. For applications with aggressive RPOs requiring near-sync recovery, knowing the write throughput is essential, as it directly impacts the network bandwidth requirements. This information allows us to design a DR solution that ensures optimal performance and meets the required RPOs without overloading the network or the infrastructure.
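As a rough illustration of the calculation described in question 3, the sketch below estimates a daily change rate from the sizes of nightly incremental backups. The function name and the sample numbers are hypothetical, and it assumes the incremental sizes are reported before deduplication and compression; if the backup tool reports reduced sizes, the result will understate the real change rate.

```python
def daily_change_rate(incremental_gb, protected_gb):
    """Return (average, peak) daily change as a fraction of protected data.

    incremental_gb: sizes of consecutive daily incremental backups, in GB.
    protected_gb:   total size of the protected application data, in GB.
    """
    if protected_gb <= 0:
        raise ValueError("protected_gb must be positive")
    if not incremental_gb:
        raise ValueError("need at least one incremental backup size")
    rates = [size / protected_gb for size in incremental_gb]
    return sum(rates) / len(rates), max(rates)


# Hypothetical example: a 2,000 GB application with seven nightly incrementals.
avg, peak = daily_change_rate([40, 55, 38, 120, 47, 20, 30], protected_gb=2000)
print(f"average {avg:.1%} per day, peak {peak:.1%} per day")
```

Reporting the peak alongside the average matters because replication has to keep up on the busiest day, not just the typical one.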

 4.  What backup and/or DR products are you currently using?

Why ask? Understanding the backup and DR products in use helps identify potential synergies and opportunities for transitioning to Nutanix solutions that align more closely with the customer’s business continuity goals. For example, a customer using a product like Zerto may be a good candidate for Nutanix’s near-sync capabilities, which can help meet aggressive RPO and RTO requirements. Also, if a customer is using Zerto today with VMware (ESXi), note that Zerto can’t be used with AHV, but it may be replaced with the DR orchestration built into Nutanix. This insight allows us to tailor the solution to better integrate with the existing environment while optimizing recovery processes.

 5.  In the event of a failure, who is on call to help with the restore? Will all relevant personnel be trained on recovery plans and procedures?

Why ask? This question helps the customer consider the operational aspects of disaster recovery and highlights the importance of having a clear, well-communicated recovery process. Ensuring that all relevant staff are trained and confident in executing recovery procedures is vital for minimizing downtime and reducing the stress of the situation. Simplifying and streamlining recovery workflows can make the process more efficient, allowing everyone to feel empowered and capable of contributing to the restoration efforts when a disruption occurs. 

 6.  What Disaster Recovery (DR) solutions are you currently using, and have you tested the solution to verify it functions as expected?  How well have the solutions met or not met your requirements and expectations?

Why ask? This question helps uncover the current DR topology and provides insight into its performance. Understanding whether the existing solution(s) meets the customer’s business continuity needs, and whether it has been properly tested, allows us to identify gaps or areas for improvement. It’s essential to assess the current state to determine if Nutanix can offer a more reliable, efficient, and scalable solution that aligns better with the customer’s recovery objectives and expectations. 

 7.  Are there any other systems such as physical servers, mainframes, or other physical devices that should be considered as part of your disaster recovery plan that are outside of the Nutanix Infrastructure? 

Why ask? This question ensures that the customer has thoroughly considered the full scope of their disaster recovery needs. It’s important to account for all systems and devices, not just those within the Nutanix infrastructure, to create a comprehensive DR plan that addresses all critical components of the IT environment. This helps prevent any overlooked systems from becoming a potential point of failure during recovery. 

8.  What types of disasters are you planning for in your business continuity and disaster recovery strategy? 

Why ask? The type of disaster you’re planning for will influence the products and operational procedures required for an effective recovery plan. For example, a pandemic might require a different approach (such as enabling remote work) compared to a geographic event like an explosion or earthquake, or a localized issue like a flooded ROBO office server closet. Additionally, cybersecurity events like a ransomware attack will require distinct recovery strategies. By helping the customer consider various disaster scenarios, we can guide them toward a comprehensive “recovery in depth” strategy—whether it involves snapshots, backups, remote disaster recovery sites, stretch clusters, cloud-based disaster recovery (Cloud Provider DR), or DRaaS. This ensures they are fully prepared for a broad range of potential disruptions. 

9.  Can your sites operate independently in the event of a failure? If so, for how long? 

Why ask? This question helps assess the criticality of each site and their ability to function without reliance on other locations. Understanding how long each site can remain operational independently is essential for determining the appropriate disaster recovery strategy, ensuring that business continuity is maintained during disruptions. It also helps us identify which sites may require additional resources or support to ensure they can continue operating effectively in the event of a failure at another site. 

10.  What is your data retention policy, and who is responsible for defining it (e.g., regulatory requirements or internal guidelines)? 

Why ask? Understanding the retention policy helps us identify the factors driving data storage requirements, whether they are mandated by external regulations or internally set business needs. This insight allows us to ensure the disaster recovery solution aligns with both compliance obligations and the organization’s operational requirements. It also enables us to design a solution that optimizes data storage, retention, and recovery processes based on those specific needs. 

11.  Do you have any regulatory requirements such as HIPAA, PCI, SEC, SOX?

Why ask? Regulatory requirements significantly impact the design of a disaster recovery solution. Understanding which regulations apply will help us determine the necessary features and controls, such as encryption, data sovereignty, role-based access control (RBAC), and logging, to ensure compliance. This ensures that the recovery solution meets both legal obligations and security standards, while minimizing risk to the organization. 

12.  Do you need immutable copies of the data or VMs?

Why ask? This question helps assess the customer’s data protection needs, particularly in relation to the retention period and the sophistication of their requirements. For example, if the customer has a long-term retention policy (such as seven years), they may need a third-party backup tool to tier data to an object store that supports WORM (Write Once Read Many) to ensure data immutability. Additionally, this requirement could be driven by the need to protect against ransomware attacks, as immutable data copies provide an added layer of security by preventing data tampering or deletion. Understanding these needs ensures we can design a solution that meets both compliance and security objectives. 

13.  Is this a new disaster recovery solution, or are you looking to enhance or replace an existing one?

Why ask? Understanding whether this is a completely new DR solution or an enhancement to an existing setup allows us to determine the flexibility we have in designing the most effective strategy. If it’s a net new solution, we can build it from the ground up to meet the customer’s current needs and future growth. If it’s an enhancement or replacement, we need to consider integrating with the existing infrastructure while addressing any gaps or limitations in the current DR approach.

14.  Will the source and destination clusters both be Nutanix clusters?

Why ask? If both the source and destination clusters are not Nutanix-based, it could introduce constraints in designing the disaster recovery solution. For example, a non-Nutanix source or destination may dictate the choice of hypervisor and prevent array-based replication, requiring additional software like SRM or Zerto for replication. It may also necessitate adjustments in licensing or configuration. Understanding this early ensures that we can design the most efficient and cost-effective DR solution while accounting for any potential complexities. 

15.  Do you have a requirement for your backups and disaster recovery solution to be hosted on separate hardware from your primary infrastructure? 

Why ask? This question helps determine how to architect the infrastructure for both backup and disaster recovery targets. If the backups or DR solution must be on separate hardware, it will influence the design and selection of infrastructure, ensuring that redundancy and isolation requirements are met for business continuity. This also ensures that potential risks, such as hardware failure, are mitigated by keeping backup and DR systems independent from the primary infrastructure. 

16.  What is your desired replication topology for disaster recovery?

Why ask? Understanding the desired replication topology helps us determine whether you need Professional or Ultimate AOS licenses, as well as how to map out the replication setup (e.g., A->B->C, A->B, A->B and A->C). This also provides insight into whether your sites are active/passive and clarifies the specific definitions of “active” and “passive” in your environment (e.g., a data center with power, VMs ready to power on, or VMs already powered on and capable of an immediate switchover). Knowing this allows us to design a replication strategy that aligns with your recovery objectives and infrastructure requirements. 

17.  Does your disaster recovery solution need to provide the same performance as your production environment?

Why ask? This question helps determine whether disaster recovery is a critical business requirement or more of a compliance checklist item. If the DR solution needs to match production performance, it indicates a higher priority for minimizing downtime and avoiding significant business loss during outages. Understanding this will help size the solution appropriately, ensuring it meets the required performance levels. If the customer decides to accept a lower-performing DR setup, it’s important to get their sign-off to ensure alignment with business expectations and risk tolerance. 

18.  Do you need to have replicated copies of all the VMs, but only plan to restore a subset in the event of a disaster?

Why ask? This question helps identify potential licensing requirements and the size of the target cluster needed for disaster recovery. If all VMs are being replicated but only a subset will be restored, it may impact the resources required for replication and storage, as well as the licensing model. This insight allows us to design a more efficient DR solution that aligns with your actual recovery needs and avoids unnecessary resource allocation. 

19.  How frequently do you test your disaster recovery plan, and what does the plan entail?

Why ask? This question helps determine whether the customer has actively validated their disaster recovery plan to ensure it will perform as expected during an actual disaster. Many organizations may be optimistic about the effectiveness of their plans, but testing provides a reality check and reveals any gaps or areas for improvement. Understanding the testing frequency and details allows us to assess the maturity of their DR strategy and offer guidance on how to enhance it, ensuring better preparedness and faster recovery in the event of an incident. 

20.  Do you have existing runbooks in place for disaster recovery, and how are they used during an incident?

Why ask? This question helps us understand the customer’s current disaster recovery processes and whether they have a structured, documented plan (runbook) for responding to an incident. If they don’t have runbooks in place, it highlights an opportunity for us to assist in developing a comprehensive plan that can be tested and refined. A well-documented and regularly tested runbook ensures a faster, more organized response during a disaster, reducing downtime and minimizing business disruption. 

21.  Databases: How are you currently backing up and protecting your database environments?

Why ask? Databases are often among an organization’s most critical assets, so understanding how they are backed up and protected gives us insight into their data protection strategy. They may already be using database-level replication or clustering technologies that offer more granular control than traditional storage-based replication for VMs. This information is key when discussing RPO and RTO, as databases typically require higher levels of availability and more tailored recovery strategies than other parts of the infrastructure. It helps us design a solution that meets the unique needs of their most critical systems. Databases can also use native BC/DR capabilities, meaning that the application itself handles the protection, which can be further enhanced with snapshot technology.

22.  What hypervisor(s) are you using, and are you open to alternatives even if just for DR? 

Why ask? This question helps us understand which replication products and solutions are compatible with the customer’s environment, especially if the source or destination cluster is non-Nutanix. If they’re using ESXi on the primary cluster, this could open the possibility of leveraging Nutanix AHV for the target cluster, optimizing the disaster recovery setup. Understanding their willingness to explore cross-hypervisor DR options enables us to propose the most flexible and efficient solution while maximizing their existing infrastructure investments. If they are using ESXi, this is a good time to also ask about renewal dates and whether they have received their renewal costs.

NOTE: This may already have been asked during a first-call conversation. If so, there is no need to ask it again.

23. Do you have a dedicated network for disaster recovery (DR) traffic, and do you require encryption on those links? 

Why ask? This question helps us determine if network segmentation is needed in Prism for DR replication traffic, ensuring that DR traffic is isolated for security and performance reasons. It also helps us understand if the customer requires encryption in transit for their DR network, which is critical for safeguarding sensitive data during replication. This insight allows us to design a DR solution that meets both security and network performance requirements.  

24.  What is the current bandwidth between sites that you plan to use for DR replication, and what is the latency between those sites?

Why ask? This question helps us assess whether the bandwidth between sites is sufficient to meet the customer’s defined RPO (Recovery Point Objective), based on the rate of change in their environment. Additionally, it ensures that the latency between sites meets the latency requirements for solutions like Metro Availability or Metro Witness. Understanding these factors allows us to design a DR solution that can meet the customer’s recovery objectives efficiently and reliably.
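The relationship between change rate, RPO, and bandwidth can be sketched with simple arithmetic. The example below is a minimal illustration, not a sizing tool: the function names and sample figures are hypothetical, and it assumes changes are spread evenly across the RPO window while ignoring compression, protocol overhead, and bursts, all of which affect real results.

```python
def required_mbps(changed_gb_per_window, rpo_seconds):
    """Average throughput (Mbit/s) needed to replicate one RPO window's
    worth of changed data within that same window."""
    if rpo_seconds <= 0:
        raise ValueError("rpo_seconds must be positive")
    bits = changed_gb_per_window * 8 * 1000**3  # decimal GB converted to bits
    return bits / rpo_seconds / 1_000_000


def link_is_sufficient(changed_gb_per_window, rpo_seconds, usable_mbps):
    """True if the usable bandwidth covers the average replication need."""
    return required_mbps(changed_gb_per_window, rpo_seconds) <= usable_mbps


# Hypothetical example: 9 GB of changes per hour against a 1-hour RPO,
# with 100 Mbit/s of bandwidth actually available for replication.
need = required_mbps(9, 3600)
print(f"need about {need:.0f} Mbit/s; sufficient: {link_is_sufficient(9, 3600, 100)}")
```

Because this is an average, treat the result as a floor: a link that only just meets it will fall behind whenever the change rate spikes.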

25.  What is the current rate of utilization of the network links between the sites you plan to use for replication traffic?

Why ask? These links may also carry other traffic, which reduces the bandwidth actually available for replication compared with what you assume. Try to get utilization over a 30-day period, and if possible over several months, to see whether utilization is trending up or down.
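One way to turn the utilization data into something usable for sizing is sketched below. The function names and sample values are hypothetical; the sketch assumes utilization is supplied as fractions between 0 and 1 and that the busiest observed sample is the right basis for headroom, which is a conservative choice rather than a rule.

```python
def replication_headroom_mbps(link_mbps, utilization_samples):
    """Bandwidth (Mbit/s) left for replication at the busiest observed sample."""
    if not utilization_samples:
        raise ValueError("need at least one utilization sample")
    peak = max(utilization_samples)
    return link_mbps * (1 - peak)


def monthly_trend(monthly_avg_utilization):
    """Average month-over-month change in utilization (positive means growing)."""
    if len(monthly_avg_utilization) < 2:
        raise ValueError("need at least two months of data")
    pairs = zip(monthly_avg_utilization, monthly_avg_utilization[1:])
    deltas = [later - earlier for earlier, later in pairs]
    return sum(deltas) / len(deltas)


# Hypothetical example: a 1,000 Mbit/s link sampled over 30 days,
# plus three months of average utilization.
headroom = replication_headroom_mbps(1000, [0.35, 0.50, 0.60])
trend = monthly_trend([0.40, 0.45, 0.50])
print(f"headroom about {headroom:.0f} Mbit/s, trend {trend:+.1%} per month")
```

A rising trend suggests the headroom measured today will shrink over the life of the solution, which is worth raising with the customer before committing to an RPO.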

26.  How do you handle IP addresses on the recovery site for VMs that have failed to the recovery site?

Why ask? This question helps us understand the customer’s preferred networking failover approach and any potential challenges associated with IP address management during a disaster recovery event. It provides insight into whether they use overlay networks, a stretched Layer 2 subnet between sites, a full subnet failover by updating routes to point to the recovery site, or allow IP addresses to change during failover. Understanding this will help us design a solution that minimizes issues such as broken applications due to hard-coded IP addresses, DNS updates, or cache entry changes, ensuring a smoother failover process. 

27.  What is your current approach to data consistency and application consistency during disaster recovery?

Why ask? Understanding how data consistency is ensured—whether through application-consistent snapshots, crash-consistent backups, or other methods—helps ensure that the recovery process doesn’t result in data corruption. This is especially important for transactional systems (like databases) or applications requiring high availability. Knowing how these are addressed can guide the design of the solution for higher data integrity during DR. 

28.  What is your approach to multi-cloud disaster recovery or cloud-based backup?

Why ask? This will help assess the customer’s strategy if they are leveraging or considering a multi-cloud or hybrid-cloud approach to business continuity. It’s crucial to know if the customer is looking to integrate public cloud resources (e.g., AWS, Azure) into their DR plan. This insight helps us integrate Nutanix solutions like Nutanix Cloud Clusters (NC2) for seamless multi-cloud disaster recovery. 
