True VMware BC/DR with Zerto



One of the great advantages of virtualisation is the ability to dynamically create virtual servers quickly to meet a specific requirement. This had always been difficult with physical servers, and when the complexities of Business Continuity (BC) and Disaster Recovery (DR) were added the cost and resource requirements skyrocketed.

This often led to BC/DR being considered by only the largest organisations, which left smaller organisations dangerously exposed. It has long been claimed that 90% of companies that suffer major outages and data loss fail to survive their disaster, with all the implications for business owners, employees, customers and suppliers alike.

You can’t afford the luxury of leaving your most critical business asset – data – to chance.

However, as many have found out, simply virtualising your server estate is not a panacea for your BC/DR headaches. While creating the virtual servers may be relatively easy, effectively transferring them to an alternate site in an emergency can still be a complex task, and getting users online quickly with minimal data loss is key to a successful BC/DR strategy.

The “Cloud”

The Cloud as a dynamic pool of resources has been around for a long time, although widely available commercial versions have been more recent to market. The Cloud gives an end user the opportunity to have a BC/DR system without having to invest in a second site with duplicate hardware and everything that goes with it (power, cooling, staff, etc.).

However, once again, the existence of the resource does not automatically mean it is easy to integrate with an end user network to implement BC/DR services. The replication of data to the remote site needs to be managed and relatively easy to implement, or it becomes a further roadblock to uptake.

Introducing Zerto – the FIRST VMware vCloud aware BC/DR system

Zerto is a hypervisor-based data replication system that tightly integrates with VMware vSphere (version 4.0 and up) and is fully compatible with VMware vCloud Director (version 1.5 and up). This is something not even VMware has in their arsenal yet, although I don’t doubt Site Recovery Manager has designs on being number two. Zerto is multi-tenanted so that service providers can provide DR services for multiple organisations and keep each organisation’s data safe and separate.

Our experience of Zerto is as part of our newly launched Cloud DR as a Service (DRaaS), a service which we honestly believe would have been too complicated and expensive to contemplate without this software set from Zerto.



The diagram above is an example of a customer to Cloud DR arrangement. Note how the customer can run older versions of vSphere and still be compatible. As there is no storage replication involved, the entire service is storage agnostic; the storage doesn’t even need to be specified in the design, as it just underpins where the VMs are located.

Zerto is deployed as several components as follows:

·         The Zerto Virtualisation Manager (ZVM) is deployed at each site and can either be a stand-alone VM or installed on the vCenter server (a stand-alone VM is recommended for large environments). The ZVM is responsible for managing the replication system and the link to vCenter. It is accessed via a plugin to vCenter for single pane of glass management.

·         The Virtual Replication Appliance (VRA) is an appliance installed on each host that is part of the protection system. It is bound to the host by affinity rules and starts and stops with the host during maintenance. This appliance intercepts the IO stream from a VM and splits it, so that a copy of each write is sent to the remote site in real time. This avoids snapshots and scheduling and provides Continuous Data Protection (CDP) with journaling, enabling VM recovery to any point in time up to 5 days before the failover is invoked.

·         Optionally a Cloud Connector appliance can be deployed to allow integration with vCloud Director, so VMs can be migrated into and out of vDCs under the same management system. This is not required for vCenter to vCenter protection.
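The journaling idea behind CDP can be illustrated with a small sketch. This is not Zerto’s implementation; it is a toy model we have made up for illustration, in which the class name `ReplicaDisk`, its methods and the integer timestamps are all assumptions. Each write split off the IO stream is appended to a journal at the recovery site, writes older than the retention window are folded into a base image, and recovery replays the journal up to the chosen point in time.

```python
class ReplicaDisk:
    """Toy model of a replica disk with a write journal (illustrative only)."""

    def __init__(self, retention_seconds=5 * 24 * 3600):
        # 5 days matches the recovery window mentioned in the text.
        self.retention_seconds = retention_seconds
        self.base = {}     # block number -> data older than the journal window
        self.journal = []  # (timestamp, block, data), oldest first

    def record(self, timestamp, block, data):
        """Store one replicated write; timestamps are assumed to arrive in order."""
        self.journal.append((timestamp, block, data))
        # Fold writes that have aged out of the window into the base image.
        cutoff = timestamp - self.retention_seconds
        while self.journal and self.journal[0][0] < cutoff:
            _, old_block, old_data = self.journal.pop(0)
            self.base[old_block] = old_data

    def recover(self, point_in_time):
        """Rebuild the disk contents as they were at point_in_time."""
        image = dict(self.base)
        for timestamp, block, data in self.journal:
            if timestamp > point_in_time:
                break
            image[block] = data
        return image


disk = ReplicaDisk(retention_seconds=100)
disk.record(10, 0, b"a")
disk.record(20, 0, b"b")
disk.record(30, 1, b"c")
print(disk.recover(15))  # state before the second write to block 0
print(disk.recover(30))  # state after all three writes
```

Because every write is kept rather than a periodic snapshot, any moment inside the retention window is a valid recovery point, which is what distinguishes this approach from scheduled replication.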

VMs can be replicated individually or as Virtual Protection Groups (VPGs), which group the VMs that make up an application and provide a consistent point of recovery across them (a web application and its database server, for example).

The screenshot below shows the Zerto vCenter plugin during one of our testing phases.


[Screenshot: the Zerto vCenter plugin]

Here you can see that replication can be bi-directional. Although only two sites are shown here, there can be multiple sites in the replication network, and they can replicate in any direction and to any site as required.

Look closely and you will see that the Recovery Point Objective (RPO) is in the order of a few seconds. Many DR solutions would be happy with an RPO of 15 minutes, but this is far better than that. Recovery Time Objectives (RTO) are in the order of minutes; in our tests we have recorded as little as 43 seconds to have a protected VM booted up 90 miles away in a second DC and ready to log on. However, an RTO of 5-15 minutes would be a more likely target, depending on the number of VMs to be recovered.
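To make the RPO figure concrete, here is a minimal sketch of how an achieved recovery point could be compared with a target. The function names and the sample timestamps are our own illustrative assumptions, not values read from Zerto; the idea is simply that the recovery point is the age of the newest write safely held at the recovery site.

```python
from datetime import datetime, timedelta


def current_rpo(last_replicated_write: datetime, now: datetime) -> timedelta:
    """Age of the newest write held at the recovery site (the data at risk)."""
    return now - last_replicated_write


def meets_target(rpo: timedelta, target: timedelta) -> bool:
    """True if the achieved recovery point is within the agreed objective."""
    return rpo <= target


# Made-up sample values: the last replicated write landed 8 seconds ago.
now = datetime(2012, 6, 1, 12, 0, 0)
rpo = current_rpo(datetime(2012, 6, 1, 11, 59, 52), now)

print(rpo.total_seconds())                         # 8.0
print(meets_target(rpo, timedelta(seconds=10)))    # True
print(meets_target(rpo, timedelta(seconds=5)))     # False
```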

In the Cloud the sites can be masked so that a customer can only see the vCloud resources they have access to, and not the network of any other customer. This provides the multi-tenant support that is very important to service providers like ourselves.

WAN compression is also available to minimise the data being sent on a slower WAN link to a remote site.

Our experience of Zerto

As part of the 14-day trial that Zerto allow, they hand over a DR test plan that details all the scenarios that are possible with Zerto, to ensure everything works and all features are explored. It was during this testing that we discovered a range of features that exceeded even the expectations we had developed during an earlier demo of the product. This is rare in our experience, as demos often hide workarounds and features that are more road map than they lead you to believe.

The features available include:

·         Controlled failover for “Disaster Avoidance”. Not all disasters are unexpected, and indeed some, such as major weather disruption, can be seen days in advance. Even a major maintenance slot could be seen as a disaster for an always-on application, so it is good to do a controlled migration to the alternate site so users can keep working. Zerto allows for a controlled move that can automatically re-protect to the original site, so the VM can move back after the danger has passed. The move can be rolled back if there is a problem in implementation (say a network connection doesn’t come up due to a configuration problem) or committed to make the alternate site live. Because a scheduled move is not a disaster, committing it deletes the original VM and the alternate site becomes the primary, hence the importance of the rollback/commit options.

·         Test failover without disruption of live service. A test failover can be done at any time, multiple times, to check that the failover mechanism works, and this will not affect the protection state of the live VM. Zerto can be configured to disconnect the network of the test VM so that it does not cause issues by connecting to the live network. When a test is completed the test VM is simply destroyed.

·         Remote cloning of a VM. At any time, and without disruption to the performance of the live VM, a clone can be taken to a point in time and recovered at the remote site in a matter of seconds. This allows for helpdesk investigation and can mitigate data corruption issues without downtime to the main system. It also enables spawning testing systems from live copies of an application to help in tracking down issues.

·         VSS support. By means of a VSS agent on the VM, the normally crash-consistent VM recovery can be made application consistent for applications such as databases and mail systems.

·         Pre-seeding. If a VM is terabytes in size, replicating the initial copy over the WAN could be very difficult. It is possible to take a copy of the VM to portable media and transfer it to the remote site. This base copy is then brought up to date with a delta synchronisation, for a much quicker initial sync of a large virtual server.

·         Full vMotion and Storage vMotion compatibility. VMs can be moved from one host to another while being protected and this does not affect the ability of Zerto to protect the VM. Similarly a VM disk can be extended and Zerto will automatically extend the disk of the replica to maintain compatibility.

·         Manual checkpoints. As well as recovering to a specific point in time, the user can define a manual checkpoint across all VMs, and then recover to that checkpoint to guarantee a known recovery point. These checkpoints can be VSS aware if the VM has the agent installed.

·         Dynamic networking. The IP of the recovered VM can be changed during the recovery process to suit the new network on which it resides. This is automatic and allows the VM to be online as quickly as possible in a failover situation. Boot orders can also be defined so that, for example, a DB server doesn’t come up after an application server that needs to connect to it. However, if IP addresses have changed, care must be taken; it is advised that DNS is used to locate the DB server and that the appropriate DNS server entries are made in the network settings of the recovery VM. That way the IP change can be handled by the application through DNS lookups.
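The boot order point in the list above is essentially a dependency ordering problem. As a minimal sketch (this is not how Zerto is configured, and the VM names are hypothetical), Python’s standard `graphlib` module can derive a start-up order from a statement of which VM each one depends on:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical VM names: each VM maps to the VMs that must be up before it.
depends_on = {
    "db-server": set(),
    "app-server": {"db-server"},
    "web-server": {"app-server"},
}


def boot_order(dependencies):
    """Return the VMs in an order where every dependency starts first."""
    return list(TopologicalSorter(dependencies).static_order())


order = boot_order(depends_on)
print(order)  # ['db-server', 'app-server', 'web-server']
```

Note that the VMs refer to each other by name here rather than by IP address, which mirrors the advice above: if the application locates its database through DNS, a change of IP at the recovery site does not need to be reflected in the application itself.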

Performance and Recovery Objectives

Because Zerto takes a copy of the data in real time via the VRA, you do not get the overhead of constant snapshotting of a VM to enable replication of its disk. Once the data hits the protected VM’s disk it is not affected in any way by its protection state. Depending on the amount of data being written to the protected VM, there may be a delay in the IO stream while it is being replicated, so it is important in busy environments to monitor VM performance and make sure the VRAs have enough resources that they don’t cause a bottleneck on the protected system.

We generally see RPOs in the order of 10 seconds or less on our systems, and RTOs never more than 6 minutes in testing. This, combined with the ease of deployment and management of the Zerto system, makes BC/DR finally feasible for the SME.

Millennia® is part of the Zerto Cloud EcoSystem (ZCE) and provides both DR to Cloud services and consultancy for Enterprise site to site BC/DR utilising VMware vSphere and Zerto.