We surveyed 300 organisations operating virtual environments of various sizes. Eighty percent of them had some portion of their virtual environment hosted in the cloud, with 29% hosting more than half of it there. Despite that, we found that only 43% of the organisations using the cloud to host their VMs also use the cloud as one of their backup technologies. This is surprising given the stated goals for recovery, as we’ll see later.
How often do businesses back up their virtual environment and what approach do they use?
The foundation of all data recoverability is a robust backup programme, so we asked organisations how much of their virtual environment was backed up at least daily. 37.5% of respondents said they backed up 100% of their environment every 24 hours, but a significant proportion (23%) backed up less than 50% of their environment daily. This less rigorous approach was more prevalent among businesses with smaller virtual environments.
Organisations typically focused on two primary methods for backup. Almost 59% opted for on-premises disk-to-disk, while 50% used snapshots. 36% favoured cloud-based backup, and this figure was higher among those with smaller environments. When following the 3-2-1 rule of backup (three data copies, in two formats, with one stored offsite), organisations with large environments relied on SAN/NAS replication, offsite tape storage and application data mirroring, while those with smaller environments made more use of the cloud to hold an air-gapped copy of their data.
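As a rough illustration of the 3-2-1 rule described above, the following Python sketch checks whether a set of data copies meets it. The BackupCopy type, its field names and the example copies are hypothetical and not taken from the survey, and the sketch assumes the production copy counts as one of the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data. All fields are illustrative assumptions."""
    name: str          # hypothetical label, e.g. "production"
    media_format: str  # e.g. "disk", "tape", "cloud"
    offsite: bool      # True if the copy is stored away from the primary site


def satisfies_3_2_1(copies):
    """Return True if there are at least three copies, in at least two
    formats, with at least one of them stored offsite."""
    enough_copies = len(copies) >= 3
    enough_formats = len({c.media_format for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_formats and has_offsite


# Example: production data, an on-premises disk backup and a cloud copy.
example = [
    BackupCopy("production", "disk", offsite=False),
    BackupCopy("disk-to-disk backup", "disk", offsite=False),
    BackupCopy("cloud backup", "cloud", offsite=True),
]
print(satisfies_3_2_1(example))  # True
```

Removing the cloud copy from the example leaves two copies in one format with nothing offsite, so the check fails on all three counts.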
When it comes to the main concern about data outages, it’s the thought of data loss that is keeping IT managers awake at night. 54% of respondents listed that as their top worry, with loss of productivity, revenue and reputation trailing behind.
What’s causing system outages?
While malicious attacks tend to grab the headlines, the root cause of most outages for our survey respondents was more mundane. 46% were the result of hardware failure while 41% originated in human error. These causes were particularly prevalent in the larger VM environments. Malicious attacks accounted for only 24% of the total, and of those who had suffered a ransomware attack, only 4% paid up, with 71% leveraging backups or a formal disaster recovery plan to retrieve ransomed data.
Once an outage occurred, 34% of businesses saw downtime of more than 4 hours, with 32% of these seeing downtime of 24 hours or more. And the larger the organisation’s virtual environment, the longer the reported downtimes. With disruption this significant, it’s essential that there are robust plans in place to bring systems back online when disaster strikes, so how are businesses doing on backup and recovery?
No plan survives contact with the enemy – especially if there is no plan…
Having backups and a disaster recovery (DR) plan is good, but if you never test the plan, it’s not worth the paper (or pixels) it’s written on. Sixty-eight percent of businesses surveyed had needed to fully recover an application or virtual machine due to an outage, but a significant proportion of these had no idea whether their recovery programme was going to work, as 27% only test their backups during recovery, which is an awkward time to discover that your backup protocol has failed.
When it comes to DR plans, 50% of organisations had fewer than half of their VMs protected by a DR plan; an incautious 21% had no DR plan for any part of their virtual environment. Of those that did have a plan, over half test it either annually or during recovery itself. It’s not a test if your business relies on it working! We frequently advise our customers here at iland that disasters don’t happen to a convenient schedule, so it’s essential to run live tests at randomised intervals and under different workload conditions to ensure that your plan will survive the stress of a disaster scenario.
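To make the idea of testing at randomised intervals concrete, here is a minimal Python sketch that generates a DR test calendar with unpredictable gaps between tests. The function name, the gap bounds and the start date are illustrative assumptions, not figures from the survey or an iland recommendation.

```python
import random
from datetime import date, timedelta


def randomised_test_dates(start, count, min_gap_days, max_gap_days, seed=None):
    """Return `count` test dates after `start`, each separated from the
    previous one by a random gap between the two bounds (inclusive)."""
    if min_gap_days < 1 or max_gap_days < min_gap_days:
        raise ValueError("need 1 <= min_gap_days <= max_gap_days")
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    dates = []
    current = start
    for _ in range(count):
        current += timedelta(days=rng.randint(min_gap_days, max_gap_days))
        dates.append(current)
    return dates


# Example: four tests, each two to eight weeks after the previous one.
for test_date in randomised_test_dates(date(2024, 1, 1), 4, 14, 56):
    print(test_date.isoformat())
```

The upper bound on the gap matters as much as the randomness: it caps how long any part of the environment can go untested.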
Given this cavalier approach to DR testing, what Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) do businesses have, and are they realistic?
The majority of organisations require from a few hours to as long as 24 hours to recover critical applications; only 11% were confident that they could recover within minutes. When it comes to RPOs for data loss, a quarter of organisations have zero tolerance, with a further 41% having a tolerance of less than 4 hours of lost data. This ambition doesn’t match reality. If only 11% of organisations can restore systems within minutes, there’s no way that the 25% who have zero tolerance for data loss can meet their objectives, and little likelihood that those who are aiming for an RPO of 4 hours or less can hit that, either.
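One simple way to sanity-check an RPO target is to compare it with how often data is actually protected. The Python sketch below rests on a simplifying assumption of ours, not a survey finding: with periodic backups, the worst-case data loss is roughly the interval between successful backups, ignoring how long the backup itself takes. The function names are hypothetical.

```python
def worst_case_data_loss_hours(backup_interval_hours):
    """Simplified model: a failure just before the next backup loses
    everything written since the previous one."""
    if backup_interval_hours <= 0:
        raise ValueError("backup interval must be positive")
    return backup_interval_hours


def meets_rpo(backup_interval_hours, rpo_hours):
    """Return True if the worst-case loss stays within the RPO."""
    return worst_case_data_loss_hours(backup_interval_hours) <= rpo_hours


print(meets_rpo(24, 4))  # False: a daily backup cannot meet a 4-hour RPO
print(meets_rpo(1, 4))   # True: an hourly backup can
print(meets_rpo(24, 0))  # False: zero tolerance rules out periodic backup
```

Under this model, an RPO of zero cannot be met by any scheduled backup, however frequent, which is why zero-tolerance targets imply continuous protection rather than a daily job.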
To execute DR plans, most businesses were using SAN/NAS replication and virtual machine level replication methods. The difficulty with this approach is that if the source data is encrypted by ransomware, for example, the replicated data will also be affected. With 24% of organisations having experienced a ransomware breach, this should be taken into consideration. This is where cloud-based backup and DR have the advantage, by guaranteeing an air-gapped data copy that cannot be penetrated by malicious code.
Taken all together, these statistics paint a concerning picture of the ability of businesses to recover in the face of an outage in their virtual environment. Infrequent DR testing, partial VM backup, slow recovery times and a reliance on replication for DR execution mean that the processes in place to facilitate backup and recovery don’t match the business continuity targets they’re meant to achieve. Part of this could be down to the fact that virtual environments are flexible and evolve with workloads, meaning that a strategy that worked six months ago may no longer be suitable. We recommend that businesses regularly review and test backup and DR to ensure that their data protection plan is fit for purpose and won’t leave them facing a recovery disaster when breaches occur.