Cloud data backup – is it ready for the data centre?

Over the past 30 years, data backup has been accomplished by using a backup application that makes a copy of the data to tape and, more recently, to disk. A copy of the tape is sent offsite, or data is replicated over a WAN, to provide an offsite copy for disaster recovery. By Bill Andrews, CEO of ExaGrid Systems.


Backup can be difficult to operate, as backing up changed data daily and all data periodically is a big task. The dream for all IT professionals is simply to have someone else run the backups and to pay by the month.

The million-dollar question is: can you simply outsource your backups, pay a monthly subscription, and move on? The answer is as complicated as backup itself. The challenge of backing up your data to the cloud is that backup is all about changing data. The goal of backup is to ensure that you have all the most recent changes to databases, email, user files, and other data, so that data does not have to be recreated if it is deleted, overwritten, corrupted, or destroyed. To that end, backups occur every night to make a full copy of all databases, a full copy of all email servers, and a copy of any file that has changed since the day before. If an organisation has a small amount of data with a low change rate, then not much data needs to be backed up each day or night. However, if the amount of data is large, then the daily changed data will also be large, creating bigger challenges in backing it up.
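To make that relationship concrete, here is a minimal sketch of the arithmetic, assuming an illustrative split between databases/email (backed up in full each night) and file data (only changes backed up), with a hypothetical daily change rate; none of these figures come from the article itself.

```python
# Rough estimate of nightly backup volume. All figures are illustrative
# assumptions: databases and email are backed up in full each night,
# while only the changed portion of file data is backed up.

def nightly_backup_gb(db_and_email_gb: float, file_gb: float,
                      daily_file_change_rate: float) -> float:
    """Approximate gigabytes moved on an ordinary night."""
    return db_and_email_gb + file_gb * daily_file_change_rate

# Small environment: 100 GB of databases/email, 400 GB of files, ~2% change.
print(f"Small:  ~{nightly_backup_gb(100, 400, 0.02):,.0f} GB per night")

# Larger environment: 5 TB of databases/email, 20 TB of files, ~2% change.
print(f"Larger: ~{nightly_backup_gb(5_000, 20_000, 0.02):,.0f} GB per night")
```

As the totals grow, the nightly volume grows with them, which is exactly what drives the bandwidth problem discussed next.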

To back up to the cloud, the data would have to leave the data centre, traverse the Internet, and land at the cloud storage provider, whether that is a specific cloud backup provider or a public cloud provider such as Amazon, Microsoft Azure, or Google. The challenge of getting the data to the cloud depends on the amount of data. If the data is small, then only modest bandwidth is required to move the data to the Internet on the way to the cloud storage. However, if the data is large, the amount of bandwidth required to move it to the Internet becomes cost-prohibitive. As a result of this challenge, what you typically see is the following.
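To show why the amount of data matters so much, here is a small sketch that estimates the sustained upload rate needed to finish a nightly backup within a fixed window; the data volumes and the 10-hour window are assumptions chosen purely for illustration.

```python
# Sustained upload bandwidth needed to move a nightly backup to cloud
# storage within a fixed backup window. Inputs are illustrative only.

def required_mbps(nightly_backup_gb: float, window_hours: float) -> float:
    """Megabits per second needed to move nightly_backup_gb in window_hours."""
    bits = nightly_backup_gb * 8 * 1000**3   # decimal gigabytes -> bits
    seconds = window_hours * 3600
    return bits / seconds / 1e6              # bits per second -> Mbps

print(f"200 GB in 10 h -> ~{required_mbps(200, 10):.0f} Mbps sustained upload")
print(f"5 TB in 10 h   -> ~{required_mbps(5_000, 10):.0f} Mbps sustained upload")
```

Under these assumptions, a couple of hundred gigabytes a night fits comfortably on an ordinary business connection, while multiple terabytes a night demands a gigabit-class uplink dedicated to backup, which is where the cost argument starts to break down.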

For consumers (small amounts of data and low change rates), there is software that runs on a PC that captures the daily changes and sends them to the cloud. The consumer pays a flat yearly fee.

For small businesses (a few hundred gigabytes to a few terabytes), there is software that runs at the organisation’s site and backs up data to an onsite disk appliance, keeping a local set of backups onsite. Copies of the data are sent to the cloud provider as a second copy or disaster recovery copy. Sometimes short-term backups are kept onsite and longer-term backups, called “versions” or “history”, are kept offsite. The organisation pays by the amount of data stored per month. Over three years, this is more expensive than running backups yourself, but if you don’t have the staff, this can certainly get the backup monkey off your back.
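As a sketch of the kind of three-year comparison mentioned above, the snippet below totals a hypothetical per-gigabyte monthly subscription against a hypothetical do-it-yourself setup; every price in it is an assumption for illustration, not a real quote.

```python
# Back-of-the-envelope three-year cost comparison for a small business.
# Every figure below is a hypothetical assumption, not a vendor price.

data_gb = 2_000                    # 2 TB of protected data (assumption)
cloud_price_per_gb_month = 0.10    # assumed subscription price, $/GB/month
months = 36

cloud_cost = data_gb * cloud_price_per_gb_month * months

appliance_cost = 4_000             # assumed one-off local appliance cost
software_and_support = 1_500       # assumed software/support over 3 years
diy_cost = appliance_cost + software_and_support

print(f"Cloud subscription over 3 years:  ${cloud_cost:,.0f}")
print(f"Running it yourself over 3 years: ${diy_cost:,.0f}")
```

With these placeholder numbers the subscription comes out more expensive over three years, which matches the trade-off described above: you pay a premium to hand the work to someone else.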

Above a few terabytes, the math does not work due to the amount of bandwidth required from the organisation’s data centre to the Internet. The cost of the bandwidth far exceeds the cost of running your own backups. This is true even if you use data deduplication and only move changed bytes or blocks, because backups occur every night and you therefore need enough bandwidth to complete the transfer into the cloud before the next backup begins. In organisations with more than 5TB of data, with a few exceptions, the organisation runs its own backup application and either backs up to tape onsite and uses tape for offsite, backs up to disk onsite and uses tape for offsite, or backs up to disk onsite and replicates to disk offsite. If retention is shorter than four weeks, typically straight disk is used. If the retention/history spans weeks to months to years, then disk-based backup appliances with data deduplication are deployed. Data deduplication stores only the unique bytes or blocks from backup to backup to use the least amount of disk possible, greatly lowering the cost compared with straight disk.
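The deduplication idea referred to here can be illustrated with a simplified block-level sketch: split each backup into fixed-size blocks, hash each block, and store only blocks not seen before. Real appliances use far more sophisticated (often variable-length or zone-based) techniques, so treat this purely as a toy illustration of the principle.

```python
# Toy illustration of block-level deduplication: only blocks whose content
# has not been stored by an earlier backup consume new disk space.

import hashlib
import os

BLOCK_SIZE = 4096        # fixed block size, chosen only for illustration
stored_blocks = {}       # block hash -> block content (the unique blocks)

def backup(data: bytes) -> int:
    """Deduplicate one backup stream; return bytes of new storage consumed."""
    new_bytes = 0
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in stored_blocks:
            stored_blocks[digest] = block
            new_bytes += len(block)
    return new_bytes

night1 = os.urandom(40 * BLOCK_SIZE)                             # first full backup
night2 = night1[:36 * BLOCK_SIZE] + os.urandom(4 * BLOCK_SIZE)   # ~10% changed

print(f"Night 1 stores {backup(night1):,} new bytes")   # every block is new
print(f"Night 2 stores {backup(night2):,} new bytes")   # only the changed blocks
```

Because the second night’s backup shares most of its blocks with the first, it consumes only a fraction of the disk, which is why deduplicated appliances make weeks-to-years retention affordable.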

So, in summary, the answer to this highly debated question is that it depends. If you are a consumer or a small business with a few terabytes of data or less, you can absolutely use the cloud if you don’t want to operate your own backups. In a three-year side-by-side comparison, it will cost more to use the cloud, but avoiding the aggravation of running your own backups may be worth it. If you have multiple terabytes to tens or even hundreds of terabytes of data, the cost of the on-ramp Internet bandwidth over three years will far exceed the cost of running your own backups. When the on-ramp Internet bandwidth math will work is anyone’s guess.
 
