Backup, DR and BCP for Dummies
Being new to vBridge and not overly technical, I was a little apprehensive and slightly confused about what content I could add to a technical blog channel that would be worthwhile for readers. I then started reflecting on my 16 or so years working in the technology sector, and it dawned on me that for business owners and IT decision makers, the best content on technology topics is often presented in a non-technical form that is easy to understand and relevant to everyday business challenges. Thinking of it this way, I realised there are many things I could blog about at a higher level than a traditional engineering blog. For my first attempt in this forum, I would like to share a few thoughts about Backup, IT Data Replication and Disaster Recovery (both sometimes referred to as DR), and Business Continuity Planning.
To start, I would like to share my definitions of each of these terms:
- Backup – A point-in-time snapshot of the environment, current and accurate at the moment it is taken, which can be used to recover data from that specific time
- Data Replication (DR) – Replicating data on a defined schedule to a secondary destination, e.g. VM-level replication to IT infrastructure in a different geographic location, away from the main production site
- Disaster Recovery Plan (DRP) – The plan put in place to enable the business to access and effectively use its data and infrastructure when a disaster affects the IT environment
- Business Continuity Plan (BCP) – A plan that allows the whole business to keep operating when a disaster event affects the business as a whole. The IT DR plan should be formed out of the business's BCP
I chose this as a blog topic because I am continually amazed by the confusion around these concepts. Having also lived through the Christchurch quakes and seen a wide array of disasters strike my client base over the years, I believe this is a topic every business needs to consider and plan for.
At the base level, every business should have a robust backup solution for its important business data. It does not matter where your data resides; IT managers should always insist on understanding how that data is backed up and how it is protected from possible corruption. Just because your data resides in a Software as a Service cloud platform, do not assume it is safe. Probe your supplier on their backup methodology and structure. The same is true of public cloud, private cloud and on-premise server deployments. Understand how the data is backed up and how it can be recovered if recovery is ever needed. Test your backups regularly to ensure your solution is robust and meets your requirements around recoverability and data retention.
A point I really want to emphasise is that backups are not a disaster recovery solution. Backups, although an absolute must, are problematic because any data created or changed between the last backup snapshot and the moment a restore is needed can be lost. Some backups can also be slow to recover, delaying the return to normal operations.
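To put a number on that gap, here is a minimal Python sketch. It assumes a single backup copy and measures the data at risk as the time between the last backup and the failure. The function name `data_loss_window` and the timestamps are hypothetical, chosen for illustration only.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much recent work is at risk when restoring from the last backup.

    Anything created or changed after the last backup is not in that backup,
    so restoring from it loses everything between the two timestamps.
    """
    if failure < last_backup:
        raise ValueError("the failure must happen after the last backup")
    return failure - last_backup


# Hypothetical example: a nightly backup runs at 11pm and a failure occurs
# at 4:30pm the next afternoon.
last_backup = datetime(2024, 3, 4, 23, 0)
failure = datetime(2024, 3, 5, 16, 30)
print(data_loss_window(last_backup, failure))  # 17:30:00
```

In this example, restoring from the nightly backup would lose almost a full working day of changes. Shortening that gap is one of the things DR approaches such as data replication aim to do.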
All businesses should consider IT disaster recovery planning and how it flows into their overarching business BCP. To keep this blog brief: in my opinion, BCP and DR should be a "top down" process. By this I mean that the business executive team needs to agree that a robust BCP is valuable for the business. To achieve a good result, all areas of the business need to buy into it and accept that a good BCP will take effort and cost to formulate and execute. IT should not be an island, separate from the rest of the business, when it comes to Business Continuity Planning.
Once BCP is agreed on at a strategic level, IT can then consider and execute IT DR planning. In my mind, there are three key scenarios that should be planned for:
- An event occurs that renders IT systems unavailable for a sustained period
- An event occurs that renders the business's main offices uninhabitable or unavailable
- An event occurs that stops users from connecting to IT resources and data repositories
There are two key questions that should always be addressed when considering IT DR strategy. These are:
- How long can the business survive without functional IT systems?
- Can we afford any data loss?
The answers to these questions will drive our thinking when planning our approach to DR and BCP.
Whenever we think about IT DR planning, there are two key terms to be familiar with:
- RPO – Recovery Point Objective – the maximum amount of data, measured in time, that the business can afford to lose; in other words, how far back we can accept going to restore good data
- RTO – Recovery Time Objective – the target maximum time allowed to recover systems after a disaster event
The figure below is a graphical representation of these points.
It is IT's job to work with the business to determine the key systems, the acceptable level of data loss for each of these systems, and how long the business can be without each system. This is not a "one size fits all" scenario: different RTOs and RPOs can be assigned to different systems based on their importance to the business.
Once all key systems have been defined and an RTO and RPO figure has been placed against each of them, the IT team can then evaluate technology solutions that will help achieve these metrics.
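To make this concrete, here is a minimal Python sketch that records a target RPO and RTO for each system and checks candidate protection options against them. The class and function names (`SystemTarget`, `Protection`, `meets_targets`), the systems and all figures are hypothetical. The sketch also makes two simplifying assumptions. First, worst-case data loss equals one full interval between recoverable copies. Second, restore time is a known estimate that does not include time spent detecting the problem or deciding to invoke DR.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemTarget:
    name: str
    rpo: timedelta  # maximum acceptable data loss, measured in time
    rto: timedelta  # maximum acceptable time to restore the system


@dataclass
class Protection:
    label: str
    copy_interval: timedelta  # time between recoverable copies (backups or replicas)
    restore_time: timedelta   # estimated time to bring the system back into use


def meets_targets(target: SystemTarget, option: Protection) -> bool:
    # Worst case, a failure strikes just before the next copy, so up to one
    # full interval of changes is lost; that must fit within the RPO.
    rpo_ok = option.copy_interval <= target.rpo
    rto_ok = option.restore_time <= target.rto
    return rpo_ok and rto_ok


# Hypothetical systems and figures, for illustration only.
targets = [
    SystemTarget("Finance system", rpo=timedelta(hours=1), rto=timedelta(hours=4)),
    SystemTarget("File shares", rpo=timedelta(hours=24), rto=timedelta(hours=48)),
]
options = [
    Protection("Nightly backup", timedelta(hours=24), timedelta(hours=12)),
    Protection("Offsite VM replication", timedelta(minutes=15), timedelta(hours=1)),
]

for target in targets:
    for option in options:
        verdict = "meets" if meets_targets(target, option) else "misses"
        print(f"{target.name} with {option.label}: {verdict} its RPO/RTO")
```

With these made-up numbers, a nightly backup is adequate for the file shares but misses the finance system's one-hour RPO, which only the replication option satisfies. This reflects the point above that different systems can justify different solutions.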
Part two of this blog will discuss some different DR solutions and how they can be applied in the real-world setting.