Don't be caught short when it all goes to sh.....
Well, the weekend has just passed, and it turned out to be a short weekend too. I was on call and got the dreaded Sunday 6am wake-up call that no one wants ... that a reseller’s customer had been hacked, and my services were required. Sleep-in cancelled.
Among other things, the backup server was trashed, the NAS compromised, shares deleted, recycle bins disabled, and the usual ransom demands made. All wonderful stuff. To compound a long day of helping with recovery, the cloud backups that were available for restore (and thankfully untouched) were incomplete. One of the core servers was missing, despite the job name specifying that it was supposed to be in there. There was no way of checking what went wrong with the job either, as the backup server was unusable. It was irrelevant anyway: the server wasn’t in there, damage done, move along.
Eventually the customer was able to recover operationally, but clean-up continued well into the following week, and I’m sure there were some hard questions asked on the Monday.
Not checking backups aside, a lot of this pain could have been avoided with a simple DR test. The organisation in question wasn’t large, and the test would have been straightforward: restoring a domain controller and a core server, and spinning up a network/vDom. A few hours’ work.
DR testing is often neglected because creating a disaster recovery plan can seem expensive and resource-hungry. And because we IT folk in New Zealand tend to be jack-of-all-trades types, regularly juggling many balls at once, time is at a premium. Some companies may consider that just having a DR plan in writing is enough, a box-ticking exercise, even if there is no evidence the plan will work when disaster strikes.
I recommend conducting disaster recovery tests regularly throughout the year and incorporating them into planned maintenance, with ongoing staff training and familiarisation. Once testing has been completed, you will have a good idea of what worked, what didn’t, whether RTOs/RPOs were met, and where the shortcomings are.
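If it helps to make the "were RTOs/RPOs met" part concrete, here is a rough sketch of how the timings from a DR test could be scored. Everything in it is an assumption for illustration: the `RestoreResult` record, the `evaluate` function, the server names and the targets are all made up, and in practice you would note the timestamps by hand during the test or pull them from whatever your backup product reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RestoreResult:
    """Timings recorded for one system during a DR test (hypothetical record)."""
    system: str
    outage_start: datetime      # when the simulated disaster was declared
    service_restored: datetime  # when the system was usable again
    restore_point: datetime     # timestamp of the backup that was restored
    rto: timedelta              # agreed recovery time objective
    rpo: timedelta              # agreed recovery point objective


def evaluate(result: RestoreResult) -> dict:
    """Compare measured downtime and data-loss window against the targets."""
    downtime = result.service_restored - result.outage_start
    data_loss = result.outage_start - result.restore_point
    return {
        "system": result.system,
        "downtime": downtime,
        "data_loss": data_loss,
        "rto_met": downtime <= result.rto,
        "rpo_met": data_loss <= result.rpo,
    }


# Illustrative figures only: a domain controller that comes back in time,
# and a file server that is both slow to restore and restored from an old backup.
dc = RestoreResult(
    system="DC01",
    outage_start=datetime(2024, 3, 10, 9, 0),
    service_restored=datetime(2024, 3, 10, 11, 30),
    restore_point=datetime(2024, 3, 10, 1, 0),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=24),
)
files = RestoreResult(
    system="FILE01",
    outage_start=datetime(2024, 3, 10, 9, 0),
    service_restored=datetime(2024, 3, 10, 15, 0),
    restore_point=datetime(2024, 3, 9, 1, 0),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=24),
)

for outcome in (evaluate(dc), evaluate(files)):
    print(outcome)
```

The point is not the script itself but having the numbers written down after each test, so the shortcomings are visible rather than a matter of opinion.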
IT systems are never static, so new and upgraded products need to be tested again. New applications may have been deployed, old ones upgraded, servers, storage, and networking infrastructure added, items moved to the cloud, and so on. A disaster recovery test helps to make sure that a DR plan stays current in an IT environment that changes constantly.
There is plenty of information out there on how to do DR testing, best practice and all of that. No one size fits all, but you can be assured that sitting on your hands and doing nothing while watching the world go by isn’t going to end well when the actual shit hits the fan.
Oh, and check your backups are working too. IT 101.
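For what it's worth, the check that would have caught the missing core server is not complicated. Below is a minimal sketch, assuming you can get two things out of your environment: a list of the servers you expect to be protected, and the latest restore point per server as reported by your backup product. The function name `find_backup_gaps`, the server names and the 24-hour threshold are all hypothetical; how you export the restore point data depends entirely on the product you use.

```python
from datetime import datetime, timedelta


def find_backup_gaps(expected, latest_restore_points, now,
                     max_age=timedelta(hours=24)):
    """Return (missing, stale) server names.

    expected: iterable of server names that should be in the backups.
    latest_restore_points: dict of server name -> datetime of newest restore point.
    missing: expected servers with no restore point at all.
    stale: expected servers whose newest restore point is older than max_age.
    """
    expected = set(expected)
    missing = sorted(s for s in expected if s not in latest_restore_points)
    stale = sorted(
        s for s, ts in latest_restore_points.items()
        if s in expected and now - ts > max_age
    )
    return missing, stale


# Illustrative data only: APP01 is named in the plan but has no restore points,
# and FILE01 has not been backed up for three days.
now = datetime(2024, 3, 10, 6, 0)
expected_servers = ["DC01", "FILE01", "APP01"]
restore_points = {
    "DC01": datetime(2024, 3, 10, 1, 0),
    "FILE01": datetime(2024, 3, 7, 1, 0),
}

missing, stale = find_backup_gaps(expected_servers, restore_points, now)
print("Missing from backups:", missing)
print("Stale backups:", stale)
```

Note that this only compares what you think is protected against what the backup reports say. It is no substitute for actually restoring something.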