Learning the Hard Way: Why Proactive Maintenance Is Essential
We admit it: it’s highly ironic that even technology companies occasionally encounter technical difficulties. But it’s true. We aren’t immune to unforeseen glitches that take down business-critical systems, like when a power outage at our office took down our Exchange database a couple of months back, leaving us without email for an uncomfortable amount of time. The good news, though, is that when you experience technical failures, you see with new eyes the value of both emergency preparedness and preventive maintenance. In other words, you learn a lesson. And we’d like to share ours with you.
The sun was just coming up at 6:30 on a Saturday morning, so a power outage wasn’t as annoying as it would have been at night when, you know, you need lights to see. Unfortunately, this particular power outage proved a minor disaster for the 3n1media team, since it shut down several of our business-critical systems, most notably the server hosting our Microsoft Exchange database. And it wasn’t just a matter of waiting for the power to come back on. Once it was back, we had to begin the long process of getting the interrupted systems back in order so that our office could function normally.
We put our disaster recovery plan into action (a must-have for all businesses). It took us through assessing the overall health of our technology systems after the power came back on, triaging business-critical systems to prioritize troubleshooting (we started with the phone system), and then methodically finding the problem with the email database, bringing it back to life, and restoring it. It was a nuisance, to be sure, but because we stuck to our plan, we were able to take care of it in relatively short order.
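For readers who like to see a process written down, here is a minimal Python sketch of the triage step: check each system, then work through the broken ones in priority order. It is an illustration, not our actual recovery tooling. The `System` class, the `triage` function, the priority numbers, and the file server entry are all hypothetical; only the phone-system-first ordering comes from our story.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    priority: int  # 1 = restore first; the numbers here are illustrative
    healthy: bool  # result of the post-outage health check


def triage(systems):
    """Return the names of unhealthy systems, most critical first."""
    broken = [s for s in systems if not s.healthy]
    return [s.name for s in sorted(broken, key=lambda s: s.priority)]


# Hypothetical inventory; only the phone-first ordering reflects the account above.
inventory = [
    System("file server", priority=3, healthy=True),
    System("Exchange database", priority=2, healthy=False),
    System("phone system", priority=1, healthy=False),
]

restore_order = triage(inventory)
print(restore_order)
```

The value of writing the order down ahead of time, in code or on paper, is that nobody has to debate priorities in the middle of an outage.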
However, the main lesson here was the necessity of proactive maintenance: maintenance that could have prevented our small catastrophe in the first place. After the crisis was over and we could sit down and breathe a little, we asked ourselves what we could have done to prevent this from happening. After all, power outages are a fact of 21st-century life, and we’re likely to experience another one at some point.
What we came up with is that we could have been more proactive about performing power tests: controlled, self-inflicted power outages that show what happens to technology systems when power is removed. Power tests, as any good systems engineer will tell you, are essential maintenance procedures. And we had definitely done power tests on our systems, just not recently enough, so the outage caught us off guard. Had we performed one recently, we could have designed and implemented backup-power measures (stronger UPS systems, backup generators, etc.) to protect our systems.
Sure, maintenance procedures like power tests may seem like a hassle. But take it from us—dealing with the results of neglecting them on the other side of an emergency is a much bigger hassle.