The Seven Stages of (IT) Hell
We’ve all been there – where a specific IT scenario or event can be compared to a horror film, filled with suspense and terror.
I’m not talking about a situation with an application or two going offline. I’m talking about the red-faced, heart-pounding, sweat-pouring, tight-collared, labor-inducing adventure when your whole IT infrastructure is down. When I think about those sorts of incidents, there always seem to be seven common stages of any architecture outage:
- The whisper. A single report trickles in. One of your core services is not available. You hear the famous words of Roy Trenneman from The IT Crowd, “Have you tried turning it off and on again?” It must be a user error.
- The flood. Help desk phones go haywire. Escalation requests are coming in rapid succession. All of a sudden, your mouth is agape.
- Panic. Your boss is standing right behind you, and his boss behind him. They are asking what the problem is, and all you know is that something is broken. They want an ETA on resolution, and you just sat down at your desk.
- Investigation. The sleeves are rolled up at this point. You check basic network connectivity. Nope. DNS and DHCP are still working, but a large swath of layer-3 network appliances is having trouble. You start digging in, checking manuals, googling, and just figuring stuff out. You have found the right pile of hay. Now, if you could just find that elusive needle.
- The light bulb. You know of a few things to investigate and fiddle with, and they seem to fit and make sense. There may be one light bulb, or many light bulbs during the course of an IT “exercise.”
- Relief. Ahhh, problem solved. It was Tommy again. He decided to change the common OSPF area. Help desk phone lights go out. Reports from application teams verify that everything is back online.
- Prevention. This is the longest stage. You never want to experience that kind of panic again. You plan upgrades, you introduce auditing, and you change security permissions and policies. Most importantly, you change the passwords and never (ever!) let Tommy have access to the layer-3 devices in your network.
- Bonus stage: Déjà vu. The phone rings….
And then, you wake up and smile. It was all just a dream – more like a nightmare. You’re smiling because you’re the DDI architect and you’ve just invested in a reliable and resilient BlueCat DNS, DHCP and IPAM infrastructure and can finally sleep soundly at night, knowing that all those horrors are in the past.