The Seven Stages of (IT) Hell

We’ve all been there – where a specific IT scenario or event can be compared to a horror film, filled with suspense and terror.

I’m not talking about a situation with an application or two going offline. I’m talking about the red-faced, heart-pounding, sweat-pouring, tight-collared, labor-inducing adventure when your whole IT infrastructure is down. Thinking back on those sorts of incidents, there always seem to be seven common stages to any architecture outage:

  1. The whisper.  A single report trickles in.  One of your core services is not available.  You hear the famous words of Roy Trenneman from The IT Crowd: “Have you tried turning it off and on again?”  It must be a user error.
  2. The flood.  Help desk phones go haywire.  Escalation requests come in rapid succession.  All of a sudden, your mouth is agape.
  3. Panic.  Your boss is standing right behind you, and his boss behind him.  They are asking what the problem is, and all you know is that something is broken.  They want an ETA on resolution, and you just sat down at your desk.
  4. Investigation.  The sleeves are rolled up at this point.  You check basic network connectivity.  Nope.  DNS and DHCP are still working, but a large swath of layer-3 network appliances is having trouble.  You start digging in: checking manuals, googling, and just figuring stuff out.  You have found the right pile of hay; now if you could just find that elusive needle.
  5. The light bulb.  You know of a few things to investigate and fiddle with, and they seem to fit and make sense.  There may be one light bulb, or many, over the course of an IT “exercise.”
  6. Relief.  Ahhh, problem solved.  It was Tommy again.  He decided to change the common OSPF area.  Help desk phone lights go out.  Reports from application teams verify that everything is back online.
  7. Prevention. This is the longest stage.  You never want to experience that kind of panic again.  You plan upgrades, you introduce auditing, and you change security permissions and policies.  Most importantly, you change the passwords and never (ever!) let Tommy have access to the layer-3 devices in your network.
  8. Bonus stage: Déjà vu.  The phone rings….

And then, you wake up and smile.  It was all just a dream – more like a nightmare.  You’re smiling because you’re the DDI architect and you’ve just invested in a reliable and resilient BlueCat DNS, DHCP and IPAM infrastructure and can finally sleep soundly at night, knowing that all those horrors are in the past.
