Last updated on April 29, 2021.
What’s the scariest thing that has ever happened to you?
Is it a mysterious shadow in your hallway at night? Nearly losing your child in a crowd? For folks in the IT industry, is it waking up to a network outage in the middle of the night?
If you think the scariest stories are only the ones involving possessed dolls and haunted houses, some network admins we know would say think again. When devices rely on DNS to communicate across the network, an outage will turn a network admin pale as a ghost.
In honor of Halloween, here are a few DNS horror stories which our users experienced before they made the move to BlueCat Adaptive DNS.
1. Jamaican Jump Scare
I was messaging one of the DNS operations folks about his vacation to Jamaica. Well, he must have looked away from his screen for a minute at a very inopportune time, because shortly thereafter I noticed that our company’s top-level domain was not resolving anymore. No internal company.com, no external company.com, nothing was working. DNS was down.
I looked up one of the names I knew was supposed to be in the domain, and it was so-and-so.jamaica. He had inadvertently overwritten AND deployed the company top-level domain to read ‘Jamaica’! We all spent about an hour deploying the correct domain to all the DNS servers. At least we spent it in Jamaica.
2. Hauntingly Fragile Architecture
In the early years, we had a ton of pain getting DNSSEC straight on our DNS/DHCP server’s hidden primary. Our DNS/DHCP server could not support the full SOA serial number that dynamic updates need. At a certain point, this can cause dynamic updates to “run ahead” on the secondaries. Because the serial number on the hidden primary is then older than the one on the secondaries, the secondaries won’t pull the new zone, so the DNSSEC signatures age out and break the public-facing zone.
Every time this happens, the ops team has to re-add enough dynamic updates to compensate for the serial skew and let the hidden primary catch up with the secondaries. Spotting these errors is difficult without extremely good monitoring. It has gotten better now, but the underlying architecture is still fragile.
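To see why a lagging serial stalls the secondaries, here is a minimal Python sketch of the standard SOA serial comparison (RFC 1982 serial number arithmetic over 32 bits). The story does not name the DNS software involved, so the function names and the sample serials below are purely illustrative assumptions.

```python
# Sketch of the SOA serial check a secondary performs before pulling a zone.
# Function names and sample serials are hypothetical, not from any product.

SERIAL_BITS = 32
MOD = 2 ** SERIAL_BITS
HALF = 2 ** (SERIAL_BITS - 1)


def serial_gt(a: int, b: int) -> bool:
    """Return True if serial a is newer than serial b (RFC 1982 arithmetic)."""
    # The difference is taken modulo 2**32 so that wraparound is handled.
    # A difference of exactly 2**31 is undefined in the RFC; treat it as "not newer".
    return 0 < (a - b) % MOD < HALF


def secondary_would_transfer(primary_serial: int, secondary_serial: int) -> bool:
    """A secondary only pulls the zone when the primary's serial is newer."""
    return serial_gt(primary_serial, secondary_serial)


if __name__ == "__main__":
    # Secondary has "run ahead": the primary looks older, so no transfer happens.
    print(secondary_would_transfer(2021042901, 2021042905))
    # Once the primary's serial moves past the secondary's, transfers resume.
    print(secondary_would_transfer(2021042906, 2021042905))
```

A monitoring check built on a comparison like this could alert whenever any secondary reports a serial newer than the hidden primary's, which is the skew described above.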
3. The Horrors of Typos
Trying to find a subnet by entering its IP address was a nightmare with an older version of one product. It worked when we looked up an unassigned IP in IPv4; doing the same for IPv6 addresses, however, was impossible.
Sometimes we’d make a typo in the IP address, which made things so much worse. Say you search for a subnet using an IP address that you mistype. Because of the typo, quick search does not return any results, and you land in advanced search. Now say you notice the typo in advanced search and correct it. Quick search could have pulled up the corresponding subnet in the first place, but advanced search returns an empty list! So you have to copy the corrected IP address, go back to quick search, and paste it in there to get the result you wanted initially.
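For comparison, looking up the subnet that contains an address is a small task with Python’s standard `ipaddress` module, and the same code path can serve IPv4 and IPv6. This is only a sketch: the subnet list and the `find_subnet` helper are made up for illustration and have nothing to do with the product in the story.

```python
import ipaddress

# Hypothetical subnet inventory; a real one would come from an IPAM system.
SUBNETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "10.20.0.0/16",
        "192.168.1.0/24",
        "2001:db8::/32",
        "2001:db8:abcd::/48",
    )
]


def find_subnet(address: str):
    """Return the most specific known subnet containing address, or None.

    A mistyped address (for example "10.20.3.400") is reported as None
    instead of raising, so the caller can prompt for a correction.
    """
    try:
        ip = ipaddress.ip_address(address.strip())
    except ValueError:
        return None
    # Only compare against networks of the same IP version.
    matches = [net for net in SUBNETS if net.version == ip.version and ip in net]
    # Longest prefix wins, mirroring how routing picks the most specific match.
    return max(matches, key=lambda net: net.prefixlen, default=None)


if __name__ == "__main__":
    for query in ("10.20.3.4", "2001:db8:abcd::1", "10.20.3.400"):
        print(query, "->", find_subnet(query))
```

The point of the sketch is that an unassigned address still resolves to its enclosing subnet, and that a typo produces a clear "no match" the user can fix in place.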
4. A Mysterious Zone Disappearance
We had our internet service provider hosting our DNS… why? I don’t know. One day, our zone just disappeared! The server was up and it was responding to queries, but not for our zone. We called our provider and they were like, “Who are you?”, “Is this ours?”, “How was it spelled again?”, and “Can you send a backup of your zone file?”
After a few hours of this back-and-forth, they finally figured out that some lackey over there “upgraded” the server… without knowing that they were hosting our zone there.
5. Late Night Terror
Our biggest nightmare was when we moved our primary DNS server from one physical appliance to another and deleted the old server from the management console. What we did not know was that the secondary server was not actually configured on all of the servers out on the farm, so a lot of them started to time out after an hour. We started to put the new DNS box in place and re-push the configuration from the management console, but what we also did not know is that the deployment options (primary/secondary/forwarder) do not come back with the configuration. So it all went blank and we could not resolve anything.
Since the primary DNS was not available and the secondary one was not working, there was no way to resolve this. We lacked any sort of DNS high availability. Corporate websites, internal ticketing, alarms, notifications and monitoring: none of it was working.
We had to call our network team to physically go to the building, plug a laptop into the port, and get access to the management server. But by then it was pretty late, and getting in at night is almost impossible! We had to wake up about 15 people to get proper authorization to get into the computer room, but thankfully, once we were in, we could restore everything without any problems.
WHAT A NIGHTMARE. Never, ever delete the primary DNS server from your management console.
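One gap in this story, servers that only knew about the primary resolver, can be caught ahead of time with a simple audit. The sketch below assumes Linux-style hosts whose `resolv.conf` contents have already been collected into a dictionary; the hostnames, addresses, and function names are hypothetical, and the story does not say what the servers on the farm actually ran.

```python
def nameservers(resolv_conf_text: str) -> list[str]:
    """Extract the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def hosts_without_redundancy(configs: dict[str, str], minimum: int = 2) -> list[str]:
    """Return hosts that list fewer than `minimum` distinct resolvers."""
    return sorted(
        host
        for host, text in configs.items()
        if len(set(nameservers(text))) < minimum
    )


if __name__ == "__main__":
    # Hypothetical sample data standing in for files gathered from the farm.
    collected = {
        "web01": "nameserver 10.0.0.53\nnameserver 10.0.1.53\n",
        "app07": "# managed by config tool\nsearch corp.example\nnameserver 10.0.0.53\n",
    }
    print(hosts_without_redundancy(collected))
```

Running an audit like this before retiring a DNS server would flag every host that is about to lose its only resolver.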
6. The Spreadsheet of Doom
Before our division was sold off and we became our own company, we had a parent company. In the process of the divestiture, we needed a new source of truth for DNS that was not coming with us. Two challenges we faced at the time were that we didn’t have a single source of truth for DNS, and that DNS was more of an afterthought. At the time, everything we had for DNS was in spreadsheets and local host files.
Senior management barely knew anything about DNS. When I’d ask, “No, what’s keeping track of your subnets and VLANs?” what I’d hear was, “Well, we’ve got these spreadsheets over here.”
Yeah, you have those spreadsheets, but you’ve got one spreadsheet shared across 12 teams and everybody’s making changes. One person has the spreadsheet open and makes a change while someone else is making another change. His change overwrites the other person’s changes and nothing’s ever reflected. It was a friggin’ mess.
Sometimes, the scariest thing for an IT team isn’t a movie they saw but what they deal with every day. If there’s one thing these stories show, it’s that your team isn’t alone. These things happen to everyone! Ask any IT team, and they will probably have their own DNS horror story to tell.
However, not all DNS stories have to be scary. If your team has a DNS solution that is flexible, scalable, and full of powerful features and integrations, some of these problems can be mitigated. Get visibility, security, and automation through Adaptive DNS. Learn how BlueCat’s DDI solution can rid your IT house of any horrors. Or, to start fixing things right away, read more about best practices for deploying infrastructure.