What causes a DNS outage? Humans, mostly

Human error is behind most DNS outages. Learn more from BlueCat about the dire impacts of outages and why homegrown DNS solutions increase outage risk.

Key Takeaways
  • Most DNS outages stem from human-driven misconfigurations, including incorrect DNS records and inappropriate TTL values, rather than from hardware failures or attacks.
  • DNS outages can completely disrupt web presence and application availability, often surfacing as unreachable sites or NXDOMAIN errors for end users.
  • High-profile incidents at providers like Dyn, Akamai, Microsoft Azure, and Cloudflare illustrate that configuration errors, software defects, and flawed updates can quickly cascade into large-scale DNS failures.
  • Homegrown DNS implementations based on BIND or Microsoft Active Directory increase outage risk due to individually managed servers, script-based changes, and concentrated operational knowledge (bus factor).
  • Lack of vendor support for homegrown DNS solutions lengthens outage resolution times, as diagnosis and remediation are manual and depend heavily on in-house expertise.
  • BlueCat’s centrally managed DNS platform reduces misconfiguration risk by enabling single-interface, multi-server updates, eliminating single points of failure, and simplifying server rebuild and recovery workflows.

Most of the time, a DNS outage has a simple cause: misconfiguration. It typically happens when something is changed or added in your enterprise network.

Of course, it’s more likely that a novice will make a configuration mistake, but experienced DNS admins can make them, too. We are all human, after all.

Sometimes the blame lies elsewhere, with equipment failures or malicious attacks. But human error is, by far, the most common culprit.

This post will explore the causes and signs of DNS outages. Next, it will examine how dire the consequences can be. Then, it will look at why homegrown approaches to DNS management increase your risk and how BlueCat’s platform can help keep an outage at bay.

Causes and signs of a DNS outage

Most of the time, the Domain Name System (DNS) just works. DNS translates human-readable domain names, like bluecatnetworks.com, into computer-friendly Internet Protocol (IP) addresses, like 104.239.197.100. But when it doesn’t work, things get bad in a hurry. An outage can degrade or even fully take down your organization’s web presence.
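To make the translation step concrete, here is a minimal Python sketch that asks the operating system’s resolver for the IPv4 addresses behind a name. The function name resolve_ipv4 is an illustrative assumption, not something from BlueCat’s tooling, and the addresses returned depend on whatever is published in DNS at the time.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) pair.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4("bluecatnetworks.com") on a machine with working DNS returns the addresses currently published for that name. If the lookup fails, the call raises socket.gaierror, which is what an application sees during an outage.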

A DNS server can stop responding for any number of reasons, but a few causes come up frequently.

Improper configuration of DNS records, which tell servers exactly how to respond to a DNS query, is one of the most common. Additionally, configuring a too-high time to live (TTL), a setting that tells a cache how long to store DNS records, can also cause or prolong an outage, because caches keep serving the old answer until it expires.
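The effect of a too-high TTL is easier to see in a toy model. The sketch below is a simplified simulation, not how any particular resolver is implemented: the class name TtlCache, the simulated clock, and the example.com records are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    address: str
    expires_at: float  # simulated clock time, in seconds


class TtlCache:
    """A toy resolver cache: an answer is reused until its TTL runs out."""

    def __init__(self) -> None:
        self._entries: dict[str, CachedAnswer] = {}

    def lookup(
        self, name: str, now: float, authoritative: dict[str, tuple[str, int]]
    ) -> str:
        cached = self._entries.get(name)
        if cached is not None and now < cached.expires_at:
            # Still fresh as far as the cache knows, even if the
            # authoritative record has changed in the meantime.
            return cached.address
        # Cache miss or expired entry: fetch (address, ttl) from the source.
        address, ttl = authoritative[name]
        self._entries[name] = CachedAnswer(address, now + ttl)
        return address
```

Suppose a record for app.example.com is published with a wrong address and a TTL of 86400 seconds, and a cache picks it up at time 0. Even if the record is corrected a minute later, lookups through that cache keep returning the wrong address until time 86400, a full day later. With a TTL of 300, the same mistake would clear in five minutes.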

Hardware or network failures and high DNS latency (slow response times) are also sources of trouble. Furthermore, distributed denial of service (DDoS) attacks can cause some DNS outages by overloading DNS servers.

The most common sign of trouble is straightforward: Your website isn’t accessible or your web-based applications fail to function. Oftentimes, you might encounter an NXDOMAIN error, the response a resolver returns when a domain name does not exist.
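A basic health check can tell these symptoms apart. The sketch below is a rough approximation using only Python’s standard library, which does not expose the raw DNS response code. It treats the resolver error EAI_NONAME as "not found," which is how an NXDOMAIN answer usually surfaces, and lumps every other failure together. The function name check_name and the three status strings are assumptions for illustration.

```python
import socket


def check_name(hostname: str) -> str:
    """Classify a lookup as 'ok', 'not-found', or 'lookup-failed'."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror as err:
        # EAI_NONAME: the resolver reported that the name, or an address
        # for it, does not exist. This is only an approximation of NXDOMAIN.
        if err.errno == socket.EAI_NONAME:
            return "not-found"
        # Anything else: timeouts, unreachable resolvers, and so on.
        return "lookup-failed"
    return "ok"
```

Running check_name against a handful of critical hostnames on a schedule gives an early signal: "not-found" points toward a missing or wrong record, while "lookup-failed" points toward the resolver or the network path to it.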

The consequences of a DNS outage can be dire

There is no shortage of high-profile DNS outage stories, and the reigning king is arguably Dyn.

In 2016, Dyn was the victim of DDoS attacks that brought down its managed DNS service for nearly 12 hours. The real-time impacts quickly spread across the U.S. and Europe, taking down about 70 sites, including behemoths like Amazon, Twitter, and Netflix.

The series of attacks was coordinated through a botnet of IoT devices infected with malware. At the time, it was the largest outage ever of its kind.

U.S. map of the widespread impacts of Dyn

Other headline grabbers

Akamai fell victim to a DNS outage in July 2021 that took down numerous high-profile websites, such as Delta Air Lines, PlayStation Network, and Oracle, for close to an hour. According to the content delivery network services provider, the outage was caused by a software configuration update that triggered a bug in Akamai’s Edge DNS service. Akamai restored normal service after rolling back the update.

Microsoft is no stranger to DNS outages making headlines, either. In 2019, Azure services were down for nearly two hours because of a nameserver delegation change that affected DNS resolution. The company admitted that, during a migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated.

In April 2021, Azure again went down for an hour due to DNS. This time, it was caused by a code defect in Azure’s DNS service that was uncovered after an unusual surge of DNS queries and excessive client retries.

In 2020, Cloudflare, a major provider of DNS services, had its 1.1.1.1 DNS service down for about 25 minutes due to a configuration error on a router located in Atlanta. The error caused all traffic across Cloudflare’s backbone to be sent to Atlanta, overriding any load balancing. All network locations connected to the backbone failed.

Who wants to tout that someone made a mistake?

While there’s no official compendium of DNS outages, one thing is certain: They are likely underreported. After all, who wants to tout that their DNS went down because someone screwed up?

Sure, if it’s a DDoS attack, you’ll probably read about it. There’s no individual to blame, and such attacks hold a certain level of intrigue (who was behind it all?). Furthermore, they are useful cautionary tales about the importance of cybersecurity.

But most outages are caused by human error, and organizations sometimes don’t want to report that publicly. They just provide a vague status update that they’re working on it.

Homegrown approaches put enterprises at greater risk for a DNS outage

Organizations that use BIND, Microsoft Active Directory, or some other kind of homegrown approach to managing DNS are at greater risk for DNS outages.

Why? Because there are far more opportunities for error.

With these types of network solutions, each DNS server is configured individually, often by one or two DNS admins executing scripts. And the person who knows how to configure all the DNS servers might be really great at what they do.

But what if they suddenly aren’t there to configure them anymore? Who will? This risk is known as the bus factor. With BIND, Active Directory, or anything else homegrown, DNS administration can easily become a single point of failure.

Further, if something goes south as a result of an error, there is no tech support number or professional services team to call for BIND. It’s a long and manual process to determine where the problem lies and fix it.
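Part of that manual process is working out which of the individually configured servers disagree with each other. As a rough sketch, assuming you have already exported each server’s records into a simple name-to-address mapping (the export step, the function name find_drift, and the ns1/ns2/ns3 labels are all hypothetical), a comparison could look like this:

```python
from typing import Optional

Records = dict[str, str]  # record name -> address


def find_drift(servers: dict[str, Records]) -> dict[str, dict[str, Optional[str]]]:
    """Return, per record name, each server's value when the servers disagree."""
    all_names: set[str] = set()
    for records in servers.values():
        all_names.update(records)

    drift: dict[str, dict[str, Optional[str]]] = {}
    for name in sorted(all_names):
        # None marks a server that has no entry for this name at all.
        values = {server: records.get(name) for server, records in servers.items()}
        if len(set(values.values())) > 1:
            drift[name] = values
    return drift
```

If ns1 and ns2 hold different addresses for api.example.com and ns3 has no entry for it, find_drift reports that one name along with each server’s value, and stays silent about records on which all three agree.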

BlueCat’s DNS platform reduces your DNS outage risk

BlueCat’s DNS platform provides a single pane of glass from which to view and centrally manage all your DNS activity. There is no need to individually update dozens of DNS servers one at a time with scripts. Instead, you can easily update them all at once from the same user interface.

Completing just one configuration significantly reduces the risk of basic human error. You can also expand the pool of network admins who can do configurations, spreading the workload and eliminating a single point of failure.
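The general idea of one validated change applied everywhere can be sketched in a few lines of Python. This is not BlueCat’s API or an account of how its platform works internally. The ARecord type, the validate and plan_rollout functions, and the TTL bounds of 60 to 86400 seconds are all assumptions chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ARecord:
    name: str
    address: str
    ttl: int


def validate(record: ARecord, min_ttl: int = 60, max_ttl: int = 86400) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    try:
        ipaddress.IPv4Address(record.address)
    except ValueError:
        problems.append(f"{record.name}: {record.address!r} is not a valid IPv4 address")
    if not min_ttl <= record.ttl <= max_ttl:
        problems.append(
            f"{record.name}: TTL {record.ttl} is outside {min_ttl}-{max_ttl} seconds"
        )
    return problems


def plan_rollout(records: list[ARecord], servers: list[str]) -> dict[str, list[ARecord]]:
    """Validate the change once, then give every server the identical record set."""
    problems = [p for record in records for p in validate(record)]
    if problems:
        # Reject the whole change before any server is touched.
        raise ValueError("; ".join(problems))
    return {server: list(records) for server in servers}
```

The design point is that a typo such as 192.0.2.300 is caught once, before rollout, instead of being typed correctly on some servers and incorrectly on others.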

Should a server fail, BlueCat’s platform makes it much easier to either rebuild the server or replace the hardware, deploy the configuration, and get it back up and running quickly.

BlueCat often hears from customers when they have a DNS outage. BlueCat can work with you to get to the bottom of it and get your network back up and running.





Rebekah Taylor is a former journalist turned freelance writer and editor who has been translating technical speak into prose for more than two decades. Her first job in the early 2000s was at a small start-up called VMware. She holds degrees from Cornell University and Columbia University’s Graduate School of Journalism.
