What causes a DNS outage? Humans, mostly

Human error is behind most DNS outages. Learn more from BlueCat about the dire impacts of outages and why homegrown DNS solutions increase outage risk.

Broken road landscape
Key takeawaysThis key takeaway was generated through LLMs crawling the page and coming up with an overview of the content.

The article examines common causes, signs, and consequences of DNS outages in enterprise networks, emphasizing that human error—most often misconfiguration when changing or adding network elements—is the primary driver. It describes real-world high-impact outages (Dyn, Akamai, Microsoft Azure, CloudFlare) to show operational impact on web presence and applications, and explains why homegrown DNS approaches like BIND or Active Directory increase risk through decentralized, manual configuration and single-person knowledge. Finally, it presents BlueCat’s DNS platform as a centralized management solution that reduces human-error risk, removes single points of failure, and helps restore services faster through centralized configuration and support.

What are the most common technical causes of a DNS outage described in the article?

The article identifies misconfiguration of DNS records as the most frequent cause, including mistakes made when changing or adding network elements. Other technical causes discussed are overly high TTL settings that cause stale caching, hardware or network failures, high DNS latency, and distributed denial of service (DDoS) attacks that overload DNS servers. It also notes that software defects or configuration updates (as in Akamai’s case) and incorrect nameserver delegation changes (as occurred with Microsoft Azure) can trigger outages. The article emphasizes that human error underlies many of these technical problems.

Why do homegrown DNS solutions like BIND or Active Directory increase the risk of outages?

According to the article, homegrown approaches such as BIND or Microsoft Active Directory increase outage risk because each DNS server is configured individually, often via scripts run by one or two administrators. This creates many more opportunities for human error and can concentrate institutional knowledge in a single person (the bus factor), making administration a single point of failure. When problems occur, there is typically no vendor tech support to call, so troubleshooting and remediation are manual and time-consuming. Those factors combined mean mistakes are harder to prevent and slower to recover from.

How does BlueCat’s DNS platform reduce the likelihood and impact of DNS outages?

The article explains that BlueCat’s platform provides a single pane of glass to view and centrally manage all DNS activity, eliminating the need to update dozens of servers individually with scripts. Centralized configuration lets administrators apply one change across servers, reducing routine human-error risk and enabling more people to share administrative tasks, which removes single-person dependencies. In the event of server failure, the platform simplifies rebuilding or replacing hardware and deploying configurations quickly, shortening recovery time; additionally, BlueCat offers customer engagement to investigate outages and help restore networks.

The majority of the time, a DNS outage has a simple cause: misconfiguration. It’s most likely to occur as a result of changing or adding something in your enterprise network.

Of course, it’s more likely that a novice will make a configuration mistake, but experienced DNS admins can make them, too. We are all human, after all.

Sometimes equipment failures or malicious attacks can direct the blame elsewhere. But human errors are, by far, the most common culprit.

This post will explore the causes and signs of DNS outages. Next, it will examine how dire the consequences can be. Then, it will look at why homegrown approaches to DNS management increase your risk and how BlueCat’s platform can help keep an outage at bay.

Causes and signs of a DNS outage

Most of the time, the Domain Name System (DNS) just works. (DNS translates human-readable domain names (like bluecatnetworks.com) to computer-friendly Internet Protocol (IP) addresses (like 104.239.197.100)). But when it doesn’t, things get really bad in a hurry. It potentially impacts or can even fully take down your organization’s web presence.

A DNS server not responding can happen for any number of reasons, but there are a few frequent causes.

Improper configuration of DNS records, which tell servers exactly how to respond to a DNS record, is one of the most common. Additionally, configuring a too-high time to live (TTL), a server setting that tells a cache how long to store DNS records, can also cause an outage.

Hardware or network failures and high DNS latency (loading time) are also sources of trouble. Furthermore, distributed denial of service (DDoS) attacks can cause some DNS outages by overloading DNS servers.

The most common sign of trouble is straightforward: Your website isn’t accessible or your web-based applications fail to function. Oftentimes, you might encounter an NXDOMAIN error.

The consequences of a DNS outage can be dire

There is no shortage of high-profile DNS outage stories, and the reigning king is arguably Dyn.

In 2016, Dyn was the victim of DDoS attacks that brought down its managed DNS service for nearly 12 hours. The real-time impacts quickly spread across the U.S. and Europe, taking down about 70 sites, including behemoths like Amazon, Twitter, and Netflix.

The series of attacks were coordinated through a botnet of IoT devices infected with malware. At the time, it was the largest outage ever of its kind.

U.S. map of the widespread impacts of Dyn

Other headline grabbers

Akamai fell victim to a DNS outage in July 2021 that took down numerous high-profile websites, such as Delta Airlines, PlayStation Network, and Oracle, for close to an hour. According to the content delivery network services provider, the outage was thanks to a software configuration update that triggered a bug in Akamai’s Edge DNS service. Akamai returned things to a normal state after rolling back the update.

Microsoft is no stranger to DNS outages making headlines, either. In 2019, Azure services were down for nearly two hours thanks to a nameserver delegation change affecting DNS resolution. The company admitted that, during a migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated.

In April of this year, Azure again went down for an hour due to DNS. This time, it was caused by a code defect in Azure’s DNS service that was uncovered after an unusual surge of DNS queries and excessive client retries.

In 2020, CloudFlare, a major provider of DNS services, had its 1.1.1.1 DNS service down for about 25 minutes due to a configuration error on a router located in Atlanta. The error caused all traffic across CloudFlare’s backbone to be sent to Atlanta, overriding any load balancing. All network locations connected to it failed.

Who wants to tout that someone made a mistake?

While there’s no official compendium of DNS outages, one thing is certain: They are likely underreported. After all, who wants to tout that their DNS went down because someone screwed up?

Sure, if it’s a DDoS attack, you’ll probably read about it. There’s no individual to blame and they hold a certain level of intrigue (who was behind it all?). Furthermore, they are useful cautionary tales about the importance of cybersecurity.

But human error causes most outages, which organizations sometimes don’t want to publicly report. They just provide a vague status update that they’re working on it.

Homegrown approaches put enterprises at greater risk for a DNS outage

Organizations that use BIND, Microsoft Active Directory, or some other kind of homegrown approach to managing DNS are at greater risk for DNS outages.

Why? Because there are far more opportunities for error.

With these types of network solutions, DNS servers are each configured individually, often with one or two DNS admins executing scripts. And that one person who knows how to configure all the DNS servers might be really great at what they do.

But what if they suddenly aren’t there to configure them anymore (known as the bus factor)? Who will? With BIND, Active Directory, or anything else homegrown, DNS administration can easily become a single point of failure.

Further, if something goes south as a result of an error, there is no tech support number or professional services team to call for BIND. It’s a long and manual process to determine where the problem lies and fix it.

BlueCat’s DNS platform reduces your DNS outage risk

BlueCat’s DNS platform provides a single pane of glass from which to view and centrally manage all your DNS activity. There is no need to individually update dozens of DNS servers one at a time with scripts. Instead, you can easily update them all at once from the same user interface.

Completing just one configuration significantly reduces the risk of basic human error. You can also expand the pool of network admins who can do configurations, spreading the workload and eliminating a single point of failure.

Should a server fail, BlueCat’s platform makes it much easier to either rebuild the server or replace the hardware, deploy the configuration, and get it back up and running quickly.

BlueCat often hears from customers when they have a DNS outage. BlueCat can work with you to get to the bottom of it and get your network back up and running.


Published in:


An avatar of the author

Rebekah Taylor is a former journalist turned freelance writer and editor who has been translating technical speak into prose for more than two decades. Her first job in the early 2000s was at a small start-up called VMware. She holds degrees from Cornell University and Columbia University’s Graduate School of Journalism.

Related content

Close-up of interlocked metal chain links symbolizing connected network objects and relationships in IPAM

How to map your network with user-defined links in Integrity X

Map your network with user-defined links in Integrity X to define and manage custom relationships, such as dual-stack and NAT environments.

Read more
Flock of geese flying in formation across a blue sky, framed by a pink graphic border, symbolizing coordinated network migrat

Automate your DDI modernization path by migrating with Micetro

Automate cross-platform DNS and DHCP migration with Micetro to reduce risk, eliminate manual effort, and modernize infrastructure faster.

Read more
Three armored figures walking toward a futuristic Las Vegas skyline with pyramids, glowing orb, and "Welcome to Fabulous Las

Your journey to intelligent NetOps begins at Cisco Live

Visit BlueCat’s booth or book a meeting now to learn more about how our solutions can help you build a network that supports constant change.

Read more
Stacked colorful wooden directional arrows on a post by a calm seaside with distant hills and blue sky

Replace BIND and ISC with Micetro DNS/DHCP Server (MDDS)

Tired of patching and manually configuring BIND DNS and ISC DHCP? Discover how Micetro MDDS appliances can replace them for modern DDI.

Read more