What causes a DNS outage? Humans, mostly

Human error is behind most DNS outages. Learn more from BlueCat about the dire impacts of outages and why homegrown DNS solutions increase outage risk.

The majority of the time, a DNS outage has a simple cause: misconfiguration. It’s most likely to occur as a result of changing or adding something in your enterprise network.

Of course, it’s more likely that a novice will make a configuration mistake, but experienced DNS admins can make them, too. We are all human, after all.

Sometimes equipment failures or malicious attacks can direct the blame elsewhere. But human errors are, by far, the most common culprit.

This post will explore the causes and signs of DNS outages. Next, it will examine how dire the consequences can be. Then, it will look at why homegrown approaches to DNS management increase your risk and how BlueCat’s platform can help keep an outage at bay.

Causes and signs of a DNS outage

Most of the time, the Domain Name System (DNS) just works. (DNS translates human-readable domain names (like bluecatnetworks.com) to computer-friendly Internet Protocol (IP) addresses (like 104.239.197.100)). But when it doesn’t, things get really bad in a hurry. It potentially impacts or can even fully take down your organization’s web presence.

A DNS server not responding can happen for any number of reasons, but there are a few frequent causes.

Improper configuration of DNS records, which tell servers exactly how to respond to a DNS record, is one of the most common. Additionally, configuring a too-high time to live (TTL), a server setting that tells a cache how long to store DNS records, can also cause an outage.

Hardware or network failures and high DNS latency (loading time) are also sources of trouble. Furthermore, distributed denial of service (DDoS) attacks can cause some DNS outages by overloading DNS servers.

The most common sign of trouble is straightforward: Your website isn’t accessible or your web-based applications fail to function. Oftentimes, you might encounter an NXDOMAIN error.

The consequences of a DNS outage can be dire

There is no shortage of high-profile DNS outage stories, and the reigning king is arguably Dyn.

In 2016, Dyn was the victim of DDoS attacks that brought down its managed DNS service for nearly 12 hours. The real-time impacts quickly spread across the U.S. and Europe, taking down about 70 sites, including behemoths like Amazon, Twitter, and Netflix.

The series of attacks were coordinated through a botnet of IoT devices infected with malware. At the time, it was the largest outage ever of its kind.

U.S. map of the widespread impacts of Dyn

Other headline grabbers

Akamai fell victim to a DNS outage in July 2021 that took down numerous high-profile websites, such as Delta Airlines, PlayStation Network, and Oracle, for close to an hour. According to the content delivery network services provider, the outage was thanks to a software configuration update that triggered a bug in Akamai’s Edge DNS service. Akamai returned things to a normal state after rolling back the update.

Microsoft is no stranger to DNS outages making headlines, either. In 2019, Azure services were down for nearly two hours thanks to a nameserver delegation change affecting DNS resolution. The company admitted that, during a migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated.

In April of this year, Azure again went down for an hour due to DNS. This time, it was caused by a code defect in Azure’s DNS service that was uncovered after an unusual surge of DNS queries and excessive client retries.

In 2020, CloudFlare, a major provider of DNS services, had its 1.1.1.1 DNS service down for about 25 minutes due to a configuration error on a router located in Atlanta. The error caused all traffic across CloudFlare’s backbone to be sent to Atlanta, overriding any load balancing. All network locations connected to it failed.

Who wants to tout that someone made a mistake?

While there’s no official compendium of DNS outages, one thing is certain: They are likely underreported. After all, who wants to tout that their DNS went down because someone screwed up?

Sure, if it’s a DDoS attack, you’ll probably read about it. There’s no individual to blame and they hold a certain level of intrigue (who was behind it all?). Furthermore, they are useful cautionary tales about the importance of cybersecurity.

But human error causes most outages, which organizations sometimes don’t want to publicly report. They just provide a vague status update that they’re working on it.

Homegrown approaches put enterprises at greater risk for a DNS outage

Organizations that use BIND, Microsoft Active Directory, or some other kind of homegrown approach to managing DNS are at greater risk for DNS outages.

Why? Because there are far more opportunities for error.

With these types of network solutions, DNS servers are each configured individually, often with one or two DNS admins executing scripts. And that one person who knows how to configure all the DNS servers might be really great at what they do.

But what if they suddenly aren’t there to configure them anymore (known as the bus factor)? Who will? With BIND, Active Directory, or anything else homegrown, DNS administration can easily become a single point of failure.

Further, if something goes south as a result of an error, there is no tech support number or professional services team to call for BIND. It’s a long and manual process to determine where the problem lies and fix it.

BlueCat’s DNS platform reduces your DNS outage risk

BlueCat’s DNS platform provides a single pane of glass from which to view and centrally manage all your DNS activity. There is no need to individually update dozens of DNS servers one at a time with scripts. Instead, you can easily update them all at once from the same user interface.

Completing just one configuration significantly reduces the risk of basic human error. You can also expand the pool of network admins who can do configurations, spreading the workload and eliminating a single point of failure.

Should a server fail, BlueCat’s platform makes it much easier to either rebuild the server or replace the hardware, deploy the configuration, and get it back up and running quickly.

BlueCat often hears from customers when they have a DNS outage. BlueCat can work with you to get to the bottom of it and get your network back up and running.


An avatar of the author

Rebekah Taylor is a former journalist turned freelance writer and editor who has been translating technical speak into prose for more than two decades. Her first job in the early 2000s was at a small start-up called VMware. She holds degrees from Cornell University and Columbia University’s Graduate School of Journalism.

Related content

Micetro 11.1 boosts DHCP management for Cisco Meraki SD-WAN

Learn how BlueCat Micetro 11.1 can help you overcome the limitations of Cisco Meraki SD-WAN devices to manage your distributed DHCP architecture.

Read more
Banner announcing BlueCat's acquisition of LiveAction, displaying both logos and the phrase "We're about to get bigger."

BlueCat acquires LiveAction to drive network modernization and optimization

BlueCat’s acquisition of LiveAction will allow customers to expand their view beyond DNS and dive deeper into the health of their network.

Read more

Simplify NIS2 compliance with DNS management

Learn whether the EU’s NIS2 requirements apply to your organization and about how DNS management and BlueCat can boost your path to compliance.

Read more

Detect anomalies and CVE risks with Infrastructure Assurance 8.4 

The Infrastructure Assurance 8.4 release features an anomaly detection engine for outliers and a CVE analysis engine to uncover device vulnerabilities.

Read more

BlueCat has acquired LiveAction

It’s official! BlueCat has acquired LiveAction’s network observability and intelligence platform, which helps large enterprises optimize the performance, resiliency, and security of their networks.