What causes a DNS outage? Humans, mostly

Human error is behind most DNS outages. Learn more from BlueCat about the dire impacts of outages and why homegrown DNS solutions increase outage risk.

Rebekah Taylor

July 15, 2021

The majority of the time, a DNS outage has a simple cause: misconfiguration. It’s most likely to occur as a result of changing or adding something in your enterprise network.

Of course, it’s more likely that a novice will make a configuration mistake, but experienced DNS admins can make them, too. We are all human, after all.

Sometimes equipment failures or malicious attacks are to blame instead. But human errors are, by far, the most common culprit.

This post will explore the causes and signs of DNS outages. Next, it will examine how dire the consequences can be. Then, it will look at why homegrown approaches to DNS management increase your risk and how BlueCat’s platform can help keep an outage at bay.

Causes and signs of a DNS outage

Most of the time, the Domain Name System (DNS) just works. DNS translates human-readable domain names, like bluecatnetworks.com, to computer-friendly Internet Protocol (IP) addresses, like 104.239.197.100. But when it doesn’t work, things get really bad in a hurry. An outage can degrade or even fully take down your organization’s web presence.
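To make the translation step concrete, here is a minimal Python sketch that asks the operating system's resolver for the IPv4 addresses behind a name. The function name resolve_ipv4 and its lookup parameter are illustrative assumptions, not part of any BlueCat product; the lookup parameter exists only so the resolver can be swapped out for testing.

```python
import socket


def resolve_ipv4(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses for hostname.

    `lookup` defaults to the system resolver; passing a stand-in
    (an assumption of this sketch) allows offline testing.
    """
    # getaddrinfo yields (family, type, proto, canonname, sockaddr) tuples.
    # For AF_INET, sockaddr is an (address, port) pair.
    results = lookup(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

On a machine with working DNS, calling resolve_ipv4("bluecatnetworks.com") returns whatever addresses the name currently resolves to. During an outage, the same call typically raises socket.gaierror instead.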

A DNS server not responding can happen for any number of reasons, but there are a few frequent causes.

Improper configuration of DNS records, which tell servers exactly how to respond to a DNS query, is one of the most common. Configuring a time to live (TTL) that is too high can also cause or prolong an outage: the TTL tells a cache how long to store DNS records, so resolvers keep serving an outdated record until it expires.
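The effect of a high TTL is easier to see with a toy cache. The sketch below is a simplified model, not how any particular resolver is implemented; the class name TtlCache and the injectable clock are assumptions made for illustration.

```python
import time


class TtlCache:
    """Toy resolver cache: keeps each answer until its TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # name -> (address, expiry time)

    def store(self, name, address, ttl_seconds):
        """Cache an answer for ttl_seconds from now."""
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def lookup(self, name):
        """Return the cached address, or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-queries the server.
            del self._entries[name]
            return None
        return address
```

If a record is stored with a TTL of 86,400 seconds and then corrected on the authoritative server an hour later, this cache keeps returning the old address for the remaining 23 hours. A shorter TTL narrows that window at the cost of more queries.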

Hardware or network failures and high DNS latency (slow responses to queries) are also sources of trouble. Furthermore, distributed denial of service (DDoS) attacks can cause DNS outages by overloading DNS servers.

The most common sign of trouble is straightforward: Your website isn’t accessible or your web-based applications fail to function. Oftentimes, you might encounter an NXDOMAIN error, which means the queried domain name could not be found.
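For readers who want to see where NXDOMAIN comes from, it is response code 3 in the header of a DNS reply, as defined in RFC 1035. The following Python sketch reads that code from the first bytes of a raw DNS message. The function name response_code is an illustrative assumption, and the table covers only the codes defined in RFC 1035.

```python
import struct

# Response codes from RFC 1035, section 4.1.1.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def response_code(message: bytes) -> str:
    """Return the response code name from a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS message header is 12 bytes long")
    # The header starts with a 16-bit ID followed by 16 bits of flags.
    _message_id, flags = struct.unpack("!HH", message[:4])
    rcode = flags & 0x000F  # RCODE is the low four bits of the flags
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")
```

A SERVFAIL or a timeout usually points at the server itself, while an NXDOMAIN for a name that should exist often points at a missing or mistyped record.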

The consequences of a DNS outage can be dire

There is no shortage of high-profile DNS outage stories, and the reigning king is arguably Dyn.

In 2016, Dyn was the victim of DDoS attacks that brought down its managed DNS service for nearly 12 hours. The real-time impacts quickly spread across the U.S. and Europe, taking down about 70 sites, including behemoths like Amazon, Twitter, and Netflix.

The series of attacks was coordinated through a botnet of IoT devices infected with malware. At the time, it was the largest outage ever of its kind.

[Figure: U.S. map of the widespread impacts of the Dyn outage]

Other headline grabbers

Akamai fell victim to a DNS outage in July 2021 that took down numerous high-profile websites, such as Delta Air Lines, PlayStation Network, and Oracle, for close to an hour. According to the content delivery network services provider, the outage was caused by a software configuration update that triggered a bug in Akamai’s Edge DNS service. Akamai restored normal service after rolling back the update.

Microsoft is no stranger to DNS outages making headlines, either. In 2019, Azure services were down for nearly two hours because of a nameserver delegation change that affected DNS resolution. The company admitted that, during a migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated.

In April 2021, Azure again went down for an hour due to DNS. This time, it was caused by a code defect in Azure’s DNS service that was uncovered after an unusual surge of DNS queries and excessive client retries.

In 2020, Cloudflare, a major provider of DNS services, had its 1.1.1.1 DNS service down for about 25 minutes due to a configuration error on a router located in Atlanta. The error caused all traffic across Cloudflare’s backbone to be sent to Atlanta, overriding any load balancing. All network locations connected to the backbone failed.

Who wants to tout that someone made a mistake?

While there’s no official compendium of DNS outages, one thing is certain: They are likely underreported. After all, who wants to tout that their DNS went down because someone screwed up?

Sure, if it’s a DDoS attack, you’ll probably read about it. There’s no individual to blame, and attacks hold a certain level of intrigue (who was behind it all?). Furthermore, they are useful cautionary tales about the importance of cybersecurity.

But human error causes most outages, and organizations don’t always want to report those publicly. They just provide a vague status update that they’re working on it.

Homegrown approaches put enterprises at greater risk for a DNS outage

Organizations that use BIND, Microsoft Active Directory, or some other kind of homegrown approach to managing DNS are at greater risk for DNS outages.

Why? Because there are far more opportunities for error.

With these types of network solutions, DNS servers are each configured individually, often with one or two DNS admins executing scripts. And the one person who knows how to configure all the DNS servers might be really great at what they do.

But what if they suddenly aren’t there to configure them anymore? Who will? This risk is known as the bus factor. With BIND, Active Directory, or anything else homegrown, DNS administration can easily become a single point of failure.

Further, if something goes south as a result of an error, there is no tech support number or professional services team to call for BIND. It’s a long and manual process to determine where the problem lies and fix it.
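One way to picture the risk of configuring servers one at a time is a drift check: compare what each server holds for the same names and flag the ones that disagree with the majority. The Python sketch below is purely illustrative. The function name find_drift and the input shape (a dictionary of server name to its records) are assumptions, and it does not represent BIND, Active Directory, or BlueCat tooling.

```python
from collections import Counter


def find_drift(records_by_server):
    """Map each server to the records where it disagrees with the majority.

    records_by_server: {server_name: {record_name: value}} (assumed shape).
    A record missing from a server is reported with the value None.
    """
    drift = {}
    all_names = {
        name for records in records_by_server.values() for name in records
    }
    for name in sorted(all_names):
        # Count how many servers hold each value for this record name.
        values = Counter(
            records.get(name) for records in records_by_server.values()
        )
        majority_value, _count = values.most_common(1)[0]
        for server, records in records_by_server.items():
            if records.get(name) != majority_value:
                drift.setdefault(server, {})[name] = records.get(name)
    return drift
```

With three servers where only ns3 holds a different address for www, the function reports ns3 alone. In a tie there is no real majority, so the result depends on ordering and a person still has to decide which value is correct.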

BlueCat’s DNS platform reduces your DNS outage risk

BlueCat’s DNS platform provides a single pane of glass from which to view and centrally manage all your DNS activity. There is no need to individually update dozens of DNS servers one at a time with scripts. Instead, you can easily update them all at once from the same user interface.

Completing just one configuration significantly reduces the risk of basic human error. You can also expand the pool of network admins who can do configurations, spreading the workload and eliminating a single point of failure.

Should a server fail, BlueCat’s platform makes it much easier to either rebuild the server or replace the hardware, deploy the configuration, and get it back up and running quickly.

BlueCat often hears from customers when they have a DNS outage. BlueCat can work with you to get to the bottom of it and get your network back up and running.



Rebekah Taylor is a former journalist turned freelance writer and editor who has been translating technical speak into prose for more than two decades. Her first job in the early 2000s was at a small start-up called VMware. She holds degrees from Cornell University and Columbia University’s Graduate School of Journalism.
