Last updated on August 23, 2021.
The majority of the time, a DNS outage has a simple cause: misconfiguration. It’s most likely to occur as a result of changing or adding something in your enterprise network.
Of course, it’s more likely that a novice will make a configuration mistake, but experienced DNS admins can make them, too. We are all human, after all.
Sometimes equipment failures or malicious attacks are to blame instead. But human errors are, by far, the most common culprit.
This post will explore the causes and signs of DNS outages. Next, it will examine how dire the consequences can be. Then, it will look at why homegrown approaches to DNS management increase your risk and how BlueCat’s platform can help keep an outage at bay.
Causes and signs of a DNS outage
Most of the time, the Domain Name System (DNS) just works. DNS translates human-readable domain names (like bluecatnetworks.com) into computer-friendly Internet Protocol (IP) addresses (like 126.96.36.199). But when it doesn’t work, things get really bad in a hurry. An outage can degrade or even fully take down your organization’s web presence.
A DNS server not responding can happen for any number of reasons, but there are a few frequent causes.
Improper configuration of DNS records, which tell servers exactly how to respond to a DNS query, is one of the most common. Configuring a too-high time to live (TTL), a setting that tells a cache how long to store DNS records, can also cause an outage, because caches keep serving an outdated record until it expires.
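To make the TTL point concrete, here is a minimal sketch of a TTL audit in Python. Everything in it is an assumption for illustration: the `Record` class, the `find_long_ttls` helper, the one-day ceiling, and the sample records are hypothetical and do not come from any particular DNS product.

```python
from dataclasses import dataclass
from typing import List

# Assumed policy ceiling of one day; choose a value that fits your own change process.
MAX_TTL_SECONDS = 86400


@dataclass
class Record:
    """A simplified DNS record (hypothetical structure for this sketch)."""
    name: str
    rtype: str
    ttl: int  # seconds a cache may keep this record
    value: str


def find_long_ttls(records: List[Record], max_ttl: int = MAX_TTL_SECONDS) -> List[Record]:
    """Return the records whose TTL exceeds the ceiling."""
    return [record for record in records if record.ttl > max_ttl]


# Sample data using documentation-only names and addresses.
sample_records = [
    Record("www.example.com.", "A", 300, "192.0.2.10"),
    Record("mail.example.com.", "A", 604800, "192.0.2.20"),  # one week
]

for record in find_long_ttls(sample_records):
    print(f"{record.name} {record.rtype} has a TTL of {record.ttl} seconds")
```

A check like this only flags candidates for review. Whether a long TTL is a problem depends on how often the record changes.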
Hardware or network failures and high DNS latency (loading time) are also sources of trouble. Furthermore, distributed denial of service (DDoS) attacks can cause some DNS outages by overloading DNS servers.
The most common sign of trouble is straightforward: Your website isn’t accessible or your web-based applications fail to function. Oftentimes, you might encounter an NXDOMAIN error.
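A quick way to confirm that a name has stopped resolving is to ask the operating system’s resolver directly. The sketch below uses Python’s standard `socket.getaddrinfo` call. The `check_name` function and its status labels are hypothetical names chosen for this example, and the mapping from `EAI_NONAME` to "not found" is only an approximation of an NXDOMAIN answer, since the system resolver does not expose the raw DNS response code.

```python
import socket
from typing import Callable, Dict


def check_name(hostname: str, resolver: Callable = socket.getaddrinfo) -> Dict:
    """Try to resolve a hostname and report a coarse status.

    The resolver parameter exists so the check can be exercised without
    network access; by default it is the system resolver.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror as exc:
        # EAI_NONAME generally means the name could not be found, which is
        # how an NXDOMAIN answer usually surfaces to applications.
        if exc.errno == socket.EAI_NONAME:
            status = "not_found"
        else:
            status = "resolver_error"
        return {"hostname": hostname, "status": status, "addresses": []}

    # Each result is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    addresses = sorted({entry[4][0] for entry in results})
    return {"hostname": hostname, "status": "ok", "addresses": addresses}
```

Calling `check_name("www.example.com")` runs the check against whatever resolver the machine is configured to use, so the result reflects that one vantage point only.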
The consequences of a DNS outage can be dire
There is no shortage of high-profile DNS outage stories, and the reigning king is arguably Dyn.
In 2016, Dyn was the victim of DDoS attacks that brought down its managed DNS service for nearly 12 hours. The real-time impacts quickly spread across the U.S. and Europe, taking down about 70 sites, including behemoths like Amazon, Twitter, and Netflix.
The series of attacks was coordinated through a botnet of IoT devices infected with malware. At the time, it was the largest outage ever of its kind.
Other headline grabbers
Akamai fell victim to a DNS outage in July 2021 that took down numerous high-profile websites, such as Delta Air Lines, PlayStation Network, and Oracle, for close to an hour. According to the content delivery network services provider, the outage was caused by a software configuration update that triggered a bug in Akamai’s Edge DNS service. Akamai restored normal service after rolling back the update.
Microsoft is no stranger to DNS outages making headlines, either. In 2019, Azure services were down for nearly two hours because a nameserver delegation change affected DNS resolution. The company admitted that, during a migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated.
In April 2021, Azure again went down for an hour due to DNS. This time, it was caused by a code defect in Azure’s DNS service that was uncovered after an unusual surge of DNS queries and excessive client retries.
In 2020, Cloudflare, a major provider of DNS services, had its 1.1.1.1 DNS service down for about 25 minutes due to a configuration error on a router located in Atlanta. The error caused all traffic across Cloudflare’s backbone to be sent to Atlanta, overriding any load balancing. All network locations connected to the backbone failed.
Who wants to tout that someone made a mistake?
While there’s no official compendium of DNS outages, one thing is certain: They are likely underreported. After all, who wants to tout that their DNS went down because someone screwed up?
Sure, if it’s a DDoS attack, you’ll probably read about it. There’s no individual to blame, and such attacks hold a certain level of intrigue (who was behind it all?). Furthermore, they are useful cautionary tales about the importance of cybersecurity.
But human error causes most outages, which organizations sometimes don’t want to publicly report. They just provide a vague status update that they’re working on it.
Homegrown approaches put enterprises at greater risk for a DNS outage
Organizations that use BIND, Microsoft Active Directory, or some other kind of homegrown approach to managing DNS are at greater risk for DNS outages.
Why? Because there are far more opportunities for error.
With these types of network solutions, DNS servers are each configured individually, often with one or two DNS admins executing scripts. And that one person who knows how to configure all the DNS servers might be really great at what they do.
But what if they suddenly aren’t there to configure them anymore? Who will? This risk is known as the bus factor. With BIND, Active Directory, or anything else homegrown, DNS administration can easily become a single point of failure.
Further, if something goes south as a result of an error, there is no tech support number or professional services team to call for BIND. It’s a long and manual process to determine where the problem lies and fix it.
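When each server is configured individually, one way to catch mistakes early is to compare what every server would answer for the same record. The following Python sketch shows the idea under simplifying assumptions: the `find_drift` function, the server names, and the in-memory record maps are all hypothetical, and a real check would first have to export or query the records from each server.

```python
from typing import Dict, Tuple

# A record is identified by its name and type, for example ("www.example.com.", "A").
RecordKey = Tuple[str, str]


def find_drift(configs: Dict[str, Dict[RecordKey, str]]) -> Dict[RecordKey, Dict[str, str]]:
    """Return every record on which the servers disagree.

    configs maps a server name to that server's records. The result maps
    each inconsistent record to the value seen on each server.
    """
    all_keys = set()
    for records in configs.values():
        all_keys.update(records)

    drift = {}
    for key in sorted(all_keys):
        # A server that lacks the record entirely is marked as "<missing>".
        answers = {
            server: records.get(key, "<missing>")
            for server, records in configs.items()
        }
        if len(set(answers.values())) > 1:
            drift[key] = answers
    return drift


# Sample data using documentation-only names and addresses.
sample_configs = {
    "dns1": {("www.example.com.", "A"): "192.0.2.10",
             ("mail.example.com.", "A"): "192.0.2.20"},
    "dns2": {("www.example.com.", "A"): "192.0.2.11"},
}

for (name, rtype), answers in find_drift(sample_configs).items():
    print(f"{name} {rtype} differs across servers: {answers}")
```

A comparison like this does not say which server is right. It only narrows down where to look.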
BlueCat’s DNS platform reduces your DNS outage risk
BlueCat’s DNS platform provides a single pane of glass from which to view and centrally manage all your DNS activity. There is no need to individually update dozens of DNS servers one at a time with scripts. Instead, you can easily update them all at once from the same user interface.
Completing just one configuration significantly reduces the risk of basic human error. You can also expand the pool of network admins who can do configurations, spreading the workload and eliminating a single point of failure.
Should a server fail, BlueCat’s platform makes it much easier to either rebuild the server or replace the hardware, deploy the configuration, and get it back up and running quickly.
BlueCat often hears from customers when they have a DNS outage. BlueCat can work with you to get to the bottom of it and get your network back up and running.
Two levels of BlueCat support offer health checks to analyze customers’ system data for potential problems and fix them before they take networks down.