Case Study: DNS data identifies network performance issues

The Domain Name System (DNS) is a powerful tool for enhancing visibility into all aspects of a network. At the level of an individual query, DNS records are a strong indicator of user intent, which makes them useful for tracing anomalous behavior back to the source device or IP address. In aggregate, DNS traffic paints a picture of how well a network is operating, which is useful for an overall assessment of network security and performance.

One of BlueCat’s large enterprise customers recently discovered how powerful the DNS protocol can be in identifying and mitigating large-scale network performance problems.

It all started with a noticeable lag in performance for users of the company’s virtual desktop infrastructure (VDI). Certain subnets with newer workstation images were facing particular connectivity problems. These subnets usually had between 1,000 and 3,000 active VDI clients.

DNS data provides critical clues

When the team looked into the DNS request data collected by BlueCat’s intelligent security system, a clear pattern started to emerge. The problematic subnets showed exceedingly high volumes of NXDOMAIN responses, indicating that many queries were for names that did not resolve. At the same time, the subnets also showed a large amount of anomalous PTR (reverse lookup) activity.

The PTR activity all had the same timestamp, indicating a simultaneous barrage of reverse lookups from across the network.
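
As a rough illustration of the kind of aggregation described above, the sketch below groups query records by subnet and reports the share of NXDOMAIN responses and the peak number of PTR queries that share a timestamp. The record layout, the sample data, and the subnet list are assumptions made for the example; they are not BlueCat's log format or the customer's addressing plan.

```python
from collections import Counter, defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical records: (timestamp in seconds, client IP, query type, response code).
SAMPLE_LOG = [
    (1000, "10.1.0.5", "PTR", "NXDOMAIN"),
    (1000, "10.1.0.6", "PTR", "NXDOMAIN"),
    (1000, "10.1.0.7", "PTR", "NXDOMAIN"),
    (1001, "10.1.0.5", "A", "NOERROR"),
    (1000, "10.2.0.9", "A", "NOERROR"),
    (1002, "10.2.0.9", "PTR", "NOERROR"),
]

# Assumed subnet layout, for illustration only.
SUBNETS = [ip_network("10.1.0.0/24"), ip_network("10.2.0.0/24")]


def subnet_of(client_ip):
    """Return the configured subnet containing client_ip, or None."""
    addr = ip_address(client_ip)
    for net in SUBNETS:
        if addr in net:
            return net
    return None


def summarize(log):
    """Per subnet: NXDOMAIN share and the largest same-timestamp PTR burst."""
    stats = defaultdict(
        lambda: {"total": 0, "nxdomain": 0, "ptr_by_second": Counter()}
    )
    for timestamp, client, qtype, rcode in log:
        net = subnet_of(client)
        if net is None:
            continue  # client is outside the subnets being examined
        entry = stats[str(net)]
        entry["total"] += 1
        if rcode == "NXDOMAIN":
            entry["nxdomain"] += 1
        if qtype == "PTR":
            entry["ptr_by_second"][timestamp] += 1

    summary = {}
    for net, entry in stats.items():
        summary[net] = {
            "nxdomain_ratio": entry["nxdomain"] / entry["total"],
            "peak_ptr_per_second": max(entry["ptr_by_second"].values(), default=0),
        }
    return summary
```

Calling `summarize(SAMPLE_LOG)` on the sample data flags the first subnet: most of its queries end in NXDOMAIN, and all of its PTR queries land in the same second. That is the pattern the team saw, at a much smaller scale.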

With BlueCat, users can adjust the search command to pull up the relevant logs, then pivot with one click into the DNS insights and analytics tab for a graphical view of those logs. Our free Splunk app integrates with Splunk to drill deeper into query patterns.

These data points were then triangulated against network utilization information for various applications. It emerged that the local firewall was the largest consumer of DNS on the network. Packet captures from workstations affected by the performance issues also showed a large number of Link-Local Multicast Name Resolution (LLMNR) queries.

Identifying root causes

High utilization of DNS by the local firewall in combination with a surge of LLMNR queries finally allowed the team to piece together the issue.  Here’s what was happening:

A newer version of a workstation image contained a firewall setting that enabled reverse name lookups on connections. That same image had LLMNR and NetBIOS enabled.

For an inbound connection, the firewall would attempt to perform a reverse name lookup through a PTR query. Those queries failed due to a Windows registration issue: the clients were not consistently registering reverse records. The DNS response was an NXDOMAIN.
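
To make the failing step concrete, here is a minimal sketch of how a reverse lookup name is formed and why a missing reverse record yields NXDOMAIN. `ipaddress.ip_address(...).reverse_pointer` is part of the Python standard library. The `REVERSE_ZONE` dictionary, the host name in it, and the `reverse_lookup` helper are hypothetical stand-ins for a real reverse zone, not part of any BlueCat or Windows API.

```python
from ipaddress import ip_address


def ptr_query_name(ip):
    """Return the name queried when reverse-resolving ip."""
    return ip_address(ip).reverse_pointer


# Hypothetical reverse zone in which only one client registered its PTR record.
REVERSE_ZONE = {"5.0.1.10.in-addr.arpa": "vdi-0005.example.com."}


def reverse_lookup(ip, zone=REVERSE_ZONE):
    """Toy resolver: answer from the zone, or NXDOMAIN if no record exists."""
    name = ptr_query_name(ip)
    if name in zone:
        return ("NOERROR", zone[name])
    return ("NXDOMAIN", None)
```

Here `ptr_query_name("10.1.0.5")` returns `5.0.1.10.in-addr.arpa`. A lookup for a client that never registered its reverse record, such as `reverse_lookup("10.1.0.6")`, comes back as NXDOMAIN.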

When the lookup failed, the client would send out an LLMNR multicast query to all other clients on the subnet. Those clients would then perform PTR queries for the same name, producing the same NXDOMAIN result.

The firewall kept producing PTR and LLMNR queries across the network in an increasing cascade.  There were so many lookups that network performance began to degrade – hence the connectivity issues faced by VDI clients in a growing number of subnets.
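
The amplification described above can be approximated with a little arithmetic. The sketch below is a deliberately simplified, single-level model: it assumes every inbound connection triggers one failed PTR query on the receiving client, and that every other client on the subnet responds to the resulting LLMNR query with one PTR query of its own. The connection rates are invented for illustration; only the subnet sizes (1,000 to 3,000 clients) come from the case.

```python
def ptr_queries_per_second(connections_per_second, clients_on_subnet):
    """Estimate PTR query load under the single-level cascade assumption."""
    # One PTR query from the client that received the connection, plus one
    # from each other client that hears the LLMNR query.
    per_connection = 1 + (clients_on_subnet - 1)
    return connections_per_second * per_connection


# Hypothetical connection rate of 2 per second on the smallest and largest
# subnets mentioned in the case.
small_subnet_load = ptr_queries_per_second(2, 1000)
large_subnet_load = ptr_queries_per_second(2, 3000)
```

Even at a couple of connections per second, this model puts thousands of PTR queries per second on a single subnet, every one of them ending in NXDOMAIN. The real cascade may have behaved differently, but the model shows why the load scaled with subnet size.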

Solving the problem and monitoring results

Once they discovered the source of the issue, the team turned off the firewall’s reverse lookup function. Returning to the DNS Edge console, the team saw results in real time. That simple switch immediately cut DNS traffic by around 5,000 queries per second, around half of all network queries! The change quickly restored VDI connectivity and dramatically reduced the strain on core network infrastructure. The spike in PTR queries and NXDOMAIN responses vanished.

In this case, the granular, client-level DNS data from DNS Edge provided a critical clue that allowed the network team to identify the source of the performance issues. At a basic level, this shows the core value of collecting and analyzing DNS data. Without this information at hand, the team might have gone down the wrong path in attempting to mitigate the VDI performance issues. They would likely never have suspected DNS as the source of the problem, and would probably have pursued other (wrong) root causes instead.

This also highlights the value of a DNS security system that can be deployed at the source of queries. Externally facing DNS firewalls would not have detected the PTR queries and NXDOMAIN responses, as they never made it to the network boundary. Only by looking deeper into the network was the team able to discover the critical information needed to rectify the core issue.

Learn more about the security gold just sitting on your DNS servers, and how it can be used for root cause investigations, in our video about reducing incident response time.

