Case Study: DNS data identifies network performance issues


May 9, 2019

The Domain Name System (DNS) is a powerful tool for enhancing visibility into all aspects of a network.  At an individual query level, DNS records are a strong indicator of user intent – useful for tracking anomalous behavior back to the source device or IP address.  In aggregate, DNS traffic paints a picture of how well a network is operating – useful for an overall assessment of network security and performance.

One of BlueCat’s large enterprise customers recently discovered how powerful the DNS protocol can be in identifying and mitigating large-scale network performance problems.

It all started with a noticeable performance lag for users of the company's virtual desktop infrastructure (VDI).  Subnets running newer workstation images were hit hardest by connectivity problems.  Each of these subnets typically had between 1,000 and 3,000 active VDI clients.

DNS data provides critical clues

When the team examined the DNS request data collected by BlueCat's intelligent security system, a clear pattern emerged.  The problematic subnets showed exceptionally high NXDOMAIN volumes, indicating that many names were not resolving.  The same subnets also showed a large amount of anomalous PTR (reverse lookup) activity.
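To make the per-subnet pattern concrete, here is a minimal sketch of the kind of aggregation involved.  It assumes query logs have already been exported as records with a client address, query type and response code.  The `QueryRecord` schema, the /24 grouping and the 30% thresholds are hypothetical choices for illustration, not BlueCat's data model or detection logic.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One DNS query log entry (hypothetical schema for illustration)."""
    client_ip: str
    qtype: str  # e.g. "A", "AAAA", "PTR"
    rcode: str  # e.g. "NOERROR", "NXDOMAIN"


def subnet_stats(records, prefix_len=24):
    """Per client subnet: total queries plus the share that were NXDOMAIN or PTR."""
    totals, nxdomain, ptr = Counter(), Counter(), Counter()
    for rec in records:
        # Map the client address to its enclosing subnet (e.g. 10.0.1.0/24).
        net = ipaddress.ip_network(f"{rec.client_ip}/{prefix_len}", strict=False)
        totals[net] += 1
        if rec.rcode == "NXDOMAIN":
            nxdomain[net] += 1
        if rec.qtype == "PTR":
            ptr[net] += 1
    return {
        net: {
            "total": totals[net],
            "nxdomain_ratio": nxdomain[net] / totals[net],
            "ptr_ratio": ptr[net] / totals[net],
        }
        for net in totals
    }


def flag_subnets(stats, nx_threshold=0.3, ptr_threshold=0.3):
    """Subnets where both the NXDOMAIN share and the PTR share exceed the thresholds."""
    return sorted(
        (
            net
            for net, s in stats.items()
            if s["nxdomain_ratio"] > nx_threshold and s["ptr_ratio"] > ptr_threshold
        ),
        key=str,
    )
```

Flagging only subnets where both ratios are high mirrors the combination described here: NXDOMAIN volume alone could simply reflect mistyped names.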

The PTR queries all shared the same timestamp, indicating a simultaneous barrage of reverse lookups from across the network.
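A burst like this can be spotted by bucketing PTR queries by second and counting the distinct clients in each bucket.  The sketch below assumes events arrive as `(timestamp, client_ip, qtype)` tuples.  The tuple layout and the `min_clients` threshold are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable


def ptr_bursts(
    events: Iterable[tuple[datetime, str, str]], min_clients: int = 50
) -> dict[datetime, int]:
    """Return {second: distinct_client_count} for seconds with a PTR burst.

    A "burst" here means at least `min_clients` distinct clients sending PTR
    queries within the same whole second (an illustrative definition).
    """
    clients_per_second: dict[datetime, set[str]] = defaultdict(set)
    for ts, client_ip, qtype in events:
        if qtype != "PTR":
            continue
        # Truncate to whole seconds so near-simultaneous queries share a bucket.
        clients_per_second[ts.replace(microsecond=0)].add(client_ip)
    return {
        second: len(clients)
        for second, clients in sorted(clients_per_second.items())
        if len(clients) >= min_clients
    }
```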

With BlueCat, users can adjust the search command to pull up relevant logs, then pivot with one click into the DNS insights and analytics tab for a graphical view of those logs.  BlueCat's free Splunk app lets users drill deeper into query patterns.

These data points were then triangulated against network utilization information for various applications.  This revealed that the workstations' local firewall was the largest consumer of DNS on the network.  Packets captured from workstations affected by the performance issues also showed a large number of Link-Local Multicast Name Resolution (LLMNR) queries.
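LLMNR is defined in RFC 4795.  Its queries go over UDP port 5355 to the multicast groups 224.0.0.252 (IPv4) and FF02::1:3 (IPv6), so a capture filter such as `udp port 5355` isolates this traffic.  The sketch below counts LLMNR queries per source host from already-parsed packet summaries.  The `(src, dst, proto, dst_port)` tuple format is an assumption and does not correspond to the output of any particular capture tool.

```python
import ipaddress
from collections import Counter

# LLMNR (RFC 4795) sends queries over UDP port 5355 to these multicast groups.
LLMNR_PORT = 5355
LLMNR_GROUPS = {
    ipaddress.ip_address("224.0.0.252"),
    ipaddress.ip_address("ff02::1:3"),
}


def is_llmnr_query(proto: str, dst_ip: str, dst_port: int) -> bool:
    """True if a packet summary looks like a multicast LLMNR query."""
    return (
        proto.upper() == "UDP"
        and dst_port == LLMNR_PORT
        and ipaddress.ip_address(dst_ip) in LLMNR_GROUPS
    )


def llmnr_queries_per_source(packets) -> Counter:
    """Count LLMNR queries per source host from (src, dst, proto, dst_port) tuples."""
    counts = Counter()
    for src_ip, dst_ip, proto, dst_port in packets:
        if is_llmnr_query(proto, dst_ip, dst_port):
            counts[src_ip] += 1
    return counts
```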

Identifying root causes

High utilization of DNS by the local firewall in combination with a surge of LLMNR queries finally allowed the team to piece together the issue.  Here’s what was happening:

A newer version of a workstation image contained a firewall setting that enabled reverse name lookups on connections.  That same image also had LLMNR and NetBIOS enabled.

For each inbound connection, the firewall would attempt a reverse name lookup through a PTR query.  Those queries failed because of a Windows registration issue: the clients were not consistently registering their reverse records, so the DNS result was NXDOMAIN.
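A PTR query asks for the name stored under the reverse-lookup form of an address.  For IPv4, this form reverses the octets and places them under `in-addr.arpa`.  The sketch below uses Python's standard library to build that name and to attempt a lookup through the operating system's resolver.  It only illustrates the mechanism and does not reflect how the workstation firewall performs its lookups.

```python
import ipaddress
import socket


def ptr_query_name(ip: str) -> str:
    """Return the reverse-lookup (PTR) owner name for an IPv4 or IPv6 address."""
    # e.g. 10.0.1.25 -> 25.1.0.10.in-addr.arpa
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str):
    """Resolve an address to a host name, or return None if the lookup fails.

    When no PTR record exists (for example, because the client never registered
    one), the resolver typically reports failure and Python raises socket.herror.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None
```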

When the lookup failed, the client would send an LLMNR query to the subnet's multicast group, reaching all other clients on the subnet.  Those clients would then perform PTR queries for the same address, producing the same NXDOMAIN result.

The firewall kept generating PTR and LLMNR queries across the network in an escalating cascade.  Eventually there were so many lookups that network performance degraded, which explains the connectivity issues faced by VDI clients in a growing number of subnets.
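The amplification follows from the sequence described in this section.  Consider a rough, hypothetical model that covers one round only and assumes that every other host on the subnet reacts to the LLMNR query with its own PTR lookup.  Under that model, a single inbound connection on a subnet of N clients yields N PTR queries and one LLMNR multicast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CascadeEstimate:
    ptr_queries: int       # DNS PTR queries (each ending in NXDOMAIN in this scenario)
    llmnr_multicasts: int  # LLMNR queries sent to the subnet's multicast group


def estimate_cascade(inbound_connections: int, clients_on_subnet: int) -> CascadeEstimate:
    """Rough one-round model of the lookup cascade (an illustrative assumption).

    Per inbound connection:
      * the receiving host's firewall issues 1 PTR query, which returns NXDOMAIN;
      * the host then sends 1 LLMNR multicast to the subnet;
      * each of the other (clients_on_subnet - 1) hosts issues its own PTR query
        for the same address, which also returns NXDOMAIN.
    Later rounds of the cascade are ignored, so this is a lower bound.
    """
    if clients_on_subnet < 1:
        raise ValueError("clients_on_subnet must be at least 1")
    ptr_per_connection = 1 + (clients_on_subnet - 1)
    return CascadeEstimate(
        ptr_queries=inbound_connections * ptr_per_connection,
        llmnr_multicasts=inbound_connections,
    )
```

With 1,000 to 3,000 clients per subnet, as in this case, the model implies that every inbound connection produces 1,000 to 3,000 failed PTR queries.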

Solving the problem and monitoring results

Once the team had discovered the source of the issue, it turned off the firewall's reverse lookup function.  Returning to the DNS Edge console, the team saw the results in real time.  That simple switch immediately cut DNS traffic by around 5,000 queries per second, roughly half of all queries on the network!  The change quickly restored VDI connectivity and dramatically reduced the strain on core network infrastructure.  The spike in PTR queries and NXDOMAIN responses vanished.
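A before-and-after check like this one comes down to comparing query rates in equal windows on either side of the change.  The helper below is a minimal sketch that assumes query timestamps are available as `datetime` objects.  The function names and the five-minute default window are illustrative and do not represent a DNS Edge feature.  In this case study, a drop of about 5,000 queries per second out of roughly twice that total corresponds to a `drop_fraction` of about 0.5.

```python
from datetime import datetime, timedelta
from typing import Sequence


def average_qps(timestamps: Sequence[datetime], start: datetime, end: datetime) -> float:
    """Average queries per second for timestamps in the half-open window [start, end)."""
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for ts in timestamps if start <= ts < end)
    return count / seconds


def change_impact(
    timestamps: Sequence[datetime],
    change_at: datetime,
    window: timedelta = timedelta(minutes=5),
) -> dict[str, float]:
    """Compare query rates in equal windows before and after a configuration change."""
    before = average_qps(timestamps, change_at - window, change_at)
    after = average_qps(timestamps, change_at, change_at + window)
    drop = before - after
    return {
        "before_qps": before,
        "after_qps": after,
        "drop_qps": drop,
        "drop_fraction": drop / before if before else 0.0,
    }
```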

In this case, granular, client-level DNS data from DNS Edge provided the critical clue that allowed the network team to identify the source of the performance issues.  At a basic level, this shows the core value of collecting and analyzing DNS data.  Without this information at hand, the team might have gone down the wrong path in trying to fix the VDI performance issues.  They would never have suspected DNS as the source of the problem and would probably have pursued other (wrong) root causes instead.

This also highlights the value of a DNS security system that can be deployed at the source of queries.  Externally facing DNS firewalls would not have detected the PTR and NXDOMAIN traffic, because it never reached the network boundary.  Only by looking deeper into the network was the team able to find the information it needed to fix the core issue.

Learn more about the security gold just sitting on your DNS servers, and how it can be used for root cause investigations, in our video about reducing incident response time.

BlueCat is the Adaptive DNS company. The company’s mission is to help organizations deliver reliable and secure network access from any location and any network environment. To do this, BlueCat re-imagined DNS. The result – Adaptive DNS – is a dynamic, open, secure, scalable, and automated DDI management platform that supports the most challenging digital transformation initiatives, like adoption of hybrid cloud and rapid application development.