Seven best practices to keep your NTP resilient

When NTP, which synchronizes network clocks, gets off-kilter, DNS and other network disruptions follow. Keep your NTP in shape with BlueCat’s expert tips.

Many overlapping colorful stopwatches with a central silver stopwatch showing elapsed time
Key takeawaysThis key takeaway was generated through LLMs crawling the page and coming up with an overview of the content.

The article explains why Network Time Protocol (NTP) resiliency is critical to enterprise networks and describes seven best practices to prevent time-related failures that can disrupt DNS, DHCP, Active Directory, and security logging. It outlines real-world problems—unreliable security logs, failed DNS zone transfers, AD domain controller outages, DNSSEC loss—caused by clocks drifting when NTP is misconfigured or insufficiently redundant, and emphasizes that topology, latency, and consistent stratum architecture directly affect operational impact. Key outcomes include deploying multiple and geographically-considered NTP sources (at least four), avoiding exclusive reliance on the public NTP pool, preventing time loops, and planning site-level resilience and recovery procedures to minimize and remediate slow drift and large-scale outages.

Why does the article recommend having at least four NTP servers instead of just one or two?

The article recommends at least four NTP servers because a single server is a single point of failure, and two servers create the “two-clock problem” where it can be impossible to determine which source is correct if they disagree. Three servers can usually let NTP choose the probable correct time, but if one fails you can fall back to two and reintroduce ambiguity. Using four or more servers avoids this dilemma and provides resilience, though the article also notes the optimal number depends on network topology and geographic distribution to prevent partitioned two-clock situations.

What are the risks of relying exclusively on the public NTP pool for enterprise time synchronization?

Relying exclusively on the public NTP pool risks service-level and availability problems because the pool is a volunteer resource without enforceable SLAs or contractual guarantees. In enterprise environments with many DNS, DHCP, and Active Directory servers, using the pool as the majority of time sources can leave you exposed to internet outages or inconsistent performance. The article advises augmenting public servers with dedicated, local time sources (for example GPS or other high-precision devices) so the enterprise does not depend primarily on external volunteers for critical time synchronization.

What operational practices help detect and recover from NTP-related failures before they cause widespread outages?

The article highlights that NTP failures often manifest as slow clock drift rather than abrupt jumps, making monitoring and detection challenging. Best practices include designing a consistent stratum architecture across roles and sites, ensuring similar latencies between strata and servers, avoiding time loops, and providing site-level resilience so isolated locations can function autonomously. Importantly, organizations should plan recovery procedures in advance because cleanup from NTP failure is hard; the operational nightmare is not only losing time sync but having no clear disaster plan to restore consistent clocks across DNS, DHCP, and AD services.

What is the worst thing you can imagine happening to your network when NTP is off-kilter?

BlueCat recently asked a small group of IT professionals, who came up with a few troubling scenarios: Your security logs are suddenly unreliable. DNS zone transfers can’t occur. Active Directory domain controller services shut down. A sudden loss of DNSSEC.

When NTP is malfunctioning and system clocks are out of sync, everything stops.

Graphic with text that reads, "Proactively find and fix outage risks," with a button to click and learn how to enhance network health.

Network Time Protocol (NTP) is the protocol for synchronizing clocks between computers on a network. Designed by David L. Mills, NTP synchronizes computer systems to within a few milliseconds of Coordinated Universal Time (UTC). It uses a hierarchical system of time sources, with each level called a stratum. High-precision time devices like atomic clocks, GPS, or other radio clocks are stratum 0.

Example of an NTP server stratum hierarchy

NTP may be as old as Nintendo (happy 35th anniversary, Super Mario Brothers!), but it’s still just as crucial to a functioning network. But for all its importance, NTP is a detail easily overlooked or misconfigured.

A small group of IT professionals who are part of our open DDI and DNS expert conversations recently discussed these ideas. All are welcome to join Network VIP on Slack. Below, we’ll explore seven best practices to keep your NTP resilient and avoid network disruptions.

Seven best practices for NTP resiliency

1. Have at least four NTP servers

Each network system should have at least four NTP servers, and preferably more. How many, exactly, depends on your particular network infrastructure. There is no one correct answer.>

Just one is never a good idea—if it fails, you’re up the creek.

But even just two can create what is termed the two-clock problem. If one NTP server says 11:00 and one says 12:00, which is right? It’s rather difficult to tell. If you add a third, NTP can work out which is probably the right one.

However, if you’ve got three and then one of those fails or goes offline, you’re down to two again and back to the two-clock problem. Going to at least four NTP servers avoids this dilemma.

But even four might not be the right number. Say you have four GPS receivers that you’ve configured on all your DNS and DHCP servers and your IP address management tool (for BlueCat users, this would be your BDDSes and Address Manager). Two of them are in Europe and two of them are in North America.

What happens if your link between Europe and North America goes down? Then you have the two-clock problem in two places.

2. Consider your network layout

How to implement NTP is not a simple decision. You must consider the entirety of your network infrastructure to find that magic number and configuration for true NTP resilience.

You must be careful to not make NTP structure too deep, either. After stratum 1, you might have a distribution layer at stratum 2, and then a third distribution layer at stratum 3 before getting to end clients. But each layer gets further from the clock source and adds more uncertainty to the time.

That’s not to say three strata is fine and four is bad or four is fine and five is bad. It all depends on your network infrastructure.

3. Ensure stratum architecture is consistent

Whatever stratum hierarchy you have architected, be sure that is consistent across your enterprise. For example, if you have a BDDS at stratum 3, ensure that all your BDDSes are at stratum 3 at all locations. Machines performing the same role should not reside in different strata.

4. Consider site resilience

What happens if you’ve got a critical factory site that must survive and keep producing even if it’s isolated from the rest of your organization? That site has to be able to cope on its own if it’s totally isolated.

It’s all well and good having everything else redundant on that particular site. But if isolation occurs and it doesn’t have good time sources, things aren’t going to work very well for long.

5. Don’t use the NTP pool exclusively

The NTP pool is essentially an NTP server for everyone. It’s a collection of networking computers that volunteer to provide accurate time via NTP to clients across the world. Machines in the pool are part of the ntppool.org domain.

Without a doubt, the NTP pool project and public NTP servers are a great thing. They’re used by hundreds of millions of systems and especially helpful for average internet users.

But in an enterprise environment, you might a large number of DNS and DHCP servers that need to be synchronized and communicate with hundreds of Active Directory servers. Augment them with the pool, sure, but you should never have a majority—let alone all—of your NTP sources be from the pool.

These days, enterprises that take this approach might find themselves learning a hard penny-wise, pound-foolish lesson.

In the days when GPS receivers were pricey, it perhaps made more sense to rely on the NTP pool as a way to save money. The NTP pool, while a valuable resource, has no enforceable service-level agreement or contract with your organization. Additionally, should you have an internet outage, you could experience further internal issues with NTP.

6. When it comes to redundancy, latency matters

Each DNS and DHCP server on your network should have similar latency to an uplink. Likewise, all the latencies between your strata should follow a similar latency scale. That can often be achieved by placing servers in close geographical proximity, but don’t solely rely on it. Geographical proximity alone can be misleading, depending on your location. (For example, there is a much higher latency from Vancouver to Seattle than there is from Toronto to New Jersey. But the latter pair of locations is near quadruple the distance apart.)

It is more critical that all DNS servers are running on the same time rather than running on the right time. If all your DNS servers are off by four hours from UTC, that’s not a problem. But if all are off by four hours except one, that’s a problem. It’s crucial for DNS and DHCP server clocks to be in sync across your enterprise.

7. Avoid time loops

Time loops are when server A talks to server B, which talks to server C, which is then requesting time from server A again. Don’t create one.

NTP monitoring and troubleshooting challenges

Monitoring the health of NTP is hard. It isn’t as if your server clock will suddenly be 15 hours, 46 minutes, and 33 seconds off. When NTP starts to fail or you don’t have enough good reference clocks, your clocks will slowly drift. And, after a period of time, it will cross thresholds that unleash problems that give you a clue about the underlying cause. For example, if NTP drifts more than five minutes out, GSS-TSIG will fail. As one of BlueCat’s customers put it,

My consumer base all the time thinks that the time is the time, and shouldn’t everything automatically already know it, and can’t you just put a monitor on it that says when the time’s wrong? And when you get down into the actual weeds of it… Since NTP as a protocol and as a service is so dependent on your network topology and the traversal of that topology, all of the variances that they put into the math to figure out how to shift the clock, when to shift the clock, and how much to shift the clock, if you really look at that and put your math hat on, doing a monitoring thing for that, that is huge.

It’s also tough to clean up from NTP-related failures. Again, one of BlueCat’s customers said it best:

Your DNS is torqued, no doubt about it. But everything else stops. Really, the nightmare isn’t what happens when time is screwed up. It’s what is your disaster plan for how do you go about fixing it to get everything back up? That’s the nightmare. Not that you lost it—that’s bad. The nightmare is, how do you recover?


Published in:


An avatar of the author

Rebekah Taylor is a former journalist turned freelance writer and editor who has been translating technical speak into prose for more than two decades. Her first job in the early 2000s was at a small start-up called VMware. She holds degrees from Cornell University and Columbia University’s Graduate School of Journalism.

Related content

Close-up of interlocked metal chain links symbolizing connected network objects and relationships in IPAM

How to map your network with user-defined links in Integrity X

Map your network with user-defined links in Integrity X to define and manage custom relationships, such as dual-stack and NAT environments.

Read more
Flock of geese flying in formation across a blue sky, framed by a pink graphic border, symbolizing coordinated network migrat

Automate your DDI modernization path by migrating with Micetro

Automate cross-platform DNS and DHCP migration with Micetro to reduce risk, eliminate manual effort, and modernize infrastructure faster.

Read more
Three armored figures walking toward a futuristic Las Vegas skyline with pyramids, glowing orb, and "Welcome to Fabulous Las

Your journey to intelligent NetOps begins at Cisco Live

Visit BlueCat’s booth or book a meeting now to learn more about how our solutions can help you build a network that supports constant change.

Read more
Stacked colorful wooden directional arrows on a post by a calm seaside with distant hills and blue sky

Replace BIND and ISC with Micetro DNS/DHCP Server (MDDS)

Tired of patching and manually configuring BIND DNS and ISC DHCP? Discover how Micetro MDDS appliances can replace them for modern DDI.

Read more