Modern Network Monitoring Goes Deep

Notice: This blog post was originally published on Indeni before its acquisition by BlueCat.

The content reflects the expertise and perspectives of the Indeni team at the time of writing. While some references may be outdated, the insights remain valuable. For the latest updates and solutions, explore the rest of our blog.

Long Outages Prevented By Prediction and Simple-to-Understand Fixes

Old-school monitoring provides network administrators simple metrics and indications that a problem exists at a particular device or location in a network infrastructure. But without insight into precise problems and their causes, difficult-to-diagnose issues can quickly turn into costly outages, inaccessibility, or downtime.

By contrast, proactive infrastructure monitoring solutions tap into device APIs to collect statistics and read configurations, arriving at deeper insights than simple metrics make possible. They also validate troubleshooting fixes crowdsourced from industry practitioners and present the best of them to administrators as easy-to-follow directions for correcting the issue.

In other words, unlike conventional monitoring tools, which give IT pros limited and siloed information, modern proactive monitoring solutions can consume large amounts of data from across the enterprise’s infrastructure. That data can be turned into accurate projections about when a problem is likely to occur, giving administrators clear insight into the best course of action to correct it, all in simple-to-understand human language.
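As a rough illustration of what a projection like this can look like, the sketch below fits a least-squares line to a handful of samples and estimates how long it will take to reach a threshold. It is not how any particular product computes its projections; the function name `hours_until_threshold` and the sample values are hypothetical.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    hours: Sequence[float],
    values: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Estimate hours from the last sample until `threshold` is reached.

    Uses an ordinary least-squares line through the samples. Returns 0.0 if
    the last sample is already at or above the threshold, and None if the
    fitted trend is flat or falling.
    """
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(hours) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, values)) / var_t
    if values[-1] >= threshold:
        return 0.0
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing = (threshold - intercept) / slope  # time at which the line hits the threshold
    return max(0.0, crossing - hours[-1])


# Hypothetical memory-utilization samples (percent), one per hour.
eta = hours_until_threshold([0, 1, 2, 3], [50, 55, 60, 65], threshold=90)
```

A straight-line fit is the simplest possible model; real utilization data is often noisy or seasonal, so treat the estimate as an early hint rather than a precise forecast.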

Here’s a look at the various infrastructure components that can cause headaches for your team if they’re not carefully monitored and properly managed.

Switches

Networks all start with switches. One easy way to understand what a switch does is to compare its role in a network with that of a router. Switches create a network; routers connect networks.

Many mysterious network problems involve switches. One of the most common causes is a change made to a switch without saving the configuration. Later, a problem might cause the switch to crash, resulting in a core dump and reboot. Whenever a switch is power cycled, any unsaved configuration is lost, leaving the switch out of sync with the rest of the network, and problems are likely to ensue.
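One way to catch this condition before a reboot is to compare the running configuration with the saved one. The sketch below assumes both have already been retrieved as text (how they are fetched depends on the device and is not shown) and that lines starting with `!` are comments, as in IOS-style configs. The function name `unsaved_changes` is hypothetical.

```python
import difflib
from typing import List


def _normalize(config_text: str) -> List[str]:
    """Drop blank lines and '!' comment lines; keep indentation."""
    return [
        line.rstrip()
        for line in config_text.splitlines()
        if line.strip() and not line.lstrip().startswith("!")
    ]


def unsaved_changes(running_config: str, startup_config: str) -> List[str]:
    """Return diff lines ('- ' saved only, '+ ' running only) between configs."""
    diff = difflib.ndiff(_normalize(startup_config), _normalize(running_config))
    return [line for line in diff if line.startswith(("+ ", "- "))]


startup = "hostname sw1\ninterface Gi0/1\n switchport access vlan 10\n"
running = "hostname sw1\ninterface Gi0/1\n switchport access vlan 20\n"
changes = unsaved_changes(running, startup)
```

An empty result means the running configuration would survive a power cycle; any other result lists the lines that would be lost or reverted.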

Routers

Routers connect disparate networks. Advanced infrastructure health tools can scour routing paths calculated by different protocols for potential interruptions or vulnerabilities. Advanced monitoring can even determine when vendor-recommended configurations may be the unlikely (or at least unforeseen) cause of problems for many users. While many router-related issues are standard across the board, there can also be problems at the router level that are brand-specific.

For example, Cisco routers enable a feature called Proxy ARP by default. A router with Proxy ARP enabled will reply with its own MAC address to ARP requests for addresses outside the local network. Clients that try to communicate with devices outside the local network therefore send their traffic to the router, which then forwards it.

Despite the potential benefits of such a feature, it’s also fraught with risk. Any device can be reached by sending an ARP request, which may increase the amount of ARP traffic on your network. That would make it harder to detect ARP spoofing, since an attacker could easily hide behind the MAC address of the router or switch.

Conventional monitoring tools won’t uncover the problem until it’s too late. Proactive solutions, however, are able to scour your entire environment to identify potentially problematic or vulnerable routers and recommend disabling or otherwise re-configuring the devices through a vetted, crowd-sourced script with a step-by-step guide for how to apply it.
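A minimal sketch of such a check follows, assuming an IOS-style text configuration in which Proxy ARP stays enabled on a routed interface unless `no ip proxy-arp` appears under it. The parser is deliberately simplistic, and the function name `interfaces_with_proxy_arp` and the sample configuration are hypothetical.

```python
from typing import Dict, List, Optional


def interfaces_with_proxy_arp(config_text: str) -> List[str]:
    """List interfaces that have an IP address but no 'no ip proxy-arp' line."""
    has_ip: Dict[str, bool] = {}
    disabled: Dict[str, bool] = {}
    current: Optional[str] = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if raw.startswith("interface "):
            current = raw.split(None, 1)[1].strip()
            has_ip[current] = False
            disabled[current] = False
        elif not raw.startswith((" ", "\t")):
            current = None  # any other unindented line ends the interface block
        elif current is not None:
            if line.startswith("ip address "):
                has_ip[current] = True
            elif line == "no ip proxy-arp":
                disabled[current] = True
    return [name for name in has_ip if has_ip[name] and not disabled[name]]


sample_config = """\
interface GigabitEthernet0/0
 ip address 10.0.0.1 255.255.255.0
!
interface GigabitEthernet0/1
 ip address 10.0.1.1 255.255.255.0
 no ip proxy-arp
!
interface GigabitEthernet0/2
 shutdown
"""
flagged = interfaces_with_proxy_arp(sample_config)
```

Interfaces without an IP address are skipped because Proxy ARP only matters where the router answers ARP requests at layer 3.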

Load Balancers

A load balancer takes a request for a resource (such as a server) and directs it to one of the available systems according to a load-balancing policy. These policies can be based on simple round-robin rotation or on which server has the lowest system load. The goal is network and application stability as well as optimized overall performance.

Load balancing requires checking which systems are available and making sure to spread user requests intelligently across multiple servers. It also ensures that requests from a user who already has a session will go to the same server (otherwise that session’s work will never be completed).
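The policies described above (round-robin, lowest load, and session persistence) can be sketched in a few lines. This is a toy model for illustration only, not the implementation of any real load balancer; the class name `LoadBalancer`, its methods, and the request-count notion of "load" are all assumptions.

```python
import itertools
from typing import Dict, List


class LoadBalancer:
    """Toy balancer with health tracking and sticky sessions."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(servers)
        self.load: Dict[str, int] = {s: 0 for s in servers}  # requests routed so far
        self.sessions: Dict[str, str] = {}  # session id -> pinned server
        self._rotation = itertools.cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def _round_robin(self) -> str:
        # Advance the rotation, skipping servers that are marked down.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")

    def _least_loaded(self) -> str:
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers")
        return min(candidates, key=lambda s: self.load[s])

    def route(self, session_id: str, policy: str = "round_robin") -> str:
        pinned = self.sessions.get(session_id)
        if pinned in self.healthy:
            server = pinned  # existing session stays on its server
        else:
            if policy == "least_loaded":
                server = self._least_loaded()
            else:
                server = self._round_robin()
            self.sessions[session_id] = server
        self.load[server] += 1
        return server


lb = LoadBalancer(["a", "b", "c"])
first = lb.route("user-1")
second = lb.route("user-2")
repeat = lb.route("user-1")  # same session, same server
```

Note that when a pinned server goes down, this sketch simply re-pins the session to another server; whether the session's work survives depends on the clustering discussed next.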

Load-balanced systems may also be clustered, allowing even sessions in progress to continue in the event of a hardware failure. However, one of the difficult-to-diagnose problems involving a load balancer might surface as an email failure. For example, despite having backup and failover email servers, it could take hours of investigation to discover that the secondary member was not configured for the mail VLAN.
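A consistency check across cluster members can surface this kind of gap in seconds. The sketch below assumes the VLAN IDs configured on each member have already been collected into a dictionary; the function name `missing_vlans`, the member names, and the VLAN numbers are hypothetical.

```python
from typing import Dict, Set


def missing_vlans(members: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """For each member, return VLANs configured on some peer but not on it."""
    all_vlans: Set[int] = set().union(*members.values())
    gaps = {name: all_vlans - vlans for name, vlans in members.items()}
    return {name: missing for name, missing in gaps.items() if missing}


# Hypothetical cluster in which VLAN 30 carries mail traffic.
cluster = {
    "lb-primary": {10, 20, 30},
    "lb-secondary": {10, 20},
}
gaps = missing_vlans(cluster)
```

Here the check reports that `lb-secondary` lacks VLAN 30, which is exactly the mismatch that would only show up after a failover.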

Firewalls

Firewalls run on a machine, real or virtual. A number of issues, including memory leaks, can cause a core dump and reset. Other difficult-to-diagnose problems might result from a myriad of firewall misconfigurations that can degrade performance or create security vulnerabilities.

Servers

Beyond the kind of problem discussed in load balancing, servers may be involved in a number of other hidden issues. Often, those challenges are related in some manner to certificates. For example, a server’s attempts to authenticate a device asking for its resources may fail if a Certificate Authority is not included in the VLAN configuration.

Network Storage

Storage shared across the network and used by databases can fail for a number of reasons, creating huge and costly outages that take your team offline for prolonged periods of time. Relational databases (DBs), for example, reserve memory called a cursor for each ongoing database transaction. Occasionally, too many database cursors are left open, draining memory and causing the DB to time out.
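On the application side, the usual defense is to close every cursor as soon as its work is done. The sketch below shows the pattern with Python's built-in `sqlite3` module, used here only as a stand-in; the cost of an open cursor varies by database engine, and the `devices` table and function names are hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `devices` table."""
    conn = sqlite3.connect(":memory:")
    with closing(conn.cursor()) as cur:
        cur.execute("CREATE TABLE devices (name TEXT)")
        cur.executemany(
            "INSERT INTO devices (name) VALUES (?)",
            [("sw1",), ("fw1",), ("rtr1",)],
        )
    conn.commit()
    return conn


def fetch_device_names(conn: sqlite3.Connection) -> List[str]:
    """Run a query and release the cursor even if the query raises."""
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT name FROM devices ORDER BY name")
        return [row[0] for row in cur.fetchall()]


conn = make_demo_db()
names = fetch_device_names(conn)
conn.close()
```

Wrapping the cursor in `contextlib.closing` guarantees that `close()` runs on every code path, so cursors cannot accumulate when an exception interrupts a query.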

End-To-End Proactive Network Monitoring

Today’s IT infrastructures are complex, sprawling entities that require time and attention to maximize performance and optimize health. Conventional monitoring tools are limited in scope and capability, usually revealing issues in the environment after they’ve become problems in need of an immediate and reactive response.

In contrast, the best modern monitoring solutions look at all network components in combination and simplify or even automate resolution of the many mysterious network problems and outages that occur, even when the cause is something as simple as a configuration change or the introduction of a brand-new network device feature.

From routers and switches to load balancers and storage, proactive monitoring systems periodically check configurations and device state to detect issues like clock drift or the presence of core dumps as they arise. They then offer simple remediation steps before a problem such as a firewall misconfiguration creates a security issue or results in downtime.
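A clock drift check, for instance, can be as simple as comparing each device's reported time with a trusted reference. The sketch below assumes the device times have already been collected as timezone-aware `datetime` values; the function name `drift_alerts`, the one-second tolerance, and the device names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def drift_alerts(
    device_times: Dict[str, datetime],
    reference_time: datetime,
    tolerance_seconds: float = 1.0,
) -> Dict[str, float]:
    """Return drift in seconds for devices outside the tolerance.

    Positive drift means the device clock is ahead of the reference.
    """
    alerts: Dict[str, float] = {}
    for name, reported in device_times.items():
        drift = (reported - reference_time).total_seconds()
        if abs(drift) > tolerance_seconds:
            alerts[name] = drift
    return alerts


reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
reported = {
    "fw1": reference + timedelta(seconds=5),
    "sw1": reference + timedelta(milliseconds=200),
    "rtr1": reference - timedelta(seconds=3),
}
alerts = drift_alerts(reported, reference)
```

In practice the comparison should account for the time it takes to query each device, which this sketch ignores.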

By analyzing hundreds of device statistics in real time and combining them with insights and expertise from thousands of the world’s leading IT professionals, Indeni proactive monitoring solutions help find errors before they become problems, reduce enterprise downtime, and free administrators to focus more heavily on higher-value activities and strategies.

Key takeaways

This key takeaway was generated through LLMs crawling the page and coming up with an overview of the content.

The article explains how modern proactive infrastructure monitoring prevents long outages by going beyond traditional metric-based tools to ingest device APIs, configurations, and crowd-sourced troubleshooting. It describes common failure sources across core components—switches, routers, load balancers, firewalls, servers, and network storage—and shows how proactive systems detect conditions like unsaved switch configs, vendor-feature pitfalls (e.g., Proxy ARP on routers), misconfigured load balancer members, firewall core dumps, certificate/VLAN authentication failures, and excessive DB cursors. The outcome is earlier detection, clear step-by-step remediations drawn from practitioner-vetted scripts, reduced downtime, and more time for administrators to focus on higher-value tasks.

How do proactive monitoring solutions find problems that traditional monitoring misses?

Proactive monitoring solutions tap device APIs and read configurations rather than relying solely on simple metrics. They consume large amounts of data across the enterprise—periodically checking configurations, device statistics, and real-time conditions such as clock drift or core dumps—to derive deeper insights and accurate projections about when issues are likely to occur. They also validate recommended fixes using troubleshooting steps crowdsourced from many practitioners, producing vetted, easy-to-follow remediation guidance that reduces the need for reactive investigation after a failure is already visible.

What are common switch and router problems that can lead to outages?

Switch problems often stem from unsaved configuration changes: a crash or power cycle can reboot a switch and leave it out of sync if configurations weren’t saved, triggering network issues. Router problems include both standard routing interruptions and brand-specific behaviors; for example, Cisco’s default Proxy ARP can increase ARP traffic and make ARP spoofing harder to detect because the router answers ARP requests with its own MAC address. Advanced monitoring can scour routing paths, identify potentially problematic or vulnerable routers, and recommend disabling or reconfiguring risky features with practitioner-vetted instructions.

How can proactive monitoring help with load balancer, firewall, server, and storage issues?

For load balancers, proactive tools verify membership and VLAN configuration to catch cases where secondary servers aren’t reachable (e.g., mail VLAN not configured), ensuring session persistence and failover work as intended. For firewalls, they detect conditions like memory leaks or core dumps and flag misconfigurations that could degrade performance or create vulnerabilities. For servers, the tools can surface authentication problems tied to certificate authorities or VLAN settings. For networked storage and databases, they identify resource drains such as excessive open DB cursors that exhaust memory and cause timeouts, enabling remediation before prolonged outages occur.
