Check Point Firewall Cluster Health Checklist
Need help ensuring that future fail overs are smooth, even during peak times? Read our post to keep your Check Point firewall (CPFW) clusters healthy.
Notice: This blog post was originally published on Indeni before its acquisition by BlueCat.
The content reflects the expertise and perspectives of the Indeni team at the time of writing. While some references may be outdated, the insights remain valuable. For the latest updates and solutions, explore the rest of our blog.
The article provides a practical checklist for diagnosing and preventing problematic Check Point firewall cluster failovers that can occur during peak-time or unexpected events. It outlines common causes—routing table differences, CoreXL/SecureXL and kernel parameter mismatches, lack of GARP support, poor sync performance, topology misconfiguration, clock mismatch, and hardware/software/license inconsistencies—and highlights operational impacts such as interrupted traffic and stressful recovery during outages. The piece recommends specific mitigations (NTP, VMAC via SK50840, disabling sync on short-lived connections via SK23695) and promotes using indeni to proactively detect and remediate these issues before they become critical.
What are the most common configuration causes of a problematic Check Point cluster failover according to the article?
The article identifies several frequent configuration causes of problematic cluster failovers: routing table differences between members (now less common when managed via SmartDashboard), mismatched CoreXL, SecureXL, fwkern.conf or other .conf/.def files, incorrect topology configuration (including multicast/broadcast settings checked via “cphaprob -a if”), clock mismatch (recommendation to use NTP), and hardware, software or license mismatches such as differing appliance models, software versions, hot fixes/HFAs, or installed licenses.
How can lack of GARP support and poor sync performance affect a failover, and what fixes does the article recommend?
Lack of GARP support can cause short outages (around 30 seconds) because surrounding network equipment may not respond to gratuitous ARPs from the newly active cluster member. The article recommends enabling VMAC per SK50840 to mitigate this. Poor synchronization performance—common on slow or congested sync networks, especially DC-to-DC links—can prevent cluster members from keeping state, worsening with more enabled blades. The article suggests disabling synchronization for short-lived connections such as HTTP following SK23695 to reduce sync load.
What operational approach does the article recommend to avoid outages and how can indeni help?
The article advocates proactive detection and remediation to avoid outages by identifying cluster issues before they become critical. It recommends routine checks for the listed causes (routing, kernel parameters, topology, clocks, hardware/software/license parity) and using indeni to automate identification of these and many more issues. According to the article, indeni installs in less than an hour, can pinpoint potential problems that lead to failovers, and the vendor offers support for installation and further assistance; readers are invited to download indeni or contact support and to fill out a form to learn more.
Each and every organization we work with goes through the trouble of setting up a cluster of firewalls in every single critical location in the network. The cluster is there to ensure that there is no single point of failure. It normally works very well – fail overs are smooth and traffic proceeds uninterrupted. However, sometimes, a fail over doesn’t go smoothly. If the fail over is performed intentionally during a maintenance window then it’s usually easy to revert. But what if the fail over occurs spontaneously during peak-time due to an issue such as a power outage in your primary data center?
Knowing that a bad cluster fail over during peak time is one of the most stressful situations you can be in, we took the time to prepare the checklist below. Hopefully, it will help you ensure future fail overs are smooth, even during peak times.
- Routing table differences – this happens a lot less now that routing tables are controlled via the SmartDashboard, but it is still one of the top causes of fail over issues we’ve seen.
- CoreXL, SecureXL, kernel parameter differences – check that the configurations of CoreXL, SecureXL, fwkern.conf and any other .conf or .def file you may have manually changed are the same across cluster members.
- Lack of GARP support – happens more often than you’d think. If your outage only lasts 30 seconds (or so), this is probably your issue. The network equipment around your firewalls isn’t listening to the gratuitous ARPs sent out by the newly active cluster member. You may want to enable VMAC by following SK50840.
- Poor sync performance – if your sync network is slow or congested, which is especially common in DC-to-DC sync networks, your cluster members may have trouble keeping up. This gets worse the more features/blades you enable. We recommend disabling sync on short-lived connections, like HTTP. To do so, follow SK23695.
- Wrong configuration of topology – take a close look at the results of “cphaprob -a if” on all cluster members and make sure they are the same. Don’t forget to make sure the same *cast (multicast/broadcast) is used.
- Clock mismatch – make sure the clocks are the same on all cluster members. We highly recommend using NTP.
- Hardware, software, license mismatch – you’d think this never happens, but it does. Don’t overlook this check – make sure the appliances are of the same model, the software (including hot fixes and HFA) is the same and the licenses installed are the same.
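Several of the checks above boil down to "is this the same on every cluster member?". As a rough illustration, here is a minimal Python sketch of that comparison. It assumes you have already collected the relevant text from each member yourself (for example the contents of fwkern.conf or the output of "cphaprob -a if") and, for the clock check, a Unix timestamp read from each member at roughly the same moment. The function names, member names and sample values are all hypothetical and are not part of any Check Point or indeni tooling. Raw command output may also contain fields that legitimately differ per member, so treat a reported mismatch as a prompt to look closer, not as proof of a problem.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash text after trimming whitespace, so cosmetic differences are ignored."""
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_mismatches(snapshots):
    """Compare collected items across cluster members.

    snapshots maps member name -> {item name -> collected text}.
    Returns {item name -> {fingerprint -> [members]}} for every item that
    differs between members or is missing on at least one of them.
    """
    items = sorted({item for data in snapshots.values() for item in data})
    mismatches = {}
    for item in items:
        groups = defaultdict(list)
        for member, data in sorted(snapshots.items()):
            key = fingerprint(data[item]) if item in data else "MISSING"
            groups[key].append(member)
        if len(groups) > 1:  # more than one distinct version exists
            mismatches[item] = dict(groups)
    return mismatches


def max_clock_skew(epochs):
    """Largest difference, in seconds, between the members' clock readings."""
    return max(epochs.values()) - min(epochs.values())


if __name__ == "__main__":
    # Placeholder data only: real values would come from your own collection step.
    snapshots = {
        "member_a": {"fwkern.conf": "example_param=1\n"},
        "member_b": {"fwkern.conf": "example_param=2\n"},
    }
    for item, groups in find_mismatches(snapshots).items():
        print(f"{item} differs across members:")
        for digest, members in groups.items():
            print(f"  {digest[:12]}  {', '.join(members)}")

    skew = max_clock_skew({"member_a": 1700000000.0, "member_b": 1700000004.0})
    print(f"Maximum clock skew: {skew:.1f} seconds")
```

Hashing the normalized text rather than diffing it keeps the sketch short and makes it easy to see which members agree with each other. Once a mismatch is flagged, a regular diff of the two versions will show exactly what changed.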
You can use indeni to identify all of the above issues and hundreds more. For us, it’s all about avoiding outages by pinpointing issues before they turn critical. It takes less than an hour to install, and our support team will be happy to help you do it.