How to monitor F5 devices – SNMP vs API vs SSH
F5 offers many ways of interfacing with its products, and when writing monitoring we had to research which one is most suitable in terms of performance. After all, monitoring should not harm the device it monitors. We compared iControl REST, SNMP and TMSH. See below for how the test was conducted and which method won.
The best way to monitor F5 – How the test was conducted
We ran each type of test continuously for about 20 minutes through command-runner. While the tests were running, we used the web interface to check that its responsiveness stayed up to par.
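The responsiveness check in this test was done by hand. As a rough sketch of how it could be automated, the snippet below times a request to the management GUI with curl and labels the result. The helper names `probe_gui` and `classify_latency`, the 2-second threshold and the login page URL are all assumptions made for illustration, not something used in the original test.

```shell
#!/bin/sh
# Hypothetical helpers, not part of command-runner or any F5 tooling.

# probe_gui URL -> prints the total request time in seconds, or "timeout"
# if curl fails or does not finish within 10 seconds.
probe_gui() {
    t=$(curl -k -s -o /dev/null --max-time 10 -w '%{time_total}' "$1") || t=timeout
    echo "$t"
}

# classify_latency SECONDS THRESHOLD -> prints "ok", "sluggish" or "timeout".
classify_latency() {
    case $1 in
        timeout) echo "timeout"; return ;;
    esac
    awk -v t="$1" -v limit="$2" 'BEGIN {
        if (t + 0 > limit + 0) print "sluggish"; else print "ok"
    }'
}

# Example (assumed URL and threshold): probe the GUI every 30 seconds.
# while sleep 30; do
#     classify_latency "$(probe_gui https://10.10.10.10/tmui/login.jsp)" 2
# done
```

Logging these labels next to the test timestamps would make it easier to line up sluggish periods with the test that was running at the time.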
The commands to run each test
# REST
while true; do
    command-runner.sh full-command --basic-authentication user,password rest-pool-statistics.ind 10.10.10.10
done
# TMSH
while true; do
    command-runner.sh full-command --ssh user,password ./show-ltm-pool-detail-raw-recursive.ind 10.10.10.10
done
# SNMP
while true; do
    command-runner.sh full-command --ssh user,password ./snmp-pool-statistics.ind 10.10.10.10
done
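The loops above run until they are interrupted by hand. A minimal sketch of a time-boxed variant follows, assuming one wants each test to stop by itself after roughly 20 minutes and to count failed runs. `run_for` is a hypothetical helper written for this illustration, not a command-runner feature.

```shell
#!/bin/sh
# run_for SECONDS COMMAND [ARGS...]
# Repeats COMMAND until the time budget is used up, then prints how many
# runs were made and how many of them exited with a non-zero status.
run_for() {
    budget=$1
    shift
    start=$(date +%s)
    runs=0
    failures=0
    while [ $(( $(date +%s) - start )) -lt "$budget" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        runs=$((runs + 1))
    done
    echo "runs=$runs failures=$failures"
}

# Example: about 20 minutes (1200 seconds) of the REST test.
# run_for 1200 command-runner.sh full-command --basic-authentication user,password rest-pool-statistics.ind 10.10.10.10
```

The failure count only reflects the exit status of each run, so it is a coarse signal. Whether command-runner returns a non-zero status on a timeout is an assumption that would need checking.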
Results
The test started out with 283 pools (200 of them created just for this test). However, when we tried the tmsh command, command-runner timed out, so we had to reduce to the original 83 pools and rerun the REST test against those to make the comparison fair.
- Test 1: REST = 283 pools
- Test 2: TMSH = 83 pools
- Test 3: SNMP = 83 pools
- Test 4: REST (take 2) = 83 pools
[Figure: 4-hour graph]
[Figure: 24-hour graph for reference]
REST
- Did not produce any timeouts in the GUI in either of the two tests.
- Always produced results.
- The management interface became sluggish only once, during the second attempt, most likely because of the already high swap usage caused by the TMSH tests.
TMSH
TMSH produced these once in a while:
- When that happened, you can see gaps in the graph. The cause of the later gap in the graph is unknown, because we were working on the SNMP metrics at that time.
- TMSH also failed to give results sometimes.
- It had to be run with fewer metrics than REST in order to get a result at all.
SNMP
- Sometimes truncated the pool names. It is unclear why; it always happened to long names, but at different lengths.
- Did not produce any timeouts in the GUI.
- Always produced results.
- Did not have as many metrics as REST, since the exact same metrics were not available in one command (pool state and availability are missing).
- The management interface became a bit sluggish on and off.
Conclusion
Overall, REST won the test, with SNMP in second place. TMSH did not even qualify, as it consumes very large amounts of memory and swap, which negatively affected the overall system.
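Since swap usage was the deciding factor, it may help to sample it while a test runs. The sketch below assumes shell access to a Linux-based system that exposes /proc/meminfo; `swap_used_kb` is a hypothetical helper name, and the graphs in this article did not come from this snippet.

```shell
#!/bin/sh
# swap_used_kb [FILE]
# Prints the amount of swap in use, in kB, computed as SwapTotal - SwapFree.
# Reads /proc/meminfo unless another file in the same format is given.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# Example: print a timestamped sample once a minute during a test run.
# while sleep 60; do
#     echo "$(date +%T) swap_used_kb=$(swap_used_kb)"
# done
```

Comparing the samples taken during the REST, SNMP and TMSH runs would show directly which method pushes the device into swap.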
Thank you to Patrik Jonsson for contributing this article.