The Art and Science of Upgrading Infrastructure Services

How can upgrading a service infrastructure be an art AND a science?  I just click a button, the stuff upgrades, and I’m good… right?  We’re talking infrastructure services here, people.  If the infrastructure is unavailable, the business loses money.

If I asked 100 admins, “Do you enjoy testing software upgrades in the lab?”, all 100 would say, “NO”. But guess what? Testing is more important than actually performing the upgrade.

What do I test?

  • The hardware and software you’re upgrading – You can’t test if you don’t have an environment. It doesn’t have to be a mirror image of production, but you do need similar hardware and software, even if at reduced capacity.
  • Test matrix with success criteria – Having a matrix of what you’ve tested, and whether it passed (simple acceptance criteria), is essential. It’s a big CYA (cover your behind) move, so if management asks, “Did you test XYZ?”, you can say, “YES!” Using DNS, DHCP and IP Address Management as an example, your test matrix should include things like:

o    Upgrading your DNS primary and DHCP server(s) from version A to version B
o    Testing a variety of services and devices, if available
o    Validating whether any customizations still work, including API environments

  • Having an upgrade document – You’re testing the upgrade in your lab, performing steps that will simply need to be repeated in production, so why not document them? This ensures that nothing gets overlooked or forgotten during the production upgrade. And you might be able to use this document to help support your change request.
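
A test matrix doesn’t need special tooling. As a minimal sketch, assuming a plain list of rows is enough for your lab, the Python below records each test with its acceptance criterion and result. The names `MatrixRow`, `summarize` and `ready_for_production` are made up for illustration, and the criteria shown are examples rather than a complete DDI test plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatrixRow:
    """One row of the lab test matrix (field names are illustrative)."""
    name: str
    criteria: str                  # simple acceptance criterion
    passed: Optional[bool] = None  # None means "not run yet"


def summarize(matrix):
    """Return (passed, failed, not_run) counts for a list of MatrixRow."""
    passed = sum(1 for row in matrix if row.passed is True)
    failed = sum(1 for row in matrix if row.passed is False)
    not_run = sum(1 for row in matrix if row.passed is None)
    return passed, failed, not_run


def ready_for_production(matrix):
    """Ready only when the matrix is non-empty and every row has passed."""
    return bool(matrix) and all(row.passed is True for row in matrix)


# Example rows based on the DNS/DHCP items listed above.
matrix = [
    MatrixRow("Upgrade DNS primary from version A to version B",
              "Server answers queries after the upgrade", passed=True),
    MatrixRow("Upgrade DHCP server(s) from version A to version B",
              "Clients obtain leases after the upgrade", passed=True),
    MatrixRow("Customizations, including API environments",
              "Existing scripts run without errors"),  # not run yet
]

print(summarize(matrix))
print(ready_for_production(matrix))
```

Treating “not run yet” as a third state, separate from pass and fail, is a deliberate choice here: when management asks “Did you test XYZ?”, an untested row should never look like a passing one.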

What’s my upgrade strategy?
Alright, so you’ve tested the upgrade in the lab in your “spare” time and everything is good. Now do you upgrade with a slow roll, or do a fork-lift upgrade?

  • Slow-roll and segmentation of upgrades, if possible – Again, we’re talking about business-critical core services here. Doing a fork-lift upgrade and then having to revert a large portion of network infrastructure can be painful. That said, maintenance windows for infrastructure services are hard to procure and it’s not always possible to slow-roll. If you have to do a fork-lift upgrade, it’s all the more important to test your upgrade meticulously beforehand.
  • Aligning resources – Most, if not all, enterprises have multiple data centers in multiple locations. It’s important that you’ve got hands and feet ready to hit the DC if problems arise. It’s also important to line up resources from your network teams, firewall/security teams, etc.
  • Go/no-go checkpoints – What happens if you’re a couple of hours into your six-hour maintenance window and you know you simply won’t come close to completing your task? There’s no sense in completing more of the work when you’ll simply need to revert. Agree on the checkpoints before the window opens so the decision to stop gets made early.
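
A go/no-go checkpoint can be boiled down to simple arithmetic. The sketch below is one possible rule, not a standard: it assumes you only continue if the remaining work plus a reserve for rolling back still fits in what’s left of the window. The function name `go_no_go` and the example durations are hypothetical.

```python
from datetime import timedelta


def go_no_go(window, elapsed, remaining_work, rollback_time):
    """Return "go" if the remaining work plus a roll-back reserve still
    fits in the time left in the maintenance window, else "no-go".

    Assumption: we want enough time to finish AND still revert if the
    finished upgrade fails validation.
    """
    time_left = window - elapsed
    if remaining_work + rollback_time <= time_left:
        return "go"
    return "no-go"


window = timedelta(hours=6)

# Two hours in, five hours of work left: stop now and revert.
print(go_no_go(window, timedelta(hours=2), timedelta(hours=5), timedelta(hours=1)))

# Two hours in, two hours of work left: carry on.
print(go_no_go(window, timedelta(hours=2), timedelta(hours=2), timedelta(hours=1)))
```

The estimates for remaining work and roll-back time come from your lab testing and upgrade document, which is one more reason to time those steps when you rehearse them.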

Planning for the worst
We know that nothing will go wrong with your upgrade, especially since you’re running BlueCat gear. But you still need to plan for the worst and ensure all your bases are covered – no one ever got in trouble for being prepared.

  • Backup resources – What if something unexpected happens during your six-hour maintenance window? Say the network team forgot to inform you of a core router change, the network is down, and you can’t validate your upgrade. If it takes the network team hours to fix their problems, you’ll have been awake for 24+ hours. People make mistakes when they’re tired. Having a backup resource available and up to speed is a good plan.
  • Roll-back – What if you need to roll back? Do you need “boots on the ground” at remote sites? How long will it take? Do you have the files at the ready if needed? Have you tested the roll-back? Does anything else need to happen after the roll-back has taken place?

Engage the upgrade ninjas
Upgrading infrastructure isn’t something that happens often – maybe once or twice a year. Engaging the BlueCat “upgrade ninjas” will help you navigate through a successful upgrade. Here at BlueCat, we’ve got a handful of teams – from Professional Services to our Technical Account Management teams – that have the field experience and solid, vetted methodologies to ensure a successful upgrade.

Alright, I’m upgraded.  Now what?
OK, your upgrade is done. No alerts have fired. Nothing seems to have blown up. What’s next? VALIDATION! Everyone loves validation – you know, checking log files, running some tests, working with other operational teams, ensuring applications are up and running. When doing a slow-roll upgrade, having some burn-in time before your next major upgrade will be the ultimate validation that your upgrade has been successful.
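
Post-upgrade validation is easier to repeat if the checks live in one place. The sketch below is a rough illustration, assuming each check can be written as a small function that returns True or False. The helper names `run_checks` and `log_is_clean`, the sample log lines and the keywords `ERROR` and `CRITICAL` are all invented for the example; real log formats and checks will differ by product.

```python
def run_checks(checks):
    """Run each named check and return the names that failed.

    A check that raises an exception is counted as a failure.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    return failures


def log_is_clean(lines, bad_words=("ERROR", "CRITICAL")):
    """True if no log line contains any of the given keywords."""
    return not any(word in line for line in lines for word in bad_words)


# Hypothetical log lines collected after the upgrade.
sample_log = [
    "INFO service started",
    "INFO zone transfer complete",
]

checks = {
    "log files clean": lambda: log_is_clean(sample_log),
    "application smoke test": lambda: True,  # placeholder for a real test
}

failures = run_checks(checks)
print(failures or "all checks passed")
```

Running the same set of checks after each stage of a slow-roll upgrade gives you a consistent baseline to compare against during burn-in.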

I’ll leave you with a quote from an unnamed Samurai: “Cry in the dojo, laugh on the battlefield.”  It’s a mantra that the upgrade ninjas at BlueCat try to live by.

