The Art and Science of Upgrading Infrastructure Services

Last updated on April 29, 2021.

How can upgrading a service infrastructure be an art AND a science?  I just click a button, the stuff upgrades, and I’m good… right?  We’re talking infrastructure services here, people.  If the infrastructure is unavailable, the business loses money.

If I asked 100 admins, “Do you enjoy testing software upgrades in the lab?”, exactly 100 of them would say, “NO”.  But guess what? Testing is more important than actually performing the upgrade.

What do I test?

  • The hardware and software you’re upgrading – You can’t test if you don’t have an environment. It doesn’t have to be a mirror image of production, but you do need similar hardware and software, even at reduced capacity.
  • Test matrix with success criteria – Having a matrix of what you’ve tested, and whether it passed (simple acceptance criteria), is essential. It’s a big CYA (cover your behind) move, so if management asks, “Did you test XYZ?” you can say, “YES!” Using DNS, DHCP and IP Address Management as an example, your test matrix should include things like:
      o Upgrading your DNS primary and DHCP server(s) from version A to version B
      o Testing a variety of services and devices, if available
      o Validating whether any customizations still work, including API environments

  • Having an upgrade document – You’re testing the upgrade in your lab, performing steps that will simply need to be repeated in production, so why not document them? This ensures that nothing gets overlooked or forgotten during the production upgrade. You might also be able to use this document to support your change request.
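
To make the test matrix idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `MatrixEntry`, `summarize` and `ready_for_production`, the sample acceptance criteria, and the rule that every entry must pass before you go to production. Your own matrix may just as well live in a spreadsheet.

```python
from dataclasses import dataclass
from enum import Enum


class Result(Enum):
    NOT_RUN = "not run"
    PASS = "pass"
    FAIL = "fail"


@dataclass
class MatrixEntry:
    """One row of the test matrix (hypothetical structure)."""
    name: str        # what was tested
    criterion: str   # simple acceptance criterion
    result: Result = Result.NOT_RUN


def summarize(matrix):
    """Count matrix entries by result."""
    counts = {r: 0 for r in Result}
    for entry in matrix:
        counts[entry.result] += 1
    return counts


def ready_for_production(matrix):
    """True only if the matrix is non-empty and every entry passed."""
    return bool(matrix) and all(e.result is Result.PASS for e in matrix)


# Illustrative rows based on the DNS/DHCP example; the criteria are made up.
matrix = [
    MatrixEntry("Upgrade DNS primary from version A to version B",
                "Zones load and queries are answered", Result.PASS),
    MatrixEntry("Upgrade DHCP server(s) from version A to version B",
                "Clients obtain and renew leases", Result.PASS),
    MatrixEntry("Customizations and API integrations",
                "Existing scripts run unchanged"),  # not tested yet
]

for result, count in summarize(matrix).items():
    print(f"{result.value}: {count}")
print("Ready for production:", ready_for_production(matrix))
```

With the third row still untested, the sketch reports that you are not ready, which is exactly the answer you want to have before management asks.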

What’s my upgrade strategy?
Alright, so you’ve tested the upgrade in the lab in your “spare” time and everything is good. Now, do you upgrade with a slow roll (in stages), or do a fork-lift upgrade (everything at once)?

  • Slow-roll and segmentation of upgrades, if possible – Again, we’re talking about business-critical core services here. Doing a fork-lift upgrade and then having to revert a large portion of network infrastructure can be painful.  That said, maintenance windows for infrastructure services are hard to procure and it’s not always possible to slow-roll. If you have to do a fork-lift upgrade, it’s all the more important to test your upgrade meticulously beforehand.
  • Aligning resources – Most, if not all, enterprises have multiple data centers in multiple locations. It’s important that you’ve got hands and feet ready to hit the DC if problems arise. It’s also important to line up resources from your network teams, firewall/security teams, etc.
  • Go/no-go checkpoints – What happens if you’re a couple of hours into your six-hour maintenance window and you know you simply won’t come close to completing your task? There’s no sense in completing more of the work when you’ll simply need to revert. Decide before the window opens at which points you’ll make the go/no-go call.
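
A go/no-go checkpoint can be reduced to simple arithmetic. The sketch below is one conservative way to do it, not a prescribed method: it assumes you only continue if the remaining work, a full roll-back, and a safety margin all still fit in what is left of the window. The function name `go_no_go`, the 30-minute margin, and the sample times are all hypothetical.

```python
from datetime import datetime, timedelta


def go_no_go(now, window_end, remaining_work, rollback_time,
             safety_margin=timedelta(minutes=30)):
    """Return "go" if the remaining work plus a possible roll-back
    (plus a safety margin) still fits in the maintenance window."""
    time_left = window_end - now
    needed = remaining_work + rollback_time + safety_margin
    return "go" if needed <= time_left else "no-go"


# Assumed example: a six-hour window, checked two hours in,
# with five hours of work left and a one-hour roll-back.
window_start = datetime(2021, 4, 29, 22, 0)
window_end = window_start + timedelta(hours=6)
checkpoint = window_start + timedelta(hours=2)

print(go_no_go(checkpoint, window_end,
               remaining_work=timedelta(hours=5),
               rollback_time=timedelta(hours=1)))  # prints "no-go"
```

The estimates for remaining work and roll-back time are the hard part, and the lab testing is where they come from.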

Planning for the worst
We know that nothing will go wrong with your upgrade, especially since you’re running BlueCat gear.  But you still need to plan for the worst and ensure all your bases are covered – no one ever got in trouble for being prepared.

  • Backup resources – What if something unexpected happens during your six-hour maintenance window? Say the network team forgot to inform you of a core router change, the network is down, and you can’t validate your upgrade.  If it takes the network team hours to fix their problems, you could end up having been awake for 24+ hours.  People make mistakes when they’re tired.  Having a back-up resource available and up-to-speed is a good plan.
  • Roll-back – What if you need to roll back? Do you need “boots on the ground” at remote sites?  How long will it take? Do you have the files at the ready if needed? Have you tested the roll-back? Does anything else need to happen after the roll-back has taken place?

Engage the upgrade ninjas
Upgrading infrastructure isn’t something that happens often – maybe once or twice a year. Engaging the BlueCat “upgrade ninjas” will help you navigate your way to a successful upgrade. Here at BlueCat, we’ve got a handful of teams – from Professional Services to our Technical Account Management teams – that have the field experience and solid, vetted methodologies to ensure a successful upgrade.

Alright, I’m upgraded.  Now what?
OK, your upgrade is done. No alerts have fired. Nothing seems to have blown up. What’s next? VALIDATION! Everyone loves validation – you know, checking log files, running some tests, working with other operational teams, ensuring applications are up and running. When doing a slow-roll upgrade, having some burn-in time before your next major upgrade will be the ultimate validation that your upgrade has been successful.
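
As one small example of “running some tests”, here is a sketch of a post-upgrade DNS smoke test in Python. It only checks that a list of names still resolves through the system resolver, using the standard library’s `socket.gethostbyname`. It is an assumption about what a basic check might look like, not a BlueCat tool, and the helper names `resolves` and `failed_lookups` are made up for illustration.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return True if the hostname resolves to an address."""
    try:
        resolver(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def failed_lookups(hostnames, resolver=socket.gethostbyname):
    """Return the hostnames that did not resolve, in input order."""
    return [h for h in hostnames if not resolves(h, resolver)]
```

You would call `failed_lookups` with a list of names your applications depend on and expect an empty list back. The `resolver` parameter exists so the check can be exercised with a stub in the lab before you rely on it in production.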

I’ll leave you with a quote from an unnamed Samurai: “Cry in the dojo, laugh on the battlefield.”  It’s a mantra that the upgrade ninjas at BlueCat try to live by.
