The Art and Science of Upgrading Infrastructure Services

How can upgrading a service infrastructure be an art AND a science?  I just click a button, the stuff upgrades, and I’m good… right?


June 17, 2014

How can upgrading a service infrastructure be an art AND a science?  I just click a button, the stuff upgrades, and I’m good… right?  We’re talking infrastructure services here, people.  If the infrastructure is unavailable, the business loses money.

If I asked the following question to 100 admins, “Do you enjoy testing software upgrades in the lab?” Exactly 100 admins would say, “NO”.  But guess what? Testing is more important than actually performing the upgrade.

What do I test?

  • The hardware and software you’re upgrading – You can’t test if you don’t have an environment. It doesn’t have to be a mirror image, but having similar hardware/software is needed, albeit in a reduced capacity.
  • Test matrix with success criteria – Having a matrix of what you’ve tested, and if it passed (simple acceptance criteria) is essential. It’s a big CYA (cover your behind) move, so if management asks, “Did you test XYZ?” You can say, “YES!” Using DNS, DHCP and IP Address Management for an example, your test matrix should include things like:

o    Upgrading of your DNS primary and DHCP server(s) from version A to version B

o    Testing a variety of services and devices, if available

o    Validating if any customizations still work, including API environments

  • Having an upgrade document – You’re testing the upgrade in your lab. You’re performing steps that will simply need to be repeated in production, why not document it? This ensures that nothing gets overlooked or forgotten during the production upgrade. And, you might be able to use this document to help support your change request.

What’s my upgrade strategy?
Alright, so you’ve tested the upgrade in the lab in your “spare” time and everything is good. Now do you upgrade with a slow roll, or do a fork-lift upgrade?

  • Slow-roll and segmentation of upgrades, if possible – Again, we’re talking about business-critical core services here. Doing a fork-lift upgrade and then having to revert a large portion of network infrastructure can be painstaking.  That said, maintenance windows for infrastructure services are hard to procure and it’s not always possible to slow-roll. If you have to do a fork-lift upgrade, it’s all the more important to test your upgrade meticulously beforehand.
  • Aligning resources – Most, if not all, enterprises have multiple data centers in multiple locations. It’s important that you’ve got hands and feet ready to hit the DC if problems arise. It’s also important to line up resources from your network teams, firewall/security teams, etc.
  • Go/no-go checkpoints – What happens if you’re a couple hours into your six-hour maintenance window and you know you simply won’t come close to completing your task? There’s no sense in completing more of the work when you’ll simply need to revert.

Planning for the worst
We know that nothing will go wrong with your upgrade, especially since you’re running BlueCat gear.  But, you still need to plan for the worst and ensure all your bases are covered – no one ever got in trouble for being prepared.

  • Backup resources – What if something unexpected happens during your 6-hour maintenance window and the network team forgot to inform you of a core router change, the network is down and you can’t validate your upgrade?  If it takes the network team hours to fix their problems, you’ll have been awake for 24+ hours.  People make mistakes when they’re tired.  Having a back-up resource available and up-to-speed is a good plan.
  • Roll-back – What about if you need to roll back? Do you need “boots on the ground” remotely?  How long will it take? Do you have the files at the ready if needed? Have you tested the roll-back? Does anything else need to happen after the roll-back has taken place?

Engage the upgrade ninjas
Upgrading infrastructure isn’t something that happens often – maybe once or twice a year. Engaging the BlueCat “upgrade ninjas” will help you navigate through a successful upgrade. Here at BlueCat, we’ve got a handful of teams – from Professional Services to our Technical Account Management teams – that have the field experience and solid, vetted methodologies to ensure a successful upgrade.

Alright, I’m upgraded.  Now what?
OK, your upgrade is done. No alerts have fired. Nothing seems to have blown-up. What’s next? VALIDATION! Everyone loves validation – you know, checking log files, running some tests, working with other operational teams, ensuring applications are up and running. When doing a slow-roll upgrade, having some burn-in time before your next major upgrade will be the ultimate validation that your upgrade has been successful.

I’ll leave you with a quote from an unnamed Samurai: “Cry in the dojo, laugh on the battlefield.”  It’s a mantra that the upgrade ninjas at BlueCat try to live by.

Published in:

An avatar of the author

BlueCat is the Adaptive DNS company. The company’s mission is to help organizations deliver reliable and secure network access from any location and any network environment. To do this, BlueCat re-imagined DNS. The result – Adaptive DNS – is a dynamic, open, secure, scalable, and automated DDI management platform that supports the most challenging digital transformation initiatives, like adoption of hybrid cloud and rapid application development.

Related content

Detect anomalies and CVE risks with Infrastructure Assurance 8.4 

The Infrastructure Assurance 8.4 release features an anomaly detection engine for outliers and a CVE analysis engine to uncover device vulnerabilities.

Read more

Get fast, resilient, and flexible DDI management with Integrity 9.6

With Integrity 9.6, network admins can get support for new DNS record types, architect and configure multi-primary DNS, and automate IP assignments.

Read more

Deepen your security insight with Infrastructure Assurance 8.3

BlueCat Infrastructure Assurance 8.3, with an enhanced analytics dashboard, including interactive widgets and top 10 alerts, is now available.

Read more

Security, automation, cloud integration keys to DDI solution success

Only 40% of enterprises believe they are fully successful with their DDI solution. Learn how to find greater success with new research from EMA and BlueCat.

Read more

Our commitment to Micetro customers and product investment

From CEO Stephen Devito, a word on BlueCat’s ongoing commitment to supporting Micetro customers and Micetro’s evolution as a network management tool.

Read more

Seven reasons to rethink firewall monitoring and boost automation 

With BlueCat Infrastructure Assurance, you can better protect your network with automated alerts and suggested remedies for hidden issues in your firewalls.

Read more