Expert Review of Runbook Templates


Operations teams use runbook templates to automate routine maintenance and to respond to system alerts and outages. Infrastructure changes so rapidly that it is difficult to keep documentation up to date. To improve incident response times and reduce errors during troubleshooting, it is critical to have operating steps documented. Before you gather that information, you need a solid template as a starting point. What background information should a runbook include? What is a must-have versus a nice-to-have? We asked our community of certified IT professionals to review free runbook templates. Here is what they said:

Templates

  • THWACK member
  • Skelton Thatcher Consulting
  • Indeni

Runbook Template #1 by THWACK

What I like about it:

Tells you in plain English what the issue is

  • Description of the problem
  • What the symptoms are
  • What the recovery process is
  • Provides links to review it in the related operation tool dashboard

What’s missing

  • How was the issue uncovered, and what commands did the tool use?
  • How severe is the issue?
  • What could the issue be related to?

Download it here

Runbook Template #2 by Skelton Thatcher Consulting

What I like about it

Provides background and contextual information about the system or service affected

  • Background
    • What is the system or service
    • What part of the business is impacted
    • What are the expectations for availability, performance and our SLAs
      • Expected traffic and load
      • Required resources
      • Security and access control
      • How security is validated on an ongoing basis
      • How system configuration is managed
      • Which parts of the system are backed up
    • Tools
      • What tools are available to help operate the system?
      • What significant metrics will be generated?
      • How does the system report its own health?
      • Does it perform routine and sanity checks?
  • Contextual
    • What are the contributing applications, daemons, services, middleware
    • Infrastructure and network design – What servers, containers, schedulers, devices, VLANs, firewalls, etc. are needed?
      • Differences between Production/Live and other environments

Tells you how to resolve the issue

  • Restore procedures
  • Operational instructions – Deployment, Batch processing
  • How to perform maintenance tasks such as patching, daylight-saving time changes, data clear-down, and log rotation
  • Failover and recovery procedures – What needs to happen when parts of the system are failed over to standby systems? What needs to happen during recovery?

What’s missing

  • When there is an issue, what commands were used by those tools to identify it?

Download it here

Runbook Template #3 by Indeni

What our community likes about it:

  • Tells you in plain English what the issue is
    • Description of the problem
    • What the symptoms are
    • What the recovery or remediation process is
  • Provides visibility into the commands that are used
    • What metrics does it inspect
    • What are the rules or thresholds that caused the notification to be generated
    • Tells you how else you could have found the problem
  • Is written in collaboration between engineering, IT operations, and a subject matter expert from the Indeni Crowd Community.
    • Scripts are continuously updated

Download Template

In Summary

Great runbook templates have three things in common:

  1. They are written in collaboration between the subject matter expert and IT operations
  2. They are written for humans and machines
    1. They provide readable summaries of the issue that has occurred or is about to occur
    2. They give simple instructions to resolve the problem
    3. They give visibility into the commands used so that they can be:
      1. Edited by an individual
      2. Automated by a machine
  3. They are continuously kept up to date
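
One way to picture a runbook entry that serves both humans and machines is a small structured record. This is a sketch only: the field names, the sample issue, and the review date are assumptions made for illustration, and the listed commands are ordinary Unix disk-usage commands rather than anything prescribed by the templates reviewed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunbookEntry:
    """A hypothetical runbook entry readable by people and parseable by tools."""
    summary: str                                        # readable summary of the issue
    steps: List[str]                                    # simple resolution instructions
    commands: List[str] = field(default_factory=list)   # commands a person can edit or a tool can run
    last_reviewed: str = ""                             # supports keeping the entry up to date

    def to_text(self) -> str:
        """Render the entry as plain text for a human reader."""
        lines = [f"Issue: {self.summary}", "Steps:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        if self.commands:
            lines.append("Commands:")
            lines += [f"  $ {cmd}" for cmd in self.commands]
        if self.last_reviewed:
            lines.append(f"Last reviewed: {self.last_reviewed}")
        return "\n".join(lines)


# Assumed example content, not drawn from any of the reviewed templates.
entry = RunbookEntry(
    summary="Disk usage on /var is above 90%",
    steps=["Identify the largest directories", "Rotate or archive old logs"],
    commands=["df -h /var", "du -sh /var/log"],
    last_reviewed="2024-01-15",
)
print(entry.to_text())
```

The same `commands` list can be read and edited by an engineer or handed to an automation tool, and the `last_reviewed` field gives a simple hook for checking that entries stay current.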

Interested in automating runbook tasks?

Download Indeni and connect to up to 5 devices for free when you engage in the Indeni Crowd.

