For network automation decisions, metrics are key
Uber engineer Ryan Patterson shares how data drives network automation projects, which must also be scalable, save costs, and meet user needs.
When you’re implementing network automation, there’s a fundamental question that can be surprisingly tough to answer: What do we automate?
Here’s one word that might help: metrics.
Data from service tickets and time spent on day-to-day operations can illuminate where your resource hogs lie. You can home in on what is worth automating to increase efficiency, free up resources, and better support your infrastructure.
For the fifth episode of the third season of the Network Disrupted podcast, Uber’s Ryan Patterson sat down with host and BlueCat Chief Strategy Officer Andrew Wertkin. Now a security engineer, Patterson was a systems engineer at Uber for nearly three years.
They chatted about how data underpins Uber’s decisions to implement IT automation. They also explored how IT implementations must aim for cost savings, account for growth, and recognize stakeholder needs. Finally, they touched on how Uber takes a layered approach to infrastructure support, whether it’s on-premises or in the cloud.
Data drives decisions for IT automation implementation
Uber is long removed from its rapid growth stage. The company is now focused on consolidation and automation to increase efficiency in its infrastructure.
But how do you know that you’ve become more efficient?
For the network team at Uber, it’s a data-driven approach. When launching a network automation project, one of the first steps Patterson’s team takes is to determine what kind of metrics they can use to measure their current state of work.
The first thing we always do is look at data.
“We first have to gather the metrics on what we’re doing and how much time are we investing currently into this process,” Patterson says. “We would go through our ticketing systems to see how many tickets we’re getting for something, how much time we’re investing into these individual tasks, and then use those metrics to see if it’s something that we should be investing our time in to automate or to streamline or to self-service.”
Automating DHCP reservations: An example of how metrics drive decisions
Patterson used the example of deciding whether to automate DHCP reservations. He says they would start by gathering their service tickets in Jira. By reviewing tickets, they can measure how much time they’ve invested into support requests for DHCP reservations.
Then, they would add in the hours for people performing DHCP-related day-to-day operational tasks.
Whether it’s DHCP reservations, Active Directory, or something else, metrics can illuminate the resource hogs. And those hogs might be potential opportunities for automation.
“We have these metrics that we gather that are saying, hey, we’re spending a lot of time doing group additions to Active Directory or DHCP reservations for BlueCat. And if we start seeing that, hey, 30 hours a week is being invested into these projects, what can we do from an automation point of view or a self-service point of view?” Patterson explains.
“If somebody needs to make a DHCP reservation, configure your system to allow them to do it for their VLANs that they’re working on or anything else like that. And offload that responsibility from your team so that your team can invest itself into other projects that are going on.”
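To make the approach concrete, here is a minimal Python sketch of the kind of tally Patterson describes: sum the hours logged on tickets per task category, add the hours spent on related day-to-day operations, and flag anything above a threshold as a candidate for automation or self-service. The ticket records, category names, and threshold are hypothetical values for illustration only. They are not Uber's data, and the sketch does not call Jira or any real ticketing API.

```python
from collections import defaultdict

# Hypothetical ticket export: (category, hours spent). Illustrative values only.
TICKETS = [
    ("dhcp_reservation", 1.5),
    ("ad_group_addition", 0.5),
    ("dhcp_reservation", 2.0),
    ("dns_record_change", 0.75),
    ("ad_group_addition", 1.0),
    ("dhcp_reservation", 1.0),
]


def hours_by_category(tickets, operational_hours=None):
    """Sum ticket hours per category, then add day-to-day operational hours."""
    totals = defaultdict(float)
    for category, hours in tickets:
        totals[category] += hours
    # operational_hours is an optional {category: hours} mapping (assumed input).
    for category, hours in (operational_hours or {}).items():
        totals[category] += hours
    return dict(totals)


def automation_candidates(totals, threshold_hours):
    """Return (category, hours) pairs at or above the threshold, largest first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(category, hours) for category, hours in ranked if hours >= threshold_hours]


if __name__ == "__main__":
    totals = hours_by_category(TICKETS, operational_hours={"dhcp_reservation": 3.0})
    for category, hours in automation_candidates(totals, threshold_hours=2.0):
        print(f"{category}: {hours:.2f} hours")
```

The threshold is a judgment call. A team would likely set it relative to its own weekly capacity rather than use a fixed number.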
The next step: How to automate
After identifying a potential automation opportunity, Wertkin noted that IT teams must take it a step further. They have to think through how to automate it. What platform and network automation tools will you use to execute it? And what information exists on that platform? Who is the end-user and what level of understanding do they have?
“I often see things like automation being measured just with man-hour savings. But in some cases, you’re just sort of pushing the complexity to somebody else,” Wertkin quips.
IT must aim for cost savings and account for growth
In profit-driven organizations, IT doesn’t generate revenue. Instead, its role is to run as efficiently as possible and save the organization money.
“Our job is to step in and say, ‘How much money can we save the company by doing X, Y, and Z?’ We have to almost flip that theory of money-making on its head and do the opposite,” Patterson notes. “How much money through this project or this project can we save the company in the long run?”
In addition to potential cost savings, Patterson says, it’s also important to account for flexibility and growth. Sure, a new product or service may work for your enterprise now. But what if the company were to grow by 10 or 20 percent?
“Whether you have four offices or 200 across the world, you have to make sure you deploy systems that can scale easily,” he says.
Furthermore, IT teams must factor in how much of an investment in employee time a product or service will require. A service might only cost $100,000. However, it could take three people who earn much more than that in salary and benefits to manage it.
“Total cost of ownership, and then operation expenses, obviously, oftentimes dwarfs the cost of the technology or the product or whatever you’ve implemented,” Wertkin adds.
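The arithmetic behind that point is simple enough to sketch in Python. The $100,000 service cost and the three-person team come from the example above. The per-person cost of $150,000 in salary and benefits, and the assumption that all three people work on the service full time, are hypothetical placeholders rather than figures from the conversation.

```python
def total_cost_of_ownership(license_cost, staff_count, loaded_cost_per_person,
                            time_fraction=1.0):
    """Annual cost of a service: purchase price plus the people who run it.

    time_fraction is the share of each person's time spent on the service
    (1.0 means full time). All inputs are annual amounts in the same currency.
    """
    staff_cost = staff_count * loaded_cost_per_person * time_fraction
    return license_cost + staff_cost


if __name__ == "__main__":
    license_cost = 100_000       # from the article's example
    loaded_cost = 150_000        # hypothetical salary plus benefits per person
    total = total_cost_of_ownership(license_cost, staff_count=3,
                                    loaded_cost_per_person=loaded_cost)
    staff_share = (total - license_cost) / total
    print(f"Total annual cost: ${total:,.0f}")
    print(f"Share spent on staff: {staff_share:.0%}")
```

Under these assumed numbers, staffing accounts for most of the annual cost, which is why the purchase price alone is a poor basis for comparison.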
The key to customer service is understanding stakeholder needs
Indeed, regardless of what self-service or automation tools are implemented, Patterson is focused on providing his customers—Uber’s internal users—with a world-class experience.
Much like Matt McComas uses adoption as a measure of success in automation, Patterson knows which teams use his services and what they’re trying to accomplish. And he constantly engages with them to understand how his services are consumed.
“When we’re doing project planning for the year, I always try to reach out and figure out what projects they’re working on so that I can adjust and have them included into whatever I’m working on as well,” Patterson says.
What a network team might want to accomplish in a given year might not actually benefit the stakeholders who need the services they provide.
“I try to base my projects on the work that I’m accomplishing on what my stakeholders need, not what I think I need,” he adds.
A layered approach to supporting cloud and on-premises infrastructure
When it comes to implementing infrastructure, Uber takes a layered approach that is agnostic to whether it’s in the cloud or on-premises.
The company’s platform team manages both on-premises and cloud-based infrastructure, all the way up to server hosting. Another team is focused on operating system configuration. Then another team owns services.
“Do you need DNS in the cloud? Do you need DNS on-prem? Do you need Active Directory? What do you need and we’ll find a way to use all this flexibility that we’ve built on the layers beneath in order to deploy what you need when you need it,” Patterson says.
This approach avoids the typical on-premises and cloud silos that afflict many large enterprises.
Uber’s teams also prevent single points of failure through both service co-ownership and cross-training.
“We don’t have one person that’s fully responsible for everything. We have the ability to go learn other services in our infrastructure,” Patterson says. “If I’m the sole owner of DDI within our infrastructure, that doesn’t mean I can’t take any vacation.”
To hear all of his thoughts, listen to Ryan Patterson’s full episode on the Network Disrupted podcast. You can also catch Ryan in our Critical Conversation, “Should you DIY your DDI?”