Why the network industry is struggling to fully adopt network automation

This is a podcast episode titled "Why the network industry is struggling to fully adopt network automation." The summary for this episode is: In this episode of Telemetry Now, Peter Sprygada from Itential joins us to talk about why the networking industry has been so slow to adopt network automation and how observability and automation go hand-in-hand.

Transcript

Philip Gervasi: I got to attend AutoCon 0 recently, which, if you're not familiar, is a new event by the Network Automation Forum, and what an awesome event it was. I mean, how much better can it get than hundreds of network nerds getting together to talk shop? Now, the actual event was about network automation, as the name implies, and specifically about why the industry isn't farther along with it, but there were actually a couple of other themes that emerged as well. So with me today is Peter Sprygada, Head of Product at Itential, who noticed some of those same themes popping up in the main sessions and also in the side conversations. Now, I can't promise that we're not going to talk about AI, so get ready. My name is Philip Gervasi, and this is Telemetry Now. So Peter, thanks for joining today. It's really great to have you on, and it was great to meet you for the first time a few days ago and, like I said in the intro, talk shop a little bit about what we were seeing at AutoCon. So thanks very much for joining.

Peter Sprygada: Yeah, hey, Philip. Thanks for having me. I'm super excited to be here. Anytime I can sit around and just talk tech is a good day.

Philip Gervasi: Yeah, it really is. It really is. I mean, I try to do that at home, and then eyes start to glaze over and people start to exit the room conspicuously. It doesn't really work well. So then I have to resort to the online forums and social media and stuff, which is a love-hate relationship these days anyway. You know what I mean? So you've got to tell me before we get moving, because when I heard your name but didn't see the spelling, I thought, "Okay, Peter Sprygada, that must be Italian." But then I saw the spelling, and I'm like, "That's not Italian." So what is your name's heritage?

Peter Sprygada: It's amazing how many people do think it's Italian. I always ask the question, "Have you ever heard of Sprygada pizza?" No, of course not. Now, the spelling of my name has changed over the years. It was originally spelled S-Z-P-R-Y, yada, yada. And so I tell people, "Oh yeah, that's definitely Polish." So I am of Polish heritage.

Philip Gervasi: Okay, great, great. Maybe I'm talking out of turn here, but from what I understand, the premise of the event was: why aren't we farther along with network automation today, in the networking industry specifically, right? What's funny is I was talking to one of the organizers, Scott Robohn, and I said, "Hey, Scott, one of the things that you guys should do in the sessions or something is talk about why we aren't farther along in network automation." He looked at me, and he's like, "That's what this whole thing is about." So he gently corrected me. But I do agree with that premise, and that's why I brought it up to him: why aren't we farther along? That's one of the things I wanted to talk to you about, because, frankly, your company, Itential, is eyeball deep in automation and programmability. In fact, that's why Kentik, my company, is partnering with you: the convergence of observability and programmability into that closed-loop workflow that network engineers all dream about. An event fires, we see what's going on, understand it, actually push some sort of configuration, validate it, and then get back to keeping the lights on. So why do you think we're not farther along with network automation today?

Peter Sprygada: Yeah, I tell you, you're asking the million-dollar question. There is no doubt about that. In the sessions that we saw at AutoCon, one of the undertones of I think every presentation, no matter what it was on, was: why aren't we further along? I can actually trace my network automation roots all the way back to 2012. I've been doing this now for 12, 13 years, and that's just focusing on network automation, and it is frustrating at times to look back and wonder why we aren't further along. But to answer the question, it really comes down to a couple of things. First and foremost, I think it comes down to fear, fear of what automation really means. It's new, it's different, it's scary, it's fast, it's aggressive. And networking professionals, I think, generally speaking, are conservative by nature. Without question. The other one, and this one's a little bit more tongue-in-cheek, but I think there is some basis in it, is that many individuals who got into the networking industry started out in school, took a CS101 class, their first programming class, and realized, "Hell no, I don't want to do this for the rest of my life." So they moved into networking, and lo and behold, here we are X number of years later, and now we're saying, "Okay, yeah, you've got to become a programmer." And I'm like, "Wait a second, time out. What's going on here?"

Philip Gervasi: Yeah, I think there are a couple of misconceptions out there too. With the fear, some of it's legitimate, but some of it's unfounded, like this idea of fail fast, fail often, and we're all going to be hipster network engineers doing the chaos monkey thing. Well, that's not necessarily true. But if you say that to a network manager or IT manager, whatever, at a major hospital, they're like, "No, we can tolerate exactly zero risk. We don't fail fast, fail often in a medical facility." But that's not the idea of fail fast, fail often. It's not literally going around breaking everything and then seeing how we can improve. The idea is to be more proactive about trying things, in the context of rollback plans and carefully planned projects and things like that. And I think that might be mixed into this incorrect understanding of what network automation is about. I mean, you can automate the collection of information from a closet of switches. Okay, why should there be fear there? There's no real impact. And on top of that, it also misses the point that the network is so impactful to application delivery, to the actual end user, that if I just do a no shut on the wrong interface, there goes my job. So you don't need network automation to have that fear. I did enough cutovers as an engineer where I had plenty of fear, without automation being involved. You know what I mean?

Peter Sprygada: It's funny you mention that, because as someone who has done a no shut on the wrong interface, I appreciate that very much. There are a couple of very interesting, subtle points there. One of them is that you talk about this idea of automation and process going hand in hand. And one of the things we're starting to realize with a lot of teams, and you touched on it, is that the fear really stems from the fact that they don't have the right operational practices, culture, and policies in place, and automation starts to expose that. For so long in the networking industry we've been able to overcome that by sheer will, right? The CLI (command-line interface) hero, who can jump on the box and hammer out the commands at lightning speed, has really masked a lot of the problems with the way we approach managing network infrastructure today.

Philip Gervasi: Yeah, there actually was quite a bit of discussion about culture. In hindsight, I guess I should have expected that, because it does pop up. The DevOps definition is largely about culture, and I heard one person talking about CAMS (culture, automation, measurement, and sharing). So there is that component. Network automation is often incorrectly viewed as, "I've got a couple of scripts, I need to learn Python." And yeah, there are scripts involved with network automation, possibly Python. But the idea is bigger than that; it's the workflows and processes. I don't want to go too far and just equate network automation with orchestration, though, and one of the main speakers there called that out. There was somebody from the New York Times, and I actually became friendly with him, and his name escapes me at the moment, but he spoke and it was great. He made the distinction between a collection of scripts, pushing out scripts, which is basically the CLI at scale and so not that compelling, and orchestration. So Peter, how would you personally define the difference?

Peter Sprygada: To put it in the simplest definition that I can, because we could probably write paragraphs, if not pages, if not books on this topic: automation is really all about automating tasks. It's all about task management. And orchestration is all about the coordination of all of those different automation functions to achieve a business outcome. That's how I would phrase it. But I would back up one moment, because I do want to challenge one thing here, and that is that we tend to take the stance that scripts are not automation, and I disagree with that to a degree. Scripts are still automation. One of the things that prevents the industry from moving forward is that we don't have an end state and goal in mind. It's okay if all automation means to you is writing a 10-line script that puts a description on an interface. Maybe that's all you need, and that's okay. That stance just feeds into the fear factor that if I want to do automation, I have to be a software engineer, I have to have [inaudible], and I have to have CI/CD pipelines. Those are all good things, but automation is really a mindset in terms of how you want to operationalize your infrastructure. I never want to lose sight of that focus.

Philip Gervasi: That makes sense. I've been speaking to what I guess you would call medium-sized enterprise engineers, and I use the term medium-sized the way Cisco does. In their documentation and textbooks over the years, when I was studying for certifications, they'd call something a medium-sized business and then equate it to a 10,000-person organization. I don't know where they get these things from. It's crazy. I'm like, medium-sized, okay, whatever, with a billion-dollar IT budget. So when I say medium-sized, I'm talking about organizations with several thousand people. In my world, there can still be some serious complexity and sophistication there, but not an unlimited budget like a web-scale company or a huge global service provider. So other than the fear factor, or maybe there isn't fear at all, what you have is limited resources, and now you have engineers who are like, "I've been doing this for 18 years, and my goodness, I don't have the time to learn this stuff, and I don't have the resources to go and implement it, because I am busy keeping the lights on." Ironically, a gentleman on the panel there, Carl Newell, made the comment, "You don't need to keep the lights on if they're automated." I'm misquoting him; he said something a little bit different, but that was the spirit of what he said.

Peter Sprygada: It was a great quote though, it really was.

Philip Gervasi: But ultimately that is a problem as well. I was speaking to some enterprise engineers, like one gentleman who's a senior network manager or engineer at a healthcare facility here in the Northeast, and it was just like, "I love this stuff. There's no way I'm able to just start tomorrow or when I get back to work, because I've got a hundred other things to do." So I think that's a problem too, the reality of limited resources in those smaller and, quote unquote, you can't see my air quotes, but quote unquote, medium-sized organizations.

Peter Sprygada: Yeah, it goes to that premise that automation is all about letting software do the repetitive tasks that take valuable time away from network engineers doing actual engineering work. But network automation, even automation in general, needs to become a state of mind. It needs to become a way of thinking about how we do things, and that's what starts to drive some of those behavioral changes, those cultural changes, that are necessary. To expand on that point, there was another presentation that I thought was really interesting. I forget who was giving it, but he put a slide up on the screen that really resonated with me. He had three boxes: the engineering layer, process, and management. The point he was making is that all of those pieces have to care. It can't just be one engineer off in the corridor driving an automation strategy. It really needs to be something that is embraced by the organization from the top down: separating automation and orchestration, understanding their place in the world, and then starting to layer on additional capabilities, like observability, if you will, on top of those things to create that closed loop where we start to achieve some of the outcomes we're looking for.

Philip Gervasi: Yeah, absolutely. And what would you say are some of those outcomes, then? I'm going to preface my question by saying I don't necessarily agree with this idea that the purpose of network automation is to automate away the mundane tasks so I can do the ones I'm more interested in. I don't really buy that, because I've been an enterprise network engineer for most of my career, and I liked going to work. I like what I do; I loved being an engineer. It's not like I was breaking rocks in the field under the sun with a chain around my ankle, but certainly it was still a job. I put a meme out there recently that says, "Network automation helps me automate away all the mundane work so I have more time to do other mundane work." I'm still at work. So I don't buy that as the main compelling reason a network engineer should adopt network automation, from a technical perspective or, as you said, from a cultural mindset. So what would you say is the goal, or are the goals, of network automation?

Peter Sprygada: If there's one thing we can all agree on, no matter where you're coming from in the networking space, it's that this stuff is getting more complex by the day. We're a long, long way from the three-tier architecture, way back in the day of [inaudible] and Edge, when we could keep the entire mental map of the network in our heads and know everything that needed to be done at any given time. So think about all of the different domains that are coming into play now and all of the things we have to do, and it's not done the same way anymore. The way I manage my SD-WAN infrastructure is not the way I'm managing my cloud network infrastructure, which is not the same way I'm still managing some of my traditional network infrastructure. So I think this is where network automation can really start to play out and offer huge value to an organization, because it allows you to offload some of that knowledge into your platforms and let your platforms take on that responsibility, so that you can continue to focus on what I think you got into the industry to do, and that is to engineer networks.

Philip Gervasi: But what you just said, correct me if I'm wrong, is that I can use a network automation workflow, whatever that means from a technical perspective, to unify the operations of divergent systems. That sounds like a really heavy lift. I have vendor A for SD-WAN, vendor B for data center networking, vendor C on my campus, and I'm doing a rip and replace of my wireless, so now I've got a vendor D in the mix just for wireless. How does an engineer even get started and wrap their mind around that?

Peter Sprygada: It's a huge lift, no question. I think you just hit on the crux of the separation of automation and orchestration, and this is a lot of what we talked about at AutoCon: I can automate my domain, and I should automate my domain where it makes sense for me to do so. But there is a layer above that, a horizontal layer across the organization, that allows me to orchestrate across those domains. And it takes time, let's be honest. This is not something that happens in a day, a week, a month, or six months. This is a committed direction for an organization. But when I get there, we start to see, I think for the first time, real attachment of business value to what's happening at the network layer. And I think that is such a key point to make, because for so long we've struggled to understand how what's happening at the network layer translates to the business and adds value to the business, as opposed to the network simply being a drain, a cost center.

Philip Gervasi: That makes sense. Add value to the business, sure, and then you get your approval from your management folks. But the folks who are turning wrenches, both virtual and physical, need to see some value as well. They're not necessarily going to be as concerned about the business value, but they are going to be concerned about what they're getting for all the time and effort they're putting into it. "Oh, I can push config faster," or "I can get more information into my IPAM faster, automatically. Okay, that's cool, but I've got three or four engineers on my team. We could do that pretty quickly. I have to spend three weeks learning this new tooling in order to do that a little quicker? Who cares?" I think that's actually one of the reasons there isn't as much adoption: a lot of engineers don't see much value in just pushing some config, unless it's at huge scale. I think the value is going to be seen more readily and clearly when these systems, whether they're one huge system, partnerships like Kentik and Itential, or even huge homegrown systems, start moving toward automated root cause analysis. The pie in the sky for engineers has always been automated remediation. Who knows if we'll ever get there, but maybe one day we'll have systems looking at the state of the network, whatever that means, and that's an entire series of podcasts unto itself, taking direction from what we see in events and alerts and things like that, making adjustments to the configuration of actual devices, whether virtual, physical, or ephemeral, and then validating that everything's working properly. The way I see it, at the end of the day, this thing that we love that we call the network, which is completely distributed and all wacky today, is ultimately just the substrate.
It's the mechanism by which we deliver applications to human beings, or computer to computer, but nevertheless, it's data, application data. That's really what it is. So I want to know: what's the state of my network? What's the issue with that application delivery, or whatever it happens to be? Push some config automatically, and then tell me if that worked and if the application's performing the way it should. That's very, very compelling to me, and that's where I want to see things go.

Peter Sprygada: I think it is, and there's no doubt that you can start to attach real value here, but here's where I see the challenge, and I would love to peel this one back a little bit. When I sit down with network engineers, whether they're starting their career or they're 30 years into it, and I'm overgeneralizing, so let me state that up front, it is surprising to me how many network engineers don't take their heads far enough out of the bits of the network to understand the systems that are running on top of it, and so understand how to add some of that value. It's like, I can build the network, I can troubleshoot the network, I can optimize the network, I can do a lot of that, but I don't really know what's going on at the layers above me, and I don't care. And I think that's one of the flawed lines of thinking that many network engineers have. They really need to start to understand the needs and requirements of the lines of business they're providing infrastructure for.

Philip Gervasi: Yeah, that's what network observability does specifically: it puts all of the telemetry that we derive from the network and from network-adjacent services, devices, and components, which is very, very diverse and divergent and voluminous, into the context of application delivery, first and foremost. There's other context too; we look at costs and stuff. But ultimately it comes down to: what is the state of this system, and how is it impacting application delivery and, ultimately, an end user's experience? And then when there's a problem, let's push some config. Now, another issue: over the years I've heard folks, mostly vendors, and mostly on blogs and podcasts, say, "Well, I don't trust the system, so I just want a big red button that says let me review it, send it to my change advisory board (CAB), and then push the config. Sure, you can push it automatically. It'll run all these cool playbooks and stuff. Awesome. But I want the final say." And the funny thing is that the more actual in-the-trenches operators I speak to, especially at what Cisco calls medium-sized and smaller enterprises, the less I think that's really an issue. I've spoken to folks who work at large organizations, universities, huge school districts here in the Northeast near New York City, so very big, and healthcare, and they're like, "Wait a minute, so I've got a problem with my wireless or whatever it happens to be, and you're telling me that the system will throw an alert, let me know, and automatically push config and fix it?" "Yeah." "Take my money." I get it: if I've got all sorts of complex overlays in my data center, I need to be involved. There is an element of risk aversion in certain scenarios, but I don't think operators are struggling as much with trusting the system as some would suggest.

Peter Sprygada: Yeah, I would agree. It's not necessarily about trusting the system. In my experience, at least, it's more about trusting the external factors that they can't control.

Philip Gervasi: Oh, yeah, that's true.

Peter Sprygada: That's really where I think a lot of those trust issues come into play. I also think it's very easy, in hindsight and in retrospectives, to look back and say, "Oh, we should have seen that outage coming. We should have been able to easily automate this thing or prevent that thing from happening." And I think we forget sometimes that it's a lot harder to anticipate some of these things than it is to look at them in hindsight and make decisions about how we could have done better. I think that's one of the real big challenges networking teams have, because they're constantly in that reactive mode, and they're not proactively thinking, "How do I make my environment more bulletproof? How do I make it self-remediable?" If that's a word. It's one thing to self-remediate, but it's another thing to build an architecture, design, or infrastructure that is self-remediable.

Philip Gervasi: Yeah, that lends itself to the ability to remediate itself if and when you finally apply those types of workflows and stuff. But I do agree with the bulletproof thing you said, both from my own experience and from years of working with colleagues who share that opinion. Maybe some of our listeners disagree, I don't know, but in my experience, the primary goal for most network engineers is a reliable, stable, predictable network. Very rarely do I hear performance as the top goal. They're like, "Yeah, whatever. I've got a crap ton of bandwidth. We're good. Of course, if there's a latency problem, we'll fix it." Though, as they say, latency is the new outage. Reliability and stability are the key; that's what I've heard year after year after year as the primary goal, and it's been my own experience as well. And I think that once engineers start to adopt the culture and mindset of network automation, and then of course the technical components that follow, they're going to find that it actually helps them reach that goal of reliability and stability. It's not just about pushing config, which is cool unto itself. "Oh, look, I was able to change all my BGP prefixes, or whatever, on all these devices at once." That's cool, and there's fulfillment in being able to do something efficiently. But ultimately, wrapped around all of that is a more reliable overall infrastructure. And I think we've got to remember that when we say network and network infrastructure, we're talking about a huge variety of devices from different vendors, some of which we only lightly manage. Maybe it's some cloud construct that we have limited access to. Maybe it's just campus boxes that are still physical, with console cables and a console server, who knows?
So there's an incredible amount of complexity, not to mention the services that are important to the network, like DNS, load balancers, maybe your IPAM, and things like that. There's a lot going on there that can lend itself to unreliability and instability. So I think adopting this culture and mindset, and the technical components of network automation, is going to help get us there, to a more reliable and stable network. And that's even in networks that don't have hundreds of thousands of end users around the world. Just in your 10,000-person organization.

Peter Sprygada: Your medium-sized enterprises.

Philip Gervasi: And I use the word just with air quotes because the reality is I was a VAR engineer for many years. There's a lot of complexity in those multiple data centers-

Peter Sprygada: For sure.

Philip Gervasi: Mission-critical wireless. The largest hospital system in my area, they've consolidated, so they have a bunch of hospitals, and they have something like 18,000 end users. That's the largest medical system in my area, and it's 18,000. So to Cisco, that's an SMB. Yeah, but they've got operating rooms and heart transplants. I don't know if they do heart transplants, but you know what I'm trying to say: helicopters landing on roofs. This is all mission-critical, serious stuff here.

Peter Sprygada: It is. It is, absolutely. And one of the things that disappointed me coming out of some of the sessions at AutoCon, and not just there, from customers as well, is that here we are in almost 2024, at the end of 2023, and my goodness, we're still talking about how to get to closed-loop automation. We still haven't gotten to the point of understanding that it's one thing, and I think you touched on it, right, it's one thing to configure a network. And you're right, there is a lot of satisfaction in seeing the BGP peer come up, yay, and seeing my routes exchange, all right, I'm excited. But it can't end there, and it is ending there, and that's disappointing. It really needs to take that next step, because we have to have a sense of what the infrastructure is doing. Otherwise there's no way we're ever going to get to some of the promised lands we all want to reach, whatever that may be, whether that's self-driving networks, which I think were discussed at AutoCon, or self-healing networks, or whatever self-et-cetera you want to put there. In addition, once we can make this stuff almost second nature, it allows us to start looking at the next thing we can layer on top of the network infrastructure. We're not there yet, but I'm hopeful this is the beginning of us being able to move in that direction.

Philip Gervasi: Oh yeah, we're definitely moving in that direction, I agree, even if it's just in terms of awareness, right? We saw it ourselves: there were almost 400 people at AutoCon, and there were several questions where the presenters said, raise your hand if you're doing this, or raise your hand if you're doing that. The surprising reality is that a lot of people aren't doing that much yet, but they were there. They spent real money to be there, whether their company spent it or they spent it themselves, to move forward on their own network automation journey, whether from zero, the starting line, or from wherever they were. So I'm encouraged. There were vendors there too, not a lot, but a handful sponsoring the event and making it happen. Going around those tables and talking to folks like you guys at Itential and some other companies, I really came to understand that there's a whole community built around this, an entire ecosystem of stuff online and people, and what a small world it is too, so I think the resources are there to move forward as well. But one thing I heard in a couple of presentations, which you and I chatted about, is a little different from what we've been talking about with network automation: the introduction of AI, artificial intelligence, and maybe more specifically LLMs, and how it can change, or maybe already is fundamentally changing, IT operations. Now, that is a can of worms that I just opened. They're all over the place now, so let's try to pick them up.

Peter Sprygada: I was just thinking, so we're going to go there, huh?

Philip Gervasi: It came up, and it's the big thing to talk about these days. And the thing is, I work in marketing now. My title is Director of Technical Evangelism, but I was a network engineer for many, many years. I've been on the side of the table where marketing or sales came in and did a little song and dance, and I wasn't interested. I wanted to know the hard facts and details. Show me the PCAP. Let me see how this thing works and solves my problem. So when I heard things back in the day about SDN and some other technologies over the years, a lot of it was smoke and mirrors and eye rolls. But I am seeing actual technology with regard to LLMs and AI being integrated with various systems, not even necessarily networking, but tech-related and tech-adjacent systems, and actually having a positive impact. Sure, it's on the Gartner hype cycle, we can track it there, maybe we're in the trough of disillusionment, but I actually think there's very much something here. What is your opinion? Do you think this is all smoke and mirrors, or is there some value to AI and LLMs specifically in the realm of IT operations?

Peter Sprygada: Yeah, it's a fascinating time to be in this industry. There is no question about it. When you step back and look at it, if I take off my engineer hat and my product hat and all the different hats I wear, and just look at it as a fan of technology, it's exciting to see what's going on. I think there is applicability. The question that immediately comes up in my mind, though, when I think about the state of networking today, and you even touched on it in a previous comment about trust issues, is whether the network is ready. Are we at a point where we're even ready to have this discussion? There's no question that there are forward thinkers out there, and there are some interesting applications of LLMs to networking on the periphery, but is it really going to make its way into the mainstream? I'm really struggling with that at the moment. I'm not saying it won't, but I'm fearful that we're getting way out over the tips of our skis here, if I can use the old analogy.

Philip Gervasi: I'm going to disagree with you on this one, Peter, because I'm thinking in terms of networking. Maybe it's not as applicable in other areas, but I'm a network person, so this is where my brain goes. I think, number one, if we're talking about it, it's not too early to talk about it, because we're talking about it. That's a circular argument, but we're talking about it. It's a real technology. We can talk about natural language processing and n-grams and stochastic reasoning and grammatological reasoning. These things are real, and they exist in math. We can look at the algorithms. Now, are they relevant, or are we looking over... what is it... over the tips of our skis? What did you say?

Peter Sprygada: Out over the tips of our skis.

Philip Gervasi: By the way, that was the first time I ever heard that and I'm going to-

Peter Sprygada: Oh, really?

Philip Gervasi: Yeah. But I disagree, because I think it will soon become very impactful to IT operations, and it's already beginning to be. I'm thinking specifically of the incredible volume and diversity of data that we ingest from network devices, network-adjacent devices, and the services that we rely on, especially, of course, at visibility and observability companies. The amount of information, the types of information, and the variety of formats coming into our underlying UDR is tremendous. So how does a human engineer figure that out and interface with that data? Let's take AI out of it. Just applying data analysis workflows and some of the statistical analysis algorithms that we learned as sophomores in college, nothing fancier than that, is still going to be a great aid in moving forward and in understanding how these data points relate to each other. So for example, I have millions of packets per second coming in, and over here I have flow data, so I get that. And over here I have some metric from my DNS server. These are all very different. So we normalize and transform them, and that's all machine learning preprocessing. I can look at [inaudible], all that really cool stuff, but that's still difficult. And I think that with LLMs, especially as they get more sophisticated within the domain of IT and networking (and that's going to take time, to reduce hallucinations and increase accuracy), we can interface with that data more quickly. Something I've said on many podcasts before is, if I had a team of 30 engineers who all had PhDs from MIT, I could probably get away with that, and they could do the work and figure out all these answers. But even then it would be slower than a machine could do it. So I think, like anything else, it's iterative. I do believe that we'll experiment with using this technology in networking, and in my case observability, and then of course fine-tune and adjust.
And I use that term fine-tune specifically because that is something that we do with LLMs, right? We fine-tune the temperature, for example, so we can reduce hallucinations and other things. I think there's going to be significant value for an engineer just saying, "What the heck is going on? What is this data telling me?" and using natural language to do it. I don't know how soon that's going to be, but Peter, I say this all the time... I don't know if you're a Star Trek person like me, but I love how Geordi La Forge, Commander or Lieutenant Commander depending on which episode, talks to the Enterprise computer and says, "Computer, what's the deal with the EPS conduit?" or whatever he said. And the computer does its thing and gives him an answer. That doesn't mean he isn't an engineer anymore. He still has to solve the problem. But I think that's one of the ways that we're going to see some improvements in IT operations. I could be wrong.
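Earlier in this answer, Philip describes normalizing and transforming packet rates, flow data, and DNS server metrics before relating them to each other. One common way to do that is a z-score transform, which rescales each series to mean 0 and standard deviation 1 (min-max scaling is another option). The following is a minimal sketch under stated assumptions, not Kentik's or any other vendor's actual pipeline: the metric names and sample values are invented, and it assumes the three series are already aligned to the same timestamps.

```python
from statistics import mean, pstdev

def zscore(series):
    """Rescale a series to mean 0 and population standard deviation 1."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series has no variation to rescale; map every point to 0.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

# Hypothetical per-minute samples from three very different sources.
# Units differ by orders of magnitude: packets/s, bytes, milliseconds.
telemetry = {
    "packets_per_sec": [1_200_000, 1_250_000, 1_180_000, 2_900_000],
    "flow_bytes": [4.1e9, 4.3e9, 4.0e9, 9.8e9],
    "dns_latency_ms": [12.0, 11.5, 12.3, 48.0],
}

# After rescaling, every series is unitless and directly comparable.
normalized = {name: zscore(values) for name, values in telemetry.items()}

for name, values in normalized.items():
    print(name, [round(v, 2) for v in values])
```

Once rescaled, the last sample stands out as the largest value in every series, even though the raw units (millions of packets, billions of bytes, tens of milliseconds) could not be compared directly.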

Peter Sprygada: It would be great if we did. I think one of the things that leads me down this path of skepticism, though, is when you think about... and I'm talking about networking specifically now, within the broader domain of IT... but when you think about networking, there is still a lot in terms of the design, the realization of the design, and the implementation that is a lot more art than it is science. And that's where I start to become concerned. Sure, give me a long enough runway and I'm sure we can train LLMs to understand the number of permutations in someone's head for making sure that this app continues to run or that app continues to run. The question is, is it worth the effort to get there? I don't know yet. And that's a lot of what I'm thinking about in terms of how this is going to start to realize itself in the infrastructure. You listen to some of the big thinkers in our industry, around AI and ML (and we got to hear from one of them at AutoCon, [inaudible] Capella), and you listen to the way they present it, and you walk away thinking it makes perfect sense. But then you get back to your desk, sit down, and say, "How do I apply this to my world today?" And that gap is still so big.

Philip Gervasi: Yeah, true.

Peter Sprygada: So big that you can become lost in this journey, to the point where you turn yourself in circles so many times that you end up doing nothing. That's where I start to worry about how this is going to move forward.

Philip Gervasi: Yeah, that's a good point. I do have some friends, I'm not going to call them colleagues because they don't work in networking, but I have some friends, one in particular, a PhD from, I think, Texas (whatever, he's from Texas), who works for a... I don't want to say the company's name, but it's a large global company with over 300,000 people and multiple BUs. One of their BUs is oil and gas, another one is nuclear, healthcare, things like that. And they use ML, which is the technical component of the broader term AI, to find correlations and predict medical issues, specifically looking at MRI scans and data based on who you are: your age, location, race, things like that. They use that to find strong and weak correlations and causal relationships, and it's literally a mathematical correlation coefficient. So there's a use case, and it's very, very valuable to them. So I think the technology is actually already there. How do we integrate this into what we're doing in IT? We have ephemeral information, containers that live for a short time, or maybe the IP address lives for a short time. And here's where I agree with you, the whole art form thing. There is a subjective component here, like the end-user experience: is this application a little bit too slow, or is this webpage... or the example I've used in the past: should I care about this alert? Should the system send this alert to an engineer? So say I have two 100 gig links in a data center and they're active/standby, which I think is poor design; you want to use them both. Let's say you have two 100 gig links, one of them is being utilized as your active, and your standby is dribbling along at literally one meg per day on average, and then it spikes to 10 megs. That's a statistically huge increase, but absolutely irrelevant to application performance for the end user. Well, there's a subjective component there. An engineer would say, "I don't care. Don't worry about it."
How do you add subjectivity into literal math that lives in Python, in Jupyter notebooks? So I think the whole art form thing is a challenge, but we're solving that too. Maybe we don't alert on it, but if we see it go from 10 to 20 and then from 20 to 30, day by day, we see a trend, and then we alert. I don't know the answer, but I do agree with you: there's a lot more complexity. The cool factor, it makes sense. How do I actually apply it and get value? That's a little harder, especially for an engineer who's sitting there literally trying to keep the lights on. And when I say lights, I mean the little green lights, hopefully all green, on their switches, right?
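The standby-link example can be made concrete by separating the statistic (a 900% jump) from the engineer's judgment about whether it matters. The sketch below is hypothetical throughout: it treats "one meg per day" as a daily average in Mbps, and the 100 Gbps capacity, the 1% utilization floor, and the three-day trend window are illustrative assumptions, not recommended values or any product's alerting logic.

```python
LINK_CAPACITY_MBPS = 100_000  # assumption: one 100 Gbps link, expressed in Mbps

def percent_change(old, new):
    """Relative change, which is what makes a 1 -> 10 Mbps jump look dramatic."""
    return (new - old) / old * 100

def utilization_pct(mbps, capacity_mbps=LINK_CAPACITY_MBPS):
    """Share of the link's capacity actually in use."""
    return mbps * 100 / capacity_mbps

def rising_trend(daily_mbps, days=3):
    """True if the last `days` daily averages increase strictly day over day."""
    recent = daily_mbps[-days:]
    return len(recent) == days and all(b > a for a, b in zip(recent, recent[1:]))

def should_alert(daily_mbps, min_utilization_pct=1.0, trend_days=3):
    """Hypothetical engineer-defined policy: alert on a sustained rise,
    or when utilization crosses a floor that actually matters to users."""
    latest = daily_mbps[-1]
    return (rising_trend(daily_mbps, trend_days)
            or utilization_pct(latest) >= min_utilization_pct)

standby_spike = [1, 1, 1, 10]   # about one meg per day on average, then a spike to 10
standby_trend = [10, 20, 30]    # 10 to 20, then 20 to 30, day by day

print(percent_change(1, 10))         # a 900% jump: statistically dramatic
print(utilization_pct(10))           # but a tiny share of a 100 Gbps link
print(should_alert(standby_spike))   # the one-off spike stays quiet
print(should_alert(standby_trend))   # the sustained rise triggers an alert
```

The design choice here is that the "subjective" part lives in parameters an engineer sets (`min_utilization_pct`, `trend_days`), so a rule based on percent change alone would page someone for the spike, while this policy only reacts to the day-over-day trend Philip describes or to load that is large relative to the link.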

Peter Sprygada: That's absolutely right. Absolutely right. If engineers are struggling today to adopt automation because they don't want to write 30 lines of Python, getting them to AI/ML is going to be a challenge.

Philip Gervasi: I don't see them adopting it. Right now they're going to go learn Python. You're going to go take a course on Coursera or work through Learn Python the Hard Way, whatever; you're going to learn Python. Maybe you start with Terraform, that's easy, whatever it happens to be. Certainly, there's effort, but it's not like you're going out and writing algorithms and applying ML models. You're not doing that. I have a feeling that that's going to be delivered as a service, in the sense that networking vendors, observability vendors, and automation vendors are going to integrate it within their own systems and offer it as part of their overall platforms.

Peter Sprygada: That I do agree with. Yeah, that I absolutely agree with. I don't foresee organizations, at least... again, I'm overgeneralizing, but I don't foresee a majority of organizations embarking on an ML strategy that is unique to them. It's going to come from products, it's going to come from services, it's going to come from their vendors.

Philip Gervasi: Yeah. Which is tough, because you've got to train a model on your own network. I think it was Greg Ferro who always likes to say that every network is a snowflake, right? In the sense that 80% of all networks are the same. It's TCP/IP, switches, routers, hubs... Did I just say hubs?

Peter Sprygada: Wow.

Philip Gervasi: It just came out of my mouth.

Peter Sprygada: What year are we?

Philip Gervasi: I don't know. We're going back in time. Hopefully there are no hubs in your network. But 80% of all networks are the same, and then you always have that 20%, where I have the pets-versus-cattle thing. Oh, I remember that switch when I installed it, and I've got some really interesting PBR and some custom route maps that I lovingly came up with. And when you have that 20% that's different from network to network, that implies that, sure, you have models that you can apply in general, but for your own specific network, you need to train that LLM to be able to answer questions relevant to your network. And then underneath that LLM, because remember, the LLM is just the NLP section, it's just the natural language interface, you need something that's able to actually look at the data, find the correlations, and do the fancy cool stuff. So what, do I go out and buy a product and have it sit there and train on my own network for a month, or many months? I don't know. I don't know. There are a lot of questions there. A lot of questions.
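Philip notes that underneath the LLM's natural-language interface "you need something that's able to actually look at the data and find the correlation," and earlier that his friend's medical use case comes down to "literally a mathematical correlation coefficient." The most common such coefficient is Pearson's r. The sketch below computes it from scratch for two series; the metric names (CRC errors, application latency) and the numbers are invented for illustration and are not drawn from any real network or product.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series (-1..1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances without the 1/n factor; it cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-minute samples from one network.
crc_errors = [0, 2, 1, 15, 30, 28, 3, 0]
app_latency_ms = [40, 42, 41, 95, 160, 150, 48, 39]

r = pearson_r(crc_errors, app_latency_ms)
print(f"r = {r:.3f}")  # close to +1: the two series rise and fall together
```

A value near +1 only says the two series move together; on its own it does not show that CRC errors cause the latency. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.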

Peter Sprygada: There is. There absolutely is. As skeptical as I am, again, there's that technologist in me, and it is still exciting to see and think through what the possibilities are. And I think that's maybe the most important takeaway, at least for me, around AI and what it means long-term for the networking industry: we can start to see some of the possibilities, and as we continue to work to get our houses in order around people, process, and tools, we can start to explore what some of these utopian visions could be for how we build and operate infrastructure.

Philip Gervasi: Or dystopian, depending on how the AI thing goes.

Peter Sprygada: Indeed, indeed.

Philip Gervasi: Yeah. No, but I think it made sense that that conversation was happening at AutoCon, an event all about network automation; they kind of go hand in hand. I mean, AI is this concept of a closed-loop workflow with auto-remediation and intelligence within the system, beyond just an intelligent control plane rerouting my traffic: looking at correlations and making decisions based on rule sets and all these things. It makes sense that it's part of a conversation around network automation, which is a workflow of looking at events and then pushing config in response to them. We're just adding that intelligence over the top at some point. So yeah, I think it made a lot of sense, and I really enjoyed talking about it. I got corrected in the AutoCon Slack because I made a comment about how you can remediate or mitigate hallucinations. There was somebody there much smarter than I am who corrected me and showed me some resources that I could read. So it was a really cool learning event for me in that regard as well. I hope it was for you too, by the way.
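Philip frames network automation as a workflow of looking at events and then pushing config, with intelligence layered on top at some point. The sketch below shows a single pass of that loop under loudly stated assumptions: `FakeDevice`, its `push_config` method, the `uplink_util_pct` metric, and the interface snippet are all hypothetical stand-ins, and no real device API or vendor platform is being modeled. The fixed threshold comparison is the piece that the "intelligence" he describes (correlation, learned baselines) would eventually replace.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    device: str
    metric: str
    value: float

@dataclass
class Rule:
    metric: str
    threshold: float
    remediation: str  # hypothetical config snippet to push when the rule fires

@dataclass
class FakeDevice:
    """Hypothetical stand-in for a real device API; it only records the config it receives."""
    name: str
    pushed: list = field(default_factory=list)

    def push_config(self, snippet):
        self.pushed.append(snippet)

def closed_loop_pass(events, rules, devices):
    """One observe -> decide -> act pass; returns the (device, snippet) actions taken."""
    actions = []
    for event in events:
        for rule in rules:
            # Decide: a simple rule set stands in for the "intelligence" layer.
            if event.metric == rule.metric and event.value > rule.threshold:
                # Act: push the remediation to the device that raised the event.
                devices[event.device].push_config(rule.remediation)
                actions.append((event.device, rule.remediation))
    return actions

devices = {"leaf1": FakeDevice("leaf1")}
rules = [Rule("uplink_util_pct", 80.0, "interface Ethernet2\n no shutdown")]
events = [
    Event("leaf1", "uplink_util_pct", 92.5),  # above threshold: remediate
    Event("leaf1", "uplink_util_pct", 40.0),  # below threshold: ignore
]

actions = closed_loop_pass(events, rules, devices)
print(actions)
```

A production loop would typically also re-read telemetry after acting, to confirm the remediation had the intended effect before treating the loop as closed.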

Peter Sprygada: It absolutely was. Getting an opportunity to really hear firsthand what some organizations are doing... it's nice to finally see us getting to this point in our industry. I've been doing this for the better part of 30 years, and I can remember a time when you didn't talk about what you did in your network; that was your secret sauce. You didn't talk about your design or how you built your configs, because that was intellectual property. You'd never let that information go. And it's nice to see that, by and large, I think we've overcome that at this point. People are starting to realize the benefit and the value of community, sharing, and idea exchange, and the Network Automation Forum really did a good job bringing these groups together and giving us a nice wide variety of sessions, like you said, everything from AI/ML to observability to understanding automation and orchestration. The whole event really came together in a fantastic way.

Philip Gervasi: Yeah, yeah, I agree. And based on the vendors that were there, there are options to work with a partner to adopt a network automation mindset, culture, and actual technical practice, with the help of a vendor like Itential that literally specializes in that and has the team behind it to make it happen. The professional services, I assume, right?

Peter Sprygada: Absolutely.

Philip Gervasi: And therefore the skill set to make that work for your particular network. So it was a lot of fun, and I'm looking forward to AutoCon 1. I assume that they're going to call it AutoCon 1 since we started at zero. I don't know.

Peter Sprygada: I'm assuming so.

Philip Gervasi: Whatever they call it, I am looking forward to it and to being there. It was cool to get to the event and immediately see, these are my people; it felt very familiar, and I enjoyed that. Anyway, Peter, it was really great to have you on today. I hope to speak to you again soon. But for now, how can folks reach you online if they vehemently disagree with something that you said and want to yell at you?

Peter Sprygada: Please do reach out and point out everything I'm wrong about. I love to hear when I'm right, but we all grow by hearing when we're wrong. That is for sure. But yeah, you can reach out to me. Probably the easiest way is on Twitter. I'm at privateip, so you can hit me on Twitter. You can also hit me on Mastodon at privateip, and that's the easiest way to get in touch with me.

Philip Gervasi: Great, thanks. And you can find me online. I'm still on Twitter at Network_Phil. You can search for me on LinkedIn. I'm on Bluesky now as well. And my blog is networkphil.com. Now, if you have an idea for an episode of Telemetry Now, or you'd like to be a guest, I'd love to talk to you. You can reach out at telemetrynow@kentik.com. So for now, thanks for listening. Talk to you soon. Bye-bye.


Today's Host

Phil Gervasi

Head of Technical Evangelism at Kentik

Today's Guests

Peter Sprygada

Vice President of Product Management at Itential

Peter Sprygada serves as the Vice President, Product Management at Itential after serving as the Chief Technology Officer at Pureport where he was responsible for their multi-cloud network as a service interconnect platform. Prior to Pureport, Sprygada was a Distinguished Engineer for Red Hat, where he played the role of Chief Architect for the Ansible Automation Platform. Sprygada also held senior technical and leadership positions at Arista and Cisco, as well as several networking startups.