In the network engineering community right now - as in most areas of IT - you can't escape the AI hype. We've been working to understand how network automation will change the way we operate our infrastructure, and agentic AI is being proposed as the missing piece of the puzzle. Folks in the know have been experimenting and making their results available in blog posts and YouTube videos for the rest of the world to salivate over. Finally, it looks like we have taken the right turn towards the self-driving network.
Or have we? Do a handful of small-scale experiments with limited scope and even more limited capability really prove anything? At best, there is a lot more investigation required; at worst, the experiments we don't see are proving that AI is not yet to be trusted with our critical infrastructure.
Networks aren't just collections of individual devices that we configure and that then do what we tell them: they are interconnected, each propagating its view of the world to its immediate neighbors and beyond, creating a "hive mind" behavior for the whole system. And in most cases, our networks are actually networks of networks - interconnected and sharing state information to extend that collective view from user to workload.
In traditional network operations, this meant having multiple teams - each with their own documentation and subject matter experts in their technologies and platforms - who all needed to work together to deliver end-to-end service. Maintenance of the infrastructure required deep collaboration between teams and across silos. A thorough understanding of the networking technologies had to be baked into tooling and documentation to ensure change impacts were tracked and understood.
In the agentic AI world, this is taken to the next level. The work is divided up among agents, each given a small, carefully defined scope to work within: making specific types of change or reporting on specific behaviors. But because of the distributed, interconnected nature of networking, none of those agents can work independently of the others: the effects caused by one will potentially be felt by them all. Without true collaboration between the agents, we cannot trust them to deliver the desired outcomes unless humans who understand the infrastructure manually check everything they do.
In short, AI agents cannot operate the network autonomously without some collective understanding of the end-to-end network.
The Sources of Truth that we have been building for our network automation processes seem to fulfil at least elements of this need. But they alone are not enough, as they represent the desired state of the network, not its current operating state.
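To make that gap concrete, here is a minimal sketch - the class, helper functions, and data are hypothetical stand-ins, not any particular Source of Truth or telemetry API. An agent that reads only the SoT sees the intended BGP sessions, while the live network may be telling a different story:

```python
# Minimal, illustrative sketch: the Source of Truth holds *intended* state,
# the network reports *operational* state, and an agent needs both to spot drift.
# All helper functions and data below are hypothetical placeholders.

from dataclasses import dataclass


@dataclass(frozen=True)
class BgpSession:
    local_device: str
    peer_ip: str
    peer_asn: int


def intended_sessions(device: str) -> set[BgpSession]:
    """Desired state, as a Source of Truth (an IPAM/DCIM tool or a git repo) would describe it."""
    # Static placeholder standing in for an SoT query.
    return {
        BgpSession(device, "192.0.2.1", 65001),
        BgpSession(device, "192.0.2.2", 65002),
    }


def operational_sessions(device: str) -> set[BgpSession]:
    """Operating state, as live telemetry or a show-command scrape would report it."""
    # Static placeholder: one intended session is not actually established.
    return {BgpSession(device, "192.0.2.1", 65001)}


def drift(device: str) -> set[BgpSession]:
    """Sessions the SoT expects but the network does not report - invisible to an SoT-only agent."""
    return intended_sessions(device) - operational_sessions(device)


if __name__ == "__main__":
    print(drift("edge-router-1"))
    # -> the session to 192.0.2.2 (AS 65002) is intended but missing in practice
```

The point is not the diff itself but the second data source: an agent acting on the Source of Truth alone would happily report a healthy network.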
Consider these four key requirements for that source of knowledge: