Dijkstra’s algorithm won’t be replaced in production routers any time soon

(2026/02/10)


Systems Approach Last year a couple of people forwarded to me the same article on a [1]new method of finding shortest paths in networks.

The [2]underlying research claims to improve on the classic approach pioneered by [3]Dijkstra that is taught in most networking textbooks (including [4]ours). I was initially a bit skeptical, much as I would be if I read that the [5]Riemann Hypothesis had been proved.

Dijkstra is a legend in computer science and his algorithm, which he published in 1959, predates packet switching by a few years. Consider the [6]specification for OSPF (Open Shortest Path First), one of the two dominant link-state routing protocols (Intermediate System-to-Intermediate System, aka IS-IS, is the other).

Its guidance to implementors is unusually detailed and basically tells them to use Dijkstra’s algorithm. And that is what most implementations have done for decades, with a few minor improvements through the years to speed things up but no real change to the fundamentals.

The new algorithm is no minor tweak to Dijkstra’s but a significantly different approach. Its headline claim is that, whereas Dijkstra requires a sorting operation, and thus is only able to perform as well as the best sorting algorithm, this new approach “breaks the sorting barrier”. That is, it avoids the need for sorting and is able to deliver better bounds on performance than Dijkstra.
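
To see where sorting enters the picture, here is a minimal sketch of Dijkstra’s algorithm in Python (my own textbook-style illustration, not code from any real router or from the paper). The priority queue is the thing to notice: nodes are settled in sorted order of distance, which is exactly the dependency the new work claims to sidestep.

    import heapq

    def dijkstra(graph, source):
        """Single-source shortest paths. graph maps node -> [(neighbor, weight), ...]."""
        dist = {source: 0}
        pq = [(0, source)]            # min-heap keyed on tentative distance
        settled = set()
        while pq:
            d, u = heapq.heappop(pq)  # extract-min: this is where the "sorting" lives
            if u in settled:
                continue              # stale entry; a shorter path was already found
            settled.add(u)
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd      # relax the edge and requeue
                    heapq.heappush(pq, (nd, v))
        return dist

For example, dijkstra({"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}, "a") returns {'a': 0, 'b': 1, 'c': 2}.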

While I don’t consider myself qualified to evaluate the paper that describes the new algorithm, it has passed peer review at a top-tier conference (the [10]ACM Symposium on Theory of Computing) and received enough scrutiny that I don’t doubt that the theory works. The question I want to discuss here is: does it matter?

Two main issues came immediately to mind when I tried to assess the implications of a theoretical improvement in algorithmic performance. The first is: what are the actual scaling limits that we need to address in a real routing system? For example, the running time of Dijkstra’s algorithm is O(n log n + m) for a network of n vertices (routers) and m edges (links). The new approach claims O(m log^(2/3) n), which is clearly going to be less for large enough n. (I had to take a refresher course on log notation before I could even write that sentence with any confidence.) The problem with assessing the scaling properties of anything is that you have to wonder how big n must get before it makes a difference. There are constant factors that might differ between the two approaches; at small n, a “less scalable” approach might actually display better performance.
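
As a back-of-envelope illustration, you can plug numbers into the two bounds and watch how they compare. Everything below is an invented assumption of mine: the constant factors are set equal (almost certainly wrong in practice) and the network is assumed sparse with an average degree of 4.

    import math

    # Toy comparison of the two asymptotic bounds discussed above.
    # Constant factors assumed equal; treat the output as illustrative only.
    def dijkstra_bound(n, m):
        return n * math.log2(n) + m            # O(n log n + m)

    def new_bound(n, m):
        return m * math.log2(n) ** (2 / 3)     # O(m log^(2/3) n)

    for n in (100, 2_000, 100_000, 1_000_000):
        m = 4 * n                              # assume a sparse network, average degree 4
        print(f"n={n:>9,}  Dijkstra~{dijkstra_bound(n, m):>13,.0f}  new~{new_bound(n, m):>13,.0f}")

With these particular assumptions the newer bound doesn’t overtake the classic one even at a million routers; the crossover sits somewhere around n = 2^50. Change the assumptions and the picture changes, which is rather the point.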

What scaling limit?

One of my first jobs was part of the team building a [12]scalable packet switch based on arrays of small switching elements. (This is where I got to build the [13]accidental SmartNIC.) We had papers to show that we could build switches with thousands of ports at 155 Mbps, in an era when shared Ethernet ran at 10 Mbps and had not yet been overtaken by switched Ethernet.

While we at Bellcore were investing lots of time and money to put together a set of 32-port prototype switches, [14]FORE Systems delivered commercially successful 16-port switches. Almost certainly they were not as scalable as our design, but it turned out that n=16 was a pretty useful capacity for switches with 155 Mbps links in the 1990s. We felt quite sad that our research seemed to have been overtaken by commercial products. My takeaway was that scalability was a good thing to strive for, but that sometimes you might achieve a good result with a less scalable solution. One of my favorite textbooks, “[15]Principles of Computer System Design”, uses the example of stopping a [16]supertanker to demonstrate the scaling limits of a system. The fact that supertankers have a scaling limit doesn’t prevent them from being the backbone of oil shipping. You just need to avoid making them too big.

What’s a large value for n in SPF calculations? I checked in with a couple of colleagues to update my recollection of how many routers you might find in a big OSPF or IS-IS backbone. In my memory it was in the hundreds; for the largest service provider networks today it seems to be in the low thousands. So that’s not tiny, but it’s small compared to, say, the number of prefixes carried in BGP.

And it doesn’t seem to be limited by the performance of the SPF calculation, as I will explain below.

The many facets of performance

Another memory I have from my time working for Big Router was an analysis of all the things that go into the performance of routing protocols. I was working on MPLS in its early days, and we were pretty excited about a technology called “fast reroute” (FRR) which uses MPLS to divert packets around a failed link without waiting for routing to converge after the failure. But at the same time, the people responsible for routing protocol development were hard at work improving routing convergence times. One of the most important things, it turns out, for both MPLS and standard routing, was simply detecting failure faster. For example, if you have to wait for tens of seconds of missing OSPF Hello packets before you declare a link down, it doesn’t really matter if you can calculate the shortest path in a fraction of a second. This was the reasoning behind the creation of [21]BFD (Bidirectional Forwarding Detection): a fast mechanism, independent of routing, by which link failures could be detected for any type of link.
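
To put numbers on that, compare Hello-based detection with BFD. The timer values below are illustrative configuration choices of mine, not requirements from the RFCs, though a 10-second Hello and a 4x dead interval are typical OSPF defaults.

    # Illustrative failure-detection arithmetic.
    # OSPF: a neighbor is declared down after the dead interval, commonly
    # four times the Hello interval.
    ospf_hello_interval_s = 10.0
    ospf_dead_interval_s = 4 * ospf_hello_interval_s        # 40 seconds

    # BFD: detection time is roughly the transmit interval times the
    # detect multiplier (50 ms x 3 is a common sort of setting).
    bfd_tx_interval_s = 0.050
    bfd_detect_mult = 3
    bfd_detection_s = bfd_tx_interval_s * bfd_detect_mult   # 150 ms

    print(f"Hello-based detection: {ospf_dead_interval_s:.0f} s")
    print(f"BFD detection:         {bfd_detection_s * 1000:.0f} ms")

Against a 40-second detection delay, shaving milliseconds off the SPF run is noise; with BFD bringing detection down to 150 ms, the other steps start to matter.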

Beyond fast failure detection, other factors playing into routing convergence include: the time to send a new link-state packet out a link and the propagation delay across the network; the time to receive such a packet and dispatch it to the right process in the operating system; SPF time; the time to update the routing information base; the time to calculate the impact on the forwarding table; the time to push any forwarding table updates out to the line cards (on big routers that have such things); and the time to flood the link-state packet on to neighbors. All of these steps have been analyzed and optimized over the years, to the point where sub-second routing convergence is now routine; as early as 2003, improvements to all of the above had made it possible, as [22]this NANOG presentation shows. Yes, you couldn’t afford to spend 10 seconds doing an SPF calculation if you wanted fast convergence, but that was already a solved issue by then. Optimizing all the other parts was at least as important as improving the SPF calculation time.
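
A rough convergence budget makes the same point. Every number below is invented for illustration (real figures vary wildly by platform and era), but the shape matches the argument above: the SPF run is one small slice of the total.

    # Hypothetical millisecond budget for one convergence event.
    budget_ms = {
        "failure detection (BFD)":      150,
        "originate and flood LSA/LSP":   50,
        "receive and dispatch":          10,
        "SPF calculation":               20,
        "RIB update":                    50,
        "FIB computation":              100,
        "push FIB to line cards":       200,
    }
    total = sum(budget_ms.values())
    for step, ms in budget_ms.items():
        print(f"{step:<30} {ms:>5} ms  ({100 * ms / total:4.1f}%)")
    print(f"{'total':<30} {total:>5} ms")

Halve the SPF time in this (made-up) budget and total convergence improves by under 2 percent.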

Finally, I chatted with another colleague about this topic and he reminded me of a reason that Dijkstra’s algorithm remains the implementation method of choice: it can be understood by the people who have to write code. Dijkstra himself put it well in a 2001 interview:

The publication is still readable, it is, in fact, quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. I learned later that one of the advantages of designing without pencil and paper is that you are almost forced to avoid all avoidable complexities.

In other words, [24]Keep It Simple, Stupid. I would certainly rather point an engineer at the [25]OSPF spec than send them off to understand the [26]hybrid Bellman-Ford/Dijkstra approach that might shave a few milliseconds off the non-bottleneck part of routing convergence. Maybe one day someone will write an explanation of the novel SPF approach that is as clear as Dijkstra’s paper and the OSPF spec. And the hybrid algorithm might be great for large mapping applications. But I don’t see Dijkstra’s algorithm being replaced in production routers any time soon. ®

Larry Peterson and Bruce Davie are the authors behind [27]Computer Networks: A Systems Approach and the related [28]Systems Approach series of books. All their content is open source and available for free on [29]GitHub. You can find them on [30]Mastodon, their newsletter [31]right here, and past The Register columns [32]here.


[1] https://www.quantamagazine.org/new-method-is-the-fastest-way-to-find-the-best-routes-20250806/

[2] https://arxiv.org/pdf/2504.17033

[3] https://en.wikipedia.org/wiki/Edsger_W._Dijkstra

[4] https://book.systemsapproach.org/internetworking/routing.html#link-state-ospf

[5] https://en.wikipedia.org/wiki/Riemann_hypothesis

[6] https://datatracker.ietf.org/doc/html/rfc2328

[10] https://acm-stoc.org

[12] https://ieeexplore.ieee.org/document/105175

[13] https://systemsapproach.org/2021/03/15/the-accidental-smartnic/

[14] https://en.wikipedia.org/wiki/FORE_Systems

[15] https://ocw.mit.edu/courses/res-6-004-principles-of-computer-system-design-an-introduction-spring-2009/pages/online-textbook/

[16] https://en.wikipedia.org/wiki/Oil_tanker

[21] https://www.rfc-editor.org/rfc/rfc5880

[22] https://archive.nanog.org/meetings/nanog29/presentations/filsfils.pdf

[24] https://book.systemsapproach.org/internetworking/routing.html#key-kiss

[25] https://www.rfc-editor.org/info/rfc2328

[26] https://arxiv.org/pdf/2504.17033

[27] https://book.systemsapproach.org/

[28] https://www.systemsapproach.org/

[29] https://github.com/SystemsApproach

[30] https://discuss.systems/@SystemsAppr

[31] https://systemsapproach.org/newsletter/

[32] https://www.theregister.com/Tag/Systems%20Approach



A Non e-mouse

Dijkstra’s algorithm is only run when there is a change in the network topology; it isn't computed for each packet. Unless there is a step-wise improvement in speed or ease of coding of the new algorithm, there's little incentive to replace it.

Lee D

There's no incentive, because like all these things you can come up with an "any" answer quite quickly, while the optimal answer can follow later.

It really doesn't matter how long the algorithm takes to run so long as you can form *some* kind of correct answer quickly enough if necessary, and then you have all the time in the world to find an optimal answer.

Exactly the reason why RSTP exists, for instance. STP might take a while to get an answer (and "a while" is very subjective here, usually just seconds on even the largest network). So you have RSTP to find *any* valid answer and operate on that assumption for the meantime, while you wait for STP to finish doing its job and then flip to what STP considered the optimal solution in a minute or so.

Especially when the routing is complicated by things like VLANs and slower links and all sorts, it really doesn't matter how long you take (in practical terms, it can be seconds or even minutes), so long as you do eventually provide a decent answer.

Nobody is providing any guarantee that the answer will always be immediate and correct, because that would be dumb. And networking just doesn't work that way.

Michael H.F. Wilkinson

Quite apart from the very valid practical issues raised here, one should bear in mind that shortest-path algorithms have their uses elsewhere. Certain graph-cut methods seek a lowest-cost path for e.g. image segmentation. This is a very different situation, where n runs into tens of millions or billions. In this case, the new algorithm may show benefits. Besides, a new algorithm like this might inspire new algorithms in entirely different areas.

KISS

Pascal Monett

Amen to that.

I shudder to think what damage Redmond could do if it decided to generalize this new-fangled algorithm.

I know I'm getting old (hell, this year I'm 60), but honestly I've spent enough of my life in IT to know that If It Works, Don't Fix It.

Analogous to TCP/IP

Harry Kiri

Over the decades we've been involved in many analyses of TCP performance and have been really surprised just how good TCP is in pretty much all use-cases. Back in the day we looked at WAP as an alternative for mobile phones, VoIP performance, even replacing the TCP stack with XTP for better satcom performance (which needed bespoke tuning as well). In all cases TCP came in such a close second it wasn't worth the faff of a bespoke solution to address niche corner cases when the existing thing was good enough (I think the only time TCP really drops off is if there are high bit errors on the bearer, which is rarely seen).

Out of academic interest, I wonder where the overlap is in terms of routing over ad-hoc meshnets (fairly frequent route updates) and the trade-off of routeing optimisation there...

Re: Analogous to TCP/IP

Lee D

Because TCP is a bit like a theoretical Turing machine - it's communication boiled down to the lowest form in which it can still operate.

Sure, there are even alternatives like SCTP etc. but, honestly what you say still holds. TCP will be "good enough" for whatever you need for almost everything, and it's so ubiquitous and needs no special handling that it tends to win out generally.

In fact, several extensions to bulk out TCP actually failed - things like ECN can be a nightmare (modern kit should be fine, but that's how the world works). Or even jumbo frames. Trying to configure jumbo frames to work everywhere in all circumstances flawlessly is almost impossible.

Sure, there's some legacy in there and it was designed for certain purposes to which it is well suited but, honestly, if the world ended you could run everything you ever needed on TCP if it came to it, and it'd be a damn sight easier to find kit and software supporting it.

There are variants of Dijkstra, etc. that I remember from decades ago when I studied Maths and CS at university. If you know you are going to update information often, you can tweak the algorithms to be optimised towards that and such changes propagate a change to only those nodes/paths affected by such a change, etc. Rather than "here's the entire network, calculate the one true way. Whoops. No. It changed. Start all over again from scratch.", they operate more like "Okay, this tiny link has changed in this huge existing network, please ripple these changes around so that within X amount of time we have settled the network back to the optimal configuration". It's a different way of making things work, so the maths / graph theory is radically different, but they exist and work just as well.

You start with a "blank" network, and you just ripple changes one at a time and keep doing that forever until you know the whole network and can compensate for the best paths in a short time.

I would be shocked if those things weren't already being used in everything from Wifi to 5G to even simple logistics operations planning. It's all just graph theory.

log^(2/3)(n)?

Bebu sa Ware

I assume by analogy with sin²(x) that log^(2/3)(n) = (log(n))^(2/3) = ³√((log(n))²)?

No yak too dirty; no dumpster too hollow.