Crafting the Web: Tips, Tools, and Trends for Developers

WebDevPro #143

Why Your Microservices Need More Than Round-Robin DNS

Meet the author

This article draws on insights from Magnus Larsson, an IT industry veteran who has worked in the field since 1986 and has consulted for major Swedish companies such as Volvo, Ericsson, and AstraZeneca. Earlier in his career, Magnus experienced firsthand the challenges of building distributed systems. Today, many of those challenges can be addressed with open-source tools such as Spring Cloud, Kubernetes, and Istio. Over the past eight years, he has helped customers adopt these technologies and shared his expertise through presentations and blog posts.

There's a moment in almost every distributed systems project where someone asks a completely reasonable question: why can't we just use DNS? Every instance of a service registers under the same hostname, the DNS server hands back a list of IPs, clients cycle through them. It sounds elegant. It sounds like a solved problem. And for a while, especially in early-stage systems, it kind of works.

Then you scale up. Instances start crashing and restarting. Network partitions happen. You add a health check somewhere and realize DNS has no idea whether the IP it just returned belongs to a process that's been dead for forty seconds. The elegant solution starts showing its seams.

This article is about why DNS-based service discovery breaks down in distributed systems, what the failure modes actually look like in practice, and how client-side discovery handles the chaos that DNS was never designed for.

Before we dig deeper into this, here's a TL;DR you'll need:

🔺 Angular v22 is Here
⚡ VoidZero Joins Cloudflare
🛠️ Copilot SDK is Now GA
📘 TypeScript Tips Everyone Should Know
🎬 CSS vs. JavaScript Animations

The DNS Round-Robin Promise

The idea behind round-robin DNS is straightforward. Multiple service instances register under the same DNS name. When a client resolves that name, the DNS server returns a list of IP addresses, one per instance, and the client works through them sequentially. The first request goes to the first IP, the second to the next, and so on. Load is distributed. Everyone goes home happy.

The problem is that this model is built on an assumption that quietly breaks in dynamic environments: that the list of IP addresses returned by DNS accurately reflects the set of healthy, reachable instances right now. In a microservices environment, that assumption fails constantly.

What Actually Happens When You Try It

To make this concrete, consider a scenario where you scale a service to two instances and ask a dependent service what IPs it sees. DNS returns both. But when you start sending real traffic, you notice something strange: requests keep going to the same instance. The load balancing you expected isn't happening.

This reveals one of DNS's fundamental limitations as a load balancer: DNS clients don't do per-request round-robin. They ask a DNS server once, get a list, try the first address that works, and then hold onto it. Caching is baked into how DNS works. It's a feature for performance, but a liability when you need live instance awareness. Once a client finds a working address, it stops asking.

That's problem one. Problem two is what happens when an instance disappears.

The Staleness Problem

DNS records have a time-to-live (TTL), and until that TTL expires, clients are working from a cached snapshot of the world. If an instance crashes, DNS doesn't know.
If a new instance starts up, DNS doesn't know that either, not until someone updates the record and the TTL rolls over. In a system where instances come and go every few minutes, even a 30-second TTL creates meaningful windows of failure.

More critically, DNS has no concept of health. An IP address in a DNS response is just an IP address. It carries no information about whether the process listening there is actually ready to handle requests, whether it's mid-restart, or whether it's responding to health checks but silently dropping traffic due to a downstream issue. DNS cannot answer the question "is this instance okay?" because it was never designed to.

The Challenges DNS Can't Address

When you enumerate what a robust service discovery mechanism actually needs to handle, the gap becomes clear.

New instances can appear at any time and need to be made available to clients quickly, not after a TTL expires. Existing instances can fail at any time, and failed instances need to be removed from rotation just as quickly. Some instances that fail temporarily might recover and should rejoin rotation; others won't and should be permanently deregistered.

New instances often have startup time as well. They can accept TCP connections before they're actually ready to serve traffic, which means "is the port open" is a poor proxy for "is this instance ready." And unintended network partitions, where a client loses connectivity to some instances but not others, can cause cascading failures if the discovery layer doesn't account for them.

None of these are edge cases. They're the normal operating conditions of a distributed system at any meaningful scale.

Client-Side Discovery: A Different Model

The approach taken by systems like Netflix Eureka flips the model. Rather than relying on DNS as a passive lookup table, it introduces an active registry that instances communicate with continuously.

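Before looking at how such a registry works, it may help to see the two DNS failure modes described above in a few lines of code. The sketch below is a simulation only, with no real DNS involved: `FakeDnsServer`, `CachingClient`, the service name `review`, and the IP addresses are all hypothetical, and the client is deliberately simplified to "resolve once, keep the first address until the TTL expires."

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeDnsServer:
    """Hypothetical stand-in for a DNS server: returns whatever records it holds."""
    records: List[str]
    ttl_seconds: float = 30.0

    def resolve(self, name: str) -> Tuple[List[str], float]:
        # No health information is attached: just addresses and a TTL.
        return list(self.records), self.ttl_seconds


@dataclass
class CachingClient:
    """Simplified client: resolves once and reuses the first address until the TTL expires."""
    dns: FakeDnsServer
    _address: Optional[str] = None
    _expires_at: float = 0.0

    def pick(self, name: str, now: float) -> str:
        if self._address is None or now >= self._expires_at:
            addresses, ttl = self.dns.resolve(name)
            self._address = addresses[0]
            self._expires_at = now + ttl
        return self._address


dns = FakeDnsServer(records=["10.0.0.1", "10.0.0.2"])
client = CachingClient(dns)

# Problem one: six requests, one per second, all land on the same instance.
targets = [client.pick("review", now=float(t)) for t in range(6)]

# Problem two: the first instance crashes and the record is updated,
# but the client keeps its cached answer until the TTL runs out.
dns.records = ["10.0.0.2"]
stale = client.pick("review", now=10.0)   # still inside the 30-second TTL
fresh = client.pick("review", now=30.0)   # TTL expired, client re-resolves
```

In this model, `targets` contains only `10.0.0.1`, `stale` is still the crashed instance's address, and only `fresh` reflects the updated record. Real resolvers and HTTP clients differ in the details, but the shape of the problem is the same.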
When a service instance starts, it registers itself with the discovery server, not just its address but information about what it is. On a regular interval, it sends a heartbeat to signal that it's still alive and healthy. If the heartbeats stop, the discovery server removes the instance from the registry after a configurable window.

Clients, meanwhile, periodically fetch the current registry from the discovery server and cache it locally. When a client needs to make a call, it already has a fresh list of available instances and can select one without making a synchronous request to the discovery service for every call.

This architecture gets several things right that DNS doesn't.

Instance registration is active, not passive. A service has to deliberately register itself, and it has to keep proving it's alive through heartbeats. An instance that crashes stops sending heartbeats and gets removed from the registry. There's no waiting for a TTL.

Readiness is separable from availability. A service can control when it registers itself, after initialization is complete, after database connections are established, after whatever startup work needs to happen. The registry reflects intent, not just the existence of a listening port.

Clients are participants, not observers. Because clients cache the registry locally and refresh it on a schedule, they can make load-balancing decisions with reasonably current information, and they do so per-request rather than per-connection. This is what produces actual round-robin behavior in practice.

The system degrades gracefully. If the discovery server itself goes down, clients continue operating from their local cache. They can still reach instances that were registered before the outage. New instances can't register and deregistered instances won't be removed, but existing traffic keeps flowing. If a DNS server goes down, resolution fails entirely.

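The moving parts described above (heartbeat-based registration, a locally cached registry, per-request selection, and fallback to the cache during an outage) can be sketched in a small in-memory model. This is a toy illustration, not Eureka's API or implementation: `Registry`, `DiscoveryClient`, the `available` flag, and all interval values are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Registry:
    """Toy discovery server: an instance is listed only while its heartbeats are recent."""
    lease_seconds: float = 30.0
    available: bool = True  # flip to False to simulate a registry outage
    _last_seen: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def heartbeat(self, service: str, address: str, now: float) -> None:
        # In this sketch the first heartbeat doubles as registration.
        self._last_seen[(service, address)] = now

    def instances(self, service: str, now: float) -> List[str]:
        if not self.available:
            raise ConnectionError("registry unreachable")
        return sorted(
            addr
            for (svc, addr), seen in self._last_seen.items()
            if svc == service and now - seen <= self.lease_seconds
        )


class DiscoveryClient:
    """Caches the registry locally and picks an instance per request."""

    def __init__(self, registry: Registry, service: str, refresh_seconds: float = 30.0):
        self.registry = registry
        self.service = service
        self.refresh_seconds = refresh_seconds
        self._cache: List[str] = []
        self._next_refresh = 0.0
        self._counter = 0

    def choose(self, now: float) -> str:
        if now >= self._next_refresh:
            try:
                self._cache = self.registry.instances(self.service, now)
            except ConnectionError:
                pass  # registry is down: keep using the last known list
            self._next_refresh = now + self.refresh_seconds
        if not self._cache:
            raise LookupError(f"no known instances of {self.service}")
        address = self._cache[self._counter % len(self._cache)]
        self._counter += 1
        return address


registry = Registry(lease_seconds=10.0)
client = DiscoveryClient(registry, "review", refresh_seconds=5.0)

registry.heartbeat("review", "10.0.0.1", now=0.0)
registry.heartbeat("review", "10.0.0.2", now=0.0)
first_four = [client.choose(now=1.0) for _ in range(4)]   # alternates between both

# 10.0.0.2 crashes: only 10.0.0.1 keeps sending heartbeats.
registry.heartbeat("review", "10.0.0.1", now=8.0)
registry.heartbeat("review", "10.0.0.1", now=16.0)
after_crash = [client.choose(now=20.0) for _ in range(3)]  # crashed instance is gone

# The registry itself goes down: the client keeps working from its cache.
registry.available = False
during_outage = client.choose(now=30.0)
```

The design choice worth noticing is that `choose` never blocks on the registry for an individual request. It consults the registry only when the refresh interval has elapsed, and a failed refresh leaves the previous list in place.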
The Propagation Question

One nuance worth understanding is that client-side discovery still has propagation delay. It's just controlled and predictable rather than dependent on TTLs set by infrastructure you may not own.

When an instance spins up and registers, clients that have already fetched the registry won't know about it until their next refresh cycle. Similarly, when an instance goes down, there's a window between the last heartbeat and the next client cache refresh during which they might try to call an address that no longer works.

This is why production systems built on client-side discovery pair it with retry logic and circuit breakers. The discovery layer reduces the failure surface significantly, but it doesn't eliminate the need for resilience patterns at the call level.

The important difference from DNS is that these windows are tunable. In a development environment, you might configure clients to refresh every five seconds and instances to send heartbeats just as frequently. In production, you'd balance freshness against the load of constant registry polling. DNS gives you no such control.

What This Means in Practice

The practical implication of all this is that service discovery isn't primarily a load balancing problem. It's a membership problem. The question being answered is "who is currently in this service's pool of healthy instances?" and DNS was built for a world where membership changes on a timescale of hours or days, not seconds or minutes.

Client-side discovery systems treat membership as a live, continuously updated data structure. Instances opt in by registering and staying registered through heartbeats. Clients subscribe to changes in that membership by periodically refreshing their local view. The discovery server is the source of truth, but it's a source of truth that expects the world to change constantly and is designed accordingly.

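Because a client's view of membership is refreshed periodically rather than instantly, two things follow from the propagation discussion above: the stale window can be estimated from the configured intervals, and individual calls still need a retry path. The sketch below assumes a deliberately simplified model in which a dead instance stays listed for one lease period and a client may hold its cached copy for one more refresh period. The function names and all numbers are hypothetical, and real discovery servers may add further caching layers.

```python
from typing import Callable, Optional, Sequence


def worst_case_stale_seconds(lease_seconds: float, client_refresh_seconds: float) -> float:
    """Upper bound, in this simplified model, on how long a client may keep a dead address."""
    return lease_seconds + client_refresh_seconds


def call_with_retry(
    instances: Sequence[str],
    send: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Try successive instances until one answers or the attempt budget runs out."""
    last_error: Optional[Exception] = None
    for address in list(instances)[:max_attempts]:
        try:
            return send(address)
        except ConnectionError as error:
            last_error = error  # this address may be stale; move on to the next
    raise ConnectionError("all attempted instances failed") from last_error


def fake_send(address: str) -> str:
    """Hypothetical transport: 10.0.0.1 has crashed but is still in the cached list."""
    if address == "10.0.0.1":
        raise ConnectionError("connection refused")
    return f"ok from {address}"


# Illustrative settings only: tight intervals for development, looser ones for production.
dev_window = worst_case_stale_seconds(lease_seconds=10.0, client_refresh_seconds=5.0)
prod_window = worst_case_stale_seconds(lease_seconds=90.0, client_refresh_seconds=30.0)

reply = call_with_retry(["10.0.0.1", "10.0.0.2"], fake_send)
```

Here the first attempt hits the stale address and fails, and the retry reaches the healthy instance. A circuit breaker would go one step further and stop sending traffic to an address that keeps failing, which this sketch does not attempt.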
When you start thinking about service discovery through this lens, as a membership system rather than a name resolution system, the limitations of DNS become obvious. DNS is a remarkably well-engineered solution to the problem it was designed to solve. Service discovery in distributed systems is a different problem entirely.

Takeaways

DNS round-robin doesn't actually round-robin. Clients cache working addresses and stick with them. The per-request load distribution you expect doesn't happen in practice.

DNS has no health awareness. An IP in a DNS response carries no signal about whether that instance is alive, ready, or functional. Health and availability are entirely separate concerns that DNS has no mechanism to represent.

Instance registration needs to be active, not passive. Robust service discovery requires instances to continuously prove they're alive, not just appear in a record once. Heartbeat-based registration is the mechanism that makes deregistration automatic and timely.

Propagation delay exists in all discovery systems, but client-side discovery makes it controllable. Understanding the refresh window and pairing discovery with retry logic is the path to resilient service-to-service communication.

The discovery server is not a single point of failure if clients cache. Because clients maintain local copies of the registry, discovery server downtime degrades gracefully rather than catastrophically, a significant practical advantage over centralized DNS for systems with high-availability requirements.

Service discovery isn't glamorous infrastructure. It's the kind of thing that works invisibly when it's right and causes deeply confusing failures when it's wrong. Getting the model right from the start saves a lot of painful debugging later.

This Week in the News

⚡ VoidZero Joins Cloudflare: Big news: Cloudflare has acquired VoidZero, the team behind Vite, Vitest, Rolldown, and Oxc, and has pledged $1 million to an independent Vite ecosystem fund.
Evan You was frank about the reason: monetizing open-source tooling has proven extremely hard, despite Vite's enormous adoption. Vite stays MIT-licensed and vendor-neutral, but Cloudflare now has its hands on the plumbing of a large chunk of the modern web. Worth watching closely.

🔺 Angular v22 is Here: The signal-first era is no longer a roadmap promise. Signal Forms and resources are now stable, OnPush is now the default change detection strategy, and the HTTP client uses Fetch by default. This is a consolidation release: the experiments are over, and you're looking at the Angular team's considered view of how production apps should be built in 2026. If you've been waiting for the right time to migrate, that time is now.

☠️ Red Hat's npm Namespace Got Backdoored: A compromised Red Hat employee GitHub account was used to inject malicious workflows into three RedHatInsights repositories, with OIDC tokens publishing backdoored package versions that carried valid SLSA provenance attestations, making them look completely legitimate. The worm self-propagates using stolen npm tokens, bypassing 2FA to republish backdoored versions of other packages autonomously. This one is nasty. If you ran npm install on anything under @redhat-cloud-services after June 1st, rotate everything.

🧡 What's New in Svelte: June 2026: This month brings better forms, new long-lived remote query APIs, and TypeScript 6 support in language-tools. The standout addition is .live(...), a new query function that makes pulling real-time server data dramatically cleaner. Svelte's changelog keeps getting better without the drama. Quiet excellence.

🤖 State of AI 2026: Devographics surveyed 7,258 developers on AI in their workflows, and the numbers are striking. The average proportion of AI-generated code has jumped from 28% in 2025 to 54% this year, with the 75%+ segments seeing the highest growth.
ChatGPT leads in raw usage, but Claude tops the charts for positive sentiment and is the model developers are most willing to actually pay for. The full dataset is worth digging into; pull up the interactive charts.

🛠️ Copilot SDK is Now GA: You can now embed GitHub Copilot's agentic engine directly into your own applications and developer tools, with access to planning, tool invocation, file edits, streaming, and multi-turn sessions, and with no need to build your own orchestration layer. Support spans Node.js, Python, Go, .NET, Rust, and Java. This is less a Copilot story and more a platform story; GitHub is positioning itself as the runtime for AI-native dev tooling.

Beyond the Headlines

🧠 The Orchestration Tax: Addy Osmani names something most of us are quietly struggling with: spinning up more agents doesn't mean you're doing more. Your cognitive bandwidth doesn't parallelize. All the judgment to actually steer agents and merge the code they produce still has to route through exactly one serial processor, which is you. Feeling busy is not the same as being productive.

📘 TypeScript Tips Everyone Should Know: This is a compact, no-nonsense reference of TypeScript patterns worth bookmarking. Solid for both onboarding newer devs and as a quick refresh, the kind of thing that pays dividends when you actually internalize it rather than skim it once and forget.

🌍 A Functional Taxonomy of World Models: Fei-Fei Li and the World Labs team cut through the noise around "world models" and actually define what the term means across different fields. Computer vision, robotics, reinforcement learning, and generative AI all claim to be building world models, and each means something quite different: renderers, simulators, planners. If you care about where AI is actually heading beyond LLMs, this is essential reading.

⏳ The Best Loading States Are No Loading States: Applications end up with skeletons, spinners, shimmer effects, suspense fallbacks, UI whose only job is to occupy the space where data should eventually appear. We're all spending a surprising amount of time solving the same problem, and none of it is really product work. This is a sharp essay that argues we've been thinking about this backwards, and the web already had the answer before SPAs came along.

🎬 CSS vs. JavaScript Animations: Josh Comeau compares the same animations built across several different strategies and examines the performance implications firsthand, and there's some interesting nuance in the results. Not the take you might expect. This is required reading before you reach for a JS animation library by default.

🔁 When AI Builds Itself: The Anthropic Institute published something worth sitting with: a look at their actual progress toward recursive self-improvement, with internal data included. Anthropic engineers today ship 8x as much code per quarter as they did between 2021 and 2025, and a growing share of the AI development cycle is now being delegated to AI systems themselves. They're careful to say recursive self-improvement isn't inevitable, but the framing is unusually candid.

The Developer Toolbox

✍️ Hocuspocus: If you need to add real-time collaborative editing to an app, Hocuspocus from the Tiptap team is the cleanest path there. It's a plug-and-play collaboration backend based on Y.js, handling conflict resolution via CRDT so you don't have to think about merge logic. Works offline, syncs on reconnect, and pairs naturally with Tiptap, but it's flexible enough for other editors too.

That's all for this week. Have any ideas you want to see in the next article? Hit Reply!

Cheers!
Editor-in-chief, Kinnari Chohan 👋

Advertise with us

Interested in sponsoring this newsletter and reaching a highly engaged audience of tech professionals?
Simply reply to this email, and our team will get in touch with the next steps.

📢 Important: WebDevPro is Moving to Substack

WebDevPro will soon move to Substack. Future issues will come from packtwebdevpro@substack.com so please add it to your contacts or whitelist it to keep receiving the newsletter without interruption.