Controlling Kubernetes Network Traffic
Ingress NGINX is retiring, and it got me thinking about how convoluted network traffic control has become in Kubernetes. You've got your CNI for connectivity, network policies for security, ingress controllers or Gateway API for north-south routing, maybe a service mesh for east-west traffic, and honestly most apps don't need all of this. The real decision most people face is simpler: ingress controller vs Gateway API.
Here's the thing: if you just need basic HTTP/HTTPS routing and you're already comfortable with nginx or Traefik, stick with ingress controllers. They work, they're stable, and the tooling is mature. Gateway API makes sense if you need advanced stuff like protocol-agnostic routing or cross-namespace setups, or if you're running multi-team environments where role separation matters. All three major clouds (AWS ALB Controller, Azure AGIC, GKE Ingress) now have solid managed options for both approaches. Gateway API is clearly the future, but "future-proof" doesn't mean you need to migrate today.
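To ground the "basic routing" case, here's a minimal sketch using the official kubernetes Python client (keeping with this issue's Python bent) to create a single-rule HTTP Ingress. The host, service name, and ingress class are placeholders, not anything from the article.

```python
from kubernetes import client, config


def create_basic_ingress() -> None:
    # Load credentials from ~/.kube/config (use config.load_incluster_config() in a pod).
    config.load_kube_config()

    # One rule: send app.example.com/* to the web-svc Service on port 80.
    # Host, service name, and ingress class below are placeholders.
    ingress = client.V1Ingress(
        metadata=client.V1ObjectMeta(name="web"),
        spec=client.V1IngressSpec(
            ingress_class_name="nginx",
            rules=[
                client.V1IngressRule(
                    host="app.example.com",
                    http=client.V1HTTPIngressRuleValue(
                        paths=[
                            client.V1HTTPIngressPath(
                                path="/",
                                path_type="Prefix",
                                backend=client.V1IngressBackend(
                                    service=client.V1IngressServiceBackend(
                                        name="web-svc",
                                        port=client.V1ServiceBackendPort(number=80),
                                    )
                                ),
                            )
                        ]
                    ),
                )
            ],
        ),
    )
    client.NetworkingV1Api().create_namespaced_ingress(namespace="default", body=ingress)


if __name__ == "__main__":
    create_basic_ingress()
```

The Gateway API equivalent, an HTTPRoute, is a custom resource, so with this same client you'd go through CustomObjectsApi rather than a typed model; that extra ceremony is part of why "stay on Ingress until you actually need more" holds up.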
Network jobs roundup: AI certs pay, skills gap persists, mixed employment signals
The network jobs market is weird right now. AI certifications are commanding 12% higher pay year-over-year while overall IT skills premiums dropped 0.7%. CompTIA just launched AI Infrastructure and AITECH certs, and Cisco added wireless-only tracks (CCNP/CCIE Wireless launching March 2026). Meanwhile, unemployment for tech workers sits at 2.5-3% depending on who's counting, but large enterprises keep announcing layoffs while small and midsize companies are actually hiring.
The skills gap is real, though: 68% of orgs say they're understaffed in AI/ML ops and 65% in cybersecurity. Telecom lost 59% of positions to automation, and survey data shows 18-22% of the IT workforce could be eliminated by AI in the next five years. But demand for AI/ML, cloud architecture, and security skills keeps growing. The takeaway: upskill in AI and automation or get left behind, especially if you're in support, help desk, or legacy infrastructure roles.
Three Lessons from the Recent AWS and Cloudflare Outages
AWS US-EAST-1 went down for 15 hours in October (a DNS race condition in DynamoDB), and Cloudflare ate it in November (an oversized Bot Management config file crashed proxies globally). Both followed the same pattern: a small defect in one subsystem cascaded everywhere. The lessons are obvious but worth repeating: design out single points of failure with multi-region/multi-cloud by default; use AI-powered monitoring to correlate signals and automate rollback (monitoring without automated response is just expensive alerting); and actually practice your DR plan regularly, because you fall to the level of your practice, not rise to your runbook.
The deeper point: complexity keeps growing with every new region and service, multiplying the ways a small change can blow up globally. The answer is designing for failure: limit blast radius, decouple control and data planes, automate validation. No provider is immune, so your architecture needs to assume failures will happen and route around them automatically.
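To make "route around failures automatically" concrete, here's a minimal client-side sketch in Python: probe the primary region's health endpoint and fall back to a secondary when it stops answering. The endpoints and timeout are invented for illustration; in practice this logic usually lives in health-checked DNS records or a global load balancer rather than in application code.

```python
import urllib.error
import urllib.request

# Placeholder endpoints for illustration: per-region deployments of the same service.
REGIONS = {
    "primary": "https://us-east-1.api.example.com",
    "secondary": "https://eu-west-1.api.example.com",
}


def healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the region answers its health check within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False


def pick_region() -> str:
    """Prefer the primary region, but fail over automatically instead of paging a human."""
    for name, base_url in REGIONS.items():
        if healthy(base_url):
            print(f"routing traffic to {name}")
            return base_url
    # Every region failed its check: fail loudly rather than guessing.
    raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    print(pick_region())
```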
Test your DR plan with chaos engineering, not hope (Google SRE Practice Lead)
Google's SRE team wrote a piece on why your disaster recovery plan probably doesn't work and how chaos engineering can prove it. The premise: systems change constantly (microservices, config updates, API dependencies), so the DR doc you wrote last quarter is already outdated. Chaos engineering lets you run controlled experiments (simulated database failovers, regional outages, resource exhaustion) and measure whether you actually meet your SLOs during the disaster.
It's not about breaking things randomly. You define a steady state, form a hypothesis (like "traffic will fail over to the secondary region within 3 minutes with <1% errors"), inject a specific failure, and measure what happens. The key insight is connecting chaos to SLOs: traditional DR drills might "pass" because the backup systems came online, but if failover took 20 minutes and burned your entire error budget, customers saw you as down. Start small with one timeout or retry test, build confidence, and scale from there.
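Here's a rough sketch of that loop in Python. The fault injection and metrics queries are stubs (a real experiment would drive a chaos tool and your monitoring system), and the thresholds are just the example hypothesis from above; the point is the shape: baseline, hypothesis, inject, measure, compare.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    failover_seconds: float
    error_rate: float


def measure_error_rate() -> float:
    """Stub: in a real experiment, query your monitoring system for the error rate."""
    return random.uniform(0.0, 0.02)


def inject_failure() -> None:
    """Stub: a real experiment would fail over a database, drop a zone, etc."""
    print("injecting failure: primary database unavailable")


def wait_for_recovery() -> float:
    """Stub: poll health checks until traffic flows again; return seconds elapsed."""
    start = time.monotonic()
    time.sleep(1)  # placeholder for the actual recovery wait
    return time.monotonic() - start


def run_experiment() -> ExperimentResult:
    baseline = measure_error_rate()
    print(f"steady state error rate: {baseline:.3%}")

    inject_failure()
    failover_seconds = wait_for_recovery()
    return ExperimentResult(failover_seconds, measure_error_rate())


if __name__ == "__main__":
    # Hypothesis from the example above: failover within 3 minutes, errors under 1%.
    SLO_MAX_FAILOVER_SECONDS = 180.0
    SLO_MAX_ERROR_RATE = 0.01

    result = run_experiment()
    held = (result.failover_seconds <= SLO_MAX_FAILOVER_SECONDS
            and result.error_rate <= SLO_MAX_ERROR_RATE)
    print(f"failover took {result.failover_seconds:.1f}s, "
          f"error rate {result.error_rate:.3%}, "
          f"hypothesis {'held' if held else 'failed'}")
```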
Stelvio: AWS for Python devs
Stelvio is a Python framework that lets you define AWS infrastructure in pure Python, with smart defaults handling the annoying bits. Run stlv init, write your infra in Python (DynamoDB tables, Lambda functions, API Gateway routes), hit stlv deploy, and you're done. No Terraform, no CDK YAML hell, no mixing infrastructure code with application code.
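For flavor, here's a toy model of the "infrastructure as plain Python objects with smart defaults" pattern Stelvio follows. To be clear, this is not Stelvio's actual API: the class names and the deploy() stub below are invented for illustration, so check the project's docs for the real imports.

```python
# Toy illustration of the pattern, NOT Stelvio's real API: resources are plain
# Python objects with sensible defaults, and one call turns them into a deploy plan.
from dataclasses import dataclass, field


@dataclass
class DynamoTable:
    name: str
    partition_key: str = "id"      # a "smart default"


@dataclass
class LambdaFunction:
    handler: str
    memory_mb: int = 128           # another default you rarely touch


@dataclass
class ApiRoute:
    method: str
    path: str
    function: LambdaFunction


@dataclass
class App:
    name: str
    resources: list = field(default_factory=list)

    def deploy(self) -> None:
        # Stelvio's stlv deploy does real provisioning; this stub just prints the plan.
        for resource in self.resources:
            print(f"[{self.name}] would create {type(resource).__name__}: {resource}")


if __name__ == "__main__":
    app = App("todos")
    handler = LambdaFunction(handler="functions/todos.handler")
    app.resources += [DynamoTable(name="todos"), handler, ApiRoute("GET", "/todos", handler)]
    app.deploy()
```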