
CloudPro

58 Articles

A hard look at GuardDuty shortcomings

Shreyans from Packt
01 Nov 2024
CloudPro #71: A hard look at GuardDuty shortcomings

⭐Masterclass
From Docker Compose to Kubernetes Manifests
A hard look at GuardDuty shortcomings
Streamlining Keycloak in Kubernetes
The hater's guide to Kubernetes
A skeptic's first contact with Kubernetes

🔍Secret Knowledge
Enhancing Bitnami Helm Charts Security
Cloudflare adopted OpenTelemetry for logging pipeline
Josh Grose on LinkedIn: I spent the last 3 yrs outside of observability
Did you know the CNCF has an actual cookbook? Not metaphorically!
Unfashionably secure: why we use isolated VMs

🛠️HackHub: Best Tools for the Cloud
Web tool for database management
The devs are over here at devzat, chat over SSH!
CloudFormation_To_Terraform
Debugging tool for Kubernetes that tests and displays connectivity between nodes in the cluster
Kubernetes network solution

Cheers,
Shreyans Singh
Editor-in-Chief

🔍Secret Knowledge: Learning Resources

Enhancing Bitnami Helm Charts Security
Bitnami enhanced the security of its Helm charts using Kubescape, an open-source Kubernetes security tool that identifies misconfigurations by comparing configurations against industry best practices. By integrating Kubescape into their build pipelines, Bitnami made significant improvements such as eliminating group root dependencies, configuring immutable filesystems, and reducing misconfigured resources (a minimal local sketch of the same kind of check follows at the end of this section).

Cloudflare adopted OpenTelemetry for logging pipeline
Cloudflare recently transitioned its logging pipeline from syslog-ng to OpenTelemetry Collector to improve performance, maintainability, and telemetry insights. The move let the team lean on Go, a language more familiar to their engineers, and integrate better observability through Prometheus metrics. Despite challenges like minimizing downtime during the switch and ensuring compatibility with existing infrastructure, the migration has opened up opportunities for further improvements, such as better log sampling and migration to the OpenTelemetry Protocol (OTLP).

Josh Grose on LinkedIn: I spent the last 3 yrs outside of observability
Josh Grose (ex-Principal PM, Splunk), after three years away from the observability space, was surprised to find that despite companies spending around 30% of their cloud budgets on monitoring, reliability hasn't improved significantly. He observed that even when Service Level Agreements (SLAs) are met, it often comes at the cost of developer productivity and experience. Engineering leaders are frustrated with the high costs and limited improvements in key metrics like Mean Time to Recovery (MTTR) and development speed, leading to the perception that observability has become an expensive and ineffective necessity.

Did you know the CNCF has an actual cookbook? Not metaphorically!
The "Cloud Native Community Cookbook" is a collection of recipes put together by the CNCF and Equinix Metal, born out of the increased time people spent at home during the COVID-19 pandemic. Instead of focusing on cloud technologies, this cookbook brings together food recipes shared by members of the Cloud Native community, originally exchanged in Equinix Metal's Slack channel.

Unfashionably secure: why we use isolated VMs
While modern cloud architectures often favor shared, multi-tenant environments for efficiency and scalability, Thinkst Canary opts for a less trendy but highly secure approach: isolated virtual machines (VMs) for each customer. This choice prioritizes security by ensuring that each customer's data and services are completely separated, reducing the risk of cross-customer data breaches. Although this method comes with higher operational costs and complexity, it provides a stronger security boundary, making it easier to manage risks and sleep better at night.
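For readers who want to try the same kind of check on their own charts, here is a minimal local sketch; it is not Bitnami's actual pipeline, the chart path, release name, and output directory are placeholders, and exact Kubescape flags vary by version:

```bash
# Render the chart to plain manifests, then scan them for misconfigurations
helm template my-release ./my-chart --output-dir ./rendered
kubescape scan ./rendered
```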
⚡TechWave: Cloud News & Analysis

How Figma Migrated onto K8s in Less Than 12 months
Figma completed its migration to Kubernetes in under a year by meticulously planning and executing a well-scoped transition. Initially running services on AWS ECS, Figma faced limitations such as complex stateful workloads and limited auto-scaling. The decision to move to Kubernetes (EKS) was driven by its broader functionality, including support for StatefulSets, Helm charts, and advanced scaling options from the CNCF ecosystem. By Q1 2024, Figma had migrated most core services with minimal impact on users, resulting in improved reliability, reduced costs, and a more flexible compute platform.

GitHub Copilot Autofix: Secure code 3x faster
Copilot Autofix, now available in GitHub Advanced Security, is an AI-powered tool designed to help developers fix code vulnerabilities more than three times faster than manual methods. It analyzes vulnerabilities, explains their significance, and offers code suggestions for quick remediation. This accelerates the fixing process for both new vulnerabilities and existing security debt, significantly reducing the time and effort required for secure coding. Copilot Autofix is included by default for GHAS customers and is also available for open source projects starting in September.

New Kubernetes CPUManager Static Policy: Distribute CPUs Across Cores
Kubernetes v1.31 introduces a new alpha option called "distribute-cpus-across-cores" for the CPUManager's static policy. It aims to improve performance by spreading CPUs more evenly across physical cores, rather than clustering them on fewer cores. This reduces contention and resource sharing between CPUs on the same core, which can boost performance for CPU-intensive applications. To use the feature, users need to enable it in the kubelet configuration (a configuration sketch follows at the end of this section). Currently it cannot be combined with other CPUManager options, but future updates will address this limitation.

Announcing mandatory multi-factor authentication for Azure sign-in
Microsoft is making multi-factor authentication (MFA) mandatory for all Azure sign-ins to enhance security and protect against cyberattacks. Starting in the latter half of 2024, Azure users will need to use MFA to access the Azure portal and admin centers, with broader enforcement for other Azure tools like the CLI and PowerShell set for early 2025. MFA, which adds an extra layer of security by requiring more than just a password, is shown to block over 99% of account compromises.

GitHub scales on demand with Azure Functions
GitHub faced scalability issues with its internal data pipeline, which struggled to handle the massive amount of data it collects daily. To address this, GitHub partnered with Microsoft to use Azure Functions' new Flex Consumption plan, which allows serverless functions to scale dynamically based on demand. This solution has enabled GitHub to process up to 1.6 million events per second, addressing their growth challenges and improving performance with minimal overhead.
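The CPUManager item above mentions enabling the option via configuration. As a rough sketch only (file paths, the drop-in mechanism, and the node-preparation step are assumptions about your setup, not part of the announcement):

```bash
# Sketch only: distribute-cpus-across-cores is alpha in v1.31 and sits behind the
# CPUManagerPolicyAlphaOptions feature gate. This assumes the kubelet drop-in config
# directory is enabled (--config-dir); otherwise merge the same fields into the node's
# main KubeletConfiguration file by hand.
cat <<'EOF' | sudo tee /etc/kubernetes/kubelet.conf.d/90-cpumanager.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CPUManagerPolicyAlphaOptions: true
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  distribute-cpus-across-cores: "true"
EOF
# Switching an existing node to the static policy usually also requires draining it
# and removing /var/lib/kubelet/cpu_manager_state before the restart.
sudo systemctl restart kubelet
```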
🛠️HackHub: Best Tools for Cloud

commandprompt/pgmanage
PgManage is a modern graphical database client for PostgreSQL, focusing on management features and built on the now-dormant OmniDB project.

quackduck/devzat
Devzat is a chat service accessible via SSH that replaces the traditional shell prompt with a chat interface, letting you connect from any device with SSH capabilities.

aperswal/CloudFormation_To_Terraform
The CloudFormation to Terraform Converter automates the migration of AWS CloudFormation templates to Terraform configuration files.

bloomberg/goldpinger
Goldpinger monitors Kubernetes networking by making calls between its own instances and exposing Prometheus metrics for visualization and alerting.

ZTE/Knitter
Knitter is a Kubernetes CNI plugin that supports multiple network interfaces per pod, allowing custom network configurations across various cloud environments.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us. If you have any comments or feedback, just reply to this email. Thanks for reading and have a great day!


Building Lightweight Kubernetes Dev Ephemeral Environments

Shreyans from Packt
25 Oct 2024
CloudPro #70: Building Lightweight Kubernetes Dev Ephemeral Environments

Our Exclusive 2-for-1 Sale is LIVE! For the next 24 hours only, you can secure 2 seats for the price of 1 at Generative AI in Action (Nov 11-13). 📅 Sale ends tomorrow at 10 AM ET. Bring a colleague, friend, or your team and dive into everything this conference has to offer, from expert insights and hands-on sessions to valuable networking opportunities. Act now. This deal won't last long! ⏳

Today we will talk about:

⭐Masterclass
Building Lightweight Kubernetes Dev Ephemeral Environments
From which Kubernetes pod (and namespace!) is this process that I see on my host?
Argo Workflows: Simplify parallel jobs: Container-native workflow engine for Kubernetes
Using SimKube 1.0: Comparing Kubernetes Cluster Autoscaler and Karpenter
I've joined a company that has an AKS cluster whose version is completely outdated (1.21). I need to upgrade it to version 1.30 without any downtime and have a rollback plan in place

🔍Secret Knowledge
Like Heroku, but You Own It
Multi-Metric Scaling
Goran Opacic on X: "After years of using @awscloud Aurora, we are moving back to dedicated hardware. MySQL K8s operators are great, storage is cheap, memory is cheap, cpu is cheap, I can run 5.7 as much as I like and no AI. I'll miss database cloning and instant read replicas"
Policy as Code in Terraform
Behind the scenes of the OpenTelemetry Governance Committee

⚡Techwave
EC2 Image Builder now supports building and testing macOS images
Upgraded Claude 3.5 Sonnet from Anthropic (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) in Amazon Bedrock
Grafana 11.3 release: Scenes-powered dashboards, visualization and panel updates, and more
Sonar Details OpenAPI Generator Flaw That Creates Source Code Vulnerability
HashiCorp Updates Terraform; Wider Cloud Infrastructure Developer Toolsets

🛠️Hackhub
kubectl-guard: Accidentally modifying production instead of a local cluster? kubectl-guard helps prevent such critical mistakes.
kubesafe: Safely manage multiple Kubernetes clusters by defining safe contexts and protected commands.
Tfreveal: An open-source tool that enhances Terraform plan visibility by showing all resource and output differences, including sensitive values.
SyncLite: A low-code platform for relational data consolidation, ideal for building data-intensive apps across edge, desktop, and mobile environments.
pg_replicate

Cheers,
Shreyans Singh
Editor-in-Chief

⭐MasterClass: Tutorials & Guides

Building Lightweight Kubernetes Dev Ephemeral Environments
Kardinal is an open-source tool for creating lightweight, temporary development environments on Kubernetes clusters. It's designed to minimize resource usage by deploying only the services you need for testing while reusing existing resources when possible. Kardinal introduces "flows", ephemeral environments that can be spun up for specific features or testing needs, which saves time and cost by avoiding redundant deployments.

From which Kubernetes pod (and namespace!) is this process that I see on my host?
To find which Kubernetes pod and namespace a process on your host belongs to, you can use crictl along with cgroups. First, get the process ID (PID) of the containerized process, then find its cgroup path, which contains the container's unique identifier. Once you have that ID, use crictl inspect with go-template formatting to print the pod's namespace and name directly.
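A compressed sketch of that lookup; the PID and container ID are placeholders, and the exact cgroup path layout depends on your runtime and cgroup driver:

```bash
PID=12345                         # the mystery process you found on the host
cat /proc/${PID}/cgroup           # the cgroup path embeds the container ID, e.g.
                                  # ...kubepods.slice/...cri-containerd-<id>.scope
CID="<container-id>"              # paste the ID copied from that path
# Ask the CRI runtime which namespace and pod own the container
crictl inspect --output go-template \
  --template '{{ index .status.labels "io.kubernetes.pod.namespace" }}/{{ index .status.labels "io.kubernetes.pod.name" }}' \
  "${CID}"
```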
Argo Workflows: Simplify parallel jobs: Container-native workflow engine for Kubernetes
This guide focuses on Argo Workflows, an open-source, container-native workflow engine for orchestrating parallel tasks on Kubernetes. Each step of a workflow runs in its own container, making it well suited to complex pipelines like data processing or machine learning. Argo Workflows integrates with Kubernetes primitives (e.g., volumes, secrets, and RBAC) and uses Directed Acyclic Graphs (DAGs) to sequence tasks. The walkthrough covers deploying Argo on Amazon EKS and integrating it with Argo Events to handle data-driven tasks triggered by messages from Amazon SQS, creating a scalable, event-driven Spark job processing platform on Kubernetes.

Using SimKube 1.0: Comparing Kubernetes Cluster Autoscaler and Karpenter
SimKube 1.0, a Kubernetes simulator, was used to test two popular cluster autoscaling solutions: Kubernetes Cluster Autoscaler (KCA) and Karpenter. Both tools add nodes to a cluster based on workload demand, but they differ significantly in approach. KCA, originally designed for homogeneous clusters, must be configured with specific instance types, which can make it slower when there are many options. Karpenter, designed by AWS, optimizes across all available EC2 instance types by default and uses both a "fast" loop for quick scheduling and a "slow" loop for optimization, which made it faster in this simulation.

I've joined a company that has an AKS cluster whose version is completely outdated (1.21). I need to upgrade it to version 1.30 without any downtime and have a rollback plan in place
Upgrading an outdated AKS cluster from version 1.21 to 1.30 without downtime requires a careful approach, especially since rolling back AKS upgrades isn't possible. A Blue-Green deployment is a good option here but is complex at the cluster level. One way to approach it is to create a new cluster on AKS 1.30, deploy and test the application there, and then redirect production traffic to the new cluster via DNS or load balancer once it is confirmed stable. First, validate the application's compatibility with version 1.30 in your QA environment and ensure no critical API changes break functionality. If creating a new cluster is impractical due to resource limitations, consider a controlled maintenance window with a staged upgrade (e.g., from 1.21 to 1.22, then to 1.24, and so on), but remember that a direct upgrade can carry risk due to skipped deprecation changes and other breaking updates.

🔍Secret Knowledge: Learning Resources

Like Heroku, but You Own It
Dokku is an open-source platform as a service (PaaS) that lets you turn a virtual private server (VPS) into a Heroku-like deployment target, but with more control and no subscription costs. It allows easy deployment of web apps using Docker containers, GitHub Actions, or simple git commands. With features like auto-scaling, built-in SSL from Let's Encrypt, and password protection, Dokku is well suited to hosting both applications and static sites from private repositories. It also offers flexible deployment options and can integrate with Cloudflare for HTTPS if needed, making it a powerful, budget-friendly solution for personal or small-scale app hosting.
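To give a flavor of how small that workflow is, here is a hedged sketch; app, host, and domain names are placeholders, and the final step assumes the letsencrypt plugin is installed:

```bash
# On the VPS (app and domain names are placeholders)
dokku apps:create myapp
dokku domains:set myapp myapp.example.com
dokku git:set myapp deploy-branch main
# On your workstation: deploy with a plain git push
git remote add dokku dokku@my-vps.example.com:myapp
git push dokku main
# Back on the VPS: built-in SSL via the (assumed installed) letsencrypt plugin
dokku letsencrypt:enable myapp
```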
Multi-Metric Scaling
Yelp has implemented multi-metric autoscaling on its PaaSTA platform, enabling services to scale based on multiple factors (like CPU and request load) rather than just one, improving stability and enabling quicker recovery during high-demand periods. Since PaaSTA is an 11-year-old platform on Kubernetes, updating it safely was challenging. The team spent weeks understanding the codebase, gathering input, and defining a clear, gradual update plan. They used snapshot testing and strict validation to confirm stability at each step, made minimal yet crucial API adjustments, and improved monitoring through Grafana. Ultimately, the update rolled out smoothly, enhancing scaling options without causing any service interruptions.

Goran Opacic on X: "After years of using @awscloud Aurora, we are moving back to dedicated hardware. MySQL K8s operators are great, storage is cheap, memory is cheap, cpu is cheap, I can run 5.7 as much as I like and no AI. I'll miss database cloning and instant read replicas"

Policy as Code in Terraform
Policy as Code (PaC) allows organizations to enforce rules and guidelines on infrastructure automatically by defining policies as code, ensuring resources meet security, compliance, and operational standards. Tools like HashiCorp Sentinel and Open Policy Agent (OPA) are popular frameworks for PaC and work alongside infrastructure as code (IaC) tools like Terraform. Unlike traditional IaC, which configures infrastructure, PaC defines policy rules that are enforced whenever infrastructure changes are proposed. This approach helps maintain a secure, compliant cloud environment by preventing risky configurations.
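Sentinel and OPA are the frameworks named above. As one concrete OPA-based flow (conftest is my example tooling here, not something the article prescribes), a Terraform plan can be checked before apply like this:

```bash
# Produce a machine-readable plan, then test it against OPA/Rego policies
terraform plan -out tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test --policy ./policy tfplan.json   # ./policy holds your .rego rules
```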
Behind the scenes of the OpenTelemetry Governance Committee
The OpenTelemetry Governance Committee (GC) guides the OpenTelemetry project strategically, ensuring its growth as a vendor-neutral observability framework. While the Technical Committee (TC) focuses on technical matters, the GC's role includes setting project goals, updating policies, and overseeing SIG (Special Interest Group) sponsorships, ensuring alignment with community needs. GC members also represent OpenTelemetry at events, mediate conflicts, and check in with SIG maintainers to address challenges and gather feedback.

⚡TechWave: Cloud News & Analysis

EC2 Image Builder now supports building and testing macOS images
AWS EC2 Image Builder now supports creating macOS images, enabling users to streamline image management and automate the creation of "golden images" (customized bootable OS images) for macOS in addition to Windows and Linux. This is particularly helpful for developers using macOS tools like Xcode and Fastlane, which are essential in CI/CD pipelines. With Image Builder, users can create components for specific tools, define a recipe for a base macOS image, configure infrastructure (like EC2 Mac Dedicated Hosts), and set up pipelines that automatically test and validate each image.

Upgraded Claude 3.5 Sonnet from Anthropic (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) in Amazon Bedrock
Anthropic's latest updates to the Claude 3.5 model family in Amazon Bedrock include an upgraded Claude 3.5 Sonnet, which improves the model's ability to handle complex software engineering tasks, knowledge-based Q&A, data extraction, and task automation at the same cost as previous versions. Additionally, a new "computer use" capability, available in public beta, allows Claude 3.5 Sonnet to interact with computer interfaces (opening applications, typing, and clicking), opening up possibilities for AI-driven automation in software testing and administrative workflows. Lastly, the upcoming Claude 3.5 Haiku will offer faster response times paired with strong reasoning abilities, suited to applications that need both speed and intelligence, such as customer service and data processing in sectors like finance and healthcare.

Grafana 11.3 release: Scenes-powered dashboards, visualization and panel updates, and more
Grafana 11.3 introduces a range of new features and improvements, highlighted by the new Scenes-powered dashboards, which improve the stability, flexibility, and organization of dashboard elements. The release also includes visual and functional updates, like a redesigned inspect feature for table cells for quick data analysis and a new "Actions" option that lets users trigger API calls directly from elements on canvas panels. The update further enhances alerting with simplified rule creation and RBAC for notifications, and Explore Logs is now a default feature, making log troubleshooting more accessible.

Sonar Details OpenAPI Generator Flaw That Creates Source Code Vulnerability
Sonar recently identified a vulnerability in the OpenAPI Generator, a popular tool for creating API libraries, that could allow attackers to read or delete files in certain directories. Although a patch has been released, many existing APIs built with older, unpatched versions may still be at risk, requiring DevSecOps teams to locate and update them. The flaw underscores the challenge of detecting security issues in auto-generated code, where developers may be less involved in the underlying code creation. With cybercriminals actively searching for such vulnerabilities, DevSecOps teams must prioritize remediating high-risk code while balancing limited resources.

HashiCorp Updates Terraform; Wider Cloud Infrastructure Developer Toolsets
HashiCorp, now under IBM's ownership, announced significant updates to Terraform at HashiConf, focusing on streamlining multi-cloud infrastructure management. Terraform's new "stacks" feature lets developers manage complex, interdependent infrastructure configurations, making it easier to scale and control cloud resources across multiple environments. Additionally, HCP Waypoint provides a structured portal for internal development, using templates to standardize application deployment and updates. Other enhancements include new lifecycle management capabilities for HCP Vault, GPU resource sharing in Nomad, and an automation tool for migrating Terraform workflows, all designed to optimize and automate infrastructure in an increasingly complex cloud landscape.
🛠️HackHub: Best Tools for Cloud

kubectl-guard: Accidentally modifying production instead of a local cluster? kubectl-guard helps prevent such critical mistakes.
To set up kubectl-guard, first create a file named kubectl-guard for the script, then make it executable with `chmod +x kubectl-guard`. Next, open your shell configuration file (e.g., `~/.zshrc`) in a text editor and add an alias with `alias kubectl='full-path-to/kubectl-guard'`, replacing "full-path-to" with the actual path where the script is saved. Save and close the file, then restart your terminal session for the change to take effect. By default the guard treats a cluster as production when its name includes "prod", though you can adjust this by modifying the `PROD_IDENTIFIER` variable.

kubesafe: Safely manage multiple Kubernetes clusters by defining safe contexts and protected commands.
Kubesafe helps you avoid running risky commands on the wrong Kubernetes cluster by marking certain contexts as "safe" and defining commands that need confirmation before execution. It works with any Kubernetes CLI tool (like `kubectl` or `helm`) by wrapping the command to add this layer of protection. For instance, running `kubesafe kubectl delete pod my-pod` will prompt for confirmation if the current context is marked as protected. You can set up aliases, such as `alias kubectl='kubesafe kubectl'`, to use Kubesafe automatically on every command.
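Condensing the setup described above into commands (the script location is a placeholder; pick either the kubectl-guard alias or the kubesafe one, not both):

```bash
# kubectl-guard: save the script, make it executable, and route kubectl through it
chmod +x ~/bin/kubectl-guard
echo "alias kubectl='$HOME/bin/kubectl-guard'" >> ~/.zshrc
# ...or wrap your CLIs with kubesafe instead:
# echo "alias kubectl='kubesafe kubectl'" >> ~/.zshrc
# echo "alias helm='kubesafe helm'" >> ~/.zshrc
source ~/.zshrc   # or open a new terminal session
```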
Tfreveal: An open-source tool that enhances Terraform plan visibility by showing all resource and output differences, including sensitive values.
tfreveal lets you see every change in a Terraform plan file, including sensitive values, improving transparency in infrastructure updates. While Terraform hides sensitive data by default, tfreveal surfaces those details, which is particularly useful for detecting drift between Terraform state and the actual infrastructure. Normally, sensitive data can only be inspected through complex JSON output, which is hard to read, especially when changes hide inside large encoded values; tfreveal simplifies this by displaying clear diffs of all values. To use it, generate a plan file with `terraform plan -out plan.out`, then pipe it to tfreveal via `terraform show -json plan.out | tfreveal`.

SyncLite: A low-code platform for relational data consolidation, ideal for building data-intensive apps across edge, desktop, and mobile environments.
SyncLite is an open-source, low-code platform for creating data-intensive applications that consolidate and synchronize data across edge, desktop, and mobile environments. It supports real-time, transactional data replication from various sources, such as embedded databases (e.g., SQLite, DuckDB) and IoT message brokers, and integrates with popular destinations, including databases, data warehouses, and data lakes.

pg_replicate
`pg_replicate` is a Rust library designed to help developers quickly set up data replication from PostgreSQL to various data systems. It simplifies the use of PostgreSQL's logical streaming replication protocol, letting users focus on building data pipelines without dealing with protocol details. To get started, users create a PostgreSQL publication, run the stdout example to replicate data to standard output, and connect using simple commands.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us. If you have any comments or feedback, just reply to this email. Thanks for reading and have a great day!


Unlock Kubernetes Savings with Kubecost’s Automated Actions

Shreyans from Packt
15 Nov 2024
CloudPro #73: Unlock Kubernetes Savings with Kubecost's Automated Actions

Shouldn't GenAI be doing all the cyber crap jobs by now? Learn about the latest in GenAI for vulnerability management, exposure management, and cyber-asset security when you attend the CyberRisk Summit. This free, virtual event on Wednesday, Nov. 20 includes expert speakers from Yahoo, Wells Fargo, IBM, Vulcan Cyber, and more. This is the ninth semi-annual CyberRisk Summit. Attendees can request CPE credits, and all registrants get access to the session recordings.

⭐Masterclass
The Kubernetes gap in CNAPP
Unlock Kubernetes Savings with Kubecost's Automated Actions
How WebAssembly components extend the frontiers of Kubernetes to multi-cloud, edge, and beyond
How to migrate an observability platform to open-source and cut costs

🔍Secret Knowledge
Implementing GitOps with Kubernetes: Automate, manage, scale, and secure infrastructure and cloud-native applications on AWS and Azure
Complete Guide to Logging in Golang with slog
Scaling Prometheus with Thanos
Automated container CVE and vulnerability patching using Trivy and Copacetic
Self-signed Root CA in Kubernetes with k3s, cert-manager and traefik

🛠️Hackhub
Production-ready Kubernetes distribution for both public and private cloud
Application Performance Monitoring System
Graceful shutdown and Kubernetes readiness/liveness checks for any Node.js HTTP application
Toolkit for integrating with your Kubernetes dev environment more efficiently
Backup your Kubernetes stateful applications

Cheers,
Shreyans Singh
Editor-in-Chief

Protect Your .NET Applications with Dotfuscator: Stop Reverse Engineering and Secure Your IP
Your .NET applications face constant threats from reverse engineering, leaving your proprietary code, sensitive logic, and IP exposed. With Dotfuscator by PreEmptive, you can safeguard your software. Dotfuscator's advanced obfuscation features, like renaming, control flow obfuscation, and string encryption, harden your code against tampering, unauthorized access, and IP theft. Take control of your application's security and keep your code and intellectual property secure. Empower your development process with Dotfuscator today, because your .NET apps deserve protection that lasts.

⭐MasterClass: Tutorials & Guides

The Kubernetes gap in CNAPP
Initially, CNAPPs focused on integrating various cloud security tools and supporting enterprises during early cloud adoption. As a result, their Kubernetes protection often lacks depth and focuses mainly on surface-level issues like container vulnerabilities, without addressing the complexities of Kubernetes clusters, such as control plane security or runtime policies. This has led to a false sense of security in cloud environments, as CNAPPs fail to offer robust Kubernetes-specific features.

Unlock Kubernetes Savings with Kubecost's Automated Actions
Kubecost's new automated actions help users save money in their Kubernetes environments by optimizing resource usage with minimal effort. With features like automated request sizing, cluster turndown, and namespace turndown, Kubecost identifies inefficiencies such as over-provisioned containers and shuts down unused clusters or namespaces. Users can set schedules for automating these actions, reducing waste and freeing up resources.
How WebAssembly components extend the frontiers of Kubernetes to multi-cloud, edge, and beyond
WebAssembly (Wasm) components let Kubernetes extend seamlessly across multi-cloud, edge, and other distributed environments by providing a lightweight, portable way to run applications on any architecture. Wasm components, similar to containers, can be written in various languages and connected through shared APIs, allowing for greater flexibility and efficiency. By integrating with Kubernetes through wasmCloud, a Wasm-native orchestrator, organizations can enhance their cloud-native setups without changing existing infrastructure.

How to migrate an observability platform to open-source and cut costs
Migrating an observability platform to open source can significantly reduce costs while maintaining control over telemetry data, but it requires careful planning and execution. The process involves identifying essential telemetry data, selecting an open-source stack for logs, metrics, and traces, conducting proofs of concept (POCs) across different systems, and ensuring compatibility with various architectures, such as microservices. The migration also includes reconfiguring alerts and dashboards, validating the new setup, and updating related systems like notification and incident management tools.

🔍Secret Knowledge: Learning Resources

Implementing GitOps with Kubernetes: Automate, manage, scale, and secure infrastructure and cloud-native applications on AWS and Azure
This book provides practical guidance on using GitOps to automate and manage Kubernetes deployments in cloud-native environments like AWS and Azure. It explains core GitOps principles, tools like Argo CD and Flux, and strategies for implementing CI/CD pipelines. The book also covers infrastructure automation with Terraform, security best practices, and observability, while addressing the cultural transformation that GitOps adoption requires. By the end, readers will have the skills to apply GitOps to scaling, monitoring, and securing Kubernetes deployments efficiently.

Complete Guide to Logging in Golang with slog
In Go, structured logging can be implemented efficiently with the `slog` package, introduced in version 1.21. `slog` allows for more organized and detailed log entries by formatting logs as key-value pairs, making them easier to search, filter, and analyze. The package provides flexibility with logging levels (Debug, Info, Warn, and Error) and supports both text-based and JSON-formatted output. Key components include Loggers, Records, and Handlers, which define how logs are created, stored, and processed.

Scaling Prometheus with Thanos
Scaling Prometheus with Thanos allows for long-term storage, cost savings, and a global view of metrics in large environments. While Prometheus is great for short-term monitoring, it struggles with long-term storage and querying across multiple clusters. Thanos extends Prometheus with components like Thanos Query, Sidecar, and Store Gateway to enable scalable, highly available storage through object stores, reducing Prometheus's resource consumption. It also supports downsampling to optimize storage and query performance.

Automated container CVE and vulnerability patching using Trivy and Copacetic
Automating container vulnerability patching with Trivy and Copacetic (copa) helps protect your applications from potential attacks by scanning and patching container images automatically. Trivy scans container images for vulnerabilities and generates a report in JSON format, while Copacetic reads the report and patches the image based on the detected vulnerabilities. Once patched, the image is rebuilt and rescanned to confirm the fixes.
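A minimal end-to-end sketch of that loop; the image name is a placeholder, copa needs a reachable BuildKit instance, and flag spellings may drift between releases:

```bash
# 1. Scan the image and write a JSON vulnerability report
trivy image --vuln-type os --ignore-unfixed --format json --output report.json nginx:1.21.6
# 2. Patch the image from the report and tag the result
copa patch --image nginx:1.21.6 --report report.json --tag 1.21.6-patched
# 3. Rescan the patched tag to confirm the fixes landed
trivy image nginx:1.21.6-patched
```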
Self-signed Root CA in Kubernetes with k3s, cert-manager and traefik
In Kubernetes with k3s, cert-manager, and Traefik, you can create a self-signed root Certificate Authority (CA) to manage TLS certificates locally, which is useful when your cluster isn't exposed to the internet (e.g., no Let's Encrypt). The process involves setting up cert-manager to automate issuance, renewal, and secret management for these certificates. You first create a self-signed root CA, which signs an intermediate CA, and that intermediate CA signs leaf certificates for your services. This setup gives your services locally trusted certificates.
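A minimal manifest sketch of the bootstrap step, assuming cert-manager is already installed; resource names are illustrative, and the intermediate CA described above is omitted for brevity:

```bash
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-root          # issues the root certificate below
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: root-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: my-root-ca
  secretName: root-ca-secret
  issuerRef:
    name: selfsigned-root
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: root-ca-issuer           # services request their leaf certificates from this issuer
spec:
  ca:
    secretName: root-ca-secret
EOF
```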
🛠️HackHub: Best Tools for Cloud

labring/sealos
Sealos is a cloud operating system built on the Kubernetes kernel, designed to simplify managing cloud-native applications. It offers quick deployment of distributed applications and high-availability databases like MySQL, PostgreSQL, and MongoDB.

apache/skywalking
Apache SkyWalking is an open-source Application Performance Monitoring (APM) system designed for microservices, cloud-native, and container-based architectures. It offers end-to-end distributed tracing, service observability, and diagnostic tools, supporting languages such as Java, .NET, PHP, and Python.

godaddy/terminus
Terminus is a Node.js package that helps manage graceful shutdowns and Kubernetes health checks for HTTP applications. It provides readiness and liveness checks to inform Kubernetes about the service's health status.

alibaba/kt-connect
KT-Connect is a tool that helps developers efficiently connect, redirect, and expose local applications to Kubernetes clusters for easier testing and development.

stashed/stash
Stash by AppsCode is a cloud-native backup and recovery solution for Kubernetes workloads, making it easier to back up and restore data like volumes and databases in dynamic Kubernetes environments. It simplifies the backup process using tools like restic and the Kubernetes CSI Driver VolumeSnapshotter.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us. If you have any comments or feedback, just reply to this email. Thanks for reading and have a great day!
Subscribe to Packt CloudPro
Our mission is to bring you the freshest updates in Cloud, Identity and Access Management, CI/CD, DevSecOps, Cloud Security, and adjacent domains.


Google Cloud has launched Memorystore for Valkey

Shreyans from Packt
06 Sep 2024
CloudPro #63: Google Cloud has launched Memorystore for Valkey

200+ hours of research on AI-led career growth strategies & hacks packed in 3 hours: the only AI Crash Course you need to master 20+ AI tools, multiple hacks, and prompting techniques in just 3 hours. You'll save 16 hours every week and find remote jobs using AI that will pay you up to $10,000/mo.

⭐Masterclass
[Sponsored] 200+ hours of research on AI-led career growth strategies & hacks packed in 3 hours
The Kubernetes gap in CNAPP
Unlock Kubernetes Savings with Kubecost's Automated Actions
How WebAssembly components extend the frontiers of Kubernetes to multi-cloud, edge, and beyond
How to migrate an observability platform to open-source and cut costs

🔍Secret Knowledge
Implementing GitOps with Kubernetes: Automate, manage, scale, and secure infrastructure and cloud-native applications on AWS and Azure
Complete Guide to Logging in Golang with slog
Scaling Prometheus with Thanos
Automated container CVE and vulnerability patching using Trivy and Copacetic
Self-signed Root CA in Kubernetes with k3s, cert-manager and traefik

⚡Techwave
Red Hat Enterprise Linux AI Now Generally Available
Kubernetes 1.31: Streaming Transitions from SPDY to WebSockets
Google Cloud has launched Memorystore for Valkey
Palo Alto Networks acquires IBM QRadar SaaS assets
Broadcom Adds On-Premises Edition of Project Management Application

🛠️Hackhub
Production-ready Kubernetes distribution for both public and private cloud
Application Performance Monitoring System
Graceful shutdown and Kubernetes readiness/liveness checks for any Node.js HTTP application
Toolkit for integrating with your Kubernetes dev environment more efficiently
Backup your Kubernetes stateful applications

Cheers,
Shreyans Singh
Editor-in-Chief

Live Webinar: The Power of Data Storytelling in Driving Business Decisions (September 10, 2024 at 9 AM CST)
Data doesn't have to be overwhelming. Join our webinar to learn about Data Storytelling and turn complex information into actionable insights for faster decision-making. Check the schedule in your time zone and secure your spot. Can't make it? Register to get the recording instead.

⭐MasterClass: Tutorials & Guides

The Kubernetes gap in CNAPP
Initially, CNAPPs focused on integrating various cloud security tools and supporting enterprises during early cloud adoption. As a result, their Kubernetes protection often lacks depth and focuses mainly on surface-level issues like container vulnerabilities, without addressing the complexities of Kubernetes clusters, such as control plane security or runtime policies. This has led to a false sense of security in cloud environments, as CNAPPs fail to offer robust Kubernetes-specific features.

Unlock Kubernetes Savings with Kubecost's Automated Actions
Kubecost's new automated actions help users save money in their Kubernetes environments by optimizing resource usage with minimal effort. With features like automated request sizing, cluster turndown, and namespace turndown, Kubecost identifies inefficiencies such as over-provisioned containers and shuts down unused clusters or namespaces. Users can set schedules for automating these actions, reducing waste and freeing up resources.
How WebAssembly components extend the frontiers of Kubernetes to multi-cloud, edge, and beyond
WebAssembly (Wasm) components let Kubernetes extend seamlessly across multi-cloud, edge, and other distributed environments by providing a lightweight, portable way to run applications on any architecture. Wasm components, similar to containers, can be written in various languages and connected through shared APIs, allowing for greater flexibility and efficiency. By integrating with Kubernetes through wasmCloud, a Wasm-native orchestrator, organizations can enhance their cloud-native setups without changing existing infrastructure.

How to migrate an observability platform to open-source and cut costs
Migrating an observability platform to open source can significantly reduce costs while maintaining control over telemetry data, but it requires careful planning and execution. The process involves identifying essential telemetry data, selecting an open-source stack for logs, metrics, and traces, conducting proofs of concept (POCs) across different systems, and ensuring compatibility with various architectures, such as microservices. The migration also includes reconfiguring alerts and dashboards, validating the new setup, and updating related systems like notification and incident management tools.

🔍Secret Knowledge: Learning Resources

Implementing GitOps with Kubernetes: Automate, manage, scale, and secure infrastructure and cloud-native applications on AWS and Azure
This book provides practical guidance on using GitOps to automate and manage Kubernetes deployments in cloud-native environments like AWS and Azure. It explains core GitOps principles, tools like Argo CD and Flux, and strategies for implementing CI/CD pipelines. The book also covers infrastructure automation with Terraform, security best practices, and observability, while addressing the cultural transformation that GitOps adoption requires. By the end, readers will have the skills to apply GitOps to scaling, monitoring, and securing Kubernetes deployments efficiently.

Complete Guide to Logging in Golang with slog
In Go, structured logging can be implemented efficiently with the `slog` package, introduced in version 1.21. `slog` allows for more organized and detailed log entries by formatting logs as key-value pairs, making them easier to search, filter, and analyze. The package provides flexibility with logging levels (Debug, Info, Warn, and Error) and supports both text-based and JSON-formatted output. Key components include Loggers, Records, and Handlers, which define how logs are created, stored, and processed.

Scaling Prometheus with Thanos
Scaling Prometheus with Thanos allows for long-term storage, cost savings, and a global view of metrics in large environments. While Prometheus is great for short-term monitoring, it struggles with long-term storage and querying across multiple clusters. Thanos extends Prometheus with components like Thanos Query, Sidecar, and Store Gateway to enable scalable, highly available storage through object stores, reducing Prometheus's resource consumption. It also supports downsampling to optimize storage and query performance.

Automated container CVE and vulnerability patching using Trivy and Copacetic
Automating container vulnerability patching with Trivy and Copacetic (copa) helps protect your applications from potential attacks by scanning and patching container images automatically. Trivy scans container images for vulnerabilities and generates a report in JSON format, while Copacetic reads the report and patches the image based on the detected vulnerabilities. Once patched, the image is rebuilt and rescanned to confirm the fixes.
Self-signed Root CA in Kubernetes with k3s, cert-manager and traefik
In Kubernetes with k3s, cert-manager, and Traefik, you can create a self-signed root Certificate Authority (CA) to manage TLS certificates locally, which is useful when your cluster isn't exposed to the internet (e.g., no Let's Encrypt). The process involves setting up cert-manager to automate issuance, renewal, and secret management for these certificates. You first create a self-signed root CA, which signs an intermediate CA, and that intermediate CA signs leaf certificates for your services. This setup gives your services locally trusted certificates.

Developing for iOS? Setapp's 2024 report on the state of the iOS market in the EU is a must-see
How do users in the EU find apps? What's the main source of information about new apps? Would users install your app from a third-party app marketplace? Set yourself up for success with these and more valuable marketing insights in Setapp Mobile's report, iOS Market Insights for EU.

⚡TechWave: Cloud News & Analysis

Red Hat Enterprise Linux AI Now Generally Available
Red Hat Enterprise Linux (RHEL) AI is now available, providing an open-source platform for developing and running generative AI models across hybrid cloud environments. It combines efficient models, such as the Granite LLM family, with tools like InstructLab to help align models with specific business needs. RHEL AI allows domain experts, not just data scientists, to contribute to AI models, making them more accessible and cost-effective.

Kubernetes 1.31: Streaming Transitions from SPDY to WebSockets
In Kubernetes 1.31, the default streaming protocol used by kubectl has shifted from the outdated SPDY protocol to the more modern and widely supported WebSocket protocol. Streaming protocols in Kubernetes enable persistent, real-time communication between client and server, which is needed for operations like running commands inside a container. The switch to WebSockets improves compatibility with modern proxies and gateways, ensuring commands like `kubectl exec`, `kubectl cp`, and `kubectl port-forward` work smoothly across different environments.

Google Cloud has launched Memorystore for Valkey
Google Cloud has launched Memorystore for Valkey, a fully managed, high-performance key-value service that is 100% open source. Valkey 7.2 is compatible with Redis 7.2 and offers features like zero-downtime scaling, persistence, and integration with Google Cloud. It's designed to meet the demand for open-source data management, giving users an alternative to Redis for use cases like caching and session management. Valkey is gaining popularity for its performance and scalability, and Google Cloud plans to expand its capabilities further with Valkey 8.0, which promises even better performance and reliability.

Palo Alto Networks acquires IBM QRadar SaaS assets
Palo Alto Networks has acquired IBM's QRadar SaaS assets to enhance their joint AI-powered security solutions, aiming to help organizations strengthen their cybersecurity operations. The partnership will simplify threat detection, improve security automation, and deliver next-generation security operations at scale. IBM will support seamless migrations to Palo Alto's Cortex XSIAM platform.
Broadcom Adds On-Premises Edition of Project Management Application
At VMware Explore 2024, Broadcom introduced an on-premises version of its Rally project management application, called Rally Anywhere, to give organizations more control over their data. This version is especially valuable for industries with strict regulations or concerns about ransomware targeting SaaS platforms. Rally Anywhere offers an alternative to Atlassian's Jira, which is discontinuing its on-premises option, and helps organizations meet data sovereignty requirements.

🛠️HackHub: Best Tools for Cloud

labring/sealos
Sealos is a cloud operating system built on the Kubernetes kernel, designed to simplify managing cloud-native applications. It offers quick deployment of distributed applications and high-availability databases like MySQL, PostgreSQL, and MongoDB.

apache/skywalking
Apache SkyWalking is an open-source Application Performance Monitoring (APM) system designed for microservices, cloud-native, and container-based architectures. It offers end-to-end distributed tracing, service observability, and diagnostic tools, supporting languages such as Java, .NET, PHP, and Python.

godaddy/terminus
Terminus is a Node.js package that helps manage graceful shutdowns and Kubernetes health checks for HTTP applications. It provides readiness and liveness checks to inform Kubernetes about the service's health status.

alibaba/kt-connect
KT-Connect is a tool that helps developers efficiently connect, redirect, and expose local applications to Kubernetes clusters for easier testing and development.

stashed/stash
Stash by AppsCode is a cloud-native backup and recovery solution for Kubernetes workloads, making it easier to back up and restore data like volumes and databases in dynamic Kubernetes environments. It simplifies the backup process using tools like restic and the Kubernetes CSI Driver VolumeSnapshotter.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us. If you have any comments or feedback, just reply to this email. Thanks for reading and have a great day!


AI agents invade observability: snake oil or the future of SRE?

Shreyans from Packt
18 Oct 2024
CloudPro #69: AI agents invade observability

Join Generative AI In Action now with a Full Event Pass for just $239.99, 40% off the regular price of $399.99, with code FLASH40. Three reasons why you cannot miss this event: network with 25+ leading AI experts; gain insights from 30+ dynamic talks and hands-on sessions; engage with experts and peers through 1:1 networking, roundtables, and AMAs. Act fast: this flash sale is only for a limited number of seats!

Today we will talk about:

⭐Masterclass
AI agents invade observability: snake oil or the future of SRE?
I created DevOps Interview Preparation Lab based on Interviews from Microsoft, Airbnb, Accenture, and others
QA's Dead: Where Do We Go From Here?
Convert OpenTelemetry Traces to Metrics using SpanMetrics Connector
Reduce Network Traffic Costs in Your Kubernetes Cluster

🔍Secret Knowledge
SQLite on Rails
Just use Postgres
Why I still Self-Host my Servers
Essays on programming I think about a lot
A detailed guide to cron jobs

⚡Techwave
How Google fine-tuned Gemma model for Flipkart
AWS has launched Console to Code: tool that generates code
Bring your conversations to WhatsApp with AWS End User Messaging Social
Introducing pipe syntax in BigQuery and Cloud Logging
GCloud Database Center: AI-powered, unified fleet management solution preview now open to all customers

🛠️Hackhub
agnost-gitops: Open source GitOps platform running on Kubernetes clusters
kube-downscaler: Scale down Kubernetes deployments after work hours
AWS Mine: honey token system designed to generate AWS access keys
TinyStatus: A simple, customizable status page generator that monitors and displays the status of services on a responsive web page.
Litecli: A command-line client for SQLite databases, featuring auto-completion and syntax highlighting.

Cheers,
Shreyans Singh
Editor-in-Chief

Looking to build, train, deploy, or implement Generative AI? Meet Innodata, offering high-quality solutions for developing and implementing industry-leading generative AI. With 5,000+ in-house SMEs and expansion and localization supported across 85+ languages, Innodata drives AI initiatives for enterprises globally.

⭐MasterClass: Tutorials & Guides

AI agents invade observability: snake oil or the future of SRE?
This article explores how AI, particularly agentic AI, is transforming observability and monitoring. Traditional monitoring tools use dashboards, alerts, and data insights to help developers and operators manage system health, but new AI agents are designed to act more like team members. These agents, powered by large language models (LLMs), can analyze operational data and automate tasks like incident response and maintenance.

I created DevOps Interview Preparation Lab based on Interviews from Microsoft, Airbnb, Accenture, and others
This hands-on lab is designed to help you prepare for DevOps interviews by walking you through key tools: Python web apps, Docker, Kubernetes, Helm charts, GitHub Actions for CI/CD, and Ingress controllers. It's practical rather than theory-based, and it helps you build a project from scratch through containerization, deployment, and CI/CD setup.

QA's Dead: Where Do We Go From Here?
The concept of traditional QA (Quality Assurance) has evolved, shifting responsibility for software quality from a separate QA team to developers themselves. In the old model, QA was a distinct stage that came after development, causing delays, inefficiencies, and higher costs due to late bug detection. Now, with agile methodologies and advanced tooling, testing is integrated throughout the development process. Developers take ownership of quality, using tools like automated testing, CI/CD pipelines, and instant feedback mechanisms. QA isn't dead; instead, it has become an essential part of every developer's role, with QA professionals either moving into technical automation roles or higher-level strategic positions.
Convert OpenTelemetry Traces to Metrics using SpanMetrics Connector
The SpanMetrics Connector in OpenTelemetry converts trace data into actionable metrics, which is useful when robust tracing is in place but metrics instrumentation is lacking. It works by extracting metrics from spans (units of trace data) and aggregating them into key performance indicators like request counts, errors, and durations. This unified approach simplifies observability by reducing the need for separate instrumentation for traces and metrics. By configuring the connector, developers can generate custom metrics, optimize system performance, and improve monitoring without adding overhead or complexity.
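A minimal Collector configuration sketch of that wiring, assuming the contrib distribution and an OTLP-over-gRPC trace source; receiver and exporter choices are assumptions about your pipeline, not something the article specifies:

```bash
cat <<'EOF' > otelcol-spanmetrics.yaml
receivers:
  otlp:
    protocols:
      grpc: {}
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889        # scrape target for your Prometheus
connectors:
  spanmetrics: {}                 # derives request/error/duration metrics from spans
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
EOF
```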
From managing a Proxmox cluster and Pi-hole DNS servers to troubleshooting outages and hardware issues, the experience forces him to dive deeper into the technical aspects of system administration. This continuous learning has proven useful in handling complex distributed systems at work. Despite the challenges, like hardware failures and occasional crashes, the lessons learned make it worthwhile.

Essays on programming I think about a lot
This piece highlights several programming essays that have deeply shaped the author's thinking and engineering approach. The essays cover a range of topics, from understanding complex systems, choosing stable technology, and managing abstractions, to hiring strong engineering teams and designing scalable distributed systems. The recurring theme is thoughtful, pragmatic decision-making in software engineering, advocating for simplicity, clear abstraction boundaries, and understanding the deeper layers of technology. Each essay provides timeless insights that have shaped the author's work habits, and the list invites others to explore and reflect on these ideas for themselves.

A detailed guide to cron jobs
A cron job is a scheduled task or command on Unix-based systems, like Linux and macOS, that automates repetitive processes such as backups, email sending, or database updates. Cron jobs use a specific time-based syntax to determine when and how often a task should run. This guide explains how to set up, edit, and manage cron jobs, including the syntax, adding new jobs, and checking their logs. It also covers methods for monitoring cron jobs, such as using logs, monitoring tools, and email alerts, to ensure tasks run as expected without system issues. (A minimal crontab sketch appears at the end of this issue.)

⚡TechWave: Cloud News & Analysis

How Google fine-tuned Gemma model for Flipkart
The blog describes the process of fine-tuning Gemma, an instruction-tuned AI model, for a conversational shopping assistant. It starts with data preparation using a subset of Flipkart's product catalog, filtering for clothing items and generating Q&A pairs based on product details. Fine-tuning was done with LoRA, a parameter-efficient method, with multiple iterations on both pre-trained and instruction-tuned models, and was scaled using multi-GPU setups on Google Kubernetes Engine (GKE). Hyperparameter tuning was also crucial to optimize model performance, ensuring the chatbot provides accurate, contextual responses.

AWS has launched Console to Code: tool that generates code
AWS has launched "Console to Code," a tool that simplifies the move from prototyping in the AWS Management Console to writing production-ready code. It automatically captures actions taken in the console and generates code in formats like CLI, CloudFormation, and CDK, following AWS best practices. It helps users quickly create reusable, automation-friendly code without needing to write it by hand, streamlining the transition from console use to Infrastructure as Code (IaC). The service is available for key AWS services like EC2, VPC, and RDS.

Bring your conversations to WhatsApp with AWS End User Messaging Social
AWS has introduced "End User Messaging Social," allowing developers to send messages to their users on WhatsApp, the world's most popular messaging app. With this tool, developers can create rich, interactive messaging experiences that include multimedia content. WhatsApp can now be used alongside SMS and push notifications, giving businesses multiple ways to reach their audience.
Setting up WhatsApp messaging is easy, with options to create a new WhatsApp Business Account or link an existing one, all within the AWS console.

Introducing pipe syntax in BigQuery and Cloud Logging
Google Cloud has introduced a new "pipe syntax" in BigQuery and Cloud Logging, designed to simplify log data queries. The syntax uses a pipe symbol (|>) to break complex SQL queries into clear, easy-to-read steps, improving the readability and writability of log analysis tasks. With it, users can quickly filter, aggregate, and explore log data, making it easier to extract insights. BigQuery's enhanced performance features, like faster numeric search indexes and better handling of JSON data, further streamline log analysis. Pipe syntax is now available in preview.

GCloud Database Center: AI-powered, unified fleet management solution preview now open to all customers
Google Cloud has launched Database Center, an AI-powered solution that simplifies managing large, complex database fleets. It provides a unified interface for monitoring and optimizing databases like Cloud SQL, AlloyDB, and Spanner. Database Center helps businesses detect and address performance and security issues with proactive recommendations, ensuring smoother operations and better compliance with industry standards. It also includes AI-powered chat for quick troubleshooting and optimization insights, allowing users to improve performance, reduce costs, and strengthen security across their entire database landscape.

🛠️HackHub: Best Tools for the Cloud

agnost-gitops: Open source GitOps platform running on Kubernetes clusters
Agnost GitOps is an open-source platform for continuous deployment (CD) on Kubernetes clusters. It automates building, deploying, and managing applications by connecting to your GitHub, GitLab, or Bitbucket repository. When you push new code, Agnost builds a Docker image using Kaniko and deploys it to your Kubernetes cluster.

kube-downscaler: Scale down Kubernetes deployments after work hours
Kube-downscaler is a Kubernetes tool designed to automatically scale down or pause workloads (like Deployments, StatefulSets, and HorizontalPodAutoscalers) during non-work hours, helping organizations save on cloud costs. It operates on a configurable schedule of uptime and downtime, using Kubernetes annotations or command-line options.

AWS Mine: honey token system designed to generate AWS access keys
The "aws-mine" project is a honey token system that generates AWS access keys you can place in various locations to lure and detect potential attackers. If someone attempts to use these keys, the system sends a notification within about four minutes, allowing you to investigate the source and assess whether the asset has been compromised.

TinyStatus: A simple, customizable status page generator that monitors and displays the status of services on a responsive web page.
It checks the status of HTTP endpoints, pings hosts, and monitors open ports, displaying results on a clean and responsive web page. The system is configured using YAML files, and it supports both light and dark themes, as well as incident history tracking.

Litecli: A command-line client for SQLite databases, featuring auto-completion and syntax highlighting.
Upon first use, LiteCLI generates a configuration file that can be customized for user preferences.
It streamlines database interactions by predicting commands and formatting output, enhancing the command-line experience for SQLite users.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!
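As a quick companion to the cron guide above, here is a minimal crontab sketch (the script path, schedule, and log location are placeholder assumptions, not taken from the guide itself):

# Open the current user's crontab for editing
crontab -e

# Field order: minute hour day-of-month month day-of-week command
# Example entry: run a (hypothetical) backup script every day at 02:30
30 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

# List installed jobs to confirm the entry was saved
crontab -l

On Debian/Ubuntu you can confirm a job actually fired with grep CRON /var/log/syslog; on journal-based systems, journalctl -u cron (or -u crond, depending on the distribution) shows the same information.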

Shreyans from Packt
30 Aug 2024
Save for later

Kubernetes 1.31: Fine-grained SupplementalGroups control

Shreyans from Packt
30 Aug 2024
Announcing Terraform Google Provider 6.0.0 CloudPro #62: Kubernetes 1.31 Fine-grained SupplementalGroups control Quick Start Kubernetes Understand what Kubernetes is and why it's essential Learn the inner workings of Kubernetes architecture Get hands-on with deploying and managing applications Set up Kubernetes and containerize applications GET IT FOR $18.99 $12.99 ⭐Masterclass: Unlock the Full Potential of Kubernetes for Scalable Application Management Kubernetes pod and container restarting Better Kubernetes YAML Editing with (Neo)vim Monitoring kubernetes events with kubectl and Grafana Loki Practical Logging for PHP Applications with OpenTelemetry Using 1Password with External Secrets Operator in a GitOps way 🔍Secret Knowledge: Build your own SQS or Kafka with Postgres Revealing the Inner Structure of AWS Session Tokens An Opinionated Ramp Up Guide to AWS Pentesting Gang scheduling pods on Amazon EKS using AWS Batch multi-node processing jobs Application Availability Depends on Dependencies ⚡Techwave: Kubernetes 1.31: Fine-grained SupplementalGroups control Announcing Terraform Google Provider 6.0.0 New capabilities in VMware Private AI Foundation with NVIDIA GitLab Announces the General Availability of GitLab Duo Enterprise Grafana 11.2 release: new updates for data sources, visualizations, transformations, and more 🛠️HackHub: Best Tools for the Cloud PostgreSQL cloud native High Availability and more Kubernetes Operator to automate Helm, DaemonSet, StatefulSet & Deployment updates Runs and manages databases, message queues, etc on K8s Powerful workflow engine and end-to-end pipeline solutions implemented with native Kubernetes resources configure kubernetes objects on multiple clusters using jsonnet Cheers, Shreyans Singh Editor-in-Chief Mobile Banking Apps: Secure SDKs Aren’t Enough (Webinar) Is your mobile banking app truly secure? Join our webinar to learn why relying solely on protected SDKs leaves your app vulnerable. Discover real-world scenarios where emerging vulnerabilities can compromise your app despite using a protected SDK. We'll cover multi-layered protection strategies and practical solutions to guard against reverse engineering, tampering, and malware. Gain actionable insights on using obfuscation, data encryption, and real-time application self-protection (RASP) to safeguard your app. Equip yourself with practical solutions to ensure comprehensive app security and safeguard your business from financial and regulatory risks. REGISTER NOW Forward to a Friend ⭐MasterClass: Tutorials & Guides Kubernetes pod and container restarting In Kubernetes, a Pod is the smallest deployable unit, often containing one or more containers. When a container or pod needs to be restarted due to errors or updates, Kubernetes offers several methods to do so. For example, you can restart a Pod by deleting it, and Kubernetes will automatically recreate it if it’s part of a Deployment. Alternatively, you can restart a specific container within a Pod using commands like `kubectl exec` for more precise control. These features allow Kubernetes to maintain high availability and resilience in a cloud environment. Better Kubernetes YAML Editing with (Neo)vim Editing Kubernetes YAML files can be tricky, but using Neovim, a modern version of Vim, can make it much easier. Neovim is lightweight, highly customizable, and integrates well with your terminal, making it ideal for DevOps and platform engineers. 
By configuring Neovim specifically for YAML files, you can set up features like auto-indentation, syntax highlighting, folding, and autocompletion, all of which help reduce errors and improve efficiency. Monitoring kubernetes events with kubectl and Grafana Loki In Kubernetes, monitoring events is crucial for understanding the status and issues related to Pods, WorkerNodes, and other components. You can use `kubectl` to view these events directly, or you can enhance your monitoring setup by integrating Kubernetes events with Grafana Loki. By capturing events as logs using a tool like the `k8s-event-logger`, which listens to the Kubernetes API, you can store them in Loki, create metrics with RecordingRules, and visualize them in Grafana. Practical Logging for PHP Applications with OpenTelemetry Practical logging for PHP applications using OpenTelemetry involves instrumenting your PHP code to collect and correlate log data with other observability signals like traces and metrics. This approach is particularly useful in microservices-based architectures, where understanding the interactions between different services is crucial for maintaining system stability. By using OpenTelemetry, developers can standardize how telemetry data is collected and exported, reducing complexity. Using 1Password with External Secrets Operator in a GitOps way To manage secrets securely in a GitOps environment using Kubernetes, you can integrate 1Password with the External Secrets Operator. This setup allows you to automatically fetch and inject secrets stored in 1Password into your Kubernetes cluster. By using tools like ArgoCD, Helm, or FluxCD, you can deploy and manage this integration efficiently. The External Secrets Operator pulls secrets from 1Password via 1Password Connect, a proxy that ensures availability and reduces API requests. PACKT TITLES FOR YOU Buy now at $16.99 $10.99 Buy now at $39.99 $27.98 Buy now at $24.99 $16.99 🔍Secret Knowledge: Learning Resources Build your own SQS or Kafka with Postgres You can build your own version of SQS (Simple Queue Service) or Kafka using PostgreSQL by setting up tables and queries that mimic the functionality of these popular message queues and streams. For SQS, you create a table to store messages, with columns that help manage message visibility, delivery attempts, and order. You can then write queries to insert messages, retrieve them while respecting visibility timeouts, and delete them after processing. For Kafka, you expand this setup by storing messages persistently and keeping track of where each consumer group is in the message stream, allowing multiple consumers to process messages independently and in parallel, similar to Kafka's partitioning system. Revealing the Inner Structure of AWS Session Tokens By reverse engineering these tokens, the research team developed tools to analyze and modify them programmatically. This allowed them to uncover previously unknown details about AWS's cryptography and authentication protocols. Their findings showed that while AWS's security measures are robust, understanding the structure of these tokens can help defenders better protect against potential attacks. Additionally, the research raises questions about the privacy and integrity of these tokens. An Opinionated Ramp Up Guide to AWS Pentesting) Lizzie Moratti's "Opinionated Ramp Up Guide to AWS Pentesting" offers a detailed roadmap for becoming proficient in AWS pentesting, emphasizing practical experience over certifications. 
The guide is tailored for those with a foundational understanding of networking and security, and it stresses the importance of broad knowledge before delving into deeper cloud-specific skills. The guide also touches on industry pitfalls, such as reliance on automated tools and the challenges of cloud pentesting in a fast-evolving environment. Gang scheduling pods on Amazon EKS using AWS Batch multi-node processing jobs AWS Batch now supports multi-node parallel (MNP) jobs for Amazon EKS, allowing you to gang schedule pods across multiple nodes for tasks that require extensive computation, like machine learning or weather forecasting. Previously, MNP jobs were only available on Amazon ECS. With this update, you can use AWS Batch on EKS to run distributed processing jobs, such as those with Dask, a Python library for parallel computing. The setup involves defining job configurations that include a main node running the scheduler and worker nodes executing the tasks. This approach ensures efficient communication and scaling across nodes, streamlining complex computations in a managed environment. Application Availability Depends on Dependencies Modern applications depend on various services and components, meaning their reliability is tightly linked to the uptime of these dependencies. For example, if an application like Tekata.io needs to maintain 99.9% uptime, but it relies on several services with only 99.9% uptime each, the combined effect could reduce Tekata.io’s overall availability. To hit the desired uptime, dependencies need to have even higher availability. The formula \( A = U^N \) shows that if your application’s target uptime is 99.9% and it has 7 dependencies, each dependency must have an uptime of 99.99% to meet that target. ⚡TechWave: Cloud News & Analysis Kubernetes 1.31: Fine-grained SupplementalGroups control In Kubernetes 1.31, a new feature called `supplementalGroupsPolicy` was introduced to give better control over how supplementary group IDs are handled in Pods. Previously, Kubernetes automatically included group memberships defined in the container’s `/etc/group` file, which could lead to unexpected group IDs being applied and potentially cause security or access issues. With this update, you can now specify a `Strict` policy that only includes the group IDs explicitly set in the Pod's manifest, excluding any additional groups defined in the container image. Announcing Terraform Google Provider 6.0.0 The Terraform Google Provider 6.0.0 introduces several enhancements for better management of Google Cloud resources. Key updates include the option to opt-out of a default label ("goog-terraform-provisioned") that identifies Terraform-managed resources, improved protection against accidental resource deletion with new deletion protection fields, and increased flexibility with longer name prefixes for resources. New capabilities in VMware Private AI Foundation with NVIDIA Key updates in VMware Private AI include a Model Store for secure LLM management, a streamlined deployment process, and new NVIDIA capabilities like NIM Agent Blueprints for custom AI workflows. Future updates will include better GPU management, advanced data indexing and retrieval services, and tools for building AI agents. GitLab Announces the General Availability of GitLab Duo Enterprise GitLab has launched GitLab Duo Enterprise, an AI-powered add-on designed to enhance the software development lifecycle for DevSecOps teams. 
Priced at $39 per user per month, this tool integrates advanced AI features to improve code generation, security vulnerability detection, and team collaboration. It builds on the capabilities of GitLab Duo Pro by adding enterprise-focused tools like vulnerability resolution, root cause analysis, and AI impact dashboards. Grafana 11.2 release: new updates for data sources, visualizations, transformations, and more Notable additions include support for new data sources like Yugabyte and Amazon Managed Service for Prometheus, updates to visualizations such as standardized tooltips and pagination for state timelines, and improvements in transformations like data transposing and enhanced template variable support. The release also includes better alerting features, integration improvements for OAuth and SAML providers, and a migration assistant for easier transition to Grafana Cloud. 🛠️HackHub: Best Tools for Cloud sorintlab/stolon Stolon is a cloud-native tool designed to manage PostgreSQL databases with high availability, making it suitable for deployment in various environments including Kubernetes and traditional infrastructures. It leverages PostgreSQL's streaming replication and integrates with cluster stores like etcd, Consul, or Kubernetes for leader election and data storage. keel-hq/keel Keel is a lightweight tool for automating updates to Kubernetes deployments without needing complex command-line interfaces or APIs. It integrates directly with Kubernetes and Helm, using labels and annotations to manage updates based on semantic versioning policies. apecloud/kubeblocks KubeBlocks is an open-source tool designed to simplify the management of multiple database types on Kubernetes using a unified set of APIs. Instead of dealing with different operators for each database, KubeBlocks provides a single control plane to manage various databases such as PostgreSQL, Redis, and Kafka. It offers a standardized approach to database lifecycle management, day-2 operations, and observability, with support for backup, recovery, and monitoring. caicloud/cyclone Cyclone is a workflow engine built for Kubernetes that manages end-to-end pipelines without requiring extra dependencies. It operates across various Kubernetes environments, including public, private, and hybrid clouds. Cyclone offers features like DAG graph scheduling, flexible parameterization, and integration with external systems. It supports triggers, multi-cluster execution, multi-tenancy, and automatic resource cleanup. splunk/qbec Qbec is a CLI tool designed for managing Kubernetes objects across multiple clusters or namespaces using jsonnet, a data-templating language. It simplifies Kubernetes configuration management by allowing users to define and deploy objects in various environments efficiently. Qbec is similar to tools like kubecfg and ksonnet. 📢 If your company is interested in reaching an audience of developers and, technical professionals, and decision makers, you may want toadvertise with us. If you have any comments or feedback, just reply back to this email. Thanks for reading and have a great day! 
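The pod-restart and event-monitoring guides in this issue both come down to a handful of kubectl invocations. A minimal sketch (the deployment, pod, and namespace names are placeholders):

# Restart every pod in a Deployment by triggering a fresh rollout
kubectl rollout restart deployment/my-app -n my-namespace

# Or delete a single pod; its controller recreates it automatically
kubectl delete pod my-app-5d7c9b6f4-abcde -n my-namespace

# List recent events, oldest first, to see why pods are restarting
kubectl get events -n my-namespace --sort-by=.metadata.creationTimestamp

# Show only warnings (crash loops, failed scheduling, image pull errors)
kubectl get events -n my-namespace --field-selector type=Warning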

Shreyans from Packt
19 May 2025
Save for later

Kubernetes v1.33 Fixes a 10-Year-Old Image Pull Loophole

Shreyans from Packt
19 May 2025
The Lost Fourth Pillar of ObservabilityCloudPro #92Sponsored: Most GenAI projects die in the proof-of-concept stage. This session by Rubrik shows you how to push past that👇Save Your SpotThis week’s CloudPro issue has got a bunch of things I’ve either run into myself or seen others get tripped up by:📌AWS defaults that quietly expose more than they should📌a Kubernetes bug that’s been around for ten years📌GitHub Actions setups that look fine until someone finds a way inThere’s also a couple posts you'll find helpful, like building a CI/CD pipeline that’s actually fast, or understanding how containers really run under the hood.Hope a few of these come in handy when you need them.Also: we’re planning another special issue for next week. Any ideas on what we should dive into, or an expert you’d love to hear from? Just reply back to this email, I’d really like to hear what you think.Cheers,Shreyans SinghEditor-in-Chief🔐 Cloud SecurityAmazon GuardDuty Malware Protection for EC2 now available in AWS GovCloud (US) RegionsAmazon has released malware protection for EC2 in AWS GovCloud (US) regions. It scans EBS volumes attached to EC2 instances and container workloads to detect potential malware. The system supports both automatic scans based on suspicious behavior and manual scans using the EC2 instance's ARN. It works without adding any new software and does not impact workload performance.Amazon VPC adds CloudTrail logging for VPC resources created by defaultAmazon VPC now logs creation and deletion of default resources—like Security Groups, Route Tables, and Network ACLs, when a VPC is created or deleted. Previously, CloudTrail only captured explicitly created resources, making audits harder. This update helps teams improve governance and track changes more easily.Guardrails for Your Cloud: A Simple Guide to OPA and TerraformThis post shows how to use OPA to block risky Terraform changes like unencrypted S3 buckets or open security groups. It explains how to write Rego policies, run checks on Terraform plans, and enforce standards like required tags and deployment restrictions. Helpful for adding policy-as-code guardrails to IaC workflows.Shadow Roles: AWS Defaults Can Lead to Service TakeoverThis research shows how default AWS service roles, like those for SageMaker, Glue, and EMR, often come with overly broad S3 permissions, such as AmazonS3FullAccess. Attackers can abuse these defaults to escalate privileges and compromise other services. Real-world scenarios include model-based attacks via Hugging Face and cross-service takeovers through default IAM roles.Hardening GitHub Actions: Lessons from Recent AttacksTwo recent supply chain attacks exploited weak GitHub Actions workflows, compromising popular repos via over-permissive settings and exposed secrets. The report urges tighter defaults: set tokens to read-only, limit third-party Actions, avoid risky triggers like pull_request_target, and never expose secrets to forks. It also warns self-hosted runners can be dangerous if shared or persistent.Build Your Own AI Agents Over The WeekendJoin the live "Building AI Agents Over the Weekend" Workshop starting on June 21st and build your own agent in 2 weekend. 
In this workshop, the Instructors will guide you through building a fully functional autonomous agent and show you exactly how to deploy it in the real world.BOOK NOW AND SAVE 35%Use Code AGENT35 at checkout⚙️ Infrastructure & DevOpsRedis Is Open Source AgainRedis has shifted back to an open source license (AGPLv3) for Redis 8 after a year under more restrictive licenses meant to block cloud providers from monetizing it freely. The pivot follows the rise of the Valkey fork, backed by AWS and Google, and a recognition that Redis had lost favor with parts of the developer community.37signals Says Goodbye to AWS: Full S3 Migration and $10M in Projected Savings37signals has fully migrated 18 PB of data off AWS S3 to its own Pure Storage-based infrastructure, ending over a decade on the platform. AWS waived the $250K egress fee, aligning with EU Data Act requirements. The company expects to cut infrastructure costs from $3.2M to under $1M annually, saving over $10M in five years.Docker Explained: Finally Understand Containers Without Losing Your Mind (Probably)This post explains how Docker packages your code and dependencies into isolated containers that run the same everywhere. It covers Dockerfiles, images, layers, and containers with clear examples. Useful for devs struggling with environment issues during deployment.How I Tuned My CI/CD Pipeline To Be Done in 60 SecondsA solo developer reduced their GitHub Actions CI/CD pipeline from over 5 minutes to under 60 seconds using parallel jobs, caching, and Makefile tuning. They optimized builds, tests, and linting while managing GitHub's billable minutes. The result: fast, repeatable deploys with zero YAML debugging overhead.Ultimate DevOps Roadmap 2025: Learn Automation, ContainerizationThis guide lays out a step-by-step DevOps learning plan for 2025, covering scripting, cloud, CI/CD, Kubernetes, IaC, and AIOps. It includes timelines, open-source tools, and free resources for each topic. Useful for engineers building a modern, automation-driven skillset from scratch.📦 Kubernetes & Cloud NativeKubernetes v1.33 Fixes a 10-Year-Old Image Pull LoopholeKubernetes v1.33 closes a decade-old loophole that let pods reuse cached private images without valid pull credentials. With a new Kubelet flag, image access is now authorized even if the image already exists on the node. This improves security in multi-tenant clusters using private registries.Announcing etcd v3.6.0The first etcd minor release in four years adds full downgrade support, better memory efficiency, and removes the deprecated v2store. It introduces Kubernetes-style feature gates, livez/readyz probes, and SIG-etcd governance under Kubernetes. A 50% memory drop and ~10% throughput boost make it the most optimized and robust release to date.Kubernetes API Groups Explained Like You’re 5: Why They Matter (With Real Examples)This post simplifies Kubernetes API groups using familiar YAML examples like apps/v1 and rbac(.)authorization(.)k8s(.)io/v1. It breaks down how resources are grouped and versioned to help engineers better navigate manifests. A useful primer for anyone confused by Kubernetes API structure.Kubernetes Production ChecklistThis post offers a detailed checklist of proven Kubernetes production best practices—from health checks and autoscaling to RBAC, secrets, and observability. 
It covers what really matters for keeping systems secure, resilient, and scalable in real-world environments.

Building Kubernetes (a lite version) from scratch in Go
This project walks through building a simplified Kubernetes clone in Go, recreating the control plane, scheduler, and kubelet logic using HTTP APIs and in-memory storage. It's a hands-on way to demystify how reconciliation loops and pod lifecycles work under the hood.

🔍 Observability & SRE

Introducing the OTTL Playground for OpenTelemetry
Elastic has launched OTTL Playground, a browser-based tool for testing OpenTelemetry Transformation Language (OTTL) statements in real time. It lets users run processors like transform and filter, view diffs, logs, and JSON outputs, and safely test transformations without affecting production. It's built with WebAssembly and offers shareable config links for easier collaboration.

Last9 MCP Server: Fix Production Issues in Your Local Environment
Last9 has launched MCP Server, a tool that brings real production exceptions (with full context) into your local dev environment. It captures stack traces, request parameters, and environment variables so bugs can be reproduced and fixed precisely where you're coding. It integrates with AI agents in editors like Claude (via Cursor, Windsurf) to auto-suggest fixes, cutting debug time by over 35%.

The Lost Fourth Pillar of Observability
CloudQuery argues that configuration data, unlike logs, metrics, and traces, offers crucial insights without needing instrumentation. It's high-cardinality, API-collected, and best stored relationally. Monitoring config data helps track security posture, compliance, cost leaks, and infrastructure drift. Integrating it with traditional observability sharpens root cause analysis and preemptive alerting.

A tcpdump Tutorial with Examples
Daniel Miessler's tutorial breaks down tcpdump into 50 real-world examples for capturing and analyzing network traffic. From filtering by IP, port, and protocol to saving captures and flag-specific filters, it's a compact field guide for security engineers and SREs. Great for fast, precise troubleshooting from the command line. (A few starter commands appear at the end of this issue.)

How Kubernetes Runs Containers: A Practical Deep Dive
This tutorial breaks down how Kubernetes runs containers by tracing a pod's lifecycle on a Linux VM using k3s, crictl, and pstree. It shows how pods are just Linux processes isolated by namespaces and cgroups, with container runtimes like containerd managing their lifecycle. This clarity helps engineers debug resource limits, network issues, and process isolation at a low level.

Forward to a Friend

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!
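In the spirit of the tcpdump tutorial summarized above, a few starter commands (the interface name and addresses are placeholders):

# All traffic to or from one host, with no name resolution
tcpdump -i eth0 -nn host 203.0.113.10

# Only HTTPS traffic involving that host
tcpdump -i eth0 -nn host 203.0.113.10 and port 443

# Capture 1000 packets to a file for later analysis...
tcpdump -i eth0 -nn -c 1000 -w capture.pcap

# ...then read the capture back with a filter applied
tcpdump -nn -r capture.pcap 'tcp port 443'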

Shreyans from Packt
26 May 2025
Save for later

Bash vs Python in Cloud Infrastructure

Shreyans from Packt
26 May 2025
By Donald Tevault

CloudPro #93

This week's CloudPro is a guest special from Donald Tevault, the author of The Ultimate Linux Shell Scripting Guide. He's written today's newsletter on Bash vs Python in Cloud Infrastructure, and why shell scripting still wins in a bunch of real-world cases. He compares real tasks in both languages and shows how often Bash just gets you there faster, with less setup and fewer surprises.

If you want to go deeper, CloudPro readers get 40% off the ebook for the next 72 hours. Just use the code CLOUDPRO at checkout.

Cheers,
Shreyans Singh
Editor-in-Chief

GET 40% OFF on eBOOK

Bash vs Python in Cloud Infrastructure, by Donald Tevault

Python is a great programming language, and you can do a lot of awesome stuff with it. However, you might at times find that Python is more than you need, or more than you can easily learn. For such jobs, you might want to consider shell scripting instead. Let's look at some specific reasons:

To begin with, Python might not be installed on every workstation, server, IoT device, or container that you need to administer. On the other hand, every Linux, Unix, or Unix-like system that you'll encounter has a shell already installed. Apart from bash, you'll find Bourne Shell on BSD-type systems, and lightweight shells such as ash or dash on Linux for IoT devices. Now, let's say that you have a Linux-based IoT device and you need to parse through its webserver logs. The tools you'll need to do this with shell scripting are already there, but Python likely isn't.

Shell Script Portability versus Python Portability

The difference between bash and the other shells that I mentioned is that bash has some advanced features that the other shells lack. If you know that you'll only need to run your scripts in a bash environment, then you can definitely take advantage of the extra bash features. But, by avoiding the bash-specific features, you can create shell scripts that will run on a wide variety of shells, including bash, ash, dash, or Bourne Shell. Fortunately, that's not as hard as it would seem. For example, you can create variable arrays in bash, but not in the other shells. If you need cross-shell portability but also need the benefits of using an array, you can easily create a construct that simulates an array and has the same functionality. It's easy-peasy once you know how. (A minimal sketch of this appears at the end of this issue.)

One portability problem with Python involves Python's use of programming libraries that might or might not be installed on every device on which the Python script needs to run. In fact, you might have encountered this problem yourself if you've ever downloaded a Python script from GitHub. If you're not a Python expert, it might take you a while to figure out how to install all of the required libraries. With shell scripting, you don't need to worry about libraries, because shell scripts use the command-line utilities that already come installed on pretty much every Linux, Unix, or Unix-like system. Another problem is that scripts that were created for the old Python 2 aren't always compatible with the new Python 3.

Next, let's talk about something that's especially important to me personally.

The Shell Scripting Learning Curve versus the Python Learning Curve

If you're a DevOps person, you've likely already mastered Python. But, if you're more into systems administration, there's a good chance that you haven't had much experience with Python.
Fear not, because even if you're lousy with learning programming languages, as I am, you can still learn how to do some awesome things with old-fashioned shell scripting. Even if you do know Python, you might find that certain jobs can be accomplished more quickly and easily with shell scripting than with Python. For example, here's the script that I use to update and shut down the OpenMandriva workstation that I'm using right now:

#!/bin/bash
dnf -y distro-sync && shutdown now

All this shell script contains is just the commands that I would normally run from the command line. With shell scripting, no coding skill at all is required for this.

Working with text files is way easier with shell scripting. Let's take this text file with a listing of classic automobiles:

plymouth satellite 1970 154 600
plymouth fury 1970 73 2500
plymouth breeze 1996 116 4300
chevy malibu 2000 60 3000
ford mustang 1965 45 10000
volvo s80 1998 102 9850
ford thunderbird 2003 15 3500
chevy malibu 1999 50 3500
bmw 325i 1985 115 450
bmw 325i 1985 60 1000
honda accord 2001 30 6000
ford taurus 2004 10 17000
toyota rav4 2002 180 750
chevy impala 1985 85 1550
ford explorer 2003 25 9500
jeep wrangler 2003 54 1600
edsel corsair 1958 47 750
ford galaxie 1964 128 60

The fields in this file represent the make, model, year, mileage in thousands of miles, and U.S. dollar value of each car. Now, let's say we want to sort this file alphabetically and save the output to a new file. Here's how you could do it with Python:

#!/usr/bin/python
def sort_file_content(in_path, out_path):
    lines = []
    with open(in_path) as in_f:
        for line in in_f:
            lines.append(line)
    lines.sort()
    with open(out_path, 'w') as out_f:
        for line in lines:
            out_f.writelines(line)

if __name__ == "__main__":
    input_file = "autos.txt"
    output_file = "sorted_autos.txt"
    sort_file_content(input_file, output_file)

Here's how you'd do it with a shell script:

#!/bin/bash
sort autos.txt > sorted_autos.txt

Either way, we get the same results, which look like this:

bmw 325i 1985 115 450
bmw 325i 1985 60 1000
chevy impala 1985 85 1550
chevy malibu 1999 50 3500
chevy malibu 2000 60 3000
edsel corsair 1958 47 750
ford explorer 2003 25 9500
ford galaxie 1964 128 60
ford mustang 1965 45 10000
ford taurus 2004 10 17000
ford thunderbird 2003 15 3500
honda accord 2001 30 6000
jeep wrangler 2003 54 1600
plymouth breeze 1996 116 4300
plymouth fury 1970 73 2500
plymouth satellite 1970 154 600
toyota rav4 2002 180 750
volvo s80 1998 102 9850

I think you see that doing this with shell scripting is way faster and easier. Finally, let's see how shell scripting can help us with cloud operations.

Shell Scripting for Cloud Operations

Let's say that you have a web server that's running on either a VPS or a remote IoT device, and you want a list of IP addresses of clients that have accessed it, along with status codes and number of bytes transferred.
Here's a Python script that you might use for that:

#!/usr/bin/python
import sys
from dataclasses import dataclass

@dataclass(frozen = True)
class LogEntry:
    ip_address : str
    n_bytes : int
    status_code : int

def main(args):
    file_path = args[0]
    entries = parse_log_file(file_path)
    for e in entries:
        print(e)

def parse_log_file(file_path):
    try:
        with open(file_path) as log_file:
            return [parse_log_line(line) for line in log_file]
    except OSError:
        abort(f'File not found: {file_path}')

def parse_log_line(line):
    try:
        xs = line.split()
        return LogEntry(xs[0], int(xs[9]), int(xs[8]))
    except IndexError:
        abort(f'Invalid log file format: {line}')

def abort(msg):
    print(msg, file = sys.stderr)
    exit(1)

if __name__ == '__main__':
    main(sys.argv[1:])

Here's a bash script that does the same thing:

#!/bin/bash
echo "ip address, status code, number of bytes"
cut -d" " -f 1,10,9 /var/log/httpd/access_log

That's right. A simple, two-line shell script can take the place of that entire Python script. At any rate, the output of the shell script will look something like this:

ip address, status code, number of bytes
192.168.0.20 403 5760
192.168.0.20 403 199
192.168.0.20 200 4194
192.168.0.20 200 5714
192.168.0.20 404 196
192.168.0.20 403 5760
192.168.0.18 403 5760
192.168.0.18 403 199
192.168.0.18 200 4194
192.168.0.18 200 5714
192.168.0.18 404 196

You can also create shell scripts to automate management of your cloud services. For example, here's a script that can start or stop an EC2 instance on Amazon Web Services:

#!/bin/bash
read -p "Enter the EC2 instance ID: " INSTANCE_ID
read -p "Do you want to start or stop the instance? (start/stop): " ACTION

if [[ "$ACTION" == "start" ]]; then
    echo "Starting instance $INSTANCE_ID..."
    aws ec2 start-instances --instance-ids $INSTANCE_ID
elif [[ "$ACTION" == "stop" ]]; then
    echo "Stopping instance $INSTANCE_ID..."
    aws ec2 stop-instances --instance-ids $INSTANCE_ID
else
    echo "Oops! Please type 'start' or 'stop'."
fi

When you run the script, just type in the instance ID at the first prompt, and then type either start or stop at the second prompt. This is a lot easier than typing the entire aws command every time you need to start or stop an instance. You can automate almost any other aws task in the same manner.

Conclusion

To be sure, shell scripting has its limitations. For large, complex programs that require high performance, Python, or perhaps even a compiled language such as C, would be much better. But as I've just demonstrated, there are many times when bash scripting is definitely a much better choice.

BASH:
- More portable
- No library installation required
- Always available
- Best for quick jobs
- Not the best for complex problems
- Good performance for small jobs, but Python is better for large jobs

PYTHON:
- Better for complex programming
- Very flexible
- Good performance
- Steeper learning curve
- Portability problems between Python 2 and Python 3
- Dealing with libraries can be problematic
- Not always available

To learn more about shell scripting, check out The Ultimate Linux Shell Scripting Guide by Donald:

GET 40% OFF on eBOOK

Hi again, Shreyans here. Big thanks to Donald for putting this together. If you liked the piece, you'll love his YouTube channel where he walks through practical Linux topics. He's also written two other books worth your time:
- Linux Service Management Made Easy with systemd
- Mastering Linux Security and Hardening

That's all for now.
Hope you found something useful in this issue!

Cheers,
Shreyans

What did you think of this special issue?

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!
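Donald mentions simulating an array in shells that lack bash-style arrays; here is one minimal way to do it (my sketch under that assumption, not taken from his book), using the positional parameters as the array:

#!/bin/sh
# Works in ash, dash, and Bourne-style shells as well as bash

set -- plymouth chevy ford volvo   # "assign" four elements

echo "Number of elements: $#"
echo "Second element: $2"

for make in "$@"; do               # iterate over every element
    echo "make: $make"
done

Appending is just set -- "$@" honda; for keyed lookups, numbered variables built with eval are the usual fallback.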

Shreyans from Packt
15 Sep 2025
Save for later

Learn AI Platform Engineering

Shreyans from Packt
15 Sep 2025
From MongoDB's Director of Engineering

CloudPro #107: Special Issue

I'm interrupting our regular newsletter schedule today because something came up that I genuinely think you need to know about. I want to tell you about our event on September 27th that could be a game-changer for how you think about platform engineering.

MongoDB's Director of Engineering shows you how to build AI-powered developer platforms that actually work at scale. Exclusive 40% Off for CloudPro Subscribers. Use code CLOUDPRO

Here's why I'm personally excited about this: We've got George Hantzaras from MongoDB leading a 5-hour intensive on AI-Powered Platform Engineering. And when I say intensive, I mean it – this isn't another surface-level "AI is the future" talk. George is the Director of Engineering at MongoDB, speaks at Kubecon and HashiConf, and he's going deep into the practical stuff that actually matters.

Agenda for the workshop:
- Self-Service Golden Paths – build workflows that reduce friction while keeping developer flexibility
- Knowledge as a Platform Capability – embed organizational knowledge with AI (RAG, context modeling)
- Intelligent Developer Portals – natural language interfaces and scaffolding services that understand developer needs
- AI-Driven Operations – anomaly detection, observability, and incident triage beyond traditional monitoring

Why this matters for your daily work: If you're working with monitoring stacks like Prometheus or Grafana, George's approach to integrating runbooks, standards, and service catalogs into developer workflows will feel directly applicable.

Exclusive 40% Off for CloudPro Subscribers. Use code CLOUDPRO

Our Panelists: We're not just doing sessions. We've also put together a panel:
- Ajay Chankramath – Founder, Platformetrics
- Dr. Gautham Pallapa – Principal Director, Cloud, Scotiabank
- Max Körbächer – Founder, Liquid Reply
Together, they'll unpack the real-world challenges and production patterns they're seeing across industries.

How You'll Leave Prepared: George is ending the day with something I've never seen at these events – a structured workshop to draft your actual 90-day pilot plan. You're walking out with a personalized roadmap, not just ideas.

Why This Event is Different:
- Focuses on implementation, not hype
- Gives you time to go deep (5 hours, not 50 minutes)
- Ends with an actionable plan, not just slide decks
- Exclusive 40% off for CloudPro subscribers

Exclusive 40% Off for CloudPro Subscribers. Use code CLOUDPRO

Best,
Shreyans
Editor-in-Chief, CloudPro

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!

Shreyans from Packt
08 Sep 2025
Save for later

Batch Scoring on Azure ML

Shreyans from Packt
08 Sep 2025
5 Knobs That Save You from Nightly HeadachesCloudPro #106Hunt Threats, Recover Fast: Next-Gen Cyber Resilience for Google CloudJoin Hunt Threats, Recover Fast: Next-Gen Cyber Resilience for Google Cloud, a virtual event about going beyond traditional backup.You'll see:- Real-time ransomware detection and automated threat hunting for Google Cloud- Turbo Threat Hunting in action to trace attack paths and accelerate incident response- Streamlined recovery workflows that simplify protecting your Google Cloud workloadsSave Your SpotToday’s CloudPro is about the five batch-scoring knobs most engineers overlook. If you’ve ever watched a job stretch from minutes to hours and wondered why, this is where you start.This article is adapted fromChapter 5 ofHands-On MLOps on Azure. In that chapter, author Banibrata De dives into the gritty details of model deployment: batch scoring, real-time services, and the YAML settings that make the difference between smooth pipelines and midnight firefights.(The book goes much further, covering CI/CD pipelines, monitoring, governance, and even LLMOps across Azure, AWS, and GCP. CloudPro readers can grab it at the end of this piece with an exclusive discount.)Cheers,Shreyans SinghEditor-in-ChiefGET THE BOOKSAVE THIS ARTICLE AND READ LATERTuning Batch Jobs on Azure ML: 5 Knobs Every Engineer Should KnowSHARE THIS ARTICLEIt’s late. The batch run you trusted starts crawling. Dashboards spike, Slack pings light up, and you’re debating whether to kill the job or ride it out. You don’t need a re-platform. You need to tune the controls Azure ML already gives you.Below are thefive knobsthat tame throughput, flakiness, and costs. They live in your batch deployment YAML, and they work.1) mini_batch_size: The throttle for your workloadBatch jobs in Azure ML process data in chunks.mini_batch_sizecontrols how big each chunk is. Push it too high, and you’ll hit memory or I/O bottlenecks; keep it too low, and you’ll waste time on overhead. Think of it like loading a truck: too few boxes and you’re underutilizing space, too many and you risk breaking the axle. Getting this balance right often cuts hours off long-running jobs.2)max_concurrency_per_instance: How many cooks in the kitchenEach compute node can process tasks in parallel, but how many at once depends on its resources.max_concurrency_per_instanceis that dial. If you pack too much onto a single node, CPU and memory will thrash, and everything slows down. Start low, then gradually raise it while watching system metrics. The goal is steady throughput, not chaos.SAVE THIS ARTICLE AND READ LATER3)instance_count: Scale out, don’t just scale upEven with tuned concurrency, sometimes one node just isn’t enough. That’s whereinstance_countcomes in. It decides how many nodes you’ll spread the workload across. It’s the knob you turn when you need predictable completion times. For example, making sure the nightly run finishes before business hours. More nodes mean more cost, but also fewer late-night surprises.4)retry_settings: Resilience for the real worldIn batch jobs, things fail: a network hiccup, a corrupted file, a transient storage timeout. Without retries, the whole job can collapse because of one small blip.retry_settingslets you say, “Try again a few times before giving up.” Set sensible timeouts and retries per mini-batch so small failures don’t derail the entire pipeline.5)error_threshold: Fail smart, not earlyWhat happens if some data records are bad? By default, too many errors can abort the run. 
With error_threshold, you control how many you'll tolerate. Setting it to -1 tells Azure ML to ignore errors completely. For messy real-world datasets, this is a lifesaver: you can still ship 99% of results and deal with the outliers later, instead of losing the entire batch.

Extra sanity checks
- Respect the contract: Batch jobs are built for files/blobs in, files/blobs out. Don't try to wrap them around per-record HTTP calls.
- Keep scripts separate: Use batch_score.py for batch and online_score.py for real-time. Different handlers, different expectations.
- Watch metrics that matter: Throughput, per-batch latency, error rate, and CPU/GPU/memory use. Wire alerts so you're not caught off-guard at 2 a.m.

Takeaway
Batch scoring doesn't have to be a black box. Azure ML gives you the levers. You just have to use them. Tune these five settings, keep batch and online flows separate, and you'll get faster, more reliable runs without babysitting every night. (A sketch of where these knobs sit in a deployment YAML appears at the end of this issue.)

This walkthrough is pulled straight from Chapter 5 of Hands-On MLOps on Azure. The full book expands on everything here: deployments, monitoring, alerting, governance, pipelines, and operationalizing large language models responsibly.

For the next 48 hours, CloudPro readers get 35% off the ebook and 20% off print. If Azure ML is part of your stack, or about to be, this is the reference worth keeping open on your desk.

GET THE BOOK

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!
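To show roughly where the five knobs above live, here is a sketch of a batch deployment spec written from a shell heredoc. Treat the field nesting as an assumption: the names below follow the Azure ML CLI v2 batch deployment schema as I understand it, the exact layout can differ between schema versions, and the endpoint, model, and compute names are placeholders.

# Hypothetical spec; adjust names and nesting to your schema version
cat > batch-deployment.yml <<'EOF'
name: nightly-scoring
endpoint_name: my-batch-endpoint
model: azureml:my-model:1
compute: azureml:cpu-cluster
resources:
  instance_count: 2                  # knob 3: scale out across nodes
settings:
  mini_batch_size: 10                # knob 1: files per mini-batch
  max_concurrency_per_instance: 2    # knob 2: parallel workers per node
  retry_settings:                    # knob 4: absorb transient failures
    max_retries: 3
    timeout: 300
  error_threshold: -1                # knob 5: -1 tolerates per-record errors
EOF

az ml batch-deployment create --file batch-deployment.yml \
  --resource-group my-rg --workspace-name my-ws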
Shreyans from Packt
01 Sep 2025
Save for later

AI alone won’t deliver the autonomous network

Shreyans from Packt
01 Sep 2025
By Daren FulwellCloudPro #1053rd Sept: Why Cloud, Why Now? Join Forrester & Atlassian to understand costs, risks, and AI opps.Learn MoreToday’s CloudPro is written by Daren Fulwell, Field CTO at IP Fabric. Daren has over 25 years of experience in networking, and he is well-known for his work mentoring engineers, and helping organizations bridge the gap between design fundamentals and modern automation.In today’s article, he explores why AI on its own won’t deliver true network autonomy. Instead, Daren shows how the real path forward lies in combining automation, AI agents, and a network digital twin, so teams can move beyond hype and build networks that are transparent, predictable, and genuinely autonomous.And if you’re looking to put these ideas into practice, we’ve just released theNetwork Automation Cookbook, Second Edition. Packed with over 100 hands-on recipes, it shows you how to automate network devices and cloud platforms using Ansible, AWX, Nautobot, and Terraform. It’s 30% off exclusively for CloudPro readers, for the next 72 hrs. Grab your copy and start building real-world automation workflows today.Cheers,Shreyans SinghEditor-in-ChiefSAVE THIS ARTICLE AND READ LATERAI alone won’t deliver the autonomous networkBy Daren FulwellSHARE THIS ARTICLEIn the network engineering community right now - as in most areas of IT - you can't escape the AI hype. We've been working to understand how network automation will change the way we operate our infrastructure, and agentic AI is being proposed as the missing piece of the puzzle. Folks in the know have been experimenting and making their results available in blog posts and Youtube videos for the rest of the world to salivate over. Finally, it looks like we have taken the right turn towards the self-driving network.Or have we? Are a handful of small-scale experiments with limited scope and even more limited capability proving anything? At best, there is a lot more investigation required, at worst the experiments that we don't see are proving that AI is not to be trusted with our critical infrastructure yet.Networks aren't just collections of individual devices that we configure and then they do what we tell them: they are interconnected, propagating their world view to their immediate neighbors and beyond, to create a "hive mind" behavior for the whole system. And in most cases, our networks are actually networks of networks - interconnected and sharing state information to extend that collective view from user to workload.In traditional network operations, this meant having multiple teams - with their own documentation and subject matter experts in the technologies and platforms - who all needed to interface to provide end-to-end service. Maintenance of the infrastructure required deep collaboration between teams and across silos. A thorough understanding of the networking technologies needed to be applied to tooling and documentation to ensure change impacts were tracked and understood.In the agentic AI world, this is taken to the next level. The work is divided up for agents to be given small, carefully-defined scopes to work within, making specific types of change or reporting on specific behaviors. But due to the distributed, interconnected nature of networking, none of those agents can work independently of the others: the effects caused by one will potentially be felt by them all. 
Without true collaboration between the agents, it becomes impossible for us to trust that they will give us the desired outcomes without humans (with an understanding of the infrastructure) manually checking everything they do.In short, AI agents cannot operate the network autonomously without some collective understanding of the end-to-end network.The Sources of Truth that we have been building for our network automation processes seem to fulfil at least elements of this need. But they alone are not enough as they really represent the desired state of the network, not its current operating state.Consider these four key requirements for that source of knowledge:We need a view of the end-to-end network in the form of structured data, with a well-documented schema, and able to be accessed over clearly defined APIs or protocols to provide a consistent end-to-end view to all agentsIt must be a complete and up to date view of the network as it is operating. There is little point in having a clear view of part of the infrastructure then little or no understanding of other parts when one can so heavily impact the operation of the other.Relationships and collective behavior are key to understanding how the network behaves: maintaining a list of devices (that may or may not be complete) and some data points about those devices may be useful but does not give the full pictureNetwork behavior is on the whole deterministic: a set of devices with specific state and connectivity should always behave the same way. So the data model must be based on facts collected from the network devices themselves, analysed and modelled as behavior - rather than being formed from conjecture, opinion or correlation of related events (the best you can expect from that is a general indication of direction)A true Network Digital Twin has all of these characteristics:SAVE THIS ARTICLE AND READ LATERA Network Source of Truth system has some of these, but misses the key aspect of understanding network behavior end-to- end. For example, consider:A change is required to enable Internet users gain access to an application hosted in a private DC. The external firewall is updated with NAT rules and policy changes to provide access; DNS changes are made; and routing is checked from the DC to ensure that traffic can be forwarded from user to frontend and back. But it still fails when the changes are pushed, because the security policy applied in the DC fabric only allows testing from internal hosts. The coordinated effort across multiple domains (read AI agents) has failed due to an incomplete view of the service dependencies.A DR exercise is under way, causing applications to be switched from one location to another. Load balancing rules are being changed to facilitate that and the virtual IP successfully moves traffic flows to the new location. Two of the four servers in the load balancing pool are working fine, so the pool is up and being serviced, but not at full capacity. While the remaining servers are up and the correct services are running, routing from the load balancer to those servers is not correct: using the Digital Twin this end-to-end behavior can be diagnosed in advance and remediation carried out to fix this before live traffic is diverted through this path.AI is going to change the way we operate networks. 
AI is going to change the way we operate networks. But to deliver its true potential, it needs not only to drive automated processes, but also to be fed a real understanding of the networks it operates, so that it can validate it is doing what it needs to do.

If today's article got you thinking about how to move from talking about automation to actually building it, you'll want to check out our brand new release: Network Automation Cookbook, Second Edition.

This updated edition is packed with over 100 hands-on recipes showing how to use Ansible, AWX, Nautobot, and Terraform to automate both on-prem and cloud networks. It's written for engineers who want practical workflows, not just theory, and every recipe comes with reproducible labs so you can practice safely.

As a CloudPro reader, you can grab it at 30% off for the next 72 hours.

GET THE BOOK

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!


Why LXD beats VMs and Docker for Ubuntu dev

Shreyans from Packt
25 Aug 2025
By Ken VanDine
CloudPro #104

In today's CloudPro, Ken looks at how developers can use LXD containers on Ubuntu to keep their setups clean, spin up secure environments quickly, and avoid the headaches of traditional VMs.

Cheers,
Shreyans Singh
Editor-in-Chief

Have you ever broken your Ubuntu setup by installing conflicting packages, or wasted an entire afternoon waiting for a virtual machine to boot just to test one library? These little frustrations are all too common for developers. Containers on Ubuntu offer a cleaner way forward, and with LXD, you can build fast, secure, and reproducible environments that don't weigh down your system.

Streamlining and Securing Development with Containers on Ubuntu

Developers often add tools and services directly onto their Linux desktops because it feels quick and convenient. The problem is that every new service increases your system's attack surface, especially if something opens a network port in the background. A safer, more efficient approach is to run your development environments in containers. You spin them up only when you need them, and they stay isolated from your daily workflow.

In this article, we'll discuss LXD, the Linux Container Daemon, due to its outstanding integration with Ubuntu. Conceptually, this applies to other container technologies as well, but the usage would vary. LXD stands out as a powerful solution for developers using Ubuntu, providing a lightweight and flexible approach to containerization and offering a compelling alternative to traditional virtual machines and other container technologies.

Why LXD instead of Docker or VMs?

If you've worked with Docker or virtual machines, you may wonder why LXD matters. Here's a quick comparison to put it into perspective:

[Table: LXD vs Docker vs VMs at a glance]

That last row is the key: LXD gives you a lightweight "virtual machine–like" environment, without the VM bloat.

Why Choose LXD for Development?

LXD containers are specifically designed to meet the needs of developers, offering a unique set of benefits:

- Lightweight and Efficient: Unlike resource-intensive VMs that require full OS installations, LXD containers share the host kernel. This minimizes overhead, leading to significantly faster boot times, a reduced memory footprint, and improved overall performance. This efficiency is crucial for rapid iteration and testing across various environments without performance penalties.
- Image-Based Management: LXD's reliance on images for container creation transforms environment management. It enables effortless sharing, versioning, and reproducibility of development environments, ensuring consistency across different machines and simplifying collaboration. This approach streamlines workflows, allowing developers to spin up new environments with specific configurations and dependencies quickly.
- Security Fortified: LXD provides robust isolation through kernel namespaces and advanced security features, safeguarding the host system and other containers from potential vulnerabilities. This secure environment allows developers to focus on their code with peace of mind.
- Scalability and Flexibility: LXD excels in scalability, allowing developers to easily create multiple isolated environments for different projects, branches, or feature implementations. This fosters a highly organized and efficient development process, enabling rapid switching between environments without impacting other projects.
- Seamless Ubuntu Integration: LXD's tight integration with Ubuntu leverages the operating system's robust package management system and offers access to a vast repository of pre-built images and tools. This streamlines development and ensures compatibility with a wide range of software and libraries.

Quick use case: Let's say you want to try out PostgreSQL 17 without risking your workstation setup. Launch an LXD container, install PostgreSQL inside it, test it safely, and if things break, just roll back to a snapshot. Your main system stays untouched.

Getting Started with LXD on Ubuntu

Installing and configuring LXD on Ubuntu is straightforward, as it's available directly from the Snap Store.

ken@monster:~$ sudo snap install lxd
ken@monster:~$ sudo usermod -aG lxd "$USER"
ken@monster:~$ newgrp lxd
ken@monster:~$ lxd init --auto

The lxd init --auto command initializes LXD with recommended settings. For more control, omit --auto to go through an interactive configuration process, allowing you to choose storage backends (like ZFS for advanced features or LVM for flexibility), configure network settings (bridge interfaces or NAT), and set up image remotes to access pre-built images.

Essential LXD Container Management Commands

LXD provides a comprehensive command-line interface (CLI) for managing containers:

- lxc launch <image> <name>: Creates and starts a new container from an image.
- lxc list: Displays a list of all running containers.
- lxc start/stop/restart <name>: Manages container lifecycle.
- lxc exec <name> -- <command>: Executes commands within a running container.
- lxc file push/pull <local_path> <remote_path>: Transfers files between the host and a container.

For development, it's often more convenient and secure to run as an ordinary user with your home directory mapped into the container:

ken@monster:~$ lxc launch ubuntu:25.04 plucky-devel -c raw.idmap="both $UID 1000"
ken@monster:~$ lxc config device add plucky-devel homedir disk source=$HOME path=/home/ubuntu
ken@monster:~$ lxc exec plucky-devel -- su -l ubuntu

This configuration launches a container, maps your user ID, mounts your home directory, and provides a login shell as the ubuntu user inside the container, allowing you to use your favorite editor on your host system while executing code within the isolated container.

Unlocking Advanced Features

LXD offers features that further enhance development workflows:

- Remote Access: Manage containers remotely via the secure REST API.
- Networking Mastery: Configure virtual networks to isolate containers and simulate complex network topologies for testing.
- Storage Management: Optimize storage performance with different backends like ZFS or LVM.
- Profiles for Reusability: Define reusable profiles to simplify container creation with consistent configurations.
- Snapshots and Rollbacks: Capture container states to revert to previous working configurations, ideal for quick experimentation.
- Moving and Migrating Containers: Easily move or migrate running containers between LXD hosts or even to different cloud providers.

Pro tip: If you often create containers with similar settings, use profiles. They'll save you from repeating the same config steps over and over. The Ultimate Ubuntu Handbook shows how to build reusable profiles for real projects.
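To make the pro tip and the earlier PostgreSQL use case concrete, here is a rough sketch of what that workflow can look like. The profile name, config values, and snapshot name below are illustrative examples, not taken from Ken's article:

# Create a reusable profile with a few common settings (values are just examples)
lxc profile create dev
lxc profile set dev limits.cpu 2
lxc profile set dev limits.memory 4GiB
lxc profile device add dev homedir disk source=$HOME path=/home/ubuntu

# Launch a container with the default profile plus the dev profile layered on top
lxc launch ubuntu:25.04 pg-test --profile default --profile dev

# Snapshot before a risky experiment, then roll back if it goes wrong
lxc snapshot pg-test before-pg17
lxc exec pg-test -- apt-get update
lxc exec pg-test -- apt-get install -y postgresql
lxc restore pg-test before-pg17

The snapshot-and-restore pair at the end is the "roll back to a snapshot" safety net from the PostgreSQL example: the experiment happens entirely inside the container, and the host never changes.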
The Future of LXD

LXD continues to evolve, with ongoing efforts to integrate with Kubernetes for seamless orchestration and improved virtualization support for demanding workloads. Enhanced security features and a developing web-based user interface (sudo snap set lxd ui.enable=true && sudo snap restart lxd) are also on the horizon, making LXD even more accessible and powerful for developers.

The bottom line? LXD is a game-changer for developers on Ubuntu, offering a compelling blend of efficiency, security, and flexibility. By embracing LXD, developers can create efficient, reproducible, and secure environments that streamline workflows, enhance collaboration, and accelerate innovation.

Ken's walkthrough is just a glimpse into what's possible when you bring containers into everyday development on Ubuntu. If you'd like to go deeper, his book The Ultimate Ubuntu Handbook is full of practical examples, from building secure containers to streamlining workflows and preparing for production deployments. It's a guide designed to stay useful long after the first read.

Get The Book

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!


The Hidden Platform Lesson Behind Airbnb’s Data Quality Framework

Shreyans from Packt
18 Aug 2025
A Blueprint for Smarter Platform Engineering
CloudPro #103

Cheers,
Shreyans Singh
Editor-in-Chief

AI Powered Platform Engineering Workshop

Most platform teams hit the same wall: tools pile up, self-service breaks down, and platforms get rebuilt every 18 months. On September 16, join CNCF Ambassador Max Körbächer for a 3-hour live workshop on how to design internal platforms with AI baked in.

LAUNCH OFFER: 40% OFF for 48 Hours. Use Code: LIMITED40

The Hidden Platform Lesson Behind Airbnb's Data Quality Framework
A Blueprint for Smarter Platform Engineering

A few years back, Airbnb hit a painful truth: a single data bug could quietly poison dashboards, mislead teams, and steer decisions the wrong way. To deal with it, Airbnb launched a Data Quality Initiative. The company rolled out Midas, a certification process for critical datasets, and made checks for accuracy, completeness, and anomalies mandatory.

It sounded good on paper. But in practice, it quickly turned into a mess. Every team wrote checks in their own way: Hive, Presto, PySpark, Scala. There was no central view of what was covered, and updating rules meant editing code in a dozen different places. Teams duplicated effort, each building their own half-complete frameworks to run checks. And pipelines grew heavy: every check was an Airflow task, DAG files ballooned, and false alarms could block production jobs.

Airbnb needed a better path. So, they built Wall, a single framework for checks. Instead of custom code, engineers wrote checks in YAML configs. An Airflow helper ran them, keeping logic separate from pipelines. Wall added support for blocking vs. non-blocking checks, so minor issues didn't stop critical flows. And instead of burying results in logs, it sent them into Kafka for other systems to consume.

The results were dramatic. Some pipelines shed more than 70% of their Airflow code. Teams stopped reinventing the wheel. Data-quality checks went from fragile and inconsistent to a paved path everyone could rely on.

What Platform Engineers Can Learn from Airbnb

1. Standardize Pipelines and Checks

In Mastering Enterprise Platform Engineering, authors Mark and Gautham make a simple point: reliable AI doesn't come from choosing the perfect model. It comes from the rails underneath: pipelines, integration layers, automation, and guardrails. The numbers are striking: clean pipelines alone can improve model performance by 30%, audits can cut errors by 90%, flexible integration can halve deployment time, and predictive automation can reduce downtime by 50%.

Reliable AI starts with consistent pipelines and repeatable checks. When pipelines are clean and quality checks are routine, the entire system gets more predictable. That's why standardized checks and audits can boost accuracy so dramatically.

Airbnb learned this the hard way. Each team had its own approach to checks, spread across different engines. The duplication and inconsistency created constant drag. Wall fixed it by moving to YAML-based checks in a single framework. Suddenly, teams were speaking the same language. Some pipelines saw their DAGs shrink by more than 70%.

2. Decouple Checks from Workflow Code

One of the biggest risks in complex systems is tangling logic together. When validation lives inside workflow code, every change increases fragility. By pulling checks out into their own layer, you gain flexibility, reuse, and resilience.

Wall embodied this. Instead of clogging DAGs with checks, it made them independent services. Results flowed into Kafka, where other systems could consume them. Checks weren't bound to pipelines anymore; they became a decoupled, reusable rail.
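The article doesn't show Wall's actual configuration format, but to make the YAML-checks idea concrete, a declarative check with a blocking flag might look roughly like the sketch below. Every field name here is illustrative, not Wall's real schema:

# Hypothetical, Wall-style data quality check (illustrative field names only)
check:
  name: listings_daily_row_count_anomaly
  dataset: core_data.listings_daily
  metric: row_count
  assertion:
    max_day_over_day_change: 0.25    # flag swings of more than 25%
  blocking: false                    # alert, but don't stop the pipeline
  publish:
    kafka_topic: data-quality-results

Both design points the article highlights are visible here: the check lives in config rather than in DAG code, and the blocking flag decides whether a failure gates the pipeline or just emits an event.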
3. Close the Loop with Automation

Validation without action is just noise. The real value comes when checks automatically trigger responses: scaling a service, blocking a bad job, or notifying the right team. This kind of predict→act loop is where platform engineering proves its worth.

Wall pushed Airbnb in this direction. By publishing results as Kafka events, checks could plug into downstream tools that acted immediately. Instead of waiting for humans to parse dashboards, the system closed the loop itself.

4. Build Guardrails, Not Just Tests

Not every failed check should bring the system down. The right approach is to design guardrails: rules that let you decide what's a blocker and what's not. This keeps the platform safe without making it brittle.

Wall introduced blocking vs. non-blocking checks to solve this. Critical issues stopped the flow; minor ones didn't. That simple design choice turned fragile pipelines into resilient ones. Guardrails, not tests, are what kept the system trustworthy.

5. Standardize Tools to Reduce Friction

Every extra framework, every redundant tool, adds friction. Engineers spend more time maintaining and less time building. The fix is to standardize on a common set of rails, even if it means trade-offs.

Airbnb saw this firsthand. With every team writing their own frameworks, they were duplicating effort and missing features. Wall gave them a single standard, cutting out wasted work. Once engineers stopped arguing about how to check data, they could focus on using it.

This walkthrough was adapted from Mastering Enterprise Platform Engineering and connects to Packt's AI-Powered Platform Engineering Workshop on September 16. It's a live, 3-hour session led by CNCF Ambassador Max Körbächer, focused on how to build internal platforms with AI baked in: platforms that stay usable, scalable, and sustainable. CloudPro readers get 40% off tickets for the next 48 hours.

LAUNCH OFFER: 40% OFF for 48 Hours. Use Code: LIMITED40

Sponsored: Staying sharp in .NET takes more than just keeping up with release notes. You need practical tips, battle-tested patterns, and scalable solutions from experts who've been there. That's exactly what you'll find in .NETPro, Packt's new newsletter, with a free eBook waiting for you as a welcome bonus. Sign Up.

Join Christoffer Noring (Senior Advocate at Microsoft, GDE, Oxford tutor) for a hands-on 2.5h MCP workshop. Go beyond theory: build and deploy a real MCP client and server in Python, get a free MCP eBook, and leave with a certificate of completion. Reserve your spot.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!

How AWS Lambda Handles Billions of Async Requests Without Breaking a Sweat

Shreyans from Packt
02 Jun 2025
How Netflix stores 140 million hours of viewing data per day
CloudPro #94

[Sponsored] Learn how your app could evolve automatically, leaving reverse engineers behind with every release. Register Now

This week's CloudPro has a bunch of things that made me pause and go, "Wait, that's possible?"

📌 A GitHub token leak that kicked off a supply chain attack targeting 100K+ repos
📌 Git tools quietly leaking your credentials with just a newline
📌 Kubernetes Ingress-NGINX bugs that might be hiding in your setup without you knowing

There are also some great deep dives, like how Netflix handles 140 million hours of data every day, a homegrown Python bot that auto-heals K8s IP issues, and a hands-on post about cutting a $10K Glue bill down to $400 using Airflow.

Hope a few of these help you solve something annoying or spark a weekend project.

Cheers,
Shreyans Singh
Editor-in-Chief

🔐 Cloud Security

Multiple Vulnerabilities Found in Kubernetes Ingress-NGINX
Several security flaws (CVEs) were found in the Kubernetes ingress-nginx controller. These issues do not affect Amazon EKS directly because EKS doesn't include this controller by default. However, if customers manually installed it, they should update to the latest version. AWS has already alerted affected users.

How a Leaked GitHub Token Sparked a Widespread Supply Chain Attack Targeting Coinbase and 100,000+ Repos
Attackers pulled off a stealthy supply chain attack by leaking a GitHub token from a SpotBugs project, then using it to compromise other GitHub Actions like reviewdog and tj-actions. They injected malicious code that silently spread through CI/CD workflows, eventually targeting Coinbase's open-source project.

GitHub Finds Critical ruby-saml Flaws Letting Attackers Bypass SSO and Hijack Accounts
GitHub found two serious bugs in the ruby-saml library that let attackers bypass SAML authentication and potentially log in as any user. The problem came from how different XML parsers (REXML and Nokogiri) interpret the same data differently, letting attackers sneak in fake but valid-looking login info.

Git Tools Exposed: Bugs in GitHub Desktop, LFS, and CLI Let Attackers Steal User Credentials
A security researcher found that several Git-related tools, including GitHub Desktop, Git Credential Manager, Git LFS, and GitHub CLI, had flaws that let attackers trick them into leaking stored credentials (like tokens or passwords) to malicious servers. Most issues stemmed from how these tools handled special characters like carriage returns or newlines in URLs, causing credentials meant for GitHub to be sent elsewhere.

Microsoft Expands Security Copilot with AI Agents to Tackle Phishing, Insider Risks, and Shadow AI Threats
Microsoft has upgraded Security Copilot with AI agents that can now handle tasks like phishing detection, insider risk alerts, and vulnerability patching automatically.
These agents help security teams work faster and smarter, especially as cyberattacks become too complex and frequent for humans alone.

Web Devs: Turn Your Knowledge Into Income
Build the knowledge base that will enable you to collaborate with AI for years to come.
💰 Competitive Pay Structure
⏰ Ultimate Flexibility
🚀 Technical Requirements (No AI Experience Needed)
Weekly payouts + remote work: the developer opportunity you've been waiting for! The flexible tech side hustle paying up to $50/hour. Apply Now

⚙️ Infrastructure & DevOps

AWS Launches Amazon Q Scenarios in QuickSight to Bring Forecasting and What-If Analysis to Everyone
AWS has launched the new "scenarios" feature in Amazon Q for QuickSight, letting users analyze data trends, forecast outcomes, and run what-if simulations, all through simple natural language. You don't need to be a data expert or use spreadsheets anymore. This tool helps teams make smarter decisions faster.

How AWS Lambda Handles Billions of Async Requests Without Breaking a Sweat
When functions are called asynchronously, Lambda queues them, processes them later, and manages retries. For small apps, a single queue may be enough, but for massive scale, AWS uses smart techniques like consistent hashing and shuffle-sharding to separate workloads and reduce the risk of "noisy neighbors" affecting others.

AWS CodeBuild Adds Parallel Test Execution to Drastically Speed Up CI Pipelines
AWS just made it possible to run tests in parallel using CodeBuild, which means instead of testing code one piece at a time, you can test many pieces at once. This massively cuts down the time it takes for developers to know if their code works, making software updates much faster and less frustrating.

How I reduced $10000 monthly AWS Glue bill to $400 using Airflow
Akash and his team were spending $10,000/month running data pipelines on AWS Glue, but much of that cost came from paying for idle time. To fix it, they moved all those jobs to Apache Airflow running on EC2 and ECS, using Terraform to manage everything. It was tough, especially setting up workers, Redis, and autoscaling, but they pulled it off and slashed their bill to just $400/month.

How to run Firecracker without KVM on cloud VMs
Normally, to run lightweight virtual machines (like Firecracker microVMs), you need special hardware features (KVM) or expensive bare-metal cloud servers. But a new method called PVM (Pagetable Virtual Machine), developed by Ant Group and Alibaba, lets you run Firecracker without KVM, even on cheaper cloud VMs that don't support nested virtualization.

📦 Kubernetes & Cloud Native

Kubernetes launches kube-scheduler-simulator
When Kubernetes decides where to run an app (called a Pod), it uses a complex component called the scheduler. But understanding why the scheduler makes certain decisions has always been hard. It's like a black box. This new tool, kube-scheduler-simulator, opens up that black box. It lets you simulate a real cluster and see exactly how the scheduler makes its choices.

Kubernetes Launches JobSet to Simplify Large-Scale AI and HPC Workloads
As AI models get bigger, training them requires splitting the work across thousands of GPUs or TPUs spread over many servers. Kubernetes can help manage this, but its current tools aren't built to easily handle these complex, multi-part jobs.
So, the Kubernetes team introduced JobSet, a new tool that makes it easier to run these distributed training jobs.

Kubernetes 1.32 Unlocks Smarter, Safer Linux Swap Support
Earlier, Kubernetes completely disabled swap because it couldn't track memory usage well when swap was involved. But now, after years of progress, Kubernetes 1.32 is finally adding proper support for Linux swap memory, which lets systems use disk space as extra RAM to avoid crashes during memory spikes.

How One Home Kubernetes User Beat ISP IP Changes with an Auto-Healing Python Bot
The author runs a home Kubernetes setup and relies on a dynamic IP address from their internet provider, which can unexpectedly change. Since IP changes can break things like firewall rules or service configurations, they built a Python program that constantly monitors their IPs. If the IP changes, it automatically updates firewall settings and Kubernetes resources to keep everything running smoothly.

Devtron + Argo CD: Enhancing GitOps without disruption
Teams are shipping code faster thanks to AI tools like GitHub Copilot, but their deployment systems, especially Argo CD, can't keep up. Instead of replacing Argo CD, Devtron now integrates directly with it. This gives users more powerful deployment features like multi-cluster control, better security, and advanced rollout strategies, without breaking or migrating their existing setup.

🔍 Observability & SRE

Building a Searchable, Structured Logging System for Real-World Debugging
The author built a better logging system to help debug issues in a complex app. Instead of messy, inconsistent logs, they used structured logs that are easy to search, and even "canonical" logs that summarize everything about a request in one line. They sent these logs to tools like Loki and ClickHouse, so they could ask smart questions and actually learn from the data.

How Netflix stores 140 million hours of viewing data per day
Netflix collects an enormous amount of viewing data every day: from what you watch to when you pause. As this data exploded, their original system started to slow down. So they redesigned it: recent data is stored fast and uncompressed, older data is compressed and moved to long-term storage, and less important data (like short previews) is filtered out.

How to build the ultimate March Madness dashboard in Grafana
A techie March Madness fan built a real-time basketball tracking dashboard in Grafana that pulls live NCAA data, like scores and player stats, directly from public APIs. Using Grafana's Infinity and Canvas plugins, they turned raw JSON into a jumbotron-style scoreboard that updates without refreshes.

Forward to a Friend

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!


ReplicaSet ≠ High Availability (Until You Test This)

Shreyans from Packt
04 Aug 2025
Pods fail, nodes go down. This walkthrough shows what actually happens, and how to fix it.
CloudPro #102

ReplicaSet ≠ High Availability (Until You Test This)

30-second summary of today's CloudPro for you:
Running your app in Kubernetes doesn't automatically make it highly available. This walkthrough shows how ReplicaSets handle pod failures, node loss, and unhealthy containers, and what really happens behind the scenes when things go wrong. Adapted from The Kubernetes Bible.

> 8-minute read
> Hands-on commands included
> Bonus at the end for readers like you

Cheers,
Shreyans Singh
Editor-in-Chief

The Problem: One Dead Pod, and Your App Stalls

Let's say you've got a stateless NGINX app deployed in a multi-node Kubernetes cluster using a ReplicaSet. You think you're covered because there are 4 replicas. But then you:

- delete a pod manually
- drain one of the nodes
- simulate a container failure

In all three cases, you're expecting automatic recovery. But it's not magic. It's the ReplicaSet (and sometimes liveness probes) doing the heavy lifting. Let's walk through all three failure modes and see what Kubernetes does.

1. Pod Deletion? No Problem.

This scenario demonstrates how a ReplicaSet restores deleted pods to maintain the desired number of replicas. Here's a step-by-step walkthrough:

1. Define the ReplicaSet manifest: Save the following YAML as nginx-replicaset-example.yaml:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-replicaset-example
  namespace: rs-ns
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
      environment: test
  template:
    metadata:
      labels:
        app: nginx
        environment: test
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
        ports:
        - containerPort: 80

2. Create the namespace: This ensures all your resources are scoped properly.

kubectl create -f ns-rs.yaml

3. Deploy the ReplicaSet: The manifest defines a ReplicaSet with 4 NGINX pods.

kubectl apply -f nginx-replicaset-example.yaml

4. Delete a pod manually: Simulate a pod failure by deleting one of the running pods.

kubectl delete pod <pod-name> -n rs-ns

5. Verify that the ReplicaSet restores the pod: The controller detects the change and automatically spins up a new pod to maintain the desired count.

kubectl get pods -n rs-ns
kubectl describe rs/nginx-replicaset-example -n rs-ns

Within seconds, the ReplicaSet controller notices the missing pod and recreates it to meet the declared replica count.

Takeaway: ReplicaSets automatically maintain the number of desired pods, making recovery from manual deletions fast and hands-free.
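One note before the next scenario: the walkthrough applies ns-rs.yaml above and nginx-service.yaml below without showing their contents. If you are following along from this excerpt alone, minimal stand-ins might look like the sketch here; the only requirement is that the Service selector matches the pod labels defined in the ReplicaSet manifest:

# ns-rs.yaml: the namespace used throughout the walkthrough
apiVersion: v1
kind: Namespace
metadata:
  name: rs-ns
---
# nginx-service.yaml: a ClusterIP Service selecting the ReplicaSet's pods
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: rs-ns
spec:
  selector:
    app: nginx
    environment: test
  ports:
  - port: 80
    targetPort: 80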
2. Node Failure? Here's What Actually Happens

This scenario demonstrates how ReplicaSets maintain high availability when a node goes down by rescheduling pods onto available nodes. Here's a step-by-step walkthrough:

1. Expose your app with a Service:

kubectl apply -f nginx-service.yaml

This creates a service to access your app across pods.

2. Forward traffic from your local machine to the Kubernetes Service:

kubectl port-forward svc/nginx-service 8080:80 -n rs-ns
curl localhost:8080

This confirms your service is working and traffic is flowing to the pods.

3. Check where the pods are currently running:

kubectl get pods -n rs-ns -o wide

This shows which node each pod is scheduled on.

4. Simulate node failure by cordoning and draining the node:

kubectl cordon kind-worker

Prevents new pods from being scheduled on this node.

kubectl drain kind-worker --ignore-daemonsets

Evicts all running pods from the node while ignoring DaemonSets.

kubectl delete node kind-worker

Removes the node from the cluster to simulate a full node failure. Within moments, the ReplicaSet detects the missing pods and spins up new ones on the remaining healthy nodes. Your Service automatically reroutes traffic to these new pods.

5. Verify that everything is still working:

kubectl get pods -n rs-ns -o wide
curl localhost:8080

You'll see that traffic still flows, and the app remains accessible without downtime.

Takeaway: The ReplicaSet ensures that the desired number of pod replicas is always maintained, even when a node goes offline. It handles pod rescheduling automatically, as long as there's sufficient capacity in your cluster.

3. Unhealthy Container? Probes Save the Day

Let's see how Kubernetes handles an unhealthy container using liveness probes. Here's a step-by-step walkthrough:

1. Add the following liveness probe to your ReplicaSet pod spec. It instructs the kubelet to start checking container health after 2 seconds and to repeat the check every 2 seconds:

livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 2
  periodSeconds: 2

2. Apply your updated ReplicaSet manifest and wait for the pod to be up and running.

3. Simulate a container failure by deleting the default NGINX index file:

kubectl exec -it <pod-name> -- rm /usr/share/nginx/html/index.html

4. Check what happens by describing the pod:

kubectl describe pod <pod-name>

You'll see Liveness probe failed events, followed by automatic container restarts.

Takeaway: The kubelet, not the ReplicaSet, manages container health. But when used with ReplicaSets, probes help create a resilient system that self-heals when a container goes bad.

Cleanup

You can delete the ReplicaSet and its pods:

kubectl delete rs/nginx-replicaset-livenessprobe-example

Or delete just the controller, leaving the pods untouched:

kubectl delete rs/nginx-replicaset-livenessprobe-example --cascade=orphan

Key Takeaways

- ReplicaSets guarantee pod replication and replacement, not health checking
- Liveness probes enable the kubelet to restart broken containers
- Node failure recovery works if your cluster has enough capacity and replicas are spread
- HA = ReplicaSets + Probes + Services, working in tandem

👋 This walkthrough was adapted from just one chapter of The Kubernetes Bible, Second Edition: a 720-page, hands-on guide to mastering Kubernetes across cloud and on-prem environments.

If you're tackling real production workloads or preparing for certs like CKA/CKAD/CKS, the book dives deeper into everything from ReplicaSets and Deployments to StatefulSets, autoscaling, Helm, traffic routing, and advanced security practices.

For the next 72 hours, CloudPro readers get 30% off the ebook and 20% off print.

Order Now

Sponsored: Curious how AI is changing secure coding? Join Sonya Moisset from Snyk on Aug 28 to explore real-world strategies for protecting your AI-driven SDLC and earn a CPE credit while you're at it. Register now.

Want faster builds and better mobile apps? Learn proven CI/CD tips from Bitrise and Embrace experts to speed up development and ship higher-quality apps. Register here.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email.

Thanks for reading and have a great day!