aaaaIntroductionMost search systems still rely on keyword matching. It works, until users describe intent in natural language and the right answer does not share the same words.Semantic search closes that gap by ranking results based on meaning. It converts both documents and queries into vector embeddings, then retrieves the closest matches using K-Nearest Neighbor search.In this article, I will cover how OpenSearch supports semantic search in practice. That includes the KNN plugin, ML Commons for managing models, and the ingest and search pipelines used to generate and apply embeddings. I will also touch on accuracy versus latency trade-offs and how to evaluate relevance.Semantic searchWe define semantic search as using vector embeddings for the corpus of documents, to fi nd mathematically close neighbors to a vector embedding computed for the user’s query text. Semantic search contrasts with lexical search, in which the query matches terms from the user’s query text to terms in the index. We call this process of finding mathematically close neighbors K-Nearest-Neighbors (KNN).OpenSearch provides KNN through its KNN plugin. The plugin supports storage engines and matching algorithms that index vector embeddings that you provide in your documents. The KNN plugin provides additional Query DSL query APIs for you to send a vector embedding as part of or as a whole query.The KNN plugin supports both exact KNN and approximate KNN. When you use exact KNN, OpenSearch compares the query vector to every embedding on every document that matches the query’s filters and scores and sorts the results based on vector distance. Approximate KNN uses various techniques to reduce the number of vector comparisons. We’ll cover KNN in the following sections. In the next section, you’ll learn about OpenSearch’s ml_commons plugin, which manages local-to-the-cluster Machine Learning (ML) models and provides a connector framework for ML models hosted on other platforms such as Amazon SageMaker or Amazon Bedrock. ML Commons and ML modelsInteraction with ML models is mediated by OpenSearch’s ml_commons plugin. You use ML Commons for loading models onto nodes in your cluster, and for connecting to models in third-party providers who provide APIs for calling them.The history of search includes many machine-learning techniques, and ML Commons supports models that perform different tasks.K-means: An iterative clustering algorithm that builds centroids for similar points.Linear regression: Fits a line to a set of inputs/outputs.Random Cut Forest (RCF): An unsupervised algorithm that learns normal variation in a series of points. OpenSearch’s Anomaly Detection (AD) uses RCF.RCF Summarize: Employs a hierarchical clustering technique that successively merges close points to create a collection of centroids.Localization: Detects anomalous or interesting points, such as spikes, over aggregate data.Logistic regression: Takes an input variable and predicts which of a class of outputs fits that variable.Metrics correlation (experimental): Detects correlations in groups of metric data, where there are anomalies closely related in time.Large language and multi-modal embedding models: These models create vector embeddings for text and images. OpenSearch can deploy several open source embedding models from Hugging Face and can access embedding models hosted on third-party platforms.Text generation models: Th ese models take an input prompt and generate a text response for that prompt.To use OpenSearch’s ML Commons plugin, you’ll set some cluster-wide settings that define where the plugin is deployed, which nodes perform ML processing, which features are enabled, and limits on memory consumed. Refer to os_client_factory.py, in the ch10 folder of the book’s repository to see the settings we used for the examples:allow_registering_model_via_url: Set True when you load models onto the cluster.only_run_on_ml_node: Set True when you want to run ml_commons on dedicated ML nodes. The book’s Docker compose file creates an ML node and sets this setting to True.In the next section, we cover OpenSearch’s Neural Search plugin, which helps simplify accessing and using ML models.OpenSearch’s Neural Search pluginThe Neural Search plugin provides a connector framework that enables communication with ML models, whether they are hosted on an ml node in the OpenSearch cluster, or they are hosted in a third-party model host such as Amazon SageMaker or Amazon Bedrock. The Neural Search plugin works with OpenSearch ingest pipelines and search pipelines, using a model ID that abstracts the underlying location and connection to the model itself.To use the Neural Search plugin for creating vector embeddings, you first define an ingest pipeline. Ingest pipelines are a native OpenSearch construct that enables you to chain together processors for data as OpenSearch ingests that data. There are a variety of processors (40 out of the box, and with additional plugins you can install) that perform functions such as mutating documents to add or remove fields, dropping documents, enriching documents with Geographic data, extracting text PDF documents or Microsoft Word documents, and more. In this chapter, you’ll use the text_embedding processor and the sparse_encoding processor.Ingest pipelines, ingest nodes, and Data PrepperIngest pipelines are a convenient way to mutate and enrich your data as OpenSearch processes that data. By default, OpenSearch will use data nodes as the compute resource to perform this processing. Data nodes have their own important use—responding to your queries! To avoid overloading your data nodes, consider deploying nodes with the ingest node role, separate from your data nodes. Th at way, you will localize the resource demands to those nodes. Even better, consider using Data Prepper, a component provided by the OpenSearch project. Data Prepper is a standalone component that can c onnect to many data sources, and perform Extract, Transform, Load (ETL) on your data. Amazon OpenSearch Service provides a managed version of Data Prepper, called Amazon OpenSearch Ingestion, that can greatly simplify collecting, processing, and loading data to OpenSearch or Amazon OpenSearch Service. At the time of writing, Data Prepper does not have a processor that can call out to ML model hosts, so this chapter uses ingest pipelines.Search pipelines similarly provide processors that you can use to orchestrate your query processing. Query processors act on the user’s query to perform tasks such as filtering or generating embeddings. Result processors act on the query’s result set to perform tasks such as normalizing and combining results for hybrid queries, or re-ranking queries.As you work through the examples in the chapter, you’ll see ingest and search processors in action. You’ll also see a new feature of OpenSearch 2.19—ingest and search workflows that let you define ingest and query processing with a simple, declarative syntax.In the next section, you’ll work through exact KNN queries to put the theory into practice.Exact K-Nearest-NeighborWhen you need to get the most accurate results, use exact KNN to query your data. An exact KNN query computes the distance between the search vector and each possible vector result. As with all queries, exact KNN queries scale latency with matches. Exact KNN is great for generating a gold judgment set, but not fast enough to be the right algorithm for large (million+) datasets.Precision, accuracy, and measuring the quality of search resultsTo figure out how successful the retrieval and ranking are, relevance engineers measure and compare precision and accuracy. Precision is the proportion of search results that are relevant to the query. Accuracy is the proportion of correct results the query returns. Another measure, cumulative gain, measures the aggregate relevance of the results. Most practitioners use a version of cumulative gain, Normalized Discounted Cumulative Gain (NDCG). NDCG includes the position of each result in the overall result, relative to a human-juried, ideal ordering. NDCG is usually relative to the number of results, so you’ll see metrics such as NDCG@10, meaning the gain at result 10. Relevance measures require that you have a “golden set” – a set of queries and results that you know to be correct. You use this golden set as a yardstick and compare your query results to fi gure out how good (or bad) they are. As we mentioned, you can use exact KNN search to prepare a golden set so that you can measure the accuracy you lose when you employ an approximate nearest neighbor search.Code walk-throughTo make the code easier to understand, and to support the various examples in this chapter, we’ve provided several utility modules that we’ll reuse throughout to support the examples. Take a moment to examine the ch10 folder. You’ll find the following:os_client_factory.py: Wraps the authentication and connection details the OpenSearch Python client needs to make API calls.cleanup.py: Use with caution! This utility cleans up models (--models), indices (--indices), and connectors (--connectors) that the example code creates. This can be helpful to reset your cluster, but beware of side effects or accidentally deleting these resources.movie_source.py: Provides a Python generator that produces a line at a time, with data cleaning and enrichment, from the movies file, and a generator that produces a batch that the OpenSearch Python client’s bulk helper sends to OpenSearch.index_utils.py: Provides a mapping and a function that creates an index in OpenSearch.model_utils.py: Provides definitions for models that OpenSearch can host locally, along with code to register and deploy models to the OpenSearch cluster.connection_utils.py: Provides definitions and code to deploy connectors to model hosts.auto_incrementing_counter.py: Provides a convenience class that increments a counter when cast to a string. Used for indicating progress when indexing the movies.Take a moment to examine these scripts. Next, we’ll dive into the code and explain the implementation of exact matching.Exact KNN searchThe exact.py file is the main entry point for the exact KNN example. If you look at that file, you’ll find a number of constants that define the model to use, the index to use, and the ingest pipeline that enables OpenSearch’s Neural Search plugin to create embeddings automatically. Its main function deploys the model to OpenSearch’s ml node and gets it ready for use. Then, it loads the movies in batches of 1,000 to OpenSearch.We’ve employed OpenSearch’s Neural Search plugin to facilitate the creation of vector embeddings for each movie. When you use the Neural Search plugin, you create an OpenSearch ingest pipeline, with a text_embedding processor that invokes the model for each document you ingest. The alternative is to use an offline batch process to transform the source data, adding vector embeddings. If you already have embeddings, you can simply load the data to your OpenSearch index with normal bulk API calls. The Neural Search plugin automates the process of creating embeddings at ingestion and search time.ingest_pipeline_definition = { "description": "Embedding pipeline", "processors": [ { "text_embedding": { "model_id": "", "field_map": { EMBEDDING_SOURCE_FIELD_NAME: EMBEDDING_FIELD_NAME }}}]}This pipeline definition specifies a blank model_id (the code fills that in when it has a model ID) and a field_map, which tells the text_embedding ingest pipeline processor the source field for the text to compute an embedding, and the destination field for the embedding.pipeline_definition = deepcopy(ingest_pipeline_definition) pipeline_definition['processors'][0]['text_embedding'][ 'model_id'] = model_id os_client.ingest.put_pipeline( id=PIPELINE_NAME, body=pipeline_ definition)The code adds the model ID to the pipeline definition and uses the Python client to create the pipeline. To engage the pipeline, you define a default_pipeline in the index’s settings. The call to delete_then_create_index takes the pipeline name and adds it to the index settings in index_utils.py.[exact.py] index_utils.delete_then_create_index( os_client=os_client, index_name=INDEX_NAME, ingest_pipeline_name=PIPELINE_NAME, additional_fields=KNN_FIELDS ) [Index_utils.py] settings['settings']['default_pipeline'] = ingest_pipeline_nameExecute the exact.py script:python exact.pyThe script downloads the Hugging Face all-MiniLM-L12-V2 model to the cluster’s ml node and deploys it. It uses the model ID to create an ingest pipeline that builds encodings for the movie data, using the title, plot, and genre fields, concatenated in the embedding_source field. The indexing takes about five minutes on a MacBook Pro, running Docker Desktop with 16 GiB of dedicated RAM. It then creates an embedding for and executes a hardcoded query, Sci-fi about the force and jedis. You can use the --query command-line argument to run your own queries. Each time you run the script, it deletes and recreates the index. To avoid this lengthy process, use the --skip-indexing command-line argument.Examine script_query. It is a script_score query that uses the built-in knn_score function to compare the query vector (embedded in the script) with the embedding on the document. The score is based on the cosinesimil metric, which compares the cosines of the vectors. The query itself is a match_all query, scoring the vector distance against every document.Even though you didn’t query for star wars, the results include six of the Star Wars franchise movies in the results. That’s because “the force” and “jedis” are frequent terms in Star Wars movie titles and plots. As we highlighted in the Semantic search section of this chapter, these words are common contexts for those movies. You’ll also see that Ninja Strike Force is the third result, an interesting connection from the term “force.” You can imagine that jedis and ninjas are closely related in context as well.In the next section, we cover the combination of filters and vector search. In the section following that, you’ll apply filters to try to retrieve more relevant results.Exact KNN queries with filtersAs the last example illustrated, you sometimes want to filter out irrelevant documents, or filter in relevant documents. The vector embeddings don’t have a direct correlation with the fields of your document that you don’t send to the LLM for embedding. Fields such as the genres field in the movie data have values with their own semantic meanings that only loosely correlate with the plot field. Instead, you apply a filter to a knn query for those fields.There are three methods for applying filters in OpenSearch, and their usage and output depend on the nearest-neighbor method you’re using:Pre-filtering: For exact KNN queries, you can provide a filter as part of the query, and OpenSearch will first filter, then apply vector search to the results.Post-filtering: For approximate nearest neighbor, you can apply a post-filter. You’ll see examples in the following sections. You can also wrap a knn query inside a bool query with a post-filter.Efficient filtering: For some approximate nearest neighbor engines, OpenSearch provides just-in-time filtering, which applies the filter as it encounters nearest neighbors.You apply a fi ter to an exact KNN query through a filter clause in the query. Examine the code in exact.py to find the filtered_query_script query body. We’ve changed match_all to a bool query with a filter for sci-fi movies and a rating of 6.9 or better: "query": { "script_score": { "query": { "bool": { "filter": [ { "term": { "genres.keyword": "Sci-Fi" }}, { "range": {"rating": {"gte": 6.9} }}]}},You can run this query by adding the --filtered parameter to the command line:python exact.py --skip-indexing --filteredThe search results now contain the six Star Wars movies, along with Iron Man. In this case, the filters remove movies that are not in the sci-fi genre or have low ratings.Exact KNN b rings you the most accurate results but scales poorly for latency.ConclusionSemantic search changes the retrieval problem from term overlap to vector similarity. Done well, it produces results that track user intent more reliably than purely lexical approaches.OpenSearch provides the building blocks to operationalize this. The KNN plugin handles vector indexing and retrieval, ML Commons manages model deployment and connectivity, and the Neural Search plugin helps integrate embedding generation into pipelines.The trade-off is that relevance work becomes more disciplined. Exact KNN is useful for evaluation and building golden sets, while approximate methods are typically required at scale, especially when latency matters.If you want the full, end-to-end walkthrough, including the surrounding architecture choices, implementation details, and the broader context for how these pieces fit together, this article is adapted from The Definitive Guide to OpenSearch by Jon Handler, Soujanya Konka, and Prashant Agrawal. The book goes deeper into production patterns and practical examples you can apply directly to real systems.Author BioJon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon's career as a software developer included four years of coding a large-scale, ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master of Science and a Ph.D. in Computer Science and Artificial Intelligence from Northwestern University.Soujanya Konka is an accomplished Senior Solutions Architect at AWS with over 17 years of experience working with databases and analytics. She focuses on big data services and has worked with customers to migrate, right-size and enhance search using OpenSearch. Prior to joining AWS, Soujanya worked on large scale data warehouse and search migrations to the cloud and building enterprise search catalogs. Her journey has been one of continuous learning, innovation and equipping herself with skills and insights to tackle data challenges.Prashant Agrawal is a seasoned Search Specialist Solutions Architect at AWS based out of Seattle. With over 13 years of invaluable experience in the field of search and log analytics, he brings a wealth of knowledge to every project where he collaborates closely with clients to facilitate seamless migration and fine-tune OpenSearch clusters for optimal performance and cost savings. Beyond his tech prowess, he's an avid explorer, often found immersing himself in travel adventures and discovering new places. In essence, he thrives on the mantra: Eat Travel Repeat, making each journey a delightful experience. Join him on a transformative journey where every search and analytics challenge become a rewarding adventure.

AWS launched a DevOps Agent that actually debugs production for you CloudPro #115 Elevate Your Cloud Security Strategy with Dark Reading As a cloud security professional, you need cutting-edge insights to stay ahead of evolving vulnerabilities. The Dark Reading daily newsletter provides in-depth analysis of cloud vulnerabilities, advanced threat detection, and risk mitigation strategies. Stay informed on zero-trust architecture, compliance frameworks, and securing complex multi-cloud and hybrid environments. Signup to newsletter In today's issue, we'll look at: Google pushes Kubernetes to 130K nodes (yes, really), AWS launches an AI agent that debugs production while you sleep, and the network jobs market sends mixed signals: AI certs pay 12% more while automation threatens to eliminate a fifth of IT roles. Plus, hard lessons from recent AWS and Cloudflare outages that went global from single subsystem failures. Cheers, Shreyans Singh Editor-in-Chief Ransomware Just Hit your AWS Cloud. What Happens Next? Join us for an immersive simulation that’ll let you experience a fictionalized ransomware attack, without any of the actual consequences. You'll witness: -The first suspicious alert -The shocking depth of the breach -A heart-stopping realization about compromised backups -The impossible choice: pay or rebuild? Don't just hear about ransomware. Experience it. Learn how to be truly cyber resilient. Save My Spot This Week in Cloud How Google Built a Kubernetes Cluster with 130,000 nodes Google has been testing GKE at 130,000 nodes, twice their official support limit. They're hitting 1,000 pods/sec scheduling throughput with P99 startup under 10 seconds, used Kueue to preempt 39K pods in 93 seconds when priorities shifted, and kept the control plane stable with 1M+ objects in the datastore. The architectural wins: consistent reads from cache (KEP-2340), snapshottable API server cache (KEP-4988), and Spanner-backed storage handling 13K QPS just for lease updates. This matters because we're moving from chip-limited to power-limited infrastructure. One GB200 pulls 2.7KW, so at 100K+ nodes you're talking hundreds of megawatts across multiple data centers. Google's betting on multi-cluster orchestration becoming the norm (MultiKueue, managed DRANET). Gang scheduling via Kueue now, native Kubernetes support coming (KEP-4671). Sneak Peek into Kubernetes v1.35 K8s 1.35 is finally killing off cgroup v1 support. If you're still running nodes on ancient distros without cgroup v2, your kubelet won't start. Also deprecating ipvs mode in kube-proxy since maintaining feature parity became impossible; nftables is the way forward on Linux. On the features side: in-place pod resource updates hitting GA (no more pod restarts for cpu/memory changes), native pod certificates for mTLS without needing SPIFFE/SPIRE, numeric comparisons for taints (finally can do SLA-based scheduling with Gt/Lt operators), user namespaces maturing through beta (container root remapped to unprivileged host UID), and image volumes likely enabled by default (mount OCI artifacts directly as volumes). Node declared features are going alpha too - nodes will publish supported capabilities to avoid version skew scheduling failures. AWS launched a DevOps Agent that actually debugs production for you No more 3am war rooms might actually be realistic now. DevOps Agent is a "frontier agent" that runs autonomously for hours investigating incidents while you sleep. It connects to CloudWatch, Datadog, Dynatrace, GitHub/GitLab, ServiceNow, and builds an application topology map automatically. When stuff breaks, it correlates metrics/logs/deployments, identifies root causes, updates Slack channels, and suggests mitigations. It has a web app for operators to manually trigger investigations or steer the agent mid-investigation. The interesting part: it analyzes past incidents to recommend systematic improvements (multi-AZ gaps, monitoring coverage, deployment pipeline issues). It also creates detailed mitigation specs that work with agentic dev tools. Supports custom tool integration via MCP servers for your internal systems. AWS will manage your Argo CD, ACK, and KRO now AWS just launched EKS Capabilities: fully managed versions of Argo CD, AWS Controllers for Kubernetes (ACK), and Kube Resource Orchestrator (KRO) that run in AWS-owned accounts, not your cluster. They handle scaling, patching, upgrades, and breaking change analysis automatically. SSO with IAM Identity Center for Argo CD, ACK has resource adoption for migrating from Terraform/CloudFormation, KRO for building reusable resource bundles. This is basically AWS saying "stop running your own GitOps infrastructure." Makes sense given 45% of K8s users already run Argo CD in production (per 2024 CNCF survey). Early Bird Offer: Get 40% Off Use code EARLY40 Early Bird Offer: Get 40% Off Use code EARLY40 Deep Dive Controlling Kubernetes Network Traffic Ingress NGINX is retiring and it got me thinking about how convoluted network traffic control has become in Kubernetes. You've got your CNI for connectivity, network policies for security, ingress controllers or Gateway API for north-south routing, maybe a service mesh for east-west traffic, and honestly most apps don't need all of this. The real decision most people face is simpler: ingress controller vs Gateway API. Here's the thing: if you just need basic HTTP/HTTPS routing and you're already comfortable with nginx or Traefik, stick with ingress controllers. They work, they're stable, tooling is mature. Gateway API makes sense if you need advanced stuff like protocol-agnostic routing, cross-namespace setups, or you're running multi-team environments where role separation matters. All three clouds (AWS ALB Controller, Azure AGIC, GKE Ingress) have solid managed options for both approaches now. Gateway API is clearly the future, but "future-proof" doesn't mean you need to migrate today. Network jobs roundup: AI certs pay, skills gap persists, mixed employment signals The network jobs market is weird right now. AI certifications are commanding 12% higher pay year-over-year while overall IT skills premiums dropped 0.7%. CompTIA just launched AI Infrastructure and AITECH certs, Cisco added wireless-only tracks (CCNP/CCIE Wireless launching March 2026). Meanwhile unemployment for tech workers sits at 2.5-3% depending on who's counting, but large enterprises keep announcing layoffs while small/midsize companies are actually hiring. Skills gap is real though- 68% of orgs say they're understaffed in AI/ML ops, 65% in cybersecurity. Telecom lost 59% of positions to automation, and survey data shows 18-22% of IT workforce could be eliminated by AI in the next 5 years. But demand for AI/ML, cloud architecture, and security skills keeps growing. The takeaway: upskill in AI and automation or get left behind, especially if you're in support, help desk, or legacy infrastructure roles. Three Lessons from the Recent AWS and Cloudflare Outages AWS US-EAST-1 went down for 15 hours in October (DNS race condition in DynamoDB), Cloudflare ate it in November (oversized Bot Management config file crashed proxies globally). Both followed the same pattern: small defect in one subsystem cascaded everywhere. The lessons are obvious but worth repeating: design out single points of failure with multi-region/multi-cloud by default, use AI-powered monitoring to correlate signals and automate rollback (monitoring without automated response is just expensive alerting), and actually practice your DR plan regularly because you fall to the level of your practice, not rise to your runbook. The deeper point: complexity keeps growing with every new region and service, multiplying ways a small change can blow up globally. The answer is designing for failure: limit blast radius, decouple planes, automate validation. No provider is immune, so your architecture needs to assume failures will happen and route around them automatically. Test your DR plan with chaos engineering, not hope- Google SRE Practice Lead Google's SRE team wrote a piece on why your disaster recovery plan probably doesn't work and how chaos engineering proves it. The premise: systems change constantly (microservices, config updates, API dependencies), so that DR doc you wrote last quarter is already outdated. Chaos engineering lets you run controlled experiments—simulate database failovers, regional outages, resource exhaustion, and measure if you actually meet your SLOs during the disaster. It's not about breaking things randomly. You define steady state, form a hypothesis (like "traffic will failover to secondary region in 3 minutes with <1% errors"), inject a specific failure, and measure what happens. The key insight is connecting chaos to SLOs. Traditional DR drills might "pass" because backup systems came online, but if it took 20 minutes and burned your entire error budget, customers saw you as down. Start small with one timeout or retry test, build confidence, scale from there. Stelvio: AWS for Python devs Stelvio is a Python framework that lets you define AWS infrastructure in pure Python with smart defaults handling the annoying bits. Run stlv init, write your infra in Python (DynamoDB tables, Lambda functions, API Gateway routes), hit stlv deploy and you're done. No Terraform, no CDK yaml hell, no mixing infrastructure code with application code. 📢 If your company is interested in reaching an audience of developers and, technical professionals, and decision makers, you may want toadvertise with us. If you have any comments or feedback, just reply back to this email. Thanks for reading and have a great day! *{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

Semantic Search in OpenSearch

How Google Built a Kubernetes Cluster with 130,000 nodes

WebDevPro #122: Rise of LLM Powered agents

AI Is Reshaping BI Faster Than Dashboards Ever Did

Are AI Agents Ready for Real Work Yet?

#228: See You Soon!

Our Newsletters

_SecPro

_BI-Pro

AI_Distilled

Attack & Defend

_ProgrammingPro

_WebDevPro

AlgoFinance

_DataPro

PythonPro

_CloudPro

SalesforcePulse

_MobilePro

_SysAdminPro

Featured Issues