Introduction

Most search systems still rely on keyword matching. It works, until users describe intent in natural language and the right answer does not share the same words.

Semantic search closes that gap by ranking results based on meaning. It converts both documents and queries into vector embeddings, then retrieves the closest matches using K-Nearest Neighbor search.

In this article, I will cover how OpenSearch supports semantic search in practice. That includes the KNN plugin, ML Commons for managing models, and the ingest and search pipelines used to generate and apply embeddings. I will also touch on accuracy versus latency trade-offs and how to evaluate relevance.

Semantic search

We define semantic search as using vector embeddings of the corpus of documents to find mathematically close neighbors to a vector embedding computed for the user's query text. Semantic search contrasts with lexical search, in which the query matches terms from the user's query text to terms in the index. We call this process of finding mathematically close neighbors K-Nearest-Neighbors (KNN).

OpenSearch provides KNN through its KNN plugin. The plugin supports storage engines and matching algorithms that index vector embeddings that you provide in your documents. The KNN plugin also adds Query DSL APIs that let you send a vector embedding as part of, or as the whole of, a query.

The KNN plugin supports both exact KNN and approximate KNN. When you use exact KNN, OpenSearch compares the query vector to the embedding on every document that matches the query's filters, then scores and sorts the results based on vector distance. Approximate KNN uses various techniques to reduce the number of vector comparisons. We'll cover KNN in the following sections.
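To make the Query DSL concrete, here is a minimal sketch of an approximate KNN search issued with the OpenSearch Python client. The index name, field name, and query_vector variable are illustrative placeholders rather than names from the book's code; the vector would normally come from an embedding model and must match the dimension of the indexed knn_vector field.

# Minimal sketch of an approximate KNN query (illustrative names).
knn_query = {
    "size": 10,
    "query": {
        "knn": {
            "embedding": {               # a knn_vector field in the index mapping
                "vector": query_vector,  # list of floats produced by an embedding model
                "k": 10                  # number of nearest neighbors to retrieve
            }
        }
    }
}
response = os_client.search(index="movies", body=knn_query)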
In the next section, you'll learn about OpenSearch's ml_commons plugin, which manages Machine Learning (ML) models that are local to the cluster and provides a connector framework for ML models hosted on other platforms such as Amazon SageMaker or Amazon Bedrock.

ML Commons and ML models

Interaction with ML models is mediated by OpenSearch's ml_commons plugin. You use ML Commons for loading models onto nodes in your cluster, and for connecting to models hosted by third-party providers that expose APIs for calling them.

The history of search includes many machine-learning techniques, and ML Commons supports models that perform different tasks:

- K-means: An iterative clustering algorithm that builds centroids for similar points.
- Linear regression: Fits a line to a set of inputs/outputs.
- Random Cut Forest (RCF): An unsupervised algorithm that learns normal variation in a series of points. OpenSearch's Anomaly Detection (AD) uses RCF.
- RCF Summarize: Employs a hierarchical clustering technique that successively merges close points to create a collection of centroids.
- Localization: Detects anomalous or interesting points, such as spikes, over aggregate data.
- Logistic regression: Takes an input variable and predicts which of a class of outputs fits that variable.
- Metrics correlation (experimental): Detects correlations in groups of metric data, where there are anomalies closely related in time.
- Large language and multi-modal embedding models: These models create vector embeddings for text and images. OpenSearch can deploy several open source embedding models from Hugging Face and can access embedding models hosted on third-party platforms.
- Text generation models: These models take an input prompt and generate a text response for that prompt.

To use OpenSearch's ML Commons plugin, you'll set some cluster-wide settings that define where the plugin is deployed, which nodes perform ML processing, which features are enabled, and limits on memory consumed. Refer to os_client_factory.py, in the ch10 folder of the book's repository, to see the settings we used for the examples:

- allow_registering_model_via_url: Set to True when you load models onto the cluster.
- only_run_on_ml_node: Set to True when you want to run ml_commons on dedicated ML nodes. The book's Docker compose file creates an ML node and sets this setting to True.

In the next section, we cover OpenSearch's Neural Search plugin, which helps simplify accessing and using ML models.

OpenSearch's Neural Search plugin

The Neural Search plugin provides a connector framework that enables communication with ML models, whether they are hosted on an ml node in the OpenSearch cluster, or in a third-party model host such as Amazon SageMaker or Amazon Bedrock. The Neural Search plugin works with OpenSearch ingest pipelines and search pipelines, using a model ID that abstracts the underlying location of, and connection to, the model itself.

To use the Neural Search plugin for creating vector embeddings, you first define an ingest pipeline. Ingest pipelines are a native OpenSearch construct that lets you chain together processors that act on data as OpenSearch ingests it. There are a variety of processors (40 out of the box, plus more from plugins you can install) that perform functions such as mutating documents to add or remove fields, dropping documents, enriching documents with geographic data, extracting text from PDF or Microsoft Word documents, and more. In this chapter, you'll use the text_embedding processor and the sparse_encoding processor.

Ingest pipelines, ingest nodes, and Data Prepper

Ingest pipelines are a convenient way to mutate and enrich your data as OpenSearch processes it. By default, OpenSearch uses data nodes as the compute resource to perform this processing. Data nodes have their own important job: responding to your queries! To avoid overloading your data nodes, consider deploying nodes with the ingest node role, separate from your data nodes. That way, you localize the resource demands to those nodes. Even better, consider using Data Prepper, a standalone component provided by the OpenSearch project that can connect to many data sources and perform Extract, Transform, Load (ETL) on your data. Amazon OpenSearch Service provides a managed version of Data Prepper, called Amazon OpenSearch Ingestion, that can greatly simplify collecting, processing, and loading data to OpenSearch or Amazon OpenSearch Service. At the time of writing, Data Prepper does not have a processor that can call out to ML model hosts, so this chapter uses ingest pipelines.

Search pipelines similarly provide processors that you can use to orchestrate your query processing. Query processors act on the user's query to perform tasks such as filtering or generating embeddings. Result processors act on the query's result set to perform tasks such as normalizing and combining results for hybrid queries, or re-ranking results.
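As a rough illustration of how a search pipeline is defined, the sketch below creates a pipeline with a single query processor (request_processors in the API), neural_query_enricher, which supplies a default model ID so that neural queries don't have to repeat it. The pipeline name is a placeholder, and the chapter's examples don't necessarily use this exact pipeline.

# Illustrative search pipeline: the neural_query_enricher request processor
# fills in a default model ID for neural queries that omit one.
search_pipeline = {
    "request_processors": [
        {
            "neural_query_enricher": {
                "default_model_id": model_id
            }
        }
    ]
}
os_client.transport.perform_request(
    "PUT",
    "/_search/pipeline/embedding-search-pipeline",  # placeholder pipeline name
    body=search_pipeline
)

A search pipeline can then be referenced by name on a search request or set as an index's default search pipeline.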
As you work through the examples in the chapter, you'll see ingest and search processors in action. You'll also see a new feature of OpenSearch 2.19: ingest and search workflows that let you define ingest and query processing with a simple, declarative syntax.

In the next section, you'll work through exact KNN queries to put the theory into practice.

Exact K-Nearest-Neighbor

When you need the most accurate results, use exact KNN to query your data. An exact KNN query computes the distance between the search vector and every candidate vector. As with all queries, exact KNN latency scales with the number of matching documents. Exact KNN is great for generating a gold judgment set, but not fast enough to be the right algorithm for large (million-plus) datasets.

Precision, accuracy, and measuring the quality of search results

To figure out how successful the retrieval and ranking are, relevance engineers measure and compare precision and accuracy. Precision is the proportion of search results that are relevant to the query. Accuracy is the proportion of correct results the query returns. Another measure, cumulative gain, measures the aggregate relevance of the results. Most practitioners use a version of cumulative gain, Normalized Discounted Cumulative Gain (NDCG). NDCG accounts for the position of each result in the overall result set, relative to a human-juried, ideal ordering. NDCG is usually computed relative to a number of results, so you'll see metrics such as NDCG@10, meaning the discounted gain accumulated over the first 10 results. Relevance measures require that you have a "golden set": a set of queries and results that you know to be correct. You use this golden set as a yardstick and compare your query results against it to figure out how good (or bad) they are. As we mentioned, you can use exact KNN search to prepare a golden set so that you can measure the accuracy you lose when you employ an approximate nearest neighbor search.

Code walk-through

To make the code easier to understand, we've provided several utility modules that we reuse throughout this chapter's examples. Take a moment to examine the ch10 folder. You'll find the following:

- os_client_factory.py: Wraps the authentication and connection details the OpenSearch Python client needs to make API calls (see the sketch after this list).
- cleanup.py: Use with caution! This utility cleans up models (--models), indices (--indices), and connectors (--connectors) that the example code creates. This can be helpful for resetting your cluster, but beware of side effects or accidentally deleting these resources.
- movie_source.py: Provides a Python generator that produces one line at a time, with data cleaning and enrichment, from the movies file, and a generator that produces the batches that the OpenSearch Python client's bulk helper sends to OpenSearch.
- index_utils.py: Provides a mapping and a function that creates an index in OpenSearch.
- model_utils.py: Provides definitions for models that OpenSearch can host locally, along with code to register and deploy models to the OpenSearch cluster.
- connection_utils.py: Provides definitions and code to deploy connectors to model hosts.
- auto_incrementing_counter.py: Provides a convenience class that increments a counter when cast to a string, used for indicating progress when indexing the movies.

Take a moment to examine these scripts.
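The repository files are the authoritative versions; purely as a sketch of the pattern os_client_factory.py and movie_source.py implement, client construction and bulk loading with the OpenSearch Python client look roughly like this (connection details and the movies variable are placeholders):

from opensearchpy import OpenSearch, helpers

def get_os_client():
    # Placeholder connection and authentication details; the book's
    # os_client_factory.py wraps the real endpoint and credentials.
    return OpenSearch(
        hosts=[{"host": "localhost", "port": 9200}],
        http_auth=("admin", "admin"),
        use_ssl=True,
        verify_certs=False,
    )

def movie_actions(movies, index_name):
    # Yield one bulk action per movie document.
    for movie in movies:
        yield {"_index": index_name, "_source": movie}

# The bulk helper batches the generator's output into bulk API calls:
# helpers.bulk(get_os_client(), movie_actions(movies, "movies"))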
Next, we'll dive into the code and explain the implementation of exact matching.

Exact KNN search

The exact.py file is the main entry point for the exact KNN example. If you look at that file, you'll find a number of constants that define the model to use, the index to use, and the ingest pipeline that enables OpenSearch's Neural Search plugin to create embeddings automatically. Its main function deploys the model to OpenSearch's ml node and gets it ready for use. Then, it loads the movies to OpenSearch in batches of 1,000.

We've employed OpenSearch's Neural Search plugin to facilitate the creation of vector embeddings for each movie. When you use the Neural Search plugin, you create an OpenSearch ingest pipeline with a text_embedding processor that invokes the model for each document you ingest. The alternative is to use an offline batch process to transform the source data, adding vector embeddings. If you already have embeddings, you can simply load the data into your OpenSearch index with normal bulk API calls. The Neural Search plugin automates the process of creating embeddings at ingestion and search time.
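The model deployment itself is handled by model_utils.py; the sketch below shows the underlying ML Commons register-and-deploy flow it relies on, simplified and without the error handling the real code needs. The model name, version, and response shapes follow OpenSearch's pretrained model catalog and ML Commons REST API at the time of writing; check the documentation for the versions your cluster supports.

import time

# Register a pretrained sentence-transformer model with ML Commons (simplified sketch).
register_body = {
    "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
    "version": "1.0.1",
    "model_format": "TORCH_SCRIPT",
}
task_id = os_client.transport.perform_request(
    "POST", "/_plugins/_ml/models/_register", body=register_body
)["task_id"]

# Poll the registration task until ML Commons reports the new model's ID.
model_id = None
while not model_id:
    time.sleep(2)
    task = os_client.transport.perform_request(
        "GET", f"/_plugins/_ml/tasks/{task_id}"
    )
    model_id = task.get("model_id")

# Deploy (load) the model onto the cluster's ML node(s).
os_client.transport.perform_request(
    "POST", f"/_plugins/_ml/models/{model_id}/_deploy"
)

With a model ID in hand, exact.py builds the following ingest pipeline definition: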
"description": "Embedding pipeline",
"processors": [
{
"text_embedding": {
"model_id": "",
"field_map": {
EMBEDDING_SOURCE_FIELD_NAME:
EMBEDDING_FIELD_NAME
}}}]}This pipeline definition specifies a blank model_id (the code fills that in when it has a model ID) and a field_map, which tells the text_embedding ingest pipeline processor the source field for the text to compute an embedding, and the destination field for the embedding.pipeline_definition = deepcopy(ingest_pipeline_definition)
pipeline_definition = deepcopy(ingest_pipeline_definition)
pipeline_definition['processors'][0]['text_embedding']['model_id'] = model_id
os_client.ingest.put_pipeline(id=PIPELINE_NAME, body=pipeline_definition)

The code adds the model ID to the pipeline definition and uses the Python client to create the pipeline. To engage the pipeline, you define a default_pipeline in the index's settings. The call to delete_then_create_index takes the pipeline name and adds it to the index settings in index_utils.py.

[exact.py]
index_utils.delete_then_create_index(
    os_client=os_client,
    index_name=INDEX_NAME,
    ingest_pipeline_name=PIPELINE_NAME,
    additional_fields=KNN_FIELDS
)
[index_utils.py]
settings['settings']['default_pipeline'] = ingest_pipeline_name

Execute the exact.py script:

python exact.py

The script downloads the Hugging Face all-MiniLM-L12-v2 model to the cluster's ml node and deploys it. It uses the model ID to create an ingest pipeline that builds embeddings for the movie data, using the title, plot, and genres fields, concatenated in the embedding_source field. The indexing takes about five minutes on a MacBook Pro running Docker Desktop with 16 GiB of dedicated RAM. The script then creates an embedding for, and executes, a hardcoded query: Sci-fi about the force and jedis. You can use the --query command-line argument to run your own queries. Each time you run the script, it deletes and recreates the index. To avoid this lengthy process, use the --skip-indexing command-line argument.

Examine script_query. It is a script_score query that uses the built-in knn_score function to compare the query vector (embedded in the script) with the embedding on each document. The score is based on the cosinesimil metric, which measures the cosine similarity between the two vectors. The query itself is a match_all query, so the vector distance is scored against every document.
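The query body in exact.py looks roughly like the sketch below. This is a reconstruction rather than a verbatim copy: query_embedding stands in for the vector the script computes from the query text, the size of 10 is illustrative, and EMBEDDING_FIELD_NAME and INDEX_NAME are the constants defined in exact.py.

# Hedged reconstruction of an exact KNN script_score query; query_embedding is
# the query text's embedding, computed before the search is issued.
script_query = {
    "size": 10,
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "knn_score",
                "lang": "knn",
                "params": {
                    "field": EMBEDDING_FIELD_NAME,    # constant from exact.py
                    "query_value": query_embedding,   # embedding of the query text
                    "space_type": "cosinesimil"
                }
            }
        }
    }
}
response = os_client.search(index=INDEX_NAME, body=script_query)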
"script_score": {
"query": {
"bool": {
"filter": [
{ "term": {
"genres.keyword": "Sci-Fi"
}},
{ "range": {"rating": {"gte": 6.9}
You can run this query by adding the --filtered parameter to the command line:

python exact.py --skip-indexing --filtered

The search results now contain the six Star Wars movies, along with Iron Man. In this case, the filters remove movies that are not in the sci-fi genre or that have low ratings.

Exact KNN brings you the most accurate results but scales poorly in latency.

Conclusion

Semantic search changes the retrieval problem from term overlap to vector similarity. Done well, it produces results that track user intent more reliably than purely lexical approaches.

OpenSearch provides the building blocks to operationalize this. The KNN plugin handles vector indexing and retrieval, ML Commons manages model deployment and connectivity, and the Neural Search plugin helps integrate embedding generation into ingest and search pipelines.

The trade-off is that relevance work becomes more disciplined. Exact KNN is useful for evaluation and building golden sets, while approximate methods are typically required at scale, especially when latency matters.

If you want the full, end-to-end walkthrough, including the surrounding architecture choices, implementation details, and the broader context for how these pieces fit together, this article is adapted from The Definitive Guide to OpenSearch by Jon Handler, Soujanya Konka, and Prashant Agrawal. The book goes deeper into production patterns and practical examples you can apply directly to real systems.

Author Bio

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon's career as a software developer included four years of coding a large-scale ecommerce search engine. Jon holds a Bachelor of Arts from the University of Pennsylvania, and a Master of Science and a Ph.D. in Computer Science and Artificial Intelligence from Northwestern University.

Soujanya Konka is an accomplished Senior Solutions Architect at AWS with over 17 years of experience working with databases and analytics. She focuses on big data services and has worked with customers to migrate, right-size, and enhance search using OpenSearch. Prior to joining AWS, Soujanya worked on large-scale data warehouse and search migrations to the cloud and on building enterprise search catalogs. Her journey has been one of continuous learning, innovation, and equipping herself with the skills and insights to tackle data challenges.

Prashant Agrawal is a seasoned Search Specialist Solutions Architect at AWS based out of Seattle. With over 13 years of invaluable experience in the field of search and log analytics, he brings a wealth of knowledge to every project, collaborating closely with clients to facilitate seamless migrations and fine-tune OpenSearch clusters for optimal performance and cost savings. Beyond his tech prowess, he's an avid explorer, often found immersing himself in travel adventures and discovering new places. In essence, he thrives on the mantra Eat, Travel, Repeat, making each journey a delightful experience. Join him on a transformative journey where every search and analytics challenge becomes a rewarding adventure.