New terms learned in this chapter
- Traditional search: Mostly applied to text retrieval. It measures the similarity between a query and a document by a weighted score of how often the query's tokens occur in the document.
- Indexing: The process of converting files into a data structure that allows rapid search, so that answering a query does not require scanning every file.
- Searching: The process of computing similarity scores between a user query and the indexed documents in the document store, and returning the top-k matches.
- Vector space model (VSM): A way to represent a document numerically as a vector. The dimensionality of the VSM equals the number of distinct tokens across all documents, and the value in each dimension is the weight of the corresponding term in that document.
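- TF-IDF: Term frequency-inverse document frequency, a weighting scheme intended to reflect how important a word is to a document within the collection of documents being indexed. A term's weight grows with how often it appears in the document and shrinks with how many documents in the collection contain it.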
- Machine learning: A family of techniques that teach computers to make decisions, something that comes naturally to humans, by enabling them to learn the distribution of data and acquire new knowledge from experience.
- Deep neural networks: A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. It is trained to predict, classify, or learn a compact representation (a dense vector) of a piece of data.
- Neural search: Unlike symbolic (traditional) search, which matches tokens, neural search uses the dense vector representations generated by DNNs. It measures the similarity between a query vector and each document vector with a chosen metric, such as cosine similarity, and returns the top-k matches.
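The traditional-search terms above (indexing, TF-IDF weighting, and top-k searching) fit together in a small sketch. It assumes a naive lowercase whitespace tokenizer and the IDF variant log(N / df) + 1, one common choice among several; `build_index` and `search` are hypothetical helper names, not the API of any particular library. The index is inverted (term to posting list), so a query only touches documents that share at least one of its tokens.

```python
import heapq
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive lowercase + whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def build_index(docs: list[str]) -> dict[str, dict[int, float]]:
    """Build an inverted index: term -> {doc_id: TF-IDF weight}."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    index: dict[str, dict[int, float]] = defaultdict(dict)
    for doc_id, tokens in enumerate(tokenized):
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Assumed IDF variant: log(N / df) + 1 keeps every weight positive.
            idf = math.log(len(docs) / df[term]) + 1.0
            index[term][doc_id] = tf * idf
    return dict(index)


def search(index: dict[str, dict[int, float]], query: str, k: int = 3) -> list[tuple[int, float]]:
    """Sum the TF-IDF weights of matching query tokens; return the top-k (doc_id, score)."""
    scores: dict[int, float] = defaultdict(float)
    for term in tokenize(query):
        # Only documents in this term's posting list are visited; no full scan.
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


docs = [
    "neural search uses dense vectors",
    "traditional search counts tokens",
    "cats sit on mats",
]
index = build_index(docs)
print(search(index, "dense vectors", k=1))
```

Here the query "dense vectors" matches only document 0. Each document's postings together form its sparse TF-IDF vector in the vector space model, and the score is the dot product of that vector with the query's token-count vector.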
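Neural search, by contrast, ranks documents by comparing dense vectors. The sketch below assumes the embeddings were already produced by some DNN encoder; the three-dimensional vectors are hand-picked toy values for illustration, not real model outputs, and `neural_search` is a hypothetical helper. Cosine similarity is used as the metric, which is one common choice (dot product and Euclidean distance are others).

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def neural_search(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return the top-k (doc_id, score) pairs ranked by cosine similarity."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical toy embeddings; a real system would get these from a DNN encoder.
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.1],
]
query_vec = [1.0, 0.0, 0.0]
print(neural_search(query_vec, doc_vecs, k=2))
```

The query vector points closest to document 0, then document 2. This brute-force loop compares the query with every document vector; at scale, vector search systems typically rely on approximate nearest-neighbor indexes instead.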