Cross-modal search with images and text
In this section, we will cover an advanced example showcasing cross-modal search. Cross-modal search is a subtype of neural search in which the data we index and the data we search with belong to different modalities. This capability is unique to neural search; none of the traditional search technologies can easily achieve it. It is possible thanks to the central mechanism of neural search: deep learning models transform every data type into the same kind of numeric representation, a vector (the embedding extracted from a specific layer of the network), so items from different modalities can be compared in one shared vector space.
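The idea of comparing modalities in a shared embedding space can be sketched as follows. This is a minimal illustration, not the section's actual pipeline: the embedding functions below return fixed toy vectors standing in for what a real shared-space model (such as CLIP) would produce, and the image names and query strings are invented for the example. The search itself is just cosine similarity between a text query vector and indexed image vectors.

```python
import numpy as np

# Toy stand-ins for a real cross-modal model: in practice, one encoder
# maps images and another maps text into the SAME vector space.
def embed_image(name):
    toy = {
        "cat.jpg": np.array([0.9, 0.1, 0.0, 0.0]),
        "dog.jpg": np.array([0.1, 0.9, 0.0, 0.0]),
        "car.jpg": np.array([0.0, 0.0, 0.9, 0.1]),
    }
    return toy[name]

def embed_text(query):
    toy = {
        "a photo of a cat": np.array([0.8, 0.2, 0.0, 0.0]),
    }
    return toy[query]

def cosine(a, b):
    # Similarity between two vectors in the shared space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query, image_names):
    # Embed the text query, score every indexed image, return the best match.
    q = embed_text(query)
    scored = [(name, cosine(q, embed_image(name))) for name in image_names]
    return max(scored, key=lambda s: s[1])[0]

images = ["cat.jpg", "dog.jpg", "car.jpg"]
print(search("a photo of a cat", images))  # → cat.jpg
```

Because both encoders target the same space, the text vector for "a photo of a cat" lands closest to the image vector for `cat.jpg`, which is exactly what makes searching one modality with another possible.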
These modalities can be represented by different data types: audio, text, video, and images. They can also be of the same type but follow different distributions. For example, you might search with a paper summary and want to retrieve the paper title: both are text, but their underlying data distributions differ. The distribution is thus...