Introducing multimodal documents
Over the last decade, the volume of data on the internet, including text, images, and audio, has grown rapidly. A single piece of content often combines several of these data types. For example, an image often comes with textual tags and a caption that describe it, so the content has two modalities: image and text. A movie clip with subtitles has three modalities: image, audio, and text.
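To make the notion of a modality concrete, the two examples above can be modeled in a few lines of plain Python. This is only an illustrative sketch and does not use Jina or DocArray: the class names Part and Content, the modality labels, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Part:
    """One piece of data in a single modality (hypothetical helper, not a Jina class)."""
    modality: str  # e.g. 'image', 'text', 'audio'
    payload: str   # placeholder for the raw data or a path to it


@dataclass
class Content:
    """A piece of content made of one or more parts (hypothetical helper)."""
    parts: List[Part] = field(default_factory=list)

    def modalities(self) -> Set[str]:
        # The modalities of the content are the distinct modalities of its parts.
        return {part.modality for part in self.parts}


# An image with a caption: two modalities.
captioned_image = Content(parts=[
    Part('image', 'photo.jpg'),
    Part('text', 'A dog playing in the park.'),
])

# A movie clip with subtitles: three modalities.
movie_clip = Content(parts=[
    Part('image', 'frames/'),
    Part('audio', 'soundtrack.wav'),
    Part('text', 'subtitles.srt'),
])

print(sorted(captioned_image.modalities()))  # ['image', 'text']
print(sorted(movie_clip.modalities()))       # ['audio', 'image', 'text']
```

The point of the sketch is that a modality is a property of each part, while the content as a whole simply collects the modalities of its parts.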
Jina is a data-type-agnostic framework that lets you work with any type of data and develop cross-modal and multimodal search systems. To see what this means in practice, it helps to look first at how documents of different data types are represented, and then at how multimodal documents are represented in Jina.
Text document
Representing a textual document in Jina is straightforward. The following code creates one:
from docarray import Document
doc = Document(text='Hello World.')
In some cases, one...