How to encode multimodal documents
After defining documents for the different types of data, the next step is to encode them into vector embeddings using a model. Formally, an embedding is a multi-dimensional representation of a document (often a [1, D]
vector) that is designed to capture the document's content. Given the current performance of deep learning methods, even general-purpose models (for example, CNN models trained on ImageNet) can be used to extract meaningful feature vectors. In the following sections, we will show how to encode documents of different modalities into embeddings.
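To illustrate how a [1, D] embedding carries content information, the sketch below compares two made-up embedding vectors (not the output of any particular model) with cosine similarity, which is the typical way embeddings are consumed downstream:

```python
import numpy as np

# Two hypothetical [1, D] document embeddings with D = 4
# (real models typically produce D in the hundreds).
doc_a = np.array([[0.9, 0.1, 0.0, 0.3]])
doc_b = np.array([[0.8, 0.2, 0.1, 0.4]])

def cosine_similarity(u, v):
    # Flatten [1, D] to [D] and compute the normalized dot product.
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(doc_a.shape)  # (1, 4)
print(cosine_similarity(doc_a, doc_b))  # close to 1.0: similar content
```

Documents whose embeddings point in similar directions are treated as having similar content, which is the basis of the search functionality discussed later.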
Encoding text documents
To convert textual documents into vectors, we can use one of the pretrained models (https://www.sbert.net/docs/pretrained_models.html) provided by Sentence Transformers (https://www.sbert.net/), as shown in the following example:
from docarray import DocumentArray
from sentence_transformers import SentenceTransformer

da = DocumentArray(......
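The listing above is truncated, and running it for real requires downloading a pretrained model. As a self-contained illustration of the encode step, the sketch below substitutes a toy hashing encoder (the toy_encode function is hypothetical, not part of Sentence Transformers or DocArray): each text becomes one L2-normalized D-dimensional vector, the same shape per document that a real model's encode call would produce.

```python
import zlib
import numpy as np

def toy_encode(texts, dim=256):
    # Stand-in for a real sentence encoder: a hashed bag-of-words.
    # A pretrained sentence-transformer would instead return one
    # semantic D-dimensional vector per input text.
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            out[i, zlib.crc32(token.encode()) % dim] += 1.0
    # L2-normalize each row so a dot product equals cosine similarity.
    return out / np.linalg.norm(out, axis=1, keepdims=True)

docs = ["a photo of a cat", "my cat sat on the mat", "stock prices fell"]
embeddings = toy_encode(docs)
print(embeddings.shape)  # (3, 256): one vector per document
```

With DocArray, each of these vectors would be attached to its document's embedding attribute, and nearest neighbors could then be found by cosine similarity between the stored vectors.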