Working with multimodal search
Multimodal search is another pre-built example that ships with the Jina installation; you can run it directly from the command line without touching any code.
This example uses Kaggle's public people image dataset (https://www.kaggle.com/ahmadahmadzada/images2000), which consists of 2,000 image-caption pairs. Each entry is a multimodal document, that is, a single document combining several data types, much like a PDF that contains text and images together. Jina lets you build search over such multimodal data with the same ease and comfort:
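To make the notion of a multimodal document concrete, here is a minimal, library-free sketch (plain Python dataclasses, not Jina's actual API) of how one image-caption pair from the dataset could be modeled: a parent document holding one chunk per modality. The class and field names here are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    modality: str  # e.g. "image" or "text"
    content: str   # a file path for images, raw text for captions

@dataclass
class MultimodalDocument:
    chunks: list   # one chunk per modality

# One hypothetical entry from the dataset: an image plus its caption.
doc = MultimodalDocument(chunks=[
    Chunk(modality="image", content="images/0001.jpg"),
    Chunk(modality="text", content="a person standing on a beach"),
])

modalities = [c.modality for c in doc.chunks]
print(modalities)  # ['image', 'text']
```

A multimodal search system indexes each chunk with an encoder suited to its modality and matches queries against the combined representation.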
- To run this example from the command line, you need to install the following dependencies:
- pip install transformers
- pip install torch
- pip install torchvision
- pip install "jina[demo]"
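If you prefer, the four installs above can be combined into a single command. Note the straight quotes around the extras specifier, which keep the shell from interpreting the square brackets as a glob pattern:

```shell
# Equivalent one-line install of all dependencies for the demo.
pip install transformers torch torchvision "jina[demo]"
```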
- Once all the dependencies are installed, simply type the following command to launch the application:
jina hello multimodal
- After typing this command, you will see the following text...