You're reading from Neural Search - From Prototype to Production with Jina

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781801816823
Edition: 1st Edition
Authors (6):
Jina AI

Jina AI is a neural search company that provides cloud-native neural search solutions powered by AI and deep learning. It provides an open-source neural search ecosystem for businesses and developers, enabling everyone to search for information in all kinds of data with high availability and scalability.

Bo Wang

Bo Wang is a machine learning engineer at Jina AI. He has a background in computer science, especially interested in the field of information retrieval. In the past years, he has been conducting research and engineering work on search intent classification, search result diversification, content-based image retrieval, and neural information retrieval. At Jina AI, Bo is working on developing a platform for automatically improving search quality with deep learning. In his spare time, he likes to play with his cats, watch anime, and play mobile games.

Cristian Mitroi

Cristian Mitroi is a machine learning engineer with a wide breadth of experience in full stack, from infrastructure to model iteration and deployment. His background is based in linguistics, which led to him focusing on NLP. He also enjoys, and has experience in, teaching and interacting with the community, and has given workshops at various events. In his spare time, he performs improv comedy and organizes too many pen-and-paper role-playing games.

Feng Wang

Feng Wang is a machine learning engineer at Jina AI. He received his Ph.D. from the department of computer science at the Hong Kong Baptist University in 2018. He has been a full-time R&D engineer for the past few years, and his interests include data mining and artificial intelligence, with a particular focus on natural language processing, multi-modal representation learning, and recommender systems. In his spare time, he likes climbing, hiking, and playing mobile games.

Shubham Saboo

Shubham Saboo has taken on multiple roles, from a data scientist to an AI evangelist, at renowned firms across the globe, where he was involved in building organization-wide data strategies and technology infrastructure to create and scale data teams from scratch. His work as an AI evangelist has led him to build communities and reach out to a broader audience to foster the exchange of ideas and thoughts in the burgeoning field of AI. As part of his passion for learning new things and sharing knowledge with the community, he writes technical blogs on the advancements in AI and its economic implications. In his spare time, you can find him traveling the world, which enables him to immerse himself in different cultures and refine his worldview.

Susana Guzmán

Susana Guzmán is the product manager at Jina AI. She has a background in computer science and for several years was working at different firms as a software developer with a focus on computer vision, working with both C++ and Python. She has a big interest in open source, which was what led her to Jina, where she started as a software engineer for 1 year until she got a clear overview of the product, which made her make the switch from engineering to PM. In her spare time, she likes to cook food from different cuisines around the world, looking for her new favorite dish.

Learning Jina’s Basics

In the previous chapter, we learned about neural search. Now we can start thinking about how to work with it and the steps we’ll need to take to implement our own search engine. However, as we saw in previous chapters, implementing an end-to-end search solution takes time and effort, because all of the required resources have to be gathered first. This is where Jina can help: it takes care of many of the necessary tasks, letting you focus on the design of your implementation.

In this chapter, you will learn the core concepts of Jina: Documents, DocumentArrays, Executors, and Flow. You will look at each of them in detail and see how they are designed and how they connect.

We’re going to cover the following main topics:

  • Exploring Jina
  • Documents
  • DocumentArrays
  • Executors
  • Flow

By the end of this chapter, you will have a solid understanding of idioms in Jina, what they are, and how to use them...

Technical requirements

This chapter has the following technical requirements:

  • A laptop with a minimum of 4 GB of RAM, ideally 8 GB
  • Python 3.7, 3.8, or 3.9 installed on a Unix-like operating system, such as macOS or Ubuntu

Exploring Jina

Jina is a framework that helps you build deep learning search systems on the cloud using state-of-the-art models. It provides infrastructure that lets you focus only on the areas you are interested in, so you don’t need to be involved in every aspect of building a search engine, from pre-processing your data to spinning up microservices if needed. Another benefit of neural search is that you can search any kind of data, regardless of type. Here are some examples of searches across different data types:

  • Image-to-image search
  • Text-to-image search
  • Question answering search
  • Audio search

Building your own search engine can be very time-consuming, so one of the core goals of Jina is reducing the time you would need if you were going to build one from scratch. Jina is designed in a layered way that lets you focus only on the specific parts that you need, letting the rest of the infrastructure...

Documents

In Jina, Documents are the most basic data type you can work with. They hold the data you want to use, and they can be used for indexing, querying, or both. A Document can wrap whatever kind of data you require, such as text, GIFs, PDF files, 3D meshes, and so on.

We will use Documents to index and query, but since Documents can be of any type and size, it’s likely that we will need to divide them before use.

As an analogy, think of a Document as a chocolate bar. There are several types of chocolate: white, dark, milk, and so on. Likewise, a Document can be of several types, such as audio, text, video, a 3D mesh, and so on. Also, if we have a big chocolate bar, we will probably divide it into smaller pieces before eating it. Accordingly, if we have a big Document, we should divide it into smaller pieces before indexing.
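The idea of dividing a big Document into smaller pieces can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name split_into_chunks, the word-based splitting rule, and the sample text are all hypothetical, and the sketch does not use Jina's Document class.

```python
from typing import List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive pieces of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Hypothetical content standing in for a large Document.
long_text = "white dark milk ruby hazelnut caramel almond"
chunks = split_into_chunks(long_text, max_words=3)
print(chunks)
```

Each piece is small enough to be handled on its own, and joining the pieces with spaces gives back the original text, so nothing is lost by dividing it.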

This is how a Document looks in Python code:

from jina import Document
document = Document()

As you can see, all you need to create a Document...

DocumentArray

Another powerful concept in Jina is the DocumentArray, which is a list of Document objects. If you need multiple Documents, you can group them all together using a DocumentArray. You can use a DocumentArray like a regular Python list, with all of the usual operations: constructing, inserting, deleting, traversing, and sorting. The DocumentArray is a first-class citizen for an Executor, serving as its input and output. We will talk about Executors in the next section, but for now, think of them as the way Jina processes Documents.
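As a reminder of what those list-style operations look like, here is a short sketch that uses an ordinary Python list. It is an assumption-laden stand-in: plain dictionaries with a hypothetical "text" field take the place of Documents so that the snippet runs without Jina installed, and it makes no claim about DocumentArray's exact method signatures.

```python
# Plain dictionaries stand in for Documents here (an assumption made so the
# sketch runs without Jina); each one carries a hypothetical "text" field.
docs = [{"text": "milk"}, {"text": "dark"}]

# Insert: add a new item at a given position.
docs.insert(1, {"text": "white"})

# Delete: remove the item at index 0.
del docs[0]

# Sort: order the items by their text.
docs.sort(key=lambda d: d["text"])

# Traverse: visit every item in order.
texts = [d["text"] for d in docs]
print(texts)
```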

Constructing a DocumentArray

Because a DocumentArray behaves like a Python list, you can create one in several ways:

# From a list of Documents
from jina import DocumentArray, Document
documentarray = DocumentArray([Document(), Document()])

# From a generator of Documents
from jina import DocumentArray, Document
documentarray = DocumentArray((Document() for _ in range(10)))

from jina import DocumentArray, Document
documentarray1 = DocumentArray((Document...

Executors

The Executor represents the processing component in a Jina Flow. It performs a single task on a Document or DocumentArray. You can think of Executors as the logical part of Jina: they perform tasks of all kinds on Documents. For example, you could have an Executor for extracting text from a PDF file, or for encoding audio for your Document. They handle all of the algorithmic tasks in Jina.
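The "one Executor, one task" idea can be sketched without Jina at all. In the snippet below, the class name UppercaseExecutor, its process method, and the dictionary-based Documents are hypothetical stand-ins chosen for illustration; the sketch does not use Jina's Executor base class.

```python
from typing import Dict, List


class UppercaseExecutor:
    """Hypothetical stand-in for an Executor: one class, one task."""

    def process(self, docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
        # The single task: upper-case the text of every Document in the batch.
        for doc in docs:
            doc["text"] = doc["text"].upper()
        return docs


# A batch of stand-in Documents goes in, and the processed batch comes out.
batch = [{"text": "hello"}, {"text": "jina"}]
result = UppercaseExecutor().process(batch)
print([doc["text"] for doc in result])
```

Keeping each component focused on a single task is what makes it easy to swap one out, or to reuse it in a different pipeline.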

Since Executors are one of the main parts of Jina and perform all the algorithmic tasks, it is useful to build them so that they can be easily shared, letting others re-use your work. Similarly, you can use prebuilt Executors made by someone else in your own code. This is possible because Executors are available in a marketplace, which in Jina is called Jina Hub (https://hub.jina.ai/). There you can browse various Executors that solve different...

Flow

Now that you know what Documents and Executors are and how to work with them, we can start to talk about Flow, one of the most important concepts in Jina.

Think of Flow as a manager in Jina; it takes care of all the tasks that will run on your application and will use Documents as its input and output.

Creating a Flow

The creation of a Flow in Jina is very easy and works just like any other object in Python. For example, this is how you would create an empty Flow:

from jina import Flow
f = Flow()

To use a Flow, it’s best to always open it as a context manager, just like you would open a file in Python, by using the with statement:

from jina import Flow
f = Flow()
with f:
    f.block()

Note

Flow follows a lazy construction pattern: it won’t actually run unless you use the with statement to open it.
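To make the lazy construction pattern concrete, here is a minimal pure-Python sketch. It is not Jina's Flow implementation: the class LazyPipeline, its chainable add method, and its run method are hypothetical names used only to show how an object can record its steps up front and start work only inside a with block.

```python
from typing import Callable, List


class LazyPipeline:
    """Hypothetical stand-in showing lazy construction; not Jina's Flow."""

    def __init__(self) -> None:
        self.steps: List[Callable[[str], str]] = []
        self.running = False

    def add(self, step: Callable[[str], str]) -> "LazyPipeline":
        # Only records the step; nothing is started yet.
        self.steps.append(step)
        return self

    def __enter__(self) -> "LazyPipeline":
        # Resources would be started here, when the with block opens.
        self.running = True
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Resources are released when the with block closes, even on error.
        self.running = False

    def run(self, value: str) -> str:
        if not self.running:
            raise RuntimeError("open the pipeline with a with statement first")
        for step in self.steps:
            value = step(value)
        return value


pipeline = LazyPipeline().add(str.strip).add(str.upper)
started_before = pipeline.running  # False: construction alone starts nothing
with pipeline:
    output = pipeline.run("  hello  ")
print(started_before, output, pipeline.running)
```

The with block guarantees that whatever was started on entry is shut down on exit, which is the same reason files are usually opened this way.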

Adding Executors to a Flow

To add elements to your Flow, all you need to do is use the .add() method. You...

Summary

This chapter introduced the main concepts in Jina: Document, DocumentArray, Flow, and Executor. You should now have an overview of what each of those concepts is, why it is important, and how they relate to each other.

Besides understanding the theory of why Document, DocumentArray, Flow, and Executor are important when building your search engine, you should also be able to create a simple Document and assign its corresponding attributes. Now that you have finished this chapter, you should also be able to create your own Executor and spin up a basic Flow.

You will use all of this knowledge in the next chapter, where you will learn how to integrate these concepts together.
