Next-Gen Search Powered by Jina

What is neural semantic search, exactly?

Neural search uses deep learning to retrieve information that is contextually and semantically relevant to a query. Instead of relying on a hand-written rule set, it uses a trained neural network to do the same retrieval job that traditional systems do with rules, so developers don't need to write every rule themselves.

Not having to hand-write rules saves developers both time and frustration. A neural search system can also get better over time as it is trained on more data.

Conventional Search and Neural Search

Conventional search: Conventional search is keyword-driven, symbolic search, which means it has no notion of context. Because it depends on rigidly coded rule engines, it can be fragile. It is time-consuming and hard to scale, because existing rules have to be updated whenever new data is added. Implementing it also requires a high level of domain knowledge.

Neural/semantic search: Neural search is context-driven, so it can find semantically relevant information. It is flexible enough to adapt to corner cases and is resilient to noise. It can learn from past inferences and context, which makes it scalable and effective. Implementing it does not require deep domain knowledge.

What's Jina?

Jina is a cloud-native neural search platform. You can deploy it in containers, on-premises, or in the cloud, and you can use it to search anything from text-to-text and image-to-image to video-to-video.

Jina's primitive data type is the Document. A Document is any piece of data from a dataset that you want to search, and the input query you use to find it is also a Document. Documents are therefore both the input and the output of Jina's search processes.

Jina's core is made up of two major flows.

Indexing flow: The indexing flow makes the entire corpus searchable. It prepares and processes the data for search: the input documents are processed and, at the end of the flow, stored as searchable indexes.

Querying flow: The querying flow takes the user's input Document (Jina's primitive data type) and returns a ranked list of matches that are similar to the query, based on the similarity scores between their embeddings.

Jina Components

Flow: A Flow represents a higher-level task such as indexing, searching, or training. It orchestrates a group of Pods to accomplish that single task.

Pod: A Pod is a group of Executors that share the same property. It lets multiple Executors run in parallel and provides them with context and control.

Executor: The Executor is Jina's algorithmic unit. Executors implement algorithms such as converting images into vectors or storing vectors on disk. By providing useful interfaces, the Executor lets engineers and AI developers focus solely on the algorithm.

Executors come in several types:

Crafter: Pre-processes the document and splits it into smaller chunks.

Encoder: Takes the pre-processed chunks from the Crafter and encodes them into embedding vectors.

Indexer: Takes the encoded vectors as input and indexes them.

Ranker: Runs on the index storage and sorts the results according to a particular ranking.

The data-type-agnostic search framework

Jina lets you work with all types of data and can run multi-modal or cross-modal searches.

Single-modality search: The type of the input is the same as the type of the output, as in text-to-text, image-to-image, and audio-to-audio search. Because a single-modality search deals with only one data type, it is more fragile and less adaptable to different inputs.

Cross-modality search: Lets you search for relevant documents in modality A (say, images) by querying with documents from modality B (say, text).

Multi-modality search: Lets you combine information from multiple modalities in one space and use it to find the relevant documents.

Being able to search across different modalities unlocks a wealth of patterns, and Jina is flexible enough to handle other types of data as well.
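To make the four executor roles and the two flows a bit more concrete, here is a minimal sketch in plain Python. It is not Jina's API: every function and class name below is made up for illustration, and the "encoder" is a simple bag-of-words counter standing in for a real neural network. It only shows how a Crafter, Encoder, Indexer, and Ranker could be chained into an indexing flow and a querying flow.

```python
import math
import re
from collections import Counter

# NOTE: all names below are hypothetical. This is a toy illustration of the
# Crafter -> Encoder -> Indexer -> Ranker idea, not Jina's actual API.


def craft(document):
    """Crafter: split a document into sentence-level chunks."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def encode(chunk):
    """Encoder: map a chunk to a sparse bag-of-words vector.

    Assumption: a real encoder would be a neural model producing dense
    embeddings; word counts are used here only to keep the sketch runnable.
    """
    return Counter(re.findall(r"[a-z0-9]+", chunk.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class Indexer:
    """Indexer: store (chunk, vector) pairs so they can be searched later."""

    def __init__(self):
        self.entries = []

    def add(self, chunk, vector):
        self.entries.append((chunk, vector))


def rank(indexer, query_vector, top_k=3):
    """Ranker: score every indexed chunk and return the best matches first."""
    scored = [(cosine(query_vector, vec), chunk) for chunk, vec in indexer.entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def index_flow(documents):
    """Indexing flow: craft, encode, and index every document."""
    indexer = Indexer()
    for doc in documents:
        for chunk in craft(doc):
            indexer.add(chunk, encode(chunk))
    return indexer


def query_flow(indexer, query, top_k=3):
    """Querying flow: encode the query and rank indexed chunks against it."""
    return rank(indexer, encode(query), top_k)


if __name__ == "__main__":
    corpus = [
        "Jina is a neural search platform. It runs in containers.",
        "The indexer stores vectors on disk. The ranker sorts the matches.",
    ]
    index = index_flow(corpus)
    for score, chunk in query_flow(index, "Which component stores vectors?"):
        print(f"{score:.3f}  {chunk}")
```

Because the stand-in encoder only counts words, this sketch still behaves like keyword matching. Swapping `encode` for a model that produces dense embeddings is what would turn the same pipeline into a semantic search.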
Jina in action: a simple neural semantic search of textual data

This example uses text from random Wikipedia pages as its data. Jina reads the input document and then runs it through its internal flows (indexing followed by querying) to create a search engine. The application is built from the following pieces:

Jina Core: The framework that provides the indexing and querying for the application.

Language model: The language model comes from the BERT (Bidirectional Encoder Representations from Transformers) family. Here, “distilbert-base-cased” is used to understand context in Jina's querying flow.

JinaBox: JinaBox is a lightweight, customizable front-end component that supports data-type-agnostic search (text, audio, and video). It connects easily to a Jina backend and gives the user a simple, efficient interface to the search engine.

Environment: The Jina application was developed with Python 3.7.

Example: Searching for the keyword “computer” in the search box returns a list of matching sentences. Although the exact term “computer” does not appear anywhere in the indexed document, the model still identifies the sentences that are semantically or contextually related to computers. Jina does the magic.
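Why can a query for “computer” match sentences that never contain the word? The short sketch below contrasts a keyword lookup with a similarity search over embeddings. Everything in it is an assumption made for illustration: the three-dimensional vectors are invented by hand (real models such as DistilBERT learn vectors with hundreds of dimensions), the sentences are made up, and none of the function names come from Jina.

```python
import math
import re

# Hypothetical hand-made "embeddings" with axes (technology, nature, food).
# A real language model would learn these vectors from data.
TOY_EMBEDDINGS = {
    "computer": (0.90, 0.10, 0.00),
    "processor": (0.95, 0.00, 0.05),
    "software": (0.85, 0.05, 0.00),
    "river": (0.00, 0.90, 0.10),
    "forest": (0.05, 0.95, 0.00),
    "bread": (0.00, 0.10, 0.90),
    "flour": (0.00, 0.05, 0.95),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_search(query, sentences):
    """Conventional search: keep sentences that contain the exact query word."""
    word = query.lower()
    return [s for s in sentences if word in tokenize(s)]


def embed(text):
    """Average the toy vectors of the known words; None if no word is known."""
    vectors = [TOY_EMBEDDINGS[w] for w in tokenize(text) if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query, sentences):
    """Rank sentences by embedding similarity to the query, best first."""
    query_vec = embed(query)
    scored = []
    for sentence in sentences:
        sentence_vec = embed(sentence)
        if query_vec is None or sentence_vec is None:
            score = 0.0
        else:
            score = cosine(query_vec, sentence_vec)
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    sentences = [
        "The processor executes software instructions.",
        "The river flows through the forest.",
        "Fresh bread needs flour and yeast.",
    ]
    print("keyword matches:", keyword_search("computer", sentences))
    for score, sentence in semantic_search("computer", sentences):
        print(f"{score:.3f}  {sentence}")
```

With these toy vectors the keyword lookup finds nothing, while the similarity ranking puts the sentence about processors and software first, because its averaged vector points in nearly the same direction as the vector for “computer”.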

If you would like to learn more, or want me to write more on this subject, feel free to reach out.

Next-Gen Search Powered by Jina originally appeared on Medium.
