
CVxTz / Image_search_engine

License: MIT
Image search engine

Programming Languages

python

Projects that are alternatives of or similar to Image search engine

Curatedseotools
Best SEO Tools Stash
Stars: ✭ 128 (-27.68%)
Mutual labels:  search-engine
Search Engine Google
🕷 Google client for SERPS
Stars: ✭ 138 (-22.03%)
Mutual labels:  search-engine
Sf1r Lite
Search Formula-1: a distributed, high-performance engine for massive-data enterprise/vertical search
Stars: ✭ 158 (-10.73%)
Mutual labels:  search-engine
Collector Http
Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.
Stars: ✭ 130 (-26.55%)
Mutual labels:  search-engine
Datasets
🎁 3,000,000+ Unsplash images made available for research and machine learning
Stars: ✭ 1,805 (+919.77%)
Mutual labels:  search-engine
Ambar
🔍 Ambar: Document Search Engine
Stars: ✭ 1,829 (+933.33%)
Mutual labels:  search-engine
Downloadsearch
search for any kinds of files to download
Stars: ✭ 124 (-29.94%)
Mutual labels:  search-engine
Dot Hugo Documentation Theme
Dot - Hugo Documentation Theme
Stars: ✭ 162 (-8.47%)
Mutual labels:  search-engine
Poseidon
A search engine which can hold 100 trillion lines of log data.
Stars: ✭ 1,793 (+912.99%)
Mutual labels:  search-engine
Tis Solr
an enterprise search engine base on Apache Solr
Stars: ✭ 158 (-10.73%)
Mutual labels:  search-engine
Rated Ranking Evaluator
Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures
Stars: ✭ 134 (-24.29%)
Mutual labels:  search-engine
Cosmos Search
🌱 The next generation unbiased real-time privacy and user focused code search engine for everyone; Join us at https://discourse.opengenus.org/
Stars: ✭ 137 (-22.6%)
Mutual labels:  search-engine
Awesome Deep Learning Papers For Search Recommendation Advertising
Awesome Deep Learning papers for industrial Search, Recommendation and Advertising. They focus on Embedding, Matching, Ranking (CTR prediction, CVR prediction), Post Ranking, Transfer, Reinforcement Learning, Self-supervised Learning and so on.
Stars: ✭ 136 (-23.16%)
Mutual labels:  search-engine
Instantsearch Android
A library of widgets and helpers to build instant-search applications on Android.
Stars: ✭ 129 (-27.12%)
Mutual labels:  search-engine
Bm25
A Python implementation of the BM25 ranking function.
Stars: ✭ 159 (-10.17%)
Mutual labels:  search-engine
Swift Selection Search
Swift Selection Search (SSS) is a simple Firefox add-on that lets you quickly search for some text in a page using your favorite search engines.
Stars: ✭ 125 (-29.38%)
Mutual labels:  search-engine
Search
An Open Source Search Engine
Stars: ✭ 139 (-21.47%)
Mutual labels:  search-engine
Rusticsearch
Lightweight Elasticsearch compatible search server.
Stars: ✭ 171 (-3.39%)
Mutual labels:  search-engine
Covid Papers Browser
Browse Covid-19 & SARS-CoV-2 Scientific Papers with Transformers 🦠 📖
Stars: ✭ 161 (-9.04%)
Mutual labels:  search-engine
Caiss
A cross-platform, multi-language, high-performance retrieval engine for similar vectors, similar words and similar sentences. Powerful and easy to use. Stars & forks welcome. Build together! Power another!
Stars: ✭ 142 (-19.77%)
Mutual labels:  search-engine

Building a Deep Image Search Engine using tf.Keras

Motivation:

Imagine having a collection of hundreds of thousands to millions of images without any metadata describing the content of each one. How can we build a system that finds the subset of those images that best answers a user's search query?
What we need is a search engine that ranks image results by how well they correspond to the search query, where the query is expressed either in natural language or as another image.
The way we will solve the problem in this post is by training a deep neural model that learns a fixed-length representation (or embedding) of any input image or text, such that the representations are close in Euclidean space whenever a text-image or image-image pair is "similar".

Dataset:

I could not find a search-result ranking dataset that is big enough, but I was able to get this one: http://jmcauley.ucsd.edu/data/amazon/, which links e-commerce item images to their titles and descriptions. We will use this metadata as the supervision source to learn a meaningful joint text-image representation. To keep computation and storage costs manageable, the experiments were limited to fashion items (Clothing, Shoes and Jewelry) and to 500,000 images.

Problem setting:

The dataset links each image to a description written in natural language. We therefore define a task in which we learn a joint, fixed-length representation for images and text, such that each image representation is close to the representation of its description.

Model:

The model takes 3 inputs: the image (the anchor), the image title+description (the positive example), and some randomly sampled text (the negative example).
We then define two sub-models:

  • Image encoder: ResNet50 pre-trained on ImageNet + GlobalMaxPooling2D
  • Text encoder: GRU + GlobalMaxPooling1D
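The two encoders can be sketched in tf.Keras as follows. This is a minimal sketch, not the repository's code: the embedding dimension, vocabulary size, sequence length and the final Dense projection are illustrative assumptions (the post uses ImageNet weights; weights=None below only avoids a download).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_image_encoder(embed_dim=256):
    # ResNet50 backbone + GlobalMaxPooling2D, as in the post.
    # weights=None here to avoid a download; the post uses weights="imagenet".
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_shape=(224, 224, 3))
    x = layers.GlobalMaxPooling2D()(base.output)
    # Hypothetical projection so both encoders share one embedding size.
    out = layers.Dense(embed_dim)(x)
    return Model(base.input, out, name="image_encoder")

def build_text_encoder(vocab_size=20000, embed_dim=256, max_len=100):
    # GRU + GlobalMaxPooling1D, as in the post; sizes are illustrative.
    inp = layers.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(vocab_size, 128)(inp)
    x = layers.GRU(128, return_sequences=True)(x)
    x = layers.GlobalMaxPooling1D()(x)
    out = layers.Dense(embed_dim)(x)
    return Model(inp, out, name="text_encoder")
```

During training, the text encoder is applied twice with shared weights: once to the positive title+description and once to the negative text.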

The image sub-model produces the embedding of the anchor E_a, and the text sub-model outputs the embedding of the positive title+description E_p and the embedding of the negative text E_n.

We then train by optimizing the following triplet loss:

L = max(d(E_a, E_p) - d(E_a, E_n) + alpha, 0)

where d is the Euclidean distance and alpha is a hyperparameter, set to 0.4 in this experiment.

What this loss does is push d(E_a, E_p) to be small and d(E_a, E_n) to be large, so that each image embedding ends up close to the embedding of its description and far from the embeddings of random text.
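The loss formula above can be written directly in NumPy (a sketch of the formula only, not the training code):

```python
import numpy as np

def triplet_loss(e_a, e_p, e_n, alpha=0.4):
    """Triplet loss L = max(d(E_a, E_p) - d(E_a, E_n) + alpha, 0)."""
    d_ap = np.linalg.norm(e_a - e_p, axis=-1)  # anchor-positive distance
    d_an = np.linalg.norm(e_a - e_n, axis=-1)  # anchor-negative distance
    return np.maximum(d_ap - d_an + alpha, 0.0)
```

When the positive is already much closer than the negative (by more than the margin alpha), the loss is zero and the triplet contributes no gradient.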

Visualization Results:

Once we have learned the image and text embedding models, we can visualize them by projecting the embeddings into two dimensions using t-SNE (https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).

Test images and their corresponding text descriptions are linked by green lines.

We can see from the plot that, in the embedding space, images are generally close to their corresponding descriptions, which is what we would expect given the training loss that was used.
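The projection can be reproduced with scikit-learn's TSNE; random vectors stand in here for the learned embeddings, and the sample count and perplexity are illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the concatenated image and text embeddings (200 items, 256-d).
rng = np.random.default_rng(0)
emb = rng.random((200, 256))

# Project jointly to 2D; image and text points can then be plotted together.
proj = TSNE(n_components=2, init="random", perplexity=30,
            random_state=0).fit_transform(emb)
```

In the real plot, each image point is connected to its description point, which is how the green lines in the figure above are drawn.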

Text-image Search:

Here we use a few text queries to search for the best matches in a set of 70,000 images. We compute the embedding of the text query and the embedding of each image in the collection, then select the 9 images closest to the query in the embedding space.
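The retrieval step reduces to a brute-force nearest-neighbor search in the embedding space. A sketch (top_k is a hypothetical helper, not taken from the repository; the query embedding would come from the text encoder and the image embeddings would be precomputed):

```python
import numpy as np

def top_k(query_emb, image_embs, k=9):
    """Return the indices of the k images closest to the query embedding."""
    # Euclidean distance from the query to every image embedding.
    d = np.linalg.norm(image_embs - query_emb, axis=1)
    # Indices of the k smallest distances, best match first.
    return np.argsort(d)[:k]
```

For 70,000 embeddings this brute-force scan is still fast; at larger scales an approximate nearest-neighbor index would be the natural next step.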

These examples show that the embedding models learn useful representations of images and of simple compositions of words.

Image-Image Search:

Here we use an image as the query and search the database of 70,000 images for the examples most similar to it. The ranking is determined by how close each pair of images is in the embedding space, using the Euclidean distance.

The results illustrate that the generated embeddings are high-level representations of images that capture the most important characteristics of the depicted objects without being excessively influenced by orientation, lighting or minor local details, even though the model was never trained explicitly to do so.

Conclusion:

In this project we built the machine-learning blocks of a keyword- and image-based search engine applied to a collection of images. The basic idea is to learn a meaningful joint embedding function for text and images, and then use the distance between items in the embedding space to rank search results.
