
sunshaoyan / ISeeNN

Licence: Apache-2.0 License
A CNN feature based image retrieval website

The Introduction to ISeeNN System

This document describes in detail how we built ISeeNN, our CNN-based image retrieval system.

The phrase ISeeNN can be expanded as I See (with) CNN.

We provide an online demo here, which indexes the MirFlickr 1M dataset. It is for demonstration purposes only and may not always be available.

System Overview

The system consists of three principal components:

  • A distributed Image Serving system
  • A Search Engine for running retrieval
  • A Front End for user interaction

The system offers the following features:

  • Support for multiple CNN models and feature types, with dynamic switching
  • Support for index updates with new queries
  • Support for specifying dataset coverage
  • Support for user sessions to record feedback
  • An open API for new algorithms
  • Support for image URL distribution from multiple internal servers
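
One way to picture switching between CNN models and feature types at query time is a small registry keyed by model name and feature type. The sketch below is only an illustration under that assumption: the names `register`, `get_extractor`, `toy_model`, `max_pool` and `mean_pool` are hypothetical and do not come from the ISeeNN code base, and the "extractors" are toy functions standing in for real CNN feature extraction.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry keyed by (model name, feature type).
# None of these names are taken from the ISeeNN source.
Extractor = Callable[[List[float]], List[float]]
_EXTRACTORS: Dict[Tuple[str, str], Extractor] = {}


def register(model: str, feature_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that stores an extractor under (model, feature_type)."""
    def decorator(func: Extractor) -> Extractor:
        _EXTRACTORS[(model, feature_type)] = func
        return func
    return decorator


def get_extractor(model: str, feature_type: str) -> Extractor:
    """Look up an extractor at request time, so the choice can differ per query."""
    try:
        return _EXTRACTORS[(model, feature_type)]
    except KeyError:
        raise ValueError(f"no extractor for {model}/{feature_type}") from None


# Toy stand-ins for real CNN feature extractors.
@register("toy_model", "max_pool")
def toy_max(values: List[float]) -> List[float]:
    return [max(values)]


@register("toy_model", "mean_pool")
def toy_mean(values: List[float]) -> List[float]:
    return [sum(values) / len(values)]
```

With a layout like this, a new algorithm only needs to register itself under a new key; the calling code selects it by name without further changes.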

The framework of our system can be illustrated as:

Setup

This part describes the system setup in detail.

Our system is built upon a number of Linux servers. Strictly speaking, ISeeNN is not a distributed system; it only supports storing and fetching images across multiple servers.

Here is the configuration of our initial implementation:

| ID | Operating System | Role | Internal IP | GPU | Memory |
|----|------------------|------|-------------|-----|--------|
| 1 | Ubuntu 14.04 x64 | Front End & Retrieval System & Mongo Server | 192.168.6.232 | K80 x 4 | 64G |
| 2 | Open Suse 13.2 x64 | Image Server | 192.168.104.244 | K40 x 2 | 64G |
| 3 | Open Suse Leap 42.1 x64 | Image Server | 192.168.102.200 | GeForce GTX 660 Ti | 16G |

As the table shows, the servers do not need to run the same operating system. We chose Server 1 as the principal server because it has the most computing resources. It can, of course, also serve as an image server.

In the following we first focus on the configurations on Server 1.

Database Setting

We use MongoDB v3.4.1 as the backend storage database. Contents in the database include:

  • index of the image dataset (feature_id -> image_id)
  • image distribution information (image_id -> image_url)
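
The two mappings can be read as a two-stage lookup: a matched feature leads to an image, and the image leads to the URL of the server that stores it. The following is a minimal in-memory sketch of that idea, not the actual MongoDB schema. The dictionaries, the IDs, the URL paths and the function name `resolve_urls` are all hypothetical; only the two IP addresses are taken from the server table above.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the two MongoDB mappings.
# IDs and URL paths are invented for illustration.
feature_to_image: Dict[str, str] = {
    "feat_001": "img_42",
    "feat_002": "img_43",
}
image_to_url: Dict[str, str] = {
    "img_42": "http://192.168.104.244/images/img_42.jpg",
    "img_43": "http://192.168.102.200/images/img_43.jpg",
}


def resolve_urls(feature_ids: List[str]) -> List[str]:
    """Map feature ids to image URLs, skipping ids that cannot be resolved."""
    urls = []
    for feature_id in feature_ids:
        image_id = feature_to_image.get(feature_id)
        if image_id is None:
            continue  # feature is not in the index
        url = image_to_url.get(image_id)
        if url is not None:
            urls.append(url)
    return urls
```

Keeping the URL mapping separate from the index means an image can move to another server without touching the indexed features.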

Because the MongoDB package in the apt repository is outdated, we download the binaries directly.

$ wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1404-3.4.1.tgz
$ tar zxvf mongodb-linux-x86_64-ubuntu1404-3.4.1.tgz
$ sudo cp mongodb-linux-x86_64-ubuntu1404-3.4.1/bin/* /usr/local/bin

To initialize the database, we first create users for the admin database and for our image_retrieval database.

We use the script written by frodenas for the initialization.

#!/bin/bash
# first_run.sh (the shebang must be the first line of the script)
 USER=${MONGODB_USERNAME:-mongo}
 PASS=${MONGODB_PASSWORD:-$(pwgen -s -1 16)}
 DB=${MONGODB_DBNAME:-admin}
 DBPATH=/db/mongo # set your own db_path here
 if [ ! -z "$MONGODB_DBNAME" ]
 then
     ROLE=${MONGODB_ROLE:-dbOwner}
 else
     ROLE=${MONGODB_ROLE:-dbAdminAnyDatabase}
 fi

 # Start MongoDB service
 mongod --dbpath $DBPATH --nojournal &
 while ! nc -vz localhost 27017; do sleep 1; done

 # Create User
 echo "Creating user: \"$USER\"..."                                                                           
 mongo $DB --eval "db.createUser({ user: '$USER', pwd: '$PASS', roles: [ { role: '$ROLE', db: '$DB' } ] }); "
 
 # Stop MongoDB service
 mongod --dbpath $DBPATH --shutdown                                                                                                                                                                                     
 echo "MongoDB User: \"$USER\""
 echo "MongoDB Password: \"$PASS\""
 echo "MongoDB Database: \"$DB\""
 echo "MongoDB Role: \"$ROLE\""
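
The script waits for mongod by polling port 27017 with nc until a connection succeeds. For readers who want the same check from Python (for example before the web service connects), here is a rough equivalent using only the standard library. The function name `wait_for_port` and its timeout parameters are assumptions made for this sketch; unlike the shell loop, it gives up after a deadline instead of waiting forever.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once the port accepts a connection, or False if the
    deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_port("localhost", 27017)` would then play the role of the `while ! nc -vz localhost 27017` loop.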

First, create an admin user for the admin database:

$ ./first_run.sh
MongoDB User: "mongo"
MongoDB Password: "xxxxxxxxxxxxx"
MongoDB Database: "admin"
MongoDB Role: "dbAdminAnyDatabase"

Then, create a dbOwner user for the image_retrieval database:

$ export MONGODB_USERNAME=webclient
$ export MONGODB_DBNAME=image_retrieval
$ ./first_run.sh
MongoDB User: "webclient"
MongoDB Password: "xxxxxxxxxxxxx"
MongoDB Database: "image_retrieval"
MongoDB Role: "dbOwner"

The passwords are masked here. Remember to save the user credentials for later use.

After the initialization, start the mongod server with the config file /db/mongodb.conf:

# /db/mongodb.conf
dbpath=/db/mongo/
logpath=/db/mongodb.log
logappend=true
journal=true
auth=true

$ mongod --config /db/mongodb.conf

Now test your database by authenticating as the webclient user:

$ mongo
MongoDB shell version v3.4.1
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.1
> use image_retrieval
switched to db image_retrieval
> db.auth('webclient', 'xxxxxxxxxxx')
1

Django Development Environment

We use Python 3 to develop the web service and to extract CNN features through the TensorFlow Python interface.

In this part we configure Python with the Django module for the web service and its MongoDB backend.

It is beneficial to create a standalone Python runtime environment with virtualenv.

Install virtualenv:

$ sudo pip install virtualenv

Now in your workspace, create and enter a virtualenv environment:

$ virtualenv --no-site-packages -p python3 image_retrieval
$ cd image_retrieval
$ source bin/activate

Install Django, MongoEngine, scikit-image and Pillow:

$ pip install django
$ pip install mongoengine
$ pip install scikit-image
$ pip install Pillow

Then install TensorFlow by following the instructions on its website.

The Python environment now looks like this (the exact versions depend on your own setup):

$ pip freeze
appdirs==1.4.0
cycler==0.10.0
dask==0.13.0
decorator==4.0.11
Django==1.10.5
matplotlib==2.0.0
mongoengine==0.11.0
networkx==1.11
numpy==1.12.0
olefile==0.44
packaging==16.8
Pillow==4.0.0
protobuf==3.2.0
pymongo==3.4.0
pyparsing==2.1.10
python-dateutil==2.6.0
pytz==2016.10
scikit-image==0.12.3
scipy==0.18.1
six==1.10.0
tensorflow-gpu==0.12.1
toolz==0.8.2

And let's start our ISeeNN project:

$ django-admin startproject ISeeNN
$ tree ISeeNN
ISeeNN
├── ISeeNN
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── manage.py

To work with the MongoDB backend, modify the DATABASES setting in ISeeNN/settings.py as follows:

import mongoengine
...
DATABASES = {
    'default': {
        'ENGINE': '',
    }
}

_MONGODB_USER = 'webclient'
_MONGODB_PASSWD = 'xxxxxxxxxxxxxx'
_MONGODB_HOST = '192.168.6.232'
_MONGODB_NAME = 'image_retrieval'
_MONGODB_DATABASE_HOST = \
    'mongodb://%s:%s@%s/%s' \
    % (_MONGODB_USER, _MONGODB_PASSWD, _MONGODB_HOST, _MONGODB_NAME)

mongoengine.connect(_MONGODB_NAME, host=_MONGODB_DATABASE_HOST)
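
One caveat with building the URI by plain string formatting: if the user name or password contains characters such as `@`, `:` or `/`, they have to be percent-encoded before they go into a MongoDB connection string. The passwords generated by `pwgen -s` are alphanumeric, so the settings above work as written; the sketch below only matters if you choose your own password. The helper name `build_mongo_uri` is hypothetical.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str, db_name: str) -> str:
    """Build a mongodb:// URI, percent-encoding the credentials."""
    return "mongodb://%s:%s@%s/%s" % (
        quote_plus(user),      # escape reserved characters in the user name
        quote_plus(password),  # escape reserved characters in the password
        host,
        db_name,
    )
```

In settings.py, the result of `build_mongo_uri(_MONGODB_USER, _MONGODB_PASSWD, _MONGODB_HOST, _MONGODB_NAME)` could be used in place of the hand-formatted `_MONGODB_DATABASE_HOST`.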

Also set TIME_ZONE = 'Asia/Shanghai' in the same file.

Because Django's support for MongoDB is weak, we do not use the admin component in this version. So let's comment out the related code in urls.py:

from django.conf.urls import url
# from django.contrib import admin

urlpatterns = [
    # url(r'^admin/', admin.site.urls),
]

OK! Now launch the server:

$ ./manage.py runserver 8000

Then open http://localhost:8000 in your browser to test it.

Boost Python Wrapper

Our search engine backend is implemented in C++ for efficiency. To call this backend, we wrap the C++ code with Boost Python and expose it as a Python module.

Make sure to install boost-dev and boost-python-dev with Python 3 support.

For Ubuntu:

$ sudo apt-get install libboost-dev libboost-python-dev

For OSX: (it's important to set the flags below)

$ brew install boost
$ brew install boost-python --with-python3 --without-python

For Suse Linux:

$ sudo zypper in boost-devel

To use these libraries, add these lines to your CMakeLists.txt:

INCLUDE(FindPythonLibs)
FIND_PACKAGE(PythonInterp)
FIND_PACKAGE(PythonLibs)
FIND_PACKAGE(Boost COMPONENTS python3)

INCLUDE_DIRECTORIES(${Boost_INCLUDE_DIRS} ${PYTHON_INCLUDE_DIRS})
LINK_LIBRARIES(${Boost_LIBRARIES} ${PYTHON_LIBRARIES})

PYTHON_ADD_MODULE(your_target ${SOURCE_FILES})

Running the Project

There are three parts in this project:

  • the web interface ISeeNN/, which includes two web apps, search_web/ and image_server/.
  • the indexer Indexer/, which runs offline to index a specific image dataset into the database.
  • the search engine search_engine/, implemented in C++, which is compiled into a shared library and used as a Python module by the web interface.
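
To make the role of the search engine concrete, here is a tiny pure-Python sketch of feature-based retrieval: normalize the feature vectors and rank the indexed images by similarity to the query. This is not the C++ implementation from search_engine/. The choice of L2 normalization with cosine similarity, the brute-force scan, and the names `l2_normalize` and `search` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def search(query: Sequence[float], index: Dict[str, Sequence[float]],
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Brute-force ranking of indexed images by cosine similarity to the query."""
    q = l2_normalize(query)
    scored = []
    for image_id, feature in index.items():
        f = l2_normalize(feature)
        # Dot product of unit vectors equals their cosine similarity.
        scored.append((image_id, sum(a * b for a, b in zip(q, f))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A scan like this is linear in the number of indexed images, which is one reason to move the inner loop into compiled code for a dataset of a million images.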

To run the project, follow these steps:

  • Set up the environment as described in the previous sections.
  • Copy ISeeNN/IseeNN/personal_settings.py.example to ISeeNN/IseeNN/personal_settings.py.
  • Specify the MongoDB user name and password in that settings file and in Indexer/indexer.py.
  • Create image_server documents in your mongo shell, e.g.:

> db.image_server.insert({server_name: 'Amax', server_ip: '192.168.104.244'})

  • Index your target image dataset on local disks:

    1. Set dir_name and server_name in Indexer/indexer.py. You may also want to specify the normalizer type and the model type. This script is currently not well organized; we plan to revise it in the future.

    2. Run cd Indexer && python indexer.py

  • Compile and install the search engine backend:

$ cd search_engine && ./build.sh

  • Run the server:

$ cd ISeeNN && ./manage.py runserver 0.0.0.0:8000

  • Have fun!
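
For the indexing step, it may help to see what pairing a dataset directory with a server name could look like. The sketch below walks a directory and yields one record per image file. It is not the code in Indexer/indexer.py: the function name `iter_image_records`, the record layout and the list of file extensions are assumptions, and feature extraction is left out entirely.

```python
import os
from typing import Dict, Iterator

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def iter_image_records(dir_name: str, server_name: str) -> Iterator[Dict[str, str]]:
    """Yield one record per image file found under dir_name."""
    for root, dirs, files in os.walk(dir_name):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            extension = os.path.splitext(name)[1].lower()
            if extension in IMAGE_EXTENSIONS:
                yield {
                    "server_name": server_name,
                    "path": os.path.join(root, name),
                }
```

Each record names the server that holds the file, which matches the server_name field of the image_server documents created earlier.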