drakerc / tanukai

Licence: other
Furry imageboard / Manga/anime image search engine

Programming Languages

  • Python
  • JavaScript
  • SCSS

Projects that are alternatives of or similar to tanukai

FurryBot
Furry Bot for Discord
Stars: ✭ 17 (-32%)
Mutual labels:  furry, furry-fandom
app
An e621/e926 client, fetch and download posts and pools!
Stars: ✭ 14 (-44%)
Mutual labels:  furry
foxbot
Telegram bot for finding furry image sources and inline mirroring
Stars: ✭ 25 (+0%)
Mutual labels:  furry
YiffSpot
A real-time web chat for "yiffing" randomly with other furries anonymously.
Stars: ✭ 18 (-28%)
Mutual labels:  furry
e621-api-docs
Documentation library for the e621's API
Stars: ✭ 34 (+36%)
Mutual labels:  furry
fuzzysearch
A site that allows you to reverse image search millions of furry images in under a second
Stars: ✭ 34 (+36%)
Mutual labels:  furry
rawr-x3dh
TypeScript Implementation of X3DH
Stars: ✭ 51 (+104%)
Mutual labels:  furry
faexport
The API for Furaffinity you wish existed
Stars: ✭ 61 (+144%)
Mutual labels:  furry

Tanukai

Tanukai Demo

This project builds an index of art images (mainly Japanese manga/anime-style drawings) so that you can upload an image of your own and find similar stored images. In short: reverse image search for manga/anime. Available at https://tanukai.com

Implementation details

To index images so that they can be reverse-searched, Tanukai uses two methods:

  • Convolutional Neural Network (DenseNet121) - Tanukai uses a pre-trained CNN to extract a feature vector for each image (taken from the second-to-last layer of the network). This feature vector is stored in Milvus, a vector similarity search engine (https://milvus.io/). Additional image data (its path, timestamp, and metadata such as tags, author, etc.) is stored in Elasticsearch. When a user uploads an image to search with, Tanukai extracts its features using the CNN, runs a similarity search with those features in Milvus, and then queries Elasticsearch for the data of the returned results. Tanukai uses DenseNet121 because it is fast, produces a small feature vector (1024 dimensions), and returns good results for anime images.
  • Perceptual Hash (pHash) - Tanukai calculates the pHash of each image and stores it in Elasticsearch (every character of the hash is stored as a separate keyword-type field to leverage Lucene's capabilities). To find similar images, it calculates the pHash of the uploaded image and runs an ES query matching at least minimum_should_match elements of the hash. This is mostly useful for finding near-identical images and is currently only stored in ES, not used for searching.
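The CNN/Milvus flow can be sketched as a minimal pure-Python stand-in. Neither DenseNet121 nor Milvus is invoked here: the toy three-dimensional vectors stand in for the 1024-dimensional DenseNet features, and a brute-force L2 scan stands in for Milvus's ANN search. All names below are illustrative.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_similar(query_vector, index, top_k=3):
    """Return the top_k (image_id, distance) pairs closest to query_vector.
    Stand-in for the Milvus similarity search described above."""
    scored = [(image_id, l2_distance(query_vector, vec))
              for image_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]

# Toy "index": image id -> feature vector (DenseNet121 would give 1024 dims).
index = {
    "img_a": [0.0, 1.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [1.0, 0.0, 0.0],
}

print(search_similar([1.0, 0.0, 0.0], index, top_k=2))
```

In production the image ids returned by this search would then be looked up in Elasticsearch to fetch the path, tags, and other metadata.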

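The per-character pHash storage and the minimum_should_match lookup can be sketched as plain dicts in the shape Elasticsearch expects. The phash_N field names are an illustrative assumption, not the project's actual mapping.

```python
def phash_to_doc(phash: str) -> dict:
    """Store each hash character in its own keyword-type field."""
    return {f"phash_{i}": ch for i, ch in enumerate(phash)}

def phash_query(phash: str, min_match: int) -> dict:
    """Bool query matching images that share at least min_match hash positions."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {f"phash_{i}": ch}} for i, ch in enumerate(phash)
                ],
                "minimum_should_match": min_match,
            }
        }
    }
```

Splitting the hash into one keyword field per position lets Lucene count matching positions directly, rather than computing a Hamming distance at query time.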
Other features:

  • REST back-end made in Django REST Framework
  • Scrapy is used to scrape images from imageboards such as Danbooru; these are then processed by the CNN and stored in Milvus and ES.
  • OpenCV is used for template matching (to show the found sub-image in a large image, currently not used)
  • Database (MySQL) along with Django's ORM is used to store user data, user settings (preferred art websites/safety rating) and previous searches
  • React front-end (extremely basic, my React skills are lacking)
  • Containerized using docker-compose

Libraries used:

  • Backend: Keras, elasticsearch-dsl, Pillow, ImageHash, OpenCV, Django (DRF), Scrapy
  • Frontend: React, Redux, React-Semantic-UI

Directories description:

  • home - Django start directory
  • img_match - reverse image search "library"
  • nginx - nginx settings
  • node_modules - node packages
  • public - directory used by React to generate some front-end stuff
  • scrapers - Scrapy-based scrapers that fetch and save images
  • src - front-end
  • static - directory containing images and other static files
  • tanukai - Tanukai Django files

Starting the project

  • Install docker and docker-compose

  • Copy .env.dist to .env (and change some variables if necessary, especially REACT_APP_API_URL, which should be the address of the Django API)

  • Execute docker-compose build

  • Execute docker-compose up -d (or docker-compose -f docker-compose.prod.yml up -d in deployment environment)

  • Create and apply migrations by using docker-compose exec image_search_python python manage.py makemigrations && docker-compose exec image_search_python python manage.py migrate (on production, use docker-compose -f docker-compose.prod.yml exec image_search_python python manage.py makemigrations --settings=tanukai_backend.settings.prod && docker-compose -f docker-compose.prod.yml exec image_search_python python manage.py migrate --settings=tanukai_backend.settings.prod)

  • The front-end should be available on the port specified in the DOCKER_WEB_PORT .env variable. More details can be found in the nginx/nginx.conf file.

  • In order to start scraping, open a shell in the container with docker-compose -f docker-compose.prod.yml exec image_search_python bash, then inside it run cd scrapers && scrapy crawl SCRAPER_NAME -s JOBDIR=crawls/SCRAPER_NAME, e.g. scrapy crawl e621 -s JOBDIR=crawls/e621. You can pass -a param_ignore_scraped=true to skip the "previously scraped" check (especially helpful if you are filling up the database)
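As a sketch, the .env file might look like the following. Only these two variables are named in this README, so the values are placeholders and .env.dist should be consulted for the full list:

```
# Example .env (placeholder values)
REACT_APP_API_URL=http://localhost:8000   # address of the Django API
DOCKER_WEB_PORT=3000                      # port the front-end is served on
```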

Testing

Coming soon

TODOs

  • Create tests (pytest + mock databases; Selenium front-end tests)
  • Create deployment process
  • Cleanup code, add linters
  • Add MySQL support instead of sqlite
  • Use AWS S3 to store images
  • Add FAQ section, terms, etc
  • Improve the front-end look
  • Add UserTags functionality (white/blacklisting of tags)
  • Add more searching methods (search by tags and sort by the score returned by ES)