idealo / Imagededup

Licence: apache-2.0
😎 Finding duplicate images made easy!

Programming languages: Python, C++, shell

Image Deduplicator (imagededup)

imagededup is a Python package that simplifies the task of finding exact and near duplicates in an image collection.

The package provides hashing algorithms, which are particularly good at finding exact duplicates, as well as convolutional neural networks (CNNs), which are also adept at finding near duplicates. An evaluation framework is also provided to judge the quality of deduplication for a given dataset.
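To make the hashing idea concrete, here is a minimal sketch of difference hashing and Hamming-distance comparison in plain Python. It is an illustration only, not the package's implementation: the function names, the tiny pixel grids and the "left brighter than right" bit convention are assumptions chosen for the example, and a real pipeline would first resize and grayscale each image.

```python
from typing import List


def dhash_bits(pixels: List[List[int]]) -> List[int]:
    """Simplified difference hash: one bit per horizontally adjacent pixel pair.

    A bit is 1 when the left pixel is brighter than its right neighbour.
    (Hypothetical helper for illustration; not part of imagededup.)
    """
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Count the positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Made-up 4x5 grayscale grids standing in for already-resized images.
original = [
    [10, 20, 30, 40, 50],
    [50, 40, 30, 20, 10],
    [10, 50, 10, 50, 10],
    [90, 80, 85, 70, 75],
]
# Uniformly brightened copy: the relative order of neighbours is unchanged.
brightened = [[min(p + 5, 255) for p in row] for row in original]
# Horizontally mirrored copy: many gradients flip direction.
mirrored = [row[::-1] for row in original]

same_distance = hamming_distance(dhash_bits(original), dhash_bits(brightened))
other_distance = hamming_distance(dhash_bits(original), dhash_bits(mirrored))
print(same_distance, other_distance)
```

A small Hamming distance suggests the two images are duplicates. Because the hash only encodes brightness gradients, a uniform brightness change leaves it untouched, while a stronger transformation such as mirroring changes many bits.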

Detailed documentation of the functionality provided by the package can be found at: https://idealo.github.io/imagededup/

imagededup is compatible with Python 3.6+ and runs on Linux, macOS and Windows. It is distributed under the Apache 2.0 license.

⚙️ Installation

There are two ways to install imagededup:

  • Install imagededup from PyPI (recommended):
pip install imagededup

⚠️ Note: The TensorFlow >=2.1 and TensorFlow 1.15 releases include GPU support by default. In earlier versions, the CPU and GPU packages are separate. If you have GPUs, you should install the TensorFlow version with GPU support, especially when you use the CNN method to find duplicates, as it is much faster. See the TensorFlow guide for details on how to install GPU support for older versions of TensorFlow.

  • Install imagededup from the GitHub source:
git clone https://github.com/idealo/imagededup.git
cd imagededup
pip install "cython>=0.29"
python setup.py install

🚀 Quick Start

To find duplicates in an image directory using perceptual hashing, the following workflow can be used:

  • Import perceptual hashing method
from imagededup.methods import PHash
phasher = PHash()
  • Generate encodings for all images in an image directory
encodings = phasher.encode_images(image_dir='path/to/image/directory')
  • Find duplicates using the generated encodings
duplicates = phasher.find_duplicates(encoding_map=encodings)
  • Plot the duplicates obtained for a given file (e.g. 'ukbench00120.jpg') using the duplicates dictionary
from imagededup.utils import plot_duplicates
plot_duplicates(image_dir='path/to/image/directory',
                duplicate_map=duplicates,
                filename='ukbench00120.jpg')

The call plots the given image together with the duplicates found for it.

The complete code for the workflow is:

from imagededup.methods import PHash
from imagededup.utils import plot_duplicates

phasher = PHash()

# Generate encodings for all images in an image directory
encodings = phasher.encode_images(image_dir='path/to/image/directory')

# Find duplicates using the generated encodings
duplicates = phasher.find_duplicates(encoding_map=encodings)

# Plot duplicates obtained for a given file using the duplicates dictionary
plot_duplicates(image_dir='path/to/image/directory',
                duplicate_map=duplicates,
                filename='ukbench00120.jpg')
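Once a duplicates dictionary is available, a common next step is deciding which files to delete. The sketch below shows one possible approach in plain Python. It assumes the dictionary maps each filename to a list of its duplicate filenames; the `files_to_remove` helper and the sample data are hypothetical and written for this illustration, not taken from the package.

```python
from typing import Dict, List


def files_to_remove(duplicate_map: Dict[str, List[str]]) -> List[str]:
    """Pick one file to keep per group of duplicates and list the rest.

    Files are visited in sorted order; the first unseen file is kept and
    every duplicate listed for it is marked for removal.
    (Hypothetical helper; assumes values are lists of filenames.)
    """
    keep = set()
    remove = set()
    for filename in sorted(duplicate_map):
        if filename in remove:
            continue  # already covered by a file we decided to keep
        keep.add(filename)
        for dup in duplicate_map[filename]:
            if dup not in keep:
                remove.add(dup)
    return sorted(remove)


# Made-up result in the assumed {filename: [duplicates]} shape.
sample_duplicates = {
    'a.jpg': ['b.jpg', 'c.jpg'],
    'b.jpg': ['a.jpg'],
    'c.jpg': ['a.jpg'],
    'd.jpg': [],
}

to_delete = files_to_remove(sample_duplicates)
print(to_delete)
```

Keeping the alphabetically first file is an arbitrary choice made for the example. In practice you might prefer the highest-resolution or most recent file, and it is worth reviewing the list before deleting anything.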

For more examples, see the examples provided in the repository.

For more detailed usage of the package functionality, refer to: https://idealo.github.io/imagededup/

Benchmarks

Detailed benchmarks on speed and classification metrics for the different methods are provided in the documentation. Generally speaking, the following conclusions can be drawn:

  • CNN works best for near duplicates and datasets containing transformations.
  • All deduplication methods fare well on datasets containing exact duplicates, but Difference hashing is the fastest.
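As a rough illustration of what classification metrics mean for deduplication, the sketch below compares a retrieved duplicates mapping against a ground-truth mapping and computes pair-level precision and recall. It is a simplified stand-in, not the package's evaluation framework: the helper names, the assumed {filename: [duplicates]} shape and the sample data are all assumptions made for this example, and the package may define its metrics differently.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def to_pairs(duplicate_map: Dict[str, List[str]]) -> Set[FrozenSet[str]]:
    """Flatten a {filename: [duplicates]} mapping into unordered pairs."""
    pairs = set()
    for filename, dups in duplicate_map.items():
        for dup in dups:
            if dup != filename:
                pairs.add(frozenset((filename, dup)))
    return pairs


def precision_recall(ground_truth: Dict[str, List[str]],
                     retrieved: Dict[str, List[str]]) -> Tuple[float, float]:
    """Pair-level precision and recall of retrieved duplicates."""
    true_pairs = to_pairs(ground_truth)
    found_pairs = to_pairs(retrieved)
    correct = len(true_pairs & found_pairs)
    precision = correct / len(found_pairs) if found_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Made-up ground truth: a/b are duplicates, and c/d are duplicates.
ground_truth = {
    'a.jpg': ['b.jpg'], 'b.jpg': ['a.jpg'],
    'c.jpg': ['d.jpg'], 'd.jpg': ['c.jpg'],
}
# Made-up method output: finds a/b, wrongly pairs a/c, misses c/d.
retrieved = {
    'a.jpg': ['b.jpg', 'c.jpg'], 'b.jpg': ['a.jpg'],
    'c.jpg': ['a.jpg'], 'd.jpg': [],
}

precision, recall = precision_recall(ground_truth, retrieved)
print(precision, recall)
```

Low precision means many reported pairs are not true duplicates, while low recall means true duplicates were missed. Comparing methods on both numbers, alongside runtime, is one way to read benchmark conclusions like the ones above.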

🤝 Contribute

We welcome all kinds of contributions. See the Contribution guide for more details.

📝 Citation

Please cite Imagededup in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{idealods2019imagededup,
  title={Imagededup},
  author={Tanuj Jain and Christopher Lennan and Zubin John and Dat Tran},
  year={2019},
  howpublished={\url{https://github.com/idealo/imagededup}},
}

🏗 Maintainers

© Copyright

See LICENSE for details.
