
elisemercury / Duplicate-Image-Finder

Licence: MIT license
difPy - Python package for finding duplicate or similar images within folders


Projects that are alternatives of or similar to Duplicate-Image-Finder

reddit-fetch
A program to fetch some comments/pictures from reddit
Stars: ✭ 50 (-73.26%)
Mutual labels:  pictures, images
fsimilar
find/file similar
Stars: ✭ 13 (-93.05%)
Mutual labels:  similarity, duplicate
Unsplash Js
🤖 A server-side JavaScript wrapper for the Unsplash API
Stars: ✭ 1,647 (+780.75%)
Mutual labels:  pictures, images
Waifu2x-Image-Saver
A Firefox extension to download any image and process them with Waifu2x with one click.
Stars: ✭ 13 (-93.05%)
Mutual labels:  pictures, images
reddit get top images
Get top images from any subreddit
Stars: ✭ 37 (-80.21%)
Mutual labels:  pictures, images
blur-up
A tool that creates preview images.
Stars: ✭ 28 (-85.03%)
Mutual labels:  images
wallup-android
Hand curated Images & 'Auto Wallpaper'
Stars: ✭ 30 (-83.96%)
Mutual labels:  images
Everybody-dance-now
Implementation of paper everybody dance now for Deep learning course project
Stars: ✭ 22 (-88.24%)
Mutual labels:  images
pretty-formula
A small Java library to parse mathematical formulas to LaTeX and display them as images
Stars: ✭ 29 (-84.49%)
Mutual labels:  images
react-simple-image-viewer
Simple image viewer component for React
Stars: ✭ 44 (-76.47%)
Mutual labels:  images
ReactionDecoder
Reaction Decoder Tool (RDT) - Atom Atom Mapping Tool
Stars: ✭ 59 (-68.45%)
Mutual labels:  similarity
PictureToAudio
a few picture to audio (多张图片合成视频)
Stars: ✭ 21 (-88.77%)
Mutual labels:  pictures
folder-auth-plugin
Authorization Plugin for Jenkins that works on folders
Stars: ✭ 21 (-88.77%)
Mutual labels:  folder
hermitage
Service that provides storage, delivery and modification of your images
Stars: ✭ 33 (-82.35%)
Mutual labels:  images
uniquify
Uniquify is a Telegram bot interface used to remove duplicate media files from a chat
Stars: ✭ 45 (-75.94%)
Mutual labels:  duplicate
vektonn
vektonn.github.io/vektonn
Stars: ✭ 109 (-41.71%)
Mutual labels:  similarity
TwinBert
pytorch implementation of the TwinBert paper
Stars: ✭ 36 (-80.75%)
Mutual labels:  similarity
django-clone
Controlled Django model instance replication.
Stars: ✭ 89 (-52.41%)
Mutual labels:  duplicate
android-doc-picker
A simple and easy to use documents Picker android library. Choose any documents like pdf, ppt, text, word or media files from your device
Stars: ✭ 37 (-80.21%)
Mutual labels:  images
ngx-fire-uploader
Angular Fire Uploader
Stars: ✭ 18 (-90.37%)
Mutual labels:  images

Duplicate Image Finder (difPy)


Tired of going through all images in a folder and comparing them manually to check if they are duplicates?

The Duplicate Image Finder (difPy) Python package automates this task for you!

pip install difPy

👉 difPy v2.4.x has some major updates and new features. Check out the release notes for a detailed listing.

👐 Our motto? The more users use difPy, the more issues and missing features can be detected, and the better the algorithm gets over time. Contributions are always welcome - check our contributor guidelines for more information.

Read more on how the algorithm of difPy works in my Medium article Finding Duplicate Images with Python.

Check out the difPy package on PyPI.org

Description

DifPy searches for images in one or two folders, compares the images it finds, and checks whether they are duplicates. It then outputs the image files classified as duplicates, along with the filenames of the duplicates that have the lower resolution, so you know which of the duplicate images are safe to delete. You can then either delete them manually or let difPy delete them for you.

DifPy does not compare images based on their hashes. It compares them based on their tensors, i.e. the image content. This allows difPy to search not only for duplicate images, but also for similar images.
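
To illustrate what comparing image content (rather than hashes) can look like, here is a minimal sketch based on the mean squared error (MSE) between pixel values. It is not difPy's actual implementation: the helper names `mse` and `is_match` are hypothetical, plain Python lists stand in for image tensors, and the threshold of 200 is borrowed from the `similarity_mse` value that the stats output reports for the "normal" setting.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened pixel arrays."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and resized to the same shape")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def is_match(a: Sequence[float], b: Sequence[float], threshold: float = 200) -> bool:
    """Assumption: two images count as duplicate/similar when MSE < threshold."""
    return mse(a, b) < threshold


# Toy 2x2 grayscale "images", flattened row by row (values 0-255).
original = [10, 200, 30, 40]
recompressed = [12, 198, 30, 41]   # same picture with slight pixel noise
different = [250, 5, 180, 90]      # unrelated picture

print(mse(original, recompressed))       # 2.25
print(is_match(original, recompressed))  # True
print(is_match(original, different))     # False
```

A byte-level hash would already differ for the recompressed copy, while a content-based measure still treats it as a match. Raising the threshold widens the net from duplicates towards merely similar images.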

Basic Usage

Use the following function to make difPy search for duplicates within one specific folder and its subfolders:

from difPy import dif
search = dif("C:/Path/to/Folder/")

To search for duplicates within two folders and their subfolders:

from difPy import dif
search = dif("C:/Path/to/Folder_A/", "C:/Path/to/Folder_B/")

Folder paths must be specified as a Python string.

📓 For a detailed usage guide, please view the official difPy Usage Documentation.

Output

DifPy gives two types of output that you may use depending on your use case:

A dictionary of duplicates/similar images that were found, where the keys are a unique id for each image file:

search.result

> Output:
{20220824212437767808 : {"filename" : "image1.jpg",
                         "location" : "C:/Path/to/Image/image1.jpg",
                         "duplicates" : ["C:/Path/to/Image/duplicate_image1.jpg",
                                         "C:/Path/to/Image/duplicate_image2.jpg"]},
...
}
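
As a sketch of how this dictionary could be consumed, the snippet below walks over a hand-written sample and prints each image with its duplicates. The `summarize` helper is hypothetical, and the sample assumes that "duplicates" sits next to "filename" and "location" inside each entry.

```python
# Sample data shaped like the search.result output (assumed nesting).
result = {
    20220824212437767808: {
        "filename": "image1.jpg",
        "location": "C:/Path/to/Image/image1.jpg",
        "duplicates": [
            "C:/Path/to/Image/duplicate_image1.jpg",
            "C:/Path/to/Image/duplicate_image2.jpg",
        ],
    },
}


def summarize(result: dict) -> list:
    """Return one header line per image, followed by its duplicate paths."""
    lines = []
    for entry in result.values():
        count = len(entry["duplicates"])
        lines.append(f'{entry["location"]} has {count} duplicate(s)')
        lines.extend(f"  {path}" for path in entry["duplicates"])
    return lines


print("\n".join(summarize(result)))
```

With a real run, you would pass `search.result` instead of the sample dictionary.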

A list of duplicates/similar images that have the lowest quality:

search.lower_quality

> Output:
["C:/Path/to/Image/duplicate_image1.jpg", 
 "C:/Path/to/Image/duplicate_image2.jpg", ...]

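If you would rather review the lower-quality files before removing them, one option is to move them into a separate folder first. The sketch below is not part of difPy: `quarantine` is a hypothetical helper that only uses the Python standard library and expects a list of file paths such as `search.lower_quality`.

```python
import shutil
from pathlib import Path


def quarantine(paths, target_dir):
    """Move files into target_dir instead of deleting them; return the new paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for source in map(Path, paths):
        if not source.is_file():
            continue  # skip entries that were already moved or removed
        destination = target / source.name
        counter = 1
        # Avoid overwriting files that share a name but come from different folders.
        while destination.exists():
            destination = target / f"{source.stem}_{counter}{source.suffix}"
            counter += 1
        shutil.move(str(source), str(destination))
        moved.append(destination)
    return moved


# Hypothetical usage after a difPy run:
# quarantine(search.lower_quality, "C:/Path/to/Review/")
```

Moving keeps the step reversible, which deleting is not.
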
DifPy can also generate a dictionary with statistics on the completed process:

search.stats

> Output:
{"directory_1" : "C:/Path/to/Folder_A/",
 "directory_2" : "C:/Path/to/Folder_B/",
 "duration" : {"start_date": "2022-06-13",
               "start_time" : "14:44:19",
               "end_date" : "2022-06-13",
               "end_time" : "14:44:38",
               "seconds_elapsed" : 18.6113},
 "similarity_grade" : "normal",
 "similarity_mse" : 200,
 "total_images_searched" : 1032,
 "total_dupl_sim_found" : 1024}

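The stats dictionary lends itself to quick sanity checks. The sketch below uses a sample copied from the output above; `wall_clock_seconds` and `images_per_second` are hypothetical helpers, not part of difPy.

```python
from datetime import datetime

# Sample data copied from the documented search.stats output.
stats = {
    "duration": {
        "start_date": "2022-06-13",
        "start_time": "14:44:19",
        "end_date": "2022-06-13",
        "end_time": "14:44:38",
        "seconds_elapsed": 18.6113,
    },
    "total_images_searched": 1032,
}


def wall_clock_seconds(duration: dict) -> float:
    """Seconds between the start and end timestamps (whole-second resolution)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(f'{duration["start_date"]} {duration["start_time"]}', fmt)
    end = datetime.strptime(f'{duration["end_date"]} {duration["end_time"]}', fmt)
    return (end - start).total_seconds()


def images_per_second(stats: dict) -> float:
    """Average number of images searched per second of processing time."""
    return stats["total_images_searched"] / stats["duration"]["seconds_elapsed"]


print(wall_clock_seconds(stats["duration"]))  # 19.0
print(round(images_per_second(stats), 2))
```

The timestamps are rounded to whole seconds, so the wall-clock value can differ from `seconds_elapsed` by up to a second.
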
CLI Usage

You can use difPy through the command line interface with the following commands:

python dif.py -A "C:/Path/to/Folder_A/"

python dif.py -A "C:/Path/to/Folder_A/" -B "C:/Path/to/Folder_B/"

It supports the following arguments:

dif.py [-h] -A DIRECTORY_A [-B [DIRECTORY_B]] [-Z [OUTPUT_DIRECTORY]] 
       [-s [{low,normal,high,int}]] [-px [PX_SIZE]] [-p [{True,False}]] [-o [{True,False}]]
       [-d [{True,False}]] [-D [{True,False}]]

The output of difPy is then written to files and saved in the working directory by default, or to the folder specified in the -Z / -output_directory parameter. The "xxx" in the filename is a unique timestamp:

difPy_results_xxx.json
difPy_lower_quality_xxx.txt
difPy_stats_xxx.json
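
To pick up the results of the most recent CLI run from a script, something like the following could work. It is a sketch under assumptions: `load_latest_results` is a hypothetical helper, and because the exact timestamp format in the filename is not specified here, it selects the newest file by modification time rather than by name.

```python
import json
from pathlib import Path


def load_latest_results(output_dir="."):
    """Load the most recently modified difPy_results_*.json, or None if none exist."""
    candidates = sorted(
        Path(output_dir).glob("difPy_results_*.json"),
        key=lambda f: f.stat().st_mtime,
    )
    if not candidates:
        return None
    with candidates[-1].open(encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical usage: read results written to the current working directory.
# results = load_latest_results(".")
```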

📓 For a detailed usage guide, please view the official difPy Usage Documentation.

Additional Parameters

DifPy has the following optional parameters:

dif(directory_A, directory_B, similarity="normal", px_size=50, 
    show_progress=True, show_output=False, delete=False, silent_del=False)

similarity (str, int)

Depending on your use case, the granularity for the classification of the images can be adjusted. DifPy can, for example, search for exactly matching duplicate images, or for images that look similar but are not necessarily duplicates.

"normal" = (recommended, default) searches for duplicates with a certain tolerance

"high" = searches for duplicate images with extreme precision, for example when comparing images that contain a lot of detail, such as text

"low" = searches for similar images

To customize the classification threshold and define the MSE value manually, you can set similarity to any integer.

px_size (int)

! Recommended not to change default value

Absolute size in pixels (width x height) that the images will be compressed to before being compared. The higher the px_size, the more computational resources and time are required.
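
A rough back-of-the-envelope estimate shows why larger px_size values get expensive. The sketch rests on assumptions this README does not confirm: images are resized to a square of px_size x px_size, each has three color channels, and every image is compared with every other one. `comparison_cost` is a hypothetical helper.

```python
def comparison_cost(n_images: int, px_size: int = 50, channels: int = 3) -> dict:
    """Estimate the work of an exhaustive pairwise comparison (assumed model)."""
    values_per_image = px_size * px_size * channels
    pairs = n_images * (n_images - 1) // 2
    return {
        "values_per_image": values_per_image,
        "pairs": pairs,
        "values_compared": pairs * values_per_image,
    }


default = comparison_cost(1032)               # px_size=50
doubled = comparison_cost(1032, px_size=100)  # twice the edge length

print(default["values_per_image"])  # 7500
print(default["pairs"])             # 531996
print(doubled["values_compared"] // default["values_compared"])  # 4
```

Under this model, doubling px_size quadruples the number of values compared per image pair.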

show_progress (bool)

By default, difPy sets this parameter to True, so that you can follow the progress of lengthy processing runs. Change this value to False to disable the progress bar.

False = no progress bar is shown

True = (default) outputs a progress bar

show_output (bool)

By default, difPy outputs only the filenames of the duplicate images it found. To also display the duplicate images in the console output, set this value to True.

False = (default) outputs filename of the duplicate/similar images found

True = outputs a sample and the filename

delete (bool)

! Please use with care, as this cannot be undone

When set to True, the lower resolution duplicate images found by difPy are deleted from the folder. DifPy asks for user confirmation before deleting the images. To skip the user confirmation, set silent_del to True.

silent_del (bool)

! Please use with care, as this cannot be undone

When set to True, the user confirmation is skipped and the lower resolution duplicate images that were found by difPy are automatically deleted from the folder.

Similar Work

I. DifPy as Webapp

A Streamlit based Webapp to find duplicate images from single/multiple directories - 🧬 based on difPy

Single Directory 📸 demo1

Two directories 📸 demo2

II. Mac Photos Tool to find Duplicates (photosdup)

Tool to scan a Mac Photos library for duplicates, thumbnails etc. - inspired by difPy


💭 Also want to be featured in the "Similar Work" section? Check our contributor guidelines to find out how!
