arsenetar / Dupeguru

Licence: gpl-3.0
Find duplicate files

Programming Languages

python
c
NSIS

Projects that are alternatives of or similar to Dupeguru

RocketMQDedupListener
Idempotent, deduplicating RocketMQ message consumer; supports MySQL or Redis as the idempotency table and works out of the box
Stars: ✭ 132 (-94.47%)
Mutual labels:  deduplication
Rdedup
Data deduplication engine, supporting optional compression and public key encryption.
Stars: ✭ 690 (-71.07%)
Mutual labels:  deduplication
Rltk
Record Linkage ToolKit (Find and link entities)
Stars: ✭ 71 (-97.02%)
Mutual labels:  deduplication
lieu
Dedupe/batch geocode addresses and venues around the world with libpostal
Stars: ✭ 73 (-96.94%)
Mutual labels:  deduplication
Recordlinkage
A toolkit for record linkage and duplicate detection in Python
Stars: ✭ 532 (-77.69%)
Mutual labels:  deduplication
Borgmatic
Simple, configuration-driven backup software for servers and workstations
Stars: ✭ 902 (-62.18%)
Mutual labels:  deduplication
gencore
Generate duplex/single consensus reads to reduce sequencing noises and remove duplications
Stars: ✭ 91 (-96.18%)
Mutual labels:  deduplication
Vdo
Userspace tools for managing VDO volumes.
Stars: ✭ 138 (-94.21%)
Mutual labels:  deduplication
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Stars: ✭ 584 (-75.51%)
Mutual labels:  deduplication
Rmlint
Extremely fast tool to remove duplicates and other lint from your filesystem
Stars: ✭ 996 (-58.24%)
Mutual labels:  deduplication
Libpostal
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Stars: ✭ 3,312 (+38.87%)
Mutual labels:  deduplication
Kopia
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
Stars: ✭ 507 (-78.74%)
Mutual labels:  deduplication
Dupandas
📊 python package for performing deduplication using flexible text matching and cleaning in pandas dataframe
Stars: ✭ 20 (-99.16%)
Mutual labels:  deduplication
UMICollapse
Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI). Heavily optimized for scalability and orders of magnitude faster than a previous tool.
Stars: ✭ 31 (-98.7%)
Mutual labels:  deduplication
Fingerprints
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Stars: ✭ 91 (-96.18%)
Mutual labels:  deduplication
record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
Stars: ✭ 67 (-97.19%)
Mutual labels:  deduplication
Jdupes
A powerful duplicate file finder and an enhanced fork of 'fdupes'.
Stars: ✭ 790 (-66.88%)
Mutual labels:  deduplication
Dejavu
Quickly detect already witnessed data.
Stars: ✭ 151 (-93.67%)
Mutual labels:  deduplication
Spark Lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-95.22%)
Mutual labels:  deduplication
Fastcdc Rs
FastCDC implementation in Rust
Stars: ✭ 31 (-98.7%)
Mutual labels:  deduplication

dupeGuru

dupeGuru is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system. It is written mostly in Python 3 and has the peculiarity of using multiple GUI toolkits, all using the same core Python code. On OS X, the UI layer is written in Objective-C and uses Cocoa. On Linux, it is written in Python and uses Qt5.

The Cocoa UI of dupeGuru is hosted in a separate repo: https://github.com/arsenetar/dupeguru-cocoa

Current status

The project is still looking for additional help, especially with regard to:

  • OSX maintenance: reproducing bugs & cocoa version, building package with Cocoa UI.
  • Linux maintenance: reproducing bugs, maintaining PPA repository, Debian package.
  • Translations: updating missing strings, transifex project at https://www.transifex.com/voltaicideas/dupeguru-1
  • Documentation: keeping it up-to-date.

Contents of this folder

This folder contains the source for dupeGuru. Its documentation is in help, but is also available online in its built form. Here's how this source tree is organized:

  • core: Contains the core logic code for dupeGuru. It's Python code.
  • qt: UI code for the Qt toolkit. It's written in Python and uses PyQt.
  • images: Images used by the different UI codebases.
  • pkg: Skeleton files required to create the different packages.
  • help: Help document, written for Sphinx.
  • locale: .po files for localization.
  • hscommon: A collection of helpers used across HS applications.
  • qtlib: A collection of helpers used across Qt UI codebases of HS applications.

How to build dupeGuru from source

Windows & macOS specific additional instructions

For Windows instructions, see the Windows Instructions.

For macOS instructions (Qt version), see the macOS Instructions.

Prerequisites

System Setup

When building in a Linux-based environment, the following system packages (or their equivalents) are needed:

  • python3-pyqt5
  • pyqt5-dev-tools (on some systems, see note)
  • python3-wheel (for hsaudiotag3k)
  • python3-venv (only if using a virtual environment)
  • python3-dev
  • build-essential

Note: On some Linux systems, pyrcc5 is not put on the path when python3-pyqt5 is installed, which causes problems with the resource files (and icons). These systems should have a corresponding pyqt5-dev-tools package, which should also be installed. The presence of pyrcc5 can be checked with which pyrcc5. Debian-based systems need the extra package; Arch does not.
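
The same check can be done from Python, which may be handy inside a setup script. This is only a minimal sketch using the standard library; the function name find_tool is a hypothetical helper and is not part of dupeGuru.

```python
import shutil


def find_tool(name="pyrcc5"):
    """Return the full path of an executable found on PATH, or None.

    Hypothetical helper for illustration; equivalent in spirit to `which`.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path:
        print(f"pyrcc5 found at {path}")
    else:
        # On Debian-based systems this usually means pyqt5-dev-tools is missing.
        print("pyrcc5 not found on PATH; install pyqt5-dev-tools or its equivalent")
```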

To create packages the following are also needed:

  • python3-setuptools
  • debhelper

Building with Make

dupeGuru comes with a makefile that can be used to build and run:

$ make && make run

Building without Make

$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt
$ python build.py
$ python run.py

Generating Debian/Ubuntu package

To generate packages, the extra requirements in requirements-extra.txt must be installed. The steps are as follows:

$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt -r requirements-extra.txt
$ python build.py --clean
$ python package.py

This can be made a one-liner (once in the directory) as:

$ bash -c "python3 -m venv --system-site-packages env && source env/bin/activate && pip install -r requirements.txt -r requirements-extra.txt && python build.py --clean && python package.py"
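
As an alternative to the shell one-liner, the same sequence could be driven from a small Python script that stops at the first failing step. This is a rough sketch, not something shipped with dupeGuru: the names packaging_steps and run_steps are hypothetical, and it assumes a POSIX virtualenv layout (env/bin/python). Calling the virtualenv's interpreter directly stands in for source env/bin/activate.

```python
import subprocess
import sys
from pathlib import Path


def packaging_steps(env_dir="env"):
    """Return the packaging commands from this README as argument lists."""
    # Assumption: POSIX venv layout, where the interpreter lives in <env>/bin.
    py = str(Path(env_dir) / "bin" / "python")
    return [
        ["python3", "-m", "venv", "--system-site-packages", env_dir],
        [py, "-m", "pip", "install",
         "-r", "requirements.txt", "-r", "requirements-extra.txt"],
        [py, "build.py", "--clean"],
        [py, "package.py"],
    ]


def run_steps(steps, cwd="."):
    """Run each command in order; check=True raises on the first failure."""
    for cmd in steps:
        print("$", " ".join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Optional first argument: path to the dupeGuru directory.
    run_steps(packaging_steps(), cwd=sys.argv[1] if len(sys.argv) > 1 else ".")
```

Because check=True raises CalledProcessError, a failed pip install prevents build.py and package.py from running, mirroring the && chaining in the one-liner.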

Running tests

The complete test suite is run with Tox 1.7+. If you have it installed system-wide, you don't even need to set up a virtualenv. Just cd into the root project folder and run tox.

If you don't have Tox system-wide, install it in your virtualenv with pip install tox and then run tox.

You can also run the automated tests without Tox. The extra requirements for running tests are in requirements-extra.txt, so you can run pip install -r requirements-extra.txt inside your virtualenv and then run py.test core hscommon
