
mighty-justice / django-super-deduper

License: MIT
Utilities for de-duping Django model instances

Programming Languages

Python
Dockerfile

Projects that are alternatives to, or similar to, django-super-deduper

mail-deduplicate
📧 CLI to deduplicate mails from mail boxes.
Stars: ✭ 134 (+396.3%)
Mutual labels:  dedupe
duplex
Duplicate code finder for Elixir
Stars: ✭ 20 (-25.93%)
Mutual labels:  dedupe
Restic
Fast, secure, efficient backup program
Stars: ✭ 15,105 (+55844.44%)
Mutual labels:  dedupe
Dedupe
🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Stars: ✭ 3,241 (+11903.7%)
Mutual labels:  dedupe
dduper
Fast block-level out-of-band BTRFS deduplication tool.
Stars: ✭ 108 (+300%)
Mutual labels:  dedupe
dupe-krill
A fast file deduplicator
Stars: ✭ 147 (+444.44%)
Mutual labels:  dedupe
yadf
Yet Another Dupes Finder
Stars: ✭ 32 (+18.52%)
Mutual labels:  dedupe
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Stars: ✭ 655 (+2325.93%)
Mutual labels:  dedupe

Django Super Deduper


A collection of classes and utilities to aid in de-duping Django model instances.

Requirements

  • Python 3.6
  • Django 1.11

Install

pip install django-super-deduper

Usage

Merging Duplicate Instances

By default, any empty field on the primary object takes its value from the duplicates. Additionally, all related one-to-one, one-to-many, and many-to-many objects are updated to reference the primary object.

> from django_super_deduper.merge import MergedModelInstance
> primary_object = Model.objects.create(attr_A=None, attr_B='')
> alias_object_1 = Model.objects.create(attr_A='X')
> alias_object_2 = Model.objects.create(attr_B='Y')
> merged_object = MergedModelInstance.create(primary_object, [alias_object_1, alias_object_2])
> merged_object.attr_A
'X'
> merged_object.attr_B
'Y'
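The default behavior described above can be illustrated with a minimal pure-Python sketch that uses plain dicts in place of model instances. It is not the library's implementation: the helper names fill_empty_fields and repoint_related, the choice of None and '' as the "empty" values, and the customer_id foreign key are all hypothetical, and only the one-to-many (foreign key) case of updating related objects is shown.

```python
# Hypothetical sketch of the default merge behavior, using plain dicts in
# place of Django model instances. This is NOT django-super-deduper's code.
from typing import Any, Dict, List

EMPTY_VALUES = (None, '')  # assumption: the values treated as "empty"


def fill_empty_fields(primary: Dict[str, Any],
                      duplicates: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of primary in which every empty field takes the first
    non-empty value found among the duplicates, checked in order."""
    merged = dict(primary)
    for field, value in primary.items():
        if value not in EMPTY_VALUES:
            continue  # non-empty values on the primary are never overwritten
        for duplicate in duplicates:
            candidate = duplicate.get(field)
            if candidate not in EMPTY_VALUES:
                merged[field] = candidate
                break
    return merged


def repoint_related(related_rows: List[Dict[str, Any]], fk_field: str,
                    duplicate_ids: List[Any], primary_id: Any) -> None:
    """Make rows that reference a duplicate reference the primary instead.
    Mutates related_rows in place (the one-to-many / foreign key case)."""
    for row in related_rows:
        if row[fk_field] in duplicate_ids:
            row[fk_field] = primary_id


# Usage mirroring the MergedModelInstance example (hypothetical data).
primary = {'id': 1, 'attr_A': None, 'attr_B': ''}
aliases = [
    {'id': 2, 'attr_A': 'X', 'attr_B': ''},
    {'id': 3, 'attr_A': None, 'attr_B': 'Y'},
]
merged = fill_empty_fields(primary, aliases)

orders = [
    {'id': 10, 'customer_id': 2},
    {'id': 11, 'customer_id': 3},
    {'id': 12, 'customer_id': 1},
]
repoint_related(orders, 'customer_id', [2, 3], primary['id'])
print(merged)  # {'id': 1, 'attr_A': 'X', 'attr_B': 'Y'}
print([row['customer_id'] for row in orders])  # [1, 1, 1]
```

In this sketch the first non-empty duplicate wins, so the order of the duplicates list matters; the README does not say how the library itself resolves two duplicates that both supply a value.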

Improvements

  • Support multiple merging strategies
  • Recursive merging of related one-to-one objects

Logging

This package includes some rudimentary logging for debugging. To enable it, add a django_super_deduper logger to the LOGGING setting in your Django settings:

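LOGGING = {
    'version': 1,  # required by logging.config.dictConfig
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django_super_deduper': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}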
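Django applies the LOGGING setting with the standard library's logging.config.dictConfig (its default LOGGING_CONFIG), so a configuration like this can be checked outside a Django project. The sketch below is a hedged illustration: it swaps the console handler for a hypothetical in-memory CaptureHandler, and it assumes the package logs through child loggers such as django_super_deduper.merge, a name chosen here for illustration only. Because the level is set on the parent logger, child loggers inherit DEBUG and propagate their records to its handlers.

```python
# Sketch: verify a LOGGING dict with the stdlib, outside Django.
import logging
import logging.config

captured = []  # messages that reached the handler


class CaptureHandler(logging.Handler):
    """Hypothetical in-memory handler standing in for 'console'."""

    def emit(self, record):
        captured.append(record.getMessage())


LOGGING = {
    'version': 1,  # dictConfig rejects a dict without a version
    'disable_existing_loggers': False,
    'handlers': {
        # '()' lets dictConfig build the handler from a callable
        'console': {'()': CaptureHandler},
    },
    'loggers': {
        'django_super_deduper': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}

logging.config.dictConfig(LOGGING)

# Hypothetical child logger name: inherits DEBUG and propagates upward.
logging.getLogger('django_super_deduper.merge').debug('merging %d duplicates', 2)

# Unrelated loggers are unaffected and never reach CaptureHandler.
logging.getLogger('some_other_package').debug('not captured')

print(captured)  # ['merging 2 duplicates']
```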


Releasing

Prerequisites:

pip install pypandoc twine
brew install pandoc
  1. Draft a new release and create a new tag on GitHub
  2. Run python3 setup.py sdist bdist_wheel on the master branch
  3. Upload to PyPI: python -m twine upload dist/*
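Steps 2 and 3 can be scripted. The following is a minimal sketch, not part of the project: the helper names build_command, upload_command, and main are hypothetical, and the script assumes it is run from the repository root with master checked out and the tag from step 1 already created. It expands the equivalent of dist/* in Python only after the build has produced files, just as the shell does when the commands are typed by hand, and it uses sys.executable so both steps run under the same interpreter.

```python
# Hypothetical release helper for steps 2 and 3; not shipped with the package.
import glob
import os
import subprocess
import sys
from typing import List


def build_command() -> List[str]:
    """Step 2: build the source distribution and the wheel."""
    return [sys.executable, 'setup.py', 'sdist', 'bdist_wheel']


def upload_command(dist_dir: str = 'dist') -> List[str]:
    """Step 3: upload every file in dist_dir with twine.

    Expands the equivalent of dist/* here, so call it after the build."""
    files = sorted(glob.glob(os.path.join(dist_dir, '*')))
    if not files:
        raise FileNotFoundError(f'no distribution files in {dist_dir!r}')
    return [sys.executable, '-m', 'twine', 'upload', *files]


def main() -> None:
    """Build, then upload; check=True stops at the first failing step."""
    subprocess.run(build_command(), check=True)
    subprocess.run(upload_command(), check=True)
```

Calling main() runs the real build and upload, so twine still needs PyPI credentials configured as usual.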