48 open source projects that are alternatives to or similar to deduplication

Frost
A backup program that does deduplication, compression, and encryption
Stars: ✭ 25 (-57.63%)
Mutual labels:  deduplication
dedupsqlfs
Deduplicating filesystem via Python3, FUSE and SQLite
Stars: ✭ 24 (-59.32%)
Mutual labels:  deduplication
md-svg-vue
Material design icons by Google for Vue.js & Nuxt.js (server side support & inline svg with path)
Stars: ✭ 14 (-76.27%)
Mutual labels:  chunking
NotEnoughAV1Encodes-Qt
Linux GUI for AV1 Encoders
Stars: ✭ 27 (-54.24%)
Mutual labels:  chunking
esa-httpclient
An asynchronous event-driven HTTP client based on netty.
Stars: ✭ 82 (+38.98%)
Mutual labels:  chunking
nomenklatura
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Stars: ✭ 158 (+167.8%)
Mutual labels:  deduplication
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+2894.92%)
Mutual labels:  chunking
GrammarEngine
Grammatical Dictionary of the Russian Language (+ English, Japanese, etc.)
Stars: ✭ 68 (+15.25%)
Mutual labels:  chunking
jmem
Break up huge JSON arrays into manageable sizes.
Stars: ✭ 14 (-76.27%)
Mutual labels:  chunking
sequence labeling tf
Sequence Labeling in Tensorflow
Stars: ✭ 18 (-69.49%)
Mutual labels:  chunking
Data Matching Software
A list of free data matching and record linkage software.
Stars: ✭ 206 (+249.15%)
Mutual labels:  deduplication
Lsh
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Stars: ✭ 182 (+208.47%)
Mutual labels:  deduplication
Restic
Fast, secure, efficient backup program
Stars: ✭ 15,105 (+25501.69%)
Mutual labels:  deduplication
Kvdo
A pair of kernel modules which provide pools of deduplicated and/or compressed block storage.
Stars: ✭ 168 (+184.75%)
Mutual labels:  deduplication
Dupeguru
Find duplicate files
Stars: ✭ 2,385 (+3942.37%)
Mutual labels:  deduplication
Dejavu
Quickly detect already witnessed data.
Stars: ✭ 151 (+155.93%)
Mutual labels:  deduplication
Vdo
Userspace tools for managing VDO volumes.
Stars: ✭ 138 (+133.9%)
Mutual labels:  deduplication
Spark Lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (+93.22%)
Mutual labels:  deduplication
Fingerprints
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Stars: ✭ 91 (+54.24%)
Mutual labels:  deduplication
Rltk
Record Linkage ToolKit (Find and link entities)
Stars: ✭ 71 (+20.34%)
Mutual labels:  deduplication
Rmlint
Extremely fast tool to remove duplicates and other lint from your filesystem
Stars: ✭ 996 (+1588.14%)
Mutual labels:  deduplication
Fastcdc Rs
FastCDC implementation in Rust
Stars: ✭ 31 (-47.46%)
Mutual labels:  deduplication
Dupandas
📊 python package for performing deduplication using flexible text matching and cleaning in pandas dataframe
Stars: ✭ 20 (-66.1%)
Mutual labels:  deduplication
Borgmatic
Simple, configuration-driven backup software for servers and workstations
Stars: ✭ 902 (+1428.81%)
Mutual labels:  deduplication
Jdupes
A powerful duplicate file finder and an enhanced fork of 'fdupes'.
Stars: ✭ 790 (+1238.98%)
Mutual labels:  deduplication
Rdedup
Data deduplication engine, supporting optional compression and public key encryption.
Stars: ✭ 690 (+1069.49%)
Mutual labels:  deduplication
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Stars: ✭ 584 (+889.83%)
Mutual labels:  deduplication
Recordlinkage
A toolkit for record linkage and duplicate detection in Python
Stars: ✭ 532 (+801.69%)
Mutual labels:  deduplication
Kopia
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
Stars: ✭ 507 (+759.32%)
Mutual labels:  deduplication
Alertmanager
Prometheus Alertmanager
Stars: ✭ 4,574 (+7652.54%)
Mutual labels:  deduplication
Libpostal
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Stars: ✭ 3,312 (+5513.56%)
Mutual labels:  deduplication
lieu
Dedupe/batch geocode addresses and venues around the world with libpostal
Stars: ✭ 73 (+23.73%)
Mutual labels:  deduplication
UMICollapse
Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI). Heavily optimized for scalability and orders of magnitude faster than a previous tool.
Stars: ✭ 31 (-47.46%)
Mutual labels:  deduplication
RocketMQDedupListener
Idempotent, deduplicating RocketMQ message consumer; supports using MySQL or Redis as the idempotency table; works out of the box
Stars: ✭ 132 (+123.73%)
Mutual labels:  deduplication
record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
Stars: ✭ 67 (+13.56%)
Mutual labels:  deduplication
gencore
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
Stars: ✭ 91 (+54.24%)
Mutual labels:  deduplication
entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Stars: ✭ 96 (+62.71%)
Mutual labels:  deduplication
splink
Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+206.78%)
Mutual labels:  deduplication
dduper
Fast block-level out-of-band BTRFS deduplication tool.
Stars: ✭ 108 (+83.05%)
Mutual labels:  deduplication
acid-store
A library for secure, deduplicated, transactional, and verifiable data storage
Stars: ✭ 48 (-18.64%)
Mutual labels:  deduplication
IntraArchiveDeduplicator
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Stars: ✭ 87 (+47.46%)
Mutual labels:  deduplication
yadf
Yet Another Dupes Finder
Stars: ✭ 32 (-45.76%)
Mutual labels:  deduplication
cargo-limit
Cargo with less noise: warnings are skipped until errors are fixed, Neovim integration, etc.
Stars: ✭ 105 (+77.97%)
Mutual labels:  deduplication
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Stars: ✭ 655 (+1010.17%)
Mutual labels:  deduplication
zpaqfranz
Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
Stars: ✭ 86 (+45.76%)
Mutual labels:  deduplication
mail-deduplicate
📧 CLI to deduplicate mails from mail boxes.
Stars: ✭ 134 (+127.12%)
Mutual labels:  deduplication
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-69.49%)
Mutual labels:  deduplication
Bartinter
Dynamically changes status bar style depending on content behind it
Stars: ✭ 1,687 (+2759.32%)
Mutual labels:  content-dependent