83 open source projects that are alternatives to or similar to Rltk

Recordlinkage
A toolkit for record linkage and duplicate detection in Python
Stars: ✭ 532 (+649.3%)
Mutual labels:  deduplication, similarity
RocketMQDedupListener
Idempotent, deduplicating RocketMQ message consumer that supports MySQL or Redis as the idempotency table and works out of the box
Stars: ✭ 132 (+85.92%)
Mutual labels:  deduplication
simetric
String similarity metrics for Elixir
Stars: ✭ 59 (-16.9%)
Mutual labels:  similarity
text-similarity-php
Text similarity computed with cosine similarity and word segmentation (PHP version)
Stars: ✭ 95 (+33.8%)
Mutual labels:  similarity
mrivis
medical image visualization library and development toolkit
Stars: ✭ 19 (-73.24%)
Mutual labels:  similarity
Macropodus
Macropodus, a natural language processing toolkit based on an Albert+BiLSTM+CRF deep learning architecture: Chinese word segmentation (CWS), part-of-speech tagging (POS), named entity recognition (NER), new word discovery (Find), keyword extraction (Keyword), text summarization (Summarize), text similarity (Sim), scientific calculator (Calculate), conversion between Chinese numerals and Arabic or Roman numerals (Chi2num), Traditional/Simplified Chinese conversion, and pinyin conversion.
Stars: ✭ 309 (+335.21%)
Mutual labels:  similarity
yadf
Yet Another Dupes Finder
Stars: ✭ 32 (-54.93%)
Mutual labels:  deduplication
Rdedup
Data deduplication engine, supporting optional compression and public key encryption.
Stars: ✭ 690 (+871.83%)
Mutual labels:  deduplication
algorithm coding
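Recommendation algorithms, similarity algorithms, Bloom filters, averaging algorithms, consistent hashing, data structures, and LeetCode exercises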
Stars: ✭ 30 (-57.75%)
Mutual labels:  similarity
Duplicate-Image-Finder
difPy - Python package for finding duplicate or similar images within folders
Stars: ✭ 187 (+163.38%)
Mutual labels:  similarity
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-74.65%)
Mutual labels:  deduplication
acid-store
A library for secure, deduplicated, transactional, and verifiable data storage
Stars: ✭ 48 (-32.39%)
Mutual labels:  deduplication
Final word similarity
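A word similarity method that combines the extended Tongyici Cilin thesaurus with HowNet, giving broader vocabulary coverage and more accurate results.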
Stars: ✭ 485 (+583.1%)
Mutual labels:  similarity
fsimilar
find/file similar
Stars: ✭ 13 (-81.69%)
Mutual labels:  similarity
Jdupes
A powerful duplicate file finder and an enhanced fork of 'fdupes'.
Stars: ✭ 790 (+1012.68%)
Mutual labels:  deduplication
dice-coefficient
Sørensen–Dice coefficient
Stars: ✭ 37 (-47.89%)
Mutual labels:  similarity
lieu
Dedupe/batch geocode addresses and venues around the world with libpostal
Stars: ✭ 73 (+2.82%)
Mutual labels:  deduplication
BertSimilarity
Computes the similarity of two sentences with Google's BERT. Covers sentence similarity, semantic similarity, and text similarity computation.
Stars: ✭ 348 (+390.14%)
Mutual labels:  similarity
Fastcdc Rs
FastCDC implementation in Rust
Stars: ✭ 31 (-56.34%)
Mutual labels:  deduplication
zpaqfranz
Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
Stars: ✭ 86 (+21.13%)
Mutual labels:  deduplication
gencore
Generate duplex/single consensus reads to reduce sequencing noise and remove duplications
Stars: ✭ 91 (+28.17%)
Mutual labels:  deduplication
ReactionDecoder
Reaction Decoder Tool (RDT) - Atom Atom Mapping Tool
Stars: ✭ 59 (-16.9%)
Mutual labels:  similarity
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Stars: ✭ 584 (+722.54%)
Mutual labels:  deduplication
entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Stars: ✭ 96 (+35.21%)
Mutual labels:  deduplication
deduplication
Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in Javascript. Ships with extensive tests, a fuzz test and a benchmark.
Stars: ✭ 59 (-16.9%)
Mutual labels:  deduplication
dedupsqlfs
Deduplicating filesystem via Python3, FUSE and SQLite
Stars: ✭ 24 (-66.2%)
Mutual labels:  deduplication
dduper
Fast block-level out-of-band BTRFS deduplication tool.
Stars: ✭ 108 (+52.11%)
Mutual labels:  deduplication
Kopia
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
Stars: ✭ 507 (+614.08%)
Mutual labels:  deduplication
aurora
Malware similarity platform with modularity in mind.
Stars: ✭ 70 (-1.41%)
Mutual labels:  similarity
Borgmatic
Simple, configuration-driven backup software for servers and workstations
Stars: ✭ 902 (+1170.42%)
Mutual labels:  deduplication
apollo
Advanced similarity and duplicate source code proof of concept for our research efforts.
Stars: ✭ 49 (-30.99%)
Mutual labels:  similarity
Alertmanager
Prometheus Alertmanager
Stars: ✭ 4,574 (+6342.25%)
Mutual labels:  deduplication
IntraArchiveDeduplicator
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Stars: ✭ 87 (+22.54%)
Mutual labels:  deduplication
Ml Classify Text Js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
Stars: ✭ 38 (-46.48%)
Mutual labels:  similarity
geocoding
Geocoding technology that provides address standardization and similarity computation.
Stars: ✭ 148 (+108.45%)
Mutual labels:  similarity
Libpostal
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Stars: ✭ 3,312 (+4564.79%)
Mutual labels:  deduplication
mongodb-chemistry
Ideas for chemical similarity searches in MongoDB.
Stars: ✭ 23 (-67.61%)
Mutual labels:  similarity
Similarity
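similarity: a similarity computation toolkit written in Java, for similarity calculations involving words, phrases, sentences, lexical analysis, sentiment analysis, semantic analysis, and related tasks.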
Stars: ✭ 760 (+970.42%)
Mutual labels:  similarity
cargo-limit
Cargo with less noise: warnings are skipped until errors are fixed, Neovim integration, etc.
Stars: ✭ 105 (+47.89%)
Mutual labels:  deduplication
UMICollapse
Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI). Heavily optimized for scalability and orders of magnitude faster than a previous tool.
Stars: ✭ 31 (-56.34%)
Mutual labels:  deduplication
nxontology
NetworkX-based Python library for representing ontologies
Stars: ✭ 45 (-36.62%)
Mutual labels:  similarity
Computervision Recipes
Best Practices, code samples, and documentation for Computer Vision.
Stars: ✭ 8,214 (+11469.01%)
Mutual labels:  similarity
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Stars: ✭ 655 (+822.54%)
Mutual labels:  deduplication
record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
Stars: ✭ 67 (-5.63%)
Mutual labels:  deduplication
NDD
Drug-Drug Interaction Predicting by Neural Network Using Integrated Similarity
Stars: ✭ 25 (-64.79%)
Mutual labels:  similarity
Dssim
Image similarity comparison simulating human perception (multiscale SSIM in Rust)
Stars: ✭ 668 (+840.85%)
Mutual labels:  similarity
vektonn
vektonn.github.io/vektonn
Stars: ✭ 109 (+53.52%)
Mutual labels:  similarity
goodreads-toolbox
9 tools for Goodreads.com, for finding people based on the books they’ve read, finding books popular among the people you follow, following new book reviews, etc
Stars: ✭ 56 (-21.13%)
Mutual labels:  similarity
mail-deduplicate
📧 CLI to deduplicate mails from mail boxes.
Stars: ✭ 134 (+88.73%)
Mutual labels:  deduplication
Node Damerau Levenshtein
Damerau-Levenshtein distance function for Node
Stars: ✭ 27 (-61.97%)
Mutual labels:  similarity
NSL
Implementation of "Neural Similarity Learning" (NeurIPS'19).
Stars: ✭ 33 (-53.52%)
Mutual labels:  similarity
semantic-document-relations
Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles"
Stars: ✭ 21 (-70.42%)
Mutual labels:  similarity
Frost
A backup program that does deduplication, compression, encryption
Stars: ✭ 25 (-64.79%)
Mutual labels:  deduplication
Python String Similarity
A library implementing different string similarity and distance measures using Python.
Stars: ✭ 546 (+669.01%)
Mutual labels:  similarity
Consimilo
A Clojure library for querying large data-sets on similarity
Stars: ✭ 54 (-23.94%)
Mutual labels:  similarity
Rmlint
Extremely fast tool to remove duplicates and other lint from your filesystem
Stars: ✭ 996 (+1302.82%)
Mutual labels:  deduplication
Dupandas
📊 python package for performing deduplication using flexible text matching and cleaning in pandas dataframe
Stars: ✭ 20 (-71.83%)
Mutual labels:  deduplication
ruimtehol
R package to Embed All the Things! using StarSpace
Stars: ✭ 95 (+33.8%)
Mutual labels:  similarity
TwinBert
PyTorch implementation of the TwinBert paper
Stars: ✭ 36 (-49.3%)
Mutual labels:  similarity
splink
Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+154.93%)
Mutual labels:  deduplication
1-60 of 83 similar projects