64 open source projects that are alternatives to or similar to nomenklatura

Jdupes
A powerful duplicate file finder and an enhanced fork of 'fdupes'.
Stars: ✭ 790 (+400%)
Mutual labels:  deduplication
UMICollapse
Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI). Heavily optimized for scalability and orders of magnitude faster than a previous tool.
Stars: ✭ 31 (-80.38%)
Mutual labels:  deduplication
cosmosR
COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets.
Stars: ✭ 30 (-81.01%)
Mutual labels:  data-integration
Rmlint
Extremely fast tool to remove duplicates and other lint from your filesystem
Stars: ✭ 996 (+530.38%)
Mutual labels:  deduplication
IntraArchiveDeduplicator
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Stars: ✭ 87 (-44.94%)
Mutual labels:  deduplication
Mapeathor
Translator of spreadsheet mappings into R2RML, RML or YARRRML
Stars: ✭ 27 (-82.91%)
Mutual labels:  data-integration
Kopia
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
Stars: ✭ 507 (+220.89%)
Mutual labels:  deduplication
SDM-RDFizer
An Efficient RML-Compliant Engine for Knowledge Graph Construction
Stars: ✭ 68 (-56.96%)
Mutual labels:  data-integration
entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Stars: ✭ 96 (-39.24%)
Mutual labels:  deduplication
Restic
Fast, secure, efficient backup program
Stars: ✭ 15,105 (+9460.13%)
Mutual labels:  deduplication
Fingerprints
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Stars: ✭ 91 (-42.41%)
Mutual labels:  deduplication
zpaqfranz
Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
Stars: ✭ 86 (-45.57%)
Mutual labels:  deduplication
doctoral-thesis
📖 Generation and Applications of Knowledge Graphs in Systems and Networks Biology
Stars: ✭ 26 (-83.54%)
Mutual labels:  data-integration
Dupandas
📊 python package for performing deduplication using flexible text matching and cleaning in pandas dataframe
Stars: ✭ 20 (-87.34%)
Mutual labels:  deduplication
thymeflow
Installer for Thymeflow, a personal knowledge management system.
Stars: ✭ 27 (-82.91%)
Mutual labels:  data-integration
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Stars: ✭ 584 (+269.62%)
Mutual labels:  deduplication
data-product-streaming
Template to deploy a Data Product for data stream processing into a Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
Stars: ✭ 32 (-79.75%)
Mutual labels:  data-integration
Libpostal
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Stars: ✭ 3,312 (+1996.2%)
Mutual labels:  deduplication
Awesome Single Cell
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
Stars: ✭ 1,937 (+1125.95%)
Mutual labels:  data-integration
record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
Stars: ✭ 67 (-57.59%)
Mutual labels:  deduplication
Data Matching Software
A list of free data matching and record linkage software.
Stars: ✭ 206 (+30.38%)
Mutual labels:  deduplication
dduper
Fast block-level out-of-band BTRFS deduplication tool.
Stars: ✭ 108 (-31.65%)
Mutual labels:  deduplication
assignPOP
Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.
Stars: ✭ 16 (-89.87%)
Mutual labels:  data-integration
cargo-limit
Cargo with less noise: warnings are skipped until errors are fixed, Neovim integration, etc.
Stars: ✭ 105 (-33.54%)
Mutual labels:  deduplication
Dupeguru
Find duplicate files
Stars: ✭ 2,385 (+1409.49%)
Mutual labels:  deduplication
Spark Lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-27.85%)
Mutual labels:  deduplication
mail-deduplicate
📧 CLI to deduplicate mails from mail boxes.
Stars: ✭ 134 (-15.19%)
Mutual labels:  deduplication
CogStack-NiFi
Building data processing pipelines for documents processing with NLP using Apache NiFi and related services
Stars: ✭ 22 (-86.08%)
Mutual labels:  data-integration
Rltk
Record Linkage ToolKit (Find and link entities)
Stars: ✭ 71 (-55.06%)
Mutual labels:  deduplication
data-product-batch
Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
Stars: ✭ 27 (-82.91%)
Mutual labels:  data-integration
Fastcdc Rs
FastCDC implementation in Rust
Stars: ✭ 31 (-80.38%)
Mutual labels:  deduplication
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (-51.27%)
Mutual labels:  data-integration
Borgmatic
Simple, configuration-driven backup software for servers and workstations
Stars: ✭ 902 (+470.89%)
Mutual labels:  deduplication
Hudi
Upserts, Deletes And Incremental Processing on Big Data.
Stars: ✭ 2,586 (+1536.71%)
Mutual labels:  data-integration
Rdedup
Data deduplication engine, supporting optional compression and public key encryption.
Stars: ✭ 690 (+336.71%)
Mutual labels:  deduplication
OpenOmics
A bioinformatics API and web-app to integrate multi-omics datasets & interface with public databases.
Stars: ✭ 22 (-86.08%)
Mutual labels:  data-integration
Recordlinkage
A toolkit for record linkage and duplicate detection in Python
Stars: ✭ 532 (+236.71%)
Mutual labels:  deduplication
kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+200%)
Mutual labels:  data-integration
Alertmanager
Prometheus Alertmanager
Stars: ✭ 4,574 (+2794.94%)
Mutual labels:  deduplication
winter
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Stars: ✭ 101 (-36.08%)
Mutual labels:  data-integration
lieu
Dedupe/batch geocode addresses and venues around the world with libpostal
Stars: ✭ 73 (-53.8%)
Mutual labels:  deduplication
Rudder Server
Privacy and Security focused Segment-alternative, in Golang and React
Stars: ✭ 2,874 (+1718.99%)
Mutual labels:  data-integration
RocketMQDedupListener
Idempotent, deduplicating message consumer for RocketMQ; supports using MySQL or Redis as the idempotency table and works out of the box
Stars: ✭ 132 (-16.46%)
Mutual labels:  deduplication
CommonCoreOntologies
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
Stars: ✭ 109 (-31.01%)
Mutual labels:  data-integration
gencore
Generate duplex/single consensus reads to reduce sequencing noises and remove duplications
Stars: ✭ 91 (-42.41%)
Mutual labels:  deduplication
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-89.87%)
Mutual labels:  data-integration
splink
Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+14.56%)
Mutual labels:  deduplication
Lsh
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Stars: ✭ 182 (+15.19%)
Mutual labels:  deduplication
acid-store
A library for secure, deduplicated, transactional, and verifiable data storage
Stars: ✭ 48 (-69.62%)
Mutual labels:  deduplication
Mara Pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (+1065.19%)
Mutual labels:  data-integration
yadf
Yet Another Dupes Finder
Stars: ✭ 32 (-79.75%)
Mutual labels:  deduplication
Kvdo
A pair of kernel modules which provide pools of deduplicated and/or compressed block storage.
Stars: ✭ 168 (+6.33%)
Mutual labels:  deduplication
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Stars: ✭ 655 (+314.56%)
Mutual labels:  deduplication
SchemaMapper
A .NET class library that allows you to import data from different sources into a unified destination
Stars: ✭ 41 (-74.05%)
Mutual labels:  data-integration
Dejavu
Quickly detect already witnessed data.
Stars: ✭ 151 (-4.43%)
Mutual labels:  deduplication
R-Learning-Journey
Some of the projects I made when starting to learn R for Data Science at the university
Stars: ✭ 19 (-87.97%)
Mutual labels:  data-integration
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+3013.29%)
Mutual labels:  data-integration
scarches
Reference mapping for single-cell genomics
Stars: ✭ 175 (+10.76%)
Mutual labels:  data-integration
bio2bel
A Python framework for integrating biological databases and structured data sources in Biological Expression Language (BEL)
Stars: ✭ 16 (-89.87%)
Mutual labels:  data-integration
Vdo
Userspace tools for managing VDO volumes.
Stars: ✭ 138 (-12.66%)
Mutual labels:  deduplication