64 open-source projects that are alternatives to, or similar to, nomenklatura

A powerful duplicate file finder and an enhanced fork of 'fdupes'.

Stars: ✭ 790 (+400%)

Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI). Heavily optimized for scalability and orders of magnitude faster than a previous tool.

Stars: ✭ 31 (-80.38%)

Mutual labels: deduplication

cosmosR

COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets.

Stars: ✭ 30 (-81.01%)

Mutual labels: data-integration

Rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem

Stars: ✭ 996 (+530.38%)

Mutual labels: deduplication

IntraArchiveDeduplicator

Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.

Stars: ✭ 87 (-44.94%)

Mutual labels: deduplication

Mapeathor

Translator of spreadsheet mappings into R2RML, RML or YARRRML

Stars: ✭ 27 (-82.91%)

Mutual labels: data-integration

Kopia

Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.

Stars: ✭ 507 (+220.89%)

Mutual labels: deduplication

SDM-RDFizer

An Efficient RML-Compliant Engine for Knowledge Graph Construction

Stars: ✭ 68 (-56.96%)

Mutual labels: data-integration

entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

Stars: ✭ 96 (-39.24%)

Mutual labels: deduplication

Restic

Fast, secure, efficient backup program

Stars: ✭ 15,105 (+9460.13%)

Mutual labels: deduplication

Fingerprints

Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.

Stars: ✭ 91 (-42.41%)

Mutual labels: deduplication

zpaqfranz

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix

Stars: ✭ 86 (-45.57%)

Mutual labels: deduplication

doctoral-thesis

📖 Generation and Applications of Knowledge Graphs in Systems and Networks Biology

Stars: ✭ 26 (-83.54%)

Mutual labels: data-integration

Dupandas

📊 python package for performing deduplication using flexible text matching and cleaning in pandas dataframe

Stars: ✭ 20 (-87.34%)

Mutual labels: deduplication

thymeflow

Installer for Thymeflow, a personal knowledge management system.

Stars: ✭ 27 (-82.91%)

Mutual labels: data-integration

Talisman

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

Stars: ✭ 584 (+269.62%)

Mutual labels: deduplication

data-product-streaming

Template to deploy a Data Product for data stream processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.

Stars: ✭ 32 (-79.75%)

Mutual labels: data-integration

Libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

Stars: ✭ 3,312 (+1996.2%)

Mutual labels: deduplication

Awesome Single Cell

Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.

Stars: ✭ 1,937 (+1125.95%)

Mutual labels: data-integration

record-linkage-resources

Resources for tackling record linkage / deduplication / data matching problems

Stars: ✭ 67 (-57.59%)

Mutual labels: deduplication

Data Matching Software

A list of free data matching and record linkage software.

Stars: ✭ 206 (+30.38%)

Mutual labels: deduplication

dduper

Fast block-level out-of-band BTRFS deduplication tool.

Stars: ✭ 108 (-31.65%)

Mutual labels: deduplication

assignPOP

Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.

Stars: ✭ 16 (-89.87%)

Mutual labels: data-integration

cargo-limit

Cargo with less noise: warnings are skipped until errors are fixed, Neovim integration, etc.

Stars: ✭ 105 (-33.54%)

Mutual labels: deduplication

Dupeguru

Find duplicate files

Stars: ✭ 2,385 (+1409.49%)

Mutual labels: deduplication

Spark Lucenerdd

Spark RDD with Lucene's query and entity linkage capabilities

Stars: ✭ 114 (-27.85%)

Mutual labels: deduplication

mail-deduplicate

📧 CLI to deduplicate mails from mailboxes.

Stars: ✭ 134 (-15.19%)

Mutual labels: deduplication

CogStack-NiFi

Building data processing pipelines for documents processing with NLP using Apache NiFi and related services

Stars: ✭ 22 (-86.08%)

Mutual labels: data-integration

Rltk

Record Linkage ToolKit (Find and link entities)

Stars: ✭ 71 (-55.06%)

Mutual labels: deduplication

data-product-batch

Template to deploy a Data Product for batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.

Stars: ✭ 27 (-82.91%)

Mutual labels: data-integration

Fastcdc Rs

FastCDC implementation in Rust

Stars: ✭ 31 (-80.38%)

Mutual labels: deduplication

morph-kgc

Powerful RDF Knowledge Graph Generation with [R2]RML Mappings

Stars: ✭ 77 (-51.27%)

Mutual labels: data-integration

Borgmatic

Simple, configuration-driven backup software for servers and workstations

Stars: ✭ 902 (+470.89%)

Mutual labels: deduplication

Hudi

Upserts, Deletes And Incremental Processing on Big Data.

Stars: ✭ 2,586 (+1536.71%)

Mutual labels: data-integration

Rdedup

Data deduplication engine, supporting optional compression and public key encryption.

Stars: ✭ 690 (+336.71%)

Mutual labels: deduplication

OpenOmics

A bioinformatics API and web-app to integrate multi-omics datasets & interface with public databases.

Stars: ✭ 22 (-86.08%)

Mutual labels: data-integration

Recordlinkage

A toolkit for record linkage and duplicate detection in Python

Stars: ✭ 532 (+236.71%)

Mutual labels: deduplication

kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…

Stars: ✭ 474 (+200%)

Mutual labels: data-integration

Alertmanager

Prometheus Alertmanager

Stars: ✭ 4,574 (+2794.94%)

Mutual labels: deduplication

winter

WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.

Stars: ✭ 101 (-36.08%)

Mutual labels: data-integration

lieu

Dedupe/batch geocode addresses and venues around the world with libpostal

Stars: ✭ 73 (-53.8%)

Mutual labels: deduplication

Rudder Server

Privacy and Security focused Segment-alternative, in Golang and React

Stars: ✭ 2,874 (+1718.99%)

Mutual labels: data-integration

RocketMQDedupListener

Idempotent, deduplicating message consumer for RocketMQ. Supports MySQL or Redis as the idempotency table and works out of the box.

Stars: ✭ 132 (-16.46%)

Mutual labels: deduplication

CommonCoreOntologies

The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.

Stars: ✭ 109 (-31.01%)

Mutual labels: data-integration

gencore

Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates

Stars: ✭ 91 (-42.41%)

Mutual labels: deduplication

DataBridge.NET

Configurable data bridge for permanent ETL jobs

Stars: ✭ 16 (-89.87%)

Mutual labels: data-integration

splink

Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters

Stars: ✭ 181 (+14.56%)

Mutual labels: deduplication

Lsh

Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents

Stars: ✭ 182 (+15.19%)

Mutual labels: deduplication

acid-store

A library for secure, deduplicated, transactional, and verifiable data storage

Stars: ✭ 48 (-69.62%)

Mutual labels: deduplication

Mara Pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

Stars: ✭ 1,841 (+1065.19%)

Mutual labels: data-integration

yadf

Yet Another Dupes Finder

Stars: ✭ 32 (-79.75%)

Mutual labels: deduplication

Kvdo

A pair of kernel modules which provide pools of deduplicated and/or compressed block storage.

Stars: ✭ 168 (+6.33%)

Mutual labels: deduplication

zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Stars: ✭ 655 (+314.56%)

Mutual labels: deduplication

SchemaMapper

A .NET class library that allows you to import data from different sources into a unified destination

Stars: ✭ 41 (-74.05%)

Mutual labels: data-integration

Dejavu

Quickly detect already witnessed data.

Stars: ✭ 151 (-4.43%)

Mutual labels: deduplication

R-Learning-Journey

Some of the projects I made when starting to learn R for Data Science at university

Stars: ✭ 19 (-87.97%)

Mutual labels: data-integration

Airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Stars: ✭ 4,919 (+3013.29%)

Mutual labels: data-integration

scarches

Reference mapping for single-cell genomics

Stars: ✭ 175 (+10.76%)

Mutual labels: data-integration

bio2bel

A Python framework for integrating biological databases and structured data sources in Biological Expression Language (BEL)

Stars: ✭ 16 (-89.87%)

Mutual labels: data-integration

Vdo

Userspace tools for managing VDO volumes.

Stars: ✭ 138 (-12.66%)

Mutual labels: deduplication

1-60 of 64 similar projects