eddie
No description or website provided.
Stars: ✭ 18 (-41.94%)
yadf
Yet Another Dupes Finder
Stars: ✭ 32 (+3.23%)
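yadf's own algorithm is not described in this listing. The following is a rough Python sketch of how a duplicate finder can work. It assumes the common two-pass approach: group files by size, then hash only the files whose sizes collide. `find_dupes` and `file_digest` are hypothetical names, not yadf's API.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Hash a file's contents in chunks so large files are never loaded whole."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_dupes(root):
    """Return sorted groups of paths under `root` whose contents hash identically."""
    # Pass 1: group regular files by size; a file with a unique size has no duplicate.
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

This sketch treats equal hashes as equal content. A more cautious tool could byte-compare each group before reporting it.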
stringbench
String matching algorithm benchmark
Stars: ✭ 31 (+0%)
Daily-Coding-DS-ALGO-Practice
An open-source project for bringing all interview and competitive programming questions together in one repository
Stars: ✭ 255 (+722.58%)
strsim
String similarity based on Dice's coefficient in Go
Stars: ✭ 39 (+25.81%)
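strsim's exact tokenization is not stated here. The following is a minimal Python sketch of Dice's coefficient. It assumes character bigrams counted as a multiset, which is a common choice. `dice_coefficient` is a hypothetical name, not strsim's Go API.

```python
from collections import Counter


def bigrams(s):
    """Multiset of adjacent character pairs, e.g. 'night' -> ni, ig, gh, ht."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_coefficient(a, b):
    """Sørensen-Dice similarity 2|X ∩ Y| / (|X| + |Y|) over character bigrams."""
    if a == b:
        return 1.0
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # The strings differ and neither has a bigram (both shorter than 2 chars).
        return 0.0
    overlap = sum((x & y).values())  # Counter '&' keeps the minimum count per bigram
    return 2 * overlap / total
```

For example, "night" and "nacht" share only the bigram "ht" out of 4 + 4 bigrams, so `dice_coefficient("night", "nacht")` is 2 * 1 / 8 = 0.25.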
fastq utils
Validation and manipulation of FASTQ files, scRNA-seq barcode pre-processing and UMI quantification.
Stars: ✭ 25 (-19.35%)
stringdistance
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Stars: ✭ 60 (+93.55%)
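stringdistance is a Scala/Java library with its own API. The following standalone Python sketch shows what two of its listed metrics compute: Jaro and Jaro-Winkler similarity. It uses the conventional defaults of prefix scale p = 0.1 and a prefix capped at 4 characters. The function names are hypothetical.

```python
def jaro(s1, s2):
    """Jaro similarity from matching characters within a window and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters match if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both matched sequences in order; mismatched positions are half-transpositions.
    k = half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Boost the Jaro score for strings sharing a common prefix (up to max_prefix chars)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

For "MARTHA" vs "MARHTA", all six characters match and there is one transposition, so Jaro = 17/18 ≈ 0.944. The shared three-letter prefix then lifts Jaro-Winkler to about 0.961.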
Pairfq
Sync paired-end FASTA/Q files and keep singleton reads
Stars: ✭ 18 (-41.94%)
multi string replace
A fast multiple string replace library for Ruby. Uses a C implementation of the Aho–Corasick algorithm based on https://github.com/morenice/ahocorasick, adding support for on-the-fly multiple string replacement. A faster alternative to String.gsub for non-regex (exact match) use cases.
Stars: ✭ 16 (-48.39%)
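The gem wraps a C Aho–Corasick implementation, and its exact handling of overlapping keys is not stated here. The Python sketch below builds the automaton (a trie plus failure links) and finds every occurrence in one pass. As an assumed policy, it then replaces leftmost-longest non-overlapping matches. `AhoCorasick` and `multi_replace` are hypothetical names, not the gem's Ruby API.

```python
from collections import deque


class AhoCorasick:
    """Aho-Corasick automaton for finding many exact patterns in one scan."""

    def __init__(self, patterns):
        self.goto = [{}]  # per-node transitions: char -> child node
        self.fail = [0]   # failure link: longest proper suffix that is also a trie path
        self.out = [[]]   # lengths of the patterns that end at this node
        for pat in patterns:
            if not pat:
                continue  # an empty pattern would match everywhere; skip it
            node = 0
            for ch in pat:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].append(len(pat))
        # Breadth-first pass: shallower nodes get their failure links first.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Patterns ending at the failure target also end here.
                self.out[child] += self.out[self.fail[child]]
                queue.append(child)

    def find_all(self, text):
        """Yield (start, end) for every occurrence, including overlapping ones."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for length in self.out[node]:
                yield i - length + 1, i + 1


def multi_replace(text, replacements):
    """Replace keys of `replacements` using leftmost-longest, non-overlapping matches."""
    matcher = AhoCorasick(replacements.keys())
    # Earliest start first; at equal starts, the longest match first.
    matches = sorted(set(matcher.find_all(text)), key=lambda m: (m[0], -m[1]))
    pieces, pos = [], 0
    for start, end in matches:
        if start < pos:
            continue  # overlaps a replacement already made
        pieces.append(text[pos:start])
        pieces.append(replacements[text[start:end]])
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)
```

For example, `multi_replace("he said she sells", {"he": "HE", "she": "SHE", "sells": "SELLS"})` returns "HE said SHE SELLS". The "he" inside "she" is skipped because it overlaps a match that starts earlier.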
Java-Questions-and-Solutions
This repository aims to solve and create new problems from different areas of coding, helping students access solutions and discuss their questions.
Stars: ✭ 34 (+9.68%)
readfq
A simple tool to calculate the number of reads and the total base count in a FASTQ file
Stars: ✭ 19 (-38.71%)
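The following is a minimal Python sketch of the two counts readfq reports. It is not readfq's own code. It assumes standard four-line FASTQ records (header, sequence, "+", quality), with no line-wrapped sequences and no trailing blank lines. `fastq_stats` and `fastq_stats_file` are hypothetical names.

```python
import gzip


def fastq_stats(lines):
    """Return (read_count, total_bases) for FASTQ text given as an iterable of lines."""
    reads = bases = 0
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        kind = i % 4  # 0 = header, 1 = sequence, 2 = separator, 3 = quality
        if kind == 0 and not line.startswith("@"):
            raise ValueError(f"line {i + 1}: expected '@' header")
        elif kind == 1:
            reads += 1
            bases += len(line)
        elif kind == 2 and not line.startswith("+"):
            raise ValueError(f"line {i + 1}: expected '+' separator")
    return reads, bases


def fastq_stats_file(path):
    """Open plain or gzip-compressed FASTQ (by .gz extension) and count it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return fastq_stats(handle)
```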
baps-bgd.github.io
This repository is used to maintain the BAPS website. Please read the README if you would like to contribute.
Stars: ✭ 17 (-45.16%)
strutil
Golang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (+267.74%)
pheniqs
Fast and accurate sequence demultiplexing
Stars: ✭ 14 (-54.84%)
textics
📉 JavaScript text statistics library that counts lines, words, chars, and spaces.
Stars: ✭ 36 (+16.13%)
pysdsl
Python bindings to Succinct Data Structure Library 2.0
Stars: ✭ 23 (-25.81%)
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Stars: ✭ 655 (+2012.9%)
DS Algo
A repository to maintain various data structures and algorithms
Stars: ✭ 23 (-25.81%)
splink
Implementation of the Fellegi-Sunter canonical model of record linkage in Apache Spark, including an EM algorithm to estimate parameters
Stars: ✭ 181 (+483.87%)
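splink's Spark implementation is far more general than this. As a toy illustration of the model it names, here is a pure-Python EM loop for Fellegi-Sunter. It assumes binary agree/disagree comparison vectors and conditional independence between fields given match status. All function names and starting values are hypothetical.

```python
import math


def fellegi_sunter_em(vectors, m, u, lam, iterations=50, eps=1e-6):
    """Estimate match proportion `lam` and per-field m/u probabilities by EM.

    vectors: list of 0/1 tuples, one per record pair (1 = field agrees).
    m[k]: P(field k agrees | match); u[k]: P(field k agrees | non-match).
    `lam` must start strictly between 0 and 1.
    """
    m, u = list(m), list(u)
    n = len(vectors)
    for _ in range(iterations):
        # E-step: posterior probability that each pair is a match.
        weights = []
        for gamma in vectors:
            pm, pu = lam, 1 - lam
            for k, g in enumerate(gamma):
                pm *= m[k] if g else 1 - m[k]
                pu *= u[k] if g else 1 - u[k]
            weights.append(pm / (pm + pu))
        # M-step: re-estimate parameters from the soft assignments.
        total_m = sum(weights)
        total_u = n - total_m
        lam = min(max(total_m / n, eps), 1 - eps)
        for k in range(len(m)):
            agree_m = sum(w for w, gamma in zip(weights, vectors) if gamma[k])
            agree_u = sum(1 - w for w, gamma in zip(weights, vectors) if gamma[k])
            # Clamp away from 0 and 1 so probabilities and logs stay finite.
            m[k] = min(max(agree_m / total_m, eps), 1 - eps)
            u[k] = min(max(agree_u / total_u, eps), 1 - eps)
    return lam, m, u


def match_weight(gamma, m, u):
    """Log2 match weight: add log2(m/u) for agreements, log2((1-m)/(1-u)) otherwise."""
    return sum(
        math.log2(m[k] / u[k]) if g else math.log2((1 - m[k]) / (1 - u[k]))
        for k, g in enumerate(gamma)
    )
```

EM only finds a local optimum, so the starting values matter. Initializing m above u lets the first class play the role of "matches". Swapping them could swap the two classes.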
bin
My bioinfo toolbox
Stars: ✭ 42 (+35.48%)
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-41.94%)
stance
Learned string similarity for entity names using optimal transport.
Stars: ✭ 27 (-12.9%)
record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
Stars: ✭ 67 (+116.13%)
acid-store
A library for secure, deduplicated, transactional, and verifiable data storage
Stars: ✭ 48 (+54.84%)
beda
Beda is a Golang library for detecting how similar two strings are
Stars: ✭ 34 (+9.68%)
fuc
Frequently used commands in bioinformatics
Stars: ✭ 23 (-25.81%)
IntraArchiveDeduplicator
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Stars: ✭ 87 (+180.65%)
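IntraArchiveDeduplicator's own BK tree is not shown here. The following is a minimal Python sketch of the data structure. It assumes images are represented by integer perceptual hashes compared with Hamming distance. `BKTree` and `hamming` are hypothetical names, and this version stores each distinct hash only once.

```python
class BKTree:
    """Burkhard-Keller tree: nearest-neighbour search under any metric distance."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None  # each node is (value, {edge_distance: child_node})

    def add(self, value):
        if self.root is None:
            self.root = (value, {})
            return
        node = self.root
        while True:
            d = self.distance(value, node[0])
            if d == 0:
                return  # already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (value, {})
                return
            node = child

    def search(self, query, radius):
        """Return every stored value within `radius` of `query`."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = self.distance(query, value)
            if d <= radius:
                results.append(value)
            # Triangle inequality: only edges in [d - radius, d + radius] can lead to hits.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return results


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")
```

The pruning step relies on the triangle inequality, so the distance function passed in must be a true metric. Hamming distance on fixed-width hashes is one.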
OOP-In-CPlusPlus
An Awesome Repository On Object Oriented Programming In C++ Language. Ideal For Computer Science Undergraduates, This Repository Holds All The Resources Created And Used By Me - Code & Theory For One To Master Object Oriented Programming. Filled With Theory Slides, Number Of Programs, Concept-Clearing Projects And Beautifully Explained, Well Doc…
Stars: ✭ 27 (-12.9%)
Python
A repository for learning Python programming in Indonesian
Stars: ✭ 79 (+154.84%)
Levenshtein
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
Stars: ✭ 38 (+22.58%)
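The listed module computes this in C. For reference, the following is a pure-Python sketch of the underlying dynamic program. The function name `levenshtein` is hypothetical and is not the module's API.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions
    turning `a` into `b`, keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.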
gencore
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
Stars: ✭ 91 (+193.55%)
fq
Command-line utility for manipulating Illumina-generated FastQ files.
Stars: ✭ 31 (+0%)
cargo-limit
Cargo with less noise: warnings are skipped until errors are fixed, Neovim integration, etc.
Stars: ✭ 105 (+238.71%)
ngs pipeline
Exome/capture/RNA-seq pipeline implementation using Snakemake
Stars: ✭ 40 (+29.03%)
swift-algorithms-data-structs
📒 Algorithms and Data Structures in Swift. The used approach attempts to fully utilize the Swift Standard Library and Protocol-Oriented paradigm.
Stars: ✭ 42 (+35.48%)
nullarbor
💾 📃 "Reads to report" for public health and clinical microbiology
Stars: ✭ 111 (+258.06%)
AhoCorasick
Aho-Corasick multi-string search for .NET and SQL Server.
Stars: ✭ 39 (+25.81%)
zpaqfranz
Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
Stars: ✭ 86 (+177.42%)
entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Stars: ✭ 96 (+209.68%)
deduplication
Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in Javascript. Ships with extensive tests, a fuzz test and a benchmark.
Stars: ✭ 59 (+90.32%)
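The project's exact chunking function is not named here. As an assumed stand-in, this Python sketch uses a gear rolling hash (the scheme popularized by FastCDC) to pick content-defined cut points. It then stores each chunk once under its SHA-256 digest. `chunk_boundaries`, `dedup_store`, and the size parameters are hypothetical.

```python
import hashlib
import random


def make_gear_table(seed=0):
    """256 pseudo-random 32-bit values, one per possible byte (the 'gear' table)."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


GEAR = make_gear_table()


def chunk_boundaries(data, mask_bits=12, min_size=2048, max_size=65536):
    """Return (start, end) offsets of content-defined chunks covering `data`.

    A cut is made where the top `mask_bits` bits of the 32-bit gear hash are
    zero (about once per 2**mask_bits bytes on random input), subject to the
    minimum and maximum chunk sizes.
    """
    shift = 32 - mask_bits
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shift-and-add: after 32 steps a byte's contribution has shifted out.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h >> shift) == 0) or size >= max_size:
            chunks.append((start, i + 1))
            start, h = i + 1, 0
    if start < len(data):
        chunks.append((start, len(data)))  # trailing partial chunk
    return chunks


def dedup_store(data, store):
    """Store each chunk once under its SHA-256 hex digest; return the digest list."""
    recipe = []
    for start, end in chunk_boundaries(data):
        piece = data[start:end]
        digest = hashlib.sha256(piece).hexdigest()
        store.setdefault(digest, piece)  # identical chunks are kept only once
        recipe.append(digest)
    return recipe
```

The hash at each position depends only on the preceding 32 bytes. As a result, prepending or inserting data usually disturbs only the chunks near the edit. The chunker then realigns with the original boundaries, and the later chunks' digests are reused from the store.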
mail-deduplicate
📧 CLI to deduplicate mails from mailboxes.
Stars: ✭ 134 (+332.26%)
dduper
Fast block-level out-of-band BTRFS deduplication tool.
Stars: ✭ 108 (+248.39%)
cs-resources
Curated Computer Science and Programming Resource Guide
Stars: ✭ 42 (+35.48%)
py-algorithms
Algorithms and Data Structures, solutions to common CS problems.
Stars: ✭ 26 (-16.13%)
naf
Nucleotide Archival Format - Compressed file format for DNA/RNA/protein sequences
Stars: ✭ 35 (+12.9%)