
413 open-source projects that are alternatives to or similar to spark-stringmetric

ceja

PySpark phonetic and string matching algorithms

Stars: ✭ 24 (-52.94%)

Mutual labels: jaro-winkler, nysiis, hamming-distance
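Jaro-Winkler, a label shared by several entries here, is a similarity score in [0, 1]. It rewards characters that match within a sliding window, penalizes transpositions among those matches, and boosts pairs that share a common prefix. Below is a minimal Python sketch, assuming the conventional prefix scale p = 0.1 and a prefix cap of 4. It is illustrative only, not the API of ceja or spark-stringmetric, and it omits the 0.7 boost threshold that some implementations apply.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(i + window + 1, len2)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; half of them are transpositions.
    out_of_order, k = 0, 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (capped at max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

For example, `jaro_winkler("MARTHA", "MARHTA")` gives about 0.961. All six characters match, "TH"/"HT" counts as one transposition (Jaro = 17/18), and the shared prefix "MAR" adds the boost.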

strutil

Golang metrics for calculating string similarity and other string utility functions

Stars: ✭ 114 (+123.53%)

Mutual labels: jaro-winkler, jaccard-similarity, hamming-distance

stringdistance

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.

Stars: ✭ 60 (+17.65%)

Mutual labels: jaro-winkler, jaccard-similarity, hamming-distance
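Hamming distance and Jaccard similarity, two of the metrics named above, each fit in a few lines. The Python sketch below is illustrative, not the stringdistance library's Scala/Java API. Using character bigrams for Jaccard is an assumption: libraries also compare word sets or other q-gram sizes.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def ngrams(s: str, n: int = 2) -> set[str]:
    """Set of overlapping character n-grams (bigrams by default)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """|A ∩ B| / |A ∪ B| over the two strings' n-gram sets."""
    sa, sb = ngrams(a, n), ngrams(b, n)
    if not sa and not sb:
        return 1.0  # convention: two strings with no n-grams count as identical
    return len(sa & sb) / len(sa | sb)
```

For instance, `hamming("karolin", "kathrin")` is 3. `jaccard("night", "nacht")` is 1/7 because the two strings share only "ht" among 7 distinct bigrams.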

tika-similarity

Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.

Stars: ✭ 92 (+80.39%)

Mutual labels: jaccard-similarity, cosine-distance

stringosim

String similarity functions, string distances, Jaccard, Levenshtein, Hamming, Jaro-Winkler, Q-grams, N-grams, LCS (Longest Common Subsequence), Cosine similarity...

Stars: ✭ 47 (-7.84%)

Mutual labels: jaro-winkler, cosine-distance
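Cosine distance between strings is commonly computed over q-gram count vectors: cosine similarity is the dot product divided by the product of the vector norms, and distance is 1 minus that. Here is a minimal Python sketch, assuming character bigrams. The q-gram size is an assumption, not stringosim's documented default.

```python
import math
from collections import Counter


def bigram_counts(s: str) -> Counter:
    """Count overlapping character bigrams, e.g. "abab" -> {"ab": 2, "ba": 1}."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine_similarity(a: str, b: str) -> float:
    va, vb = bigram_counts(a), bigram_counts(b)
    # Only bigrams present in both strings contribute to the dot product.
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: a string shorter than 2 chars has no bigrams
    return dot / (norm_a * norm_b)


def cosine_distance(a: str, b: str) -> float:
    return 1.0 - cosine_similarity(a, b)
```

With this choice, `cosine_similarity("night", "nacht")` is 0.25: one shared bigram over two vectors of norm 2. The cosine distance is therefore 0.75.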

Example Spark

Spark, Spark Streaming and Spark SQL unit testing strategies

Stars: ✭ 205 (+301.96%)

Mutual labels: spark

Dpark

A Python clone of Spark, a MapReduce-like framework in Python

Stars: ✭ 2,668 (+5131.37%)

Mutual labels: spark

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+292.16%)

Mutual labels: spark

Azuredatabricksbestpractices

Version 1 of technical best practices for Azure Databricks, based on real-world customer and technical SME input

Stars: ✭ 186 (+264.71%)

Mutual labels: spark

lsh-semantic-similarity

Locality Sensitive Hashing for semantic similarity (Python 3.x)

Stars: ✭ 16 (-68.63%)

Mutual labels: jaccard-similarity

Azure Event Hubs

☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs

Stars: ✭ 233 (+356.86%)

Mutual labels: spark

Sparkstreaming

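💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs whenever data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.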

Stars: ✭ 179 (+250.98%)

Mutual labels: spark

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+321.57%)

Mutual labels: spark

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+382.35%)

Mutual labels: spark

Javaorbigdata Interview

A compilation of interview topics for Java developers and big data developers

Stars: ✭ 203 (+298.04%)

Mutual labels: spark

Text-Similarity

A text similarity computation using minhashing and Jaccard distance on the Reuters dataset

Stars: ✭ 15 (-70.59%)

Mutual labels: jaccard-similarity
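Minhashing, mentioned above, approximates Jaccard similarity without comparing full sets. For each of k independent hash functions, a document keeps the minimum hash over its shingle set. The fraction of positions where two such signatures agree then estimates the sets' Jaccard similarity. This Python sketch assumes word 3-shingles and derives the k hash functions by seeding BLAKE2b. Both choices are illustrative and are not taken from the Text-Similarity project.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed: int, token: str) -> int:
    # Hash function number `seed`: BLAKE2b over "seed:token", truncated to 64 bits.
    digest = hashlib.blake2b(f"{seed}:{token}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("MinHash needs a non-empty set")
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

The estimate's standard error is roughly sqrt(J(1 − J) / k). Longer signatures trade computation for accuracy, which is why LSH-style pipelines often band signatures instead of comparing them in full.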

Scanns

A scalable nearest neighbor search library in Apache Spark

Stars: ✭ 190 (+272.55%)

Mutual labels: spark

Video Stream Analytics

Stars: ✭ 240 (+370.59%)

Mutual labels: spark

Roaringbitmap

A better compressed bitset in Java

Stars: ✭ 2,460 (+4723.53%)

Mutual labels: spark

Spark Streaming With Kafka

Self-contained examples of Apache Spark streaming integrated with Apache Kafka.

Stars: ✭ 180 (+252.94%)

Mutual labels: spark

Spark Kafka Writer

Write your Spark data to Kafka seamlessly

Stars: ✭ 175 (+243.14%)

Mutual labels: spark

Installations mac ubuntu windows

Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).

Stars: ✭ 231 (+352.94%)

Mutual labels: spark

Spark

Firely's open source FHIR server

Stars: ✭ 174 (+241.18%)

Mutual labels: spark

Deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…

Stars: ✭ 12,277 (+23972.55%)

Mutual labels: spark

Spark Jobserver

REST job server for Apache Spark

Stars: ✭ 2,748 (+5288.24%)

Mutual labels: spark

Spark.fish

▁▂▄▆▇█▇▆▄▂▁

Stars: ✭ 229 (+349.02%)

Mutual labels: spark

Spark Structured Streaming Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Stars: ✭ 168 (+229.41%)

Mutual labels: spark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+323.53%)

Mutual labels: spark

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+384.31%)

Mutual labels: spark

Hydro Serving

MLOps Platform

Stars: ✭ 213 (+317.65%)

Mutual labels: spark

Spark Knn

k-Nearest Neighbors algorithm on Spark

Stars: ✭ 205 (+301.96%)

Mutual labels: spark

Neo4j Spark Connector

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

Stars: ✭ 245 (+380.39%)

Mutual labels: spark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+5584.31%)

Mutual labels: spark

simetric

String similarity metrics for Elixir

Stars: ✭ 59 (+15.69%)

Mutual labels: jaro-winkler

Ballista

Distributed compute platform implemented in Rust, and powered by Apache Arrow.

Stars: ✭ 2,274 (+4358.82%)

Mutual labels: spark

Recommendationsystem

Book recommender system using collaborative filtering based on Spark

Stars: ✭ 244 (+378.43%)

Mutual labels: spark

Js Spark

A distributed real-time computation system, a.k.a. distributed lodash

Stars: ✭ 187 (+266.67%)

Mutual labels: spark

eddie

No description or website provided.

Stars: ✭ 18 (-64.71%)

Mutual labels: jaro-winkler

Kotlin Spark Api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have it become part of Apache Spark 3.x.

Stars: ✭ 183 (+258.82%)

Mutual labels: spark

Hadoop Docker

A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark

Stars: ✭ 238 (+366.67%)

Mutual labels: spark

Geopyspark

GeoTrellis for PySpark

Stars: ✭ 167 (+227.45%)

Mutual labels: spark

Ruby Spark

Ruby wrapper for Apache Spark

Stars: ✭ 221 (+333.33%)

Mutual labels: spark

Big Whale

Scheduling of offline (batch) jobs such as Spark and Flink, plus monitoring of real-time jobs

Stars: ✭ 163 (+219.61%)

Mutual labels: spark

Xsql

Unified SQL Analytics Engine Based on SparkSQL

Stars: ✭ 176 (+245.1%)

Mutual labels: spark

Mastering Spark Sql Book

The Internals of Spark SQL

Stars: ✭ 234 (+358.82%)

Mutual labels: spark

Kraps Rpc

An RPC framework leveraging the Spark RPC module

Stars: ✭ 175 (+243.14%)

Mutual labels: spark

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+5868.63%)

Mutual labels: spark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+4837.25%)

Mutual labels: spark

Mydatascienceportfolio

Applying Data Science and Machine Learning to Solve Real World Business Problems

Stars: ✭ 227 (+345.1%)

Mutual labels: spark

Transmogrifai

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Stars: ✭ 2,084 (+3986.27%)

Mutual labels: spark

edits.cr

Edit distance algorithms, including Jaro, Damerau-Levenshtein, and Optimal Alignment

Stars: ✭ 16 (-68.63%)

Mutual labels: jaro-winkler
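Optimal string alignment (OSA), one of the algorithms listed for edits.cr, extends Levenshtein distance with swaps of adjacent characters. Unlike unrestricted Damerau-Levenshtein, it never edits the same substring twice. Below is a Python sketch of the standard dynamic program. It is illustrative only, not edits.cr's Crystal API.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

The restriction shows up in `osa_distance("CA", "ABC")`, which is 3. Unrestricted Damerau-Levenshtein gives 2 (swap to "AC", then insert "B"), but OSA cannot edit the swapped pair again.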

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+225.49%)

Mutual labels: spark

Spark Workshop

Apache Spark™ and Scala Workshops