413 open source projects that are alternatives to or similar to spark-stringmetric

ceja
PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-52.94%)
Mutual labels:  jaro-winkler, nysiis, hamming-distance
strutil
Golang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (+123.53%)
stringdistance
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Stars: ✭ 60 (+17.65%)
tika-similarity
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Stars: ✭ 92 (+80.39%)
stringosim
String similarity functions and string distances: Jaccard, Levenshtein, Hamming, Jaro-Winkler, Q-grams, N-grams, LCS (Longest Common Subsequence), Cosine similarity...
Stars: ✭ 47 (-7.84%)
Mutual labels:  jaro-winkler, cosine-distance
Example Spark
Spark, Spark Streaming and Spark SQL unit testing strategies
Stars: ✭ 205 (+301.96%)
Mutual labels:  spark
Dpark
Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+5131.37%)
Mutual labels:  spark
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+292.16%)
Mutual labels:  spark
Azuredatabricksbestpractices
Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Stars: ✭ 186 (+264.71%)
Mutual labels:  spark
lsh-semantic-similarity
Locality Sensitive Hashing for semantic similarity (Python 3.x)
Stars: ✭ 16 (-68.63%)
Mutual labels:  jaccard-similarity
Azure Event Hubs
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Stars: ✭ 233 (+356.86%)
Mutual labels:  spark
Sparkstreaming
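💥 🚀 Wraps Spark Streaming to dynamically adjust the batch time (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 - Kafka 010 to support SSL.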
Stars: ✭ 179 (+250.98%)
Mutual labels:  spark
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+321.57%)
Mutual labels:  spark
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+382.35%)
Mutual labels:  spark
Javaorbigdata Interview
A compilation of interview topics for Java developers and big data developers
Stars: ✭ 203 (+298.04%)
Mutual labels:  spark
Text-Similarity
A text similarity computation using minhashing and Jaccard distance on reuters dataset
Stars: ✭ 15 (-70.59%)
Mutual labels:  jaccard-similarity
Scanns
A scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (+272.55%)
Mutual labels:  spark
Video Stream Analytics
Stars: ✭ 240 (+370.59%)
Mutual labels:  spark
Roaringbitmap
A better compressed bitset in Java
Stars: ✭ 2,460 (+4723.53%)
Mutual labels:  spark
Spark Streaming With Kafka
Self-contained examples of Apache Spark streaming integrated with Apache Kafka.
Stars: ✭ 180 (+252.94%)
Mutual labels:  spark
Spark Kafka Writer
Write your Spark data to Kafka seamlessly
Stars: ✭ 175 (+243.14%)
Mutual labels:  spark
Installations mac ubuntu windows
Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).
Stars: ✭ 231 (+352.94%)
Mutual labels:  spark
Spark
Firely's open source FHIR server
Stars: ✭ 174 (+241.18%)
Mutual labels:  spark
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+23972.55%)
Mutual labels:  spark
Spark Jobserver
REST job server for Apache Spark
Stars: ✭ 2,748 (+5288.24%)
Mutual labels:  spark
Spark.fish
▁▂▄▆▇█▇▆▄▂▁
Stars: ✭ 229 (+349.02%)
Mutual labels:  spark
Spark Structured Streaming Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
Stars: ✭ 168 (+229.41%)
Mutual labels:  spark
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+323.53%)
Mutual labels:  spark
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+384.31%)
Mutual labels:  spark
Hydro Serving
MLOps Platform
Stars: ✭ 213 (+317.65%)
Mutual labels:  spark
Spark Knn
k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (+301.96%)
Mutual labels:  spark
Neo4j Spark Connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (+380.39%)
Mutual labels:  spark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+5584.31%)
Mutual labels:  spark
simetric
String similarity metrics for Elixir
Stars: ✭ 59 (+15.69%)
Mutual labels:  jaro-winkler
Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+4358.82%)
Mutual labels:  spark
Recommendationsystem
Book recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (+378.43%)
Mutual labels:  spark
Js Spark
Distributed real-time computation system, a.k.a. distributed lodash
Stars: ✭ 187 (+266.67%)
Mutual labels:  spark
eddie
No description or website provided.
Stars: ✭ 18 (-64.71%)
Mutual labels:  jaro-winkler
Kotlin Spark Api
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (+258.82%)
Mutual labels:  spark
Hadoop Docker
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (+366.67%)
Mutual labels:  spark
Geopyspark
GeoTrellis for PySpark
Stars: ✭ 167 (+227.45%)
Mutual labels:  spark
Ruby Spark
Ruby wrapper for Apache Spark
Stars: ✭ 221 (+333.33%)
Mutual labels:  spark
Big Whale
Scheduling of offline jobs (Spark, Flink, and others) and monitoring of real-time jobs
Stars: ✭ 163 (+219.61%)
Mutual labels:  spark
Xsql
Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (+245.1%)
Mutual labels:  spark
Mastering Spark Sql Book
The Internals of Spark SQL
Stars: ✭ 234 (+358.82%)
Mutual labels:  spark
Kraps Rpc
An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (+243.14%)
Mutual labels:  spark
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+5868.63%)
Mutual labels:  spark
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+4837.25%)
Mutual labels:  spark
Mydatascienceportfolio
Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+345.1%)
Mutual labels:  spark
Transmogrifai
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+3986.27%)
Mutual labels:  spark
edits.cr
Edit distance algorithms including Jaro, Damerau-Levenshtein, and Optimal Alignment
Stars: ✭ 16 (-68.63%)
Mutual labels:  jaro-winkler
Spark Iforest
Isolation Forest on Spark
Stars: ✭ 166 (+225.49%)
Mutual labels:  spark
Spark Workshop
Apache Spark™ and Scala Workshops
Stars: ✭ 224 (+339.22%)
Mutual labels:  spark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (+223.53%)
Mutual labels:  spark
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (+388.24%)
Mutual labels:  spark
visualize-data-with-python
A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (+17.65%)
Mutual labels:  spark
Sagemaker Spark
A Spark library for Amazon SageMaker.
Stars: ✭ 219 (+329.41%)
Mutual labels:  spark
Whylogs Java
Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (+221.57%)
Mutual labels:  spark
Spark Fast Tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Stars: ✭ 249 (+388.24%)
Mutual labels:  spark
Spark Excel
A Spark plugin for reading Excel files via Apache POI
Stars: ✭ 216 (+323.53%)
Mutual labels:  spark