110 open source projects that are alternatives to or similar to ceja

spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (+112.5%)
Mutual labels:  jaro-winkler, nysiis, hamming-distance
eddie
No description or website provided.
Stars: ✭ 18 (-25%)
Textdistance
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Stars: ✭ 2,575 (+10629.17%)
edits.cr
Edit distance algorithms including Jaro, Damerau-Levenshtein, and Optimal Alignment
Stars: ✭ 16 (-33.33%)
strutil
Golang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (+375%)
Mutual labels:  jaro-winkler, hamming-distance
stringdistance
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Stars: ✭ 60 (+150%)
Mutual labels:  jaro-winkler, hamming-distance
Jellyfish
🎐 a python library for doing approximate and phonetic matching of strings.
Stars: ✭ 1,571 (+6445.83%)
Mutual labels:  jaro-winkler, metaphone
Java String Similarity
Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
Stars: ✭ 2,403 (+9912.5%)
Learningapachespark
LearningApacheSpark
Stars: ✭ 155 (+545.83%)
Mutual labels:  pyspark
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (+16.67%)
Mutual labels:  pyspark
Repo 2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Stars: ✭ 133 (+454.17%)
Mutual labels:  pyspark
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+9579.17%)
Mutual labels:  pyspark
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+141.67%)
Mutual labels:  pyspark
Cc Pyspark
Process Common Crawl data with Python and Spark
Stars: ✭ 147 (+512.5%)
Mutual labels:  pyspark
flask-spark-docker
Just a boilerplate for PySpark and Flask
Stars: ✭ 32 (+33.33%)
Mutual labels:  pyspark
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-4.17%)
Mutual labels:  pyspark
Eat pyspark in 10 days
pyspark🍒🥭 is delicious, just eat it!😋😋
Stars: ✭ 116 (+383.33%)
Mutual labels:  pyspark
Hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (+350%)
Mutual labels:  pyspark
Relation extraction
Relation Extraction using deep learning (CNN)
Stars: ✭ 96 (+300%)
Mutual labels:  pyspark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+5475%)
Mutual labels:  pyspark
double-metaphone
Fast Double Metaphone algorithm
Stars: ✭ 70 (+191.67%)
Mutual labels:  metaphone
workshop-spark
Code for Spark workshops with a Docker development environment
Stars: ✭ 27 (+12.5%)
Mutual labels:  pyspark
Pyspark Tutorial
PySpark Code for Hands-on Learners
Stars: ✭ 91 (+279.17%)
Mutual labels:  pyspark
Spark python ml examples
Spark 2.0 Python Machine Learning examples
Stars: ✭ 87 (+262.5%)
Mutual labels:  pyspark
Morphl Community Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Stars: ✭ 253 (+954.17%)
Mutual labels:  pyspark
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (+162.5%)
Mutual labels:  pyspark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (+587.5%)
Mutual labels:  pyspark
learn-by-examples
Real-world Spark pipelines examples
Stars: ✭ 84 (+250%)
Mutual labels:  pyspark
Handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+558.33%)
Mutual labels:  pyspark
anovos
Anovos - An open source library for scalable feature engineering using Apache Spark
Stars: ✭ 77 (+220.83%)
Mutual labels:  pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+525%)
Mutual labels:  pyspark
jgit-spark-connector
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Stars: ✭ 71 (+195.83%)
Mutual labels:  pyspark
Pyspark Learning
Updated repository
Stars: ✭ 147 (+512.5%)
Mutual labels:  pyspark
Sparkora
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (+112.5%)
Mutual labels:  pyspark
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+425%)
Mutual labels:  pyspark
pyspark-cassandra
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Stars: ✭ 70 (+191.67%)
Mutual labels:  pyspark
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (+350%)
Mutual labels:  pyspark
OSCI
Open Source Contributor Index
Stars: ✭ 107 (+345.83%)
Mutual labels:  pyspark
Pyspark Stubs
Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+308.33%)
Mutual labels:  pyspark
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+5529.17%)
Mutual labels:  pyspark
pyspark-ML-in-Colab
Pyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (+33.33%)
Mutual labels:  pyspark
Awesome Spark
A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+4320.83%)
Mutual labels:  pyspark
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+279.17%)
Mutual labels:  pyspark
spark-dgraph-connector
A connector for Apache Spark and PySpark to Dgraph databases.
Stars: ✭ 36 (+50%)
Mutual labels:  pyspark
W2v
Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (+166.67%)
Mutual labels:  pyspark
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+200%)
Mutual labels:  pyspark
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+800%)
Mutual labels:  pyspark
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+3875%)
Mutual labels:  pyspark
Petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+4516.67%)
Mutual labels:  pyspark
Quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+804.17%)
Mutual labels:  pyspark
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+4008.33%)
Mutual labels:  pyspark
oshinko-s2i
A place to put s2i images and utilities for Spark application builders for OpenShift
Stars: ✭ 16 (-33.33%)
Mutual labels:  pyspark
Live log analyzer spark
Spark application that analyzes Apache access logs and detects anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-41.67%)
Mutual labels:  pyspark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+11979.17%)
Mutual labels:  pyspark
Sparkling Titanic
Training models with Apache Spark (PySpark) for the Titanic Kaggle competition
Stars: ✭ 12 (-50%)
Mutual labels:  pyspark
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (+0%)
Mutual labels:  pyspark
kafka-twitter-spark-streaming
Counting Tweets Per User in Real-Time
Stars: ✭ 38 (+58.33%)
Mutual labels:  pyspark
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+733.33%)
Mutual labels:  pyspark
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (-4.17%)
Mutual labels:  pyspark
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-4.17%)
Mutual labels:  pyspark