spark-stringmetricSpark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (+112.5%)
eddieNo description or website provided.
Stars: ✭ 18 (-25%)
TextdistanceCompute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Stars: ✭ 2,575 (+10629.17%)
edits.crEdit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment
Stars: ✭ 16 (-33.33%)
strutilGolang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (+375%)
stringdistanceA fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Stars: ✭ 60 (+150%)
Jellyfish🎐 a python library for doing approximate and phonetic matching of strings.
Stars: ✭ 1,571 (+6445.83%)
Java String SimilarityImplementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
Stars: ✭ 2,403 (+9912.5%)
isarn-sketches-sparkRoutines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (+16.67%)
Repo 2019BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Stars: ✭ 133 (+454.17%)
LinkisLinkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+9579.17%)
soda-sparkSoda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+141.67%)
Cc PysparkProcess Common Crawl data with Python and Spark
Stars: ✭ 147 (+512.5%)
spark3DSpark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-4.17%)
HnswlibJava library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (+350%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+5475%)
workshop-sparkCode for Spark workshops with a Docker-based development environment
Stars: ✭ 27 (+12.5%)
Morphl Community EditionMorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Stars: ✭ 253 (+954.17%)
Pysparkgeoanalysis🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (+162.5%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+558.33%)
anovosAnovos - An open source library for scalable feature engineering using Apache Spark
Stars: ✭ 77 (+220.83%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+525%)
jgit-spark-connectorjgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Stars: ✭ 71 (+195.83%)
SparkoraPowerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (+112.5%)
ButterfreeA tool for building feature stores.
Stars: ✭ 126 (+425%)
pyspark-cassandrapyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Stars: ✭ 70 (+191.67%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (+350%)
OSCIOpen Source Contributor Index
Stars: ✭ 107 (+345.83%)
Pyspark StubsApache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+308.33%)
optimus🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+5529.17%)
pyspark-ML-in-ColabPySpark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (+33.33%)
Awesome SparkA curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+4320.83%)
Bitcoin Value Predictor[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+279.17%)
W2vWord2Vec models with Twitter data using Spark.
Stars: ✭ 64 (+166.67%)
pyspark-algorithmsPySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+200%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+800%)
SparkmagicJupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+3875%)
PetastormPetastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+4516.67%)
QuinnPySpark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+804.17%)
Optimus🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+4008.33%)
oshinko-s2iA place to put s2i images and utilities for Spark application builders for OpenShift
Stars: ✭ 16 (-33.33%)
Live log analyzer sparkSpark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-41.67%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+11979.17%)
Sparkling TitanicTraining models with Apache Spark (PySpark) for the Titanic Kaggle competition
Stars: ✭ 12 (-50%)
Pyspark Setup DemoDemo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (+0%)
Spark PracticeApache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+733.33%)
Cluster PackA library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-4.17%)