1055 open source projects that are alternatives to or similar to pyspark-algorithms

data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Stars: ✭ 34 (-52.78%)
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+108.33%)
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-52.78%)
Mutual labels:  big-data, pyspark, mapreduce
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+111.11%)
Tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Stars: ✭ 274 (+280.56%)
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+54.17%)
Mutual labels:  big-data, pyspark, dataframe
Helicalinsight
Helical Insight software is the world’s first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
Stars: ✭ 214 (+197.22%)
Mutual labels:  big-data, nosql
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-73.61%)
Mutual labels:  pyspark, graphframes
Archived-SANSA-Query
SANSA Query Layer
Stars: ✭ 31 (-56.94%)
dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-45.83%)
Mutual labels:  big-data, distributed-computing
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+3177.78%)
Mutual labels:  big-data, dataframe
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-59.72%)
Mutual labels:  big-data, pyspark
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+59.72%)
Mutual labels:  big-data, pyspark
javaer-mind
Mind maps for Java programmers' advanced learning
Stars: ✭ 66 (-8.33%)
Mutual labels:  big-data, nosql
mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-30.56%)
Mutual labels:  big-data, pyspark
pyspark-ML-in-Colab
Pyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-55.56%)
Mutual labels:  pyspark, rdd
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+226.39%)
Mutual labels:  big-data, dataframe
HadoopDedup
🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase
Stars: ✭ 27 (-62.5%)
Mutual labels:  big-data, mapreduce
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-45.83%)
Mutual labels:  big-data, pyspark
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+4559.72%)
Mutual labels:  big-data, pyspark
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-65.28%)
Mutual labels:  distributed-computing, pyspark
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+30522.22%)
Mutual labels:  big-data, mapreduce
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+6375%)
Mutual labels:  big-data, distributed-computing
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+7556.94%)
Mutual labels:  big-data, nosql
Iotdb
Apache IoTDB
Stars: ✭ 1,221 (+1595.83%)
Mutual labels:  big-data, nosql
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-66.67%)
Mutual labels:  big-data, pyspark
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+26.39%)
Mutual labels:  big-data, pyspark
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+15165.28%)
Mutual labels:  big-data, mapreduce
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1758.33%)
Mutual labels:  big-data, pyspark
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+1323.61%)
Mutual labels:  big-data, distributed-computing
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-1.39%)
Mutual labels:  big-data, mapreduce
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+1190.28%)
Mutual labels:  mapreduce, dataframe
ParallelUtilities.jl
Fast and easy parallel mapreduce on HPC clusters
Stars: ✭ 28 (-61.11%)
Mutual labels:  distributed-computing, mapreduce
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+3926.39%)
Mutual labels:  big-data, pyspark
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+200%)
Mutual labels:  big-data, pyspark
Data Algorithms Book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+1218.06%)
Mutual labels:  distributed-computing, mapreduce
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (+76.39%)
Mutual labels:  big-data, distributed-computing
Nakedtensor
Bare bone examples of machine learning in TensorFlow
Stars: ✭ 2,443 (+3293.06%)
Mutual labels:  big-data, distributed-computing
Selinon
An advanced distributed task flow management on top of Celery
Stars: ✭ 237 (+229.17%)
Mutual labels:  big-data, distributed-computing
MLBD
Materials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-72.22%)
Mutual labels:  big-data, mapreduce
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-34.72%)
Mutual labels:  big-data, rdd
Thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (+633.33%)
Mutual labels:  big-data, distributed-computing
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+401.39%)
Mutual labels:  big-data, distributed-computing
Beeva Best Practices
Best Practices and Style Guides in BEEVA
Stars: ✭ 335 (+365.28%)
Mutual labels:  big-data, nosql
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-61.11%)
Mutual labels:  pyspark, dataframe
Asakusafw
Asakusa Framework
Stars: ✭ 114 (+58.33%)
Mutual labels:  big-data, mapreduce
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+4127.78%)
Mutual labels:  big-data, dataframe
merkle-db
High-scalability analytics database built on immutable merkle-trees
Stars: ✭ 44 (-38.89%)
Mutual labels:  big-data, nosql
learn-by-examples
Real-world Spark pipelines examples
Stars: ✭ 84 (+16.67%)
Mutual labels:  pyspark
cdp-service
CDP data platform that helps enterprises fully understand their customers and deliver precisely targeted, personalized marketing.
Stars: ✭ 30 (-58.33%)
Mutual labels:  big-data
mesos-pinspider
A framework called "pinspider" on Apache Mesos that retrieves basic user information from a user's Pinterest page.
Stars: ✭ 18 (-75%)
Mutual labels:  distributed-computing
dynamodb-onetable
DynamoDB access and management for one table designs with NodeJS
Stars: ✭ 508 (+605.56%)
Mutual labels:  nosql
elearning
elearning linux/mac/db/cache/server/tools/artificial intelligence
Stars: ✭ 72 (+0%)
Mutual labels:  nosql
Quantitative-Big-Imaging-2018
(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
Stars: ✭ 50 (-30.56%)
Mutual labels:  big-data
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-19.44%)
Mutual labels:  pyspark
metriql
The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (+215.28%)
Mutual labels:  big-data
sgd
An R package for large scale estimation with stochastic gradient descent
Stars: ✭ 55 (-23.61%)
Mutual labels:  big-data
xslweb
Web application framework for XSLT and XQuery developers
Stars: ✭ 39 (-45.83%)
Mutual labels:  transformations
SANSA-Stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Stars: ✭ 130 (+80.56%)
Mutual labels:  distributed-computing
meesee
Task queue with long-lived workers for work-based parallelization, using processes and Redis as the back end. For distributed computing.
Stars: ✭ 14 (-80.56%)
Mutual labels:  distributed-computing