
458 open-source projects that are alternatives to or similar to Eat_pyspark_in_10_days

experiments
Code examples for my blog posts
Stars: ✭ 21 (-81.9%)
Mutual labels:  spark
Angel
A Flexible and Powerful Parameter Server for large-scale machine learning
Stars: ✭ 6,458 (+5467.24%)
Mutual labels:  spark
splink
Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including the EM algorithm to estimate parameters
Stars: ✭ 181 (+56.03%)
Mutual labels:  spark
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-20.69%)
Mutual labels:  spark
visualize-data-with-python
A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (-48.28%)
Mutual labels:  spark
Spark Movie Lens
An online movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+542.24%)
Mutual labels:  spark
big data
A collection of tutorials on Hadoop, MapReduce, Spark, and Docker
Stars: ✭ 34 (-70.69%)
Mutual labels:  pyspark
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-85.34%)
Mutual labels:  pyspark
Cdhproject
How to use the various Hadoop components, continuously updated
Stars: ✭ 733 (+531.9%)
Mutual labels:  spark
DataEngineering
This repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (-59.48%)
Mutual labels:  pyspark
Elassandra
Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+1287.93%)
Mutual labels:  spark
Springboard-Data-Science-Immersive
No description or website provided.
Stars: ✭ 52 (-55.17%)
Mutual labels:  pyspark
Frameless
Expressive types for Spark.
Stars: ✭ 717 (+518.1%)
Mutual labels:  spark
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-78.45%)
Mutual labels:  pyspark
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (+121.55%)
Mutual labels:  spark
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-75%)
Mutual labels:  pyspark
Elasticsearch Spark Recommender
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Stars: ✭ 707 (+509.48%)
Mutual labels:  spark
Big Data
🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-19.83%)
Mutual labels:  spark
Heracles
High performance HBase / Spark SQL engine
Stars: ✭ 27 (-76.72%)
Mutual labels:  spark
Sk Dist
Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (+124.14%)
Mutual labels:  spark
pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
Stars: ✭ 24 (-79.31%)
Mutual labels:  pyspark
Useractionanalyzeplatform
Big data platform for analyzing e-commerce user behavior
Stars: ✭ 645 (+456.03%)
Mutual labels:  spark
Roffildlibrary
Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS
Stars: ✭ 63 (-45.69%)
Mutual labels:  spark
pyspark-ML-in-Colab
PySpark in Google Colab: a simple machine learning (linear regression) model
Stars: ✭ 32 (-72.41%)
Mutual labels:  pyspark
Freestyle
A cohesive & pragmatic framework of FP centric Scala libraries
Stars: ✭ 627 (+440.52%)
Mutual labels:  spark
Sparkora
Powerful, rapid, automatic EDA and feature engineering library with a very easy-to-use API 🌟
Stars: ✭ 51 (-56.03%)
Mutual labels:  pyspark
Spark Jupyter Aws
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (+123.28%)
Mutual labels:  spark
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-66.38%)
Mutual labels:  pyspark
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+4775.86%)
Mutual labels:  spark
flask-spark-docker
Just a boilerplate for PySpark and Flask
Stars: ✭ 32 (-72.41%)
Mutual labels:  pyspark
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-48.28%)
Mutual labels:  spark
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-37.93%)
Mutual labels:  pyspark
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+4652.59%)
Mutual labels:  spark
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (-52.59%)
Mutual labels:  pyspark
Pyspark Tutorial
PySpark Code for Hands-on Learners
Stars: ✭ 91 (-21.55%)
Mutual labels:  pyspark
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (-50%)
Mutual labels:  pyspark
Alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+4537.07%)
Mutual labels:  spark
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-75.86%)
Mutual labels:  pyspark
Zemberek Nlp Server
A REST server in Docker built on the Zemberek Turkish NLP Java library
Stars: ✭ 60 (-48.28%)
Mutual labels:  spark
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-80.17%)
Mutual labels:  pyspark
Spark Daria
Essential Spark extensions and helper methods ✨😲
Stars: ✭ 553 (+376.72%)
Mutual labels:  spark
Xlearning Xdml
extremely distributed machine learning
Stars: ✭ 113 (-2.59%)
Mutual labels:  spark
Bigdata Notebook
Stars: ✭ 100 (-13.79%)
Mutual labels:  spark
Spark Website
Apache Spark Website
Stars: ✭ 75 (-35.34%)
Mutual labels:  spark
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+27156.9%)
Mutual labels:  spark
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (+121.55%)
Mutual labels:  spark
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+2524.14%)
Mutual labels:  spark
Lopq
Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.
Stars: ✭ 530 (+356.9%)
Mutual labels:  spark
Every Single Day I Tldr
A daily digest of articles and videos I've found interesting and want to share with you.
Stars: ✭ 249 (+114.66%)
Mutual labels:  spark
Pyspark Examples
Code examples for Apache Spark using Python
Stars: ✭ 58 (-50%)
Mutual labels:  spark
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to big data streaming. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+112.93%)
Mutual labels:  spark
Cdap
An open source framework for building data analytic applications.
Stars: ✭ 509 (+338.79%)
Mutual labels:  spark
Neo4j Spark Connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (+111.21%)
Mutual labels:  spark
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (-23.28%)
Mutual labels:  spark
Big Data Rosetta Code
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (+118.97%)
Mutual labels:  spark
Interview Questions Collection
Interview questions organized by knowledge area, including C++, Java, Hadoop, machine learning, and more
Stars: ✭ 21 (-81.9%)
Mutual labels:  spark
spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.0.0
Stars: ✭ 23 (-80.17%)
Mutual labels:  spark
laravel-spark-camera
Profile Photo Camera support for Laravel Spark
Stars: ✭ 30 (-74.14%)
Mutual labels:  spark
Cleanframes
Type-class-based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-35.34%)
Mutual labels:  spark
Flint
A Time Series Library for Apache Spark
Stars: ✭ 878 (+656.9%)
Mutual labels:  spark
301-360 of 458 similar projects