Linkis helps you easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+3803.17%)

Mutual labels: spark, pyspark

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-60.32%)

Mutual labels: spark, pyspark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+544.44%)

Mutual labels: spark, pyspark

Spark Syntax

This is a repo documenting the best practices in PySpark.

Stars: ✭ 412 (+553.97%)

Mutual labels: jupyter-notebook, pyspark

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+555.56%)

Mutual labels: jupyter-notebook, spark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+242.86%)

Mutual labels: spark, pyspark

Spark Nlp Models

Models and Pipelines for the Spark NLP library

Stars: ✭ 88 (+39.68%)

Mutual labels: jupyter-notebook, spark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+4501.59%)

Mutual labels: spark, pyspark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+1004.76%)

Mutual labels: spark, pyspark

Pyspark Tutorial

PySpark Code for Hands-on Learners

Stars: ✭ 91 (+44.44%)

Mutual labels: jupyter-notebook, pyspark

Repo 2019

BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics

Stars: ✭ 133 (+111.11%)

Mutual labels: jupyter-notebook, pyspark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+3896.83%)

Mutual labels: spark, pyspark

Scalable Data Science Platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

Stars: ✭ 158 (+150.79%)

Mutual labels: jupyter-notebook, spark

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (+753.97%)

Mutual labels: jupyter-notebook, spark

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+1022.22%)

Mutual labels: jupyter-notebook, spark

Yandex Big Data Engineering

Stars: ✭ 17 (-73.02%)

Mutual labels: jupyter-notebook, spark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-20.63%)

Mutual labels: spark, pyspark

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-58.73%)

Mutual labels: spark, pyspark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+163.49%)

Mutual labels: spark, pyspark

Live log analyzer spark

Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.

Stars: ✭ 14 (-77.78%)

Mutual labels: spark, pyspark

Tedsds

Turbofan Engine Degradation Simulation Data Set example in Apache Spark

Stars: ✭ 14 (-77.78%)

Mutual labels: jupyter-notebook, spark

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (+380.95%)

Mutual labels: jupyter-notebook, spark

Helk

The Hunting ELK

Stars: ✭ 3,097 (+4815.87%)

Mutual labels: jupyter-notebook, spark

Enterprise gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.

Stars: ✭ 412 (+553.97%)

Mutual labels: jupyter-notebook, spark

Spark Jupyter Aws

A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (+311.11%)

Mutual labels: jupyter-notebook, spark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+904.76%)

Mutual labels: spark, pyspark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+8877.78%)

Mutual labels: jupyter-notebook, spark

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+1082.54%)

Mutual labels: jupyter-notebook, spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.

Stars: ✭ 25 (-60.32%)

Mutual labels: spark, pyspark

Sparkling Titanic

Training models with Apache Spark and PySpark for the Titanic Kaggle competition

Stars: ✭ 12 (-80.95%)

Mutual labels: spark, pyspark

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (-61.9%)

Mutual labels: jupyter-notebook, pyspark

Pyspark Examples

Code examples on Apache Spark using Python