BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics

Stars: ✭ 133 (+478.26%)

Mutual labels: jupyter-notebook, pyspark

Scalable Data Science Platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

Stars: ✭ 158 (+586.96%)

Mutual labels: jupyter-notebook, spark

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+600%)

Mutual labels: jupyter-notebook, spark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (+117.39%)

Mutual labels: spark, pyspark

kafka-compose

🎼 Docker Compose files for various Kafka stacks

Stars: ✭ 32 (+39.13%)

Mutual labels: spark, pyspark

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+382.61%)

Mutual labels: spark, pyspark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+12504.35%)

Mutual labels: spark, pyspark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+10847.83%)

Mutual labels: spark, pyspark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+621.74%)

Mutual labels: spark, pyspark

Pyspark Examples

Code examples on Apache Spark using python

Stars: ✭ 58 (+152.17%)

Mutual labels: jupyter-notebook, spark

Linkis

Linkis helps you easily connect to various back-end computation/storage engines (Spark, Python, TiDB, ...), exposes various interfaces (REST, JDBC, Java, ...), and provides multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+10000%)

Mutual labels: spark, pyspark

Spark Jupyter Aws

A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (+1026.09%)

Mutual labels: jupyter-notebook, spark

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (+8.7%)

Mutual labels: spark, pyspark

Helk

The Hunting ELK

Stars: ✭ 3,097 (+13365.22%)

Mutual labels: jupyter-notebook, spark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+1665.22%)

Mutual labels: spark, pyspark

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting the Bitcoin price using time-series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (+295.65%)

Mutual labels: jupyter-notebook, pyspark

Udacity Data Engineering

Udacity Data Engineering Nanodegree (DEND)

Stars: ✭ 89 (+286.96%)

Mutual labels: jupyter-notebook, spark

Spark Nlp Models

Models and Pipelines for the Spark NLP library

Stars: ✭ 88 (+282.61%)

Mutual labels: jupyter-notebook, spark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+24491.3%)

Mutual labels: jupyter-notebook, spark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (+573.91%)

Mutual labels: spark, pyspark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+2652.17%)

Mutual labels: spark, pyspark

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+2973.91%)

Mutual labels: jupyter-notebook, spark

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (+13.04%)

Mutual labels: spark, pyspark

Installations mac ubuntu windows

Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).

Stars: ✭ 231 (+904.35%)

Mutual labels: jupyter-notebook, spark

incubator-linkis

Stars: ✭ 2,459 (+10591.3%)

Mutual labels: spark, pyspark

Mydatascienceportfolio

Applying Data Science and Machine Learning to Solve Real World Business Problems

Stars: ✭ 227 (+886.96%)

Mutual labels: jupyter-notebook, spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.

Stars: ✭ 25 (+8.7%)

Mutual labels: spark, pyspark

data-algorithms-with-spark

O'Reilly book: Data Algorithms with Spark, by Mahmoud Parsian

Stars: ✭ 34 (+47.83%)

Mutual labels: spark, pyspark

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (+1217.39%)

Mutual labels: jupyter-notebook, spark

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (+2239.13%)

Mutual labels: jupyter-notebook, spark

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+1695.65%)

Mutual labels: jupyter-notebook, spark

Scriptis

Scriptis is a tool for interactive data analysis that supports script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDFs, functions, resource management, and intelligent diagnosis.

Stars: ✭ 696 (+2926.09%)

Mutual labels: spark, pyspark

Enterprise gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.

Stars: ✭ 412 (+1691.3%)

Mutual labels: jupyter-notebook, spark

Eat pyspark in 10 days

PySpark 🍒🥭 is delicious, just eat it! 😋😋

Stars: ✭ 116 (+404.35%)

Mutual labels: spark, pyspark

Cc Pyspark

Process Common Crawl data with Python and Spark

Stars: ✭ 147 (+539.13%)

Mutual labels: spark, pyspark

Yandex Big Data Engineering

Stars: ✭ 17 (-26.09%)