Scriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+500%)

Mutual labels: spark, pyspark

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+722.41%)

Mutual labels: spark, pyspark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-72.41%)

Mutual labels: spark, pyspark

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+1053.45%)

Mutual labels: spark, pyspark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (+42.24%)

Mutual labels: spark, pyspark

Pyspark Cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: ✭ 108 (-6.9%)

Mutual labels: spark, pyspark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (+36.21%)

Mutual labels: spark, pyspark

Pyspark Learning

Updated repository

Stars: ✭ 147 (+26.72%)

Mutual labels: spark, pyspark

Linkis

Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+1902.59%)

Mutual labels: spark, pyspark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+2070.69%)

Mutual labels: spark, pyspark

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+72.41%)

Mutual labels: spark, pyspark

incubator-linkis

Stars: ✭ 2,459 (+2019.83%)

Mutual labels: spark, pyspark

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-77.59%)

Mutual labels: spark, pyspark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-80.17%)

Mutual labels: spark, pyspark

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-89.66%)

Mutual labels: spark, pyspark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-56.9%)

Mutual labels: spark, pyspark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-45.69%)

Mutual labels: spark, pyspark

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-78.45%)

Mutual labels: spark, pyspark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+250%)

Mutual labels: spark, pyspark

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+29.31%)

Mutual labels: spark, pyspark

Relation extraction

Relation Extraction using Deep learning(CNN)

Stars: ✭ 96 (-17.24%)

Mutual labels: spark, pyspark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (+33.62%)

Mutual labels: spark, pyspark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-78.45%)

Mutual labels: spark, pyspark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+750%)

Mutual labels: spark, pyspark

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (-44.83%)

Mutual labels: spark, pyspark

Almond

A Scala kernel for Jupyter

Stars: ✭ 1,354 (+1067.24%)

Mutual labels: spark

Java learning practice

java 进阶之路：面试高频算法、akka、多线程、NIO、Netty、SpringBoot、Spark&&Flink 等

Stars: ✭ 110 (-5.17%)

Mutual labels: spark

Pyspark Stubs

Apache (Py)Spark type annotations (stub files).

Stars: ✭ 98 (-15.52%)

Mutual labels: pyspark

Logisland

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.

Stars: ✭ 97 (-16.38%)

Mutual labels: spark

Spring Shiro Spark

Spring-Shiro-Spark是Spring-Boot Hibernate Spark Spark-SQL Shiro iView VueJs... ...的集成尝试

Stars: ✭ 114 (-1.72%)

Mutual labels: spark

Bigdataclass

Two-day workshop that covers how to use R to interact databases and Spark

Stars: ✭ 110 (-5.17%)

Mutual labels: spark

Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Stars: ✭ 97 (-16.38%)

Mutual labels: spark

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (-6.03%)

Mutual labels: spark

Repository

个人学习知识库涉及到数据仓库建模、实时计算、大数据、Java、算法等。

Stars: ✭ 92 (-20.69%)

Mutual labels: spark

Elassandra

Elassandra = Elasticsearch + Apache Cassandra

Stars: ✭ 1,610 (+1287.93%)

Mutual labels: spark

Spark Mllib Twitter Sentiment Analysis

🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib

Stars: ✭ 113 (-2.59%)

Mutual labels: spark

Distributed Dataset

A distributed data processing framework in Haskell.

Stars: ✭ 108 (-6.9%)

Mutual labels: spark

Spark Summit 2017 Sanfrancisco

spark summit 2017 SanFrancisco

Stars: ✭ 93 (-19.83%)

Mutual labels: spark

Big Data

🔧 Use dplyr to analyze Big Data 🐘

Stars: ✭ 93 (-19.83%)

Mutual labels: spark

Spark On Kubernetes Helm

Spark on Kubernetes infrastructure Helm charts repo

Stars: ✭ 92 (-20.69%)

Mutual labels: spark

Xlearning Xdml

extremely distributed machine learning

Stars: ✭ 113 (-2.59%)

Mutual labels: spark

Pyspark Tutorial

PySpark Code for Hands-on Learners

Stars: ✭ 91 (-21.55%)

Mutual labels: pyspark

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bit coin price using Time series analysis and sentiment analysis of tweets on bitcoin

Stars: ✭ 91 (-21.55%)

Mutual labels: pyspark

Udacity Data Engineering

Udacity Data Engineering Nano Degree (DEND)

Stars: ✭ 89 (-23.28%)

Mutual labels: spark

Flink Learning

flink learning blog. http://www.54tianzhisheng.cn/ 含 Flink 入门、概念、原理、实战、性能调优、源码解析等内容。涉及 Flink Connector、Metrics、Library、DataStream API、Table API & SQL 等内容的学习案例，还有 Flink 落地应用的大型项目案例（PVUV、日志存储、百亿数据实时去重、监控告警）分享。欢迎大家支持我的专栏《大数据实时计算引擎 Flink 实战与性能优化》

Stars: ✭ 11,378 (+9708.62%)

Mutual labels: spark

Ammonite Spark

Run spark calculations from Ammonite

Stars: ✭ 88 (-24.14%)

Mutual labels: spark

Teddy

Spark Streaming监控平台，支持任务部署与告警、自启动

Stars: ✭ 120 (+3.45%)

Mutual labels: spark

Ibis

A pandas-like deferred expression system, with first-class SQL support