Scriptis is for interactive data analysis, with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.

Stars: ✭ 696 (+373.47%)

Mutual labels: spark, pyspark

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+810.2%)

Mutual labels: spark, pyspark

Pyspark Cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: ✭ 108 (-26.53%)

Mutual labels: spark, pyspark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-82.99%)

Mutual labels: spark, pyspark

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-82.31%)

Mutual labels: spark, pyspark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+46.94%)

Mutual labels: spark, pyspark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-57.14%)

Mutual labels: spark, pyspark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+570.75%)

Mutual labels: spark, pyspark

Linkis

Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+1480.27%)

Mutual labels: spark, pyspark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+176.19%)

Mutual labels: spark, pyspark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-78.23%)

Mutual labels: spark, pyspark

incubator-linkis

Stars: ✭ 2,459 (+1572.79%)

Mutual labels: spark, pyspark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+330.61%)

Mutual labels: spark, pyspark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-76.87%)

Mutual labels: spark, pyspark

W2v

Word2Vec models with Twitter data using Spark.

Stars: ✭ 64 (-56.46%)

Mutual labels: spark, pyspark

Spark python ml examples

Spark 2.0 Python Machine Learning examples

Stars: ✭ 87 (-40.82%)

Mutual labels: spark, pyspark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-84.35%)

Mutual labels: spark, pyspark

Hnswlib

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

Stars: ✭ 108 (-26.53%)

Mutual labels: spark, pyspark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+1872.11%)

Mutual labels: spark, pyspark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (+7.48%)

Mutual labels: spark, pyspark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+12.93%)

Mutual labels: spark, pyspark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-65.99%)

Mutual labels: spark, pyspark

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+36.05%)

Mutual labels: spark, pyspark

Live log analyzer spark

Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.

Stars: ✭ 14 (-90.48%)

Mutual labels: spark, pyspark

Relation extraction

Relation extraction using deep learning (CNN)

Stars: ✭ 96 (-34.69%)

Mutual labels: spark, pyspark

Pyspark Learning

Updated repository

Stars: ✭ 147 (+0%)

Mutual labels: spark, pyspark

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (-14.29%)

Mutual labels: spark

Horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Stars: ✭ 11,943 (+8024.49%)

Mutual labels: spark

Spark Bigquery Connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

Stars: ✭ 126 (-14.29%)

Mutual labels: spark

Scala Samples

Pieces of Scala code that explain Scala syntax and related topics, such as what you can do with it

Stars: ✭ 125 (-14.97%)

Mutual labels: spark

Data science blogs

A repository to keep track of all the code that I end up writing for my blog posts.

Stars: ✭ 139 (-5.44%)

Mutual labels: spark

Aliyun Emapreduce Datasources

Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Stars: ✭ 132 (-10.2%)

Mutual labels: spark

Spark Infotheoretic Feature Selection

This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.

Stars: ✭ 123 (-16.33%)

Mutual labels: spark

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

Stars: ✭ 122 (-17.01%)

Mutual labels: spark

Repo 2019

BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics

Stars: ✭ 133 (-9.52%)

Mutual labels: pyspark

Deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Stars: ✭ 2,020 (+1274.15%)

Mutual labels: spark

Technology Talk

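A collection of knowledge on the Java ecosystem: commonly used technical frameworks, open-source middleware, system architecture, databases, architecture case studies from large companies, common third-party libraries, project management, production troubleshooting, personal growth, and reflections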

Stars: ✭ 12,136 (+8155.78%)

Mutual labels: spark

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs