Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+2461.46%)

Mutual labels: spark, pyspark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-73.96%)

Mutual labels: spark, pyspark

Live log analyzer spark

Spark Application for analysis of Apache Access logs and detect anamolies! Along with Medium Article.

Stars: ✭ 14 (-85.42%)

Mutual labels: spark, pyspark

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+15.63%)

Mutual labels: spark, pyspark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (+71.88%)

Mutual labels: spark, pyspark

Linkis

Stars: ✭ 2,323 (+2319.79%)

Mutual labels: spark, pyspark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+322.92%)

Mutual labels: spark, pyspark

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-87.5%)

Mutual labels: spark, pyspark

Pyspark Learning

Updated repository

Stars: ✭ 147 (+53.13%)

Mutual labels: spark, pyspark

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-72.92%)

Mutual labels: spark, pyspark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+2522.92%)

Mutual labels: spark, pyspark

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+108.33%)

Mutual labels: spark, pyspark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-66.67%)

Mutual labels: spark, pyspark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+125%)

Mutual labels: spark, pyspark

Scriptis

Scriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+625%)

Mutual labels: spark, pyspark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-76.04%)

Mutual labels: spark, pyspark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (+64.58%)

Mutual labels: spark, pyspark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+927.08%)

Mutual labels: spark, pyspark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+559.38%)

Mutual labels: spark, pyspark

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (-33.33%)

Mutual labels: spark, pyspark

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+56.25%)

Mutual labels: spark, pyspark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+72.92%)

Mutual labels: spark, pyspark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (+61.46%)

Mutual labels: spark, pyspark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-64.58%)

Mutual labels: spark, pyspark

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+893.75%)

Mutual labels: spark, pyspark

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+1293.75%)

Mutual labels: spark, pyspark

Lpa Detector

Optimize and improve the Label propagation algorithm

Stars: ✭ 75 (-21.87%)

Mutual labels: spark

Hops Examples

Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops

Stars: ✭ 84 (-12.5%)

Mutual labels: spark

Spark Twitter Stream Example

"Sentiment analysis" on a live Twitter feed with Apache Spark and Apache Bahir

Stars: ✭ 73 (-23.96%)

Mutual labels: spark

Labs

Research on distributed system

Stars: ✭ 73 (-23.96%)

Mutual labels: spark

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bit coin price using Time series analysis and sentiment analysis of tweets on bitcoin

Stars: ✭ 91 (-5.21%)

Mutual labels: pyspark

Spark States

Custom state store providers for Apache Spark

Stars: ✭ 83 (-13.54%)

Mutual labels: spark

Kamu Cli

Next generation tool for decentralized exchange and transformation of semi-structured data

Stars: ✭ 69 (-28.12%)

Mutual labels: spark

Luigi Warehouse

A luigi powered analytics / warehouse stack

Stars: ✭ 72 (-25%)

Mutual labels: spark

Hadoop cookbook

Cookbook to install Hadoop 2.0+ using Chef

Stars: ✭ 82 (-14.58%)

Mutual labels: spark

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-26.04%)

Mutual labels: spark

Usersessionbehaviorofflineanalysis

四川大学拓思爱诺用户session行为数据离线分析项目

Stars: ✭ 69 (-28.12%)

Mutual labels: spark

Spark Summit 2017 Sanfrancisco

spark summit 2017 SanFrancisco

Stars: ✭ 93 (-3.12%)

Mutual labels: spark

Udacity Data Engineering

Udacity Data Engineering Nano Degree (DEND)

Stars: ✭ 89 (-7.29%)

Mutual labels: spark

Spark Dependencies

Spark job for dependency links

Stars: ✭ 82 (-14.58%)

Mutual labels: spark

Fast Mrmr

An improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).

Stars: ✭ 67 (-30.21%)

Mutual labels: spark

Kontextfrei

Writing application logic for Spark jobs that can be unit-tested without a SparkContext

Stars: ✭ 67 (-30.21%)

Mutual labels: spark

Mleap

MLeap: Deploy ML Pipelines to Production

Stars: ✭ 1,232 (+1183.33%)

Mutual labels: spark

Thingsboard

Open-source IoT Platform - Device management, data collection, processing and visualization.

Stars: ✭ 10,526 (+10864.58%)

Mutual labels: spark

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-32.29%)

Mutual labels: spark

Ammonite Spark

Run spark calculations from Ammonite

Stars: ✭ 88 (-8.33%)

Mutual labels: spark

Lehar

Visualize data using relative ordering

Stars: ✭ 81 (-15.62%)

Mutual labels: spark

Spark Bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.

Stars: ✭ 65 (-32.29%)

Mutual labels: spark

Spark Gbtlr

Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark

Stars: ✭ 81 (-15.62%)

Mutual labels: spark

Pyspark Twitter Stream Mining

Real-time Machine Learning with Apache Spark on Twitter Public Stream

Stars: ✭ 64 (-33.33%)

Mutual labels: spark

Tre

[AKBC 19] Improving Relation Extraction by Pre-trained Language Representations

Stars: ✭ 95 (-1.04%)

Mutual labels: relation-extraction

1-60 of 554 similar projects

›

next*5