Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+19258.33%)

Mutual labels: spark, pyspark

Relation extraction

Relation Extraction using Deep Learning (CNN)

Stars: ✭ 96 (+700%)

Mutual labels: spark, pyspark

Hnswlib

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

Stars: ✭ 108 (+800%)

Mutual labels: spark, pyspark

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+11050%)

Mutual labels: spark, pyspark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (+316.67%)

Mutual labels: spark, pyspark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+8116.67%)

Mutual labels: spark, pyspark

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (+433.33%)

Mutual labels: spark, pyspark

Eat pyspark in 10 days

pyspark🍒🥭 is delicious, just eat it!😋😋

Stars: ✭ 116 (+866.67%)

Mutual labels: spark, pyspark

Pyspark Learning

Updated repository

Stars: ✭ 147 (+1125%)

Mutual labels: spark, pyspark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (+1275%)

Mutual labels: spark, pyspark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (+1191.67%)

Mutual labels: spark, pyspark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (+166.67%)

Mutual labels: spark, pyspark

incubator-linkis

Stars: ✭ 2,459 (+20391.67%)

Mutual labels: spark, pyspark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (+1216.67%)

Mutual labels: spark, pyspark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (+108.33%)

Mutual labels: spark, pyspark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+20883.33%)

Mutual labels: spark, pyspark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+5700%)

Mutual labels: spark, pyspark

Live log analyzer spark

Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.

Stars: ✭ 14 (+16.67%)

Mutual labels: spark, pyspark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+5175%)

Mutual labels: spark, pyspark

Spark python ml examples

Spark 2.0 Python Machine Learning examples

Stars: ✭ 87 (+625%)

Mutual labels: spark, pyspark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+24058.33%)

Mutual labels: spark, pyspark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (+183.33%)

Mutual labels: spark, pyspark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+3283.33%)

Mutual labels: spark, pyspark

Hail

Scalable genomic data analysis.

Stars: ✭ 706 (+5783.33%)

Mutual labels: spark

Yandex Big Data Engineering

Stars: ✭ 17 (+41.67%)

Mutual labels: spark

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+5791.67%)

Mutual labels: spark

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+7641.67%)

Mutual labels: spark

Parquet Generator

Parquet file generator

Stars: ✭ 16 (+33.33%)

Mutual labels: spark

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+5275%)

Mutual labels: spark

Big Data Scala Spark

Coursera's big data course with Scala and Spark

Stars: ✭ 16 (+33.33%)

Mutual labels: spark

Freestyle

A cohesive & pragmatic framework of FP centric Scala libraries

Stars: ✭ 627 (+5125%)

Mutual labels: spark

Dev Setup

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+46483.33%)

Mutual labels: spark

Bigdata Interview

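🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with the author's own answer summaries. Currently covers interview questions and notes for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.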

Stars: ✭ 857 (+7041.67%)

Mutual labels: spark

Chronicler

Scala toolchain for InfluxDB

Stars: ✭ 24 (+100%)

Mutual labels: spark

Sparkling Water

Sparkling Water provides H2O functionality inside a Spark cluster

Stars: ✭ 887 (+7291.67%)

Mutual labels: spark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+47033.33%)

Mutual labels: spark

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+4991.67%)

Mutual labels: spark

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+6783.33%)

Mutual labels: spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+45841.67%)

Mutual labels: spark

Mongo Spark

The MongoDB Spark Connector

Stars: ✭ 588 (+4800%)

Mutual labels: spark

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (+100%)

Mutual labels: pyspark

Bigdataguide

Big data learning: study big data from scratch, including learning videos for each stage of study and interview materials

Stars: ✭ 817 (+6708.33%)

Mutual labels: spark

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+44725%)

Mutual labels: spark

Sparklearning

Learning Apache Spark, including code and data. Most parts can run locally.

Stars: ✭ 558 (+4550%)

Mutual labels: spark

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+6508.33%)

Mutual labels: spark

Spark Daria

Essential Spark extensions and helper methods ✨😲

Stars: ✭ 553 (+4508.33%)

Mutual labels: spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-8.33%)

Mutual labels: spark

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (+6958.33%)

Mutual labels: spark

1-60 of 458 similar projects
