Linkis helps you easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), and provides multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+2409.18%)

Mutual labels: pyspark

Apache Spark Internals

The Internals of Apache Spark

Stars: ✭ 1,045 (+966.33%)

Mutual labels: apache-spark

kafka-compose

🎼 Docker Compose files for various Kafka stacks

Stars: ✭ 32 (-67.35%)

Mutual labels: pyspark

Sparklyr

R interface for Apache Spark

Stars: ✭ 775 (+690.82%)

Mutual labels: apache-spark

spark-gradle-template

Apache Spark in your IDE with gradle

Stars: ✭ 39 (-60.2%)

Mutual labels: apache-spark

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting the Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (-7.14%)

Mutual labels: pyspark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-48.98%)

Mutual labels: pyspark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDFs, functions, resource management, and intelligent diagnosis.

Stars: ✭ 696 (+610.2%)

Mutual labels: pyspark

Azure-Databricks-NYC-Taxi-Workshop

An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset

Stars: ✭ 71 (-27.55%)

Mutual labels: pyspark

Spark Scala Maven Example

Example Maven configuration for a Spark, Scala project

Stars: ✭ 45 (-54.08%)

Mutual labels: apache-spark

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (+238.78%)

Mutual labels: apache-spark

SparkTwitterAnalysis

An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.

Stars: ✭ 29 (-70.41%)

Mutual labels: apache-spark

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+525.51%)

Mutual labels: apache-spark

dlsa

Distributed least squares approximation (dlsa) implemented with Apache Spark

Stars: ✭ 25 (-74.49%)

Mutual labels: pyspark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-35.71%)

Mutual labels: pyspark

lineage

Generate beautiful documentation for your data pipelines in markdown format

Stars: ✭ 16 (-83.67%)

Mutual labels: pyspark

Streaming Readings

Papers and reading materials related to streaming systems

Stars: ✭ 554 (+465.31%)

Mutual labels: apache-spark

kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data sc…

Stars: ✭ 474 (+383.67%)

Mutual labels: pyspark

Spark Examples

Spark examples

Stars: ✭ 41 (-58.16%)

Mutual labels: apache-spark

spark-operator

Operator for managing Spark clusters on Kubernetes and OpenShift.

Stars: ✭ 129 (+31.63%)

Mutual labels: apache-spark

Sparkle

Haskell on Apache Spark.

Stars: ✭ 419 (+327.55%)

Mutual labels: apache-spark

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-82.65%)

Mutual labels: pyspark

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for Big Data analysis and machine learning as IPython/Jupyter notebooks

Stars: ✭ 1,338 (+1265.31%)

Mutual labels: pyspark

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (-41.84%)

Mutual labels: apache-spark

Spark Syntax

This is a repo documenting the best practices in PySpark.

Stars: ✭ 412 (+320.41%)

Mutual labels: pyspark

pulsar-adapters

Apache Pulsar Adapters

Stars: ✭ 18 (-81.63%)

Mutual labels: apache-spark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+906.12%)

Mutual labels: pyspark

Awesome Kafka

A list about Apache Kafka

Stars: ✭ 397 (+305.1%)

Mutual labels: apache-spark

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-70.41%)

Mutual labels: pyspark

Awesome Pulsar

A curated list of Pulsar tools, integrations and resources.

Stars: ✭ 57 (-41.84%)

Mutual labels: apache-spark

Sparkmeasure

This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

Stars: ✭ 368 (+275.51%)

Mutual labels: apache-spark

hyperdrive

Extensible streaming ingestion pipeline on top of Apache Spark

Stars: ✭ 31 (-68.37%)

Mutual labels: apache-spark

parquet-dotnet

🐬 Apache Parquet for modern .NET

Stars: ✭ 199 (+103.06%)

Mutual labels: apache-spark

Cloud Based Sql Engine Using Spark

Cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via the Spark Thrift Server.

Stars: ✭ 30 (-69.39%)

Mutual labels: apache-spark

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-75.51%)

Mutual labels: apache-spark

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-43.88%)

Mutual labels: apache-spark

Spark Flamegraph

Easy CPU Profiling for Apache Spark applications

Stars: ✭ 30 (-69.39%)

Mutual labels: apache-spark

Coolplayspark

Coolplay Spark: Spark source code analysis, Spark libraries, and more

Stars: ✭ 3,318 (+3285.71%)

Mutual labels: apache-spark

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. The size of discovered phrases can be arbitrary, and the tool can be used with languages other than English.

Stars: ✭ 115 (+17.35%)

Mutual labels: pyspark

Cuesheet

A framework for writing Spark 2.x applications in a clean, elegant way

Stars: ✭ 86 (-12.24%)

Mutual labels: apache-spark

pyspark-k8s-boilerplate

Boilerplate for PySpark on Cloud Kubernetes

Stars: ✭ 24 (-75.51%)

Mutual labels: pyspark

Learningsparkv2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]