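A simple visualization of call data from Huzhou. Assuming the data set is too large for a browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation. Apache Spark first processes the data in parallel and then renders the heatmap; leafletjs then loads an OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented in Apache Spark, but the parallel version is slower than single-machine computation, either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .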

Stars: ✭ 13 (-7.14%)

Mutual labels: spark, apache-spark
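
The offline aggregation step described in the first entry can be illustrated with a minimal sketch. This is not the repository's code: it uses plain Python rather than Apache Spark, and the function names, the grid size `cell_deg`, and the `[lat, lon, weight]` output shape are all assumptions made for illustration. The idea is to count points per grid cell (the part a parallel job would do by key) and then emit one weighted point per cell, which is far smaller than the raw data and cheap for a map layer to draw.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cell_of(lat: float, lon: float, cell_deg: float) -> Tuple[int, int]:
    """Map a coordinate to the integer (row, col) index of its grid cell."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def bin_points(points: Iterable[Tuple[float, float]],
               cell_deg: float = 0.01) -> Counter:
    """Count how many points fall into each grid cell.

    In a parallel setting this corresponds to mapping each point to
    (cell, 1) and summing the values per key.
    """
    counts: Counter = Counter()
    for lat, lon in points:
        counts[cell_of(lat, lon, cell_deg)] += 1
    return counts


def to_heat_layer(counts: Counter, cell_deg: float) -> List[List[float]]:
    """Turn cell counts into [lat, lon, weight] triples.

    Each triple uses the cell centre and a weight scaled to the busiest
    cell. The triple layout is a hypothetical output format.
    """
    if not counts:
        return []
    peak = max(counts.values())
    return [
        [(row + 0.5) * cell_deg, (col + 0.5) * cell_deg, n / peak]
        for (row, col), n in sorted(counts.items())
    ]


# Hypothetical sample coordinates, binned on a coarse 1-degree grid.
sample = [(30.1, 120.1), (30.2, 120.3), (31.7, 120.1)]
layer = to_heat_layer(bin_points(sample, cell_deg=1.0), cell_deg=1.0)
```

Because the output size depends on the number of occupied cells rather than the number of records, the grid size is the main knob for trading detail against payload size.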

Mastering Spark Sql Book

The Internals of Spark SQL

Stars: ✭ 234 (+1571.43%)

Mutual labels: spark, apache-spark

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+1657.14%)

Mutual labels: spark, analytics

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+178.57%)

Mutual labels: apache-spark, pyspark

isarn-sketches-spark

Routines and data structures for using isarn-sketches idiomatically in Apache Spark

Stars: ✭ 28 (+100%)

Mutual labels: apache-spark, pyspark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (+257.14%)

Mutual labels: spark, pyspark

spark-gradle-template

Apache Spark in your IDE with gradle

Stars: ✭ 39 (+178.57%)

Mutual labels: spark, apache-spark

Spark-for-data-engineers

Apache Spark for data engineers

Stars: ✭ 22 (+57.14%)

Mutual labels: apache-spark, pyspark

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (+78.57%)

Mutual labels: spark, pyspark

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+17464.29%)

Mutual labels: spark, pyspark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (+142.86%)

Mutual labels: spark, pyspark

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-14.29%)

Mutual labels: spark, pyspark

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+6535.71%)

Mutual labels: spark, apache-spark

Spark Gotchas

Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks

Stars: ✭ 308 (+2100%)

Mutual labels: apache-spark, pyspark

Sparklyr

R interface for Apache Spark

Stars: ✭ 775 (+5435.71%)

Mutual labels: spark, apache-spark

Pyspark Boilerplate

A boilerplate for writing PySpark Jobs

Stars: ✭ 318 (+2171.43%)

Mutual labels: apache-spark, pyspark

Spark Workshop

Apache Spark™ and Scala Workshops

Stars: ✭ 224 (+1500%)

Mutual labels: spark, apache-spark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+1442.86%)

Mutual labels: spark, pyspark

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+1664.29%)

Mutual labels: spark, apache-spark

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+1435.71%)

Mutual labels: spark, apache-spark

spark-twitter-sentiment-analysis

Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

Stars: ✭ 55 (+292.86%)

Mutual labels: apache-spark, pyspark

learn-by-examples

Real-world Spark pipelines examples

Stars: ✭ 84 (+500%)

Mutual labels: apache-spark, pyspark

jupyterlab-sparkmonitor

JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook

Stars: ✭ 78 (+457.14%)

Mutual labels: apache-spark, pyspark

Learningsparkv2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Stars: ✭ 307 (+2092.86%)

Mutual labels: spark, apache-spark

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (+2271.43%)

Mutual labels: spark, apache-spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (+64.29%)

Mutual labels: spark, pyspark

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+578.57%)

Mutual labels: spark, analytics

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (+85.71%)

Mutual labels: spark, pyspark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (+128.57%)

Mutual labels: spark, pyspark

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (+721.43%)

Mutual labels: apache-spark, pyspark

pyspark-asyncactions

Asynchronous actions for PySpark

Stars: ✭ 30 (+114.29%)

Mutual labels: apache-spark, pyspark

mmtf-workshop-2018

Structural Bioinformatics Training Workshop & Hackathon 2018

Stars: ✭ 50 (+257.14%)

Mutual labels: apache-spark, pyspark

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+1328.57%)

Mutual labels: spark, pyspark

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+5564.29%)

Mutual labels: spark, apache-spark

Spark Notebook

Interactive and Reactive Data Science using Scala and Spark.

Stars: ✭ 3,081 (+21907.14%)

Mutual labels: spark, apache-spark

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+27778.57%)

Mutual labels: spark, analytics

Spark Jupyter Aws

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (+1750%)

Mutual labels: spark, apache-spark

Clickhouse Native Jdbc

ClickHouse Native Protocol JDBC implementation

Stars: ✭ 310 (+2114.29%)

Mutual labels: spark, analytics

Coolplayspark

Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more

Stars: ✭ 3,318 (+23600%)

Mutual labels: spark, apache-spark

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (+2492.86%)

Mutual labels: spark, analytics

spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.0.0

Stars: ✭ 23 (+64.29%)

Mutual labels: spark, apache-spark

Big data architect skills

The skills a big data architect should master

Stars: ✭ 400 (+2757.14%)

Mutual labels: spark, analytics

Redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Stars: ✭ 20,147 (+143807.14%)

Mutual labels: spark, analytics

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+2800%)

Mutual labels: spark, pyspark

Spark Structured Streaming Book

The Internals of Spark Structured Streaming

Stars: ✭ 371 (+2550%)

Mutual labels: spark, apache-spark

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (+3564.29%)

Mutual labels: spark, analytics

Scriptis

Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+4871.43%)

Mutual labels: spark, pyspark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+1085.71%)

Mutual labels: spark, pyspark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+17885.71%)

Mutual labels: spark, pyspark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (+78.57%)

Mutual labels: spark, pyspark

1-60 of 908 similar projects
