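A simple visualization of Huzhou call data. Assuming the data volume is too large for a browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation. The data is first computed in parallel with Apache Spark, Apache Spark is then also used to draw the heatmap, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. In the current Apache Spark implementation, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .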

Stars: ✭ 13 (-79.37%)

Mutual labels: spark

Python Master Courses

Life is short, I use Python

Stars: ✭ 61 (-3.17%)

Mutual labels: spark

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (+44.44%)

Mutual labels: spark

spark-word2vec

A parallel implementation of word2vec based on Spark

Stars: ✭ 24 (-61.9%)

Mutual labels: spark

docker-spark

Apache Spark docker container image (Standalone mode)

Stars: ✭ 34 (-46.03%)

Mutual labels: spark

kubernetes-iperf3

Simple wrapper around iperf3 to measure network bandwidth from all nodes of a Kubernetes cluster

Stars: ✭ 80 (+26.98%)

Mutual labels: benchmark

BigData-News

A real-time big data system project for a news site, based on Spark 2.2

Stars: ✭ 36 (-42.86%)

Mutual labels: spark

shamash

Autoscaling for Google Cloud Dataproc

Stars: ✭ 31 (-50.79%)

Mutual labels: spark

spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL

Stars: ✭ 20 (-68.25%)

Mutual labels: spark

KLUE

📖 Korean NLU Benchmark

Stars: ✭ 420 (+566.67%)

Mutual labels: benchmark

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+3803.17%)

Mutual labels: spark

Spark-PMoF

Spark Shuffle Optimization with RDMA+AEP

Stars: ✭ 28 (-55.56%)

Mutual labels: spark


tpch-spark

TPC-H queries implemented in Spark using the DataFrames API. Tested under Spark 2.4.0

Savvas Savvides


Generating tables

Under the dbgen directory do:

make

This should generate an executable called dbgen. Running:

./dbgen -h

gives you the various options for generating the tables. The simplest case is running:

./dbgen

which generates tables with the extension .tbl at scale factor 1 (the default), roughly 1GB in total across all tables. For tables of a different size, use the -s option:

./dbgen -s 10

will generate roughly 10GB of input data.

You can then either upload your data to HDFS or read it locally.
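If you choose the HDFS route, the upload could be scripted roughly as below. This is only a sketch: the target directory /tpch/input, the HDFS_DIR and DRY_RUN variables, and the upload_tables helper are assumptions made for illustration and are not part of this project. It relies on the standard hdfs dfs -mkdir and hdfs dfs -put commands, and by default it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical HDFS target directory; not defined anywhere in this README.
HDFS_DIR="${HDFS_DIR:-/tpch/input}"

# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
if [ "${DRY_RUN:-1}" = "1" ]; then RUNNER=echo; else RUNNER=; fi

# Copy every .tbl file found in directory $1 (default: current directory)
# into $HDFS_DIR, creating the target directory first.
upload_tables() {
    src="${1:-.}"
    $RUNNER hdfs dfs -mkdir -p "$HDFS_DIR" || return 1
    for f in "$src"/*.tbl; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        $RUNNER hdfs dfs -put "$f" "$HDFS_DIR/" || return 1
    done
}

upload_tables dbgen
```

Set DRY_RUN=0 to actually perform the upload once the printed commands look right for your cluster.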

Running

First compile using:

sbt package

Make sure you set INPUT_DIR and OUTPUT_DIR in the TpchQuery class before compiling, so that they point to the location of the input data and to where the output should be saved.

You can then run a query using:

spark-submit --class "main.scala.TpchQuery" --master MASTER target/scala-2.11/spark-tpc-h-queries_2.11-1.0.jar ##

where ## is the number of the query to run (e.g. 1, 2, ..., 22) and MASTER specifies the Spark mode (e.g. local, yarn, standalone).
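To run all 22 queries in sequence, the spark-submit command above can be wrapped in a small loop. The following is a sketch under stated assumptions: the MASTER default of local, the DRY_RUN variable, and the run_query and run_all_queries helpers are introduced here for illustration only. The jar path and class name are taken from the command above. By default the script only prints the commands.

```shell
#!/bin/sh
# Assumed defaults; adjust MASTER and JAR to match your setup.
MASTER="${MASTER:-local}"
JAR="${JAR:-target/scala-2.11/spark-tpc-h-queries_2.11-1.0.jar}"

# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
if [ "${DRY_RUN:-1}" = "1" ]; then RUNNER=echo; else RUNNER=; fi

# Submit (or print) a single query; $1 is the query number.
run_query() {
    $RUNNER spark-submit --class "main.scala.TpchQuery" --master "$MASTER" "$JAR" "$1"
}

# Run queries 1 through 22 in order, stopping at the first failure.
run_all_queries() {
    q=1
    while [ "$q" -le 22 ]; do
        run_query "$q" || return 1
        q=$((q + 1))
    done
}

run_all_queries
```

Running it with DRY_RUN=0 MASTER=yarn would submit each query to YARN instead of printing the command lines.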

Other Implementations

Data generator (http://www.tpc.org/tpch/)
TPC-H for Hive (https://issues.apache.org/jira/browse/hive-600)
TPC-H for PIG (https://github.com/ssavvides/tpch-pig)
