
douban / Dpark

License: BSD-3-Clause
Python clone of Spark, a MapReduce-like framework in Python

Programming Languages

python
139335 projects - #7 most used programming language
javascript
184084 projects - #8 most used programming language
c
50402 projects - #5 most used programming language
HTML
75241 projects
CSS
56736 projects
shell
77523 projects
Dockerfile
14818 projects

Projects that are alternatives of or similar to Dpark

Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (-65.18%)
Mutual labels:  spark, bigdata, mapreduce
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+311.96%)
Mutual labels:  spark, bigdata, mapreduce
Bigdata Interview
🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered from around the web, together with my own answer summaries. Currently covers the Hadoop/Hive/Spark/Flink/HBase/Kafka/Zookeeper frameworks.
Stars: ✭ 857 (-67.88%)
Mutual labels:  spark, bigdata, mapreduce
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (-98.13%)
Mutual labels:  spark, bigdata, stream-processing
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-97.34%)
Mutual labels:  spark, bigdata, mapreduce
Bigdata Notebook
Stars: ✭ 100 (-96.25%)
Mutual labels:  spark, bigdata
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-96.06%)
Mutual labels:  spark, bigdata
Flink Learning
Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source-code analysis. Includes study examples for Flink Connectors, Metrics, Libraries, the DataStream API, and the Table API & SQL, plus case studies of large production Flink applications (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). You are welcome to support my column "The Big Data Real-Time Computing Engine Flink in Practice and Performance Optimization".
Stars: ✭ 11,378 (+326.46%)
Mutual labels:  spark, stream-processing
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-95.28%)
Mutual labels:  spark, bigdata
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-96.55%)
Mutual labels:  spark, mapreduce
Awesome Bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 10,478 (+292.73%)
Mutual labels:  bigdata, stream-processing
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (-35.49%)
Mutual labels:  spark, bigdata
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-96.36%)
Mutual labels:  spark, stream-processing
Sparktutorial
Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-96.06%)
Mutual labels:  spark, bigdata
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-49.85%)
Mutual labels:  spark, bigdata
Lambda Arch
Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-95.84%)
Mutual labels:  spark, bigdata
Hudi
Upserts, Deletes And Incremental Processing on Big Data.
Stars: ✭ 2,586 (-3.07%)
Mutual labels:  bigdata, stream-processing
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-94.75%)
Mutual labels:  spark, bigdata
Redisgears
Dynamic execution framework for your Redis data
Stars: ✭ 152 (-94.3%)
Mutual labels:  stream-processing, mapreduce
Javaorbigdata Interview
A compilation of interview knowledge points for Java developers and big data developers
Stars: ✭ 203 (-92.39%)
Mutual labels:  spark, bigdata

DPark

Join the chat at https://gitter.im/douban/dpark

DPark is a Python clone of Spark, a MapReduce-like computing framework that supports iterative computation.

Installation

## Due to the use of C extensions, some libraries need to be installed first.

$ sudo apt-get install libtool pkg-config build-essential autoconf automake
$ sudo apt-get install python-dev
$ sudo apt-get install libzmq-dev

## Then just pip install dpark (``sudo`` may be needed if you encounter a permission problem).

$ pip install dpark
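
As a quick sanity check after installing, you can run a minimal local job (a sketch that uses the makeRDD API shown later in this README):

from dpark import DparkContext

dc = DparkContext()
# Build a small in-memory RDD, square each element, and sum the results.
print(dc.makeRDD(list(range(10))).map(lambda x: x * x).reduce(lambda x, y: x + y))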

Example

For word counting (wc.py):

from dpark import DparkContext

ctx = DparkContext()
file = ctx.textFile("/tmp/words.txt")
# Split each line into words and pair each word with a count of 1.
words = file.flatMap(lambda x: x.split()).map(lambda x: (x, 1))
# Sum the counts per word and collect the result as a dict.
wc = words.reduceByKey(lambda x, y: x + y).collectAsMap()
print(wc)

This script can run locally or on a Mesos cluster without any modification, just by passing different command-line arguments:

$ python wc.py
$ python wc.py -m process
$ python wc.py -m host[:port]

See examples/ for more use cases.
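
As another small illustration (a sketch, not taken from examples/, assuming the Spark-style filter and count operations that DPark mirrors), counting matching lines in the same file:

from dpark import DparkContext

ctx = DparkContext()
lines = ctx.textFile("/tmp/words.txt")
# Keep only the lines that contain the word "spark" and count them.
print(lines.filter(lambda line: "spark" in line).count())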

Configuration

DPark can run with Mesos 0.9 or higher.

If the $MESOS_MASTER environment variable is set, you can use a shortcut and run DPark on Mesos just by typing

$ python wc.py -m mesos

$MESOS_MASTER can be any Mesos master address, such as

$ export MESOS_MASTER=zk://zk1:2181,zk2:2181,zk3:2181/mesos_master

In order to speed up shuffling, you should deploy Nginx on port 5055 to serve the data in DPARK_WORK_DIR (default: /tmp/dpark), for example:

server {
        listen 5055;
        server_name localhost;
        root /tmp/dpark/;
}

UI

The web UI shows two DAGs:

  1. the stage graph: a stage is a running unit that contains a set of tasks, each of which runs the same operations on one split of an RDD (see the sketch after this list).
  2. the user API call-site graph.
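
For intuition, in a small job like the one below (a minimal sketch; the exact stage split is decided by DPark's scheduler), the groupByKey shuffle typically marks the boundary between two stages:

from dpark import DparkContext

dc = DparkContext()
rdd = dc.makeRDD([("a", 1), ("b", 2), ("a", 3)])
# map runs in the first (shuffle map) stage; groupByKey introduces a shuffle,
# and collect runs in a second stage that reads the shuffled output.
print(rdd.map(lambda kv: (kv[0], kv[1] * 10)).groupByKey().collect())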

UI when running

Just open the URL printed in the log, e.g. start listening on Web UI http://server_01:40812 .

UI after running

  1. Before running, configure LOGHUB & LOGHUB_PATH_FORMAT in dpark.conf and pre-create LOGHUB_DIR.
  2. Get the log hub directory from a log line such as logging/prof to LOGHUB_DIR/2018/09/27/16/b2e3349b-9858-4153-b491-80699c757485-8754, which includes the Mesos framework id.
  3. Run dpark_web.py -p 9999 -l LOGHUB_DIR/2018/09/27/16/b2e3349b-9858-4153-b491-80699c757485-8728/ (dpark_web.py is in tools/), as shown below.
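
Put together, and assuming the web server then listens on the port given with -p (a sketch; the log hub path is the example directory from the steps above):

$ python tools/dpark_web.py -p 9999 -l LOGHUB_DIR/2018/09/27/16/b2e3349b-9858-4153-b491-80699c757485-8728/
## then open http://localhost:9999 in a browser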

UI examples for features

Showing how shuffle map output is shared:

from dpark import DparkContext
m = lambda x: x  # placeholder map function for illustration
rdd = DparkContext().makeRDD([(1, 1)]).map(m).groupByKey()
# Both collect() calls reuse the shuffle map output of the same rdd.
rdd.map(m).collect()
rdd.map(m).collect()

images/share_mapoutput.png

Nodes are combined only if they have the same lineage, forming a logical tree inside a stage; each node then contains a pipeline of RDDs:

# Assuming dc is a DparkContext and get_rdd() returns some RDD (e.g. from dc.textFile()).
rdd1 = get_rdd()
rdd2 = dc.union([get_rdd() for i in range(2)])
rdd3 = get_rdd().groupByKey()
dc.union([rdd1, rdd2, rdd3]).collect()

images/unions.png

More docs (in Chinese)

https://dpark.readthedocs.io/zh_CN/latest/

https://github.com/jackfengji/test_pro/wiki

Mailing list: dpark-users@googlegroups.com (http://groups.google.com/group/dpark-users)
