All Projects → FlinkML → flink-parameter-server

FlinkML / flink-parameter-server

Licence: Apache-2.0 License
Parameter Server implementation in Apache Flink

Programming Languages

scala
5932 projects
Jupyter Notebook
11667 projects
shell
77523 projects

Projects that are alternatives of or similar to flink-parameter-server

Alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Stars: ✭ 2,936 (+5656.86%)
Mutual labels:  flink, flink-ml
FlinkForward201709
Flink Forward 201709
Stars: ✭ 43 (-15.69%)
Mutual labels:  flink
MiniVox
Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".
Stars: ✭ 15 (-70.59%)
Mutual labels:  online-learning
litemall-dw
基于开源Litemall电商项目的大数据项目,包含前端埋点(openresty+lua)、后端埋点;数据仓库(五层)、实时计算和用户画像。大数据平台采用CDH6.3.2(已使用vagrant+ansible脚本化),同时也包含了Azkaban的workflow。
Stars: ✭ 36 (-29.41%)
Mutual labels:  flink
Remote-Work-and-Study-Resources
Free services, tools, articles and other resources for remote workers and distance learners
Stars: ✭ 49 (-3.92%)
Mutual labels:  online-learning
python-libmf
No description or website provided.
Stars: ✭ 24 (-52.94%)
Mutual labels:  online-learning
piglet
A compiler for Pig Latin to Spark and Flink.
Stars: ✭ 23 (-54.9%)
Mutual labels:  flink
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-5.88%)
Mutual labels:  flink
data aggregation
This repository contains the code for the CVPR 2020 paper "Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving"
Stars: ✭ 26 (-49.02%)
Mutual labels:  online-learning
LarkMidTable
LarkMidTable 是一站式开源的数据中台,实现中台的 基础建设,数据治理,数据开发,监控告警,数据服务,数据的可视化,实现高效赋能数据前台并提供数据服务的产品。
Stars: ✭ 873 (+1611.76%)
Mutual labels:  flink
2018-flink-forward-china
Flink Forward China 2018 第一届记录,视频记录 | 文档记录 | 不仅仅是流计算 | More than streaming
Stars: ✭ 25 (-50.98%)
Mutual labels:  flink
FlinkTutorial
FlinkTutorial 专注大数据Flink流试处理技术。从基础入门、概念、原理、实战、性能调优、源码解析等内容,使用Java开发,同时含有Scala部分核心代码。欢迎关注我的博客及github。
Stars: ✭ 46 (-9.8%)
Mutual labels:  flink
flink-k8s-operator
An example of building kubernetes operator (Flink) using Abstract operator's framework
Stars: ✭ 28 (-45.1%)
Mutual labels:  flink
EasyRec
A framework for large scale recommendation algorithms.
Stars: ✭ 599 (+1074.51%)
Mutual labels:  online-learning
np-flink
flink详细学习实践
Stars: ✭ 26 (-49.02%)
Mutual labels:  flink
review-notes
团队分享学习、复盘笔记资料共享。Java、Scala、Flink...
Stars: ✭ 27 (-47.06%)
Mutual labels:  flink
df data service
DataFibers Data Service
Stars: ✭ 31 (-39.22%)
Mutual labels:  flink
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-60.78%)
Mutual labels:  flink
oll-python
Online machine learning algorithms (based on OLL C++ library)
Stars: ✭ 23 (-54.9%)
Mutual labels:  online-learning
mriya
Real-time ETL developed by Flink, data from MySQL to Greenplum. Use canal to parse the MySQL binlog, put it into kafka, use Flink to consume kafka and assemble the data into Greenplum, and more data sources and target sources will be added in the future.
Stars: ✭ 65 (+27.45%)
Mutual labels:  flink

Flink Parameter Server

A Parameter Server implementation based on the Streaming API of Apache Flink.

Parameter Server is an abstraction for model-parallel machine learning (see the work of Li et al.). Our implementation could be used with the Streaming API: it can take a DataStream of data-points as input, and produce a DataStream of model updates. This way, we can implement both online and offline ML algorithms. Currently only asynchronous training is supported.

Build

Use SBT. It can be published to the local SBT cache

sbt publish-local

and then added to a project as a dependency

libraryDependencies += "hu.sztaki.ilab" %% "flink-ps" % "0.1.0"

API

We can use the Parameter Server in the following way:

Parameter Server architecture

Basically, we can access the Parameter Server by defining a WorkerLogic, which can pull or push parameters. We provide input data to the worker via a Flink DataStream.

We need to implement the WorkerLogic trait

trait WorkerLogic[T, Id, P, WOut] extends Serializable {
  def onRecv(data: T, ps: ParameterServerClient[Id, P, WOut]): Unit
  def onPullRecv(paramId: Id, paramValue: P, ps: ParameterServerClient[Id, P, WOut]): Unit
}

where we can handle incoming data (onRecv), pull parameters from the Parameter Server, handle the answers to the pulls (onPullRecv), and push parameters to the Parameter Server or output results. We can use the ParameterServerClient:

trait ParameterServerClient[Id, P, WOut] extends Serializable {
  def pull(id: Id): Unit
  def push(id: Id, deltaUpdate: P): Unit
  def output(out: WOut): Unit
}

When we defined our worker logic we can wire it into a Flink job with the transform method of FlinkParameterServer.

def transform[T, Id, P, WOut](
  trainingData: DataStream[T],
  workerLogic: WorkerLogic[T, Id, P, WOut],
  paramInit: => Id => P,
  paramUpdate: => (P, P) => P,
  workerParallelism: Int,
  psParallelism: Int,
  iterationWaitTime: Long): DataStream[Either[WOut, (Id, P)]]

Besides the trainingData stream and the workerLogic, we need to define how the Parameter Server should initialize a parameter based on the parameter id (paramInit), and how to update a parameter based on a received push (paramUpdate). We must also define how many parallel instances of workers and parameter servers we should use (workerParallelism and psParallelism), and the iterationWaitTime (see Limitations).

There are also other options to define a DataStream transformation with a Parameter Server which let us specialize the process in more detail. See the different methods of FlinkParameterServer.

Limitations

We implement the two-way communication of workers and the parameter server with Flink Streaming iterations, which is not yet production-ready. The main issues are

  • Sometimes deadlocks due to cyclic backpressure. A workaround could be to limiting the amount of unanswered pulls per worker (e.g. by using WorkerLogic.addPullLimiter), or manually limiting the input rate of data on the input stream. In any case, deadlock would still be possible.
  • Termination is not defined for finite input. As a workaround, we can set the iterationwaitTime for the milliseconds to wait before shutting down if there's no messages sent along the iteration (see the Flink (Java Docs)https://ci.apache.org/projects/flink/flink-docs-master/api/java/)).
  • No fault tolerance.

All these issues are being addressed in FLIP-15 and FLIP-16 and soon to be fixed. Until then, we need to use workarounds.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].