
TianLangStudio / DataXServer

License: Apache-2.0
Provides remote multi-language invocation (Thrift Server, HTTP Server) and distributed execution (DataX on YARN) for DataX (https://github.com/alibaba/DataX)

Programming Languages

Scala, Java, Thrift

Projects that are alternatives of or similar to DataXServer

Armeria
Your go-to microservice framework for any situation, from the creator of Netty et al. You can build any type of microservice leveraging your favorite technologies, including gRPC, Thrift, Kotlin, Retrofit, Reactive Streams, Spring Boot and Dropwizard.
Stars: ✭ 3,392 (+2509.23%)
Mutual labels:  thrift, http-server, thrift-server
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+179.23%)
Mutual labels:  yarn, thrift
Addax
Addax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (+373.08%)
Mutual labels:  etl, datax
DataX-src
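DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.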
Stars: ✭ 21 (-83.85%)
Mutual labels:  etl, datax
arcanist-linters
A collection of custom Arcanist linters
Stars: ✭ 64 (-50.77%)
Mutual labels:  thrift
UmaSupporter.WebClient
🏃🏽‍♀️ Front-end application for 'UmaSupporter', a training helper for Uma Musume.
Stars: ✭ 14 (-89.23%)
Mutual labels:  yarn
malloy
A C++ library providing embeddable server & client components for both HTTP and WebSocket.
Stars: ✭ 29 (-77.69%)
Mutual labels:  http-server
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-83.85%)
Mutual labels:  etl
relay-starter-kit
💥 Monorepo template (seed project) pre-configured with GraphQL API, PostgreSQL, React, Relay, Material UI.
Stars: ✭ 3,513 (+2602.31%)
Mutual labels:  yarn
vite-vue-admin
🎉🎉 Admin dashboard built with Vite + Vue3 + TypeScript + Element-plus + Mock 🎉🎉
Stars: ✭ 97 (-25.38%)
Mutual labels:  yarn
ewallet-rest-api
E-Wallet Rest Api Example. Using Node.js, Express and MongoDB.
Stars: ✭ 89 (-31.54%)
Mutual labels:  yarn
docker-symfony
Docker Symfony (PHP-FPM - NGINX - MySQL - MailHog - Redis - RabbitMQ)
Stars: ✭ 32 (-75.38%)
Mutual labels:  yarn
rivery cli
Rivery CLI
Stars: ✭ 16 (-87.69%)
Mutual labels:  etl
maxwell-sink
consume maxwell generated message from kafka,export it to another mysql.
Stars: ✭ 16 (-87.69%)
Mutual labels:  etl
kafka-connect-datagen
A Kafka Connect source connector that generates data for tests
Stars: ✭ 27 (-79.23%)
Mutual labels:  etl
wp-graphql
WordPress REST API exposed via GraphQL
Stars: ✭ 59 (-54.62%)
Mutual labels:  yarn
coconat
🍥 StarterKit Builder for rocket-speed App creation on 🚀 React 17 + 📙 Redux 4 + 🚠 Router 5 + 📪 Webpack 5 + 🎳 Babel 7 + 📜 TypeScript 4 + 🚔 Linters 23 + 🔥 HMR 3
Stars: ✭ 95 (-26.92%)
Mutual labels:  yarn
foxy
Session-based Beast/Asio wrapper requiring C++14
Stars: ✭ 61 (-53.08%)
Mutual labels:  http-server
dotenv-load
Load environment variables from .env, .env.local, .env.production, etc. when running a npm or yarn command.
Stars: ✭ 27 (-79.23%)
Mutual labels:  yarn
cv
✏️✏️ Résumé of a Java software engineer
Stars: ✭ 47 (-63.85%)
Mutual labels:  yarn

DataX Server

Adds remote invocation (Thrift Server, HTTP Server) and distributed execution (DataX on YARN) to DataX.

Feature

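    1. Thrift Server
    2. DataX on YARN
    3. HTTP Server
    4. Single-machine, multi-threaded execution
    5. Single-machine, multi-process execution
    6. Distributed execution (on YARN)
    7. Hybrid execution (YARN plus local multi-process)
    8. Automatic scaling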

TODO

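  • 1. HTTP Server
  • 2. Refactor the code
  • 3. Split the code into sub-projects by function and reorganize package names, so that new features are easier to add
  • 4. Improve the documentation and examples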

Deploy

  Download the release package DataXServer-0.0.1.tar.gz, extract it, and change into the 0.0.1 directory.

  Start the Thrift Server:

./bin/startThriftServer.sh

Submit a test task to the Thrift Server using Node.js:

  cd example/nodejs    
  node submitStream2Stream.js 

 

Develop

Download the source code

 The project depends on Alibaba DataX, so build and install it first:

git clone https://github.com/alibaba/DataX.git 
cd DataX    
mvn install

git clone https://github.com/TianLangStudio/DataXServer.git  
cd DataXServer  
mvn clean compile install -DskipTests

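Running the HTTP server in single-machine, multi-threaded mode (assumes DataX is already deployed and can run job/test_job.json)

  • Configure the DataX installation directory

Set the datax-home property in pom.xml to the directory where DataX is deployed: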

 <datax-home>/data/test/datax</datax-home>
  • Start the HTTP server
 cd httpserver
 mvn scala:run -Dlauncher=httpserver -DskipTests
  • Submit a task and get its task ID
curl -XPOST -d "@<path to job file>" 127.0.0.1:9808/dataxserver/task

tianlang@tianlang:job$ curl -XPOST -d "@job/test_job.json" 127.0.0.1:9808/dataxserver/task
0 (the task ID)

  • Get the task's status, result, and elapsed time
curl  127.0.0.1:9808/dataxserver/task/status/0
curl  127.0.0.1:9808/dataxserver/task/0
curl  127.0.0.1:9808/dataxserver/task/cost/0
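
The submit and query calls above can be wrapped in a few shell functions. This is only a sketch: the function names (`task_url`, `submit_task`, `show_task`) and the `DATAX_SERVER` variable are hypothetical, and it assumes that `/dataxserver/task/<id>` is the "result" endpoint (inferred from the order of the commands above) and that the submit response body is the bare task ID.

```shell
#!/bin/sh
# Host and port are taken from the curl examples above; override via the environment.
DATAX_SERVER="${DATAX_SERVER:-127.0.0.1:9808}"

# Build the URL for a task endpoint.
# $1: endpoint kind ("status", "result", or "cost"); $2: task ID.
task_url() {
    case "$1" in
        status|cost) printf '%s/dataxserver/task/%s/%s\n' "$DATAX_SERVER" "$1" "$2" ;;
        result)      printf '%s/dataxserver/task/%s\n' "$DATAX_SERVER" "$2" ;;
        *)           echo "unknown endpoint: $1" >&2; return 1 ;;
    esac
}

# Submit a job file and print the server's response
# (assumed here to be the task ID, as in the sample session above).
# $1: path to the job JSON file.
submit_task() {
    curl -s -XPOST -d "@$1" "$DATAX_SERVER/dataxserver/task"
}

# Print the status, result, and cost responses for one task.
# $1: task ID.
show_task() {
    for kind in status result cost; do
        printf '%s: ' "$kind"
        curl -s "$(task_url "$kind" "$1")"
        printf '\n'
    done
}
```

After sourcing the file, a session might look like `id=$(submit_task job/test_job.json)` followed by `show_task "$id"`. The response formats are not documented here, so the sketch prints them unparsed.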

Log of a successful run

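Running in single-machine, multi-process mode

  • Configure the DataX installation directory
    Same as in multi-threaded mode
  • Start the server
  cd hamal-yarn
  mvn scala:run -Dlauncher=httpserver-mp -DskipTests
  • Submit and run tasks the same way as in multi-threaded mode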

Running in multi-machine, multi-process mode (on YARN)

  • Configure the DataX installation directory: set datax.home in hamal-yarn/src/main/resources/master.conf to the DataX installation directory
  • Package
cd hamal-yarn
mvn clean package -DskipTests
  • Upload the artifacts to HDFS: upload hamal-yarn/target/hamal-yarn--with-dependencies.jar to /app/hamal/master.jar and hamal-yarn/target/hamal-yarn--package.zip to /app/hamal/executor.zip
hdfs dfs -put hamal-yarn-*-with-dependencies.jar /app/hamal/master.jar
hdfs dfs -put hamal-yarn-*-package.zip /app/hamal/executor.zip
  • Run the Master
yarn jar hamal-yarn-*-with-dependencies.jar  org.tianlangstudio.data.hamal.yarn.Client /app/hamal/master.jar

The running Master is visible in the YARN UI.

  • Submit and run tasks the same way as in multi-threaded mode

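After tasks are submitted, the number of containers increases, and the Master log shows the current number of executors. The maximum number of executors is configured in master.conf. Setting local.num.max to a non-zero value allows executors to be started on the local machine. An executor that stays idle for a while is destroyed automatically.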
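
To illustrate the capping behaviour described above, here is a minimal sketch of one possible scaling rule. It is not taken from the DataXServer source: the function name `desired_executors` and the "one executor per pending task" rule are assumptions, and the split between local and YARN executors (`local.num.max`) is not modelled.

```shell
#!/bin/sh
# Hypothetical scaling rule, NOT taken from the DataXServer source:
# request one executor per pending task, capped at a configured maximum.
# $1: number of pending tasks; $2: maximum number of executors.
desired_executors() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Walk through a few queue lengths with a cap of 4 executors.
for pending in 0 2 4 9; do
    echo "pending=$pending executors=$(desired_executors "$pending" 4)"
done
```

With a cap of 4, nine pending tasks still yield only four executors. The remaining tasks would wait until an executor frees up.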

Screenshots: On Yarn; Hamal Master On Yarn Log

For production use, consider changing the ID generation strategy, the way submitted tasks are stored, and similar defaults.

QA

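  • Compilation fails

Check whether a dependency failed to download. If so, you can install the dependency into your local repository.
You can also try commenting out the recompileMode setting in the pom file.

  • Does DataX have to be installed on every machine in the cluster?

No. DataX can be bundled into the Executor's deployment zip and placed on HDFS.

  • Do the Executor and the Master communicate over HTTP or Thrift?

Neither. Communication between the Executor and the Master is implemented with Akka.

  • Does the number of Executors grow and shrink with the number of tasks?

Yes, but it never exceeds the configured maximum number of Executors.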

Document

TODO

For questions and discussion, join the group:

QQ group: 579896894

