
zhaoyachao / zdh_server

License: Apache-2.0
zdh data collection platform, ETL processing service

Programming Languages

  • scala
  • shell

Projects that are alternatives of or similar to zdh_server

APIConnectors
A curated list of example code to collect data from Web APIs using DataPrep.Connector.
Stars: ✭ 22 (-58.49%)
Mutual labels:  dataconnector, datacollection
neo4j-jdbc
JDBC driver for Neo4j
Stars: ✭ 110 (+107.55%)
Mutual labels:  etl
openmrs-fhir-analytics
A collection of tools for extracting FHIR resources and analytics services on top of that data.
Stars: ✭ 55 (+3.77%)
Mutual labels:  etl
chronicle-etl
📜 A CLI toolkit for extracting and working with your digital history
Stars: ✭ 78 (+47.17%)
Mutual labels:  etl
NBi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an Xml syntax. By the means of NBi, you don't need to develop C# or Java code to specify your tests! Either, you don't need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (+92.45%)
Mutual labels:  etl
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+426.42%)
Mutual labels:  etl
vixtract
www.vixtract.ru
Stars: ✭ 40 (-24.53%)
Mutual labels:  etl
FlowMaster
ETL flow framework based on Yaml configs in Python
Stars: ✭ 19 (-64.15%)
Mutual labels:  etl
link-move
A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (-39.62%)
Mutual labels:  etl
pentaho-gis-plugins
🗺 GIS plugins for Pentaho Data Integration
Stars: ✭ 42 (-20.75%)
Mutual labels:  etl
dbt-databricks
A dbt adapter for Databricks.
Stars: ✭ 115 (+116.98%)
Mutual labels:  etl
hive-metastore-client
A client for connecting and running DDLs on hive metastore.
Stars: ✭ 37 (-30.19%)
Mutual labels:  etl
wikirepo
Python based Wikidata framework for easy dataframe extraction
Stars: ✭ 33 (-37.74%)
Mutual labels:  etl
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-62.26%)
Mutual labels:  etl
awesome-integration
A curated list of awesome system integration software and resources.
Stars: ✭ 117 (+120.75%)
Mutual labels:  etl
id3c
Data logistics system enabling real-time pathogen surveillance. Built for the Seattle Flu Study.
Stars: ✭ 21 (-60.38%)
Mutual labels:  etl
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-62.26%)
Mutual labels:  etl
DataX-src
DataX is an offline data synchronization tool/platform widely used for heterogeneous data, providing efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, PostgreSQL, HDFS, Hive, ADS, HBase, OTS, ODPS, and more.
Stars: ✭ 21 (-60.38%)
Mutual labels:  etl
openrefine-batch
Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that communicates with the OpenRefine API.
Stars: ✭ 76 (+43.4%)
Mutual labels:  etl
iex-stocks
ETL for the IEX Stocks API
Stars: ✭ 19 (-64.15%)
Mutual labels:  etl

Tech Stack

  • spark 2.4.4
  • hadoop 3.1.x
  • hive > 2.3.3
  • kafka 1.x,2.x
  • scala 2.11.12
  • java 1.8

Notes

zdh consists of two parts, the front-end configuration project and the back-end ETL processing service; this repository contains only the ETL processing part.
For the front-end configuration project, see https://github.com/zhaoyachao/zdh_web
zdh_web and zdh_server are kept in sync, and major versions remain compatible: if zdh_web is on version 1.0, any zdh_server 1.x release is compatible with it.
For secondary development, use the dev branch. dev is merged into master only after tests pass, so master may not be the latest, but it is guaranteed to work.

Online Preview

http://zycblog.cn:8081/login
Username: zyc
Password: 123456

Server resources are limited, so the UI is for preview only and does not include the data processing part. Please go easy on it, fellow coders.

Project Introduction

Data collection and ETL processing: data is extracted through the Spark platform and processed with the relevant ETL functions.
To add a new data source, extend the ZdhDataSources common interface and override the necessary functions.

Build and Package

The project is managed with Gradle.
Run the package command in the project root directory:
Windows: gradlew.bat release -x test
Linux:   ./gradlew release -x test

The jars required by the project are generated in the release/libs directory.

To package only the project code:
Windows: gradlew.bat jar
Linux:   ./gradlew jar
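The two command variants above can be wrapped in a small helper that picks the right Gradle wrapper for the current OS. This is a sketch; the wrapper file names come from the doc, while the OS detection logic is an assumption:

```shell
# Print the build command for this platform, skipping tests as the doc recommends.
gradle_cmd() {
  case "$(uname -s)" in
    MINGW*|MSYS*|CYGWIN*) echo "gradlew.bat release -x test" ;;  # Windows shells
    *)                    echo "./gradlew release -x test" ;;    # Linux/macOS
  esac
}
# run it from the project root, e.g.:  eval "$(gradle_cmd)"
```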

Deployment

1 Build the project (see Build and Package above)
2 Download the release directory and modify the startup script
3 Copy the jars under release/copy_spark_jars into the jars directory under your Spark home
4 Run the startup script start_server.sh
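The deployment steps above can be sketched as one script. The release layout (release/copy_spark_jars, release/bin/start_server.sh) comes from this doc; the function name and parameterization are assumptions to adapt to your own paths:

```shell
# Copy the bundled Spark jars and launch the server (steps 3 and 4 above).
deploy_zdh() {
  release_dir="$1"   # the built release/ directory
  spark_home="$2"    # Spark installation, e.g. /opt/spark
  # Step 3: copy the bundled jars into Spark's jars directory
  cp "$release_dir"/copy_spark_jars/*.jar "$spark_home/jars/" || return 1
  # Step 4: launch the server via the startup script
  sh "$release_dir/bin/start_server.sh"
}
# usage: deploy_zdh ./release "$SPARK_HOME"
```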

Startup Script

Note that the project needs log4j.properties, which must be placed separately on the driver machine; the server is launched in client mode.
In the release/bin directory, set the BASE_RUN_PATH variable in the start_server.sh script to the current path.
Then run the start_server.sh script.
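Editing BASE_RUN_PATH can be done with a one-line sed instead of opening the file. The variable name comes from this doc; the exact assignment syntax inside start_server.sh is an assumption, so check the script before relying on this:

```shell
# Rewrite the BASE_RUN_PATH assignment in a startup script in place.
set_base_run_path() {
  script="$1"     # path to start_server.sh
  run_path="$2"   # directory the script should use
  sed -i "s|^BASE_RUN_PATH=.*|BASE_RUN_PATH=$run_path|" "$script"
}
# usage: cd release/bin && set_base_run_path start_server.sh "$(pwd)"
```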

Stop Script

 kill `ps -ef |grep SparkSubmit |grep zdh_server |awk -F ' ' '{print $2}'`
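The same PID extraction can be factored into a filter function so it can be tested on sample ps output before being pointed at kill; the matching patterns are the ones used above, plus a grep -v to exclude the grep processes themselves:

```shell
# Read ps -ef output on stdin and print the PIDs of zdh_server driver processes.
zdh_pids() {
  grep SparkSubmit | grep zdh_server | grep -v grep | awk '{print $2}'
}
# usage: ps -ef | zdh_pids | xargs -r kill
```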

Personal Contact

FAQ

When connecting to TiDB, add the following settings to the zdh_server startup configuration file:
spark.tispark.pd.addresses 192.168.1.100:2379
spark.sql.extensions org.apache.spark.sql.TiExtensions
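A small helper can append the two TiSpark settings above to the configuration file zdh_server is started with. The property keys and values are from this doc; the target file path is an assumption, and the PD address should be replaced with your own cluster's:

```shell
# Append the TiSpark settings to a Spark properties file.
append_tispark_conf() {
  conf="$1"   # e.g. conf/spark-defaults.conf (path is an assumption)
  {
    echo "spark.tispark.pd.addresses 192.168.1.100:2379"
    echo "spark.sql.extensions org.apache.spark.sql.TiExtensions"
  } >> "$conf"
}
# usage: append_tispark_conf conf/spark-defaults.conf
```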