apache / incubator-linkis

Licence: Apache-2.0 License
Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Programming Languages

Java, Scala, JavaScript, Vue, SCSS, Shell

Projects that are alternatives of or similar to incubator-linkis

Linkis
Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (-5.53%)
Mutual labels:  spark, presto, hive, storage, jdbc, engine, impala, pyspark, udf, thrift-server, resource-manager, jobserver, application-manager, livy, hive-table, linkis, context-service, scriptis
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-91.22%)
Mutual labels:  spark, jdbc, pyspark
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (-85.24%)
Mutual labels:  spark, hive, jdbc
Sqli
orm sql interface, Criteria, CriteriaBuilder, ResultMapBuilder
Stars: ✭ 1,644 (-33.14%)
Mutual labels:  presto, jdbc, impala
Yanagishima
Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Stars: ✭ 424 (-82.76%)
Mutual labels:  spark, presto, hive
Cube.js
📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+387.31%)
Mutual labels:  spark, presto, hive
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+86.3%)
Mutual labels:  presto, hive, jdbc
Scriptis
Scriptis is for interactive data analysis, with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.
Stars: ✭ 696 (-71.7%)
Mutual labels:  spark, hive, pyspark
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (-93.45%)
Mutual labels:  spark, presto, hive
hadoop-data-ingestion-tool
OLAP and ETL of Big Data
Stars: ✭ 17 (-99.31%)
Mutual labels:  presto, engine, impala
BigData-News
A real-time big data system project for a news site, based on Spark 2.2
Stars: ✭ 36 (-98.54%)
Mutual labels:  spark, hive
dpkb
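A collection of big data material, covering distributed storage engines, distributed computation engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse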
Stars: ✭ 123 (-95%)
Mutual labels:  presto, hive
Hadoop Docker
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (-90.32%)
Mutual labels:  spark, hive
hive to es
A small tool for synchronizing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (-99.15%)
Mutual labels:  hive, impala
TiBigData
TiDB connectors for Flink/Hive/Presto
Stars: ✭ 192 (-92.19%)
Mutual labels:  presto, hive
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-99.31%)
Mutual labels:  hive, udf
implyr
SQL backend to dplyr for Impala
Stars: ✭ 74 (-96.99%)
Mutual labels:  jdbc, impala
hive-jdbc-driver
An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-98.74%)
Mutual labels:  hive, jdbc
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+17.89%)
Mutual labels:  spark, pyspark
liquibase-impala
Liquibase extension to add Impala Database support
Stars: ✭ 23 (-99.06%)
Mutual labels:  hive, impala

Linkis

Introduction

Linkis builds a layer of computation middleware between upper-layer applications and underlying engines. Through the standard interfaces that Linkis provides, such as REST/WS/JDBC, upper-layer applications can easily access underlying engines such as MySQL, Spark, Hive, Presto, and Flink, and at the same time share user resources such as unified variables, scripts, UDFs, functions, and resource files.

As a computation middleware, Linkis provides powerful connectivity, reuse, orchestration, expansion, and governance capabilities. By decoupling the application layer from the engine layer, it simplifies the complex network of calls between them, which reduces overall complexity and saves development and maintenance costs.

Since its first release in 2019, Linkis has accumulated more than 700 trial companies and 1000+ sandbox trial users across diverse industries, including finance, banking, telecommunications, manufacturing, and internet companies. Many companies already use Linkis as the unified entrance to the underlying computation and storage engines of their big data platforms.

Features

  • Support for diverse underlying computation/storage engines.
    Currently supported computation/storage engines: Spark, Hive, Python, Presto, ElasticSearch, MLSQL, TiSpark, JDBC, Shell, etc.;
    Computation/storage engines to be supported: Flink (supported in version >=1.0.2), Impala, etc.;
    Supported scripting languages: SparkSQL, HiveQL, Python, Shell, Pyspark, R, Scala, JDBC, etc.

  • Powerful task/request governance capabilities. With services such as Orchestrator, Label Manager, and a customized Spring Cloud Gateway, Linkis provides multi-level label-based, cross-cluster/cross-IDC fine-grained routing, load balancing, multi-tenancy, traffic control, resource control, and orchestration strategies such as dual-active and active-standby;

  • Full-stack computation/storage engine support. As a computation middleware, Linkis receives, executes, and manages tasks and requests for various computation/storage engines, including batch tasks, interactive query tasks, real-time streaming tasks, and storage tasks;

  • Resource management capabilities. ResourceManager not only manages resources for Yarn and Linkis EngineManager, as in Linkis 0.X, but also provides label-based multi-level resource allocation and recycling, giving it powerful resource management capabilities across multiple Yarn clusters and multiple computation resource types;

  • Unified Context Service. Generates a Context ID for each task/request, and associates and manages user and system resource files (JAR, ZIP, Properties, etc.), result sets, parameter variables, functions, etc., across users, systems, and computing engines. Set in one place, referenced automatically everywhere;

  • Unified materials. System and user-level unified material management, which can be shared and transferred across users and systems.
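
The Unified Context Service idea above (one context ID per task/request, values set once and read everywhere) can be illustrated with a toy in-memory store. This is only a sketch: `ContextStore`, `create_context`, `put`, and `get` are hypothetical names chosen for the example and are not the Linkis Context Service API, and the stored path is made up.

```python
import uuid


class ContextStore:
    """Toy in-memory context store (hypothetical; not the Linkis API)."""

    def __init__(self):
        # Maps context ID -> {key: value} shared by everything using that ID.
        self._contexts = {}

    def create_context(self):
        """Generate a new context ID for one task/request."""
        context_id = uuid.uuid4().hex
        self._contexts[context_id] = {}
        return context_id

    def put(self, context_id, key, value):
        """Set a value once under the given context ID."""
        self._contexts[context_id][key] = value

    def get(self, context_id, key):
        """Read the value from any component that knows the context ID."""
        return self._contexts[context_id][key]


store = ContextStore()
cid = store.create_context()

# A "SQL engine" step records a variable and a result-set location.
store.put(cid, "run_date", "2021-01-01")
store.put(cid, "result_set", "/tmp/results/step1")  # made-up path

# A later "Python engine" step reads them back through the same context ID.
print(store.get(cid, "run_date"))
print(store.get(cid, "result_set"))
```

The point of the sketch is that the two steps share nothing except the context ID, which is what lets a value be set in one place and referenced from another engine.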

Supported engine types

| Engine | Supported Version | Linkis 0.X version requirement | Linkis 1.X version requirement | Description |
|---|---|---|---|---|
| Flink | 1.12.2 | >=dev-0.12.0, PR #703 not merged yet. | >=1.0.2 | Flink EngineConn. Supports FlinkSQL code, and also supports submitting a Flink Jar to Linkis Manager to start a new Yarn application. |
| Impala | >=3.2.0, CDH >=6.3.0 | >=dev-0.12.0, PR #703 not merged yet. | ongoing | Impala EngineConn. Supports Impala SQL. |
| Presto | >=0.180 | >=0.11.0 | ongoing | Presto EngineConn. Supports Presto SQL. |
| ElasticSearch | >=6.0 | >=0.11.0 | ongoing | ElasticSearch EngineConn. Supports SQL and DSL code. |
| Shell | Bash >=2.0 | >=0.9.3 | >=1.0.0_rc1 | Shell EngineConn. Supports shell code. |
| MLSQL | >=1.1.0 | >=0.9.1 | ongoing | MLSQL EngineConn. Supports MLSQL code. |
| JDBC | MySQL >=5.0, Hive >=1.2.1 | >=0.9.0 | >=1.0.0_rc1 | JDBC EngineConn. Supports MySQL and HiveQL code. |
| Spark | Apache 2.0.0~2.4.7, CDH >=5.4.0 | >=0.5.0 | >=1.0.0_rc1 | Spark EngineConn. Supports SQL, Scala, Pyspark and R code. |
| Hive | Apache >=1.0.0, CDH >=5.4.0 | >=0.5.0 | >=1.0.0_rc1 | Hive EngineConn. Supports HiveQL code. |
| Hadoop | Apache >=2.6.0, CDH >=5.4.0 | >=0.5.0 | ongoing | Hadoop EngineConn. Supports Hadoop MR/YARN applications. |
| Python | >=2.6 | >=0.5.0 | >=1.0.0_rc1 | Python EngineConn. Supports Python code. |
| TiSpark | 1.1 | >=0.5.0 | ongoing | TiSpark EngineConn. Supports querying TiDB data with SparkSQL. |
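
The "Linkis 1.X version requirement" column above can be read as a simple lookup. The sketch below copies that column into a dictionary and checks whether a given engine is available in a given Linkis 1.X version. The helper names `parse_version` and `is_supported` are hypothetical, and the comparison ignores the `_rc1` suffix, so `1.0.0_rc1` is treated the same as `1.0.0`.

```python
# Linkis 1.X requirement per engine, copied from the table above.
# None means the table lists the engine as "ongoing" for Linkis 1.X.
LINKIS_1X_REQUIREMENT = {
    "Flink": "1.0.2",
    "Impala": None,
    "Presto": None,
    "ElasticSearch": None,
    "Shell": "1.0.0_rc1",
    "MLSQL": None,
    "JDBC": "1.0.0_rc1",
    "Spark": "1.0.0_rc1",
    "Hive": "1.0.0_rc1",
    "Hadoop": None,
    "Python": "1.0.0_rc1",
    "TiSpark": None,
}


def parse_version(text):
    """Turn '1.0.0_rc1' into (1, 0, 0); the '_rc1' suffix is dropped."""
    core = text.split("_")[0]
    return tuple(int(part) for part in core.split("."))


def is_supported(engine, linkis_version):
    """Return True if the table lists the engine as available in this 1.X version."""
    required = LINKIS_1X_REQUIREMENT.get(engine)
    if required is None:
        # Unknown engine, or still marked "ongoing" in the table.
        return False
    return parse_version(linkis_version) >= parse_version(required)


for name in ("Spark", "Flink", "Presto"):
    print(name, is_supported(name, "1.0.0"), is_supported(name, "1.0.3"))
```

Because the data is a snapshot of the table, the dictionary needs updating whenever an "ongoing" engine gains a concrete version requirement.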

Ecosystem

| Component | Description | Linkis 0.x (recommend 0.11.0) Compatible | Linkis 1.x (recommend 1.0.3) Compatible |
|---|---|---|---|
| DataSphereStudio | DataSphere Studio (DSS for short) is WeDataSphere's one-stop data application development and management portal. | DSS 0.9.1 [released] | DSS 1.0.1 [developing] |
| Scriptis | A web tool for data analysis that supports writing scripts online (SQL, Pyspark, HiveQL, etc.) and submitting them to Linkis for execution. | Scriptis merged in DSS (DSS 0.9.1 [released]) | In DSS 1.0.1 [developing] |
| Schedulis | Workflow task scheduling system based on secondary development of Azkaban, with financial-grade features such as high performance, high availability, and multi-tenant resource isolation. | Schedulis 0.6.1 [released] | Schedulis 0.6.2 [developing] |
| Qualitis | Data quality verification tool, providing data verification capabilities such as data integrity and correctness checks. | Qualitis 0.8.0 [released] | Qualitis 0.9.0 [developing] |
| Streamis | Streaming application development and management tool. It supports the release of Flink Jar and Flink SQL, and provides development, debugging, and production management capabilities for streaming applications, such as start-stop, status monitoring, and checkpoints. | No support | Streamis 0.1.0 [developing] |
| Exchangis | A data exchange platform that supports data transmission between structured and unstructured heterogeneous data sources. The upcoming Exchangis 1.0 will be connected with DSS workflow. | No support | Exchangis 1.0.0 [developing] |
| Visualis | A data visualization BI tool based on secondary development of Davinci, an open-source project of CreditEase. It provides users with financial-grade data visualization capabilities in terms of data security. | Visualis 0.5.0 [released] | Visualis 1.0.0 [developing] |
| Prophecis | A one-stop machine learning platform that integrates multiple open-source machine learning frameworks. Prophecis' MLFlow can be connected to DSS workflow through AppConn. | Prophecis 0.2.2 [released] | Prophecis 0.3.0 [developing] |

Download

Please go to the Linkis Releases Page to download a compiled distribution or a source code package of Linkis.

Compile and deploy

Please follow the Compile Guide to compile Linkis from source code.
Please refer to the Deployment Documents to deploy it.

Examples and Guidance

You can find examples and guidance on how to use and manage Linkis in the User Manual, Engine Usage Documents, and API Documents.

Documentation

The documentation of Linkis is in the Linkis-Website Git repository.

Architecture

Linkis services can be divided into three categories: computation governance services, public enhancement services, and microservice governance services.

  • The computation governance services support the 3 major stages of processing a task/request: submission -> preparation -> execution;
  • The public enhancement services include the material library service, context service, and data source service;
  • The microservice governance services include Spring Cloud Gateway, Eureka, and Open Feign.
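
The three stages handled by the computation governance services (submission -> preparation -> execution) can be pictured as a small linear state machine. The sketch below is purely illustrative: `Stage`, `Task`, and `advance` are hypothetical names and do not correspond to Linkis classes. It only shows that a task moves through the stages in a fixed order.

```python
from enum import Enum


class Stage(Enum):
    """The three stages named above, in processing order."""
    SUBMISSION = 1
    PREPARATION = 2
    EXECUTION = 3


class Task:
    """Hypothetical task that records which stages it has passed through."""

    def __init__(self, code):
        self.code = code
        self.stage = Stage.SUBMISSION
        self.history = [Stage.SUBMISSION]

    def advance(self):
        """Move to the next stage; fail if the task is already executing."""
        if self.stage is Stage.EXECUTION:
            raise RuntimeError("task is already in the final stage")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage


task = Task("select 1")
task.advance()  # submission -> preparation
task.advance()  # preparation -> execution
print(" -> ".join(stage.name.lower() for stage in task.history))
```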

Below is the Linkis architecture diagram. You can find more detailed architecture docs in Linkis-Doc/Architecture.

With Linkis as the computation middleware, we have built many applications and tools on top of it in the big data platform suite WeDataSphere. Below are the currently available open-source projects. More projects are upcoming, so please stay tuned.

wedatasphere_stack_Linkis

Contributing

Contributions are always welcome. We need more contributors to build Linkis together, whether through code, documentation, or other support that helps the community.
For code and documentation contributions, please follow the contribution guide.

Contact Us

For any questions or suggestions, please submit an issue.
You can scan the QR code below to join our WeChat and QQ groups for a more immediate response.

introduction05

Meetup videos on Bilibili.

Who is Using Linkis

We opened an issue where users can give feedback and record who is using Linkis.
