WeBankFinTech / Dataspherestudio

Licence: apache-2.0
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Programming Languages

java

Projects that are alternatives of or similar to Dataspherestudio

dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (-97.57%)
Mutual labels:  hive, hadoop, hue, flink
God Of Bigdata
Focused on big data learning and interview preparation; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+402.76%)
Mutual labels:  spark, hadoop, flink, hive
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-68.87%)
Mutual labels:  spark, hadoop, etl, hive
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-89.46%)
Mutual labels:  spark, hadoop, flink, hive
Luigi Warehouse
A luigi powered analytics / warehouse stack
Stars: ✭ 72 (-93.97%)
Mutual labels:  spark, etl, hive, workflow
Bigdataguide
Big data learning from scratch, including study videos and interview materials for each stage of learning.
Stars: ✭ 817 (-31.63%)
Mutual labels:  spark, hadoop, flink, hive
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-92.3%)
Mutual labels:  spark, hadoop, flink, hive
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (-86.53%)
Mutual labels:  spark, hadoop, hive, hue
Szt Bigdata
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Stars: ✭ 826 (-30.88%)
Mutual labels:  spark, hadoop, flink, hive
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-98.33%)
Mutual labels:  spark, hadoop, flink
swordfish
Open-source distribute workflow schedule tools, also support streaming task.
Stars: ✭ 35 (-97.07%)
Mutual labels:  spark, hive, hadoop
BigData-News
A real-time big data system project for a news site, based on Spark 2.2
Stars: ✭ 36 (-96.99%)
Mutual labels:  spark, hive, hadoop
cloud
Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper
Stars: ✭ 48 (-95.98%)
Mutual labels:  hive, hadoop, hue
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-98.83%)
Mutual labels:  hive, hadoop, etl
Addax
Addax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (-48.54%)
Mutual labels:  hive, hadoop, etl
DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-97.99%)
Mutual labels:  hive, hadoop, etl
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-95.31%)
Mutual labels:  hive, hadoop, flink
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-98.83%)
Mutual labels:  spark, hadoop, hue
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-97.91%)
Mutual labels:  spark, hadoop, etl
Scriptis
Scriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (-41.76%)
Mutual labels:  spark, hive, hue

DSS

Introduction

DataSphere Studio (DSS for short) is a one-stop data application development and management portal, developed in-house as part of WeDataSphere, the big data platform of WeBank.

Based on Linkis computation middleware, DSS can easily integrate upper-level data application systems, making data application development simple and easy to use.

DataSphere Studio is positioned as a data application development portal that closes the loop over the entire process of data application development. With a unified UI and a workflow-style, graphical drag-and-drop development experience, it covers the full lifecycle of data application development: data import, desensitization and cleansing, data analysis, data mining, quality inspection, visualization, scheduling, and data output.

Thanks to the connection, reuse, and simplification capabilities of Linkis, DSS comes with financial-grade capabilities built in: high concurrency, high availability, multi-tenant isolation, and resource management.

UI preview

Please be patient; the GIF takes some time to load.

DSS-V1.0 GIF

Core features

1. One-stop, full-process application development management UI

       DSS is highly integrated. Currently integrated systems include:

       a. Scriptis - Data Development IDE Tool

       b. Visualis - Data Visualization Tool (based on the open source project Davinci contributed by CreditEase)

       c. Qualitis - Data Quality Management Tool

       d. Azkaban - Batch workflow job scheduler

DSS one-stop video

2. AppJoint, a unique design concept based on Linkis

       AppJoint (application joint) defines unified front-end and back-end integration specifications, so that external data application systems can be integrated quickly and easily and become part of DSS data application development.

       DSS arranges multiple AppJoints in series to form a workflow that supports both real-time execution and scheduled execution. Users can develop an entire data application with simple drag-and-drop operations.

       Since AppJoint is integrated with Linkis, the external data application system shares its capabilities of resource management, concurrency limiting, and high performance. AppJoint also allows context to be shared across systems, which breaks down application silos.
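
The idea of arranging steps in series with a shared context can be illustrated with a minimal Java sketch. This is not the DSS or AppJoint API: the `WorkflowNode` interface, the `SeriesWorkflow` class, the node names, and the context keys are all hypothetical and exist only to show the pattern.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical node abstraction (not the DSS API): one step contributed by an integrated system.
interface WorkflowNode {
    String name();

    // Reads from and writes to a context shared by every node in the workflow.
    void execute(Map<String, Object> context);
}

// Hypothetical workflow that runs its nodes in series, in the order they were added.
class SeriesWorkflow {
    private final List<WorkflowNode> nodes = new ArrayList<>();

    SeriesWorkflow then(WorkflowNode node) {
        nodes.add(node);
        return this;
    }

    // Executes each node once and returns the node names in execution order.
    List<String> run(Map<String, Object> context) {
        List<String> executed = new ArrayList<>();
        for (WorkflowNode node : nodes) {
            node.execute(context);
            executed.add(node.name());
        }
        return executed;
    }
}

class SeriesWorkflowDemo {
    // Builds a node from a name and an action on the shared context.
    static WorkflowNode node(String name, Consumer<Map<String, Object>> action) {
        return new WorkflowNode() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public void execute(Map<String, Object> context) {
                action.accept(context);
            }
        };
    }

    // Three illustrative steps: a script node, a quality check, and an email step.
    static SeriesWorkflow sample() {
        return new SeriesWorkflow()
            .then(node("script", ctx -> ctx.put("rows", 3)))
            .then(node("quality", ctx -> ctx.put("qualityPassed", ((Integer) ctx.get("rows")) > 0)))
            .then(node("sendEmail", ctx -> ctx.put("emailSent", Boolean.TRUE.equals(ctx.get("qualityPassed")))));
    }

    public static void main(String[] args) {
        Map<String, Object> context = new LinkedHashMap<>();
        List<String> order = sample().run(context);
        System.out.println(order + " " + context);
    }
}
```

Each node only sees the shared context, so a later step (here the email step) can depend on the result of an earlier one without knowing which system produced it.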

3. Project, as the management unit

       With Project as the management unit, DSS organizes and manages the business applications of each data application system, and defines a set of common standards for collaborative development of projects across data application systems.

4. Integrated data application components

      a. Azkaban AppJoint —— Batch workflow job scheduler

         Many data applications developed by users require periodic scheduling.

         At present, open source scheduling systems in the community are difficult to integrate with other data application systems.

         DSS implements Azkaban AppJoint, which allows users to publish DSS workflows to Azkaban for regular scheduling.

         DSS also defines standard and generic workflow parsing and publishing specifications for scheduling systems, allowing other scheduling systems to easily achieve low-cost integration with DSS.

Azkaban

      b. Scriptis AppJoint —— Data Development IDE Tool

         What is Scriptis?

         Scriptis is a tool for interactive data analysis. It supports script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function and resource management, and intelligent diagnosis.

         Scriptis AppJoint integrates the data development capabilities of Scriptis into DSS, and allows the various Scriptis script types to serve as nodes in a DSS workflow and take part in the application development process.

         It currently supports HiveSQL, SparkSQL, Pyspark, Scala, and other script node types.

Scriptis

      c. Visualis AppJoint —— Data Visualization Tool

         What is Visualis?

         Visualis is a BI tool for data visualization. It provides financial-grade data visualization capabilities on the basis of data security and permissions, based on the open source project Davinci contributed by CreditEase.

         Visualis AppJoint integrates data visualization capabilities into DSS, and allows displays and dashboards, as nodes of DSS workflows, to be associated with the upstream data market.

Visualis

      d. Qualitis AppJoint —— Data quality management Tool

         Qualitis AppJoint integrates data quality verification capabilities into DSS, allowing Qualitis to serve as a node in DSS workflows.

Qualitis

      e. Data Sender——Sender AppJoint

         Sender AppJoint provides data delivery capability for DSS. Currently it supports the SendEmail node type, and the result sets of all other nodes can be sent via email.

         For example, the SendEmail node can directly send a screenshot of a display as an email.

      f. Signal AppJoint —— Signal Nodes

         Signal AppJoint is used to strengthen the correlation between business and process while keeping them decoupled.

         DataChecker Node: Checks whether a table or partition exists.

         EventSender Node: Sends messages across workflows and projects.

         EventReceiver Node: Receives messages across workflows and projects.

      g. Function node

         Empty nodes, sub workflow nodes.
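
The signal nodes described above (checking for a partition, then sending and receiving a message between workflows) can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the `SignalSketch` class, its methods, and the table, partition, and topic names are hypothetical and do not reflect how DSS implements these nodes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory stand-in for a metastore and a message channel (not the DSS API).
class SignalSketch {
    private final Set<String> partitions = new HashSet<>();
    private final Map<String, Queue<String>> topics = new HashMap<>();

    // Records that a table partition has been produced.
    void registerPartition(String table, String partition) {
        partitions.add(table + "/" + partition);
    }

    // DataChecker-style check: does the table partition exist yet?
    boolean partitionExists(String table, String partition) {
        return partitions.contains(table + "/" + partition);
    }

    // EventSender-style step: publish a message under a topic.
    void send(String topic, String message) {
        topics.computeIfAbsent(topic, t -> new ArrayDeque<>()).add(message);
    }

    // EventReceiver-style step: take the next message for a topic, if there is one.
    Optional<String> receive(String topic) {
        Queue<String> queue = topics.get(topic);
        return queue == null ? Optional.empty() : Optional.ofNullable(queue.poll());
    }

    public static void main(String[] args) {
        SignalSketch signals = new SignalSketch();

        // Upstream workflow: produce a partition, then announce it.
        signals.registerPartition("orders", "ds=2020-01-01");
        signals.send("orders.ready", "ds=2020-01-01");

        // Downstream workflow: wait for the announcement, then verify the data exists.
        Optional<String> partition = signals.receive("orders.ready");
        boolean ready = partition.isPresent() && signals.partitionExists("orders", partition.get());
        System.out.println("downstream can start: " + ready);
    }
}
```

The two workflows share only a topic name and a table name, which is one way to read "strengthening the correlation between business and process while keeping them decoupled".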

Compared with similar systems

      DSS is an open source project leading the direction of data application development and management. The open source community currently does not have similar products.

Usage Scenarios

      DataSphere Studio is suitable for the following scenarios:

      1. Scenarios in which big data platform capability is being prepared or initialized but no data application tools are available.

      2. Scenarios in which users already have big data foundation platform capabilities but only a few data application tools.

      3. Scenarios in which users have a big data foundation platform and comprehensive data application tools, but suffer from strong isolation and high learning costs because those tools have not been integrated.

      4. Scenarios in which users have a big data foundation platform and comprehensive data application tools, some of which have been integrated, but lack unified and standardized specifications.

Quick start

Click to Quick start

Architecture

DSS Architecture

Documents

Compiled documentation

User manual

Quick integration with DSS for external systems

Communication

communication

License

DSS is under the Apache 2.0 license. See the License file for details.
