XavientInformationSystems / Data Ingestion Platform

Programming Languages

java

Projects that are alternatives of or similar to Data Ingestion Platform

Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (+41.03%)
Mutual labels:  spark, flink, batch-processing
Bigdata Notebook
Stars: ✭ 100 (+156.41%)
Mutual labels:  spark, flink, storm
Bdp Dataplatform
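A data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, container deployment platforms, and data platform collection, storage, computation, development, and application building.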
Stars: ✭ 456 (+1069.23%)
Mutual labels:  spark, flink, storm
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-48.72%)
Mutual labels:  spark, flink
Big Whale
Scheduling of offline (batch) jobs and monitoring of real-time jobs for Spark, Flink, and similar engines
Stars: ✭ 163 (+317.95%)
Mutual labels:  spark, flink
Sparkstreaming
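💥 🚀 Wraps sparkstreaming to adjust the batch time dynamically (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps sparkstreaming 1.6 - kafka 010 to support SSL.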
Stars: ✭ 179 (+358.97%)
Mutual labels:  spark, flink
Waterdrop
Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+4658.97%)
Mutual labels:  spark, flink
Featran
A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+976.92%)
Mutual labels:  spark, flink
Cloudflow
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (+612.82%)
Mutual labels:  spark, flink
God Of Bigdata
Focused on big data study and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+15305.13%)
Mutual labels:  spark, flink
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+14035.9%)
Mutual labels:  spark, flink
Ecommercerecommendsystem
A real-time, big-data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark
Stars: ✭ 139 (+256.41%)
Mutual labels:  spark, flink
Quicksql
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+4569.23%)
Mutual labels:  spark, flink
Javaorbigdata Interview
A collection of interview topics for Java developers and big data developers
Stars: ✭ 203 (+420.51%)
Mutual labels:  spark, storm
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+223.08%)
Mutual labels:  spark, flink
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+751.28%)
Mutual labels:  spark, storm
Streaming Readings
Papers and reading material on streaming systems
Stars: ✭ 554 (+1320.51%)
Mutual labels:  flink, storm
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+1766.67%)
Mutual labels:  spark, storm
Flink Learning
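flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus case studies of large production Flink projects (PVUV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). You are welcome to support my column "Big Data Real-Time Computing Engine Flink: Practice and Performance Optimization".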
Stars: ✭ 11,378 (+29074.36%)
Mutual labels:  spark, flink
Java learning practice
The road to advanced Java: frequently asked interview algorithms, akka, multithreading, NIO, Netty, SpringBoot, Spark&&Flink, and more
Stars: ✭ 110 (+182.05%)
Mutual labels:  spark, flink

Data Ingestion Platform (DiP)

Check out real-time data ingestion with the Data Ingestion Platform (DiP), which harnesses the power of Apache Apex, Apache Flink, Apache Spark and Apache Storm to provide real-time data ingestion and visualization.

DiP comes with a UI that lets you switch between multiple data streaming engines and combines them under a single platform.

DiP Features

  • Multiple Sources
  • Multiple File Formats
  • Easy to use UI
  • Data Visualization
  • High-level APIs
  • Java, Scala, client bindings

DiP Technology Stack

  • Source System – Web Client
  • Messaging System – Apache Kafka
  • Target System – HDFS, Apache HBase, Apache Hive
  • Reporting System – Apache Phoenix, Apache Zeppelin
  • Streaming APIs – Apache Apex, Apache Flink, Apache Spark and Apache Storm
  • Programming Language – Java
  • IDE – Eclipse
  • Build tool – Apache Maven
  • Operating System – CentOS 7

DiP Architecture

The DiP architecture has four blocks in the middle layer, one for each streaming engine: Apex Streaming, Flink Streaming, Spark Streaming and Storm Streaming.
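As a rough illustration of this four-engine middle layer, the sketch below models the engines as a Java enum and resolves a user's selection to one of them. The module names come from the GitHub links in this README; everything else (the `StreamingEngine` enum, the `fromSelection` helper, and the idea that the UI passes the selection as a string) is a hypothetical assumption and not taken from the DiP code base.

```java
import java.util.Locale;

// Small demo entry point for the sketch.
class EngineSelectionDemo {
    public static void main(String[] args) {
        StreamingEngine engine = StreamingEngine.fromSelection("flink");
        System.out.println(engine + " -> " + engine.module()); // prints "FLINK -> dataingest-flink"
    }
}

// Hypothetical model of the four engine blocks; not DiP's actual classes.
enum StreamingEngine {
    APEX("dataingest-apex"),
    FLINK("dataingest-flink"),
    SPARK("dataingest-spark"),
    STORM("dataingest-storm");

    // Repository module that hosts the engine-specific code (from the GitHub links).
    private final String module;

    StreamingEngine(String module) {
        this.module = module;
    }

    String module() {
        return module;
    }

    // Resolves a selection such as "flink" or " Storm " to an engine.
    // Throws IllegalArgumentException for empty or unknown selections.
    static StreamingEngine fromSelection(String selection) {
        if (selection == null || selection.trim().isEmpty()) {
            throw new IllegalArgumentException("No streaming engine selected");
        }
        return valueOf(selection.trim().toUpperCase(Locale.ROOT));
    }
}
```

Keeping the set of engines in one enum means an unsupported selection fails fast instead of being passed further down the pipeline.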

DiP comes with an easy-to-use UI that offers the following features:

  • Switch between the supported streaming engines by clicking a radio button
  • Process data in XML, JSON and TSV formats
  • Enter data manually in a text area for processing
  • Upload files for batch processing
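To make the three supported formats concrete, here is a minimal sketch of how a payload entered in the text area or uploaded as a file could be classified as XML, JSON or TSV. This is only an assumed heuristic: the `InputFormat` enum and its `detect` method are hypothetical names, and DiP may instead ask the user to pick the format explicitly.

```java
// Small demo entry point for the sketch.
class FormatDetectionDemo {
    public static void main(String[] args) {
        System.out.println(InputFormat.detect("<order><id>1</id></order>")); // prints "XML"
        System.out.println(InputFormat.detect("{\"id\": 1}"));               // prints "JSON"
        System.out.println(InputFormat.detect("1\tbook\t9.99"));             // prints "TSV"
        System.out.println(InputFormat.detect("plain text"));                // prints "UNKNOWN"
    }
}

// Hypothetical classifier for the formats listed above; not DiP's actual code.
enum InputFormat {
    XML, JSON, TSV, UNKNOWN;

    // Guesses the format from the first non-whitespace character,
    // falling back to TSV when the payload contains a tab.
    static InputFormat detect(String payload) {
        if (payload == null) {
            return UNKNOWN;
        }
        String text = payload.trim();
        if (text.isEmpty()) {
            return UNKNOWN;
        }
        char first = text.charAt(0);
        if (first == '<') {
            return XML;
        }
        if (first == '{' || first == '[') {
            return JSON;
        }
        // Check the untrimmed payload so a leading empty TSV field still counts.
        if (payload.indexOf('\t') >= 0) {
            return TSV;
        }
        return UNKNOWN;
    }
}
```

A heuristic like this only inspects the surface of the payload; a real pipeline would still need to parse and validate the data before handing it to a streaming engine.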

DiP on Apex

Apache Apex is an enterprise-grade, YARN-native big data-in-motion platform that unifies stream and batch processing. It processes big data in motion in a way that is highly scalable, highly performant, fault-tolerant, stateful, secure, distributed, and easy to operate.

Blog link - https://techblog.xavient.com/real-time-data-ingestion-dip-apache-apex-co-dev-opportunity/
GitHub link - https://github.com/XavientInformationSystems/Data-Ingestion-Platform/tree/master/dataingest-apex

DiP on Flink

Apache Flink is an open source platform for distributed stream and batch data processing. Flink's core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

Blog link - https://techblog.xavient.com/data-ingestion-platformdip-real-time-data-analysis-flink-streaming/
GitHub link - https://github.com/XavientInformationSystems/Data-Ingestion-Platform/tree/master/dataingest-flink

DiP on Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.

Blog link - https://techblog.xavient.com/real-time-data-ingestion-dip-spark-streaming-co-dev-opportunity/
GitHub link - https://github.com/XavientInformationSystems/Data-Ingestion-Platform/tree/master/dataingest-spark

DiP on Storm

Apache Storm is a free and open source distributed real time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Blog link - https://techblog.xavient.com/real-time-data-ingestion-easy-and-simple-co-dev-opportunity/
GitHub link - https://github.com/XavientInformationSystems/Data-Ingestion-Platform/tree/master/dataingest-storm

Credits: Xavient

Technical team: Neeraj Sabharwal, Mohiuddin Khan Inamdar, Gautam Marya, Puneet Singh, Sumit Chauhan
