Spline: Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (-94.91%)
Sparkrdma: RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-96.42%)
Sparta: Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (-91.46%)
Camus: Mirror of LinkedIn's Camus
Stars: ✭ 81 (-98.65%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-97.5%)
ETL-Starter-Kit: 📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL-related work.
Stars: ✭ 21 (-99.65%)
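Technology Talk: A compilation of knowledge on the Java ecosystem: commonly used technical frameworks, open-source middleware, system architecture, databases, architecture case studies from large companies, commonly used third-party libraries, project management, production issue troubleshooting, personal growth, and reflections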
Stars: ✭ 12,136 (+102%)
Bigdata Playground: A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-97.05%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (-72.87%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (-97.67%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-96.4%)
Every Single Day I Tldr: A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-95.86%)
docker-hadoop: Docker image for main Apache Hadoop components (YARN/HDFS)
Stars: ✭ 59 (-99.02%)
phoenix: Apache Phoenix / HBase Spring Boot Microservices
Stars: ✭ 23 (-99.62%)
Thunder: ⚡️ Nepxion Thunder is a distributed RPC framework based on Netty + Hessian + Kafka + ActiveMQ + Tibco + Zookeeper + Redis + Spring Web MVC + Spring Boot + Docker, supporting multiple protocols, multiple components, and multiple serialization formats
Stars: ✭ 204 (-96.6%)
Zenko: Zenko is the open source multi-cloud data controller: own and keep control of your data on any cloud.
Stars: ✭ 353 (-94.12%)
orion: Management and automation platform for Stateful Distributed Systems
Stars: ✭ 77 (-98.72%)
teraslice: Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (-99.2%)
Firecamp: Serverless Platform for stateful services
Stars: ✭ 194 (-96.77%)
Lidea: Real-time monitoring platform for large-scale distributed systems
Stars: ✭ 28 (-99.53%)
logparser: Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
Stars: ✭ 139 (-97.69%)
smart-data-lake: Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (-98.69%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (-93.46%)
Kafdrop: Kafka Web UI
Stars: ✭ 3,158 (-47.44%)
disk: A distributed network disk (cloud drive) system built on Hadoop + HBase + Spring Boot
Stars: ✭ 53 (-99.12%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-93.13%)
Ytk Learn: Ytk-learn is a distributed machine learning library which implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-94.39%)
skein: A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (-97.87%)
learning-spark: Tidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-99.53%)
mango: Core utility library & data connectors designed for simpler usage in Scala
Stars: ✭ 41 (-99.32%)
coolplayflink: Flink: Stateful Computations over Data Streams
Stars: ✭ 14 (-99.77%)
Wirbelsturm: Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-94.47%)
hadoop-etl-udfs: The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-99.72%)
liquibase-impala: Liquibase extension to add Impala Database support
Stars: ✭ 23 (-99.62%)
common-datax: A general-purpose data synchronization microservice based on DataX; a single RESTful interface handles all general data synchronization
Stars: ✭ 51 (-99.15%)
flink-learn: Learning Flink: Flink CEP, Flink Core, Flink SQL
Stars: ✭ 70 (-98.83%)
Moonbox: Moonbox is a DVtaaS (Data Virtualization as a Service) Platform
Stars: ✭ 424 (-92.94%)
Gather Deployment: Gathers scalable TensorFlow and infrastructure deployment
Stars: ✭ 326 (-94.57%)
hive-jdbc-driver: An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-99.48%)
darwin: Avro Schema Evolution made easy
Stars: ✭ 26 (-99.57%)
datasqueeze: Hadoop utility to compact small files
Stars: ✭ 18 (-99.7%)
DaFlow: Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.
Stars: ✭ 24 (-99.6%)
Marmaray: Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (-93.11%)
fsbrowser: Fast desktop client for Hadoop Distributed File System
Stars: ✭ 27 (-99.55%)
flokkr: Documentation placeholder and utilities for all the other containers.
Stars: ✭ 30 (-99.5%)
Operators: Collection of Kubernetes Operators built with KUDO.
Stars: ✭ 175 (-97.09%)
ros hadoop: Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop, Spark, and other HDFS compatible systems.
Stars: ✭ 92 (-98.47%)