DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, SQL Server

Stars: ✭ 116 (-28.83%)

Mutual labels: hadoop

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-56.44%)

Mutual labels: spark

Quill

Compile-time Language Integrated Queries for Scala

Stars: ✭ 1,998 (+1125.77%)

Mutual labels: spark

Technology Talk

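A collection of knowledge on the Java ecosystem: commonly used technical frameworks, open source middleware, system architecture, databases, architecture case studies from large companies, commonly used third-party libraries, project management, production troubleshooting, personal growth, and reflections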

Stars: ✭ 12,136 (+7345.4%)

Mutual labels: spark

Openuba

A robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]

Stars: ✭ 127 (-22.09%)

Mutual labels: spark

Logisland

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (-40.49%)

Mutual labels: spark

Mlfeature

Feature engineering toolkit for Spark MLlib.

Stars: ✭ 12 (-92.64%)

Mutual labels: spark

Spark Ml Source Analysis

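An analysis of the principles behind Spark ML algorithms, together with a detailed walkthrough of their source code implementation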

Stars: ✭ 1,873 (+1049.08%)

Mutual labels: spark

Atsd

Axibase Time Series Database Documentation

Stars: ✭ 68 (-58.28%)

Mutual labels: hadoop

Spark Structured Streaming Book

The Internals of Spark Structured Streaming

Stars: ✭ 371 (+127.61%)

Mutual labels: spark

Spark Lucenerdd

Spark RDD with Lucene's query and entity linkage capabilities

Stars: ✭ 114 (-30.06%)

Mutual labels: spark

Sidekick

High Performance HTTP Sidecar Load Balancer

Stars: ✭ 366 (+124.54%)

Mutual labels: spark

Kontextfrei

Writing application logic for Spark jobs that can be unit-tested without a SparkContext

Stars: ✭ 67 (-58.9%)

Mutual labels: spark

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+121.47%)

Mutual labels: spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-15.95%)

Mutual labels: spark

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+122.09%)

Mutual labels: spark

Src

A light-weight distributed stream computing framework for Golang

Stars: ✭ 67 (-58.9%)

Mutual labels: hadoop

Sparkstreaming

Real-time log analysis and statistics built with Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper; data visualization built with SpringBoot+Echarts

Stars: ✭ 349 (+114.11%)

Mutual labels: spark

Spring Shiro Spark

Spring-Shiro-Spark is an attempt at integrating Spring-Boot Hibernate Spark Spark-SQL Shiro iView VueJs... ...

Stars: ✭ 114 (-30.06%)

Mutual labels: spark

Sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Stars: ✭ 345 (+111.66%)

Mutual labels: spark

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-60.12%)

Mutual labels: spark

Hadoop Common

Mirror of Apache Hadoop common

Stars: ✭ 155 (-4.91%)

Mutual labels: hadoop

Ozone

Scalable, redundant, and distributed object store for Apache Hadoop

Stars: ✭ 330 (+102.45%)

Mutual labels: hadoop

Jumbune

Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,

Stars: ✭ 64 (-60.74%)

Mutual labels: hadoop

Gather Deployment

Gathers scalable TensorFlow and infrastructure deployment

Stars: ✭ 326 (+100%)

Mutual labels: hadoop

Spark Mllib Twitter Sentiment Analysis

🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib

Stars: ✭ 113 (-30.67%)

Mutual labels: spark

Sparklint

A tool for monitoring and tuning Spark jobs for efficiency.