Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (+14.29%)

Mutual labels: hive, etl-framework

Datafaker

Datafaker is a tool for generating large-scale test data and flow test data. It generates fake data and inserts it into a variety of data sources. (Test data generation tool.)

Stars: ✭ 327 (+1457.14%)

Mutual labels: hive, bigdata

litemall-dw

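A big data project based on the open-source Litemall e-commerce project. It includes front-end event tracking (openresty+lua), back-end event tracking, a five-layer data warehouse, real-time computing, and user profiling. The big data platform runs on CDH 6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.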

Stars: ✭ 36 (+71.43%)

Mutual labels: hive, azkaban

Pyetl

Python ETL framework

Stars: ✭ 33 (+57.14%)

Mutual labels: hive, etl-framework

dockerfiles

Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...).

Stars: ✭ 29 (+38.1%)

Mutual labels: hive, bigdata

TitanDataOperationSystem

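The best big data project. "Titan Data Operation System" is a full-stack, closed-loop system: a web application for data visualization, flume-kafka-flume for log ingestion, a data warehouse designed in Hive, Spark code for transforming data between warehouse tables and migrating ADS-layer tables to MySQL, and Azkaban for scheduling periodic jobs. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, ECharts, etc.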

Stars: ✭ 62 (+195.24%)

Mutual labels: hive, azkaban

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive

Stars: ✭ 18 (-14.29%)

Mutual labels: hive, pig

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (+500%)

Mutual labels: hive, bigdata

gan deeplearning4j

Automatic feature engineering with Generative Adversarial Networks, built on Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-9.52%)

Mutual labels: bigdata, datascience

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (+166.67%)

Mutual labels: hive, bigdata

react-redux-immutable-webpack-ssr-starter

React + React-Router 4 + Redux + ImmutableJS + Bootstrap + webpack 3 STARTER with server-side rendering, hot reload, and redux-devtools

Stars: ✭ 21 (+0%)

Mutual labels: starter-project

ml-time-series-analysis-on-sales-data

Time series decomposition techniques and a random forest algorithm applied to sales data

Stars: ✭ 34 (+61.9%)

Mutual labels: datamining

SparkTwitterAnalysis

An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.

Stars: ✭ 29 (+38.1%)

Mutual labels: bigdata

UnROOT.jl

Native Julia I/O package to work with CERN ROOT files

Stars: ✭ 52 (+147.62%)

Mutual labels: bigdata

Anomaly Detection

Anomaly detection with anomalize and Google Trends data

Stars: ✭ 38 (+80.95%)

Mutual labels: datascience

waggle-dance

Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.

Stars: ✭ 194 (+823.81%)

Mutual labels: hive

hivemind

Hive API server (offloads most API calls from hived) implemented using Python+SQL

Stars: ✭ 46 (+119.05%)

Mutual labels: hive

web-click-flow

Offline log analysis of website clickstream data

Stars: ✭ 14 (-33.33%)

Mutual labels: hive

cubetl

CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)

Stars: ✭ 21 (+0%)

Mutual labels: etl-framework

etlflow

EtlFlow is an ecosystem of functional Scala libraries, based on ZIO, for writing various tasks and jobs on GCP and AWS.

Stars: ✭ 38 (+80.95%)

Mutual labels: etl-framework

bqv

The simplest tool to manage views of BigQuery.

Stars: ✭ 22 (+4.76%)

Mutual labels: bigdata

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

Stars: ✭ 21 (+0%)

Mutual labels: etl-framework

vite-primevue-starter

Vue 3 starter project for using PrimeVue 3 with Vite 2: pages, layouts, validation

Stars: ✭ 37 (+76.19%)

Mutual labels: starter-project

cds

Data syncing in golang for ClickHouse.

Stars: ✭ 839 (+3895.24%)

Mutual labels: bigdata

awesome-open-mlops

The Fuzzy Labs guide to the universe of open source MLOps

Stars: ✭ 304 (+1347.62%)

Mutual labels: datascience

analytics-platform-ops

Ops and deployment resources for MOJ Analytics platform

Stars: ✭ 18 (-14.29%)

Mutual labels: datascience

DBFilesClient.NET

Deprecated: See DBClientFiles.NET

Stars: ✭ 14 (-33.33%)

Mutual labels: datamining

machine learning from scratch matlab python

Vectorized Machine Learning in Python 🐍 From Scratch

Stars: ✭ 28 (+33.33%)

Mutual labels: datascience

ts-detox-example

Example TypeScript + React-Native + Jest project that integrates Detox for writing end-to-end tests

Stars: ✭ 54 (+157.14%)

Mutual labels: starter-project

ga-fetcher

Fetch Google Analytics data with Google APIs in Node.js 🚠

Stars: ✭ 14 (-33.33%)

Mutual labels: starter-project

awesome-bigdata

A curated list of awesome big data frameworks, resources, and other awesomeness.

Stars: ✭ 11,093 (+52723.81%)

Mutual labels: bigdata

apiary

Apiary provides modules which can be combined to create a federated cloud data lake

Stars: ✭ 30 (+42.86%)

Mutual labels: hive

nl4dv

A python toolkit to create Visualizations (Vis) using natural language (NL) or add an NL interface to existing Vis.

Stars: ✭ 63 (+200%)

Mutual labels: datascience

aaocp

A big data project that analyzes user behavior logs

Stars: ✭ 53 (+152.38%)

Mutual labels: hive

meetups-archivos

Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread, decentralize, and disseminate knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros (research incubators) …

Stars: ✭ 60 (+185.71%)

Mutual labels: bigdata

datascience-environment

Docker Environment for data science

Stars: ✭ 18 (-14.29%)

Mutual labels: datascience

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (+171.43%)

Mutual labels: bigdata

cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

Stars: ✭ 16 (-23.81%)

Mutual labels: hive

genero-nomes

Classifies names by gender according to the IBGE API

Stars: ✭ 33 (+57.14%)

Mutual labels: datascience

HiveJdbcStorageHandler

No description or website provided.

Stars: ✭ 21 (+0%)

Mutual labels: hive

d20datascience

Data science investigations into the mechanics of the world's greatest role playing game

Stars: ✭ 50 (+138.1%)

Mutual labels: datascience

enlite-starter

Enlite Starter - React Dashboard Starter Template with Firebase Auth

Stars: ✭ 28 (+33.33%)

Mutual labels: starter-project

real-estate-neighborhood-prediction

Code to repeat the experiments of "The economic value of neighborhoods: Predicting real estate prices from the urban environment"

Stars: ✭ 53 (+152.38%)

Mutual labels: datamining

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (+47.62%)

Mutual labels: hive

vulkn

Love your Data. Love the Environment. Love VULKИ.

Stars: ✭ 43 (+104.76%)

Mutual labels: bigdata

symfony-lts-docker-starter

🐳 Dockerize your Symfony project using a complete stack (Makefile, Docker Compose, CI, a set of quality assurance tools, tests, ...) built on up-to-date components and best practices.

Stars: ✭ 39 (+85.71%)

Mutual labels: starter-project
