366 open source projects that are alternatives to or similar to hadoop-etl-udfs

Eyerissf

A simulator for the Eyeriss chip (a CNN accelerator researched at MIT) and a new DNN framework, "Hive"

Stars: ✭ 68 (+300%)

Mutual labels: hive

Stormtweetssentimentd3viz

Computes and visualizes the sentiment analysis of tweets of US States in real-time using Storm.

Stars: ✭ 25 (+47.06%)

Mutual labels: hadoop

MLHadoop

This repository contains Machine-Learning MapReduce codes for Hadoop which are written from scratch (without using any package or library). E.g. Prediction (Linear and Logistic Regression), Clustering (K-Means), Classification (KNN) etc.

Stars: ✭ 50 (+194.12%)

Mutual labels: hadoop

albis

Albis: High-Performance File Format for Big Data Systems

Stars: ✭ 20 (+17.65%)

Mutual labels: parquet

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (+176.47%)

Mutual labels: hadoop

phoenix

Apache Phoenix / HBase Spring Boot Microservices

Stars: ✭ 23 (+35.29%)

Mutual labels: hadoop

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+5288.24%)

Mutual labels: hadoop

hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Stars: ✭ 38 (+123.53%)

Mutual labels: hadoop

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (+129.41%)

Mutual labels: hadoop

miniparquet

Library to read a subset of Parquet files

Stars: ✭ 38 (+123.53%)

Mutual labels: parquet

Docs4dev

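Documentation for commonly used backend development frameworks, with Chinese translations. Covers the latest official documentation and corresponding Chinese translations for the Spring family (Spring, Spring Boot, Spring Cloud, Spring Security, Spring Session), big data (Apache Hive, HBase, Apache Flume), logging (Log4j2, Logback), HTTP servers (NGINX, Apache), Python, databases (OpenTSDB, MySQL, PostgreSQL), and more.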

Stars: ✭ 974 (+5629.41%)

Mutual labels: hive

Floating Elephants

Docker containers for Hadoop.

Stars: ✭ 19 (+11.76%)

Mutual labels: hadoop

hadoop-ecosystem

Visualizations of the Hadoop Ecosystem

Stars: ✭ 20 (+17.65%)

Mutual labels: hadoop

spark-waimai

A Spark-based big data platform analysis system for food delivery

Stars: ✭ 24 (+41.18%)

Mutual labels: hive

Awkward 0.x

Manipulate arrays of complex data structures as easily as Numpy.

Stars: ✭ 216 (+1170.59%)

Mutual labels: parquet

lib mysqludf redis

Provides MySQL UDF commands to synchronize data from MySQL to Redis.

Stars: ✭ 20 (+17.65%)

Mutual labels: udf

Sqlite Parquet Vtable

A SQLite vtable extension to read Parquet files

Stars: ✭ 167 (+882.35%)

Mutual labels: parquet

hivemind

Hive API server (offloads most API calls from hived) implemented using Python+SQL

Stars: ✭ 46 (+170.59%)

Mutual labels: hive

parquet-extra

A collection of Apache Parquet add-on modules

Stars: ✭ 30 (+76.47%)

Mutual labels: parquet

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (+541.18%)

Mutual labels: parquet

hiveberg

Demonstration of a Hive Input Format for Iceberg

Stars: ✭ 22 (+29.41%)

Mutual labels: hive

docker-hadoop

Docker image for main Apache Hadoop components (Yarn/Hdfs)

Stars: ✭ 59 (+247.06%)

Mutual labels: hadoop

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-70.59%)

Mutual labels: hadoop

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (+405.88%)

Mutual labels: parquet

BigDataTools

Tools for big data

Stars: ✭ 36 (+111.76%)

Mutual labels: hive

columnify

Convert record-oriented data to a columnar format.

Stars: ✭ 28 (+64.71%)

Mutual labels: parquet

Pyetl

Python ETL framework

Stars: ✭ 33 (+94.12%)

Mutual labels: hive

Winutils

winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows

Stars: ✭ 657 (+3764.71%)

Mutual labels: hadoop

Gcs Tools

GCS support for avro-tools, parquet-tools and protobuf

Stars: ✭ 57 (+235.29%)

Mutual labels: parquet

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (+5823.53%)

Mutual labels: parquet

Hiverunner

An Open Source unit test framework for Hive queries based on JUnit 4 and 5

Stars: ✭ 225 (+1223.53%)

Mutual labels: hive

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-5.88%)

Mutual labels: parquet

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+3694.12%)

Mutual labels: hadoop

Skale

High performance distributed data processing engine

Stars: ✭ 390 (+2194.12%)

Mutual labels: parquet

HDFS-Netdisc

A Hadoop-based distributed cloud storage system 🌴

Stars: ✭ 56 (+229.41%)

Mutual labels: hadoop

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (+1917.65%)

Mutual labels: parquet

DataX-src

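DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.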

Stars: ✭ 21 (+23.53%)

Mutual labels: hive

Databook

A facebook for data

Stars: ✭ 26 (+52.94%)

Mutual labels: hive

Tony

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Stars: ✭ 626 (+3582.35%)

Mutual labels: hadoop

Pystore

Fast data store for Pandas time-series data

Stars: ✭ 325 (+1811.76%)

Mutual labels: parquet

Ratatool

A tool for data sampling, data generation, and data diffing

Stars: ✭ 279 (+1541.18%)

Mutual labels: parquet

learning-spark

Tidy up Spark and Hadoop tutorials.

Stars: ✭ 28 (+64.71%)

Mutual labels: hadoop

Javapdf

🍣 100 Java technical e-books in PDF (take pride in downloading and reading them; take shame in merely liking and bookmarking)

Stars: ✭ 609 (+3482.35%)

Mutual labels: hadoop

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+33170.59%)

Mutual labels: hadoop

HybridBackend

Efficient training of deep recommenders on cloud.

Stars: ✭ 30 (+76.47%)

Mutual labels: parquet

Hadoop Attack Library

A collection of pentest tools and resources targeting Hadoop environments

Stars: ✭ 228 (+1241.18%)

Mutual labels: hadoop

meepo

Data migration across heterogeneous storage