1515 open source projects that are alternatives to or similar to hadoop-data-ingestion-tool

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+26847.06%)

Mutual labels: big-data, presto, hadoop

Bigdata Notes

A beginner's guide to big data ⭐

Stars: ✭ 10,991 (+64552.94%)

Mutual labels: phoenix, big-data, hadoop

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+782.35%)

Mutual labels: big-data, hadoop, apache

incubator-linkis

Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+14364.71%)

Mutual labels: presto, engine, impala

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+76117.65%)

Mutual labels: big-data, presto, hadoop

implyr

SQL backend to dplyr for Impala

Stars: ✭ 74 (+335.29%)

Mutual labels: hadoop, impala, apache

TT Tech Space

TT Tech Research Notes

Stars: ✭ 21 (+23.53%)

Mutual labels: big-data, olap, greenplum

Hive

Apache Hive

Stars: ✭ 4,031 (+23611.76%)

Mutual labels: big-data, hadoop, apache

Maha

A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.

Stars: ✭ 101 (+494.12%)

Mutual labels: big-data, presto, druid

Linkis

Stars: ✭ 2,323 (+13564.71%)

Mutual labels: presto, engine, impala

Tez

Apache Tez

Stars: ✭ 313 (+1741.18%)

Mutual labels: big-data, hadoop, apache

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+33170.59%)

Mutual labels: big-data, hadoop

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+129594.12%)

Mutual labels: big-data, hadoop

Bandar Log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 19 (+11.76%)

Mutual labels: big-data, presto

Asakusafw

Asakusa Framework

Stars: ✭ 114 (+570.59%)

Mutual labels: big-data, hadoop

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (+5929.41%)

Mutual labels: big-data, hadoop

Drill

Apache Drill is a distributed MPP query layer for self-describing data

Stars: ✭ 1,619 (+9423.53%)

Mutual labels: big-data, hadoop

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (+652.94%)

Mutual labels: big-data, hadoop

Fili

Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL Databases.

Stars: ✭ 151 (+788.24%)

Mutual labels: big-data, druid

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (+100%)

Mutual labels: big-data, hadoop

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+941.18%)

Mutual labels: big-data, hadoop

Helicalinsight

Helical Insight software is the world’s first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.

Stars: ✭ 214 (+1158.82%)

Mutual labels: big-data, druid

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+9558.82%)

Mutual labels: big-data, hadoop

Presto Go Client

A Presto client for the Go programming language.

Stars: ✭ 183 (+976.47%)

Mutual labels: big-data, presto

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+1164.71%)

Mutual labels: big-data, hadoop

Clickhouse

ClickHouse® is a free analytics DBMS for big data

Stars: ✭ 21,089 (+123952.94%)

Mutual labels: big-data, olap

Kafka Connect Hdfs

Kafka Connect HDFS connector

Stars: ✭ 400 (+2252.94%)

Mutual labels: big-data, hadoop

Orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (+2188.24%)

Mutual labels: big-data, hadoop

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-70.59%)

Mutual labels: big-data, hadoop

Ignite

Apache Ignite

Stars: ✭ 4,027 (+23588.24%)

Mutual labels: big-data, hadoop

masc

Microsoft's contributions for Spark with Apache Accumulo

Stars: ✭ 20 (+17.65%)

Mutual labels: big-data, apache

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (+235.29%)

Mutual labels: big-data, hadoop

Hdfs Shell

HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS

Stars: ✭ 117 (+588.24%)

Mutual labels: big-data, hadoop

phoenix

Apache Phoenix / HBase Spring Boot Microservices

Stars: ✭ 23 (+35.29%)

Mutual labels: phoenix, hadoop

phoenix-hibernate-dialect

An Apache Phoenix Hibernate dialect

Stars: ✭ 20 (+17.65%)

Mutual labels: phoenix, apache

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+723.53%)

Mutual labels: big-data, hadoop

Calcite Avatica

Mirror of Apache Calcite - Avatica

Stars: ✭ 130 (+664.71%)

Mutual labels: big-data, hadoop

dpkb

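A collection of big data material, covering distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse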

Stars: ✭ 123 (+623.53%)

Mutual labels: presto, hadoop

Calcite

Apache Calcite

Stars: ✭ 2,816 (+16464.71%)

Mutual labels: big-data, hadoop

Couchdb Docker

Semi-official Apache CouchDB Docker images

Stars: ✭ 194 (+1041.18%)

Mutual labels: big-data, apache

Cboard

An easy to use, self-service open BI reporting and BI dashboard platform.

Stars: ✭ 2,795 (+16341.18%)

Mutual labels: big-data, olap

Ozone

Scalable, redundant, and distributed object store for Apache Hadoop

Stars: ✭ 330 (+1841.18%)

Mutual labels: big-data, hadoop

big-data-lite

Samples for the Oracle Big Data Lite VM

Stars: ✭ 41 (+141.18%)

Mutual labels: big-data, hadoop

bigdatatutorial

Stars: ✭ 34 (+100%)

Mutual labels: hadoop, greenplum

yarn-prometheus-exporter

Export Hadoop YARN (resource-manager) metrics in Prometheus format

Stars: ✭ 44 (+158.82%)

Mutual labels: hadoop, apache

metriql

The metrics layer for your data. Join us at https://metriql.com/slack

Stars: ✭ 227 (+1235.29%)

Mutual labels: big-data, olap

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-5.88%)

Mutual labels: big-data, hadoop

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (+94.12%)

Mutual labels: phoenix, big-data

couchdb-pkg

Apache CouchDB Packaging support files

Stars: ✭ 24 (+41.18%)

Mutual labels: big-data, apache

hive to es

A small tool for syncing Hive data warehouse data to Elasticsearch

Stars: ✭ 21 (+23.53%)

Mutual labels: hadoop, impala

Phoenix

Mirror of Apache Phoenix

Stars: ✭ 867 (+5000%)

Mutual labels: phoenix, big-data

rastercube

rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-11.76%)

Mutual labels: big-data, hadoop

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+129.41%)

Mutual labels: big-data, hadoop

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Stars: ✭ 32 (+88.24%)

Mutual labels: big-data, hadoop

nifi

Deploy a secured, clustered, auto-scaling NiFi service in AWS.

Stars: ✭ 37 (+117.65%)

Mutual labels: big-data, apache

liquibase-impala

Liquibase extension to add Impala Database support

Stars: ✭ 23 (+35.29%)

Mutual labels: hadoop, impala

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+13782.35%)

Mutual labels: big-data, olap

Cloudbreak

A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.

Stars: ✭ 301 (+1670.59%)

Mutual labels: big-data, hadoop

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (+176.47%)

Mutual labels: big-data, hadoop

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+4758.82%)

Mutual labels: phoenix, hadoop

