661 open source projects that are alternatives to or similar to Waimak

Hive

Apache Hive

Stars: ✭ 4,031 (+6618.33%)

Mutual labels: hadoop

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+713.33%)

Mutual labels: hadoop

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-58.33%)

Mutual labels: spark

MLHadoop

This repository contains Machine-Learning MapReduce codes for Hadoop which are written from scratch (without using any package or library). E.g. Prediction (Linear and Logistic Regression), Clustering (K-Means), Classification (KNN) etc.

Stars: ✭ 50 (-16.67%)

Mutual labels: hadoop

Spark Structured Streaming Book

The Internals of Spark Structured Streaming

Stars: ✭ 371 (+518.33%)

Mutual labels: spark

clickhouse hadoop

Import data from ClickHouse to Hadoop with pure SQL

Stars: ✭ 26 (-56.67%)

Mutual labels: hadoop

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-91.67%)

Mutual labels: hadoop

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-60%)

Mutual labels: hadoop

Sidekick

High Performance HTTP Sidecar Load Balancer

Stars: ✭ 366 (+510%)

Mutual labels: spark

Pyspark Examples

Code examples on Apache Spark using Python

Stars: ✭ 58 (-3.33%)

Mutual labels: spark

ibis

IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.

Stars: ✭ 48 (-20%)

Mutual labels: hadoop

darwin

Avro Schema Evolution made easy

Stars: ✭ 26 (-56.67%)

Mutual labels: hadoop

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+501.67%)

Mutual labels: spark

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (-48.33%)

Mutual labels: hadoop

Data Science On Gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Stars: ✭ 864 (+1340%)

Mutual labels: data-engineering

Gis Tools For Hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

Stars: ✭ 485 (+708.33%)

Mutual labels: hadoop

beneath

Beneath is a serverless real-time data platform ⚡️

Stars: ✭ 65 (+8.33%)

Mutual labels: data-engineering

hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Stars: ✭ 38 (-36.67%)

Mutual labels: hadoop

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+503.33%)

Mutual labels: spark

wasp

WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-68.33%)

Mutual labels: hadoop

Akkeeper

An easy way to deploy your Akka services to a distributed environment.

Stars: ✭ 30 (-50%)

Mutual labels: hadoop

h4sci-course

ETH PhD Program course

Stars: ✭ 19 (-68.33%)

Mutual labels: data-engineering

Dataform

Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift

Stars: ✭ 342 (+470%)

Mutual labels: data-engineering

pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

Stars: ✭ 970 (+1516.67%)

Mutual labels: data-engineering

Sparklyr

R interface for Apache Spark

Stars: ✭ 775 (+1191.67%)

Mutual labels: spark

liquibase-impala

Liquibase extension to add Impala Database support

Stars: ✭ 23 (-61.67%)

Mutual labels: hadoop

Sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Stars: ✭ 345 (+475%)

Mutual labels: spark

memex-gate

General Architecture for Text Engineering

Stars: ✭ 47 (-21.67%)

Mutual labels: hadoop

Spark As Service Using Embedded Server

This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server

Stars: ✭ 46 (-23.33%)

Mutual labels: spark

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (-6.67%)

Mutual labels: hadoop

Iql

An ad hoc query service based on the Spark SQL engine.

Stars: ✭ 341 (+468.33%)

Mutual labels: spark

Coding Now

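Study notes, along with eBooks and video resources I have gone through and a collection of blogs, websites, and tools I consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.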

Stars: ✭ 750 (+1150%)

Mutual labels: spark

dbt-sugar

dbt-sugar is a CLI tool that makes performing actions around dbt models fun and easy for dbt users

Stars: ✭ 139 (+131.67%)

Mutual labels: data-engineering

Ozone

Scalable, redundant, and distributed object store for Apache Hadoop

Stars: ✭ 330 (+450%)

Mutual labels: hadoop

uptasticsearch

An Elasticsearch client tailored to data science workflows.

Stars: ✭ 47 (-21.67%)

Mutual labels: data-engineering

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+1490%)

Mutual labels: spark

xxhadoop

Data analysis using Hadoop/Spark/Storm/ElasticSearch/machine learning etc. These are my daily notes/code/demos. Don't fork, just star!

Stars: ✭ 37 (-38.33%)

Mutual labels: hadoop

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (+453.33%)

Mutual labels: spark

preprocessy

Python package for Customizable Data Preprocessing Pipelines

Stars: ✭ 34 (-43.33%)

Mutual labels: data-engineering

Sparkctr

CTR prediction model based on Spark (LR, GBDT, DNN)

Stars: ✭ 740 (+1133.33%)

Mutual labels: spark

corc

An ORC File Scheme for the Cascading data processing platform.

Stars: ✭ 14 (-76.67%)

Mutual labels: hadoop

Cascading

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+430%)

Mutual labels: hadoop

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (-5%)

Mutual labels: data-engineering

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-8.33%)

Mutual labels: spark

disk

A distributed network-drive system built on Hadoop + HBase + Spring Boot

Stars: ✭ 53 (-11.67%)

Mutual labels: hadoop

Tez

Apache Tez

Stars: ✭ 313 (+421.67%)

Mutual labels: hadoop

LogAnalyzeHelper

Data-cleaning program for a forum log analysis system (includes an IP rule library, UDF development, MapReduce programs, and log data)

Stars: ✭ 33 (-45%)

Mutual labels: hadoop

Cdhproject

Usage of the various Hadoop components, continuously updated

Stars: ✭ 733 (+1121.67%)

Mutual labels: spark

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (-11.67%)

Mutual labels: data-engineering

Clickhouse Native Jdbc

ClickHouse Native Protocol JDBC implementation

Stars: ✭ 310 (+416.67%)

Mutual labels: spark

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-70%)

Mutual labels: hadoop

School Of Sre

At LinkedIn, we use this curriculum to onboard our entry-level talent into the SRE role.

Stars: ✭ 5,141 (+8468.33%)

Mutual labels: hadoop

growthbook

Open Source Feature Flagging and A/B Testing Platform

Stars: ✭ 2,342 (+3803.33%)

Mutual labels: data-engineering

Play Spark Scala

Stars: ✭ 51 (-15%)

Mutual labels: spark

Nagios Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Stars: ✭ 1,000 (+1566.67%)

Mutual labels: hadoop

Casper

A compiler for automatically re-targeting sequential Java code to Apache Spark.

Stars: ✭ 45 (-25%)

Mutual labels: spark

smolder

HL7 Apache Spark Datasource

Stars: ✭ 33 (-45%)

Mutual labels: spark

visions

Type System for Data Analysis in Python