661 open source projects that are alternatives to or similar to Waimak

Hive
Apache Hive
Stars: ✭ 4,031 (+6618.33%)
Mutual labels:  hadoop
Bigdata
💎🔥 Big data study notes
Stars: ✭ 488 (+713.33%)
Mutual labels:  hadoop
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-58.33%)
Mutual labels:  spark
MLHadoop
This repository contains Machine-Learning MapReduce codes for Hadoop which are written from scratch (without using any package or library). E.g. Prediction (Linear and Logistic Regression), Clustering (K-Means), Classification (KNN) etc.
Stars: ✭ 50 (-16.67%)
Mutual labels:  hadoop
Spark Structured Streaming Book
The Internals of Spark Structured Streaming
Stars: ✭ 371 (+518.33%)
Mutual labels:  spark
clickhouse hadoop
Import data from clickhouse to hadoop with pure SQL
Stars: ✭ 26 (-56.67%)
Mutual labels:  hadoop
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-91.67%)
Mutual labels:  hadoop
DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-60%)
Mutual labels:  hadoop
Sidekick
High Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (+510%)
Mutual labels:  spark
Pyspark Examples
Code examples on Apache Spark using Python
Stars: ✭ 58 (-3.33%)
Mutual labels:  spark
ibis
IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
Stars: ✭ 48 (-20%)
Mutual labels:  hadoop
darwin
Avro Schema Evolution made easy
Stars: ✭ 26 (-56.67%)
Mutual labels:  hadoop
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+501.67%)
Mutual labels:  spark
hive-jdbc-driver
An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-48.33%)
Mutual labels:  hadoop
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+1340%)
Mutual labels:  data-engineering
Gis Tools For Hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Stars: ✭ 485 (+708.33%)
Mutual labels:  hadoop
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+8.33%)
Mutual labels:  data-engineering
hadoop-crypto
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Stars: ✭ 38 (-36.67%)
Mutual labels:  hadoop
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+503.33%)
Mutual labels:  spark
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-68.33%)
Mutual labels:  hadoop
Akkeeper
An easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (-50%)
Mutual labels:  hadoop
h4sci-course
ETH PhD Program course
Stars: ✭ 19 (-68.33%)
Mutual labels:  data-engineering
Dataform
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+470%)
Mutual labels:  data-engineering
pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (+1516.67%)
Mutual labels:  data-engineering
Sparklyr
R interface for Apache Spark
Stars: ✭ 775 (+1191.67%)
Mutual labels:  spark
liquibase-impala
Liquibase extension to add Impala Database support
Stars: ✭ 23 (-61.67%)
Mutual labels:  hadoop
Sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (+475%)
Mutual labels:  spark
memex-gate
General Architecture for Text Engineering
Stars: ✭ 47 (-21.67%)
Mutual labels:  hadoop
Spark As Service Using Embedded Server
This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server
Stars: ✭ 46 (-23.33%)
Mutual labels:  spark
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-6.67%)
Mutual labels:  hadoop
Iql
An ad hoc query service based on the Spark SQL engine.
Stars: ✭ 341 (+468.33%)
Mutual labels:  spark
Coding Now
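Study notes, along with eBooks I have read, video resources, and blogs, websites, and tools I have collected and found useful. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.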
Stars: ✭ 750 (+1150%)
Mutual labels:  spark
dbt-sugar
dbt-sugar is a CLI tool that makes performing actions around dbt models easier and more fun for dbt users
Stars: ✭ 139 (+131.67%)
Mutual labels:  data-engineering
Ozone
Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+450%)
Mutual labels:  hadoop
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (-21.67%)
Mutual labels:  data-engineering
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+1490%)
Mutual labels:  spark
xxhadoop
Data analysis using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes, code, and demos. Don't fork, just star!
Stars: ✭ 37 (-38.33%)
Mutual labels:  hadoop
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+453.33%)
Mutual labels:  spark
preprocessy
Python package for Customizable Data Preprocessing Pipelines
Stars: ✭ 34 (-43.33%)
Mutual labels:  data-engineering
Sparkctr
CTR prediction model based on spark(LR, GBDT, DNN)
Stars: ✭ 740 (+1133.33%)
Mutual labels:  spark
corc
An ORC File Scheme for the Cascading data processing platform.
Stars: ✭ 14 (-76.67%)
Mutual labels:  hadoop
Cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+430%)
Mutual labels:  hadoop
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (-5%)
Mutual labels:  data-engineering
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-8.33%)
Mutual labels:  spark
disk
A distributed network drive system built on Hadoop + HBase + Spring Boot
Stars: ✭ 53 (-11.67%)
Mutual labels:  hadoop
Tez
Apache Tez
Stars: ✭ 313 (+421.67%)
Mutual labels:  hadoop
LogAnalyzeHelper
Data-cleaning program for a forum log analysis system (includes an IP rule library, UDF development, MapReduce programs, and log data)
Stars: ✭ 33 (-45%)
Mutual labels:  hadoop
Cdhproject
Usage of the various Hadoop components, continuously updated
Stars: ✭ 733 (+1121.67%)
Mutual labels:  spark
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-11.67%)
Mutual labels:  data-engineering
Clickhouse Native Jdbc
ClickHouse Native Protocol JDBC implementation
Stars: ✭ 310 (+416.67%)
Mutual labels:  spark
qs-hadoop
Learning the big data ecosystem
Stars: ✭ 18 (-70%)
Mutual labels:  hadoop
School Of Sre
At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
Stars: ✭ 5,141 (+8468.33%)
Mutual labels:  hadoop
growthbook
Open Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (+3803.33%)
Mutual labels:  data-engineering
Play Spark Scala
Stars: ✭ 51 (-15%)
Mutual labels:  spark
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+1566.67%)
Mutual labels:  hadoop
Casper
A compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (-25%)
Mutual labels:  spark
smolder
HL7 Apache Spark Datasource
Stars: ✭ 33 (-45%)
Mutual labels:  spark
visions
Type System for Data Analysis in Python
Stars: ✭ 136 (+126.67%)
Mutual labels:  spark
arthur-redshift-etl
ELT Code for your Data Warehouse
Stars: ✭ 22 (-63.33%)
Mutual labels:  data-engineering
Data Engineering Book
Accumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (+685%)
Mutual labels:  data-engineering