
1,887 open-source projects that are alternatives to or similar to Repository

WASP

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-79.35%)

Mutual labels: hadoop, hbase, hdfs

BigDataTools

Tools for big data

Stars: ✭ 36 (-60.87%)

Mutual labels: hive, hbase, hdfs

web-click-flow

Offline log analysis of website clickstream data

Stars: ✭ 14 (-84.78%)

Mutual labels: hive, hadoop, mapreduce

Model Serving Tutorial

Code and presentation for the Strata Model Serving tutorial

Stars: ✭ 57 (-38.04%)

Mutual labels: kafka, spark, flink

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+1671.74%)

Mutual labels: hadoop, hdfs, spark

DataX-src

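DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between a variety of heterogeneous data sources, including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.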

Stars: ✭ 21 (-77.17%)

Mutual labels: hive, hbase, hdfs

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive

Stars: ✭ 18 (-80.43%)

Mutual labels: hive, hadoop, mapreduce

litemall-dw

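A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking, a data warehouse (five layers), real-time computation, and user profiling. The big data platform uses CDH 6.3.2 (already scripted with vagrant+ansible) and also includes Azkaban workflows.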

Stars: ✭ 36 (-60.87%)

Mutual labels: hive, hbase, flink

leaflet heatmap

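A simple visualization of Huzhou call data. Assuming the data volume is very large, the heatmap cannot be drawn directly in the browser, so the heatmap-drawing step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads an OpenStreetMap layer and the heatmap layer to provide good interactivity. Drawing is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git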

Stars: ✭ 13 (-85.87%)

Mutual labels: spark, hadoop, hdfs
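The leaflet heatmap description above moves heatmap computation offline: many raw points are aggregated into a coarse grid before the browser renders anything. The following is a minimal Python sketch of that aggregation step, assuming points arrive as (latitude, longitude) pairs in degrees; the function names `bin_points` and `to_heat_layer` and the 0.01-degree cell size are hypothetical, and this is not the project's Spark code.

```python
from collections import Counter
import math


def bin_points(points, cell_deg=0.01):
    """Count points per grid cell, keyed by integer (row, col) cell index.

    points: iterable of (lat, lon) pairs in degrees (assumed input format).
    cell_deg: hypothetical cell size in degrees.
    """
    counts = Counter()
    for lat, lon in points:
        # floor() keeps negative coordinates in the correct cell
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


def to_heat_layer(counts, cell_deg=0.01):
    """Turn cell counts into [lat, lon, intensity] triples at cell centers,
    with intensity normalized so the busiest cell is 1.0."""
    if not counts:
        return []
    peak = max(counts.values())
    return [
        [(row + 0.5) * cell_deg, (col + 0.5) * cell_deg, n / peak]
        for (row, col), n in counts.items()
    ]
```

The browser then only has to draw one weighted point per occupied cell instead of every raw record. In a Spark job, the same per-cell counting could be expressed as a map to `(cell, 1)` followed by `reduceByKey`; that is one possible parallelization, not a description of the linked repository.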

Spark Hbase Connector

Connect Spark to HBase for reading and writing data with ease

Stars: ✭ 299 (+225%)

Mutual labels: spark, hbase

Surging

Surging is a microservice engine that provides a lightweight, high-performance, modular RPC request pipeline. The service engine supports the HTTP, TCP, WS, gRPC, Thrift, MQTT, UDP, and DNS protocols. It uses ZooKeeper and Consul as registries and integrates hash, random, polling, and fair polling load-balancing algorithms, built-in service gove…

Stars: ✭ 3,088 (+3256.52%)

Mutual labels: zookeeper, kafka
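The Surging description above names hash, random, and polling among its load-balancing strategies. A minimal Python sketch of those three follows; reading "polling" as round-robin is an assumption, fair polling is left out because the description does not define it, and the class names and addresses are hypothetical rather than Surging's API.

```python
import hashlib
import itertools
import random


class RoundRobin:
    """'Polling': hand out addresses in a fixed cyclic order."""

    def __init__(self, addresses):
        self._cycle = itertools.cycle(list(addresses))

    def pick(self, key=None):
        return next(self._cycle)


class RandomPick:
    """Choose an address uniformly at random on every call."""

    def __init__(self, addresses, rng=None):
        self._addresses = list(addresses)
        self._rng = rng or random.Random()

    def pick(self, key=None):
        return self._rng.choice(self._addresses)


class HashPick:
    """Send the same key (e.g. a caller ID) to the same address.

    Uses SHA-256 rather than Python's built-in hash(), whose str hashing
    is randomized per process and so would not be stable across restarts.
    """

    def __init__(self, addresses):
        self._addresses = list(addresses)

    def pick(self, key):
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._addresses)
        return self._addresses[index]
```

Hash-based selection gives request affinity (useful for caches or sessions) but reshuffles most keys when the address list changes; round-robin and random spread load evenly without affinity.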

Elasticluster

Create clusters of VMs on the cloud and configure them with Ansible.

Stars: ✭ 298 (+223.91%)

Mutual labels: spark, hadoop

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (+232.61%)

Mutual labels: spark, hadoop

Behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Stars: ✭ 286 (+210.87%)

Mutual labels: hadoop, mapreduce

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (+229.35%)

Mutual labels: kafka, spark

Cascading

Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+245.65%)

Mutual labels: hadoop, mapreduce

Gather Deployment

Gathers scalable TensorFlow and infrastructure deployments

Stars: ✭ 326 (+254.35%)

Mutual labels: kafka, hadoop

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (+260.87%)

Mutual labels: kafka, spark

Zenko

Zenko is the open source multi-cloud data controller: own and keep control of your data on any cloud.

Stars: ✭ 353 (+283.7%)

Mutual labels: zookeeper, kafka

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+4879.35%)

Mutual labels: hadoop, hive

Gate And Cse Resources For Students

📚 📖 📚 CSE GATE resources for GATE and CSE aspirants 😎 😁. Show your ❤️ by ⭐️⭐️

Stars: ✭ 321 (+248.91%)

Mutual labels: algorithm, datastructures

Ytk Learn

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Stars: ✭ 337 (+266.3%)

Mutual labels: spark, hadoop

Docker Spark

🚢 Docker image for Apache Spark

Stars: ✭ 78 (-15.22%)

Mutual labels: spark, hadoop

Proalgos Cpp

C++ implementations of well-known (and some rare) algorithms, while following good software development practices

Stars: ✭ 369 (+301.09%)

Mutual labels: algorithm, datastructures

Hive

Apache Hive

Stars: ✭ 4,031 (+4281.52%)

Mutual labels: hadoop, hive

Quickgraph

Generic Graph Data Structures and Algorithms for .NET

Stars: ✭ 386 (+319.57%)

Mutual labels: algorithm, datastructures

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (+294.57%)

Mutual labels: spark, hive

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+4044.57%)

Mutual labels: spark, hadoop

Kafka Connect Ui

Web tool for Kafka Connect

Stars: ✭ 388 (+321.74%)

Mutual labels: kafka, hdfs

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (+202.17%)

Mutual labels: spark, flink

Full Stack Notes

Full-stack engineer's handbook

Stars: ✭ 366 (+297.83%)

Mutual labels: zookeeper, kafka

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (+327.17%)

Mutual labels: spark, hadoop

Big data architect skills

Skills a big data architect should master

Stars: ✭ 400 (+334.78%)

Mutual labels: spark, hadoop

Gpmall

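[Gupao Academy hands-on project] An e-commerce platform built on SpringBoot + Dubbo: microservice architecture, online mall, e-commerce, microservices, high concurrency, kafka, Elasticsearch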

Stars: ✭ 4,241 (+4509.78%)

Mutual labels: zookeeper, kafka

Moonbox

Moonbox is a DVtaaS (Data Virtualization as a Service) Platform

Stars: ✭ 424 (+360.87%)

Mutual labels: spark, hive

Featran

A Scala feature transformation library for data science and machine learning

Stars: ✭ 420 (+356.52%)

Mutual labels: spark, flink

Yanagishima

Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL

Stars: ✭ 424 (+360.87%)

Mutual labels: spark, hive

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+348.91%)

Mutual labels: kafka, spark

Cookbook

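🎉🎉🎉 Technology stack for senior Java architects: any skill can be mastered through "deliberate practice", just like cooking. Here is a Java development technical handbook; all you need to do is practice more. 🏃🏃🏃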

Stars: ✭ 428 (+365.22%)

Mutual labels: zookeeper, kafka

Java Sourcecode Blogs

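Java source code analysis [source code notes]: focused on analyzing the source code of Java back-end frameworks, with new source code analysis articles on Java back-end frameworks published every week.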

Stars: ✭ 448 (+386.96%)

Mutual labels: zookeeper, kafka

Problem Solving Javascript

🔥 Crack your JS interviews ⚡ Collection of the most common JS interview questions with unit tests 🚀

Stars: ✭ 451 (+390.22%)

Mutual labels: algorithm, datastructures

Algorithms and data structures

180+ Algorithm & Data Structure Problems using C++

Stars: ✭ 4,667 (+4972.83%)

Mutual labels: algorithm, datastructures

Marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

Stars: ✭ 414 (+350%)

Mutual labels: spark, hadoop

Yauaa

Yet Another UserAgent Analyzer

Stars: ✭ 472 (+413.04%)

Mutual labels: flink, hive

Pdf

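Programming e-books, e-books, and programming books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and many more categories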

Stars: ✭ 12,009 (+12953.26%)

Mutual labels: spark, hadoop

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (+453.26%)

Mutual labels: spark, mapreduce

Javakeeper

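✍️ A summary of the essential architecture knowledge for Java engineers: covers architectures commonly used at internet companies, such as distributed systems, microservices, and RPC, as well as essential skills such as data storage, caching, and search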

Stars: ✭ 502 (+445.65%)

Mutual labels: zookeeper, kafka

Books Recommendation

Books (and videos) for programmers who want to level up, continuously updated (Programmer Books)

Stars: ✭ 558 (+506.52%)

Mutual labels: zookeeper, kafka

Algorithms And Data Structures In Java

Algorithms and Data Structures in Java

Stars: ✭ 498 (+441.3%)

Mutual labels: algorithm, datastructures

Treelib

An efficient implementation of the tree data structure in Python 2/3.

Stars: ✭ 540 (+486.96%)

Mutual labels: algorithm, datastructures

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+5746.74%)

Mutual labels: spark, hadoop

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+6047.83%)

Mutual labels: spark, hadoop

Freestyle

A cohesive & pragmatic framework of FP centric Scala libraries

Stars: ✭ 627 (+581.52%)

Mutual labels: kafka, spark

Hbase Rdd

Spark RDD to read, write and delete from HBase

Stars: ✭ 277 (+201.09%)

Mutual labels: spark, hbase

Newbie Plan

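📚 Interview guide to the Java technology stack, a technical guide aimed at training your learning methodology 🚀 Mathematics, algorithms, foundational frameworks, in-depth analysis of internals, career insights, technical interviews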

Stars: ✭ 412 (+347.83%)

Mutual labels: algorithm, datastructures

Competitive Programming

📌 📚 Solution of competitive programming problems, code templates, Data Structures and Algorithms, hackathons, interviews and much more.

Stars: ✭ 496 (+439.13%)

Mutual labels: algorithm, datastructures

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+5892.39%)

Mutual labels: spark, flink

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+601.09%)

Mutual labels: spark, hadoop

Scriptis

Scriptis is for interactive data analysis, offering script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.

Stars: ✭ 696 (+656.52%)

Mutual labels: spark, hive

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-94.57%)

Mutual labels: hadoop, hdfs
