1887 open source projects that are alternatives to or similar to Repository

wasp
WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-79.35%)
Mutual labels:  hadoop, hbase, hdfs
BigDataTools
Tools for big data
Stars: ✭ 36 (-60.87%)
Mutual labels:  hive, hbase, hdfs
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-84.78%)
Mutual labels:  hive, hadoop, mapreduce
Model Serving Tutorial
Code and presentation for Strata Model Serving tutorial
Stars: ✭ 57 (-38.04%)
Mutual labels:  kafka, spark, flink
Ibis
A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+1671.74%)
Mutual labels:  hadoop, hdfs, spark
DataX-src
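DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.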
Stars: ✭ 21 (-77.17%)
Mutual labels:  hive, hbase, hdfs
GooglePlay-Web-Crawler
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Stars: ✭ 18 (-80.43%)
Mutual labels:  hive, hadoop, mapreduce
litemall-dw
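A big data project based on the open source Litemall e-commerce project. It covers front-end event tracking (openresty+lua) and back-end event tracking, a five-layer data warehouse, real-time computing, and user profiling. The big data platform uses CDH6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.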
Stars: ✭ 36 (-60.87%)
Mutual labels:  hive, hbase, flink
leaflet heatmap
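A simple visualization of call data from Huzhou. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is used again to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to achieve good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .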
Stars: ✭ 13 (-85.87%)
Mutual labels:  spark, hadoop, hdfs
Spark Hbase Connector
Connect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (+225%)
Mutual labels:  spark, hbase
Surging
Surging is a micro-service engine that provides a lightweight, high-performance, modular RPC request pipeline. The service engine supports HTTP, TCP, WS, gRPC, Thrift, MQTT, UDP, and DNS protocols. It integrates ZooKeeper and Consul as registries, and offers hash, random, polling, and fair polling as load-balancing algorithms, built-in service gove…
Stars: ✭ 3,088 (+3256.52%)
Mutual labels:  zookeeper, kafka
Elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (+223.91%)
Mutual labels:  spark, hadoop
Spline
Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+232.61%)
Mutual labels:  spark, hadoop
Behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Stars: ✭ 286 (+210.87%)
Mutual labels:  hadoop, mapreduce
Zat
Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (+229.35%)
Mutual labels:  kafka, spark
Cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+245.65%)
Mutual labels:  hadoop, mapreduce
Gather Deployment
Gathers scalable tensorflow and infrastructure deployment
Stars: ✭ 326 (+254.35%)
Mutual labels:  kafka, hadoop
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+260.87%)
Mutual labels:  kafka, spark
Zenko
Zenko is the open source multi-cloud data controller: own and keep control of your data on any cloud.
Stars: ✭ 353 (+283.7%)
Mutual labels:  zookeeper, kafka
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+4879.35%)
Mutual labels:  hadoop, hive
Gate And Cse Resources For Students
📚 📖 📚CSE GATE Resources for GATE and CSE Aspirants 😎 😁 . Show your ❤️ by ⭐️⭐️
Stars: ✭ 321 (+248.91%)
Mutual labels:  algorithm, datastructures
Ytk Learn
Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (+266.3%)
Mutual labels:  spark, hadoop
Docker Spark
🚢 Docker image for Apache Spark
Stars: ✭ 78 (-15.22%)
Mutual labels:  spark, hadoop
Proalgos Cpp
C++ implementations of well-known (and some rare) algorithms, while following good software development practices
Stars: ✭ 369 (+301.09%)
Mutual labels:  algorithm, datastructures
Hive
Apache Hive
Stars: ✭ 4,031 (+4281.52%)
Mutual labels:  hadoop, hive
Quickgraph
Generic Graph Data Structures and Algorithms for .NET
Stars: ✭ 386 (+319.57%)
Mutual labels:  algorithm, datastructures
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+294.57%)
Mutual labels:  spark, hive
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+4044.57%)
Mutual labels:  spark, hadoop
Kafka Connect Ui
Web tool for Kafka Connect
Stars: ✭ 388 (+321.74%)
Mutual labels:  kafka, hdfs
Cloudflow
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (+202.17%)
Mutual labels:  spark, flink
Full Stack Notes
Full-stack engineer's handbook
Stars: ✭ 366 (+297.83%)
Mutual labels:  zookeeper, kafka
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+327.17%)
Mutual labels:  spark, hadoop
Big data architect skills
Skills a big data architect should master
Stars: ✭ 400 (+334.78%)
Mutual labels:  spark, hadoop
Gpmall
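[Gupao Academy hands-on project] An e-commerce platform built on SpringBoot+Dubbo: microservice architecture, online mall, e-commerce, microservices, high concurrency, kafka, Elasticsearch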
Stars: ✭ 4,241 (+4509.78%)
Mutual labels:  zookeeper, kafka
Moonbox
Moonbox is a DVtaaS (Data Virtualization as a Service) Platform
Stars: ✭ 424 (+360.87%)
Mutual labels:  spark, hive
Featran
A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+356.52%)
Mutual labels:  spark, flink
Yanagishima
Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Stars: ✭ 424 (+360.87%)
Mutual labels:  spark, hive
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+348.91%)
Mutual labels:  kafka, spark
Cookbook
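🎉🎉🎉 Advanced Java architect technology stack == Any skill can be thoroughly mastered through "deliberate practice". Just like cooking, here is a Java development handbook; all you need to do is practice more. 🏃🏃🏃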
Stars: ✭ 428 (+365.22%)
Mutual labels:  zookeeper, kafka
Java Sourcecode Blogs
Java source code analysis [Source Code Notes]: focused on source code analysis of Java back-end frameworks, with new analysis articles published weekly.
Stars: ✭ 448 (+386.96%)
Mutual labels:  zookeeper, kafka
Problem Solving Javascript
🔥 Crack your JS interviews ⚡ Collection of the most common JS interview questions with unit tests 🚀
Stars: ✭ 451 (+390.22%)
Mutual labels:  algorithm, datastructures
Algorithms and data structures
180+ Algorithm & Data Structure Problems using C++
Stars: ✭ 4,667 (+4972.83%)
Mutual labels:  algorithm, datastructures
Marmaray
Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+350%)
Mutual labels:  spark, hadoop
Yauaa
Yet Another UserAgent Analyzer
Stars: ✭ 472 (+413.04%)
Mutual labels:  flink, hive
Pdf
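Programming e-books, e-books, programming books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithm series, computer science, design patterns, software testing, refactoring and optimization, and more categories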
Stars: ✭ 12,009 (+12953.26%)
Mutual labels:  spark, hadoop
Cdap
An open source framework for building data analytic applications.
Stars: ✭ 509 (+453.26%)
Mutual labels:  spark, mapreduce
Javakeeper
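✍️ A summary of the architecture knowledge every Java engineer needs: covers distributed systems, microservices, RPC, and other architectures commonly used at internet companies, as well as essential skills such as data storage, caching, and search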
Stars: ✭ 502 (+445.65%)
Mutual labels:  zookeeper, kafka
Books Recommendation
Books (and videos) for advancing as a programmer, continuously updated (Programmer Books)
Stars: ✭ 558 (+506.52%)
Mutual labels:  zookeeper, kafka
Algorithms And Data Structures In Java
Algorithms and Data Structures in Java
Stars: ✭ 498 (+441.3%)
Mutual labels:  algorithm, datastructures
Treelib
An efficient implementation of a tree data structure in Python 2/3.
Stars: ✭ 540 (+486.96%)
Mutual labels:  algorithm, datastructures
Alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+5746.74%)
Mutual labels:  spark, hadoop
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+6047.83%)
Mutual labels:  spark, hadoop
Freestyle
A cohesive & pragmatic framework of FP centric Scala libraries
Stars: ✭ 627 (+581.52%)
Mutual labels:  kafka, spark
Hbase Rdd
Spark RDD to read, write and delete from HBase
Stars: ✭ 277 (+201.09%)
Mutual labels:  spark, hbase
Newbie Plan
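📚 Java technology stack interview guide, a technical guide aimed at training learning methodology 🚀 Math, algorithms, basic frameworks, principle analysis, career reflections, technical interviews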
Stars: ✭ 412 (+347.83%)
Mutual labels:  algorithm, datastructures
Competitive Programming
📌 📚 Solution of competitive programming problems, code templates, Data Structures and Algorithms, hackathons, interviews and much more.
Stars: ✭ 496 (+439.13%)
Mutual labels:  algorithm, datastructures
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+5892.39%)
Mutual labels:  spark, flink
Useractionanalyzeplatform
Big data platform for e-commerce user behavior analysis
Stars: ✭ 645 (+601.09%)
Mutual labels:  spark, hadoop
Scriptis
Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management, and intelligent diagnosis.
Stars: ✭ 696 (+656.52%)
Mutual labels:  spark, hive
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-94.57%)
Mutual labels:  hadoop, hdfs