Dynamometer: A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (-75%)
orion: Management and automation platform for Stateful Distributed Systems
Stars: ✭ 77 (-84.22%)
gomrjob: gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (-92.01%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+236.48%)
Camus: Mirror of LinkedIn's Camus
Stars: ✭ 81 (-83.4%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+2555.12%)
Spiderman: A general-purpose distributed crawler framework based on scrapy-redis
Stars: ✭ 392 (-19.67%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-69.26%)
Javafamily: [Java interview + Java study guide] Covers the core knowledge that most Java programmers need to master.
Stars: ✭ 28,668 (+5774.59%)
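Javakeeper: ✍️ A summary of the architecture knowledge essential for Java engineers: covers distributed systems, microservices, RPC, and other architectures commonly used at internet companies, plus essential skills such as data storage, caching, and search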
Stars: ✭ 502 (+2.87%)
Dataspherestudio: DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+144.88%)
Apache Spark Hands On: Educational notes, hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (-84.84%)
docker-hadoop: Docker image for main Apache Hadoop components (YARN/HDFS)
Stars: ✭ 59 (-87.91%)
smart-data-lake: Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (-83.81%)
learning-hadoop-and-spark: Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (-70.08%)
disk: A distributed network disk system built with Hadoop + HBase + Spring Boot
Stars: ✭ 53 (-89.14%)
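Learningsummary: Covers most of the knowledge needed to advance in Java, including microservices, middleware, caching, database optimization, search engines, distributed systems, and more. Stars welcome~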
Stars: ✭ 201 (-58.81%)
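Cookbook: 🎉🎉🎉 Technology stack for senior Java architects == Any skill can be thoroughly mastered through "deliberate practice", just like cooking. Here is a Java development handbook; all you need to do is practice more. 🏃🏃🏃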
Stars: ✭ 428 (-12.3%)
teraslice: Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (-90.16%)
phoenix: Apache Phoenix / HBase Spring Boot Microservices
Stars: ✭ 23 (-95.29%)
hadoopoffice: HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-88.52%)
the-apache-ignite-book: All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Stars: ✭ 65 (-86.68%)
hadoop-etl-udfs: The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-96.52%)
liquibase-impala: Liquibase extension to add Impala Database support
Stars: ✭ 23 (-95.29%)
DaFlow: Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.
Stars: ✭ 24 (-95.08%)
hive-jdbc-driver: An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-93.65%)
fsbrowser: Fast desktop client for Hadoop Distributed File System
Stars: ✭ 27 (-94.47%)
mango: Core utility library & data connectors designed for simpler usage in Scala
Stars: ✭ 41 (-91.6%)
skein: A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (-73.77%)
cmux: A set of commands for managing CDH clusters using the Cloudera Manager REST API.
Stars: ✭ 34 (-93.03%)
datasqueeze: Hadoop utility to compact small files
Stars: ✭ 18 (-96.31%)
cobra-policytool: Manage Apache Atlas and Ranger configuration for your Hadoop environment.
Stars: ✭ 16 (-96.72%)
py-hdfs-mount: Mount HDFS with FUSE, works with Kerberos!
Stars: ✭ 13 (-97.34%)
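Pdf: Programming e-books, e-books, programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithm series, computer science, design patterns, software testing, refactoring and optimization, and more categories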
Stars: ✭ 12,009 (+2360.86%)
darwin: Avro Schema Evolution made easy
Stars: ✭ 26 (-94.67%)
ros hadoop: Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Stars: ✭ 92 (-81.15%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-93.03%)
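litemall-dw: A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking; a data warehouse (five layers), real-time computation, and user profiling. The big data platform uses CDH6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.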
Stars: ✭ 36 (-92.62%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-95.9%)
TIL: Today I Learned
Stars: ✭ 43 (-91.19%)
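leaflet heatmap: A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap drawing step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is used again to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to achieve good interactivity. With the current Apache Spark implementation of the drawing, parallel computation is slower than single-machine computation, perhaps because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .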
Stars: ✭ 13 (-97.34%)
lib: Perl Utility Library for my other repos
Stars: ✭ 16 (-96.72%)
Addax: Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
Stars: ✭ 615 (+26.02%)
hadoop-docker-lite: Docker build project to set up a lightweight Hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager
Stars: ✭ 24 (-95.08%)
Hive: Apache Hive
Stars: ✭ 4,031 (+726.02%)
Behemoth: Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Stars: ✭ 286 (-41.39%)
Cascading: Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (-34.84%)
Atsd: Axibase Time Series Database Documentation
Stars: ✭ 68 (-86.07%)
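TitanDataOperationSystem: The best big data project. "Titan Data Operation System" is a full-stack, closed-loop system: a web system for data visualization, Flume-Kafka-Flume for reading logs, a data warehouse designed in Hive, Spark code for transformations between warehouse tables and for migrating ADS-layer tables to MySQL, and Azkaban for scheduling timed tasks. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, Echart, etc.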
Stars: ✭ 62 (-87.3%)