Docker HadoopA Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-34.15%)
SplineData Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+273.17%)
ElasticlusterCreate clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (+263.41%)
Apache Spark Hands OnEducational notes,Hands on problems w/ solutions for hadoop ecosystem
Stars: ✭ 74 (-9.76%)
Android NosqlLightweight, simple structured NoSQL database for Android
Stars: ✭ 284 (+246.34%)
Hadoop For GeoeventArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-93.9%)
Hadoop Mini Clustershadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Stars: ✭ 265 (+223.17%)
Basehttps://www.researchgate.net/profile/Rajah_Iyer
Stars: ✭ 48 (-41.46%)
pulsephData Pulse application log aggregation and monitoring
Stars: ✭ 13 (-84.15%)
Winutilswinutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows
Stars: ✭ 657 (+701.22%)
knitDeprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Stars: ✭ 53 (-35.37%)
ChukwaMirror of Apache Chukwa
Stars: ✭ 77 (-6.1%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-82.93%)
TonyTonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 626 (+663.41%)
XLearning-GPUqihoo360 xlearning with GPU support; AI on Hadoop
Stars: ✭ 22 (-73.17%)
Nagios Plugins450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+1119.51%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-84.15%)
H2o 3H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+6797.56%)
AtsdAxibase Time Series Database Documentation
Stars: ✭ 68 (-17.07%)
AlluxioAlluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+6459.76%)
swordfishOpen-source distribute workflow schedule tools, also support streaming task.
Stars: ✭ 35 (-57.32%)
jumbo🐘 A local Hadoop cluster bootstrapper using Vagrant, Ansible, and Ambari.
Stars: ✭ 17 (-79.27%)
Bigdata💎🔥大数据学习笔记
Stars: ✭ 488 (+495.12%)
TILToday I Learned
Stars: ✭ 43 (-47.56%)
School Of SreAt LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
Stars: ✭ 5,141 (+6169.51%)
TitanDataOperationSystem最好的大数据项目。《Titan数据运营系统》,本项目是一个全栈闭环系统,我们有用作数据可视化的web系统,然后用flume-kafaka-flume进行日志的读取,在hive设计数仓,编写spark代码进行数仓表之间的转化以及ads层表到mysql的迁移,使用azkaban进行定时任务的调度,使用技术:Java/Scala语言,Hadoop、Spark、Hive、Kafka、Flume、Azkaban、SpringBoot,Bootstrap, Echart等;
Stars: ✭ 62 (-24.39%)
AkkeeperAn easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (-63.41%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+26787.8%)
cloud云计算之hadoop、hive、hue、oozie、sqoop、hbase、zookeeper环境搭建及配置文件
Stars: ✭ 48 (-41.46%)
JumbuneJumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (-21.95%)
cmuxA set of commands for managing CDH clusters using Cloudera Manager REST API.
Stars: ✭ 34 (-58.54%)
MarmarayGeneric Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+404.88%)
platys-modern-data-platformSupport for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....
Stars: ✭ 35 (-57.32%)
Storm Camel ExampleReal-time analysis and visualization with Storm-AMQ-Camel-Websockets-Highcharts integration.
Stars: ✭ 28 (-65.85%)
clusterdockclusterdock is a framework for creating Docker-based container clusters
Stars: ✭ 26 (-68.29%)
DataspherestudioDataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1357.32%)
fsbrowserFast desktop client for Hadoop Distributed File System
Stars: ✭ 27 (-67.07%)
IcebergIceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+379.27%)
clickhouse hadoopImport data from clickhouse to hadoop with pure SQL
Stars: ✭ 26 (-68.29%)
IgniteApache Ignite
Stars: ✭ 4,027 (+4810.98%)
LikelikeAn implementation of locality sensitive hashing with Hadoop
Stars: ✭ 58 (-29.27%)
HiveApache Hive
Stars: ✭ 4,031 (+4815.85%)
CamusMirror of Linkedin's Camus
Stars: ✭ 81 (-1.22%)
Docker Spark🚢 Docker image for Apache Spark
Stars: ✭ 78 (-4.88%)
Docker HadoopApache Hadoop docker image
Stars: ✭ 1,190 (+1351.22%)
Hadoop PotA scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm based on M. Ryoo et al paper CVPR 2015.
Stars: ✭ 8 (-90.24%)
OzoneScalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+302.44%)