AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and an AWS Lambda function running a Python (Boto3) script to terminate AWS EMR clusters that have been idle for a specified period of time.

Stars: ✭ 21 (-25%)

Mutual labels: bigdata

ambari-hdp-docker

Dockerfiles and Docker Compose for HDP 2.6 with Blueprints

Stars: ✭ 23 (-17.86%)

Mutual labels: hadoop

Docker Hadoop Cluster

Multi-node cluster on Docker for personal development.

Stars: ✭ 82 (+192.86%)

Mutual labels: hadoop

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-32.14%)

Mutual labels: hadoop

Spark Streaming Monitoring With Lightning

Plot live stats as graphs from an Apache Spark application using Lightning-viz

Stars: ✭ 15 (-46.43%)

Mutual labels: bigdata

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+3171.43%)

Mutual labels: hadoop

10 Weeks

10 weeks of technology exploration

Stars: ✭ 22 (-21.43%)

Mutual labels: bigdata

skein

A tool and library for easily deploying applications on Apache YARN

Stars: ✭ 128 (+357.14%)

Mutual labels: hadoop

Coding Now

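Study notes, along with eBooks and video resources I have gone through and a collection of blogs, websites, and tools I consider good. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.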

Stars: ✭ 750 (+2578.57%)

Mutual labels: bigdata

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+2850%)

Mutual labels: hadoop

Running Elasticsearch Fun Profit

A book about running Elasticsearch

Stars: ✭ 664 (+2271.43%)

Mutual labels: bigdata

presto

Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data

Stars: ✭ 91 (+225%)

Mutual labels: hadoop

Cds

Data syncing in golang for ClickHouse.

Stars: ✭ 501 (+1689.29%)

Mutual labels: bigdata

TiBigData

TiDB connectors for Flink/Hive/Presto

Stars: ✭ 192 (+585.71%)

Mutual labels: bigdata

Tensorbase

TensorBase BE is building a high performance, cloud neutral bigdata warehouse for SMEs fully in Rust.

Stars: ✭ 440 (+1471.43%)

Mutual labels: bigdata

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+2203.57%)

Mutual labels: hadoop

Circosjs

d3 library to build circular graphs

Stars: ✭ 436 (+1457.14%)

Mutual labels: bigdata

Hive Jdbc Uber Jar

Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version

Stars: ✭ 188 (+571.43%)

Mutual labels: hadoop

Sidekick

High Performance HTTP Sidecar Load Balancer

Stars: ✭ 366 (+1207.14%)

Mutual labels: bigdata

Javapdf

🍣 100 Java technical e-books in PDF (take pride in downloading and reading, shame in merely liking and bookmarking)

Stars: ✭ 609 (+2075%)

Mutual labels: hadoop

Datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.

Stars: ✭ 347 (+1139.29%)

Mutual labels: bigdata

Camus

Mirror of LinkedIn's Camus

Stars: ✭ 81 (+189.29%)

Mutual labels: hadoop

hadoop-ecosystem

Visualizations of the Hadoop Ecosystem

Stars: ✭ 20 (-28.57%)

Mutual labels: hadoop

Datafaker

Datafaker is a large-scale test data and flow test data generation tool. It generates fake data and inserts it into a variety of data sources.

Stars: ✭ 327 (+1067.86%)

Mutual labels: bigdata

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+2089.29%)

Mutual labels: hadoop

Janusgraph.cn

Chinese community for the distributed graph database JanusGraph; everything about JanusGraph

Stars: ✭ 273 (+875%)

Mutual labels: bigdata

Deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…

Stars: ✭ 12,277 (+43746.43%)

Mutual labels: hadoop

Ldetool

Code generator for fast log file parsers

Stars: ✭ 273 (+875%)

Mutual labels: bigdata

Hadoop study

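Regularly updated documentation on commonly used big data components in the Hadoop ecosystem, with focus in this order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated!)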

Stars: ✭ 567 (+1925%)

Mutual labels: hadoop

Big Data Rosetta Code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Stars: ✭ 254 (+807.14%)

Mutual labels: bigdata

webhdfs

Node.js WebHDFS REST API client

Stars: ✭ 88 (+214.29%)

Mutual labels: hadoop

jigsaw-seed

This is the seed project for the Jigsaw (七巧板) component library (https://github.com/rdkmaster/jigsaw). All new apps are recommended to start from this project as their seed.

Stars: ✭ 17 (-39.29%)

Mutual labels: bigdata

Gis Tools For Hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

Stars: ✭ 485 (+1632.14%)

Mutual labels: hadoop

proteic

Streaming and static data visualization for the modern web.

Stars: ✭ 37 (+32.14%)

Mutual labels: bigdata

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+475%)

Mutual labels: hadoop

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (+78.57%)

Mutual labels: bigdata

Pdf

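Programming e-books, e-books, and programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithm series, computer science, design patterns, software testing, refactoring and optimization, and more categories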

Stars: ✭ 12,009 (+42789.29%)

Mutual labels: hadoop

learning notes

Study notes

Stars: ✭ 18 (-35.71%)

Mutual labels: bigdata

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-25%)

Mutual labels: hadoop

pulsar-user-group-loc-cn

Workspace for China local user group.

Stars: ✭ 19 (-32.14%)

Mutual labels: bigdata

room-renting

Scrapes housing listings from Anjuke (安居客) with Python and visualizes them with Amap (高德地图)

Stars: ✭ 16 (-42.86%)

Mutual labels: bigdata

Hadoop Common

Mirror of Apache Hadoop common

Stars: ✭ 155 (+453.57%)

Mutual labels: hadoop

taller SparkR

SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)

Stars: ✭ 12 (-57.14%)

Mutual labels: bigdata

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+1350%)

Mutual labels: hadoop

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (+103.57%)

Mutual labels: bigdata

chatnoir-resiliparse

A robust web archive analytics toolkit

Stars: ✭ 26 (-7.14%)

Mutual labels: bigdata

liquibase-impala

Liquibase extension to add Impala Database support