Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (+192.86%)
hadoop-crypto
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Stars: ✭ 38 (+35.71%)
datasqueeze
Hadoop utility to compact small files
Stars: ✭ 18 (-35.71%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+3421.43%)
Hadoop Pot
A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.
Stars: ✭ 8 (-71.43%)
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, running a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-25%)
ambari-hdp-docker
Dockerfiles and Docker Compose for HDP 2.6 with Blueprints
Stars: ✭ 23 (-17.86%)
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-32.14%)
Kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+3171.43%)
10 Weeks
Ten weeks of technology exploration
Stars: ✭ 22 (-21.43%)
skein
A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+357.14%)
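Coding Now
Study notes, along with e-books, video resources, and blogs, websites, and tools I have collected and consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.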
Stars: ✭ 750 (+2578.57%)
presto
Teradata Distribution of Presto: a distributed SQL query engine for big data
Stars: ✭ 91 (+225%)
Cds
Data syncing in Go for ClickHouse.
Stars: ✭ 501 (+1689.29%)
TiBigData
TiDB connectors for Flink/Hive/Presto
Stars: ✭ 192 (+585.71%)
Tensorbase
TensorBase BE is building a high-performance, cloud-neutral big data warehouse for SMEs, written entirely in Rust.
Stars: ✭ 440 (+1471.43%)
Circosjs
D3 library to build circular graphs
Stars: ✭ 436 (+1457.14%)
Hive Jdbc Uber Jar
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (+571.43%)
Sidekick
High Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (+1207.14%)
Javapdf
🍣 100 Java e-books and technical books in PDF (take pride in downloading and reading them, not in merely starring and bookmarking them)
Stars: ✭ 609 (+2075%)
Datawave
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Stars: ✭ 347 (+1139.29%)
Camus
Mirror of LinkedIn's Camus
Stars: ✭ 81 (+189.29%)
Datafaker
Datafaker is a large-scale test data and flow test data generation tool. It generates fake data and inserts it into various data sources. (Test data generation tool)
Stars: ✭ 327 (+1067.86%)
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+2089.29%)
Janusgraph.cn
Chinese community for the distributed graph database JanusGraph: everything about JanusGraph
Stars: ✭ 273 (+875%)
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Stars: ✭ 12,277 (+43746.43%)
Ldetool
Code generator for fast log file parsers
Stars: ✭ 273 (+875%)
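Hadoop study
Regularly updated documentation on commonly used big data components in the Hadoop ecosystem, focusing, in order of priority, on: Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and anonymized training code. Continuously updated!)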
Stars: ✭ 567 (+1925%)
Big Data Rosetta Code
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (+807.14%)
webhdfs
Node.js WebHDFS REST API client
Stars: ✭ 88 (+214.29%)
jigsaw-seed
This is the seed project for the component library Jigsaw-七巧板 ("Tangram", https://github.com/rdkmaster/jigsaw). All new apps are recommended to start from this project as their seed.
Stars: ✭ 17 (-39.29%)
Gis Tools For Hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Stars: ✭ 485 (+1632.14%)
proteic
Streaming and static data visualization for the modern web.
Stars: ✭ 37 (+32.14%)
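Pdf
Programming e-books and books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and many more categories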
Stars: ✭ 12,009 (+42789.29%)
room-renting
Scrapes rental listings from Anjuke (安居客) with Python and visualizes them with Amap (高德地图)
Stars: ✭ 16 (-42.86%)
Hadoop Common
Mirror of Apache Hadoop common
Stars: ✭ 155 (+453.57%)
taller SparkR
SparkR workshop for the R Users Conference (Jornadas de Usuarios de R)
Stars: ✭ 12 (-57.14%)
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+1350%)
liquibase-impala
Liquibase extension to add Impala Database support
Stars: ✭ 23 (-17.86%)
disq
A library for manipulating bioinformatics sequencing formats in Apache Spark
Stars: ✭ 29 (+3.57%)
LogAnalyzeHelper
Data-cleaning program for a forum log analysis system (includes an IP rule library, UDF development, MapReduce programs, and log data)
Stars: ✭ 33 (+17.86%)
greycat
GreyCat - Data Analytics, Temporal data, What-if, Live machine learning
Stars: ✭ 104 (+271.43%)
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+4725%)
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-39.29%)