Parquet4s: Read and write Parquet in Scala. Use Scala classes as the schema. No need to start a cluster.
Stars: ✭ 125 (-41.31%)
Docker Hadoop: Apache Hadoop Docker image
Stars: ✭ 1,190 (+458.69%)
Spydra: Ephemeral Hadoop clusters using Google Compute Platform
Stars: ✭ 128 (-39.91%)
Bigdata Playground: A complete example of a big data application using Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, the Twitter API, MongoDB, Node.js, Angular, and GraphQL
Stars: ✭ 177 (-16.9%)
Chukwa: Mirror of Apache Chukwa
Stars: ✭ 77 (-63.85%)
Dynamometer: A tool for scale and performance testing of HDFS, with a specific focus on the NameNode.
Stars: ✭ 122 (-42.72%)
Xsql: Unified SQL Analytics Engine Based on Spark SQL
Stars: ✭ 176 (-17.37%)
Hive Third Functions: Some useful custom Hive UDFs, especially array, JSON, math, and string functions.
Stars: ✭ 151 (-29.11%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Stars: ✭ 117 (-45.07%)
Atsd: Axibase Time Series Database documentation
Stars: ✭ 68 (-68.08%)
Hadoop Hdfs: Mirror of Apache Hadoop HDFS
Stars: ✭ 152 (-28.64%)
Luigi Warehouse: A Luigi-powered analytics/warehouse stack
Stars: ✭ 72 (-66.2%)
Awesome Learning: Practice source code repository: https://github.com/jast90/bigdata. Search for "Jast" on WeChat and follow the official account for the latest technical posts.
Stars: ✭ 197 (-7.51%)
Eyerissf: A simulator for the Eyeriss chip (a CNN accelerator researched at MIT) and a new DNN framework, "Hive"
Stars: ✭ 68 (-68.08%)
Src: A lightweight distributed stream computing framework for Golang
Stars: ✭ 67 (-68.54%)
Jumbune: Jumbune is an open-source big data APM and data quality management platform for data clouds. An enterprise feature offering is available at http://jumbune.com. More details of the open-source offering are at,
Stars: ✭ 64 (-69.95%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), with code examples
Stars: ✭ 150 (-29.58%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+665.26%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-71.83%)
Likelike: An implementation of locality-sensitive hashing with Hadoop
Stars: ✭ 58 (-72.77%)
Cube.js: 📊 Cube, an open-source analytics API for building data apps
Stars: ✭ 11,983 (+5525.82%)
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Stars: ✭ 12,277 (+5663.85%)
Hadoop: Apache Hadoop
Stars: ✭ 12,177 (+5616.9%)
Docker Hadoop: A Docker container with a full Hadoop cluster setup, plus Spark and Zeppelin
Stars: ✭ 54 (-74.65%)
Hadoop Solr: Code to index HDFS into Solr using MapReduce
Stars: ✭ 51 (-76.06%)
Base: https://www.researchgate.net/profile/Rajah_Iyer
Stars: ✭ 48 (-77.46%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-46.48%)
Moosefs: MooseFS, an open-source, petabyte-scale, fault-tolerant, high-performance, scalable network distributed file system (software-defined storage)
Stars: ✭ 1,025 (+381.22%)
Nagios Plugins: 450+ plugins for AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, YARN, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL certs, Yum security updates, Kubernetes, Cloudera, etc.
Stars: ✭ 1,000 (+369.48%)
Parquet Rs: Apache Parquet implementation in Rust
Stars: ✭ 144 (-32.39%)
Parquet Go: Go package to read and write Parquet files. Parquet is a file format for storing nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-46.48%)
Jsr203 Hadoop: A Java NIO file system provider for HDFS
Stars: ✭ 35 (-83.57%)
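Docs4dev: Documentation for frameworks commonly used in backend development, with Chinese translations. It includes the latest official docs for the Spring family (Spring, Spring Boot, Spring Cloud, Spring Security, Spring Session), big data (Apache Hive, HBase, Apache Flume), logging (Log4j2, Logback), HTTP servers (NGINX, Apache), Python, and databases (OpenTSDB, MySQL, PostgreSQL), each with a corresponding Chinese translation.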
Stars: ✭ 974 (+357.28%)
Nutch: Apache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (+969.01%)
Hive: Fast. Scalable. Powerful. The blockchain for Web 3.0
Stars: ✭ 142 (-33.33%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (-46.95%)
Pyetl: Python ETL framework
Stars: ✭ 33 (-84.51%)
Akkeeper: An easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (-85.92%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for the Data Algorithms book
Stars: ✭ 949 (+345.54%)
Storm Camel Example: Real-time analysis and visualization with Storm-AMQ-Camel-Websockets-Highcharts integration.
Stars: ✭ 28 (-86.85%)
Spark Authorizer: A Spark SQL extension that provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-33.8%)
Waterdrop: Production-ready data integration product, documentation:
Stars: ✭ 1,856 (+771.36%)
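Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered online, with my own summarized answers. Currently covers interview questions on the Hadoop, Hive, Spark, Flink, HBase, Kafka, and ZooKeeper frameworks.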
Stars: ✭ 857 (+302.35%)
Big Whale: Scheduling of offline (batch) jobs and monitoring of real-time jobs for Spark, Flink, and similar engines
Stars: ✭ 163 (-23.47%)
Dockerfiles: 50+ public Docker Hub images for Docker and Kubernetes (Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, NiFi, Spark, Consul, Riak, TeamCity, and DevOps tools) built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+297.65%)
Hadoop Pot: A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.
Stars: ✭ 8 (-96.24%)
Databook: A Facebook for data
Stars: ✭ 26 (-87.79%)
Php Thrift Sql: A PHP library for connecting to Hive or Impala over Thrift
Stars: ✭ 107 (-49.77%)
Stormtweetssentimentd3viz: Computes and visualizes sentiment analysis of tweets from US states in real time using Storm.
Stars: ✭ 25 (-88.26%)
Kylo: Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark, and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+330.05%)