Behemoth: Behemoth is an open source platform for large-scale document analysis based on Apache Hadoop.
Stars: ✭ 286 (+1488.89%)
Avro Hadoop Starter: Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (+511.11%)
Src: A lightweight distributed stream computing framework for Golang
Stars: ✭ 67 (+272.22%)
Akkeeper: An easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (+66.67%)
ob bulkstash: Bulk Stash is a Docker rclone service to sync or copy files between different storage services. For example, you can copy files between remote storage services, such as from Amazon S3 to Google Cloud Storage, or from your laptop to remote storage.
Stars: ✭ 113 (+527.78%)
Trampoline: Admin Spring Boot Locally
Stars: ✭ 325 (+1705.56%)
Bigdata: 💎🔥 Big data study notes
Stars: ✭ 488 (+2611.11%)
Data Science Ipython Notebooks: Data science Python notebooks covering deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+122388.89%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+5172.22%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (+533.33%)
Cloudbreak: A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure, and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP, and OpenStack.
Stars: ✭ 301 (+1572.22%)
Ecs Deploy: Powerful CLI tool to simplify Amazon ECS deployments, rollbacks & scaling
Stars: ✭ 541 (+2905.56%)
serverless-data-pipeline-sam: Serverless Data Pipeline powered by Kinesis Firehose, API Gateway, Lambda, S3, and Athena
Stars: ✭ 78 (+333.33%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+411.11%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+116.67%)
skein: A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+611.11%)
Cascading: Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+1666.67%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (+88.89%)
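Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, along with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.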
Stars: ✭ 857 (+4661.11%)
gomrjob: A Go framework for Hadoop MapReduce jobs
Stars: ✭ 39 (+116.67%)
learning-hadoop-and-spark: Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+711.11%)
Cruiser: A Pharo tool to package applications
Stars: ✭ 41 (+127.78%)
interview-refresh-java-bigdata: A one-stop repo for looking up code snippets of core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.
Stars: ✭ 25 (+38.89%)
vcredist: Lifecycle management for the Microsoft Visual C++ Redistributables
Stars: ✭ 91 (+405.56%)
SpringsScala: Sample projects for creating Springs Web services in Scala
Stars: ✭ 16 (-11.11%)
QtRelease Windows: Practice project that helps with Qt software deployment on Windows
Stars: ✭ 13 (-27.78%)
hive to es: A small tool for syncing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (+16.67%)
Temps: λ A self-hostable serverless function runtime. Inspired by ZEIT Now.
Stars: ✭ 15 (-16.67%)
iis: Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-11.11%)
maven-artifacts-uploader: Command-line tool for uploading a directory of Maven artifacts to a Nexus 3.x repository
Stars: ✭ 30 (+66.67%)
catacomb: The simplest machine learning library for launching UIs, running evaluations, and comparing model performance.
Stars: ✭ 13 (-27.78%)
node-casperjs-aws-lambda: Base scaffolding app for a casperjs/phantomjs app running on Amazon (AWS) Lambda
Stars: ✭ 52 (+188.89%)
etran: Erlang parse transforms including fold (MapReduce) comprehension, Elixir-like pipeline, and default function arguments
Stars: ✭ 19 (+5.56%)
machine-learning-data-pipeline: Pipeline module for parallel real-time data processing, for developing machine learning models and running them in production.
Stars: ✭ 22 (+22.22%)
HadoopDedup: 🍉 Large-scale deduplication of massive data based on Hadoop and HBase
Stars: ✭ 27 (+50%)
primrose: Primrose modeling framework for simple production models
Stars: ✭ 33 (+83.33%)
yoda: Simple tool to dockerize and manage deployment of your project
Stars: ✭ 69 (+283.33%)
aws-compute-decision-tree: A decision tree to help you decide on the right AWS compute service for your needs.
Stars: ✭ 25 (+38.89%)
maven-resource: Maven Repository Manager Concourse Resource
Stars: ✭ 22 (+22.22%)
terraform-aws-route53: A Terraform module to create a Route53 Domain Name System (DNS) on Amazon Web Services (AWS). https://aws.amazon.com/route53/
Stars: ✭ 39 (+116.67%)
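atguigu ssm crud: Atguigu-SSM-CRUD is a basic CRUD system built with IDEA and Maven, with front-end/back-end interaction. The front end uses BootStrap with Ajax asynchronous requests and DOM rendering; the back end uses SpringMVC+MyBatis+Mysql8.0+Servlet+Jsp. It follows REST-style URL conventions, adds data validation provided by Hibernate, and supports pagination via PageHelper, making it well suited for practice at the SSM stage. It also uses many front-end operations and BootStrap components, which helps with learning JS and front-end frameworks.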
Stars: ✭ 52 (+188.89%)