xxhadoop: Data analysis using Hadoop/Spark/Storm/Elasticsearch/machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!
Stars: ✭ 37 (-2.63%)
hive to es: A small tool for syncing data from a Hive data warehouse to Elasticsearch
Stars: ✭ 21 (-44.74%)
sparkucx: A high-performance, scalable, and efficient ShuffleManager plugin for Apache Spark that uses the UCX communication layer
Stars: ✭ 32 (-15.79%)
giraffe: Gracefully Integrated Remote Access For Files and Execution
Stars: ✭ 50 (+31.58%)
docker-hadoop: Docker image for the main Apache Hadoop components (YARN/HDFS)
Stars: ✭ 59 (+55.26%)
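Awesome Learning: Source code for the hands-on examples is at https://github.com/jast90/bigdata. Search for "Jast" on WeChat and follow the public account for the latest technical posts. 😯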
Stars: ✭ 197 (+418.42%)
phishcatch: A browser extension and API server for detecting corporate password use on external websites
Stars: ✭ 75 (+97.37%)
Hive Jdbc Uber Jar: Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (+394.74%)
corc: An ORC file Scheme for the Cascading data processing platform.
Stars: ✭ 14 (-63.16%)
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes samediff: a PyTorch/TensorFlow-like library for running deep learni…
Stars: ✭ 12,277 (+32207.89%)
learning-hadoop-and-spark: Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+284.21%)
Hadoop Common: Mirror of Apache Hadoop Common
Stars: ✭ 155 (+307.89%)
openPDC: Open Source Phasor Data Concentrator
Stars: ✭ 109 (+186.84%)
Hadoop Hdfs: Mirror of Apache Hadoop HDFS
Stars: ✭ 152 (+300%)
pyspark-ML-in-Colab: PySpark in Google Colab, a simple machine learning (linear regression) model
Stars: ✭ 32 (-15.79%)
Hadoop: Apache Hadoop
Stars: ✭ 12,177 (+31944.74%)
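dpkb: A collection of big data material, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse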
Stars: ✭ 123 (+223.68%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (+268.42%)
rastercube: rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-60.53%)
big-data-exploration: [Archive] Intern project: Big Data Exploration using MongoDB. This repository is NOT a supported MongoDB product
Stars: ✭ 43 (+13.16%)
Airflow Pipeline: An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (+236.84%)
teraslice: Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (+26.32%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (+236.84%)
wasp: WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-50%)
Parquet4s: Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (+228.95%)
beanszoo: Distributed Java microservices using ZooKeeper
Stars: ✭ 12 (-68.42%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (+207.89%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+2.63%)
Datax: DataX is an open-source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, and SQL Server
Stars: ✭ 116 (+205.26%)
go-baseapp: A lightweight starting point for Go web servers
Stars: ✭ 61 (+60.53%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (+197.37%)
palantir-java-format: A modern, lambda-friendly, 120-character Java formatter.
Stars: ✭ 203 (+434.21%)
witchcraft-go-server: A highly opinionated Go embedded application server for RESTy APIs
Stars: ✭ 47 (+23.68%)
Haproxy Configs: 80+ HAProxy configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher, etc.
Stars: ✭ 106 (+178.95%)
amalgomate: Go tool for combining multiple different main packages into a single program or library
Stars: ✭ 19 (-50%)
hadoop-etl-udfs: The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-55.26%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+142.11%)
ambari-hdp-docker: Dockerfiles and Docker Compose for HDP 2.6 with Blueprints
Stars: ✭ 23 (-39.47%)
Hadoop cookbook: Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (+115.79%)
phoenix: Apache Phoenix / HBase Spring Boot microservices
Stars: ✭ 23 (-39.47%)
Camus: Mirror of LinkedIn's Camus
Stars: ✭ 81 (+113.16%)
Devops Bash Tools: 550+ DevOps Bash scripts for AWS, GCP, Kubernetes, Kafka, Docker, APIs, Hadoop, SQL, PostgreSQL, MySQL, Hive, Impala, Travis CI, Jenkins, Concourse, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, .tmux.conf, .psqlrc ...
Stars: ✭ 226 (+494.74%)
presto: Teradata Distribution of Presto, a distributed SQL query engine for big data
Stars: ✭ 91 (+139.47%)
memex-gate: General Architecture for Text Engineering
Stars: ✭ 47 (+23.68%)
skein: A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+236.84%)
Hadoop Attack Library: A collection of pentest tools and resources targeting Hadoop environments
Stars: ✭ 228 (+500%)