Devops Bash Tools550+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Kafka, Docker, APIs, Hadoop, SQL, PostgreSQL, MySQL, Hive, Impala, Travis CI, Jenkins, Concourse, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, .tmux.conf, .psqlrc ...
LuigiLuigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Hadoop ConnectorsLibraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
ShifuAn end-to-end machine learning and data mining framework on Hadoop
Awesome Learning实践源码库:https://github.com/jast90/bigdata 。 微信搜索Jast关注公众号,获取最新技术分享😯。
NutchApache Nutch is an extensible and scalable web crawler
Hive Jdbc Uber JarHive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Bigdata PlaygroundA complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Deeplearning4jSuite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
PrestoThe official home of the Presto distributed SQL query engine for big data
Eel SdkBig Data Toolkit for the JVM
GafferA large-scale entity and relation database supporting aggregation of properties
Airflow PipelineAn Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
SpydraEphemeral Hadoop clusters using Google Compute Platform
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Parquet4sRead and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
DynamometerA tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Hdfs ShellHDFS Shell is a HDFS manipulation tool to work with functions integrated in Hadoop DFS
IbisA pandas-like deferred expression system, with first-class SQL support
DataxDataX is an open source universal ETL tool that support Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Parquet GoGo package to read and write parquet files. parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Avro Hadoop StarterExample MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
WaterdropProduction Ready Data Integration Product, documentation:
Haproxy Configs80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
AntsdbAntsDB is a low latency, high concurrency, MySQL compliant SQL layer for HBase
CamusMirror of Linkedin's Camus