Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+178.99%)
Succinct: Enabling queries on compressed data.
Stars: ✭ 257 (-84.35%)
Cloudbreak: A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.
Stars: ✭ 301 (-81.67%)
Elasticluster: Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-81.85%)
Xlearning Xdml: extremely distributed machine learning
Stars: ✭ 113 (-93.12%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-98.48%)
Ytk Learn: Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-79.48%)
Ozone: Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (-79.9%)
Oap: Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-79.11%)
Tez: Apache Tez
Stars: ✭ 313 (-80.94%)
Haproxy Configs: 80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Stars: ✭ 106 (-93.54%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (+145.25%)
hadoop-docker-lite: Docker build project to set up a lightweight Hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager
Stars: ✭ 24 (-98.54%)
Orc: Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-76.31%)
Gremlin Scala: Scala wrapper for Apache TinkerPop 3 Graph DSL
Stars: ✭ 462 (-71.86%)
Eliasdb: EliasDB, a graph-based database.
Stars: ✭ 611 (-62.79%)
Awesome Graph: A curated list of resources for graph databases and graph computing tools
Stars: ✭ 717 (-56.33%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+56.88%)
Waterdrop: Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+13.03%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-92.87%)
Bigdata: 💎🔥 Big data study notes
Stars: ✭ 488 (-70.28%)
Kylo: Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (-44.21%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+235.75%)
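Pdf: Programming e-books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories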
Stars: ✭ 12,009 (+631.36%)
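Flink Learning: flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink introduction, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus case studies of large production Flink projects (PVUV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for the author's column "Big Data Real-Time Compute Engine: Flink in Practice and Performance Optimization" is welcome.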
Stars: ✭ 11,378 (+592.94%)
Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-99.33%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-54.63%)
bigtable: TypeScript Bigtable Client with 🔋🔋 included.
Stars: ✭ 13 (-99.21%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (-42.2%)
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-99.7%)
Pucket: Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-98.23%)
Heracles: High performance HBase / Spark SQL engine
Stars: ✭ 27 (-98.36%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+1825.58%)
Indradb: A graph database written in Rust
Stars: ✭ 1,035 (-36.97%)
Docker Hadoop: A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-96.71%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (-37.58%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-96.35%)
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-96.47%)
Movies Javascript Bolt: Neo4j Movies Example with webpack-in-browser app using the neo4j-javascript-driver
Stars: ✭ 123 (-92.51%)
Movies Java Bolt: Neo4j Movies Example application with SparkJava backend using the neo4j-java-driver
Stars: ✭ 66 (-95.98%)
Rsparkling: RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-96.04%)
Atsd: Axibase Time Series Database Documentation
Stars: ✭ 68 (-95.86%)
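Spring Boot Quick: 🌿 Quick-start learning examples based on Spring Boot, integrating open-source frameworks the author has worked with, such as rabbitmq (delay queues), Kafka, jpa, redis, oauth2, swagger, jsp, docker, spring-batch, exception handling, log output, multi-module development, multi-environment packaging, cache, web crawlers, jwt, GraphQL, dubbo, zookeeper, Async, and more 📌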
Stars: ✭ 1,819 (+10.78%)
Nagios Plugins: 450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (-39.1%)
Amazon S3 Find And Forget: Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).
Stars: ✭ 115 (-93%)
Docker Spark: 🚢 Docker image for Apache Spark
Stars: ✭ 78 (-95.25%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-93.06%)
Dataspherestudio: DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (-27.22%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-95.19%)
Neo4j: Graphs for Everyone
Stars: ✭ 9,582 (+483.56%)
Cog: A Persistent Embedded Graph Database for Python
Stars: ✭ 90 (-94.52%)
Redisgraph: A graph database as a Redis module
Stars: ✭ 1,292 (-21.32%)
Tinkerpop: Apache TinkerPop - a graph computing framework
Stars: ✭ 1,309 (-20.28%)