Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. The enterprise feature offering is available at http://jumbune.com. More details of the open source offering are at,

Stars: ✭ 64 (-43.86%)

Mutual labels: hadoop

Tony

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Stars: ✭ 626 (+449.12%)

Mutual labels: hadoop

Hadoop Mapreduce

Mirror of Apache Hadoop MapReduce

Stars: ✭ 88 (-22.81%)

Mutual labels: hadoop

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+4861.4%)

Mutual labels: hadoop

Petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and it can be used from pure Python code.

Stars: ✭ 1,108 (+871.93%)

Mutual labels: parquet

Gather Deployment

Gathers scalable TensorFlow and infrastructure deployments

Stars: ✭ 326 (+185.96%)

Mutual labels: hadoop

Waterdrop

Production-ready data integration product. Documentation:

Stars: ✭ 1,856 (+1528.07%)

Mutual labels: hadoop

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+328.07%)

Mutual labels: hadoop

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-49.12%)

Mutual labels: parquet

School Of Sre

At LinkedIn, we use this curriculum to onboard entry-level talent into the SRE role.

Stars: ✭ 5,141 (+4409.65%)

Mutual labels: hadoop

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, AVRO, etc. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-24.56%)

Mutual labels: parquet

Presto Ethereum

Presto Ethereum Connector -- SQL on Ethereum

Stars: ✭ 450 (+294.74%)

Mutual labels: presto

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-50%)

Mutual labels: hadoop

God Of Bigdata

Focused on big data learning and interview preparation: the road to becoming a big data master starts here. Flink/Spark/Hadoop/HBase/Hive...

Stars: ✭ 6,008 (+5170.18%)

Mutual labels: hadoop

Bigdata Notebook

Stars: ✭ 100 (-12.28%)

Mutual labels: hadoop

Hadoop Solr

Code to index HDFS to Solr using MapReduce

Stars: ✭ 51 (-55.26%)

Mutual labels: hadoop

Akkeeper

An easy way to deploy your Akka services to a distributed environment.

Stars: ✭ 30 (-73.68%)

Mutual labels: hadoop

Cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+178.95%)

Mutual labels: hadoop

Kafka Connect Hdfs

Kafka Connect HDFS connector

Stars: ✭ 400 (+250.88%)

Mutual labels: hadoop

Sparksql Protobuf

Read SparkSQL parquet file as RDD[Protobuf]

Stars: ✭ 82 (-28.07%)

Mutual labels: parquet

Node Parquet

Node.js module for accessing Apache Parquet format files

Stars: ✭ 46 (-59.65%)

Mutual labels: parquet

Skale

High performance distributed data processing engine

Stars: ✭ 390 (+242.11%)

Mutual labels: parquet

Avro Hadoop Starter

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Stars: ✭ 110 (-3.51%)

Mutual labels: hadoop

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+3244.74%)

Mutual labels: hadoop

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (+783.33%)

Mutual labels: parquet

Sqlpad

Web-based SQL editor that runs in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more via ODBC

Stars: ✭ 4,113 (+3507.89%)

Mutual labels: presto

Camus

Mirror of LinkedIn's Camus

Stars: ✭ 81 (-28.95%)

Mutual labels: hadoop

Wedatasphere

WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+226.32%)

Mutual labels: hadoop

Weblogsanalysissystem

A big data platform for analyzing web access logs

Stars: ✭ 37 (-67.54%)

Mutual labels: hadoop

Parquet Cpp

Apache Parquet

Stars: ✭ 339 (+197.37%)

Mutual labels: parquet

Antsdb

AntsDB is a low latency, high concurrency, MySQL compliant SQL layer for HBase

Stars: ✭ 99 (-13.16%)

Mutual labels: hadoop

Ozone

Scalable, redundant, and distributed object store for Apache Hadoop

Stars: ✭ 330 (+189.47%)

Mutual labels: hadoop

Jsr203 Hadoop

A Java NIO file system provider for HDFS

Stars: ✭ 35 (-69.3%)

Mutual labels: hadoop

Pystore

Fast data store for Pandas time-series data

Stars: ✭ 325 (+185.09%)

Mutual labels: parquet

Learn machine learning

Road to Machine Learning

Stars: ✭ 81 (-28.95%)

Mutual labels: hadoop

Pucket

Bucketing and partitioning system for Parquet

Stars: ✭ 29 (-74.56%)

Mutual labels: parquet

Tez

Apache Tez

Stars: ✭ 313 (+174.56%)

Mutual labels: hadoop

Hadoop Book

Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White

Stars: ✭ 3,317 (+2809.65%)

Mutual labels: hadoop

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (+168.42%)

Mutual labels: hadoop

Kglab

Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.

Stars: ✭ 98 (-14.04%)

Mutual labels: parquet

Docker Spark

🚢 Docker image for Apache Spark

Stars: ✭ 78 (-31.58%)

Mutual labels: hadoop

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+732.46%)

Mutual labels: hadoop

Cloudbreak

A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.

Stars: ✭ 301 (+164.04%)

Mutual labels: hadoop

Elasticsearch loader

A tool for batch loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch

Stars: ✭ 300 (+163.16%)

Mutual labels: parquet

Storm Camel Example

Real-time analysis and visualization with Storm-AMQ-Camel-Websockets-Highcharts integration.

Stars: ✭ 28 (-75.44%)

Mutual labels: hadoop

Elasticluster

Create clusters of VMs on the cloud and configure them with Ansible.