H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+5286.67%)

Mutual labels: spark

v6.dooring.public

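A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily build their own customized large-screen visualization applications.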

Stars: ✭ 323 (+207.62%)

Mutual labels: bigdata

Play Spark Scala

Stars: ✭ 51 (-51.43%)

Mutual labels: spark

datasphere-service

An open-source dataworks platform

Stars: ✭ 20 (-80.95%)

Mutual labels: bigdata

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+5150.48%)

Mutual labels: spark

ETL-Starter-Kit

📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage, especially in data warehousing. This repository contains a starter kit featuring ETL-related work.

Stars: ✭ 21 (-80%)

Mutual labels: bigdata

bqv

The simplest tool to manage views of BigQuery.

Stars: ✭ 22 (-79.05%)

Mutual labels: bigdata

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+5022.86%)

Mutual labels: spark

vulkn

Love your Data. Love the Environment. Love VULKИ.

Stars: ✭ 43 (-59.05%)

Mutual labels: bigdata

Apache Spark Internals

The Internals of Apache Spark

Stars: ✭ 1,045 (+895.24%)

Mutual labels: spark

BigDataTools

Tools for big data

Stars: ✭ 36 (-65.71%)

Mutual labels: bigdata

Sparklearning

Learning Apache Spark, including code and data. Most parts can run locally.

Stars: ✭ 558 (+431.43%)

Mutual labels: spark

UnROOT.jl

Native Julia I/O package to work with CERN ROOT files

Stars: ✭ 52 (-50.48%)

Mutual labels: bigdata

Home

ApacheCN open-source organization: announcements, introduction, members, activities, and ways to get in touch

Stars: ✭ 1,199 (+1041.9%)

Mutual labels: spark

cds

Data syncing in golang for ClickHouse.

Stars: ✭ 839 (+699.05%)

Mutual labels: bigdata

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (+412.38%)

Mutual labels: spark

meetups-archivos

Slides, code, and videos from the meetups, Data Science Days, video calls, and workshops. Data Science Research is a nonprofit organization that seeks to spread, decentralize, and share knowledge of data science and artificial intelligence in Peru, giving opportunities to new talent through meetups, workshops, and Semilleros (research groups) …

Stars: ✭ 60 (-42.86%)

Mutual labels: bigdata

Spark As Service Using Embedded Server

This application provides Spark 2.1 as a service, using an embedded, Reactive Streams-based, fully asynchronous HTTP server.

Stars: ✭ 46 (-56.19%)

Mutual labels: spark

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (-46.67%)

Mutual labels: bigdata

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (+388.57%)

Mutual labels: spark

learning-spark

Tidy up Spark and Hadoop tutorials.

Stars: ✭ 28 (-73.33%)

Mutual labels: bigdata

Biglasso

biglasso: Extending Lasso Model Fitting to Big Data in R

Stars: ✭ 87 (-17.14%)

Mutual labels: bigdata

columnify

Converts record-oriented data to a columnar format.

Stars: ✭ 28 (-73.33%)

Mutual labels: bigdata

Magellan

Geospatial Data Analytics on Spark

Stars: ✭ 507 (+382.86%)

Mutual labels: spark

Notes

This is a learning note | Java fundamentals, JVM, source code, big data, interview experiences

Stars: ✭ 69 (-34.29%)

Mutual labels: bigdata

Delta Architecture

Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline

Stars: ✭ 43 (-59.05%)

Mutual labels: spark

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-82.86%)

Mutual labels: bigdata

Pointblank

Data validation and organization of metadata for data frames and database tables

Stars: ✭ 480 (+357.14%)

Mutual labels: spark

dockerfiles

Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

Stars: ✭ 29 (-72.38%)

Mutual labels: bigdata

Spark States

Custom state store providers for Apache Spark

Stars: ✭ 83 (-20.95%)

Mutual labels: spark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-40%)

Mutual labels: spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-78.1%)

Mutual labels: spark

spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.0.0

Stars: ✭ 23 (-78.1%)

Mutual labels: spark

the-apache-ignite-book

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above.

Stars: ✭ 65 (-38.1%)

Mutual labels: bigdata

Spark

Cross-platform real-time collaboration client optimized for business and organizations.

Stars: ✭ 471 (+348.57%)

Mutual labels: spark

jhdf

A pure Java HDF5 library

Stars: ✭ 83 (-20.95%)

Mutual labels: bigdata

Gatk

Official code repository for GATK versions 4 and up

Stars: ✭ 1,002 (+854.29%)

Mutual labels: spark

dt-sql-parser

SQL parsers for big data, built with ANTLR4.

Stars: ✭ 135 (+28.57%)

Mutual labels: bigdata

Bdp Dataplatform

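Big data ecosystem solution data platform: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, container deployment platforms, and data platform collection, storage, computation, development, and application building.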

Stars: ✭ 456 (+334.29%)

Mutual labels: spark

greycat

GreyCat - Data Analytics, Temporal data, What-if, Live machine learning

Stars: ✭ 104 (-0.95%)

Mutual labels: bigdata

lectures-hse-spark

Scalable machine learning and big data analysis with Apache Spark

Stars: ✭ 20 (-80.95%)

Mutual labels: bigdata

Tensorbase

TensorBase BE is building a high performance, cloud neutral bigdata warehouse for SMEs fully in Rust.

Stars: ✭ 440 (+319.05%)

Mutual labels: bigdata

Pixiedust

Python helper library for Jupyter Notebooks

Stars: ✭ 998 (+850.48%)

Mutual labels: spark

chatnoir-resiliparse

A robust web archive analytics toolkit

Stars: ✭ 26 (-75.24%)

Mutual labels: bigdata

PersonNotes

A central collection of personal notes, recording technical notes in a quick-and-rough style .. 📚☕️⌨️🎧

Stars: ✭ 61 (-41.9%)

Mutual labels: bigdata

Dataspherestudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+1038.1%)

Mutual labels: spark

laravel-spark-camera

Profile Photo Camera support for Laravel Spark

Stars: ✭ 30 (-71.43%)

Mutual labels: spark

Digitrecognizer

Java convolutional neural network example for handwritten digit recognition

Stars: ✭ 23 (-78.1%)

Mutual labels: spark

DetEdit

A graphical user interface for annotating and editing events detected in long-term acoustic monitoring data

Stars: ✭ 20 (-80.95%)

Mutual labels: bigdata

jigsaw-seed

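This is the seed project for the Jigsaw-七巧板 (Tangram) component library (https://github.com/rdkmaster/jigsaw). All new apps are recommended to use this project as the seed to start building from.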

Stars: ✭ 17 (-83.81%)

Mutual labels: bigdata

Spark Doc Zh

Chinese translation of the official Apache Spark documentation

Stars: ✭ 1,126 (+972.38%)

Mutual labels: spark

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+772.38%)

Mutual labels: spark

sparkProjectTemplate.g8

Template for Spark Projects

Stars: ✭ 77 (-26.67%)

Mutual labels: spark

Book

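This project collects some good books I have read or heard about over the years. I came across them while organizing my files; deleting them seemed a pity, but keeping them took up too much space. So, in the spirit of sharing, I have made them public: on one hand to provide these books to readers who need them, and on the other as an accumulation of knowledge, something like a knowledge base.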

Stars: ✭ 47 (-55.24%)

Mutual labels: spark

10 Weeks

10 weeks of technology exploration

Stars: ✭ 22 (-79.05%)

Mutual labels: bigdata

301-360 of 529 similar projects