H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+2237.19%)

Mutual labels: big-data

HadoopDedup

🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase

Stars: ✭ 27 (-88.84%)

Mutual labels: big-data

Belajarpython.com

Open Source Indonesian Python Programming Tutorial Site

Stars: ✭ 141 (-41.74%)

Mutual labels: big-data

hazelcast-csharp-client

Hazelcast .NET Client

Stars: ✭ 98 (-59.5%)

Mutual labels: big-data

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+2178.1%)

Mutual labels: big-data

corpusexplorer2.0

Corpus linguistics has never been so easy...

Stars: ✭ 16 (-93.39%)

Mutual labels: big-data

Kudu

Mirror of Apache Kudu

Stars: ✭ 1,360 (+461.98%)

Mutual labels: big-data

big-data-engineering-indonesia

A curated list of big data engineering tools, resources and communities.

Stars: ✭ 26 (-89.26%)

Mutual labels: big-data

Scanner

Efficient video analysis at scale

Stars: ✭ 569 (+135.12%)

Mutual labels: big-data

merkle-db

High-scalability analytics database built on immutable merkle-trees

Stars: ✭ 44 (-81.82%)

Mutual labels: big-data

Flume

Mirror of Apache Flume

Stars: ✭ 2,200 (+809.09%)

Mutual labels: big-data

metriql

The metrics layer for your data. Join us at https://metriql.com/slack

Stars: ✭ 227 (-6.2%)

Mutual labels: big-data

Nipype

Workflows and interfaces for neuroimaging packages

Stars: ✭ 557 (+130.17%)

Mutual labels: big-data

dislib

The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (-83.88%)

Mutual labels: big-data

Logisland

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (-59.92%)

Mutual labels: big-data

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (-86.36%)

Mutual labels: big-data

Thrill

Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Stars: ✭ 528 (+118.18%)

Mutual labels: big-data

cdp-service

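CDP data platform that helps enterprises fully understand their customers and deliver precision marketing personalized to each individual.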

Stars: ✭ 30 (-87.6%)

Mutual labels: big-data

Hazelcast Go Client

Hazelcast IMDG Go Client

Stars: ✭ 140 (-42.15%)

Mutual labels: big-data

sgd

An R package for large scale estimation with stochastic gradient descent

Stars: ✭ 55 (-77.27%)

Mutual labels: big-data

Beam

Apache Beam is a unified programming model for Batch and Streaming

Stars: ✭ 5,149 (+2027.69%)

Mutual labels: big-data

ytpriv

YT metadata exporter

Stars: ✭ 28 (-88.43%)

Mutual labels: big-data

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+452.89%)

Mutual labels: big-data

scikit-learn-intelex

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

Stars: ✭ 887 (+266.53%)

Mutual labels: big-data

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+109.5%)

Mutual labels: big-data

accumulo-docker

Apache Accumulo Docker

Stars: ✭ 17 (-92.98%)

Mutual labels: big-data

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (-11.16%)

Mutual labels: big-data

bullet-core

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Stars: ✭ 36 (-85.12%)

Mutual labels: big-data

Stream Framework

Stream Framework is a Python library that allows you to build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:

Stars: ✭ 4,576 (+1790.91%)

Mutual labels: big-data

incubator-tez

Mirror of Apache Tez (Incubating)

Stars: ✭ 60 (-75.21%)

Mutual labels: big-data

Reef

Mirror of Apache REEF

Stars: ✭ 92 (-61.98%)

Mutual labels: big-data

Social-Network-Analysis-in-Python

Social Network Facebook Analysis (Python, Networkx)