Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Stars: ✭ 1,063 (-9.76%)

Mutual labels: big-data

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (-97.96%)

Mutual labels: big-data

Carbondata

Mirror of Apache CarbonData

Stars: ✭ 1,158 (-1.7%)

Mutual labels: big-data

Parquet Format

Apache Parquet

Stars: ✭ 800 (-32.09%)

Mutual labels: big-data

Couchdb Couch

Mirror of Apache CouchDB

Stars: ✭ 43 (-96.35%)

Mutual labels: big-data

Spark Movie Lens

An online movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (-36.76%)

Mutual labels: big-data

Verticapy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-94.99%)

Mutual labels: big-data

Data Science Career

Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository

Stars: ✭ 630 (-46.52%)

Mutual labels: big-data

Esper Tv

Esper instance for TV news analysis

Stars: ✭ 37 (-96.86%)

Mutual labels: big-data

Skymap

High-throughput gene to knowledge mapping through massive integration of public sequencing data.

Stars: ✭ 29 (-97.54%)

Mutual labels: big-data

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+380.14%)

Mutual labels: big-data

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-95.16%)

Mutual labels: big-data

Awesome Scalability

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

Stars: ✭ 36,688 (+3014.43%)

Mutual labels: big-data

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-94.48%)

Mutual labels: big-data

K8s Ingress Claim

An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.

Stars: ✭ 14 (-98.81%)

Mutual labels: big-data

Lifion Kinesis

A native Node.js producer and consumer library for Amazon Kinesis Data Streams

Stars: ✭ 54 (-95.42%)

Mutual labels: big-data

Dremio Oss

Dremio - the missing link in modern data

Stars: ✭ 862 (-26.83%)

Mutual labels: big-data

Countly Sdk Cordova

Countly Product Analytics SDK for Cordova, Icenium and Phonegap

Stars: ✭ 69 (-94.14%)

Mutual labels: big-data

Accumulo

Apache Accumulo

Stars: ✭ 857 (-27.25%)

Mutual labels: big-data

Oodt

Mirror of Apache OODT

Stars: ✭ 52 (-95.59%)

Mutual labels: big-data

Dataflowjavasdk

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Stars: ✭ 854 (-27.5%)

Mutual labels: big-data

Spark Doc Zh

Chinese edition of the official Apache Spark documentation

Stars: ✭ 1,126 (-4.41%)

Mutual labels: big-data

Pretzel

JavaScript full-stack framework for Big Data visualisation and analysis

Stars: ✭ 26 (-97.79%)

Mutual labels: big-data

Trck

Query engine for TrailDB

Stars: ✭ 48 (-95.93%)

Mutual labels: big-data

Bandar Log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 19 (-98.39%)

Mutual labels: big-data

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-93.97%)

Mutual labels: big-data

Sqoop

Mirror of Apache Sqoop

Stars: ✭ 817 (-30.65%)

Mutual labels: big-data

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (-12.99%)

Mutual labels: big-data

Titanoboa

Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.

Stars: ✭ 787 (-33.19%)

Mutual labels: big-data

Nabhash

An extremely fast Non-crypto-safe AES Based Hash algorithm for Big Data

Stars: ✭ 62 (-94.74%)

Mutual labels: big-data

Storm

Mirror of Apache Storm

Stars: ✭ 6,297 (+434.55%)

Mutual labels: big-data

Attaca

Robust, distributed version control for large files.

Stars: ✭ 41 (-96.52%)

Mutual labels: big-data

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+459.25%)

Mutual labels: big-data

Hazelcast Cpp Client

Hazelcast IMDG C++ Client

Stars: ✭ 67 (-94.31%)

Mutual labels: big-data

Samza

Mirror of Apache Samza

Stars: ✭ 676 (-42.61%)

Mutual labels: big-data

Analysispreservation.cern.ch

Source code for the CERN Analysis Preservation portal

Stars: ✭ 37 (-96.86%)

Mutual labels: big-data

Sdc

Intel® Scalable Dataframe Compiler for Pandas*

Stars: ✭ 623 (-47.11%)

Mutual labels: big-data

Attic Lens

Mirror of Apache Lens

Stars: ✭ 58 (-95.08%)

Mutual labels: big-data

Metrics

Measure behavior of Java applications

Stars: ✭ 35 (-97.03%)

Mutual labels: big-data

My Journey In The Data Science World

📢 Ready to learn or review your knowledge!

Stars: ✭ 1,175 (-0.25%)

Mutual labels: big-data

Appdocs

Application Performance Optimization Summary

Stars: ✭ 1,169 (-0.76%)

Mutual labels: big-data

Flink Shaded

Apache Flink shaded artifacts repository

Stars: ✭ 67 (-94.31%)

Mutual labels: big-data

Ymcache

YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.

Stars: ✭ 58 (-95.08%)

Mutual labels: big-data

Predictionio Template Text Classifier

Text Classification Engine

Stars: ✭ 30 (-97.45%)

Mutual labels: big-data

1-60 of 369 similar projects
