Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+200336.36%)

Mutual labels: spark, mapreduce

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (+4527.27%)

Mutual labels: spark, mapreduce

Angel

A Flexible and Powerful Parameter Server for large-scale machine learning

Stars: ✭ 6,458 (+58609.09%)

Mutual labels: spark

Mathext

mathext implements basic elementary functions not included in the Go standard library [DEPRECATED]

Stars: ✭ 18 (+63.64%)

Mutual labels: scientific-computing

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+6672.73%)

Mutual labels: spark

Future

🚀 R package: future: Unified Parallel and Distributed Processing in R for Everyone

Stars: ✭ 735 (+6581.82%)

Mutual labels: parallelization

Spark Swagger

Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/)

Stars: ✭ 25 (+127.27%)

Mutual labels: spark

Gush

Fast and distributed workflow runner using ActiveJob and Redis

Stars: ✭ 894 (+8027.27%)

Mutual labels: parallelization

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+6518.18%)

Mutual labels: spark

Casadi

CasADi is a symbolic framework for numeric optimization implementing automatic differentiation in forward and reverse modes on sparse matrix-valued computational graphs. It supports self-contained C-code generation and interfaces state-of-the-art codes such as SUNDIALS, IPOPT etc. It can be used from C++, Python or Matlab/Octave.

Stars: ✭ 714 (+6390.91%)

Mutual labels: scientific-computing

Parquet Generator

Parquet file generator

Stars: ✭ 16 (+45.45%)

Mutual labels: spark

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+6327.27%)

Mutual labels: spark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+6227.27%)

Mutual labels: spark

Pbmcapply

Track the progress of mc*apply with a progress bar.

Stars: ✭ 25 (+127.27%)

Mutual labels: parallelization

Chronicler

Scala toolchain for InfluxDB

Stars: ✭ 24 (+118.18%)

Mutual labels: spark

Sparkling Water

Sparkling Water provides H2O functionality inside a Spark cluster

Stars: ✭ 887 (+7963.64%)

Mutual labels: spark

Mfem

Lightweight, general, scalable C++ library for finite element methods

Stars: ✭ 667 (+5963.64%)

Mutual labels: scientific-computing

Sparklyr

R interface for Apache Spark

Stars: ✭ 775 (+6945.45%)

Mutual labels: spark

Spark Scala Tutorial

A free tutorial for Apache Spark.

Stars: ✭ 907 (+8145.45%)

Mutual labels: spark

Coding Now

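Study notes, plus eBooks and video resources I have gone through and a collection of blogs, websites, and tools I consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.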

Stars: ✭ 750 (+6718.18%)

Mutual labels: spark

Coursera Uw Machine Learning Clustering Retrieval

Stars: ✭ 25 (+127.27%)

Mutual labels: mapreduce

Sparkctr

CTR prediction models based on Spark (LR, GBDT, DNN)

Stars: ✭ 740 (+6627.27%)

Mutual labels: spark

Edge

Extreme-scale Discontinuous Galerkin Environment (EDGE)

Stars: ✭ 18 (+63.64%)

Mutual labels: scientific-computing

Cdhproject

Usage of the various Hadoop components, continuously updated

Stars: ✭ 733 (+6563.64%)

Mutual labels: spark

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (+7600%)

Mutual labels: spark

Frameless

Expressive types for Spark.

Stars: ✭ 717 (+6418.18%)

Mutual labels: spark

Hail

Scalable genomic data analysis.

Stars: ✭ 706 (+6318.18%)

Mutual labels: spark

Reflow

A language and runtime for distributed, incremental data processing in the cloud

Stars: ✭ 706 (+6318.18%)

Mutual labels: scientific-computing

Big Data Scala Spark

Coursera's big data course with Scala and Spark

Stars: ✭ 16 (+45.45%)

Mutual labels: spark

Learn Julia The Hard Way

Learn Julia the hard way!

Stars: ✭ 679 (+6072.73%)

Mutual labels: scientific-computing

Ruptures

ruptures: change point detection in Python

Stars: ✭ 654 (+5845.45%)

Mutual labels: scientific-computing

Distributed Computing

distributed_computing includes MapReduce, a key-value store, etc.

Stars: ✭ 654 (+5845.45%)

Mutual labels: mapreduce

Ocaml Odepack

Binding to the ODEPACK FORTRAN library

Stars: ✭ 6 (-45.45%)

Mutual labels: scientific-computing

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (+109.09%)

Mutual labels: spark

Corral

🐎 A serverless MapReduce framework written for AWS Lambda

Stars: ✭ 648 (+5790.91%)

Mutual labels: mapreduce

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+7409.09%)

Mutual labels: spark

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+5763.64%)

Mutual labels: spark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+5654.55%)

Mutual labels: spark

Bigdataguide

Big data learning: learn big data from scratch, with study videos for each stage of learning and interview materials

Stars: ✭ 817 (+7327.27%)

Mutual labels: spark

Freestyle

A cohesive & pragmatic framework of FP centric Scala libraries

Stars: ✭ 627 (+5600%)

Mutual labels: spark

Vexcl

VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP

Stars: ✭ 626 (+5590.91%)

Mutual labels: scientific-computing

Core

The core source repository for the Cherab project.

Stars: ✭ 26 (+136.36%)

Mutual labels: scientific-computing

Digitrecognizer

Java Convolutional Neural Network example for Handwritten Digit Recognition

Stars: ✭ 23 (+109.09%)

Mutual labels: spark

Linfa

A Rust machine learning framework.

Stars: ✭ 812 (+7281.82%)

Mutual labels: scientific-computing

Dev Setup

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+50718.18%)

Mutual labels: spark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+51318.18%)

Mutual labels: spark

Itk

Insight Toolkit (ITK) -- Official Repository. ITK builds on a proven, spatially-oriented architecture for processing, segmentation, and registration of scientific images in two, three, or more dimensions.

Stars: ✭ 801 (+7181.82%)

Mutual labels: scientific-computing

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+5454.55%)

Mutual labels: spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+50018.18%)

Mutual labels: spark

1-60 of 677 similar projects
