1552 open source projects that are alternatives to or similar to Rsparkling

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+8601.54%)

Mutual labels: data-science, spark, big-data, h2o

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+4583.08%)

Mutual labels: data-science, spark, big-data

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Stars: ✭ 1,835 (+2723.08%)

Mutual labels: data-science, spark, h2o

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+33820%)

Mutual labels: data-science, spark, big-data

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (+133.85%)

Mutual labels: data-science, spark, big-data

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (+21.54%)

Mutual labels: data-science, spark, big-data

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+1958.46%)

Mutual labels: data-science, spark, big-data

Mydatascienceportfolio

Applying Data Science and Machine Learning to Solve Real World Business Problems

Stars: ✭ 227 (+249.23%)

Mutual labels: data-science, spark

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+70.77%)

Mutual labels: big-data, spark

Enabling queries on compressed data.

Stars: ✭ 257 (+295.38%)

Mutual labels: spark, big-data

Data Science Cookbook

🎓 Jupyter notebooks from UFC data science course

Stars: ✭ 60 (-7.69%)

Mutual labels: data-science, spark

Data Science Live Book

An open source book to learn data science, data analysis and machine learning, suitable for all ages!

Stars: ✭ 193 (+196.92%)

Mutual labels: data-science, big-data

Gwu data mining

Materials for GWU DNSC 6279 and DNSC 6290.

Stars: ✭ 217 (+233.85%)

Mutual labels: data-science, h2o

leaflet heatmap

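A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation. After the data is computed in parallel with Apache Spark, the heatmap is also drawn with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine one. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .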

Stars: ✭ 13 (-80%)

Mutual labels: big-data, spark

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-78.46%)

Mutual labels: big-data, spark

Interactive and Reactive Data Science using Scala and Spark.

Stars: ✭ 3,081 (+4640%)

Mutual labels: data-science, spark

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+5904.62%)

Mutual labels: spark, big-data

Distributed scikit-learn meta-estimators in PySpark

Stars: ✭ 260 (+300%)

Mutual labels: data-science, spark

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+5766.15%)

Mutual labels: spark, big-data

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+456.92%)

Mutual labels: spark, big-data

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+535.38%)

Mutual labels: data-science, spark

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (-1.54%)

Mutual labels: data-science, spark

Interpretable machine learning with python

Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.

Stars: ✭ 530 (+715.38%)

Mutual labels: data-science, h2o

Workflows and interfaces for neuroimaging packages

Stars: ✭ 557 (+756.92%)

Mutual labels: data-science, big-data

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+8381.54%)

Mutual labels: spark, big-data

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-9.23%)

Mutual labels: data-science, big-data

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-10.77%)

Mutual labels: data-science, spark

Scalable Data Science Platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

Stars: ✭ 158 (+143.08%)

Mutual labels: data-science, spark

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-12.31%)

Mutual labels: spark, big-data

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+1046.15%)

Mutual labels: spark, big-data

Sciblog support

Support content for my blog

Stars: ✭ 694 (+967.69%)

Mutual labels: data-science, big-data

Sparkling Water

Sparkling Water provides H2O functionality inside a Spark cluster

Stars: ✭ 887 (+1264.62%)

Mutual labels: spark, h2o

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (+40%)

Mutual labels: big-data, spark

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+46.15%)

Mutual labels: big-data, spark

Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

Stars: ✭ 153 (+135.38%)

Mutual labels: data-science, big-data

A curated list of research, applications and projects built using the H2O Machine Learning platform

Stars: ✭ 293 (+350.77%)

Mutual labels: data-science, h2o

A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.

Stars: ✭ 283 (+335.38%)

Mutual labels: data-science, big-data

Chinese translation of the official Apache Spark documentation

Stars: ✭ 1,126 (+1632.31%)

Mutual labels: spark, big-data

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+6947.69%)

Mutual labels: data-science, big-data

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-15.38%)

Mutual labels: data-science, spark

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+455.38%)

Mutual labels: spark, big-data

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (+536.92%)

Mutual labels: data-science, big-data

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+680%)

Mutual labels: spark, big-data

Quiz & Assignment of Coursera

Stars: ✭ 454 (+598.46%)

Mutual labels: data-science, big-data

Reproducible Data Science at Scale!

Stars: ✭ 5,305 (+8061.54%)

Mutual labels: data-science, big-data

Dataflowjavasdk

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Stars: ✭ 854 (+1213.85%)

Mutual labels: data-science, big-data

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+873.85%)

Mutual labels: data-science, spark

Data Science Career

Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository

Stars: ✭ 630 (+869.23%)

Mutual labels: data-science, big-data

H2O.ai Machine Learning Interpretability Resources

Stars: ✭ 428 (+558.46%)

Mutual labels: data-science, h2o

A heterogeneous Apache Spark framework.

Stars: ✭ 11 (-83.08%)

Mutual labels: spark, big-data

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+1416.92%)

Mutual labels: data-science, spark

Automated Deep Learning without ANY human intervention. 1st Solution for AutoDL [email protected]

Stars: ✭ 854 (+1213.85%)

Mutual labels: data-science, big-data

Verteego Data Suite

Stars: ✭ 9 (-86.15%)

Mutual labels: data-science, h2o

Apache Spark - A unified analytics engine for large-scale data processing

Stars: ✭ 31,618 (+48543.08%)

Mutual labels: spark, big-data

Javascript full-stack framework for Big Data visualisation and analysis

Stars: ✭ 26 (-60%)

Mutual labels: data-science, big-data

Robust, distributed version control for large files.

Stars: ✭ 41 (-36.92%)

Mutual labels: data-science, big-data

The Accelerator is a tool for fast and reproducible processing of large amounts of data.

Stars: ✭ 137 (+110.77%)

Mutual labels: data-science, big-data

Pandas and Spark DataFrame comparison for humans

Stars: ✭ 147 (+126.15%)

Mutual labels: data-science, spark

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (+546.15%)

Mutual labels: spark, big-data

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-60%)

Mutual labels: data-science, spark