1552 open-source projects that are alternatives to or similar to Rsparkling

H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+8601.54%)
Mutual labels:  data-science, spark, big-data, h2o
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+4583.08%)
Mutual labels:  data-science, spark, big-data
Benchm Ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+2723.08%)
Mutual labels:  data-science, spark, h2o
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+33820%)
Mutual labels:  data-science, spark, big-data
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+133.85%)
Mutual labels:  data-science, spark, big-data
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+21.54%)
Mutual labels:  data-science, spark, big-data
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1958.46%)
Mutual labels:  data-science, spark, big-data
Mydatascienceportfolio
Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+249.23%)
Mutual labels:  data-science, spark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+70.77%)
Mutual labels:  big-data, spark
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (+295.38%)
Mutual labels:  spark, big-data
Data Science Cookbook
🎓 Jupyter notebooks from UFC data science course
Stars: ✭ 60 (-7.69%)
Mutual labels:  data-science, spark
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (+196.92%)
Mutual labels:  data-science, big-data
Gwu data mining
Materials for GWU DNSC 6279 and DNSC 6290.
Stars: ✭ 217 (+233.85%)
Mutual labels:  data-science, h2o
leaflet heatmap
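Simple visualization of Huzhou call data. Because the data volume is assumed to be too large to draw a heatmap directly in the browser, the heatmap-rendering step is moved offline for computation and analysis. The data is first computed in parallel with Apache Spark, the heatmap is then rendered with Apache Spark, and finally leafletjs loads an OpenStreetMap layer and the heatmap layer to provide good interactivity. With the rendering currently implemented in Apache Spark, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because the algorithm is not well designed. The Apache Spark heatmap rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .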
Stars: ✭ 13 (-80%)
Mutual labels:  big-data, spark
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-78.46%)
Mutual labels:  big-data, spark
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+4640%)
Mutual labels:  data-science, spark
Delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+5904.62%)
Mutual labels:  spark, big-data
Sk Dist
Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (+300%)
Mutual labels:  data-science, spark
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+5766.15%)
Mutual labels:  spark, big-data
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+456.92%)
Mutual labels:  spark, big-data
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+535.38%)
Mutual labels:  data-science, spark
W2v
Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-1.54%)
Mutual labels:  data-science, spark
Interpretable machine learning with python
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
Stars: ✭ 530 (+715.38%)
Mutual labels:  data-science, h2o
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (+756.92%)
Mutual labels:  data-science, big-data
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+8381.54%)
Mutual labels:  spark, big-data
Verticapy
VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-9.23%)
Mutual labels:  data-science, big-data
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-10.77%)
Mutual labels:  data-science, spark
Scalable Data Science Platform
Content for architecting a data science platform for products using Luigi, Spark & Flask.
Stars: ✭ 158 (+143.08%)
Mutual labels:  data-science, spark
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-12.31%)
Mutual labels:  spark, big-data
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+1046.15%)
Mutual labels:  spark, big-data
Sciblog support
Support content for my blog
Stars: ✭ 694 (+967.69%)
Mutual labels:  data-science, big-data
Sparkling Water
Sparkling Water provides H2O functionality inside a Spark cluster
Stars: ✭ 887 (+1264.62%)
Mutual labels:  spark, h2o
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+40%)
Mutual labels:  big-data, spark
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+46.15%)
Mutual labels:  big-data, spark
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (+135.38%)
Mutual labels:  data-science, big-data
Awesome H2o
A curated list of research, applications and projects built using the H2O Machine Learning platform
Stars: ✭ 293 (+350.77%)
Mutual labels:  data-science, h2o
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+335.38%)
Mutual labels:  data-science, big-data
Spark Doc Zh
Chinese translation of the official Apache Spark documentation
Stars: ✭ 1,126 (+1632.31%)
Mutual labels:  spark, big-data
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+6947.69%)
Mutual labels:  data-science, big-data
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-15.38%)
Mutual labels:  data-science, spark
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+455.38%)
Mutual labels:  spark, big-data
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+536.92%)
Mutual labels:  data-science, big-data
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+680%)
Mutual labels:  spark, big-data
Courses
Quizzes & Assignments from Coursera
Stars: ✭ 454 (+598.46%)
Mutual labels:  data-science, big-data
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+8061.54%)
Mutual labels:  data-science, big-data
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+1213.85%)
Mutual labels:  data-science, big-data
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+873.85%)
Mutual labels:  data-science, spark
Data Science Career
Career resources repository for Data Science, Machine Learning, Big Data and Business Analytics
Stars: ✭ 630 (+869.23%)
Mutual labels:  data-science, big-data
Mli Resources
H2O.ai Machine Learning Interpretability Resources
Stars: ✭ 428 (+558.46%)
Mutual labels:  data-science, h2o
Sparkjni
A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-83.08%)
Mutual labels:  spark, big-data
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1416.92%)
Mutual labels:  data-science, spark
Autodl
Automated Deep Learning without ANY human intervention. 1st Solution for AutoDL.
Stars: ✭ 854 (+1213.85%)
Mutual labels:  data-science, big-data
Vds
Verteego Data Suite
Stars: ✭ 9 (-86.15%)
Mutual labels:  data-science, h2o
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+48543.08%)
Mutual labels:  spark, big-data
Pretzel
Javascript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-60%)
Mutual labels:  data-science, big-data
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-36.92%)
Mutual labels:  data-science, big-data
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (+110.77%)
Mutual labels:  data-science, big-data
Datacompy
Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (+126.15%)
Mutual labels:  data-science, spark
Listenbrainz Server
Server for the ListenBrainz project
Stars: ✭ 420 (+546.15%)
Mutual labels:  spark, big-data
Tiledb Vcf
Efficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-60%)
Mutual labels:  data-science, spark