Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-91.85%)
Optimus: 🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (-26.31%)
Cookbook 2nd: IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Stars: ✭ 704 (-47.38%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+322.72%)
Sci Pype: A Machine Learning API with native redis caching and export + import using S3. Analyze entire datasets using an API for building, training, testing, analyzing, extracting, importing, and archiving. This repository can run from a docker container or from the repository.
Stars: ✭ 90 (-93.27%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-88.79%)
Cookbook 2nd Code: Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Stars: ✭ 541 (-59.57%)
Pythondata: repo for code published on pythondata.com
Stars: ✭ 113 (-91.55%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (-28.7%)
Datasciencevm: Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-88.57%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-44.32%)
Quantitative Notebooks: Educational notebooks on quantitative finance, algorithmic trading, financial modelling and investment strategy
Stars: ✭ 356 (-73.39%)
Nteract: 📘 The interactive computing suite for you! ✨
Stars: ✭ 5,713 (+326.98%)
Courses: Quiz & Assignment of Coursera
Stars: ✭ 454 (-66.07%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-94.1%)
Dtale: Visualizer for pandas data structures
Stars: ✭ 2,864 (+114.05%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-92%)
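leaflet heatmap: A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap drawing step is moved to offline computation. After computing the data in parallel with Apache Spark, the heatmap is also drawn with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. With the current Apache Spark implementation, parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .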
Stars: ✭ 13 (-99.03%)
W2v: Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-95.22%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-94.84%)
Datacamp: 🍧 A repository that contains courses I have taken on DataCamp
Stars: ✭ 69 (-94.84%)
Allstate capstone: Allstate Kaggle Competition ML Capstone Project
Stars: ✭ 72 (-94.62%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+1547.83%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-69.13%)
Cortx: CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-68.16%)
Data Science Your Way: Ways of doing Data Science Engineering and Machine Learning in R and Python
Stars: ✭ 530 (-60.39%)
Jupyter pivottablejs: Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js
Stars: ✭ 428 (-68.01%)
Data Science: Collection of useful data science topics along with code and articles
Stars: ✭ 315 (-76.46%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-52.69%)
Hyperlearn: 50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (-10.01%)
Notebooks: A collection of Jupyter/IPython notebooks
Stars: ✭ 78 (-94.17%)
Tutorials: CatBoost tutorials repository
Stars: ✭ 563 (-57.92%)
Ipython Dashboard: A stand alone, light-weight web server for building, sharing graphs created in ipython. Build for data science, data analysis guys. Aiming at building an interactive visualization, collaborated dashboard, and real-time streaming graph.
Stars: ✭ 664 (-50.37%)
Articles: A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci
Stars: ✭ 350 (-73.84%)
Pachyderm: Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+296.49%)
Nbstripout: strip output from Jupyter and IPython notebooks
Stars: ✭ 738 (-44.84%)
Show ast: An IPython notebook plugin for visualizing ASTs.
Stars: ✭ 76 (-94.32%)
Lambdaschooldatascience: Completed assignments and coding challenges from the Lambda School Data Science program.
Stars: ✭ 22 (-98.36%)
Pyspark Setup Demo: Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-98.21%)
Jupytemplate: Templates for jupyter notebooks
Stars: ✭ 85 (-93.65%)
Resources: PyMC3 educational resources
Stars: ✭ 930 (-30.49%)
Dataflowjavasdk: Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-36.17%)
Pandas Profiling: Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+522.5%)
Ansible Jupyterhub: Ansible role to setup jupyterhub server (deprecated)
Stars: ✭ 14 (-98.95%)
Skdata: Python tools for data analysis
Stars: ✭ 16 (-98.8%)
Data Science On Gcp: Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (-35.43%)
Pixiedust: Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (-25.41%)
Telepyth: Telegram notification with IPython magics.
Stars: ✭ 54 (-95.96%)