Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (+1.87%)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1150.47%)
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (+80.37%)
Datumbox Framework
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (+893.46%)
Scikit Mobility
scikit-mobility: mobility analysis in Python
Stars: ✭ 339 (+216.82%)
Pythondata
Repo for code published on pythondata.com
Stars: ✭ 113 (+5.61%)
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (+19.63%)
Countly Sdk Cordova
Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-35.51%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-26.17%)
Tablesaw
Java dataframe and visualization library
Stars: ✭ 2,785 (+2502.8%)
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (+1025.23%)
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+905.61%)
Collapse
Advanced and Fast Data Transformation in R
Stars: ✭ 184 (+71.96%)
Scikit Learn
scikit-learn: machine learning in Python
Stars: ✭ 48,322 (+45060.75%)
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+1629.91%)
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+4181.31%)
Openml R
R package to interface with OpenML
Stars: ✭ 81 (-24.3%)
Datascience
Curated list of Python resources for data science.
Stars: ✭ 3,051 (+2751.4%)
Xlearn
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
Stars: ✭ 2,968 (+2673.83%)
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+4857.94%)
Imbalanced Learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Stars: ✭ 5,617 (+5149.53%)
Courses
Quizzes and assignments from Coursera courses
Stars: ✭ 454 (+324.3%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+821.5%)
Deeplearning Notes
Notes for Deep Learning Specialization Courses led by Andrew Ng.
Stars: ✭ 126 (+17.76%)
Choochoo
Training Diary
Stars: ✭ 186 (+73.83%)
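leaflet heatmap
A simple visualization of Huzhou phone-call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap-drawing step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and then leafletjs loads an OpenStreetMap layer and the heatmap layer to give good interactivity. The drawing is currently implemented with Apache Spark, but the parallel computation turns out to be slower than single-machine computation, perhaps because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap-drawing and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .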
Stars: ✭ 13 (-87.85%)
Datacamp
🍧 A repository that contains courses I have taken on DataCamp
Stars: ✭ 69 (-35.51%)
Awesome Bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 10,478 (+9692.52%)
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+698.13%)
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+286.92%)
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (+42.99%)
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (+265.42%)
Socrat
A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization
Stars: ✭ 26 (-75.7%)
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+7684.11%)
Soccergraphr
Soccer Analytics in R using OPTA data
Stars: ✭ 42 (-60.75%)
Mathematicavsr
Example projects, code, and documents for comparing Mathematica with R.
Stars: ✭ 41 (-61.68%)
Traildb
TrailDB is an efficient tool for storing and querying series of events
Stars: ✭ 1,029 (+861.68%)
25daysinmachinelearning
A repository for learning machine learning with Python, which I will keep updating with statistics content and materials
Stars: ✭ 53 (-50.47%)
Tiledb
The Universal Storage Engine
Stars: ✭ 1,072 (+901.87%)
Openrefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+7872.9%)
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-61.68%)
Ppd599
USC urban data science course series with Python and Jupyter
Stars: ✭ 1,062 (+892.52%)
Lifetimes
Lifetime value in Python
Stars: ✭ 1,082 (+911.21%)
Datacomparer
dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Stars: ✭ 58 (-45.79%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-42.06%)
Neural prophet
NeuralProphet - A simple forecasting model based on Neural Networks in PyTorch
Stars: ✭ 1,125 (+951.4%)
Graphia
A visualisation tool for the creation and analysis of graphs
Stars: ✭ 67 (-37.38%)
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-39.25%)
Linkedingiveaway
👨🏽🏫 You can learn about anything here: what giveaways I do and why they are important in today's modern world. Are you interested in giveaways? 🔋
Stars: ✭ 67 (-37.38%)
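The leaflet heatmap entry above moves heatmap computation offline. Points are aggregated in parallel, and the browser only loads the finished result as a Leaflet layer. The following is a minimal single-machine sketch of that aggregation step. It assumes a fixed grid of `cell_deg`-degree cells and emits `[lat, lon, intensity]` triples. That output format is an assumption, chosen because it is the point format accepted by the Leaflet.heat plugin. `cell_key` and `heatmap_grid` are hypothetical names, and this is not the code in the ParallelizeHeatmap repository. In Spark, the same two steps would map each point to a `(cell, 1)` pair and then apply `reduceByKey`.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def cell_key(lat: float, lon: float, cell_deg: float) -> Tuple[int, int]:
    """Map a coordinate to the integer index of its grid cell (hypothetical helper)."""
    # Floor division keeps negative coordinates in the correct cell.
    return (int(lat // cell_deg), int(lon // cell_deg))


def heatmap_grid(points: Iterable[Point], cell_deg: float = 0.01) -> List[Tuple[float, float, float]]:
    """Aggregate raw points into (lat, lon, intensity) triples (hypothetical helper).

    "Map" step: assign each point to a grid cell.
    "Reduce" step: count the points per cell.
    """
    counts = Counter(cell_key(lat, lon, cell_deg) for lat, lon in points)
    if not counts:
        return []
    peak = max(counts.values())
    result = []
    for (i, j), n in sorted(counts.items()):
        # Report the cell centre and an intensity in (0, 1] relative to the busiest cell.
        result.append(((i + 0.5) * cell_deg, (j + 0.5) * cell_deg, n / peak))
    return result


sample = [(30.2, 120.1), (30.7, 120.4), (31.5, 120.9)]
print(heatmap_grid(sample, cell_deg=1.0))  # [(30.5, 120.5, 1.0), (31.5, 120.5, 0.5)]
```

The browser then receives one triple per non-empty cell instead of every raw point. This reduction is what makes the offline step worthwhile for large inputs.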