Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+7336.61%)
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+1552.68%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-2.68%)
Ai Expert Roadmap
Roadmap to becoming an Artificial Intelligence Expert in 2021
Stars: ✭ 15,441 (+13686.61%)
Dataprep
DataPrep: The easiest way to prepare data in Python
Stars: ✭ 639 (+470.54%)
Resources
PyMC3 educational resources
Stars: ✭ 930 (+730.36%)
Model Describer
model-describer: Making machine learning interpretable to humans
Stars: ✭ 22 (-80.36%)
Data Science On Gcp
Source code accompanying the book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+671.43%)
Openrefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+7516.96%)
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+37966.07%)
Dataproofer
A proofreader for your data
Stars: ✭ 628 (+460.71%)
Imbalanced Learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Stars: ✭ 5,617 (+4915.18%)
Skdata
Python tools for data analysis
Stars: ✭ 16 (-85.71%)
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-4.46%)
Cookbook 2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Stars: ✭ 704 (+528.57%)
Tiledb
The Universal Storage Engine
Stars: ✭ 1,072 (+857.14%)
Datacomparer
dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Stars: ✭ 58 (-48.21%)
Datacamp
🍧 A repository that contains courses I have taken on DataCamp
Stars: ✭ 69 (-38.39%)
Dream3d
Data analysis program and framework for materials science data analytics, based on the SIMPL managing framework.
Stars: ✭ 73 (-34.82%)
Dex
Dex: The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
Stars: ✭ 1,238 (+1005.36%)
Cookbook 2nd Code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Stars: ✭ 541 (+383.04%)
Data Science Your Way
Ways of doing Data Science Engineering and Machine Learning in R and Python
Stars: ✭ 530 (+373.21%)
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+4636.61%)
Rumale
Rumale is a machine learning library in Ruby
Stars: ✭ 526 (+369.64%)
Nfstream
NFStream: a Flexible Network Data Analysis Framework.
Stars: ✭ 622 (+455.36%)
Elki
ELKI Data Mining Toolkit
Stars: ✭ 613 (+447.32%)
Dapy
Easy-to-use data analysis / manipulation framework for humans
Stars: ✭ 523 (+366.96%)
Lux
Python API for Intelligent Visual Data Discovery
Stars: ✭ 787 (+602.68%)
Dataframe
C++ DataFrame for statistical, financial, and ML analysis, in modern C++ using native types and continuous memory storage, with no pointers involved
Stars: ✭ 828 (+639.29%)
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+662.5%)
Socrat
A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization
Stars: ✭ 26 (-76.79%)
Flyte
Accelerate your ML and Data workflows to production. Flyte is a production-grade orchestration system for your Data and ML workloads. It has been battle-tested at Lyft, Spotify, Freenome and others, and is truly open source.
Stars: ✭ 1,242 (+1008.93%)
Fklearn
fklearn: Functional Machine Learning
Stars: ✭ 1,305 (+1065.18%)
Knowledge Repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (+4325%)
Mathematicavsr
Example projects, code, and documents for comparing Mathematica with R.
Stars: ✭ 41 (-63.39%)
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+860.71%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+780.36%)
Graphia
A visualisation tool for the creation and analysis of graphs
Stars: ✭ 67 (-40.18%)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1094.64%)
Janitor
Simple tools for data cleaning in R
Stars: ✭ 981 (+775.89%)
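Gopup
Data interfaces: Baidu, Google, Toutiao, and Weibo indices, macroeconomic data, interest rate data, currency exchange rates, Qianlima and unicorn companies, Xinwen Lianbo transcripts, film and TV box office data, university lists, epidemic data…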
Stars: ✭ 1,229 (+997.32%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-29.46%)
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch, but here on GitHub I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow as a template to solve other real problems.
Stars: ✭ 86 (-23.21%)
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (+975%)
Gop
GoPlus - The Go+ language for engineering, STEM education, and data science
Stars: ✭ 7,829 (+6890.18%)
Awesome R
A curated list of awesome R packages, frameworks and software.
Stars: ✭ 4,858 (+4237.5%)
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+7009.82%)
Tsrepr
TSrepr: R package for time series representations
Stars: ✭ 75 (-33.04%)