Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-33.05%)
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+4068.64%)
Socrat
A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization
Stars: ✭ 26 (-77.97%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+735.59%)
Resources
PyMC3 educational resources
Stars: ✭ 930 (+688.14%)
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+6648.31%)
Janitor
simple tools for data cleaning in R
Stars: ✭ 981 (+731.36%)
Seaborn Tutorial
This repository is my attempt to help Data Science aspirants gain necessary Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied on random datasets.
Stars: ✭ 114 (-3.39%)
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+916.1%)
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (+920.34%)
Pythondata
repo for code published on pythondata.com
Stars: ✭ 113 (-4.24%)
Dataframe
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved
Stars: ✭ 828 (+601.69%)
Scikit Learn
scikit-learn: machine learning in Python
Stars: ✭ 48,322 (+40850.85%)
Ai Expert Roadmap
Roadmap to becoming an Artificial Intelligence Expert in 2021
Stars: ✭ 15,441 (+12985.59%)
Steppy Toolkit
Curated set of transformers that make your work with steppy faster and more effective 🔭
Stars: ✭ 21 (-82.2%)
Xda
R package for exploratory data analysis
Stars: ✭ 112 (-5.08%)
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+623.73%)
Datacomparer
dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Stars: ✭ 58 (-50.85%)
Graphia
A visualisation tool for the creation and analysis of graphs
Stars: ✭ 67 (-43.22%)
Tsrepr
TSrepr: R package for time series representations
Stars: ✭ 75 (-36.44%)
Datacamp
🍧 A repository that contains courses I have taken on DataCamp
Stars: ✭ 69 (-41.53%)
Fklearn
fklearn: Functional Machine Learning
Stars: ✭ 1,305 (+1005.93%)
Blurr
Data transformations for the ML era
Stars: ✭ 96 (-18.64%)
Cookbook 2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Stars: ✭ 704 (+496.61%)
Skdata
Python tools for data analysis
Stars: ✭ 16 (-86.44%)
Dataproofer
A proofreader for your data
Stars: ✭ 628 (+432.2%)
Model Describer
model-describer: Making machine learning interpretable to humans
Stars: ✭ 22 (-81.36%)
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+36030.51%)
Nfstream
NFStream: a Flexible Network Data Analysis Framework.
Stars: ✭ 622 (+427.12%)
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+1468.64%)
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+6958.47%)
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+632.2%)
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-9.32%)
Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (+732.2%)
Mathematicavsr
Example projects, code, and documents for comparing Mathematica with R.
Stars: ✭ 41 (-65.25%)
Elki
ELKI Data Mining Toolkit
Stars: ✭ 613 (+419.49%)
Openrefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+7129.66%)
Drake Examples
Example workflows for the drake R package
Stars: ✭ 57 (-51.69%)
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+811.86%)
Dream3d
Data analysis program and framework for materials science data analytics, built on the SIMPL framework.
Stars: ✭ 73 (-38.14%)
Tiledb
The Universal Storage Engine
Stars: ✭ 1,072 (+808.47%)
Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (+1002.54%)
Flyte
Accelerate your ML and Data workflows to production. Flyte is a production-grade orchestration system for your Data and ML workloads. It has been battle-tested at Lyft, Spotify, Freenome and others, and is truly open source.
Stars: ✭ 1,242 (+952.54%)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1033.9%)
Dex
Dex: The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
Stars: ✭ 1,238 (+949.15%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-7.63%)
Imbalanced Learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Stars: ✭ 5,617 (+4660.17%)
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+400%)
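Gopup
Data interfaces: Baidu, Google, Toutiao and Weibo indices, macroeconomic data, interest rate data, currency exchange rates, up-and-coming ("Qianlima") and unicorn companies, Xinwen Lianbo news transcripts, film and TV box office data, university lists, epidemic data…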
Stars: ✭ 1,229 (+941.53%)
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+1184.75%)