
justmarkham / Scikit Learn Tips

🤖⚡️ scikit-learn tips




New tips are posted on LinkedIn, Twitter, and Facebook.

👉 Sign up to receive 2 video tips by email every week! 👈

List of all tips

For each tip, you can discuss it on LinkedIn, view its Jupyter notebook, or watch its video on YouTube:

# Description Links
1 Use ColumnTransformer to apply different preprocessing to different columns
2 Seven ways to select columns using ColumnTransformer
3 What is the difference between "fit" and "transform"?
4 Use "fit_transform" on training data, but "transform" (only) on testing/new data
5 Four reasons to use scikit-learn (not pandas) for ML preprocessing
6 Encode categorical features using OneHotEncoder or OrdinalEncoder
7 Handle unknown categories with OneHotEncoder by encoding them as zeros
8 Use Pipeline to chain together multiple steps
9 Add a missing indicator to encode "missingness" as a feature
10 Set a "random_state" to make your code reproducible
11 Impute missing values using KNNImputer or IterativeImputer
12 What is the difference between Pipeline and make_pipeline?
13 Examine the intermediate steps in a Pipeline
14 HistGradientBoostingClassifier natively supports missing values
15 Three reasons not to use drop='first' with OneHotEncoder
16 Use cross_val_score and GridSearchCV on a Pipeline
17 Try RandomizedSearchCV if GridSearchCV is taking too long
18 Display GridSearchCV or RandomizedSearchCV results in a DataFrame
19 Important tuning parameters for LogisticRegression
20 Plot a confusion matrix
21 Compare multiple ROC curves in a single plot
22 Use the correct methods for each type of Pipeline
23 Display the intercept and coefficients for a linear model
24 Visualize a decision tree two different ways
25 Prune a decision tree to avoid overfitting
26 Use stratified sampling with train_test_split
27 Two ways to impute missing values for a categorical feature
28 Save a model or Pipeline using joblib
29 Vectorize two text columns in a ColumnTransformer
30 Four ways to examine the steps of a Pipeline
31 Shuffle your dataset when using cross_val_score
32 Use AUC to evaluate multiclass problems
33 Use FunctionTransformer to convert functions into transformers
34 Add feature selection to a Pipeline
35 Don't use .values when passing a pandas object to scikit-learn
36 Most parameters should be passed as keyword arguments
37 Create an interactive diagram of a Pipeline in Jupyter
38 Get the feature names output by a ColumnTransformer
39 Load a toy dataset into a DataFrame
40 Estimators only print parameters that have been changed
41 Drop the first category from binary features (only) with OneHotEncoder
42 Passthrough some columns and drop others in a ColumnTransformer
43 Use OrdinalEncoder instead of OneHotEncoder with tree-based models
44 Speed up GridSearchCV using parallel processing
45 Create feature interactions using PolynomialFeatures
46 Ensemble multiple models using VotingClassifier or VotingRegressor
47 Tune the parameters of a VotingClassifier or VotingRegressor
48 Access part of a Pipeline using slicing
49 Tune multiple models simultaneously with GridSearchCV
50 Adapt this pattern to solve many Machine Learning problems
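
Several of the tips above fit together into one workflow. The following is a minimal sketch, not taken from the original notebooks, that combines tips 1, 7, 8, 10, 16, and 26. The toy DataFrame, its column names, and the parameter grid are all invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data (assumption: invented for this sketch)
df = pd.DataFrame({
    "embarked": ["S", "C", "S", "Q", "S", "C", "S", "Q", "C", "S", "S", "C"],
    "age": [22.0, 38.0, None, 35.0, 54.0, 2.0, 27.0, None, 14.0, 58.0, 20.0, 39.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08, 11.13, 30.07, 16.70, 26.55, 8.05, 31.28],
    "survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1],
})
X = df[["embarked", "age", "fare"]]
y = df["survived"]

# Tips 10 and 26: fix random_state for reproducibility and stratify on the target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tip 1: different preprocessing for different columns
# Tip 7: handle_unknown="ignore" encodes unseen categories as all zeros
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["embarked"]),
    ("num", SimpleImputer(strategy="median"), ["age", "fare"]),
])

# Tip 8: chain preprocessing and the model into a single Pipeline
pipe = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000, random_state=0)),
])

# Tip 16: cross-validate the whole Pipeline so preprocessing is refit in each fold
scores = cross_val_score(pipe, X_train, y_train, cv=3, scoring="accuracy")

# Tip 16: tune a step's parameter using the "<step name>__<parameter>" syntax
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=3, scoring="accuracy")
grid.fit(X_train, y_train)

# The refit best Pipeline applies transform (not fit_transform) to new data
predictions = grid.predict(X_test)
```

Passing the Pipeline itself to `cross_val_score` and `GridSearchCV`, rather than pre-transformed data, keeps the imputation and encoding inside each fold, which is the point of combining these tips.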

You can interact with all of these notebooks online using Binder.

Note: Some of the tips do not include any code, and can only be viewed on LinkedIn.
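
As a further illustration of the list above, here is a small sketch of tips 12, 13, 28, and 48 applied to one fitted Pipeline. It is an assumption-laden example rather than code from the notebooks: the array values and the file name `pipeline.joblib` are made up.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (assumption: invented for this sketch)
X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0], [5.0, 11.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Tip 12: make_pipeline names each step after its lowercased class name
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
pipe.fit(X, y)
step_names = list(pipe.named_steps)

# Tip 13: access a fitted intermediate step by name
scaler = pipe.named_steps["standardscaler"]
learned_means = scaler.mean_

# Tip 48: slicing returns a new Pipeline holding only the selected steps
preprocessing_only = pipe[:-1]
X_scaled = preprocessing_only.transform(X)

# Tip 28: save the fitted Pipeline with joblib, then load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.joblib")  # hypothetical file name
    joblib.dump(pipe, path)
    loaded = joblib.load(path)

# The loaded Pipeline should reproduce the original predictions
same_predictions = np.array_equal(pipe.predict(X), loaded.predict(X))
```

A saved Pipeline should generally be loaded with the same scikit-learn version that created it, since joblib files are not guaranteed to be compatible across versions.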

Who creates these tips?

Hi! I'm Kevin Markham, the founder of Data School. I've been teaching data science in Python since 2014. I create these tips because I love using scikit-learn and I want to help others use it more effectively.

How can I get better at scikit-learn?

I teach three courses:

👉 Find out which course is right for you! 👈

Do you have any other tips?

Yes! In 2019, I posted 100 pandas tricks. I also created a video featuring my top 25 pandas tricks.

© 2020 Data School. All rights reserved.
