
tgsmith61591 / Skoot

License: MIT
A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.

Programming Languages

python

Projects that are alternatives to or similar to Skoot

Machine Learning With Python
Practice and tutorial-style notebooks covering a wide variety of machine learning techniques
Stars: ✭ 2,197 (+4294%)
Mutual labels:  data-science, pandas, scikit-learn
Datacamp Python Data Science Track
All the slides, accompanying code, and exercises, stored in this repo. 🎈
Stars: ✭ 250 (+400%)
Mutual labels:  data-science, pandas, scikit-learn
Data Science Projects With Python
A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn
Stars: ✭ 198 (+296%)
Mutual labels:  data-science, pandas, scikit-learn
Foxcross
AsyncIO serving for data science models
Stars: ✭ 18 (-64%)
Mutual labels:  data-science, pandas, scikit-learn
Python for ml
brief introduction to Python for machine learning
Stars: ✭ 29 (-42%)
Mutual labels:  data-science, pandas, scikit-learn
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+2932%)
Mutual labels:  data-science, pandas, scikit-learn
Crime Analysis
Association Rule Mining from Spatial Data for Crime Analysis
Stars: ✭ 20 (-60%)
Mutual labels:  data-science, pandas, scikit-learn
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+6204%)
Mutual labels:  data-science, pandas, scikit-learn
Data Science Portfolio
Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.
Stars: ✭ 559 (+1018%)
Mutual labels:  data-science, pandas, scikit-learn
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+43996%)
Mutual labels:  data-science, pandas, scikit-learn
Pymc Example Project
Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.
Stars: ✭ 90 (+80%)
Mutual labels:  data-science, pandas, scikit-learn
Machinelearningcourse
A collection of notebooks of my Machine Learning class written in python 3
Stars: ✭ 35 (-30%)
Mutual labels:  data-science, pandas, scikit-learn
Ds and ml projects
Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.
Stars: ✭ 56 (+12%)
Mutual labels:  data-science, pandas, scikit-learn
Python Cheat Sheet
Python Cheat Sheet NumPy, Matplotlib
Stars: ✭ 1,739 (+3378%)
Mutual labels:  data-science, pandas, scikit-learn
Code
Compilation of R and Python programming codes on the Data Professor YouTube channel.
Stars: ✭ 287 (+474%)
Mutual labels:  data-science, pandas, scikit-learn
Alphapy
Automated Machine Learning [AutoML] with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost
Stars: ✭ 564 (+1028%)
Mutual labels:  data-science, pandas, scikit-learn
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+15826%)
Mutual labels:  data-science, pandas, scikit-learn
Machinelearningstocks
Using python and scikit-learn to make stock predictions
Stars: ✭ 897 (+1694%)
Mutual labels:  data-science, scikit-learn
Sklearn Porter
Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
Stars: ✭ 1,014 (+1928%)
Mutual labels:  data-science, scikit-learn
Boltzmannclean
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-54%)
Mutual labels:  data-science, pandas


skoot

Skoot is a lightweight Python library of machine learning transformer classes that interact with scikit-learn and pandas. Its objective is to expedite the data munging and pre-processing tasks that tend to take up so much of data science practitioners' time. See the documentation for more info.
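The core pattern behind these transformer classes is an object that follows scikit-learn's fit/transform protocol but operates on a named subset of a DataFrame's columns and hands back a DataFrame. The sketch below is not skoot's implementation: `SelectiveTransformer` is a hypothetical class written only to illustrate the idea, and it assumes the wrapped transformer returns as many columns as it receives (true for `StandardScaler`, not for PCA or one-hot encoding).

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import StandardScaler


class SelectiveTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical example (not part of skoot): apply `transformer`
    to the columns in `cols` and pass every other column through.

    Assumes the wrapped transformer maps k input columns to k output
    columns (e.g. StandardScaler), so the column names can be reused.
    """

    def __init__(self, transformer, cols=None):
        self.transformer = transformer
        self.cols = cols

    def fit(self, X, y=None):
        # None means "use all columns"
        self.cols_ = list(X.columns) if self.cols is None else list(self.cols)
        # Clone so the unfitted estimator passed in by the user stays untouched
        self.transformer_ = clone(self.transformer).fit(X[self.cols_], y)
        return self

    def transform(self, X):
        # Transform only the selected columns, keeping the original index
        transformed = pd.DataFrame(
            self.transformer_.transform(X[self.cols_]),
            columns=self.cols_,
            index=X.index,
        )
        rest = X.drop(columns=self.cols_)
        # Reassemble in the caller's original column order
        return pd.concat([rest, transformed], axis=1)[list(X.columns)]


df = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "hours": [10.0, 40.0, 70.0],
    "job": ["a", "b", "a"],
})
scaled = SelectiveTransformer(StandardScaler(), cols=["age", "hours"]).fit_transform(df)
print(scaled)
```

Subclassing `BaseEstimator` and `TransformerMixin` supplies `get_params`/`set_params` and `fit_transform`, which is what lets a class like this sit inside a `Pipeline` or a grid search alongside built-in scikit-learn steps.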

Note that skoot is the preferred alternative to the now-deprecated skutil library.

Two minutes to model-readiness

Real-world data is nasty. Most data scientists spend the majority of their time tackling data cleansing tasks. With skoot, we can automate away many of the bespoke, hacky solutions that consume data scientists' time.

In this example, we'll examine a common dataset (the Adult dataset from the UCI Machine Learning Repository) that requires significant pre-processing.

from skoot.datasets import load_adult_df
from skoot.feature_selection import FeatureFilter
from skoot.decomposition import SelectivePCA
from skoot.preprocessing import DummyEncoder
from skoot.utils.dataframe import get_numeric_columns
from skoot.utils.dataframe import get_categorical_columns
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# load the dataset with the skoot-native loader & split it
adult = load_adult_df(tgt_name="target")
y = adult.pop("target")
X_train, X_test, y_train, y_test = train_test_split(
    adult, y, random_state=42, test_size=0.2)
    
# get numeric and categorical feature names
num_cols = get_numeric_columns(X_train).columns
obj_cols = get_categorical_columns(X_train).columns

# exclude "education-num" from num_cols, since the FeatureFilter step drops it before PCA runs
num_cols = num_cols[~(num_cols == "education-num")]
    
# build a pipeline
pipe = Pipeline([
    # drop the ordinal column that encodes the same information as "education"
    ("dropper", FeatureFilter(cols=["education-num"])),
    
    # decompose the numeric features with PCA
    ("pca", SelectivePCA(cols=num_cols)),
    
    # dummy encode the categorical features
    ("dummy", DummyEncoder(cols=obj_cols, handle_unknown="ignore")),
    
    # finally, a random forest classifier
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42))
])

pipe.fit(X_train, y_train)

# produce predictions
preds = pipe.predict(X_test)
print("Test accuracy: %.3f" % accuracy_score(y_test, preds))
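For comparison, roughly the same preprocessing can be expressed with scikit-learn alone through `ColumnTransformer`. This is a sketch on a small synthetic DataFrame (the column names mimic the adult data, but the values and the target are made up), not a reproduction of the pipeline's results. One design difference: `ColumnTransformer` returns a NumPy array or sparse matrix rather than a DataFrame, and unlisted columns are discarded through `remainder="drop"` instead of an explicit FeatureFilter step.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the adult data: names mimic the real columns,
# but the values and the target below are invented for illustration.
rng = np.random.RandomState(42)
n = 200
X = pd.DataFrame({
    "age": rng.randint(18, 70, n),
    "hours-per-week": rng.randint(10, 60, n),
    "education-num": rng.randint(1, 17, n),
    "workclass": rng.choice(["Private", "Self-emp", "Gov"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
y = (X["age"] + X["hours-per-week"] > 80).astype(int)

# Numeric columns minus "education-num"; everything non-numeric is categorical
num_cols = X.select_dtypes(include="number").columns.drop("education-num").tolist()
obj_cols = X.select_dtypes(exclude="number").columns.tolist()

pre = ColumnTransformer(
    [
        ("pca", PCA(), num_cols),  # PCA on the numeric columns only
        ("dummy", OneHotEncoder(handle_unknown="ignore"), obj_cols),
    ],
    remainder="drop",  # unlisted columns (here "education-num") are discarded
)

pipe = Pipeline([
    ("pre", pre),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
pipe.fit(X, y)

# A category never seen during fit is encoded as all zeros instead of raising
unseen = X.head(3).copy()
unseen["workclass"] = "Never-worked"
print(pipe.predict(unseen))
```

Handling categories that appear only at prediction time is the same concern that the `handle_unknown="ignore"` argument to `DummyEncoder` addresses in the skoot pipeline.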

For more tutorials, check out the documentation.
