tgsmith61591 / Skoot

Licence: MIT

A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.

Labels

machine-learning data-science pandas scikit-learn

Projects that are alternatives to or similar to Skoot

Machine Learning With Python

Practice and tutorial-style notebooks covering a wide variety of machine learning techniques

Stars: ✭ 2,197 (+4294%)

Mutual labels: data-science, pandas, scikit-learn

Datacamp Python Data Science Track

All the slides, accompanying code, and exercises are stored in this repo. 🎈

Stars: ✭ 250 (+400%)

Mutual labels: data-science, pandas, scikit-learn

Data Science Projects With Python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

Stars: ✭ 198 (+296%)

Mutual labels: data-science, pandas, scikit-learn

AsyncIO serving for data science models

Stars: ✭ 18 (-64%)

Mutual labels: data-science, pandas, scikit-learn

brief introduction to Python for machine learning

Stars: ✭ 29 (-42%)

Mutual labels: data-science, pandas, scikit-learn

General Assembly's 2015 Data Science course in Washington, DC

Stars: ✭ 1,516 (+2932%)

Mutual labels: data-science, pandas, scikit-learn

Association Rule Mining from Spatial Data for Crime Analysis

Stars: ✭ 20 (-60%)

Mutual labels: data-science, pandas, scikit-learn

🍊 📊 💡 Orange: Interactive data analysis

Stars: ✭ 3,152 (+6204%)

Mutual labels: data-science, pandas, scikit-learn

Data Science Portfolio

Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.

Stars: ✭ 559 (+1018%)

Mutual labels: data-science, pandas, scikit-learn

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+43996%)

Mutual labels: data-science, pandas, scikit-learn

Pymc Example Project

Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.

Stars: ✭ 90 (+80%)

Mutual labels: data-science, pandas, scikit-learn

Machinelearningcourse

A collection of notebooks from my Machine Learning class, written in Python 3

Stars: ✭ 35 (-30%)

Mutual labels: data-science, pandas, scikit-learn

Ds and ml projects

Data Science & Machine Learning projects and tutorials in Python, from beginner to advanced level.

Stars: ✭ 56 (+12%)

Mutual labels: data-science, pandas, scikit-learn

Python Cheat Sheet

Python Cheat Sheet NumPy, Matplotlib

Stars: ✭ 1,739 (+3378%)

Mutual labels: data-science, pandas, scikit-learn

Compilation of R and Python programming codes on the Data Professor YouTube channel.

Stars: ✭ 287 (+474%)

Mutual labels: data-science, pandas, scikit-learn

Automated Machine Learning [AutoML] with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost

Stars: ✭ 564 (+1028%)

Mutual labels: data-science, pandas, scikit-learn

Open Machine Learning Course

Stars: ✭ 7,963 (+15826%)

Mutual labels: data-science, pandas, scikit-learn

Machinelearningstocks

Using Python and scikit-learn to make stock predictions

Stars: ✭ 897 (+1694%)

Mutual labels: data-science, scikit-learn

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

Stars: ✭ 1,014 (+1928%)

Mutual labels: data-science, scikit-learn

Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines

Stars: ✭ 23 (-54%)

Mutual labels: data-science, pandas

skoot

Skoot is a lightweight Python library of machine learning transformer classes that interact with scikit-learn and pandas. Its objective is to expedite the data munging and pre-processing tasks that tend to take up so much of data science practitioners' time. See the documentation for more info.

Note that skoot is the preferred alternative to the now-deprecated skutil library.
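
To make the idea of a transformer class that works with both scikit-learn and pandas more concrete, here is a minimal sketch of a column-selective transformer. It is not skoot's implementation: the class name `SelectiveScalerSketch` and its `cols` parameter are hypothetical, chosen only to mirror the `cols=` argument used in the example further down. It relies solely on `BaseEstimator`, `TransformerMixin`, and `StandardScaler` from scikit-learn.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class SelectiveScalerSketch(BaseEstimator, TransformerMixin):
    """Hypothetical example: scale only `cols`, pass other columns through."""

    def __init__(self, cols=None):
        self.cols = cols

    def fit(self, X, y=None):
        # Fall back to every column when no subset is given
        self.cols_ = list(X.columns) if self.cols is None else list(self.cols)
        self.scaler_ = StandardScaler().fit(X[self.cols_])
        return self

    def transform(self, X):
        X = X.copy()  # leave the caller's DataFrame untouched
        X[self.cols_] = self.scaler_.transform(X[self.cols_])
        return X  # still a DataFrame, so later steps can select by name


df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0, 30.0],
    "c": ["x", "y", "z"],
})
out = SelectiveScalerSketch(cols=["a"]).fit_transform(df)
```

Because the transformer takes a DataFrame and returns a DataFrame, it can sit in a `Pipeline` ahead of other steps that select columns by name, which is the pattern the example below depends on.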

Two minutes to model-readiness

Real-world data is messy. Most data scientists spend the majority of their time on data cleansing tasks. With skoot, we can automate away many of the bespoke, hacked-together solutions that consume data scientists' time.

In this example, we'll examine a common dataset (the Adult dataset from the UCI Machine Learning Repository) that requires significant pre-processing.

from skoot.datasets import load_adult_df
from skoot.feature_selection import FeatureFilter
from skoot.decomposition import SelectivePCA
from skoot.preprocessing import DummyEncoder
from skoot.utils.dataframe import get_numeric_columns
from skoot.utils.dataframe import get_categorical_columns
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# load the dataset with the skoot-native loader & split it
adult = load_adult_df(tgt_name="target")
y = adult.pop("target")
X_train, X_test, y_train, y_test = train_test_split(
    adult, y, random_state=42, test_size=0.2)
    
# get numeric and categorical feature names
num_cols = get_numeric_columns(X_train).columns
obj_cols = get_categorical_columns(X_train).columns

# exclude education-num from num_cols, since the pipeline's first step drops it
num_cols = num_cols[~(num_cols == "education-num")]
    
# build a pipeline
pipe = Pipeline([
    # drop out the ordinal level that's otherwise equal to "education"
    ("dropper", FeatureFilter(cols=["education-num"])),
    
    # decompose the numeric features with PCA
    ("pca", SelectivePCA(cols=num_cols)),
    
    # dummy encode the categorical features
    ("dummy", DummyEncoder(cols=obj_cols, handle_unknown="ignore")),
    
    # and a simple classifier class
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42))
])

pipe.fit(X_train, y_train)

# produce predictions
preds = pipe.predict(X_test)
print("Test accuracy: %.3f" % accuracy_score(y_test, preds))
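
For comparison, the same three steps (drop a redundant column, apply PCA to the numeric features, dummy-encode the categorical ones) can be approximated with scikit-learn alone. The sketch below is an assumption-laden stand-in and not a drop-in replacement for the skoot pipeline: the small DataFrame is made up for illustration, and `ColumnTransformer` returns an array instead of a DataFrame.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up frame standing in for the adult dataset (illustrative only)
X = pd.DataFrame({
    "age": [25, 38, 52, 41, 29, 60],
    "hours": [40, 50, 45, 60, 35, 20],
    "education-num": [9, 13, 14, 10, 9, 16],
    "workclass": ["private", "gov", "private", "self", "gov", "private"],
})
y = pd.Series([0, 1, 1, 1, 0, 0])

num_cols = ["age", "hours"]
obj_cols = ["workclass"]

pipe = Pipeline([
    ("prep", ColumnTransformer(
        [
            ("pca", PCA(), num_cols),
            ("dummy", OneHotEncoder(handle_unknown="ignore"), obj_cols),
        ],
        remainder="drop",  # unlisted columns such as "education-num" are discarded
    )),
    ("clf", RandomForestClassifier(n_estimators=10, random_state=42)),
])

pipe.fit(X, y)

# A category unseen during fit is encoded as all zeros instead of raising
new_row = pd.DataFrame({
    "age": [33],
    "hours": [40],
    "education-num": [11],
    "workclass": ["never-seen"],
})
pred = pipe.predict(new_row)
```

The main difference in design is that the scikit-learn version routes columns once, up front, whereas the skoot pipeline above passes a DataFrame from step to step and lets each transformer pick its own `cols`.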

For more tutorials, check out the documentation.

Stars: ✭ 50
