alegonz / Baikal

Licence: bsd-3-clause

A graph-based functional API for building complex scikit-learn pipelines.

Labels

machine-learning data-science scikit-learn

A graph-based functional API for building complex scikit-learn pipelines

baikal is written in pure Python. It supports Python 3.5 and above.

Note: baikal is still a young project, and there may be backward-incompatible changes. The next development steps and any backward-incompatible changes are announced and discussed in this issue. Please subscribe to it if you use baikal.

What is baikal?

baikal is a graph-based, functional API for building complex machine learning pipelines of objects that implement the scikit-learn API. It is mostly inspired by the excellent Keras API for deep learning, and borrows a few concepts from the TensorFlow framework and the (perhaps lesser-known) graphkit package.

baikal aims to provide an API for building complex, non-linear machine learning pipelines with code that looks like this:

x1 = Input()
x2 = Input()
y_t = Input()

y1 = ExtraTreesClassifier()(x1, y_t)
y2 = RandomForestClassifier()(x2, y_t)
z = PowerTransformer()(x2)
z = PCA()(z)
y3 = LogisticRegression()(z, y_t)

ensemble_features = Stack()([y1, y2, y3])
y = SVC()(ensemble_features, y_t)

model = Model([x1, x2], y, y_t)
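
To make the call syntax above more concrete, here is a toy, pure-Python sketch of how a functional API of this kind can record a graph when steps are called and evaluate it only later. It is not baikal's implementation and uses none of its classes: `Node`, `ToyInput`, `ToyStep`, `ToyModel`, and the arithmetic steps are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence


class Node:
    """A symbolic value in the graph: an input or the output of a step."""

    def __init__(self, name: str, func: Optional[Callable] = None,
                 parents: Sequence["Node"] = ()):
        self.name = name
        self.func = func              # None marks an input placeholder
        self.parents = list(parents)


def ToyInput(name: str) -> Node:
    """Create a placeholder node with no producer (hypothetical helper)."""
    return Node(name)


class ToyStep:
    """Wraps a plain function; calling it on nodes records an edge, it computes nothing."""

    def __init__(self, func: Callable, name: str):
        self.func = func
        self.name = name

    def __call__(self, *parents: Node) -> Node:
        return Node(self.name, self.func, parents)


class ToyModel:
    """Evaluates the recorded graph from the given inputs to a chosen output."""

    def __init__(self, inputs: List[Node], output: Node):
        self.inputs = inputs
        self.output = output

    def predict(self, values: Sequence, output: Optional[Node] = None):
        # Cache keyed by node identity so shared nodes are computed once.
        cache: Dict[int, object] = {id(n): v for n, v in zip(self.inputs, values)}

        def evaluate(node: Node):
            if id(node) not in cache:
                if node.func is None:
                    raise ValueError(f"no value provided for input {node.name!r}")
                args = [evaluate(p) for p in node.parents]
                cache[id(node)] = node.func(*args)
            return cache[id(node)]

        return evaluate(output if output is not None else self.output)


# Build a small non-linear graph: two inputs, two branches, one merge.
x1 = ToyInput("x1")
x2 = ToyInput("x2")
a = ToyStep(lambda v: v + 1, "add_one")(x1)
b = ToyStep(lambda v: v * 2, "double")(x2)
out = ToyStep(lambda p, q: p + q, "merge")(a, b)

model = ToyModel([x1, x2], out)
result = model.predict([3, 5])                   # (3 + 1) + (5 * 2)
intermediate = model.predict([3, 5], output=b)   # query an intermediate node
```

Because the graph is only recorded at call time, the same structure can be evaluated for any node, which is the idea behind handling multiple inputs and querying intermediate outputs.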

What can I do with it?

With baikal you can:

- build non-linear pipelines effortlessly
- handle multiple inputs and outputs
- add steps that operate on targets as part of the pipeline
- nest pipelines
- use prediction probabilities (or any other kind of output) as inputs to other steps in the pipeline
- query intermediate outputs, easing debugging
- freeze steps that do not require fitting
- define and add custom steps easily
- plot pipelines

All with boilerplate-free, readable code.

Why baikal?

The pipeline above (to the best of the author's knowledge) cannot be easily built using scikit-learn's composite estimators API as you encounter some limitations:

- It is aimed at linear pipelines.
  - You could add some step parallelism with the ColumnTransformer API, but this is limited to transformer objects.
- Classifiers/regressors can only be used at the end of the pipeline.
  - This means we cannot use the predicted labels (or their probabilities) as features for other classifiers/regressors.
  - You could leverage mlxtend's StackingClassifier and come up with some clever combination of the above composite estimators (Pipelines, ColumnTransformers, StackingClassifiers, etc.), but you might end up with code that is verbose and hard to follow.
- It cannot handle multiple-input/multiple-output models.

Perhaps you could instead define a big, composite estimator class that integrates each of the pipeline steps through composition. This, however, will most likely require:

- writing big __init__ methods to control each of the internal steps' knobs;
- being careful with get_params and set_params if you want to use, say, GridSearchCV;
- adding some boilerplate code if you want to access the outputs of intermediate steps for debugging.

By using baikal as shown in the example above, code can be more readable, less verbose and closer to our mental representation of the pipeline. baikal also provides an API to fit, predict with, and query the entire pipeline with single commands.
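
For contrast, the hand-written composite estimator approach described earlier might look roughly like the plain-Python sketch below. The classes `ToyScaler`, `ToyThresholdClassifier`, and `CompositeEstimator` are hypothetical and do not use scikit-learn; they only mimic its convention of addressing nested parameters as `<step>__<param>` in `get_params` and `set_params`.

```python
class ToyScaler:
    """Hypothetical transformer: multiplies every value by a fixed factor."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def transform(self, X):
        return [x * self.factor for x in X]


class ToyThresholdClassifier:
    """Hypothetical classifier: predicts 1 when a value exceeds the threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def predict(self, X):
        return [int(x > self.threshold) for x in X]


class CompositeEstimator:
    """Hand-written composite: every inner knob is threaded through by hand."""

    def __init__(self, scaler_factor=1.0, clf_threshold=0.5):
        # Boilerplate 1: __init__ grows with each parameter of each inner step.
        self.scaler = ToyScaler(factor=scaler_factor)
        self.clf = ToyThresholdClassifier(threshold=clf_threshold)
        self.intermediate_ = None

    def get_params(self):
        # Boilerplate 2: expose nested parameters as "<step>__<param>".
        return {
            "scaler__factor": self.scaler.factor,
            "clf__threshold": self.clf.threshold,
        }

    def set_params(self, **params):
        # Boilerplate 2 (continued): route each key back to the right inner step.
        steps = {"scaler": self.scaler, "clf": self.clf}
        for key, value in params.items():
            step_name, _, param_name = key.partition("__")
            if step_name not in steps or not hasattr(steps[step_name], param_name):
                raise ValueError(f"invalid parameter {key!r}")
            setattr(steps[step_name], param_name, value)
        return self

    def predict(self, X):
        # Boilerplate 3: keep intermediate outputs around for debugging.
        self.intermediate_ = self.scaler.transform(X)
        return self.clf.predict(self.intermediate_)


est = CompositeEstimator(scaler_factor=2.0, clf_threshold=1.0)
before = est.predict([0.4, 0.6])      # inputs are scaled before thresholding
est.set_params(clf__threshold=0.5)    # a parameter search would call this repeatedly
after = est.predict([0.4, 0.6])
```

Even with two trivial steps, each new parameter has to be added in three places, which is the verbosity the paragraph above refers to.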

Stars: ✭ 573
