
microsoft / Nimbusml

Licence: other
Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

Projects that are alternatives of or similar to Nimbusml

Sk Dist
Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-1.89%)
Mutual labels:  data-science, scikit-learn, ml
Machinejs
[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
Stars: ✭ 412 (+55.47%)
Mutual labels:  data-science, scikit-learn, ml
Hyperparameter hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (+144.53%)
Mutual labels:  data-science, scikit-learn, ml
Imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Stars: ✭ 194 (-26.79%)
Mutual labels:  data-science, scikit-learn, ml
Machinelearningcourse
A collection of notebooks of my Machine Learning class written in python 3
Stars: ✭ 35 (-86.79%)
Mutual labels:  data-science, scikit-learn, ml
Lale
Library for Semi-Automated Data Science
Stars: ✭ 198 (-25.28%)
Mutual labels:  data-science, scikit-learn
Eli5
A library for debugging/inspecting machine learning classifiers and explaining their predictions
Stars: ✭ 2,477 (+834.72%)
Mutual labels:  data-science, scikit-learn
Data Science Free
Free Resources For Data Science created by Shubham Kumar
Stars: ✭ 232 (-12.45%)
Mutual labels:  data-science, ml
Igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
Stars: ✭ 2,956 (+1015.47%)
Mutual labels:  data-science, scikit-learn
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+1089.43%)
Mutual labels:  data-science, scikit-learn
Datacamp Python Data Science Track
All the slides, accompanying code and exercises all stored in this repo. 🎈
Stars: ✭ 250 (-5.66%)
Mutual labels:  data-science, scikit-learn
Data Science Projects With Python
A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn
Stars: ✭ 198 (-25.28%)
Mutual labels:  data-science, scikit-learn
Atlas
An Open Source, Self-Hosted Platform For Applied Deep Learning Development
Stars: ✭ 259 (-2.26%)
Mutual labels:  data-science, ml
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (-17.74%)
Mutual labels:  data-science, scikit-learn
Virgilio
Virgilio is developed and maintained by these awesome people. You can email us virgilio.datascience (at) gmail.com or join the Discord chat.
Stars: ✭ 13,200 (+4881.13%)
Mutual labels:  data-science, scikit-learn
Hyperactive
A hyperparameter optimization and data collection toolbox for convenient and fast prototyping of machine-learning models.
Stars: ✭ 182 (-31.32%)
Mutual labels:  data-science, scikit-learn
ML-For-Beginners
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Stars: ✭ 40,023 (+15003.02%)
Mutual labels:  scikit-learn, ml
NimbusML-Samples
Samples for NimbusML, a Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.
Stars: ✭ 31 (-88.3%)
Mutual labels:  scikit-learn, ml
Awesome Mlops
😎 A curated list of awesome MLOps tools
Stars: ✭ 258 (-2.64%)
Mutual labels:  data-science, ml
Scikit Plot
An intuitive library to add plotting functionality to scikit-learn objects.
Stars: ✭ 2,162 (+715.85%)
Mutual labels:  data-science, scikit-learn

NimbusML

nimbusml is a Python module that provides Python bindings for ML.NET.

ML.NET was originally developed in Microsoft Research and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel, and others. nimbusml was built to enable data science teams that are more familiar with Python to take advantage of ML.NET's functionality and performance.

nimbusml enables training ML.NET pipelines or integrating ML.NET components directly into scikit-learn pipelines. It adheres to existing scikit-learn conventions, allowing simple interoperability between nimbusml and scikit-learn components, while adding a suite of fast, highly optimized, and scalable algorithms, transforms, and components written in C++ and C#.

See examples below showing interoperability with scikit-learn. A more detailed example in the documentation shows how to use a nimbusml component in a scikit-learn pipeline, and create a pipeline using only nimbusml components.

nimbusml accepts numpy.ndarray, scipy.sparse.csr_matrix, and pandas.DataFrame as inputs. It can also stream data from files with FileDataStream instead of loading the whole dataset into memory, which allows training on data significantly larger than available memory.
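The general idea behind streaming from a file can be sketched with the Python standard library alone. This is not how FileDataStream is implemented and it does not use any nimbusml API; the helper name iter_batches and its parameters are hypothetical. It only illustrates how a tab-separated file can be consumed in bounded-size batches so that the whole file never sits in memory at once.

```python
import csv
from itertools import islice


def iter_batches(path, batch_size=1000, sep="\t"):
    """Yield lists of row dicts, holding at most batch_size rows at a time.

    Hypothetical helper for illustration only; not part of nimbusml.
    """
    with open(path, newline="") as handle:
        # The first line of the file is treated as the header row.
        reader = csv.DictReader(handle, delimiter=sep)
        while True:
            # islice pulls the next batch_size rows without reading further.
            batch = list(islice(reader, batch_size))
            if not batch:
                return
            yield batch
```

A consumer would loop over iter_batches(path) and process each batch before the next one is read, so memory use depends on batch_size rather than on the file size.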

Documentation can be found here and additional notebook samples can be found here.

Installation

nimbusml runs on Windows, Linux, and macOS.

nimbusml requires a 64-bit version of Python 2.7, 3.5, 3.6, or 3.7.
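The interpreter requirement above can be checked before installing. The following is a minimal sketch using only the standard library; the names SUPPORTED_VERSIONS, interpreter_bits, and is_supported are hypothetical helpers written for this illustration, not part of nimbusml.

```python
import struct
import sys

# Versions listed in this README; the constant name is hypothetical.
SUPPORTED_VERSIONS = {(2, 7), (3, 5), (3, 6), (3, 7)}


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits."""
    return struct.calcsize("P") * 8


def is_supported(version=None, bits=None):
    """Check a (major, minor) version and bit width against the list above.

    Defaults to the running interpreter when arguments are omitted.
    """
    if version is None:
        version = tuple(sys.version_info[:2])
    if bits is None:
        bits = interpreter_bits()
    return bits == 64 and tuple(version[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    print("Python %d.%d, %d-bit" % (
        sys.version_info[0], sys.version_info[1], interpreter_bits()))
    print("Meets the stated requirement: %s" % is_supported())
```

If the check reports that the requirement is not met, a separate environment with one of the listed Python versions is likely needed before running the install command.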

Install nimbusml using pip with:

pip install nimbusml

nimbusml has been reported to work on Windows 10, macOS 10.13, Ubuntu 14.04, Ubuntu 16.04, Ubuntu 18.04, CentOS 7, and RHEL 7.

Examples

Here is an example of how to train a model to predict sentiment from text samples (based on this ML.NET example). The full code for this example is here.

from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.text import NGramFeaturizer

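# paths to the sample tab-separated datasets bundled with nimbusml
train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

# stream the files from disk instead of loading them into memory
train_data = FileDataStream.read_csv(train_file, sep='\t')
test_data = FileDataStream.read_csv(test_file, sep='\t')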

pipeline = Pipeline([ # nimbusml pipeline
    NGramFeaturizer(columns={'Features': ['Text']}),
    FastTreesBinaryClassifier(feature=['Features'], label='Label')
])

# fit and predict
pipeline.fit(train_data)
results = pipeline.predict(test_data)

Instead of creating a nimbusml pipeline, you can also integrate nimbusml components into scikit-learn pipelines:

from sklearn.pipeline import Pipeline
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

train_data = pd.read_csv(train_file, sep='\t')
test_data = pd.read_csv(test_file, sep='\t')

pipeline = Pipeline([ # sklearn pipeline
    ('tfidf', TfidfVectorizer()), # sklearn transform
    ('clf', FastTreesBinaryClassifier()) # nimbusml learner
])

# fit and predict
pipeline.fit(train_data["Text"], train_data["Label"])
results = pipeline.predict(test_data["Text"])

Many additional examples and tutorials can be found in the documentation.

Building

To build nimbusml from source please visit our developer guide.

Contributing

The contributions guide can be found here.

Support

If you have an idea for a new feature or encounter a problem, please open an issue in this repository or ask your question on Stack Overflow.

License

NimbusML is licensed under the MIT license.
