
IntelPython / Daal4py

Licence: apache-2.0
sources for daal4py - a convenient Python API to oneDAL

Programming Languages

python

Projects that are alternatives of or similar to Daal4py

Onedal
oneAPI Data Analytics Library (oneDAL)
Stars: ✭ 382 (+238.05%)
Mutual labels:  hacktoberfest, data-analysis, machine-learning-algorithms
Model Describer
model-describer : Making machine learning interpretable to humans
Stars: ✭ 22 (-80.53%)
Mutual labels:  data-analysis, scikit-learn, machine-learning-algorithms
Igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
Stars: ✭ 2,956 (+2515.93%)
Mutual labels:  data-analysis, scikit-learn, machine-learning-algorithms
Skll
SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
Stars: ✭ 523 (+362.83%)
Mutual labels:  hacktoberfest, scikit-learn
Articles
A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci
Stars: ✭ 350 (+209.73%)
Mutual labels:  data-analysis, machine-learning-algorithms
Algorithmsanddatastructure
Algorithms And DataStructure Implemented In Python & CPP, Give a Star 🌟If it helps you
Stars: ✭ 400 (+253.98%)
Mutual labels:  hacktoberfest, machine-learning-algorithms
Data-Analysis
Different types of data analytics projects : EDA, PDA, DDA, TSA and much more.....
Stars: ✭ 22 (-80.53%)
Mutual labels:  machine-learning-algorithms, data-analysis
Spring2017 proffosterprovost
Introduction to Data Science
Stars: ✭ 18 (-84.07%)
Mutual labels:  data-analysis, machine-learning-algorithms
Awesome Python Data Science
Probably the best curated list of data science software in Python.
Stars: ✭ 812 (+618.58%)
Mutual labels:  data-analysis, scikit-learn
100 Days Of Ml Code
100 Days of ML Coding
Stars: ✭ 33,641 (+29670.8%)
Mutual labels:  scikit-learn, machine-learning-algorithms
Ds and ml projects
Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.
Stars: ✭ 56 (-50.44%)
Mutual labels:  scikit-learn, machine-learning-algorithms
Zat
Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (+168.14%)
Mutual labels:  data-analysis, scikit-learn
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+6946.9%)
Mutual labels:  data-analysis, scikit-learn
Modal
A modular active learning framework for Python
Stars: ✭ 1,148 (+915.93%)
Mutual labels:  scikit-learn, machine-learning-algorithms
Machinejs
[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
Stars: ✭ 412 (+264.6%)
Mutual labels:  scikit-learn, machine-learning-algorithms
Netket
Machine learning algorithms for many-body quantum systems
Stars: ✭ 256 (+126.55%)
Mutual labels:  hacktoberfest, machine-learning-algorithms
C
Collection of various algorithms in mathematics, machine learning, computer science, physics, etc implemented in C for educational purposes.
Stars: ✭ 11,897 (+10428.32%)
Mutual labels:  hacktoberfest, machine-learning-algorithms
genie
Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)
Stars: ✭ 21 (-81.42%)
Mutual labels:  machine-learning-algorithms, data-analysis
genieclust
Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R
Stars: ✭ 34 (-69.91%)
Mutual labels:  machine-learning-algorithms, data-analysis
Innovative Hacktober
Make a pull request. Let's hack October in an innovative way.
Stars: ✭ 34 (-69.91%)
Mutual labels:  hacktoberfest, machine-learning-algorithms

daal4py - A Convenient Python API to the Intel(R) oneAPI Data Analytics Library


A simplified API to the Intel(R) oneAPI Data Analytics Library (oneDAL) that gives data scientists and machine learning practitioners fast access to the library. It provides an abstraction over oneDAL that can be used directly or integrated into your own framework, and it goes further by offering drop-in patching for scikit-learn.

Running the full scikit-learn test suite with daal4py optimization patches:

  • CircleCI when applied to scikit-learn from PyPI
  • CircleCI when applied to a build from the master branch

👀 Follow us on Medium

We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis with the help of daal4py. Here are our latest blogs:

🔗 Important links

💬 Support

Report issues, ask questions, and provide suggestions using:

You may reach out to project maintainers privately at [email protected]

🛠 Installation

daal4py is available on the Python Package Index (PyPI) and on Anaconda Cloud in the Conda-Forge and Intel channels.

# PyPI
pip install daal4py
# Anaconda Cloud from Conda-Forge channel (recommended for conda users by default)
conda install daal4py -c conda-forge
# Anaconda Cloud from Intel channel (recommended for Intel® Distribution for Python)
conda install daal4py -c intel
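To confirm that one of the commands above actually installed daal4py, a small check can look up the package without importing it. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical (not part of daal4py), and `importlib.metadata` assumes Python 3.8 or newer.

```python
import importlib.metadata
import importlib.util


def installed_version(package="daal4py"):
    """Return the installed version string of `package`, or None if it is absent."""
    # find_spec locates a top-level module without importing it
    if importlib.util.find_spec(package) is None:
        return None
    try:
        # Read the version from the installed distribution's metadata
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable module without distribution metadata (e.g. a source checkout)
        return "unknown"


if __name__ == "__main__":
    version = installed_version()
    print("daal4py is not installed" if version is None else f"daal4py {version}")
```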
ℹ️ Supported configurations

📦 PyPi channel

| OS / Python version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
|---|---|---|---|---|
| Linux | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] |
| Windows | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] |
| OsX | [CPU] | [CPU] | [CPU] | [CPU] |

📦 Anaconda Cloud: Conda-Forge channel

| OS / Python version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
|---|---|---|---|---|
| Linux | [CPU] | [CPU] | [CPU] | [CPU] |
| Windows | [CPU] | [CPU] | [CPU] | [CPU] |
| OsX | | | | |

📦 Anaconda Cloud: Intel channel

OS / Python version Python 3.6 Python 3.7 Python 3.8 Python 3.9
Linux [CPU, GPU]
Windows [CPU, GPU]
OsX [CPU]
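The PyPI table can also be encoded as data, for example to decide in a setup script whether GPU support should be expected. The sketch below is hypothetical: `pypi_devices`, `current_os_name`, and the dictionaries are illustrative names, and they reflect only the PyPI table above, not the conda channels or later daal4py releases.

```python
import platform
import sys

# Hypothetical encoding of the PyPI "Supported configurations" table above
PYPI_PYTHONS = {(3, 6), (3, 7), (3, 8), (3, 9)}
PYPI_DEVICES = {
    "Linux": {"CPU", "GPU"},
    "Windows": {"CPU", "GPU"},
    "OsX": {"CPU"},
}


def pypi_devices(os_name, python_version):
    """Return the devices listed for this OS and (major, minor) Python version."""
    if tuple(python_version[:2]) not in PYPI_PYTHONS:
        return set()  # Python version not listed in the table
    return set(PYPI_DEVICES.get(os_name, set()))


def current_os_name():
    """Map platform.system() onto the row names used in the table."""
    system = platform.system()
    return "OsX" if system == "Darwin" else system


if __name__ == "__main__":
    print(pypi_devices(current_os_name(), sys.version_info))
```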

You can build daal4py from sources as well.

⚡️ Get Started

Accelerate scikit-learn with the core functionality of daal4py without changing the code.

Intel CPU optimizations patching

import numpy as np
from daal4py.sklearn import patch_sklearn

# Patch scikit-learn before importing any estimators from it
patch_sklearn()

from sklearn.cluster import DBSCAN

# Small 2-D dataset stored as float32
X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
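The example above imports daal4py unconditionally, so it fails on machines where daal4py is not installed. A minimal sketch of optional patching, assuming you want to fall back silently to stock scikit-learn, wraps the import in `try`/`except ImportError`; `try_patch` is a hypothetical helper name, not a daal4py function. As in the example above, call it before importing estimators from `sklearn`.

```python
def try_patch():
    """Apply daal4py's scikit-learn patches if possible.

    Returns True when patch_sklearn() was called, False when daal4py (or
    scikit-learn) could not be imported and stock scikit-learn stays in use.
    """
    try:
        from daal4py.sklearn import patch_sklearn
    except ImportError:
        # daal4py unavailable: keep the original scikit-learn implementations
        return False
    patch_sklearn()
    return True


if __name__ == "__main__":
    print("daal4py patches applied" if try_patch() else "using stock scikit-learn")
```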

Intel CPU/GPU optimizations patching

import numpy as np
from daal4py.sklearn import patch_sklearn
from daal4py.oneapi import sycl_context

# Patch scikit-learn before importing any estimators from it
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
# Computations inside this block are offloaded to the "gpu" SYCL device
with sycl_context("gpu"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)
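The GPU example likewise requires a daal4py build that ships `daal4py.oneapi`. A hedged sketch for code that should also run without it returns a no-op context manager instead; `device_context` is a hypothetical name, and treating an `ImportError` as "no SYCL support" is an assumption. The helper does not detect whether a GPU is actually present, so entering `sycl_context("gpu")` on a machine without one may still raise.

```python
import contextlib


def device_context(device="gpu"):
    """Return daal4py's sycl_context(device) if available, else a no-op context.

    Hypothetical helper: an ImportError is treated as "no oneAPI/SYCL support".
    """
    try:
        from daal4py.oneapi import sycl_context
    except ImportError:
        # daal4py missing or built without the oneapi module: run as usual
        return contextlib.nullcontext()
    return sycl_context(device)
```

Usage mirrors the example above: `with device_context("gpu"): clustering = DBSCAN(eps=3, min_samples=2).fit(X)`.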

🚀 Scikit-learn patching

Speedups of daal4py-powered scikit-learn over the original scikit-learn
Technical details: float type: float64; HW: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.23.1, Intel® oneDAL (2021.1 Beta 10)
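A speedup like the ones charted is a ratio of run times. To compare patched and stock scikit-learn on your own hardware, one simple approach (an assumption, not necessarily the methodology behind the published chart) is to take the best of several timings for each variant and divide. Because patching affects the whole process, the two variants would typically be timed in separate runs, one with `patch_sklearn()` and one without. `best_time` and `speedup` are hypothetical helpers.

```python
import time


def best_time(fn, repeats=5):
    """Fastest wall-clock time in seconds over `repeats` calls to fn()."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds, optimized_seconds):
    """Speedup of the optimized run over the baseline; > 1 means faster."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return baseline_seconds / optimized_seconds
```

For example, `best_time(lambda: DBSCAN(eps=3, min_samples=2).fit(X))` measured in a patched and in an unpatched run gives the two inputs to `speedup`.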

daal4py patching affects the performance of the specific scikit-learn functionality listed below. When unsupported parameters are used, daal4py falls back to stock scikit-learn. These limitations are described below. If the patching does not cover your scenarios, submit an issue on GitHub.

⚠️ We support optimizations for the last four versions of scikit-learn. The latest release, daal4py 2021.1, supports scikit-learn 0.21.X, 0.22.X, 0.23.X, and 0.24.X.
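To check whether an installed scikit-learn falls into one of the series listed above, a small standard-library helper can compare the major and minor version numbers. The function name is hypothetical, and the supported set is copied from this note for daal4py 2021.1; other daal4py releases may support different versions.

```python
import re

# scikit-learn series listed above for daal4py 2021.1
SUPPORTED_SKLEARN_SERIES = {(0, 21), (0, 22), (0, 23), (0, 24)}


def sklearn_series_supported(version):
    """True if a version string such as '0.23.1' belongs to a listed series."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False  # not a recognisable version string
    return (int(match.group(1)), int(match.group(2))) in SUPPORTED_SKLEARN_SERIES
```

With scikit-learn installed, pass `sklearn.__version__` as the argument.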

🔥 Applying the daal4py patch will impact the following existing scikit-learn algorithms:

| Task | Functionality | Parameters support | Data support |
|---|---|---|---|
| Classification | SVC | All parameters except kernel = 'poly' and 'sigmoid'. | No limitations. |
| | RandomForestClassifier | All parameters except warm_start = True, ccp_alpha != 0, and criterion != 'gini'. | Multi-output and sparse data are not supported. |
| | KNeighborsClassifier | All parameters except metric values other than 'euclidean' or 'minkowski' with p = 2. | Multi-output and sparse data are not supported. |
| | LogisticRegression / LogisticRegressionCV | All parameters except solver values other than 'lbfgs' or 'newton-cg', class_weight != None, and sample_weight != None. | Only dense data is supported. |
| Regression | RandomForestRegressor | All parameters except warm_start = True, ccp_alpha != 0, and criterion != 'mse'. | Multi-output and sparse data are not supported. |
| | KNeighborsRegressor | All parameters except metric values other than 'euclidean' or 'minkowski' with p = 2. | Sparse data is not supported. |
| | LinearRegression | All parameters except normalize != False and sample_weight != None. | Only dense data is supported; #observations should be >= #features. |
| | Ridge | All parameters except normalize != False, solver != 'auto', and sample_weight != None. | Only dense data is supported; #observations should be >= #features. |
| | ElasticNet | All parameters except sample_weight != None. | Multi-output and sparse data are not supported; #observations should be >= #features. |
| | Lasso | All parameters except sample_weight != None. | Multi-output and sparse data are not supported; #observations should be >= #features. |
| Clustering | KMeans | All parameters except precompute_distances and sample_weight != None. | No limitations. |
| | DBSCAN | All parameters except metric values other than 'euclidean' or 'minkowski' with p = 2. | Only dense data is supported. |
| Dimensionality reduction | PCA | All parameters except svd_solver != 'full'. | No limitations. |
| | TSNE | All parameters except metric values other than 'euclidean' or 'minkowski' with p = 2. | Sparse data is not supported. |
| Unsupervised | NearestNeighbors | All parameters except metric values other than 'euclidean' or 'minkowski' with p = 2. | Sparse data is not supported. |
| Other | train_test_split | All parameters are supported. | Only dense data is supported. |
| | assert_all_finite | All parameters are supported. | Only dense data is supported. |
| | pairwise_distances | Only with metric = 'cosine' or 'correlation'. | Only dense data is supported. |
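Several rows share the same conditions, for example the metric rule for the neighbors-based estimators and DBSCAN. As an illustration only, some rows can be written as predicates. These helpers are hypothetical, not part of daal4py, and cover only the conditions listed in the table; the real dispatch happens inside daal4py, so the verbose mode described in the next section is the reliable way to confirm which solver ran. The default arguments follow scikit-learn's defaults for the neighbors estimators (metric='minkowski', p=2).

```python
def metric_supported(metric="minkowski", p=2):
    """Metric rule from the table: 'euclidean', or 'minkowski' with p = 2."""
    return metric == "euclidean" or (metric == "minkowski" and p == 2)


def neighbors_uses_daal4py(metric="minkowski", p=2, sparse=False):
    """Rough check for the NearestNeighbors / KNeighborsRegressor rows."""
    return metric_supported(metric, p) and not sparse


def random_forest_uses_daal4py(task, warm_start=False, ccp_alpha=0.0,
                               criterion=None, sparse=False, multi_output=False):
    """Rough check for the RandomForestClassifier / RandomForestRegressor rows."""
    # Only one criterion per task is listed as supported
    supported_criterion = {"classification": "gini", "regression": "mse"}[task]
    if criterion is None:
        criterion = supported_criterion
    return (not warm_start and ccp_alpha == 0
            and criterion == supported_criterion
            and not sparse and not multi_output)
```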

Scenarios that are only available in the master branch (not released yet):

| Task | Functionality | Parameters support | Data support |
|---|---|---|---|
| Other | roc_auc_score | Parameters average, sample_weight, max_fpr, and multi_class are not supported. | No limitations. |

📜 scikit-learn verbose

To find out which implementation of the algorithm is currently used (daal4py or stock Scikit-learn), set the environment variable:

  • On Linux and macOS: export IDP_SKLEARN_VERBOSE=INFO
  • On Windows: set IDP_SKLEARN_VERBOSE=INFO
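The variable can also be set from Python for the current process, which works the same way on every OS. This is a minimal sketch; `enable_sklearn_verbose` is a hypothetical helper, and setting the variable before daal4py is imported is an assumption made so that it is in place whenever daal4py reads it.

```python
import os


def enable_sklearn_verbose(level="INFO"):
    """Set IDP_SKLEARN_VERBOSE for this process and any child processes it starts."""
    # Assumption: call this before importing daal4py or calling patch_sklearn()
    os.environ["IDP_SKLEARN_VERBOSE"] = level


enable_sklearn_verbose()
```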

For example, for DBSCAN you get one of these print statements depending on which implementation is used:

  • INFO: sklearn.cluster.DBSCAN.fit: uses Intel(R) oneAPI Data Analytics Library solver
  • INFO: sklearn.cluster.DBSCAN.fit: uses original Scikit-learn solver
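When many estimators run, it can help to summarize these messages. The sketch below parses lines in exactly the two formats shown above and reports which backend handled each method; `parse_verbose_line` is a hypothetical helper, and other daal4py versions may word the messages differently.

```python
import re

# Matches lines such as
# "INFO: sklearn.cluster.DBSCAN.fit: uses original Scikit-learn solver"
_VERBOSE_LINE = re.compile(r"^INFO: (?P<method>[\w.]+): uses (?P<solver>.+?) solver$")


def parse_verbose_line(line):
    """Return (method, backend) for a verbose line, or None if it does not match."""
    match = _VERBOSE_LINE.match(line.strip())
    if match is None:
        return None
    solver = match.group("solver")
    backend = "daal4py" if "oneAPI Data Analytics Library" in solver else "sklearn"
    return match.group("method"), backend
```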

Read more in the documentation.
