IntelPython / scikit-learn_bench

Licence: Apache-2.0 license
scikit-learn_bench benchmarks various implementations of machine learning algorithms across data analytics frameworks. It currently supports the scikit-learn, DAAL4PY, cuML, and XGBoost frameworks for commonly used machine learning algorithms.

Programming Languages

Python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to scikit-learn_bench

BugZoo
Keep your bugs contained. A platform for studying historical software bugs.
Stars: ✭ 49 (-30.99%)
Mutual labels:  benchmarks
wat
How fast are computers?
Stars: ✭ 26 (-63.38%)
Mutual labels:  benchmarks
jsperf
JavaScript Performance Benchmarks
Stars: ✭ 15 (-78.87%)
Mutual labels:  benchmarks
Jctools
jctools.github.io/jctools
Stars: ✭ 2,833 (+3890.14%)
Mutual labels:  benchmarks
go-ml-benchmarks
⏱ Benchmarks of machine learning inference for Go
Stars: ✭ 27 (-61.97%)
Mutual labels:  benchmarks
skiplist-survey
A comparison of skip lists written in Go
Stars: ✭ 47 (-33.8%)
Mutual labels:  benchmarks
TSForecasting
This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.
Stars: ✭ 53 (-25.35%)
Mutual labels:  benchmarks
dvc-bench
Benchmarks for DVC
Stars: ✭ 17 (-76.06%)
Mutual labels:  benchmarks
Static-Sort
A simple C++ header-only library for fastest sorting of small arrays. Generates sorting networks on compile time via templates.
Stars: ✭ 30 (-57.75%)
Mutual labels:  benchmarks
miopen-benchmark
benchmarking miopen
Stars: ✭ 17 (-76.06%)
Mutual labels:  benchmarks
server-benchmarks
🚀 Cross-platform transparent benchmarks for HTTP/2 Web Servers at 2020-2023
Stars: ✭ 78 (+9.86%)
Mutual labels:  benchmarks
anybench
CPU Benchmarks Set
Stars: ✭ 54 (-23.94%)
Mutual labels:  benchmarks
LinqBenchmarks
Benchmarking LINQ and alternative implementations
Stars: ✭ 138 (+94.37%)
Mutual labels:  benchmarks
Benchmarks
Some benchmarks of different languages
Stars: ✭ 2,108 (+2869.01%)
Mutual labels:  benchmarks
scala-benchmarks
An independent set of benchmarks for testing common Scala idioms.
Stars: ✭ 65 (-8.45%)
Mutual labels:  benchmarks
IGUANA
IGUANA is a benchmark execution framework for querying HTTP endpoints and CLI Applications such as Triple Stores. Contact: [email protected]
Stars: ✭ 22 (-69.01%)
Mutual labels:  benchmarks
trie-perf
Performance shootout of various trie implementations
Stars: ✭ 18 (-74.65%)
Mutual labels:  benchmarks
pdb-benchmarks
Benchmarking common tasks on proteins in various languages and packages
Stars: ✭ 33 (-53.52%)
Mutual labels:  benchmarks
detect-gpu
Classifies GPUs based on their 3D rendering benchmark score allowing the developer to provide sensible default settings for graphically intensive applications.
Stars: ✭ 749 (+954.93%)
Mutual labels:  benchmarks
HTAPBench
Benchmark suite to evaluate HTAP database engines
Stars: ✭ 15 (-78.87%)
Mutual labels:  benchmarks

Machine Learning Benchmarks


Machine Learning Benchmarks contains implementations of machine learning algorithms across data analytics frameworks. Scikit-learn_bench can be extended to add new frameworks and algorithms. It currently supports the scikit-learn, DAAL4PY, cuML, and XGBoost frameworks for commonly used machine learning algorithms.

Follow us on Medium

We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis. Here are our latest blogs:

Table of contents

How to create conda environment for benchmarking

Create a suitable conda environment for each framework you want to test, then install that framework's dependencies using the matching commands below.

# scikit-learn benchmarks (sklearn_bench)
pip install -r sklearn_bench/requirements.txt
# or
conda install -c intel scikit-learn scikit-learn-intelex pandas tqdm

# daal4py benchmarks
conda install -c conda-forge scikit-learn daal4py pandas tqdm

# cuML benchmarks
conda install -c rapidsai -c conda-forge cuml pandas cudf tqdm

# XGBoost benchmarks (xgboost_bench)
pip install -r xgboost_bench/requirements.txt
# or
conda install -c conda-forge xgboost scikit-learn pandas tqdm

Running Python benchmarks with runner script

Run python runner.py --configs configs/config_example.json [--output-file result.json --verbose INFO --report] to launch benchmarks.

Options:

  • --configs: specify the path to a configuration file or a folder that contains configuration files.
  • --no-intel-optimized: use Scikit-learn without Intel(R) Extension for Scikit-learn*. Now available for scikit-learn benchmarks. By default, the runner uses Intel(R) Extension for Scikit-learn.
  • --output-file: specify the name of the output file for the benchmark result. The default name is result.json.
  • --report: create an Excel report based on benchmark results. The openpyxl library is required.
  • --dummy-run: run the configuration parser and dataset generation without running the benchmarks.
  • --verbose: logging level (WARNING, INFO, or DEBUG) that controls how much information is printed while the benchmarks run. The default is INFO.
Level	Description
DEBUG	Detailed information, typically of interest only when diagnosing problems. At this level the output is usually so low level that it is not useful to users unfamiliar with the software's internals.
INFO	Confirmation that things are working as expected.
WARNING	An indication that something unexpected happened, or of some problem in the near future (e.g. "disk space low"). The software is still working as expected.
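These verbosity values map directly onto Python's standard logging levels. A minimal sketch of how a runner might wire the --verbose flag up (the actual runner's wiring may differ):

```python
# Sketch: translate a --verbose string such as "WARNING" into a logging level.
import logging

def configure_logging(verbose: str = "INFO") -> int:
    # "DEBUG" / "INFO" / "WARNING" resolve to the module's level constants.
    level = getattr(logging, verbose.upper())
    logging.basicConfig(level=level, format="%(levelname)s: %(message)s")
    return level

level = configure_logging("WARNING")
logging.info("confirmation messages are suppressed at WARNING")
logging.warning("something unexpected happened, but the run continues")
```

Because the levels are ordered (DEBUG < INFO < WARNING), choosing a level shows that level and everything more severe.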

Benchmarks currently support the following frameworks:

  • scikit-learn
  • daal4py
  • cuml
  • xgboost

The benchmark configuration lets you select which frameworks to run, choose datasets for measurement, and set the parameters of the algorithms.

You can configure benchmarks by editing a config file. Check config.json schema for more details.
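For orientation, a minimal configuration might look like the sketch below. The field names here are illustrative only; the authoritative set of keys is defined by the config.json schema and the shipped configs/config_example.json.

```json
{
  "common": {
    "lib": "sklearn",
    "data-format": "pandas",
    "dtype": "float64"
  },
  "cases": [
    {
      "algorithm": "kmeans",
      "dataset": [
        {
          "source": "synthetic",
          "type": "blobs",
          "n_clusters": 10,
          "n_features": 50,
          "training": { "n_samples": 100000 }
        }
      ],
      "n-clusters": [10]
    }
  ]
}
```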

Benchmark supported algorithms

Algorithm	Benchmark name
DBSCAN	dbscan
RandomForestClassifier	df_clfs
RandomForestRegressor	df_regr
pairwise_distances	distances
KMeans	kmeans
KNeighborsClassifier	knn_clsf
LinearRegression	linear
LogisticRegression	log_reg
PCA	pca
Ridge	ridge
SVM	svm
train_test_split	train_test_split
GradientBoostingClassifier	gbt
GradientBoostingRegressor	gbt

Scikit-learn benchmarks

When you run scikit-learn benchmarks on CPU, Intel(R) Extension for Scikit-learn is used by default. Use the --no-intel-optimized option to run the benchmarks without the extension.
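Enabling the extension amounts to patching scikit-learn before the estimators are imported. A rough sketch of the equivalent of the default behavior, which falls back to plain scikit-learn when the extension is not installed (the runner's actual logic may differ):

```python
# Sketch: enable Intel(R) Extension for Scikit-learn if it is available.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()      # subsequent scikit-learn imports use accelerated kernels
    accelerated = True
except ImportError:
    accelerated = False  # plain scikit-learn; the benchmarks still run

print("Intel extension active:", accelerated)
```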

For the algorithms with both CPU and GPU support, you may use the same configuration file to run the scikit-learn benchmarks on CPU and GPU.

Algorithm parameters

You can launch benchmarks for each algorithm separately. To do this, go to the directory with the benchmark:

cd <framework>

Run the following command:

python <benchmark_file> --dataset-name <path to the dataset> <other algorithm parameters>
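Each benchmark file follows the same basic pattern: load or generate data, run the algorithm several times, and report the best wall-clock time. A rough sketch of that pattern, using a plain NumPy pairwise-distance computation as a stand-in for a framework call (the real benchmarks call the framework APIs and produce richer reports):

```python
# Sketch of the measure-best-of-N pattern used by the per-algorithm benchmarks.
import timeit
import numpy as np

def pairwise_distances(X):
    # Euclidean distances via the ||a-b||^2 = ||a||^2 - 2ab + ||b||^2 expansion.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] - 2.0 * X @ X.T + sq[None, :]
    return np.sqrt(np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 16))  # synthetic dataset

# Best-of-N timing smooths out scheduler noise, as benchmark runners usually do.
best = min(timeit.repeat(lambda: pairwise_distances(X), number=1, repeat=5))
print(f"pairwise_distances best time: {best:.4f} s")
```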

You can find the list of supported parameters for each algorithm here:
