Allardvm / LightGBM.jl

License: other
LightGBM.jl provides a high-performance Julia interface for Microsoft's LightGBM.

Projects that are alternatives of or similar to LightGBM.jl

decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (+52.5%)
Mutual labels:  gbm, lightgbm
HyperGBM
A full pipeline AutoML tool for tabular data
Stars: ✭ 172 (+330%)
Mutual labels:  gbm, lightgbm
fast retraining
Show how to perform fast retraining with LightGBM in different business cases
Stars: ✭ 56 (+40%)
Mutual labels:  gbm, lightgbm
Lightgbm
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Stars: ✭ 13,293 (+33132.5%)
Mutual labels:  gbm, lightgbm
stackgbm
🌳 Stacked Gradient Boosting Machines
Stars: ✭ 24 (-40%)
Mutual labels:  gbm, lightgbm
C3T
C3T: Crash Course Category Theory - A friendly, non-mathematician's introduction to Category Theory for beginners. 🐱
Stars: ✭ 26 (-35%)
Mutual labels:  julia-language
openBF
1D blood flow model
Stars: ✭ 16 (-60%)
Mutual labels:  julia-language
KissABC.jl
Pure julia implementation of Multiple Affine Invariant Sampling for efficient Approximate Bayesian Computation
Stars: ✭ 28 (-30%)
Mutual labels:  julia-language
RobustTrees
[ICML 2019, 20 min long talk] Robust Decision Trees Against Adversarial Examples
Stars: ✭ 62 (+55%)
Mutual labels:  gbm
MathTeXEngine.jl
A latex math mode engine in pure Julia.
Stars: ✭ 61 (+52.5%)
Mutual labels:  julia-language
lightgbmExplainer
An R package that makes lightgbm models fully interpretable (based on https://github.com/AppliedDataSciencePartners/xgboostExplainer)
Stars: ✭ 22 (-45%)
Mutual labels:  lightgbm
CausalityTools.jl
Algorithms for causal inference and the detection of dynamical coupling from time series, and for approximation of the transfer operator and invariant measures.
Stars: ✭ 45 (+12.5%)
Mutual labels:  julia-language
DICOM.jl
Julia package for reading and writing DICOM (Digital Imaging and Communications in Medicine) files
Stars: ✭ 45 (+12.5%)
Mutual labels:  julia-language
JuliaPackageWithRustDep.jl
Example of a Julia Package with Rust dependency.
Stars: ✭ 65 (+62.5%)
Mutual labels:  julia-language
LatticeQCD.jl
A native Julia code for lattice QCD with dynamical fermions in 4 dimension.
Stars: ✭ 85 (+112.5%)
Mutual labels:  julia-language
Julia-sublime
Julia syntax highlighting for Sublime Text
Stars: ✭ 106 (+165%)
Mutual labels:  julia-language
mlforecast
Scalable machine 🤖 learning for time series forecasting.
Stars: ✭ 96 (+140%)
Mutual labels:  lightgbm
ml-pipeline
Using Kafka-Python to illustrate a ML production pipeline
Stars: ✭ 90 (+125%)
Mutual labels:  lightgbm
DynamicHMCExamples.jl
Examples for Bayesian inference using DynamicHMC.jl and related packages.
Stars: ✭ 33 (-17.5%)
Mutual labels:  julia-language
datascienv
datascienv is a package that sets up your data science environment in a single line of code with all dependencies; it also includes pyforest, which provides a single-line import of all required ML libraries.
Stars: ✭ 53 (+32.5%)
Mutual labels:  lightgbm

LightGBM.jl

License

This repository has been archived, since the package has moved to a new repository.

LightGBM.jl provides a high-performance Julia interface for Microsoft's LightGBM. The package adds several convenience features, including automated cross-validation and exhaustive search procedures, and automatically converts all LightGBM parameters that refer to indices (e.g. categorical_feature) from Julia's one-based indices to C's zero-based indices. All major operating systems (Windows, Linux, and Mac OS X) are supported.

Installation

Install the latest version of LightGBM by following the installation guide at https://github.com/Microsoft/LightGBM/wiki/Installation-Guide. Note that because LightGBM's C API is still under development, upstream changes can lead to temporary incompatibilities between this package and the latest LightGBM master. To avoid this, you can build against Allardvm/LightGBM, which contains the latest LightGBM version that has been confirmed to work with this package.

Then add the package to Julia with:

Pkg.clone("https://github.com/Allardvm/LightGBM.jl.git")

To use the package, set the environment variable LIGHTGBM_PATH to point to the LightGBM directory prior to loading LightGBM.jl. This can be done for the duration of a single Julia session with:

ENV["LIGHTGBM_PATH"] = "../LightGBM"

To test the package, first set the environment variable LIGHTGBM_PATH and then call:

Pkg.test("LightGBM")

Getting started

ENV["LIGHTGBM_PATH"] = "../LightGBM"
using LightGBM

# Load LightGBM's binary classification example.
binary_test = readdlm(ENV["LIGHTGBM_PATH"] * "/examples/binary_classification/binary.test", '\t')
binary_train = readdlm(ENV["LIGHTGBM_PATH"] * "/examples/binary_classification/binary.train", '\t')
X_train = binary_train[:, 2:end]
y_train = binary_train[:, 1]
X_test = binary_test[:, 2:end]
y_test = binary_test[:, 1]

# Create an estimator with the desired parameters—leave other parameters at the default values.
estimator = LGBMBinary(num_iterations = 100,
                       learning_rate = .1,
                       early_stopping_round = 5,
                       feature_fraction = .8,
                       bagging_fraction = .9,
                       bagging_freq = 1,
                       num_leaves = 1000,
                       metric = ["auc", "binary_logloss"])

# Fit the estimator on the training data and return its scores for the test data.
fit(estimator, X_train, y_train, (X_test, y_test))

# Predict arbitrary data with the estimator.
predict(estimator, X_train)

# Cross-validate using a two-fold cross-validation iterable providing training indices.
splits = (collect(1:3500), collect(3501:7000))
cv(estimator, X_train, y_train, splits)

# Exhaustive search on an iterable containing all combinations of learning_rate ∈ {.1, .2} and
# bagging_fraction ∈ {.8, .9}
params = [Dict(:learning_rate => learning_rate,
               :bagging_fraction => bagging_fraction) for
          learning_rate in (.1, .2),
          bagging_fraction in (.8, .9)]
search_cv(estimator, X_train, y_train, splits, params)

# Save and load the fitted model.
filename = pwd() * "/finished.model"
savemodel(estimator, filename)
loadmodel(estimator, filename)

Exports

Functions

fit(estimator, X, y[, test...]; [verbosity = 1, is_row_major = false])

Fit the estimator with features data X and label y using the X-y pairs in test as validation sets.

Return a dictionary with an entry for each validation set. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value at each evaluation of the metric.

Arguments

  • estimator::LGBMEstimator: the estimator to be fit.
  • X::Matrix{TX<:Real}: the features data.
  • y::Vector{Ty<:Real}: the labels.
  • test::Tuple{Matrix{TX},Vector{Ty}}...: optionally contains one or more tuples of X-y pairs of the same types as X and y that should be used as validation sets.
  • verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
  • is_row_major::Bool: keyword argument that indicates whether or not X is row-major. true indicates that it is row-major, false indicates that it is column-major (Julia's default).
  • weights::Vector{Tw<:Real}: the training weights.
  • init_score::Vector{Ti<:Real}: the init scores.
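
The returned structure can be inspected without assuming particular dataset or metric names; a minimal sketch, reusing the estimator and data from the Getting started section:

results = fit(estimator, X_train, y_train, (X_test, y_test))
for (dataset, metrics) in results
    for (metric, values) in metrics
        println(dataset, ", ", metric, ": last recorded value = ", values[end])
    end
end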

predict(estimator, X; [predict_type = 0, num_iterations = -1, verbosity = 1, is_row_major = false])

Return an array with the labels that the estimator predicts for features data X.

Arguments

  • estimator::LGBMEstimator: the estimator to use in the prediction.
  • X::Matrix{T<:Real}: the features data.
  • predict_type::Integer: keyword argument that controls the prediction type. 0 for normal scores with transform (if needed), 1 for raw scores, 2 for leaf indices.
  • num_iterations::Integer: keyword argument that sets the number of iterations of the model to use in the prediction. < 0 for all iterations.
  • verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
  • is_row_major::Bool: keyword argument that indicates whether or not X is row-major. true indicates that it is row-major, false indicates that it is column-major (Julia's default).
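
A short sketch of the keyword arguments, reusing the Getting started estimator and data (the values below are illustrative only):

# Raw (untransformed) scores instead of transformed scores.
raw_scores = predict(estimator, X_test, predict_type = 1)

# Predictions that use only the first 50 iterations of the model.
early_preds = predict(estimator, X_test, num_iterations = 50)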

cv(estimator, X, y, splits; [verbosity = 1]) (Experimental—interface may change)

Cross-validate the estimator with features data X and label y. The iterable splits provides vectors of indices for the training dataset. The remaining indices are used to create the validation dataset.

Return a dictionary with an entry for the validation dataset and, if the parameter is_training_metric is set in the estimator, an entry for the training dataset. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value for each dataset, at the last valid iteration.

Arguments

  • estimator::LGBMEstimator: the estimator to be fit.
  • X::Matrix{TX<:Real}: the features data.
  • y::Vector{Ty<:Real}: the labels.
  • splits: the iterable providing arrays of indices for the training dataset.
  • verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
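
For example, the splits for a shuffled 5-fold cross-validation can be built by hand; this is only a sketch of how such index vectors might be constructed (randperm lives in Base on the Julia versions this package targets, and in the Random standard library on Julia ≥ 0.7):

n = size(X_train, 1)
k = 5
perm = randperm(n)
folds = [perm[i:k:n] for i in 1:k]
# Each split holds the training indices; the remaining indices form the validation set.
splits = [vcat(folds[setdiff(1:k, i:i)]...) for i in 1:k]
cv(estimator, X_train, y_train, splits)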

search_cv(estimator, X, y, splits, params; [verbosity = 1]) (Experimental—interface may change)

Exhaustive search over the specified sets of parameter values for the estimator with features data X and label y. The iterable splits provides vectors of indices for the training dataset. The remaining indices are used to create the validation dataset.

Return an array with a tuple for each set of parameters value, where the first entry is a set of parameter values and the second entry the cross-validation outcome of those values. This outcome is a dictionary with an entry for the validation dataset and, if the parameter is_training_metric is set in the estimator, an entry for the training dataset. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value for each dataset, at the last valid iteration.

Arguments

  • estimator::LGBMEstimator: the estimator to be fit.
  • X::Matrix{TX<:Real}: the features data.
  • y::Vector{Ty<:Real}: the labels.
  • splits: the iterable providing arrays of indices for the training dataset.
  • params: the iterable providing dictionaries of pairs of parameters (Symbols) and values to configure the estimator with.
  • verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
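
Continuing the Getting started example, the returned array of (parameters, outcome) tuples can simply be iterated; this sketch assumes nothing about the dataset or metric names inside each outcome:

results = search_cv(estimator, X_train, y_train, splits, params)
for (param_set, outcome) in results
    println(param_set)
    for (dataset, metrics) in outcome
        println("  ", dataset, ": ", metrics)
    end
end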

savemodel(estimator, filename; [num_iteration = -1])

Save the fitted model in estimator as filename.

Arguments

  • estimator::LGBMEstimator: the estimator whose fitted model should be saved.
  • filename::String: the name of the file to save the model in.
  • num_iteration::Integer: keyword argument that sets the number of iterations of the model that should be saved. < 0 for all iterations.

loadmodel(estimator, filename)

Load the fitted model filename into estimator. Note that this only loads the fitted model—not the parameters or data of the estimator whose model was saved as filename.

Arguments

  • estimator::LGBMEstimator: the estimator to load the model into.
  • filename::String: the name of the file that contains the model.
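
A minimal save/load round trip, reusing the Getting started estimator; since loadmodel restores only the fitted model, the fresh estimator's parameters are set explicitly (the parameter values shown are illustrative):

filename = joinpath(pwd(), "finished.model")
savemodel(estimator, filename)

# Load the model into a new estimator and predict with it.
fresh = LGBMBinary(num_iterations = 100, metric = ["auc", "binary_logloss"])
loadmodel(fresh, filename)
predict(fresh, X_test)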

Estimators

LGBMRegression <: LGBMEstimator

LGBMRegression(; [num_iterations = 10,
                  learning_rate = .1,
                  num_leaves = 127,
                  max_depth = -1,
                  tree_learner = "serial",
                  num_threads = Sys.CPU_CORES,
                  histogram_pool_size = -1.,
                  min_data_in_leaf = 100,
                  min_sum_hessian_in_leaf = 10.,
                  feature_fraction = 1.,
                  feature_fraction_seed = 2,
                  bagging_fraction = 1.,
                  bagging_freq = 0,
                  bagging_seed = 3,
                  early_stopping_round = 0,
                  max_bin = 255,
                  data_random_seed = 1,
                  init_score = "",
                  is_sparse = true,
                  save_binary = false,
                  is_unbalance = false,
                  metric = ["l2"],
                  metric_freq = 1,
                  is_training_metric = false,
                  ndcg_at = Int[],
                  num_machines = 1,
                  local_listen_port = 12400,
                  time_out = 120,
                  machine_list_file = ""])

Return an LGBMRegression estimator.
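
A regression estimator is constructed and used like the other estimators; a sketch with hypothetical feature matrix X and real-valued target vector y (not the binary example above):

regressor = LGBMRegression(num_iterations = 200,
                           learning_rate = .05,
                           metric = ["l2"])
fit(regressor, X, y)
predict(regressor, X)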

LGBMBinary <: LGBMEstimator

LGBMBinary(; [num_iterations = 10,
              learning_rate = .1,
              num_leaves = 127,
              max_depth = -1,
              tree_learner = "serial",
              num_threads = Sys.CPU_CORES,
              histogram_pool_size = -1.,
              min_data_in_leaf = 100,
              min_sum_hessian_in_leaf = 10.,
              feature_fraction = 1.,
              feature_fraction_seed = 2,
              bagging_fraction = 1.,
              bagging_freq = 0,
              bagging_seed = 3,
              early_stopping_round = 0,
              max_bin = 255,
              data_random_seed = 1,
              init_score = "",
              is_sparse = true,
              save_binary = false,
              sigmoid = 1.,
              is_unbalance = false,
              metric = ["binary_logloss"],
              metric_freq = 1,
              is_training_metric = false,
              ndcg_at = Int[],
              num_machines = 1,
              local_listen_port = 12400,
              time_out = 120,
              machine_list_file = ""])

Return an LGBMBinary estimator.

LGBMMulticlass <: LGBMEstimator

LGBMMulticlass(; [num_iterations = 10,
                  learning_rate = .1,
                  num_leaves = 127,
                  max_depth = -1,
                  tree_learner = "serial",
                  num_threads = Sys.CPU_CORES,
                  histogram_pool_size = -1.,
                  min_data_in_leaf = 100,
                  min_sum_hessian_in_leaf = 10.,
                  feature_fraction = 1.,
                  feature_fraction_seed = 2,
                  bagging_fraction = 1.,
                  bagging_freq = 0,
                  bagging_seed = 3,
                  early_stopping_round = 0,
                  max_bin = 255,
                  data_random_seed = 1,
                  init_score = "",
                  is_sparse = true,
                  save_binary = false,
                  is_unbalance = false,
                  metric = ["multi_logloss"],
                  metric_freq = 1,
                  is_training_metric = false,
                  ndcg_at = Int[],
                  num_machines = 1,
                  local_listen_port = 12400,
                  time_out = 120,
                  machine_list_file = "",
                  num_class = 1])

Return an LGBMMulticlass estimator.
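
For multiclass problems, num_class must match the number of classes in the data; a sketch that only constructs the estimator, assuming a three-class problem:

multiclass = LGBMMulticlass(num_iterations = 100,
                            learning_rate = .1,
                            metric = ["multi_logloss"],
                            num_class = 3)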
