
SimonBlanke / Hyperactive

License: MIT
A hyperparameter optimization and data collection toolbox for convenient and fast prototyping of machine-learning models.

Programming Languages

Python
139,335 projects; #7 most used programming language

Projects that are alternatives to or similar to Hyperactive

Auto ml
[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (+756.59%)
Mutual labels:  artificial-intelligence, data-science, scikit-learn, xgboost, hyperparameter-optimization, feature-engineering, automated-machine-learning
Hyperparameter hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (+256.04%)
Mutual labels:  artificial-intelligence, data-science, scikit-learn, xgboost, hyperparameter-optimization, feature-engineering, optimization
Tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Stars: ✭ 8,378 (+4503.3%)
Mutual labels:  data-science, scikit-learn, xgboost, hyperparameter-optimization, feature-engineering, automated-machine-learning
Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (+428.02%)
Mutual labels:  data-science, scikit-learn, xgboost, hyperparameter-optimization, feature-engineering, automated-machine-learning
Nni
An open source AutoML toolkit for automate machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Stars: ✭ 10,698 (+5778.02%)
Mutual labels:  data-science, hyperparameter-optimization, neural-architecture-search, feature-engineering, bayesian-optimization, automated-machine-learning
Lale
Library for Semi-Automated Data Science
Stars: ✭ 198 (+8.79%)
Mutual labels:  artificial-intelligence, data-science, scikit-learn, hyperparameter-optimization, automated-machine-learning
Auto Sklearn
Automated Machine Learning with scikit-learn
Stars: ✭ 5,916 (+3150.55%)
Mutual labels:  meta-learning, scikit-learn, hyperparameter-optimization, bayesian-optimization, automated-machine-learning
Autogluon
AutoGluon: AutoML for Text, Image, and Tabular Data
Stars: ✭ 3,920 (+2053.85%)
Mutual labels:  data-science, scikit-learn, hyperparameter-optimization, neural-architecture-search, automated-machine-learning
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+558.79%)
Mutual labels:  data-science, xgboost, automated-machine-learning, optimization
Xcessiv
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.
Stars: ✭ 1,255 (+589.56%)
Mutual labels:  data-science, scikit-learn, hyperparameter-optimization, automated-machine-learning
mindware
An efficient open-source AutoML system for automating machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.
Stars: ✭ 34 (-81.32%)
Mutual labels:  hyperparameter-optimization, feature-engineering, bayesian-optimization, automated-machine-learning
My Data Competition Experience
A summary of my experience from placing in the top 5 of multiple machine learning and big data competitions. Packed with practical content. You're welcome.
Stars: ✭ 271 (+48.9%)
Mutual labels:  data-science, xgboost, hyperparameter-optimization, feature-engineering
Hpbandster
A distributed Hyperband implementation on steroids
Stars: ✭ 456 (+150.55%)
Mutual labels:  hyperparameter-optimization, neural-architecture-search, bayesian-optimization, automated-machine-learning
Scikit Optimize
Sequential model-based optimization with a `scipy.optimize` interface
Stars: ✭ 2,258 (+1140.66%)
Mutual labels:  bayesian-optimization, optimization, scikit-learn, hyperparameter-optimization
Featuretools
An open source python library for automated feature engineering
Stars: ✭ 5,891 (+3136.81%)
Mutual labels:  data-science, scikit-learn, feature-engineering, automated-machine-learning
Autodl
Automated Deep Learning without ANY human intervention. 1st Solution for AutoDL [email protected]
Stars: ✭ 854 (+369.23%)
Mutual labels:  artificial-intelligence, data-science, feature-engineering, automated-machine-learning
Data Science Best Resources
Carefully curated resource links for data science in one place
Stars: ✭ 1,104 (+506.59%)
Mutual labels:  artificial-intelligence, data-science, scikit-learn
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (+561.54%)
Mutual labels:  data-science, scikit-learn, optimization
Blurr
Data transformations for the ML era
Stars: ✭ 96 (-47.25%)
Mutual labels:  artificial-intelligence, data-science, feature-engineering
Remixautoml
R package for automation of machine learning, forecasting, feature engineering, model evaluation, model interpretation, data generation, and recommenders.
Stars: ✭ 159 (-12.64%)
Mutual labels:  xgboost, feature-engineering, automated-machine-learning



Hyperactive features a collection of optimization algorithms that can be used for a variety of optimization problems. The following overview lists the capabilities of Hyperactive, where each item links to an example:


Optimization Techniques:
  • Local Search
  • Global Search
  • Population Methods
  • Sequential Methods

Tested and Supported Packages:
  • Machine Learning
  • Deep Learning
  • Parallel Computing

Optimization Applications:
  • Feature Engineering
  • Machine Learning
  • Deep Learning
  • Meta-data
  • Miscellaneous

The examples above do not necessarily use realistic datasets or training procedures. Their purpose is to run quickly and give the user ideas for interesting use cases.


Hyperactive is very easy to use:

Regular training:
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_boston


data = load_boston()
X, y = data.data, data.target


dtr = DecisionTreeRegressor(max_depth=10)
score = cross_val_score(dtr, X, y, cv=3).mean()

Hyperactive:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_boston
from hyperactive import Hyperactive

data = load_boston()
X, y = data.data, data.target

def model(opt):
    dtr = DecisionTreeRegressor(max_depth=opt["max_depth"])
    return cross_val_score(dtr, X, y, cv=3).mean()


search_space = {"max_depth": list(range(3, 25))}

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()

Installation

The most recent version of Hyperactive is available on PyPI:


pip install hyperactive

Example

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
from hyperactive import Hyperactive

data = load_boston()
X, y = data.data, data.target

# define the model in a function
def model(opt):
    # pass the suggested parameter to the machine learning model
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"]
    )
    scores = cross_val_score(gbr, X, y, cv=3)

    # return a single numerical value, which gets maximized
    return scores.mean()


# search space determines the ranges of parameters you want the optimizer to search through
search_space = {"n_estimators": list(range(10, 200, 5))}

# start the optimization run
hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()

Hyperactive API reference

Hyperactive(verbosity, distribution)
  • verbosity = ["progress_bar", "print_results", "print_times"]

    • (list, False)
    • The verbosity list determines which parts of the optimization information are printed to the command line.
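    • Example (a minimal sketch): print only the progress bar, or pass False to disable all output:

      from hyperactive import Hyperactive

      hyper = Hyperactive(verbosity=["progress_bar"])
      # or silence everything:
      hyper_quiet = Hyperactive(verbosity=False)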
  • distribution = {"multiprocessing": {"initializer": tqdm.set_lock, "initargs": (tqdm.get_lock(),),}}

    • (str, dict, callable)

    • Access the parallel processing in three ways:

      • Via the str "multiprocessing" or "joblib" to choose one of the two.
      • Via a dictionary with the single key "multiprocessing" or "joblib" and a value that holds the input arguments for Pool or Parallel. The default argument above is a good example of this.
      • Via your own parallel processing function, which will be used instead of the multiprocessing or joblib backends. The wrapper function must work similarly to the following two functions:

      Multiprocessing:

      from multiprocessing import Pool

      def multiprocessing_wrapper(process_func, search_processes_paras, **kwargs):
          n_jobs = len(search_processes_paras)

          pool = Pool(n_jobs, **kwargs)
          results = pool.map(process_func, search_processes_paras)

          return results
      

      Joblib:

      from joblib import Parallel, delayed

      def joblib_wrapper(process_func, search_processes_paras, **kwargs):
          n_jobs = len(search_processes_paras)

          jobs = [
              delayed(process_func)(**info_dict)
              for info_dict in search_processes_paras
          ]
          results = Parallel(n_jobs=n_jobs, **kwargs)(jobs)

          return results
      
.add_search(objective_function, search_space, n_iter, optimizer, n_jobs, initialize, max_score, random_state, memory, memory_warm_start)
  • objective_function

    • (callable)
    • The objective function defines the optimization problem. The optimization algorithm will try to maximize the numerical value that is returned by the objective function by trying out different parameters from the search space.
  • search_space

    • (dict)
    • Defines the space where the optimization algorithm can search for the best parameters for the given objective function.
  • n_iter

    • (int)
    • The number of iterations that will be performed during the optimization run. Each iteration consists of two steps: the optimization step, which decides the next parameter to evaluate, and the evaluation step, which runs the objective function with the chosen parameter and returns the score.
  • optimizer = "default"

    • (object)

    • Instance of optimization class that can be imported from Hyperactive. "default" corresponds to the random search optimizer. The following classes can be imported and used:

      • HillClimbingOptimizer
      • StochasticHillClimbingOptimizer
      • RepulsingHillClimbingOptimizer
      • RandomSearchOptimizer
      • RandomRestartHillClimbingOptimizer
      • RandomAnnealingOptimizer
      • SimulatedAnnealingOptimizer
      • ParallelTemperingOptimizer
      • ParticleSwarmOptimizer
      • EvolutionStrategyOptimizer
      • BayesianOptimizer
      • TreeStructuredParzenEstimators
      • DecisionTreeOptimizer
      • EnsembleOptimizer
    • Example:

      ...
      
      opt_hco = HillClimbingOptimizer(epsilon=0.08)
      hyper = Hyperactive()
      hyper.add_search(..., optimizer=opt_hco)
      hyper.run()
      
      ...
      
  • n_jobs = 1

    • (int)
    • Number of jobs to run in parallel. Those jobs are optimization runs that work independently of one another (no information sharing). If n_jobs == -1 the maximum available number of CPU cores is used.
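    • Example (a minimal sketch): run four independent optimization runs of the same search in parallel:

      hyper = Hyperactive()
      hyper.add_search(model, search_space, n_iter=50, n_jobs=4)
      hyper.run()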
  • initialize = {"grid": 4, "random": 2, "vertices": 4}

    • (dict)
    • The initialization dictionary automatically determines a number of parameters that will be evaluated in the first n iterations (n is the sum of the values in initialize). The initialize keywords are the following:
      • grid

        • Initializes positions in a grid-like pattern. Positions that cannot be placed in the grid are randomly positioned.
      • vertices

        • Initializes positions at the vertices of the search space. Positions that cannot be placed at a vertex are randomly positioned.
      • random

        • Number of randomly initialized positions.
      • warm_start

        • List of parameter dictionaries that mark additional start points for the optimization run (see the sketch below).
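    • Example (a minimal sketch; the "max_depth" entry assumes the search space from the examples above):

      hyper.add_search(
          model,
          search_space,
          n_iter=50,
          initialize={"grid": 4, "random": 2, "vertices": 4, "warm_start": [{"max_depth": 10}]},
      )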
  • max_score = None

    • (float, None)
    • Score at which the optimization run stops early. The score is checked after each completed iteration.
  • random_state = None

    • (int, None)
    • Random state for random processes in the random, numpy and scipy module.
  • memory = True

    • (bool)
    • Whether or not to use the "memory"-feature. The memory is a dictionary, which gets filled with parameters and scores during the optimization run. If the optimizer encounters a parameter that is already in the dictionary it just extracts the score instead of reevaluating the objective function (which can take a long time).
  • memory_warm_start = None

    • (pandas dataframe, None)

    • Pandas dataframe that contains score and parameter information that will be automatically loaded into the memory-dictionary.

      example:

      score x1 x2 x...
      0.756 0.1 0.2 ...
      0.823 0.3 0.1 ...
      ... ... ... ...
      ... ... ... ...
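      A natural way to obtain such a dataframe is the .results()-method of a previous optimization run (a minimal sketch):

      # search data collected in an earlier optimization run
      search_data = hyper.results(model)

      hyper_new = Hyperactive()
      hyper_new.add_search(model, search_space, n_iter=50, memory_warm_start=search_data)
      hyper_new.run()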
.run(max_time)
  • max_time = None
    • (float, None)
    • Maximum number of seconds until the optimization stops. The time will be checked after each completed iteration.
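    • Example (a minimal sketch): stop the optimization after at most 60 seconds:

      hyper.run(max_time=60)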

.best_para(objective_function)
  • objective_function

    • (callable)
  • returns: dictionary

  • Parameter dictionary of the best score of the given objective_function found in the previous optimization run.

    example:

    {
      'x1': 0.2, 
      'x2': 0.3,
    }
    
.best_score(objective_function)
  • objective_function
    • (callable)
  • returns: int or float
  • Numerical value of the best score of the given objective_function found in the previous optimization run.
.results(objective_function)
  • objective_function

    • (callable)
  • returns: Pandas dataframe

  • The dataframe contains score, parameter information, iteration times and evaluation times of the given objective_function found in the previous optimization run.

    example:

    score x1 x2 x... eval_times iter_times
    0.756 0.1 0.2 ... 0.953 1.123
    0.823 0.3 0.1 ... 0.948 1.101
    ... ... ... ... ... ...
    ... ... ... ... ... ...
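    • Example (a minimal sketch): querying all three accessors after a completed run:

      hyper = Hyperactive()
      hyper.add_search(model, search_space, n_iter=50)
      hyper.run()

      best_parameters = hyper.best_para(model)  # parameter dictionary of the best score
      best_score = hyper.best_score(model)      # numerical value of the best score
      search_data = hyper.results(model)        # pandas dataframe with the search data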

Optimizer Classes

Each of the following optimizer classes can be initialized and passed to the "add_search"-method via the "optimizer"-argument. During this initialization the optimizer class accepts additional parameters. You can read more about each optimization strategy and its parameters in the Optimization Tutorial.

  • HillClimbingOptimizer
  • RepulsingHillClimbingOptimizer
  • SimulatedAnnealingOptimizer
  • RandomSearchOptimizer
  • RandomRestartHillClimbingOptimizer
  • RandomAnnealingOptimizer
  • ParallelTemperingOptimizer
  • ParticleSwarmOptimizer
  • EvolutionStrategyOptimizer
  • BayesianOptimizer
  • TreeStructuredParzenEstimators
  • DecisionTreeOptimizer
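For example, a particle swarm optimizer could be initialized and passed like this (a sketch; the population parameter is assumed from the underlying Gradient-Free-Optimizers package and may differ between versions):

from hyperactive import Hyperactive, ParticleSwarmOptimizer

# 'population' is an assumed parameter; see the Optimization Tutorial for the exact names
optimizer = ParticleSwarmOptimizer(population=15)

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50, optimizer=optimizer)
hyper.run()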

Roadmap

v2.0.0 ✔️
  • [x] Change API
v2.1.0 ✔️
  • [x] Save memory of evaluations for later runs (long term memory)
  • [x] Warm start sequence based optimizers with long term memory
  • [x] Gaussian process regressors from various packages (gpy, sklearn, GPflow, ...) via wrapper
v2.2.0 ✔️
  • [x] Add basic dataset meta-features to long term memory
  • [x] Add helper-functions for memory
    • [x] connect two different model/dataset hashes
    • [x] split two different model/dataset hashes
    • [x] delete memory of model/dataset
    • [x] return best known model for dataset
    • [x] return search space for best model
    • [x] return best parameter for best model
v2.3.0 ✔️
  • [x] Tree-structured Parzen Estimator
  • [x] Decision Tree Optimizer
  • [x] add "max_sample_size" and "skip_retrain" parameters for SMBO to decrease optimization time
v3.0.0 ✔️
  • [x] New API
    • [x] expand usage of objective-function
    • [x] No passing of training data into Hyperactive
    • [x] Removing "long term memory"-support (better to do in separate package)
    • [x] More intuitive selection of optimization strategies and parameters
    • [x] Separate optimization algorithms into other package
    • [x] expand api so that optimizer parameter can be changed at runtime
    • [x] add extensive testing procedure (similar to Gradient-Free-Optimizers)
v3.1.0
  • [ ] New implementation of dashboard for visualization of search-data
v3.2.0
  • [ ] New implementation of "long term memory" for search-data storage and usage

Experimental algorithms

The following algorithms are of my own design and, to my knowledge, do not yet exist in the technical literature. If any of these algorithms already exists, please share it with me in an issue.

Random Annealing

A combination between simulated annealing and random search.
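A minimal, hypothetical sketch of the idea (an illustration only, not the actual implementation): sample new positions in a random neighborhood of the current position, with a neighborhood size that shrinks as the temperature anneals:

import random

def random_annealing(objective, search_space, n_iter, temp=1.0, annealing_rate=0.97):
    # hypothetical illustration, not the actual implementation
    current = {key: random.choice(values) for key, values in search_space.items()}
    best, best_score = current, objective(current)
    for _ in range(n_iter):
        candidate = {}
        for key, values in search_space.items():
            idx = values.index(current[key])
            # the random step range shrinks as the temperature anneals
            step = max(1, int(len(values) * temp))
            new_idx = min(max(idx + random.randint(-step, step), 0), len(values) - 1)
            candidate[key] = values[new_idx]
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
        current = candidate  # always accept the new position, like random search
        temp *= annealing_rate
    return best, best_score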


FAQ

Known Errors + Solutions

Read this before opening a bug-issue

Are you sure the bug is located in Hyperactive?

Look at the error message from the command line. If one of the last messages looks like this:

  • File "/.../gradient_free_optimizers/...", line ...

Then you should post the bug report in the Gradient-Free-Optimizers repository.

Otherwise you can post the bug report in Hyperactive.

MemoryError: Unable to allocate ... for an array with shape (...)

This is expected behavior of the current implementation of the SMBO optimizers. For all sequential model-based algorithms you have to keep an eye on the search space size:

search_space_size = 1
for value_ in search_space.values():
    search_space_size *= len(value_)
    
print("search_space_size", search_space_size)

Reduce the search space size to resolve this error.

TypeError: cannot pickle '_thread.RLock' object

Setting distribution to "joblib" may fix this problem:

hyper = Hyperactive(distribution="joblib")
Command line full of warnings

These are very often warnings from sklearn or numpy. They do not indicate bad performance from Hyperactive, and your code will most likely run fine, but they can be difficult to silence.

Put this at the very top of your script:

import warnings

# replace warnings.warn with a no-op to suppress all warnings
def warn(*args, **kwargs):
    pass

warnings.warn = warn



Citing Hyperactive

@Misc{hyperactive2019,
  author =   {{Simon Blanke}},
  title =    {{Hyperactive}: A hyperparameter optimization and meta-learning toolbox for machine-/deep-learning models.},
  howpublished = {\url{https://github.com/SimonBlanke}},
  year = {since 2019}
}

License

LICENSE
