pat-s / 2019-feature-selection

License: MIT (LICENSE.md); an additional LICENSE file of unrecognized type is also present.
Research project

Programming languages: R, TeX


Monitoring forest health using hyperspectral imagery: Does feature selection improve the performance of machine-learning techniques?

Research study

See https://pat-s.github.io/2019-feature-selection/ for a detailed description including HTML result documents.

Project structure

📔 code/: R scripts

📔 docs/00-manuscripts/ieee: LaTeX manuscripts

📔 R/: R functions

📔 _drake.R: {drake} config file. Specifies execution order of all steps to reproduce this study.

📔 analysis/: Reporting documents (R Markdown)

📔 docs/: HTML docs created via {workflowr} using the .Rmd sources from the analysis/ directory.

The data is hosted at Zenodo and automatically downloaded and processed when invoking the workflow via drake::r_make().

Reproducibility

This study makes heavy use of the R packages {drake}, {renv} and {workflowr} to streamline workflow execution, manage R package versions, and build the research website that accompanies the study.

Calling drake::r_make() from the repository root starts the creation of the R objects used in this study (including the data download from Zenodo). Individual or intermediate objects can be computed by specifying their names explicitly in drake_config(targets = <target name>) in _drake.R.
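As a rough illustration of how the targets argument narrows a build, here is a minimal sketch of a _drake.R-style script. The toy plan and its target names (raw, summary_stat, unrelated) are hypothetical and not taken from this repository; in this project the plan is defined in _drake.R and a real target name such as benchmark_no_models would be used instead.

```r
# Sketch only: a toy plan standing in for the much larger plan of this study.
library(drake)

plan <- drake_plan(
  raw          = c(1, 2, 3),   # hypothetical upstream target
  summary_stat = mean(raw),    # hypothetical target that depends on `raw`
  unrelated    = "not needed"  # hypothetical target outside the requested subgraph
)

# drake::r_make() expects the script to end with a drake_config() call.
# With `targets` set, only the named target and its upstream
# dependencies are built (here: `raw` and `summary_stat`).
drake_config(plan, targets = "summary_stat")
```

Leaving targets at its default builds every target in the plan.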

While most targets are cheap to compute, the modeling part is very expensive. The models were fitted on a high-performance computing (HPC) system; creating them on a local desktop machine is not recommended.

Known Issues

Parts of this work (and some targets) depend on the download of Sentinel-2 images. For this task the R package {getSpatialData} was used. The initial 2019 version of {getSpatialData} contained outdated, non-working functionality, which required a refactoring to the latest package version in November 2020. Since that refactoring, the Sentinel-2 download has been broken.

This issue does not affect the recreation of the targets used for the scientific manuscript.

Creating targets with {drake}

(Before creating any target/object, make sure to call renv::restore() to install all required packages.)

Calling r_make() creates the targets specified in drake_config(targets = <target>) in _drake.R, using the additional {drake} settings specified there.
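Put together, a typical session from the repository root might look like the following sketch. It assumes the repository has been cloned and that R is started in its root directory, so that renv.lock and _drake.R are found.

```r
# Install the package versions recorded in the project's renv lockfile.
renv::restore()

# Run _drake.R in a fresh R session and build the targets selected there.
drake::r_make()
```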

Important: If you have access to a Slurm cluster, set options(clustermq.scheduler = "slurm") in _drake.R (around line 73).
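The scheduler setting could look like the sketch below. The template file name is a hypothetical placeholder, and whether this project needs a custom template is an assumption; check _drake.R for the settings actually used.

```r
# Tell {clustermq} to submit workers through Slurm.
options(clustermq.scheduler = "slurm")

# Optional: point {clustermq} to a custom submission template.
# "slurm_clustermq.tmpl" is a hypothetical file name, not taken from this repository.
options(clustermq.template = "slurm_clustermq.tmpl")
```

For these options to take effect, the drake_config() call also has to request the {clustermq} backend via its parallelism = "clustermq" and jobs arguments. Without a Slurm cluster, {clustermq} also accepts local schedulers such as "multicore", although the modeling targets would still take a very long time to compute.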

Required disk space

The data/ folder will hold about 5.5 GB of data.

Important intermediate targets

Out of the 400+ intermediate targets/objects in this project, the following targets are considered important, i.e. you may want to create or inspect them in more detail.

Note that most reports require some or all fitted models. Creating these (e.g. target benchmark_no_models) is a costly process that takes several days on an HPC system and considerably longer on a single machine.
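Once a target has been built, it can be pulled from the {drake} cache for inspection. The sketch below assumes it is run from the repository root (where the cache lives) and that benchmark_no_models has already been created; the variable name bm is arbitrary.

```r
library(drake)

# List the targets currently stored in the cache.
cached()

# Return the value of a single target ...
bm <- readd(benchmark_no_models)

# ... or load it into the current environment under its own name.
loadd(benchmark_no_models)

# Take a first look at the object's top-level structure.
str(bm, max.level = 1)
```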