
EpistasisLab / Rebate

License: MIT

Relief Based Algorithms of ReBATE implemented in Python with Cython optimization. This repository is no longer being updated. Please see scikit-rebate.




ReBATE (Relief-based Algorithm Training Environment)

This package includes stand-alone Python code to run any of the included/available Relief-Based algorithms (RBAs) designed for feature weighting/selection as part of a machine learning pipeline (supervised learning). Presently this includes the following core RBAs: ReliefF, SURF, SURF*, MultiSURF* and MultiSURF. Additionally, an implementation of the iterative TuRF mechanism is included. It is still under active development and we encourage you to check back on this repository regularly for updates.

These algorithms offer a computationally efficient way to perform feature selection that is sensitive to feature interactions as well as simple univariate associations, unlike most currently available filter-based feature selection methods. The main benefit of Relief algorithms is that they identify feature interactions without having to exhaustively check every pairwise interaction, thus taking significantly less time than exhaustive pairwise search.
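The idea can be illustrated with a minimal sketch, assuming a simplified ReliefF for two classes, numeric features, and no missing values. It is not the ReBATE implementation; the function name `relieff_scores` and the toy data are hypothetical. The endpoint is the XOR of features A and B, so neither feature is informative on its own, yet both are scored above the noise feature without enumerating feature pairs.

```python
def relieff_scores(X, y, k=3):
    """Score features with a simplified ReliefF (two classes, numeric data).

    X is a list of rows and y a list of class labels. Returns one score per
    feature; higher means more relevant. Assumes every instance has at least
    k neighbors of each class.
    """
    n, n_features = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    ranges = []
    for f in range(n_features):
        column = [row[f] for row in X]
        ranges.append((max(column) - min(column)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / ranges[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(n_features))

    scores = [0.0] * n_features
    for i in range(n):
        # All other instances, nearest first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: distance(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]    # same class
        misses = [j for j in others if y[j] != y[i]][:k]  # other class
        for f in range(n_features):
            # Differing from a near miss is evidence for relevance;
            # differing from a near hit is evidence against it.
            scores[f] += sum(diff(f, X[i], X[j]) for j in misses) / (k * n)
            scores[f] -= sum(diff(f, X[i], X[j]) for j in hits) / (k * n)
    return scores


# Toy data (hypothetical): endpoint = A XOR B, plus one irrelevant feature.
X, y = [], []
for _ in range(4):
    for a in (0, 1):
        for b in (0, 1):
            for noise in (0, 1):
                X.append([a, b, noise])
                y.append(a ^ b)

names = ["A", "B", "noise"]
scores = relieff_scores(X, y, k=3)
ranking = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Each instance is compared only with its nearest neighbors, so the cost grows with the number of instances and features rather than with the number of feature pairs.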

Each core algorithm outputs an ordered set of feature names along with their respective feature scores (i.e., weights). Certain algorithms require user-specified run parameters (e.g., ReliefF requires the user to specify the number 'k' of nearest neighbors).
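The iterative TuRF mechanism wraps a core algorithm: score the features, drop the lowest-scoring fraction, and rescore the survivors. The following is only a sketch of that loop under stated assumptions. The names `turf_select`, `drop_fraction`, and `iterations` are hypothetical, and `mean_gap` is a deliberately simple stand-in scorer, not a Relief algorithm; in practice a Relief-based scorer would be passed as `score_fn`.

```python
def turf_select(X, y, names, score_fn, drop_fraction=0.5, iterations=2):
    """TuRF-style wrapper: score, drop the weakest features, rescore.

    score_fn(X, y) must return one score per column of X. Returns the
    surviving (name, score) pairs, best first.
    """
    keep = list(range(len(names)))
    for _ in range(iterations):
        sub = [[row[f] for f in keep] for row in X]
        scores = score_fn(sub, y)
        ranked = sorted(zip(keep, scores), key=lambda p: p[1], reverse=True)
        n_drop = int(len(ranked) * drop_fraction)
        keep = [f for f, _ in ranked[:len(ranked) - n_drop]]
    # Final scores are computed on the surviving features only.
    sub = [[row[f] for f in keep] for row in X]
    final = score_fn(sub, y)
    return sorted(zip((names[f] for f in keep), final),
                  key=lambda p: p[1], reverse=True)


def mean_gap(X, y):
    """Stand-in scorer (NOT a Relief algorithm): gap between class means."""
    out = []
    for f in range(len(X[0])):
        ones = [row[f] for row, label in zip(X, y) if label == 1]
        zeros = [row[f] for row, label in zip(X, y) if label == 0]
        out.append(abs(sum(ones) / len(ones) - sum(zeros) / len(zeros)))
    return out


# Toy data (hypothetical): only f0 separates the two classes well.
X = [[0.1, 5.0, 1.0, 0.3],
     [0.2, 5.1, 0.0, 0.5],
     [0.9, 5.0, 1.0, 0.4],
     [0.8, 5.1, 0.0, 0.8]]
y = [0, 0, 1, 1]
selected = turf_select(X, y, ["f0", "f1", "f2", "f3"], mean_gap)
print(selected)
```

Rescoring after each removal matters for neighbor-based scorers, because dropping noisy features changes which instances count as nearest neighbors.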

Relief algorithms are commonly applied to genetic analyses, where epistasis (i.e., feature interactions) is common. However, the algorithms implemented in this package can be applied to almost any supervised learning data set and support:

Feature sets that are discrete/categorical, continuous-valued or a mix of both
Data with missing values
Binary endpoints (i.e., classification)
Multi-class endpoints (i.e., classification)
Continuous endpoints (i.e., regression)

Built into this code is a strategy to automatically detect these relevant characteristics from the loaded data.
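One way such detection could work is sketched below. This is an assumption for illustration, not ReBATE's actual logic: a column is treated as continuous when it is numeric and has more distinct values than a cutoff. The function names, the `discrete_limit` cutoff, and the `"NA"` missing-value marker are all hypothetical.

```python
def describe_column(values, discrete_limit=10, missing_label="NA"):
    """Guess whether a column is discrete or continuous and has missing data.

    discrete_limit and missing_label are hypothetical defaults.
    """
    present = [v for v in values if v != missing_label]
    distinct = set(present)
    numeric = all(isinstance(v, (int, float)) for v in present)
    # Numeric columns with many distinct values are treated as continuous.
    kind = "continuous" if numeric and len(distinct) > discrete_limit else "discrete"
    return {
        "kind": kind,
        "has_missing": len(present) < len(values),
        "n_distinct": len(distinct),
    }


def describe_endpoint(labels, discrete_limit=10):
    """Classify the endpoint as binary, multiclass, or continuous."""
    info = describe_column(labels, discrete_limit)
    if info["kind"] == "continuous":
        return "continuous"
    return "binary" if info["n_distinct"] == 2 else "multiclass"


# Toy columns (hypothetical data).
age = [23.5, 41.0, 37.2, "NA", 58.9, 19.4, 64.1, 30.0, 45.5, 52.3, 28.8, 33.3]
genotype = [0, 1, 2, 1, 0, 2]

age_info = describe_column(age)
genotype_info = describe_column(genotype)
print(age_info)
print(genotype_info)
print(describe_endpoint([0, 1, 1, 0]))
print(describe_endpoint([0, 1, 2, 1]))
```

A fixed cutoff is a blunt heuristic: a continuous feature measured on only a few samples would be labeled discrete, so a user-adjustable limit is the safer design.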

Of our two initial ReBATE software releases, this stand-alone version primarily focuses on improving run-time with the use of Cython. This code is most appropriate for more experienced users or those primarily interested in reducing analysis run time.

We recommend that scikit-learn users, Windows operating system users, beginners, and those looking for the most recent ReBATE developments instead use our alternate scikit-rebate implementation. ReBATE can be run on Windows with some additional installation steps and possible troubleshooting, outlined below.

License

Please see the repository license for the licensing and usage information for ReBATE. Generally, we have licensed ReBATE to make it as widely usable as possible.

Cython (Important Notice)

NOTICE: As is, this code will not run on your local platform! Portions of this code have been optimized with Cython routines for code speedup. As a result, before ReBATE can be used on a given operating system (e.g., Linux, Mac, or Windows), critical binary files must be compiled. This is a one-time step, but it must be repeated any time the underlying source code is modified or an updated version of ReBATE is downloaded to your system. Compiling the necessary binary files is very easy on Mac or Linux systems (because they include a C compiler). However, Windows users will unfortunately have to clear a few extra hurdles to complete this one-time step. If you wish to avoid this hassle, please see our alternate scikit-rebate implementation.

Installation

For detailed information on installing ReBATE, including necessary prerequisites, special instructions for Windows users, and instructions for compiling the Cython code, please refer to our installation documentation.

Running ReBATE

From the '/rebate/' directory, run the following to view all available options:

./rebate.py -h

For detailed information and examples of how to run the different Relief algorithms available in this package, please refer to our usage documentation.

Contributing to ReBATE

We welcome you to check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to ReBATE, please file a new issue so we can discuss it.

If you wish to contribute to ReBATE, we strongly recommend following the steps detailed in our contributing documentation.

Citing ReBATE

If you use ReBATE or the MultiSURF algorithm in a scientific publication, please consider citing the following paper:

Ryan J. Urbanowicz, Randal S. Olson, Peter Schmitt, Melissa Meeker, Jason H. Moore (2017). Benchmarking Relief-Based Feature Selection Methods. arXiv preprint, under review.

BibTeX entry:

@misc{Urbanowicz2017Benchmarking,
    author = {Urbanowicz, Ryan J. and Olson, Randal S. and Schmitt, Peter and Meeker, Melissa and Moore, Jason H.},
    title = {Benchmarking Relief-Based Feature Selection Methods},
    year = {2017},
    howpublished = {arXiv e-print. https://arxiv.org/abs/1711.08477},
}

If you wish to directly cite the original paper for one of the other algorithms implemented in ReBATE, please refer to our citing documentation.

History

This code is largely based on Python implementations of ReliefF, SURF, SURF*, MultiSURF*, and TuRF within the ExSTraCS algorithm software. That Python code was in turn based on Java implementations of these algorithms within the Multifactor Dimensionality Reduction (MDR) software. In contrast with the MDR implementations, the ExSTraCS, scikit-rebate, and present ReBATE versions of this code have all been expanded to accommodate the following data considerations: continuous features, a mix of discrete and continuous features, a continuous endpoint/outcome, and missing data values.

Possible future updates

Make this an installable package
Convert to Classes
Create GUI Interface
