ianlini / Feagen

Licence: bsd-2-clause

(deprecated) A fast and memory-efficient Python data engineering framework for machine learning.

Feagen

.. image:: https://img.shields.io/travis/ianlini/feagen/master.svg
   :target: https://travis-ci.org/ianlini/feagen
.. image:: https://img.shields.io/pypi/v/feagen.svg
   :target: https://pypi.python.org/pypi/feagen
.. image:: https://img.shields.io/pypi/l/feagen.svg
   :target: https://pypi.python.org/pypi/feagen

A fast and memory-efficient Python feature generating framework for machine learning.

Deprecation Warning

This package is deprecated. Please use https://github.com/ianlini/dagian instead.

Introduction

This package is currently not stable. This README is for version 1.0.0a7, but the current version in the master branch is very different from 1.0.0a7. If you want to use the latest version, please look at the tested examples.

Installation

.. code:: bash

   pip install feagen==1.0.0a7

Getting started

An easy way to get started is the `simple lifetime prediction example </examples/lifetime_prediction/>`_. Look at the raw data `lifetime.csv </examples/lifetime_prediction/lifetime.csv>`_ first to better understand what the example does to the data.

Creating the feature generator

The most important part of Feagen is the feature generator class. You first need to define a class like the one in `lifetime_feature_generator.py </examples/lifetime_prediction/lifetime_feature_generator.py>`_ to tell Feagen how to deal with the data. (TODO: more details)
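The general idea of a generator class can be sketched without Feagen itself: each method declares which data keys it generates and which keys it needs. The sketch below is only an illustration. The ``produces`` and ``needs`` decorators, the class name, and the ``height`` and ``weight`` columns are hypothetical and are not Feagen's API; see ``lifetime_feature_generator.py`` for the real one.

```python
import csv
import io


def produces(*keys):
    """Hypothetical decorator: record which data keys a method generates."""
    def mark(func):
        func.produces = keys
        return func
    return mark


def needs(*keys):
    """Hypothetical decorator: record which data keys a method depends on."""
    def mark(func):
        func.needs = keys
        return func
    return mark


class LifetimeSketchGenerator:
    """Toy stand-in for a feature generator class (not Feagen's base class)."""

    def __init__(self, csv_text):
        self.csv_text = csv_text

    @produces("rows")
    def gen_rows(self, data):
        # Parse the raw CSV text into a list of dict rows.
        reader = csv.DictReader(io.StringIO(self.csv_text))
        return {"rows": list(reader)}

    @needs("rows")
    @produces("height_divided_by_weight")
    def gen_ratio(self, data):
        # Assumes the CSV has numeric "height" and "weight" columns.
        ratio = [float(r["height"]) / float(r["weight"]) for r in data["rows"]]
        return {"height_divided_by_weight": ratio}


def dependency_table(generator_cls):
    """Map each produced key to the keys its generating method needs."""
    table = {}
    for attr in vars(generator_cls).values():
        for key in getattr(attr, "produces", ()):
            table[key] = tuple(getattr(attr, "needs", ()))
    return table
```

A framework can read such declarations to work out the dependency graph between data keys before running anything.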

Creating the config files

The command line tool ``feagen-init`` creates the initial config files: the global config ``.feagenrc/config.yml`` and the bundle config ``.feagenrc/bundle_config.yml``. To understand how to change them, read the comments that are automatically generated in those files or in `examples/lifetime_prediction/.feagenrc </examples/lifetime_prediction/.feagenrc>`_. (TODO: more details)

Drawing the directed acyclic graph (DAG)

To check that the dependencies are correct, you can draw the DAG image with the command line tool ``feagen-draw-dag``:

.. code::

   usage: feagen-draw-dag [-h] [-g GLOBAL_CONFIG] [-d DAG_OUTPUT_PATH]

   Generate DAG.

   optional arguments:
     -h, --help            show this help message and exit
     -g GLOBAL_CONFIG, --global-config GLOBAL_CONFIG
                           the path of the path configuration YAML file
                           (default: .feagenrc/config.yml)
     -d DAG_OUTPUT_PATH, --dag-output-path DAG_OUTPUT_PATH
                           output image path (default: dag.png)

You can specify the paths of the global config and the output image using ``-g`` and ``-d`` respectively. Running ``feagen-draw-dag -d fig/dag.png`` in `examples/lifetime_prediction/ </examples/lifetime_prediction/>`_ will give you `examples/lifetime_prediction/fig/dag.png </examples/lifetime_prediction/fig/dag.png>`_:

.. image:: /examples/lifetime_prediction/fig/dag.png

(Note that the order may not be the same)
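What "the dependency is correct" means can also be checked in code. The following is a minimal sketch, assuming the dependencies are available as a plain mapping from each data key to the keys it requires. The mapping and the ``check_dag`` helper are hypothetical and not part of Feagen; the sketch uses the standard library ``graphlib`` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency table: each key maps to the keys it requires.
dependencies = {
    "data_df": set(),
    "weight": {"data_df"},
    "height": {"data_df"},
    "height_divided_by_weight": {"height", "weight"},
}


def check_dag(deps):
    """Return one valid generation order, or raise ValueError on problems."""
    # Every required key must itself be defined as a node.
    required = {req for reqs in deps.values() for req in reqs}
    missing = required - set(deps)
    if missing:
        raise ValueError(f"undefined dependencies: {sorted(missing)}")
    try:
        # static_order() yields each node after all of its requirements.
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # The detected cycle is the second element of the exception args.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


order = check_dag(dependencies)
```

A DAG usually has more than one valid order (here ``weight`` and ``height`` can be swapped), which is the same reason the drawn image may not list nodes in the same order every time.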

Generating features

After the generator class and the config files are defined, you can generate the features with the command line tool ``feagen``:

.. code::

   usage: feagen [-h] [-g GLOBAL_CONFIG] [-b BUNDLE_CONFIG] [-d DAG_OUTPUT_PATH] [--no-bundle]

   Generate global data and data bundle.

   optional arguments:
     -h, --help            show this help message and exit
     -g GLOBAL_CONFIG, --global-config GLOBAL_CONFIG
                           the path of the path configuration YAML file
                           (default: .feagenrc/config.yml)
     -b BUNDLE_CONFIG, --bundle-config BUNDLE_CONFIG
                           the path of the bundle configuration YAML file
                           (default: .feagenrc/bundle_config.yml)
     -d DAG_OUTPUT_PATH, --dag-output-path DAG_OUTPUT_PATH
                           draw the involved subDAG to the provided path
                           (default: None)
     --no-bundle           not generate the data bundle

You can specify the paths of the global config, the bundle config, and the involved subDAG image using ``-g``, ``-b`` and ``-d`` respectively.

The program first finds the nodes in the DAG that are involved and builds a subDAG for this task, then checks whether the data has already been generated in the global data. The resulting DAG after these checks is drawn if you specify ``-d``. For example, in ``examples/lifetime_prediction/``, if you run ``feagen`` first, then add a new feature ``height_divided_by_weight`` and run ``feagen -d fig/involved_dag.png``, you will get the image `examples/lifetime_prediction/fig/involved_dag.png </examples/lifetime_prediction/fig/involved_dag.png>`_:

.. image:: /examples/lifetime_prediction/fig/involved_dag.png

(Note that the order may not be the same)
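The subDAG step can be pictured as a backward walk from the requested data. The sketch below is one plausible reading of that step, not Feagen's actual implementation: it assumes a walk that stops at nodes whose data already exists in the global data. The dependency table and the ``involved_nodes`` helper are hypothetical.

```python
# Hypothetical dependency table: each key maps to the keys it requires.
deps = {
    "data_df": set(),
    "weight": {"data_df"},
    "height": {"data_df"},
    "height_divided_by_weight": {"height", "weight"},
}


def involved_nodes(deps, targets, already_generated):
    """Collect the nodes that still have to run to produce `targets`."""
    involved = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        # Skip nodes seen before and nodes whose data already exists;
        # the requirements of an existing node are not needed either.
        if node in involved or node in already_generated:
            continue
        involved.add(node)
        stack.extend(deps[node])
    return involved


# First run: nothing exists yet, so every ancestor is involved.
first_run = involved_nodes(deps, ["height", "weight"], set())

# Second run after adding the new feature: only the new node is left.
second_run = involved_nodes(
    deps, ["height_divided_by_weight"], {"data_df", "height", "weight"}
)
```

Under these assumptions, ``first_run`` contains ``data_df``, ``height`` and ``weight``, while ``second_run`` contains only ``height_divided_by_weight``.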

After the subDAG is built, the program runs the methods you implemented in the generator class in an appropriate order and writes their output to the global data. The global data is not removed and can be reused: if you generate another bundle, data that has already been generated is not generated again, which saves a lot of time.
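The reuse of the global data behaves like a persistent cache keyed by data name. Here is a minimal in-memory sketch of that idea. The dictionary stands in for the on-disk global data, and the ``generate`` helper and the sample values are made up for illustration; none of this is Feagen's API.

```python
global_data = {}  # stands in for the on-disk global data
calls = []        # records which generating functions actually ran


def generate(key, func, store):
    """Run `func` only if `key` is not already in `store`."""
    if key not in store:
        calls.append(key)
        store[key] = func(store)
    return store[key]


# Steps listed in an order that respects their dependencies.
steps = [
    ("height", lambda s: [180.0, 165.0]),
    ("weight", lambda s: [60.0, 55.0]),
    (
        "height_divided_by_weight",
        lambda s: [h / w for h, w in zip(s["height"], s["weight"])],
    ),
]

for key, func in steps:  # first bundle: everything is generated
    generate(key, func, global_data)

for key, func in steps:  # second bundle: everything is reused
    generate(key, func, global_data)
```

After both loops, ``calls`` lists each key once, because the second pass finds every key already stored.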

Finally, the data bundle is generated according to the structure specified in the bundle config. You can use `hdfview <https://support.hdfgroup.org/products/java/hdfview/>`_ to inspect the resulting global data and data bundle, which may help you understand what the output is. You can also pass ``--no-bundle`` if you don't want to generate the data bundle (only the global data will be generated).

Now, you can use the data bundle to do machine learning!
