mbernico / Snape

Licence: apache-2.0

Snape is a convenient artificial dataset generator that wraps sklearn's make_classification and make_regression and then adds 'realism' features such as complex formatting, varying scales, categorical variables, and missing values.

Programming Languages

python

Labels

dataset classification regression students

Projects that are alternatives of or similar to Snape

Php Ml

PHP-ML - Machine Learning library for PHP

Stars: ✭ 7,900 (+4996.77%)

Mutual labels: dataset, classification, regression

Openml R

R package to interface with OpenML

Stars: ✭ 81 (-47.74%)

Mutual labels: dataset, classification, regression

Dataset

Crop/Weed Field Image Dataset

Stars: ✭ 98 (-36.77%)

Mutual labels: dataset, classification

Universal Data Tool

Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.

Stars: ✭ 1,356 (+774.84%)

Mutual labels: dataset, classification

Autoannotationtool

A labeling tool that aims to reduce semantic segmentation labeling time; rectangle and polygon annotation are supported.

Stars: ✭ 113 (-27.1%)

Mutual labels: dataset, classification

Thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs

Stars: ✭ 1,282 (+727.1%)

Mutual labels: classification, regression

Lossfunctions.jl

Julia package of loss functions for machine learning.

Stars: ✭ 89 (-42.58%)

Mutual labels: classification, regression

Gpstuff

GPstuff - Gaussian process models for Bayesian analysis

Stars: ✭ 106 (-31.61%)

Mutual labels: classification, regression

Pytsetlinmachine

Implements the Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, Weighted Tsetlin Machine, and Embedding Tsetlin Machine, with support for continuous features, multigranularity, and clause indexing

Stars: ✭ 80 (-48.39%)

Mutual labels: classification, regression

Machine Learning Projects

This repository consists of all my Machine Learning Projects.

Stars: ✭ 135 (-12.9%)

Mutual labels: classification, regression

Tiny ml

Algorithms from Zhou Zhihua's book "Machine Learning", plus some other classic machine learning algorithms, implemented in NumPy

Stars: ✭ 129 (-16.77%)

Mutual labels: classification, regression

Machine Learning With Python

Practice and tutorial-style notebooks covering a wide variety of machine learning techniques

Stars: ✭ 2,197 (+1317.42%)

Mutual labels: classification, regression

A high-level machine learning and deep learning library for the PHP language.

Stars: ✭ 1,270 (+719.35%)

Mutual labels: classification, regression

Dlcv for beginners

Companion code for the book "Deep Learning and Computer Vision"

Stars: ✭ 1,244 (+702.58%)

Mutual labels: classification, regression

Machine Learning Algorithms

A curated list of almost all machine learning algorithms and deep learning algorithms grouped by category.

Stars: ✭ 92 (-40.65%)

Mutual labels: classification, regression

Neuroflow

Artificial Neural Networks for Scala

Stars: ✭ 105 (-32.26%)

Mutual labels: classification, regression

Benchmarks

Comparison tools

Stars: ✭ 139 (-10.32%)

Mutual labels: classification, regression

Mlbox

MLBox is a powerful Automated Machine Learning python library.

Stars: ✭ 1,199 (+673.55%)

Mutual labels: classification, regression

Pointclouddatasets

3D point cloud datasets in HDF5 format, containing uniformly sampled 2048 points per shape.

Stars: ✭ 80 (-48.39%)

Mutual labels: dataset, classification

Mlr

Machine Learning in R

Stars: ✭ 1,542 (+894.84%)

Mutual labels: classification, regression

Snape

Motivation

Snape was primarily created for academic and educational settings. It has been used to create datasets that are unique to each student and each homework assignment. It has also been used to create class-wide assessments in conjunction with 'Kaggle In the Classroom.'

Other users have suggested non-academic use cases as well, including 'interview screening problems,' model comparison, etc.

Installation

Via GitHub

git clone https://github.com/mbernico/snape.git
cd snape
python setup.py install

Via pip

Coming Soon...

Quick Start

Snape can run either as a Python module or as a command-line application.

Command Line Usage

Creating a Dataset

From the main directory in the git repo:

python snape/make_dataset.py -c example/config_classification.json

This uses the configuration file example/config_classification.json to create an artificial dataset called 'my_dataset' (the name is specified in the JSON config; more on this later).
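If you would rather produce the configuration file from Python than edit JSON by hand, a minimal sketch using only the standard json module follows. The keys are copied from the Python module example in this README; the values are illustrative, and the helper name write_config and the file name my_config.json are hypothetical.

```python
import json
from pathlib import Path

# Keys copied from this README's Python module example; values are illustrative.
conf = {
    "type": "classification",
    "n_classes": 2,
    "n_samples": 1000,
    "n_features": 10,
    "out_path": "./",
    "output": "my_dataset",
    "n_informative": 3,
    "n_duplicate": 0,
    "n_redundant": 0,
    "n_clusters": 2,
    "weights": [0.8, 0.2],
    "pct_missing": 0.00,
    "insert_dollar": "Yes",
    "insert_percent": "Yes",
    "n_categorical": 0,
    "star_schema": "No",
    "label_list": [],
}


def write_config(config: dict, path: str) -> Path:
    """Write a Snape-style config dict to a JSON file (hypothetical helper)."""
    out = Path(path)
    out.write_text(json.dumps(config, indent=4))
    return out
```

Calling write_config(conf, "my_config.json") and then running python snape/make_dataset.py -c my_config.json should be equivalent to the bundled example, assuming make_dataset.py accepts any file with these keys.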

The dataset will consist of three files:

my_dataset_train.csv (80% of the artificial dataset, with all dependent and independent variables)
my_dataset_test.csv (20% of the artificial dataset, with only the independent variables present)
my_dataset_testkey.csv (the same 20% as _test, including the dependent variable)
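The three-file layout can be sketched with scikit-learn's train_test_split. This is an illustration of the 80/20 train / test / testkey idea, not Snape's own code; the target column name "y" and the helper split_into_files are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def split_into_files(df: pd.DataFrame, target: str = "y", test_size: float = 0.2, seed: int = 0):
    """Return (train, test, testkey) frames; 'y' as the target name is an assumption."""
    train, holdout = train_test_split(df, test_size=test_size, random_state=seed)
    test = holdout.drop(columns=[target])  # features only: handed to students
    testkey = holdout.copy()               # same rows, target kept for scoring
    return train, test, testkey


X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(X.shape[1])])
df["y"] = y
train, test, testkey = split_into_files(df)
# Each frame could then be written with .to_csv("my_dataset_train.csv", index=False), etc.
```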

Note that if a star schema is generated, additional CSV files are written: one extra file per dimension. Only the main 'fact table' dataset is split into train and test files.
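To make the fact/dimension idea concrete, here is a hypothetical sketch of moving one categorical column into its own dimension table keyed by an integer id. It shows the general star-schema technique with pandas; the function name to_star_schema and the "_key" column naming are assumptions, not how Snape necessarily builds its files.

```python
import pandas as pd


def to_star_schema(df: pd.DataFrame, column: str):
    """Split one categorical column out into a dimension table (hypothetical helper).

    Returns (fact, dimension): the fact table keeps an integer key in place of
    the category, and the dimension table maps each key back to its category.
    """
    categories = sorted(df[column].unique())
    dimension = pd.DataFrame({f"{column}_key": range(len(categories)), column: categories})
    # A left merge keeps the fact rows in their original order.
    fact = df.merge(dimension, on=column, how="left").drop(columns=[column])
    return fact, dimension


df = pd.DataFrame({"color": ["red", "blue", "red"], "x": [1.0, 2.0, 3.0], "y": [0, 1, 0]})
fact, dim = to_star_schema(df, "color")
```

Only the fact table would then go through the train/test split; each dimension table would be written out whole.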

The train and test files can be given to a student. The student can respond with a file of predictions, which can be scored against the testkey as follows:

Scoring a Dataset

python snape/score_dataset.py -p example/student_predictions.csv -k example/student_testkey.csv

Snape's score_dataset.py attempts to detect the problem type, then scores the predictions and prints some metrics:

Problem Type Detection: binary
---Binary Classification Score---
             precision    recall  f1-score   support

          0       0.81      0.99      0.89      1601
          1       0.50      0.06      0.11       399

avg / total       0.75      0.80      0.73      2000
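Problem-type detection can be approximated by inspecting the distinct values in the testkey's target column. The rule below is an assumed heuristic for illustration (two values means binary, a small number of integer values means multiclass, anything else means regression); it is not Snape's documented logic, and max_classes is a hypothetical parameter.

```python
import numpy as np


def detect_problem_type(y, max_classes: int = 20) -> str:
    """Guess 'binary', 'multiclass' or 'regression' from numeric target values.

    Assumed heuristic, not Snape's actual rule.
    """
    values = np.unique(np.asarray(y, dtype=float))
    if len(values) == 2:
        return "binary"
    # All whole numbers and not too many of them: treat as class labels.
    if np.all(values == np.round(values)) and len(values) <= max_classes:
        return "multiclass"
    return "regression"
```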

Python Module Usage

Creating a Dataset

from snape.make_dataset import make_dataset

# configuration json examples can be found in doc
conf = {
    "type": "classification",
    "n_classes": 2,
    "n_samples": 1000,
    "n_features": 10,
    "out_path": "./",
    "output": "my_dataset",
    "n_informative": 3,
    "n_duplicate": 0,
    "n_redundant": 0,
    "n_clusters": 2,
    "weights": [0.8, 0.2],
    "pct_missing": 0.00,
    "insert_dollar": "Yes",
    "insert_percent": "Yes",
    "n_categorical": 0,
    "star_schema": "No",
    "label_list": []
}

make_dataset(config=conf)
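Options such as insert_dollar, insert_percent, and pct_missing add the 'realism' described at the top of this page. The sketch below shows what transformations of that kind look like on a make_classification frame; the column choices (x0, x1, x2), the scaling factors, and the helper add_realism are all hypothetical and are not Snape's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification


def add_realism(df: pd.DataFrame, pct_missing: float = 0.1, seed: int = 0) -> pd.DataFrame:
    """Illustrative 'messy data' transforms (hypothetical, not Snape's code)."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Rescale and format one column as dollar strings, e.g. "$1,234.56".
    out["x0"] = out["x0"].map(lambda v: f"${v * 1000:,.2f}")
    # Format another column as percent strings, e.g. "12.34%".
    out["x1"] = out["x1"].map(lambda v: f"{v * 100:.2f}%")
    # Blank out roughly pct_missing of the values in a numeric column.
    mask = rng.random(len(out)) < pct_missing
    out.loc[mask, "x2"] = np.nan
    return out


X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(5)])
df["y"] = y
messy = add_realism(df, pct_missing=0.1)
```

Students then have to strip the "$" and "%" characters and decide how to handle the missing values before modeling.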

Scoring a Dataset

from snape.score_dataset import score_dataset

# a dataset's testkey can be compared to a prediction file using score_dataset()
results = score_dataset(y_file="student_testkey.csv", y_hat_file="student_predictions.csv")
# results is a tuple of (a_primary_metric, classification_report)
print("AUC = " + str(results[0]))
print(results[1])
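For readers who want to see what a binary (AUC, classification report) pair involves, here is a sketch built directly on scikit-learn's roc_auc_score and classification_report. It assumes predictions are scores or probabilities thresholded at 0.5; it mirrors the shape of score_dataset()'s return value but is not Snape's code, and score_binary is a hypothetical name. In practice y_true and y_score would be read from the testkey and prediction CSV files, for example with pandas.read_csv.

```python
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score


def score_binary(y_true, y_score, threshold: float = 0.5):
    """Return (AUC, classification report) for binary predictions (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    auc = roc_auc_score(y_true, y_score)  # ranking quality of the raw scores
    y_pred = (y_score >= threshold).astype(int)  # hard labels for the report
    return auc, classification_report(y_true, y_pred)


auc, report = score_binary([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("AUC = " + str(auc))
print(report)
```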

Dataset Generation Config

Why Snape?

Snape is primarily used for creating complex datasets that challenge students and teach defense against the dark arts of machine learning. :)

Stars: ✭ 155

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (3) 🔗