
Allure of the Stars is a near-future Sci-Fi roguelike and tactical squad combat game written in Haskell; please offer feedback, e.g., after trying out the web frontend version at

Stars: ✭ 149 (-17.22%)

Mutual labels: squad

Url Classification

Machine learning to classify Malicious (Spam)/Benign URLs

Stars: ✭ 95 (-47.22%)

Mutual labels: classifier

Tensorflow Object Detection Tutorial

The purpose of this tutorial is to learn how to install and prepare TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch

Stars: ✭ 113 (-37.22%)

Mutual labels: classifier

Naivebayes

📊 Naive Bayes classifier for JavaScript

Stars: ✭ 127 (-29.44%)

Mutual labels: classifier

Multi Matcher

simple rules engine

Stars: ✭ 84 (-53.33%)

Mutual labels: classifier

Naive Bayes Classifier

yet another general purpose naive bayesian classifier.

Stars: ✭ 162 (-10%)

Mutual labels: classifier

Haystack

🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.

Stars: ✭ 3,409 (+1793.89%)

Mutual labels: squad

Scene Text Recognition

Scene text detection and recognition based on Extremal Region(ER)

Stars: ✭ 146 (-18.89%)

Mutual labels: classifier

Monkeylearn

⛔️ ARCHIVED ⛔️ 🐒 R package for text analysis with Monkeylearn 🐒

Stars: ✭ 95 (-47.22%)

Mutual labels: classifier

Sytora

A sophisticated smart symptom search engine

Stars: ✭ 111 (-38.33%)

Mutual labels: classifier

Mnemonicreader

A PyTorch implementation of Mnemonic Reader for the Machine Comprehension task

Stars: ✭ 137 (-23.89%)

Mutual labels: squad

licensechecker (lc) a command line application which scans directories and identifies what software license things are under producing reports as either SPDX, CSV, JSON, XLSX or CLI Tabular output. Dual-licensed under MIT or the UNLICENSE.

Stars: ✭ 93 (-48.33%)

Mutual labels: classifier

Emlearn

Machine Learning inference engine for Microcontrollers and Embedded devices

Stars: ✭ 154 (-14.44%)

Mutual labels: classifier

Pancancer

Building classifiers using cancer transcriptomes across 33 different cancer-types

Stars: ✭ 84 (-53.33%)

Mutual labels: classifier

Digit Recognizer

A Machine Learning classifier for recognizing the digits for humans.

Stars: ✭ 126 (-30%)

Mutual labels: classifier

Programming Language Classifier

An example of how to use CreateML in Xcode 10 to create a Core ML model for classifying text

Stars: ✭ 172 (-4.44%)

Mutual labels: classifier

Speech signal processing and classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a pre-requisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification. That is, to develop two-class classifiers, which can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutive neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources will be used toward achieving our goal, such as KALDI. Comparisons will be made against [6-8].

Stars: ✭ 155 (-13.89%)

Mutual labels: classifier

Awesome Decision Tree Papers

A collection of research papers on decision, classification and regression trees with implementations.

Stars: ✭ 1,908 (+960%)

Mutual labels: classifier


ALBERT-TF2.0

ALBERT model Fine Tuning using TF2.0

This repository contains a TensorFlow 2.0 implementation of ALBERT.

Requirements

python3
pip install -r requirements.txt

ALBERT Pre-training

Pre-training an ALBERT model from scratch and domain-specific fine-tuning: instructions here

Download ALBERT TF 2.0 weights

Version 1	Version 2
base	base
large	large
xlarge	xlarge
xxlarge	xxlarge

Unzip the model inside the repo.

The weights above do not contain the final layer of the original model, so for now they can only be used for fine-tuning on downstream tasks.

For full weights conversion from TF-Hub to TF 2.0, see here
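The fine-tuning commands in this README read three files from the unzipped model directory: config.json, tf2_model.h5 and vocab/30k-clean.model. A minimal sketch of a sanity check, assuming the archive unpacks to exactly that layout; the function name check_albert_dir is hypothetical and not part of the repo:

```shell
# Hypothetical helper (not part of the repo): verify that an unzipped
# ALBERT directory contains the files used by the fine-tuning commands.
check_albert_dir() {
  local dir="${1:?usage: check_albert_dir <model_dir>}"
  local missing=0
  local f
  for f in config.json tf2_model.h5 vocab/30k-clean.model; do
    if [ ! -f "${dir}/${f}" ]; then
      # Report every missing file on stderr instead of stopping at the first.
      echo "missing: ${dir}/${f}" >&2
      missing=1
    fi
  done
  # Exit status 0 if all files are present, 1 otherwise.
  return "${missing}"
}

# Example: check_albert_dir large
```

Running it before the data preparation step catches a model unzipped into the wrong place early, rather than partway through a long job.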

Download glue data

Download using the command below:

python download_glue_data.py --data_dir glue_data --tasks all

Fine-tuning

To prepare the fine-tuning data for final model training, use the create_finetuning_data.py script. The resulting datasets in tf_record format and the training metadata should later be passed to the training or evaluation scripts. The task-specific arguments are described in the following sections:

Creating fine-tuning data

Example CoLA

export GLUE_DIR=glue_data/
export ALBERT_DIR=large/

export TASK_NAME=CoLA
export OUTPUT_DIR=cola_processed
mkdir -p "$OUTPUT_DIR"

python create_finetuning_data.py \
 --input_data_dir=${GLUE_DIR}/ \
 --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
 --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
 --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
 --meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
 --fine_tuning_task_type=classification --max_seq_length=128 \
 --classification_task_name=${TASK_NAME}

Running classifier

export MODEL_DIR=CoLA_OUT
python run_classifer.py \
--train_data_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
--eval_data_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
--input_meta_data_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
--albert_config_file=${ALBERT_DIR}/config.json \
--task_name=${TASK_NAME} \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
--output_dir=${MODEL_DIR} \
--init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
--do_train \
--do_eval \
--train_batch_size=16 \
--learning_rate=1e-5 \
--custom_training_loop

By default, the classifier script runs for 3 epochs and evaluates on the development set.

The command above should yield a dev-set accuracy of 76.22 on the CoLA task.

The code above was tested on a single TITAN RTX 24GB GPU.

SQuAD

Data and Evaluation scripts

Training Data Preparation

export SQUAD_DIR=SQuAD
export SQUAD_VERSION=v1.1
export ALBERT_DIR=large
export OUTPUT_DIR=squad_out_${SQUAD_VERSION}
mkdir -p "$OUTPUT_DIR"

python create_finetuning_data.py \
--squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model  \
--train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record  \
--meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--fine_tuning_task_type=squad \
--max_seq_length=384

Running Model

python run_squad.py \
--mode=train_and_predict \
--input_meta_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--train_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--predict_file=${SQUAD_DIR}/dev-${SQUAD_VERSION}.json \
--albert_config_file=${ALBERT_DIR}/config.json \
--init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
--train_batch_size=48 \
--predict_batch_size=48 \
--learning_rate=1e-5 \
--num_train_epochs=3 \
--model_dir=${OUTPUT_DIR} \
--strategy_type=mirror

Running SQuAD V2.0

export SQUAD_DIR=SQuAD
export SQUAD_VERSION=v2.0
export ALBERT_DIR=xxlarge
export OUTPUT_DIR=squad_out_${SQUAD_VERSION}
mkdir -p "$OUTPUT_DIR"

python create_finetuning_data.py \
--squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model  \
--train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record  \
--meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--fine_tuning_task_type=squad \
--max_seq_length=384

python run_squad.py \
--mode=train_and_predict \
--input_meta_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--train_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--predict_file=${SQUAD_DIR}/dev-${SQUAD_VERSION}.json \
--albert_config_file=${ALBERT_DIR}/config.json \
--init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
--train_batch_size=24 \
--predict_batch_size=24 \
--learning_rate=1.5e-5 \
--num_train_epochs=3 \
--model_dir=${OUTPUT_DIR} \
--strategy_type=mirror \
--version_2_with_negative \
--max_seq_length=384

Experiment done on 4 x NVIDIA TITAN RTX 24 GB.
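The v1.1 and v2.0 runs above share most of their flags; apart from the model directory, batch sizes and learning rate, the v2.0 run adds --version_2_with_negative. A small sketch of deriving that flag from SQUAD_VERSION so one script can serve both versions; the helper name squad_extra_flags is hypothetical and not part of the repo:

```shell
# Hypothetical helper (not part of the repo): print the extra run_squad.py
# flag for a given SQuAD version, based on the commands in this README.
squad_extra_flags() {
  case "$1" in
    v2.0) echo "--version_2_with_negative" ;;  # the v2.0 run passes this flag
    *)    echo "" ;;                            # the v1.1 run passes no extra flag
  esac
}

# Usage: append the result, unquoted, to the end of the run_squad.py command:
#   $(squad_extra_flags "${SQUAD_VERSION}")
```

Leaving the substitution unquoted matters here: for v1.1 it expands to nothing, so no empty argument is passed to the script.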

Result

Multi-GPU training and XLA

Use the flag --strategy_type=mirror for multi-GPU training. Currently, all GPUs available in the environment will be used.
Use the flag --enable-xla to enable XLA. Training will take longer to start because of JIT compilation.
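Because every GPU visible to the process is used, one common way to restrict training to a subset is the CUDA_VISIBLE_DEVICES environment variable. This is a general CUDA mechanism rather than something this repo documents, and the device indices below are an assumption about the machine:

```shell
# Assumption: CUDA devices with indices 0 and 1 exist on this machine.
# Only the listed devices are visible to processes started from this shell,
# so a run with --strategy_type=mirror would replicate across these two only.
export CUDA_VISIBLE_DEVICES=0,1

# Then launch run_squad.py with --strategy_type=mirror as shown earlier.
```

Setting the variable on the command line of a single invocation, instead of exporting it, limits the effect to that one run.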

Ignore

The warning below is displayed at the end of each epoch if you use the Keras model.fit method. It is caused by an issue with the training-steps calculation when a tf.data dataset is passed to model.fit(). It has no effect on model performance, so it can be ignored. It will most likely be fixed in the next TF2 release. Issue-link

2019-10-31 13:35:48.322897: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range:
End of sequence
         [[{{node IteratorGetNext}}]]
         [[model_1/albert_model/word_embeddings/Shape/_10]]
2019-10-31 13:36:03.302722: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range:
End of sequence
         [[{{node IteratorGetNext}}]]
         [[IteratorGetNext/_4]]

References

TensorFlow official implementation of BERT in TF 2.0. Many parts of the code in this repo are adapted from that repo.
LAMB optimizer from TensorFlow Addons
TF-Hub weights to TF 2.0 weights conversion: KPE


Stars: ✭ 180
