proycon / deepfrog

Licence: GPL-3.0
An NLP-suite powered by deep learning

Programming Languages

Rust, Groovy, Shell

Projects that are alternatives to or similar to deepfrog

frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Stars: ✭ 70 (+337.5%)
Mutual labels:  dutch, folia
molecule-attention-transformer
Pytorch reimplementation of Molecule Attention Transformer, which uses a transformer to tackle the graph-like structure of molecules
Stars: ✭ 46 (+187.5%)
Mutual labels:  transformers
long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
Stars: ✭ 103 (+543.75%)
Mutual labels:  transformers
gnn-lspe
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations), ICLR 2022
Stars: ✭ 165 (+931.25%)
Mutual labels:  transformers
Home-Assistant-Sensor-Afvalbeheer
Provides Home Assistant sensors for multiple Dutch and Belgian waste collectors
Stars: ✭ 157 (+881.25%)
Mutual labels:  dutch
jiten
jiten - japanese android/cli/web dictionary based on jmdict/kanjidic — 日本語 辞典 和英辞典 漢英字典 和独辞典 和蘭辞典
Stars: ✭ 64 (+300%)
Mutual labels:  dutch
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (+631.25%)
Mutual labels:  transformers
language-planner
Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (+425%)
Mutual labels:  transformers
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (+818.75%)
Mutual labels:  transformers
PyTorch-Model-Compare
Compare neural networks by their feature similarity
Stars: ✭ 119 (+643.75%)
Mutual labels:  transformers
DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm Machine Learning Model.
Stars: ✭ 58 (+262.5%)
Mutual labels:  transformers
bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Stars: ✭ 56 (+250%)
Mutual labels:  transformers
modules
The official repository for our paper "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks". We develop a method for analyzing emerging functional modularity in neural networks based on differentiable weight masks and use it to point out important issues in current-day neural networks.
Stars: ✭ 25 (+56.25%)
Mutual labels:  transformers
lightning-transformers
Flexible components pairing 🤗 Transformers with Pytorch Lightning
Stars: ✭ 551 (+3343.75%)
Mutual labels:  transformers
dutch-hackathons
Building the most comprehensive list of annual hackathons in the Netherlands at hackathonlist.nl.
Stars: ✭ 22 (+37.5%)
Mutual labels:  dutch
pH7-Internationalization
🎌 pH7CMS Internationalization (I18N) package 🙊 Get new languages for your pH7CMS website!
Stars: ✭ 17 (+6.25%)
Mutual labels:  dutch
xpandas
Universal 1d/2d data containers with Transformers functionality for data analysis.
Stars: ✭ 25 (+56.25%)
Mutual labels:  transformers
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (+50%)
Mutual labels:  transformers
UitzendingGemist
An *Unofficial* Uitzending Gemist application for Apple TV 4 (**deprecated, use TV Gemist ☝🏻**)
Stars: ✭ 48 (+200%)
Mutual labels:  dutch
pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Stars: ✭ 250 (+1462.5%)
Mutual labels:  transformers

DeepFrog - NLP Suite

Project status: Suspended. Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being, but the author(s) intend to resume it.

Introduction

DeepFrog aims to be a (partial) successor to the Dutch NLP suite Frog. Whereas the various NLP modules in Frog were built on k-NN classifiers, DeepFrog builds on deep learning techniques and can use a variety of neural transformers.

Our deliverables are multi-faceted:

  1. Fine-tuned neural network models for Dutch NLP that can be compared with Frog and are directly usable with Huggingface's Transformers library for Python (or rust-bert for Rust).
  2. Training pipelines for the above models (see training).
  3. A software tool that integrates multiple models (not limited to Dutch!) and provides a single pipeline solution for end users,
    • with full support for FoLiA XML input/output;
    • usage is not limited to the models we provide.

Installation

DeepFrog and all its dependencies are included as an extra in LaMachine, which is the easiest way to install it. Within LaMachine, do:

lamachine-add deepfrog && lamachine-update

Otherwise, simply install DeepFrog using Rust's package manager:

cargo install deepfrog

No cargo/rust on your system yet? Run sudo apt install cargo on Debian/Ubuntu-based systems, brew install rust on macOS, or use rustup.

To run the DeepFrog command-line tool, you need the C++ library libtorch installed. Download it from https://pytorch.org/; make sure you select the libtorch package there! You do not need the rest of PyTorch.

To use our models directly with Huggingface's Transformers library for Python, you merely need that library; models are automatically downloaded and cached as you invoke them. The DeepFrog command-line tool is not used in this workflow.
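As a minimal sketch (not taken from the model pages; the model name is one of the DeepFrog models listed below, the rest is standard Transformers usage), loading a model looks like this:

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Downloads and caches the model from the Huggingface hub on first use
model_name = "proycon/robbert-pos-cased-deepfrog-nld"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)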

Models

We aim to make available various models for Dutch NLP.

RobBERT v1 Part-of-Speech (CGN tagset) for Dutch

Model page with instructions: https://huggingface.co/proycon/robbert-pos-cased-deepfrog-nld

Uses the pre-trained RobBERT model (a RoBERTa model), fine-tuned on part-of-speech tags with the full corpus also used by Frog. Uses the tag set of the Corpus Gesproken Nederlands (CGN); this corpus constitutes a subset of the training data.

Test Evaluation:

f1 = 0.9708171206225681
loss = 0.07882563415198372
precision = 0.9708171206225681
recall = 0.9708171206225681
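
For illustration, a hedged sketch of tagging a Dutch sentence with this model through the generic token-classification pipeline; the example sentence is arbitrary, and the exact labels come from the model's own configuration:

from transformers import pipeline

# Each token receives a CGN part-of-speech label with a confidence score
pos_tagger = pipeline("token-classification",
                      model="proycon/robbert-pos-cased-deepfrog-nld")
for token in pos_tagger("De kat zat op de mat."):
    print(token["word"], token["entity"], round(token["score"], 3))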

RobBERT v2 Part-of-Speech (CGN tagset) for Dutch

Model page with instructions: https://huggingface.co/proycon/robbert2-pos-cased-deepfrog-nld

Uses the pre-trained RobBERT v2 model (a RoBERTa model), fine-tuned on part-of-speech tags with the full corpus also used by Frog. Uses the tag set of the Corpus Gesproken Nederlands (CGN); this corpus constitutes a subset of the training data.

Test Evaluation:

f1 = 0.9664560038891591
loss = 0.09085878504153627
precision = 0.9659863945578231
recall = 0.9669260700389105

BERT Part-of-Speech (CGN tagset) for Dutch

Model page with instructions: https://huggingface.co/proycon/bert-pos-cased-deepfrog-nld

Uses the pre-trained BERTje model (a BERT model), fine-tuned on part-of-speech tags with the full corpus also used by Frog. Uses the tag set of the Corpus Gesproken Nederlands (CGN); this corpus constitutes a subset of the training data.

Test Evaluation:

f1 = 0.9737354085603113
loss = 0.0647074995296342
precision = 0.9737354085603113
recall = 0.9737354085603113

RobBERT SoNaR1 Named Entities for Dutch

Model page with instructions: https://huggingface.co/proycon/robbert-ner-cased-sonar1-nld

Uses the pre-trained RobBERT model (a RoBERTa model), fine-tuned on Named Entities from the SoNaR1 corpus (as also used by Frog). Provides basic PER, LOC, ORG, PRO, EVE and MISC tags.

Test Evaluation (note: this is a simple token-based evaluation rather than entity-based!):

f1 = 0.9170731707317074
loss = 0.023864904676364467
precision = 0.9306930693069307
recall = 0.9038461538461539

Note: the tokenisation in this model is English rather than Dutch.
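
A hedged usage sketch (not from the model page): running this NER model through the pipeline API. The aggregation_strategy="simple" option, available in recent Transformers versions, merges word pieces back into whole entities, which matters given the token-based evaluation caveat above; the example sentence and names are made up.

from transformers import pipeline

# Group word pieces into whole entities (PER, LOC, ORG, ...)
ner = pipeline("token-classification",
               model="proycon/robbert-ner-cased-sonar1-nld",
               aggregation_strategy="simple")
print(ner("Jan Wouters woont in Amsterdam en werkt bij Philips."))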

RobBERT v2 SoNaR1 Named Entities for Dutch

Model page with instructions: https://huggingface.co/proycon/robbert2-ner-cased-sonar1-nld

Uses the pre-trained RobBERT v2 model (a RoBERTa model), fine-tuned on Named Entities from the SoNaR1 corpus (as also used by Frog). Provides basic PER, LOC, ORG, PRO, EVE and MISC tags.

Test Evaluation:

f1 = 0.8878048780487806
loss = 0.03555946223787032
precision = 0.900990099009901
recall = 0.875

BERT SoNaR1 Named Entities for Dutch

Model page with instructions: https://huggingface.co/proycon/bert-ner-cased-sonar1-nld

Uses the pre-trained BERTje model (a BERT model), fine-tuned on Named Entities from the SoNaR1 corpus (as also used by Frog). Provides basic PER, LOC, ORG, PRO, EVE and MISC tags.

Test Evaluation (note: this is a simple token-based evaluation rather than entity-based!):

f1 = 0.9519230769230769
loss = 0.02323892477299803
precision = 0.9519230769230769
recall = 0.9519230769230769
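
To make the token-based versus entity-based distinction concrete, here is a hedged sketch using the third-party seqeval library (not part of DeepFrog; the BIO tag sequences are invented): an entity only counts as correct if its whole span matches, so one wrong token inside a span fails the entire entity.

from seqeval.metrics import f1_score

# Gold: one PER entity spanning two tokens, plus one LOC entity
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
# Prediction misses the second PER token, so the PER entity does not match
y_pred = [["B-PER", "O", "O", "B-LOC"]]

# Token-based accuracy would be 3/4, but entity-based F1 is 0.5:
# only 1 of the 2 gold entities is matched exactly
print(f1_score(y_true, y_pred))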