
tensordot / syntaxdot

Licence: other
Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

Programming Languages

rust
python

Projects that are alternatives of or similar to syntaxdot

Awesome Sentence Embedding
A curated list of pretrained sentence and word embedding models
Stars: ✭ 1,973 (+6065.63%)
Mutual labels:  pretrained-models, bert
BiaffineDependencyParsing
BERT + self-attention encoder; biaffine decoder; PyTorch implementation
Stars: ✭ 67 (+109.38%)
Mutual labels:  bert, dependency-parsing
AiSpace
AiSpace: Better practices for deep learning model development and deployment For Tensorflow 2.0
Stars: ✭ 28 (-12.5%)
Mutual labels:  pretrained-models, bert
HugsVision
HugsVision is an easy-to-use Hugging Face wrapper for state-of-the-art computer vision
Stars: ✭ 154 (+381.25%)
Mutual labels:  pretrained-models, bert
roberta-wwm-base-distill
A RoBERTa-wwm-base model distilled from RoBERTa-wwm using RoBERTa-wwm-large
Stars: ✭ 61 (+90.63%)
Mutual labels:  pretrained-models, bert
Clue
CLUE, the Chinese Language Understanding Evaluation benchmark: datasets, baselines, pre-trained models, corpus, and leaderboard
Stars: ✭ 2,425 (+7478.13%)
Mutual labels:  pretrained-models, bert
zeyrek
Python morphological analyzer for Turkish language. Partial port of ZemberekNLP.
Stars: ✭ 36 (+12.5%)
Mutual labels:  morphology, lemmatization
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+174093.75%)
Mutual labels:  pretrained-models, bert
sticker2
Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
Stars: ✭ 14 (-56.25%)
Mutual labels:  bert, dependency-parsing
nlp-cheat-sheet-python
NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Stars: ✭ 69 (+115.63%)
Mutual labels:  dependency-parsing, lemmatization
Spacy Course
👩‍🏫 Advanced NLP with spaCy: A free online course
Stars: ✭ 1,920 (+5900%)
Mutual labels:  dependency-parsing, part-of-speech-tagging
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. Supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+371.88%)
Mutual labels:  pretrained-models, bert
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (-31.25%)
Mutual labels:  pretrained-models, bert
banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted in Findings of the Annual Conference of the North American Chap…
Stars: ✭ 186 (+481.25%)
Mutual labels:  bert, xlm-roberta
lemma
A Morphological Parser (Analyser) / Lemmatizer written in Elixir.
Stars: ✭ 45 (+40.63%)
Mutual labels:  morphology, lemmatization
transformer-models
Deep Learning Transformer models in MATLAB
Stars: ✭ 90 (+181.25%)
Mutual labels:  pretrained-models, bert
udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
Stars: ✭ 15 (-53.12%)
Mutual labels:  lemmatization
finetuner
Finetuning any DNN for better embedding on neural search tasks
Stars: ✭ 442 (+1281.25%)
Mutual labels:  pretrained-models
Morphos-Blade
Morphos adapter for Blade
Stars: ✭ 32 (+0%)
Mutual labels:  morphology
OpenDialog
An open-source package for Chinese open-domain conversational chatbots (one-click deployment of a WeChat casual-chat bot)
Stars: ✭ 94 (+193.75%)
Mutual labels:  bert

SyntaxDot

Introduction

SyntaxDot is a sequence labeler and dependency parser using Transformer networks. SyntaxDot models can be trained from scratch or using pretrained models, such as BERT or XLM-RoBERTa.

In principle, SyntaxDot can be used to perform any sequence labeling task, but so far the focus has been on:

  • Part-of-speech tagging
  • Morphological tagging
  • Topological field tagging
  • Lemmatization
  • Named entity recognition

The easiest way to get started with SyntaxDot is to use a pretrained sticker2 model (SyntaxDot is currently compatible with sticker2 models).

Features

  • Input representations:
    • Word pieces
    • Sentence pieces
  • Flexible sequence encoder/decoder architecture, which supports:
    • Simple sequence labels (e.g. POS, morphology, named entities)
    • Lemmatization, based on edit trees
    • Simple API to extend to other tasks
    • Dependency parsing as sequence labeling
  • Dependency parsing using deep biaffine attention and MST decoding
  • Multi-task training and classification using scalar weighting
  • Encoder models:
    • Transformers
    • Finetuning of BERT, XLM-RoBERTa, ALBERT, and SqueezeBERT models
  • Model distillation
  • Deployment:
    • Standalone binary that links against PyTorch's libtorch
    • Very liberal license
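
For lemmatization, predicting an edit tree per token instead of the lemma string itself lets one tree, learned from a single form–lemma pair, generalize to unseen forms. The following is an illustrative sketch only, not SyntaxDot's actual implementation; all type and function names are invented for this example:

```rust
/// A simplified edit tree: a Replace leaf rewrites a whole substring, while
/// a Match node keeps the longest common substring of form and lemma and
/// recurses on the parts before and after it.
#[derive(Debug)]
enum EditTree {
    Replace { from: String, to: String },
    Match {
        pre_len: usize, // length of the form prefix handled by `pre`
        suf_len: usize, // length of the form suffix handled by `suf`
        pre: Box<EditTree>,
        suf: Box<EditTree>,
    },
}

fn chars(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Longest common substring of `a` and `b`: (start in a, start in b, length).
fn lcs(a: &[char], b: &[char]) -> (usize, usize, usize) {
    let (mut len, mut sa, mut sb) = (0, 0, 0);
    let mut dp = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in 1..=a.len() {
        for j in 1..=b.len() {
            if a[i - 1] == b[j - 1] {
                dp[i][j] = dp[i - 1][j - 1] + 1;
                if dp[i][j] > len {
                    len = dp[i][j];
                    sa = i - len;
                    sb = j - len;
                }
            }
        }
    }
    (sa, sb, len)
}

/// Build an edit tree that maps `form` to `lemma`.
fn build(form: &[char], lemma: &[char]) -> EditTree {
    let (sa, sb, len) = lcs(form, lemma);
    if len == 0 {
        return EditTree::Replace {
            from: form.iter().collect(),
            to: lemma.iter().collect(),
        };
    }
    EditTree::Match {
        pre_len: sa,
        suf_len: form.len() - sa - len,
        pre: Box::new(build(&form[..sa], &lemma[..sb])),
        suf: Box::new(build(&form[sa + len..], &lemma[sb + len..])),
    }
}

/// Apply an edit tree to a (possibly unseen) form; `None` if it does not fit.
fn apply(tree: &EditTree, form: &[char]) -> Option<String> {
    match tree {
        EditTree::Replace { from, to } => {
            let f: String = form.iter().collect();
            if f == *from { Some(to.clone()) } else { None }
        }
        EditTree::Match { pre_len, suf_len, pre, suf } => {
            let (pl, sl) = (*pre_len, *suf_len);
            if pl + sl > form.len() {
                return None;
            }
            let mid: String = form[pl..form.len() - sl].iter().collect();
            let p = apply(pre, &form[..pl])?;
            let s = apply(suf, &form[form.len() - sl..])?;
            Some(format!("{}{}{}", p, mid, s))
        }
    }
}

fn main() {
    // Learn the German participle pattern from one pair, transfer it to another.
    let tree = build(&chars("gesagt"), &chars("sagen"));
    assert_eq!(apply(&tree, &chars("gekauft")), Some("kaufen".to_string()));
    println!("{:?}", apply(&tree, &chars("gekauft")));
}
```

The tree learned from gesagt → sagen strips a ge- prefix and rewrites the -t suffix to -en, so applied to gekauft it yields kaufen; this is why a relatively small label set of edit trees can cover a large vocabulary.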

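For biaffine dependency parsing, the model produces a dense matrix of head scores for each sentence, and MST decoding recovers the highest-scoring well-formed tree. A compact sketch of the standard Chu-Liu/Edmonds algorithm follows; this is illustrative only, not SyntaxDot's code, and all names are invented for the example:

```rust
/// Chu-Liu/Edmonds maximum spanning arborescence over a dense arc-score
/// matrix. `score[h][d]` is the score of attaching dependent `d` to head
/// `h`; node 0 is the artificial ROOT. Returns `head`, where `head[d]` is
/// the selected head of `d` (and `head[0] == 0`).
fn chu_liu_edmonds(score: &[Vec<f64>]) -> Vec<usize> {
    let n = score.len();
    let mut head = vec![0usize; n];
    // Greedily pick the best head for every non-root node.
    for d in 1..n {
        let mut best = 0;
        for h in 1..n {
            if h != d && score[h][d] > score[best][d] {
                best = h;
            }
        }
        head[d] = best;
    }
    // If the greedy choice is cyclic, contract one cycle and recurse.
    if let Some(cycle) = find_cycle(&head) {
        contract(score, &mut head, &cycle);
    }
    head
}

/// Follow head pointers from every node; report a cycle if one exists.
fn find_cycle(head: &[usize]) -> Option<Vec<usize>> {
    let n = head.len();
    let mut color = vec![0u8; n]; // 0 = unseen, 1 = on current path, 2 = done
    for start in 1..n {
        let mut path = Vec::new();
        let mut v = start;
        while v != 0 && color[v] == 0 {
            color[v] = 1;
            path.push(v);
            v = head[v];
        }
        if v != 0 && color[v] == 1 {
            // Walked back into the current path: its tail is a cycle.
            let pos = path.iter().position(|&x| x == v).unwrap();
            return Some(path[pos..].to_vec());
        }
        for &u in &path {
            color[u] = 2;
        }
    }
    None
}

/// Contract `cycle` to a single node, solve recursively, and expand.
fn contract(score: &[Vec<f64>], head: &mut Vec<usize>, cycle: &[usize]) {
    let n = score.len();
    let mut in_cycle = vec![false; n];
    for &v in cycle {
        in_cycle[v] = true;
    }
    // Non-cycle nodes keep their relative order; the cycle becomes node `c`.
    let mut old2new = vec![0usize; n];
    let mut new2old = Vec::new();
    for v in 0..n {
        if !in_cycle[v] {
            old2new[v] = new2old.len();
            new2old.push(v);
        }
    }
    let c = new2old.len();
    let m = c + 1;
    let cycle_score: f64 = cycle.iter().map(|&v| score[head[v]][v]).sum();
    let mut new_score = vec![vec![f64::NEG_INFINITY; m]; m];
    let mut enter = vec![0usize; m]; // cycle node entered by arc h -> c
    let mut leave = vec![0usize; m]; // cycle node heading arc c -> d
    for h in 0..n {
        for d in 0..n {
            if h == d {
                continue;
            }
            let s = score[h][d];
            match (in_cycle[h], in_cycle[d]) {
                (false, false) => new_score[old2new[h]][old2new[d]] = s,
                // Entering the cycle means breaking the cycle arc into `d`.
                (false, true) => {
                    let gain = cycle_score + s - score[head[d]][d];
                    if gain > new_score[old2new[h]][c] {
                        new_score[old2new[h]][c] = gain;
                        enter[old2new[h]] = d;
                    }
                }
                // Leaving the cycle: remember the best cycle-internal head.
                (true, false) => {
                    if s > new_score[c][old2new[d]] {
                        new_score[c][old2new[d]] = s;
                        leave[old2new[d]] = h;
                    }
                }
                (true, true) => {}
            }
        }
    }
    let sub = chu_liu_edmonds(&new_score);
    for d_new in 1..m {
        let h_new = sub[d_new];
        if d_new == c {
            // The arc into the contracted node breaks the cycle there;
            // the other cycle nodes keep their cycle-internal heads.
            head[enter[h_new]] = new2old[h_new];
        } else {
            head[new2old[d_new]] =
                if h_new == c { leave[d_new] } else { new2old[h_new] };
        }
    }
}

fn main() {
    let neg = f64::NEG_INFINITY;
    // ROOT + 3 tokens; the greedy per-token choice contains the cycle 1 <-> 2.
    let score = vec![
        vec![neg, 5.0, 1.0, 1.0],  // arcs from ROOT
        vec![neg, neg, 10.0, 8.0], // arcs from token 1
        vec![neg, 10.0, neg, 2.0], // arcs from token 2
        vec![neg, 1.0, 2.0, neg],  // arcs from token 3
    ];
    println!("{:?}", chu_liu_edmonds(&score)); // [0, 0, 1, 1]
}
```

In the example, greedily picking the best head per token selects the cycle between tokens 1 and 2; contracting the cycle and recursing instead attaches token 1 to ROOT and tokens 2 and 3 to token 1, the highest-scoring valid tree.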
Documentation

References

SyntaxDot uses techniques from or was inspired by the following papers:

Issues

You can report bugs and feature requests in the SyntaxDot issue tracker.

License

For licensing information, see COPYRIGHT.md.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].