Pytorch-NLU: a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

Stars: ✭ 151 (-93.77%)

Mutual labels: transformers, pretrained-models, bert

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+3.84%)

Mutual labels: transformers, albert, bert

FewCLUE

FewCLUE: a few-shot learning evaluation benchmark, Chinese edition

Stars: ✭ 251 (-89.65%)

Mutual labels: benchmark, chinese, bert

Awesome Pretrained Chinese Nlp Models

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models

Stars: ✭ 195 (-91.96%)

Mutual labels: chinese, pretrained-models, nlu

AiSpace

AiSpace: Better practices for deep learning model development and deployment for TensorFlow 2.0

Stars: ✭ 28 (-98.85%)

Mutual labels: chinese, pretrained-models, bert

KLUE

📖 Korean NLU Benchmark

Stars: ✭ 420 (-82.68%)

Mutual labels: benchmark, bert, roberta

Tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Stars: ✭ 5,077 (+109.36%)

Mutual labels: language-model, transformers, bert

backprop

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Stars: ✭ 229 (-90.56%)

Mutual labels: transformers, language-model, bert

Transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Stars: ✭ 55,742 (+2198.64%)

Mutual labels: language-model, pretrained-models, bert

Roberta zh

Chinese pretrained RoBERTa models: RoBERTa for Chinese

Stars: ✭ 1,953 (-19.46%)

Mutual labels: chinese, bert, roberta

KB-ALBERT

A Korean ALBERT model specialized for the economics and finance domain, provided by KB Kookmin Bank

Stars: ✭ 215 (-91.13%)

Mutual labels: transformers, language-model, albert

gap-text2sql

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

Stars: ✭ 83 (-96.58%)

Mutual labels: nlu, pretrained-models, language-model

DiscEval

Discourse Based Evaluation of Language Understanding

Stars: ✭ 18 (-99.26%)

Mutual labels: benchmark, glue, bert

roberta-wwm-base-distill

A RoBERTa-wwm base distilled model, distilled from RoBERTa-wwm by RoBERTa-wwm-large

Stars: ✭ 61 (-97.48%)

Mutual labels: pretrained-models, bert, roberta

Text-Summarization

Abstractive and Extractive Text summarization using Transformers.

Stars: ✭ 38 (-98.43%)

Mutual labels: transformers, bert, roberta

Chineseglue

Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard

Stars: ✭ 1,548 (-36.16%)

Mutual labels: glue, albert, bert

MobileQA

Offline on-device reading comprehension app: QA for mobile, Android & iPhone

Stars: ✭ 49 (-97.98%)

Mutual labels: chinese, albert, bert

wechsel

Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

Stars: ✭ 39 (-98.39%)

Mutual labels: transformers, language-model, bert

erc

Emotion recognition in conversation

Stars: ✭ 34 (-98.6%)

Mutual labels: transformers, bert, roberta

Haystack

🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.

Stars: ✭ 3,409 (+40.58%)

Mutual labels: language-model, transformers, bert

Filipino-Text-Benchmarks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

Stars: ✭ 22 (-99.09%)

Mutual labels: benchmark, corpus, bert

Nlp Architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

Stars: ✭ 2,768 (+14.14%)

Mutual labels: nlu, transformers, bert

CBLUE

CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark (a benchmark for Chinese medical information processing)

Stars: ✭ 379 (-84.37%)

Mutual labels: benchmark, corpus, chinese

HugsVision

HugsVision is an easy-to-use Hugging Face wrapper for state-of-the-art computer vision

Stars: ✭ 154 (-93.65%)

Mutual labels: transformers, pretrained-models, bert

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (-95.01%)

Mutual labels: dataset, corpus, nlu

Medical-Names-Corpus

Medical corpus. Corpus of medical institution names. Standard drug codes.

Stars: ✭ 26 (-98.93%)

Mutual labels: corpus, dataset

bert in a flask

A dockerized flask API, serving ALBERT and BERT predictions using TensorFlow 2.0.

Stars: ✭ 32 (-98.68%)

Mutual labels: albert, bert

Sensaturban

🔥Urban-scale point cloud dataset (CVPR 2021)

Stars: ✭ 135 (-94.43%)

Mutual labels: dataset, benchmark

Chinese-Word-Segmentation-in-NLP

State of the art Chinese Word Segmentation with Bi-LSTMs

Stars: ✭ 23 (-99.05%)

Mutual labels: chinese, language-model

CogView

Text-to-Image generation. The repo for NeurIPS 2021 paper "CogView: Mastering Text-to-Image Generation via Transformers".

Stars: ✭ 708 (-70.8%)

Mutual labels: transformers, pretrained-models

Fakenewscorpus

A dataset of millions of news articles scraped from a curated list of data sources.

Stars: ✭ 255 (-89.48%)

Mutual labels: dataset, corpus

Tape

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.

Stars: ✭ 295 (-87.84%)

Mutual labels: dataset, benchmark

Cluecorpus2020

Large-scale pre-training corpus for Chinese (100 GB)

Stars: ✭ 278 (-88.54%)

Mutual labels: chinese, corpus

Gossiping Chinese Corpus

Chinese question-answer corpus from the PTT Gossiping board

Stars: ✭ 137 (-94.35%)

Mutual labels: dataset, corpus

Oie Resources

A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.

Stars: ✭ 283 (-88.33%)

Mutual labels: dataset, nlu

Datasets

A repository of pretty cool datasets that I collected for network science and machine learning research.

Stars: ✭ 302 (-87.55%)

Mutual labels: dataset, benchmark

Deeperforensics 1.0

[CVPR 2020] A Large-Scale Dataset for Real-World Face Forgery Detection

Stars: ✭ 338 (-86.06%)

Mutual labels: dataset, benchmark

Azureml Bert

End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service

Stars: ✭ 342 (-85.9%)

Mutual labels: language-model, pretrained-models

Bert Pytorch

Google AI 2018 BERT pytorch implementation

Stars: ✭ 4,642 (+91.42%)

Mutual labels: language-model, bert

Medmnist

[ISBI'21] MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis

Stars: ✭ 338 (-86.06%)

Mutual labels: dataset, benchmark

Chinese Bert Wwm

Pre-Training with Whole Word Masking for Chinese BERT (the Chinese BERT-wwm model series)

Stars: ✭ 6,357 (+162.14%)

Mutual labels: bert, roberta

Gensim Data

Data repository for pretrained NLP models and NLP corpora.

Stars: ✭ 622 (-74.35%)

Mutual labels: dataset, pretrained-models

Cluener2020

CLUENER2020: Fine-Grained Named Entity Recognition for Chinese

Stars: ✭ 689 (-71.59%)

Mutual labels: chinese, dataset

Chatito

🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!

Stars: ✭ 678 (-72.04%)

Mutual labels: dataset, nlu

Electra

Chinese pretrained ELECTRA model: a Chinese model pretrained with adversarial learning

Stars: ✭ 132 (-94.56%)

Mutual labels: language-model, pretrained-models

MaskedFaceRepresentation

Masked face recognition focuses on identifying people using their facial features while they are wearing masks. We introduce benchmarks on face verification based on masked face images for the development of COVID-safe protocols in airports.

Stars: ✭ 17 (-99.3%)

Mutual labels: benchmark, dataset

Pcam

The PatchCamelyon (PCam) deep learning classification benchmark.

Stars: ✭ 340 (-85.98%)

Mutual labels: dataset, benchmark

Nlp Recipes

Natural Language Processing Best Practices & Examples

Stars: ✭ 5,783 (+138.47%)

Mutual labels: pretrained-models, nlu

Caffenet Benchmark

Evaluation of the CNN design choices performance on ImageNet-2012.