Zeroth: Kaldi-based open-source Korean ASR (speech recognition) project
Mead Baseline: Deep-learning model exploration and development for NLP
Xlnet Zh: Pre-trained Chinese XLNet-Large model
Pytorch Nce: Noise-contrastive estimation for softmax output, written in PyTorch
Attention Mechanisms: Implementations of a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras
Gpt Scrolls: A collaborative collection of open-source, safe GPT-3 prompts that work well
Char Rnn Chinese: Multi-layer recurrent neural networks (LSTM, GRU, RNN) for character-level language models in Torch. Based on https://github.com/karpathy/char-rnn; supports Chinese and more
Nlp Learning: Learning natural language processing (NLP) with Python: language models, HMM, PCFG, Word2vec, cloze-style reading comprehension, naive Bayes classifier, TF-IDF, PCA, SVD
Keras Bert: Implementation of BERT that can load the official pre-trained models for feature extraction and prediction
Optimus: The first large-scale pre-trained VAE language model
Macbert: Revisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP)
Gpt Neo: An implementation of model-parallel GPT-2- and GPT-3-like models, with the ability to scale up to full GPT-3 size (and possibly beyond), using the mesh-tensorflow library
Indic Bert: BERT-based multilingual model for Indian languages
Lazynlp: Library to scrape and clean web pages to create massive datasets
Lotclass: [EMNLP 2020] Text Classification Using Label Names Only: A Language Model Self-Training Approach
Keras Xlnet: Implementation of XLNet that can load pre-trained checkpoints
Transformer Lm: Transformer language model (GPT-2) with a SentencePiece tokenizer
Speecht: Open-source speech-to-text software written in TensorFlow
Electra Pytorch: Pretrain and fine-tune ELECTRA with fastai and Hugging Face (the paper's results are replicated!)
Awd Lstm Lm: LSTM and QRNN language model toolkit for PyTorch
Ld Net: Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Tupe: Transformer with Untied Positional Encoding (TUPE). Code for the paper "Rethinking Positional Encoding in Language Pre-training"; improves existing models such as BERT
Clue: Chinese Language Understanding Evaluation benchmark: datasets, baselines, pre-trained models, corpora, and leaderboard
Electra: Pre-trained Chinese ELECTRA model, based on adversarial learning
Chars2vec: Character-based, RNN-backed word-embedding model for handling real-world texts
Robbert: A Dutch RoBERTa-based language model
Haystack: 🔍 An open-source NLP framework that leverages Transformer models, enabling developers to implement production-ready neural search, question answering, semantic document search, and summarization for a wide range of applications
Lingo: Package lingo provides the data structures and algorithms required for natural language processing
Getlang: Natural-language detection package in pure Go
Transformers: 🤗 State-of-the-art machine learning for PyTorch, TensorFlow, and JAX
Easy Bert: A dead-simple BERT API for Python and Java (https://github.com/google-research/bert)
Openseq2seq: Toolkit for efficient experimentation with speech recognition, text-to-speech, and NLP
Pytorch Gbw Lm: PyTorch language model for the 1-Billion-Word (LM1B / GBW) dataset
Pyclue: Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark
Tongrams: A C++ library providing fast language-model queries in compressed space
Bit Rnn: Quantize weights and activations in recurrent neural networks
Pytorch Openai Transformer Lm: 🐥 A PyTorch implementation of OpenAI's fine-tuned transformer language model, with a script to import the weights pre-trained by OpenAI
Greek Bert: A Greek edition of the BERT pre-trained language model
Cross Domain Ner: Cross-domain NER using cross-domain language modeling; code for an ACL 2019 paper
Gpt2: PyTorch implementation of OpenAI GPT-2
Phonlp: PhoNLP, a BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition, and dependency parsing (NAACL 2021)