pd3f - 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Stars: ✭ 132 (+238.46%)
Pytorch Nce - The Noise Contrastive Estimation for softmax output written in Pytorch
Stars: ✭ 204 (+423.08%)
ml - machine learning
Stars: ✭ 29 (-25.64%)
asr24 - 24-hour Automatic Speech Recognition
Stars: ✭ 27 (-30.77%)
Bert As Language Model - BERT as a language model, fork of https://github.com/google-research/bert
Stars: ✭ 185 (+374.36%)
wechsel - Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
Stars: ✭ 39 (+0%)
Zeroth - Kaldi-based open-source Korean ASR (speech recognition) project
Stars: ✭ 248 (+535.9%)
mongolian-nlp - Useful resources for Mongolian NLP
Stars: ✭ 119 (+205.13%)
Gpt Scrolls - A collaborative collection of open-source, safe GPT-3 prompts that work well
Stars: ✭ 195 (+400%)
dasher-web - Dasher text entry in HTML, CSS, JavaScript, and SVG
Stars: ✭ 34 (-12.82%)
CharLM - Character-aware neural language model implemented in PyTorch
Stars: ✭ 32 (-17.95%)
Macbert - Revisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP)
Stars: ✭ 167 (+328.21%)
Vaaku2Vec - Language modeling and text classification in the Malayalam language using ULMFiT
Stars: ✭ 68 (+74.36%)
TF-NNLM-TK - A toolkit for neural language modeling in TensorFlow, including basic models like RNNs and LSTMs as well as more advanced models.
Stars: ✭ 20 (-48.72%)
Relational Rnn Pytorch - An implementation of DeepMind's Relational Recurrent Neural Networks in PyTorch.
Stars: ✭ 236 (+505.13%)
language-planner - Official code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (+115.38%)
Lingvo
Stars: ✭ 2,361 (+5953.85%)
CoLAKE - COLING 2020: CoLAKE: Contextualized Language and Knowledge Embedding
Stars: ✭ 86 (+120.51%)
Keras Bert - Implementation of BERT that can load the official pre-trained models for feature extraction and prediction
Stars: ✭ 2,264 (+5705.13%)
swig-srilm - SWIG wrapper for the SRILM toolkit
Stars: ✭ 33 (-15.38%)
personality-prediction - Experiments in automated personality detection using language models and psycholinguistic features on various well-known personality datasets, including the Essays dataset (Big Five)
Stars: ✭ 109 (+179.49%)
Gpt Neo - An implementation of model-parallel GPT-2 and GPT-3-like models, with the ability to scale up to full GPT-3 sizes (and possibly more!), using the mesh-tensorflow library.
Stars: ✭ 1,252 (+3110.26%)
gdc - Code for the ICLR 2021 paper "A Distributional Approach to Controlled Text Generation"
Stars: ✭ 94 (+141.03%)
calm - Context-Aware Language Models
Stars: ✭ 29 (-25.64%)
Black-Box-Tuning - ICML 2022: Black-Box Tuning for Language-Model-as-a-Service
Stars: ✭ 99 (+153.85%)
KB-ALBERT - A Korean ALBERT model specialized for the economics/finance domain, provided by KB Kookmin Bank
Stars: ✭ 215 (+451.28%)
minGPT-TF - A minimal TF2 re-implementation of OpenAI's GPT training
Stars: ✭ 36 (-7.69%)
rnn-theano - RNNs (LSTM, GRU) in Theano with mini-batch training; character-level language models in Theano
Stars: ✭ 68 (+74.36%)
FNet-pytorch - Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+423.08%)
COCO-LM - [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+179.49%)
PCPM - Presenting Collection of Pretrained Models: links to pretrained models in NLP and voice
Stars: ✭ 21 (-46.15%)
PLBART - Official code for Unified Pre-training for Program Understanding and Generation [NAACL 2021].
Stars: ✭ 151 (+287.18%)
open clip - An open-source implementation of CLIP.
Stars: ✭ 1,534 (+3833.33%)
Mead Baseline - Deep-Learning Model Exploration and Development for NLP
Stars: ✭ 238 (+510.26%)
subword-lstm-lm - LSTM language model with subword-unit input representations
Stars: ✭ 45 (+15.38%)
Xlnet zh - Pre-trained Chinese XLNet model (XLNet_Large)
Stars: ✭ 207 (+430.77%)
gpt-j - A GPT-J API for Python 3 to generate text, blogs, code, and more
Stars: ✭ 101 (+158.97%)
Attention Mechanisms - Implementations of a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (+420.51%)
backprop - Backprop makes it simple to use, fine-tune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+487.18%)
cscg - Code Generation as a Dual Task of Code Summarization
Stars: ✭ 28 (-28.21%)
Char Rnn Chinese - Multi-layer recurrent neural networks (LSTM, GRU, RNN) for character-level language models in Torch. Based on the code of https://github.com/karpathy/char-rnn; supports Chinese and more.
Stars: ✭ 192 (+392.31%)
mlp-gpt-jax - A GPT made only of MLPs, in Jax
Stars: ✭ 53 (+35.9%)
Nlp learning - Learning natural language processing (NLP) with Python: language models, HMM, PCFG, Word2vec, cloze-style reading comprehension, naive Bayes classifiers, TF-IDF, PCA, SVD
Stars: ✭ 188 (+382.05%)
Bert Sklearn - A scikit-learn wrapper for Google's BERT model
Stars: ✭ 182 (+366.67%)
Optimus - Optimus: the first large-scale pre-trained VAE language model
Stars: ✭ 180 (+361.54%)
gpt-j-api - API for the GPT-J language model 🦜, including a FastAPI backend and a Streamlit frontend
Stars: ✭ 248 (+535.9%)
lm-scorer - 📃 Language-model-based sentence scoring library
Stars: ✭ 264 (+576.92%)
CodeT5 - Code for CodeT5, a new code-aware pre-trained encoder-decoder model.
Stars: ✭ 390 (+900%)
Word-Prediction-Ngram - Next-word prediction using an n-gram probabilistic model with various smoothing techniques
Stars: ✭ 25 (-35.9%)
LM-CNLC - Chinese Natural Language Correction via Language Model
Stars: ✭ 15 (-61.54%)
gap-text2sql - GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
Stars: ✭ 83 (+112.82%)