Lingvo
Stars: ✭ 2,361 (+6645.71%)
lm-scorer: 📃 Language Model based sentence scoring library
Stars: ✭ 264 (+654.29%)
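The idea behind LM-based sentence scoring can be sketched with a toy unigram model. The vocabulary and probabilities below are illustrative only, not lm-scorer's actual API (which wraps pretrained transformer models):

```python
import math

# Toy unigram language model: hand-picked illustrative probabilities.
unigram_logprob = {
    "the": math.log(0.07),
    "cat": math.log(0.01),
    "sat": math.log(0.005),
    "<unk>": math.log(0.0001),  # fallback for out-of-vocabulary tokens
}

def score_sentence(tokens):
    """Sum per-token log-probabilities; a higher score means more plausible."""
    return sum(unigram_logprob.get(t, unigram_logprob["<unk>"]) for t in tokens)

likely = score_sentence(["the", "cat", "sat"])
unlikely = score_sentence(["the", "zzz", "sat"])
assert likely > unlikely  # the in-vocabulary sentence scores higher
```

Real scorers do the same summation over conditional token log-probabilities from a neural LM rather than unigram counts.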
LM-CNLC: Chinese Natural Language Correction via Language Model
Stars: ✭ 15 (-57.14%)
FNet-pytorch: Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+482.86%)
PCPM: Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-40%)
CoLAKE: Contextualized Language and Knowledge Embedding (COLING 2020)
Stars: ✭ 86 (+145.71%)
gpt-j: A GPT-J API to use with python3 to generate text, blogs, code, and more
Stars: ✭ 101 (+188.57%)
frog: Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Stars: ✭ 70 (+100%)
php-ntlm: Message encoder/decoder and password hasher for the NTLM authentication protocol
Stars: ✭ 14 (-60%)
ml: machine learning
Stars: ✭ 29 (-17.14%)
nytwit: New York Times Word Innovation Types dataset
Stars: ✭ 21 (-40%)
mongolian-nlp: Useful resources for Mongolian NLP
Stars: ✭ 119 (+240%)
tying-wv-and-wc: Implementation of "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling"
Stars: ✭ 39 (+11.43%)
datalinguist: Stanford CoreNLP in idiomatic Clojure
Stars: ✭ 93 (+165.71%)
folia: FoLiA (Format for Linguistic Annotation) is a rich XML-based annotation format for representing language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…
Stars: ✭ 56 (+60%)
gdc: Code for the ICLR 2021 paper "A Distributional Approach to Controlled Text Generation"
Stars: ✭ 94 (+168.57%)
eflm: Efficient fitting of linear and generalized linear models using just base R. The speed gains over lm and glm come from reducing the N×P model matrix to a P×P matrix; the best computational performance is obtained when R is linked against OpenBLAS, Intel MKL, or another optimized BLAS library.
Stars: ✭ 14 (-60%)
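The P×P reduction that eflm describes is the classic normal-equations trick: rather than factorizing the full N×P design matrix, accumulate the small Gram matrix XᵀX and vector Xᵀy, then solve the P×P system. A pure-Python sketch for two predictors (eflm itself is R; this only illustrates the idea):

```python
# Least squares via normal equations: reduce the NxP problem to PxP.
def fit_via_normal_equations(X, y):
    """Solve (X'X) b = X'y for a 2-predictor model using Cramer's rule."""
    # Accumulate the 2x2 Gram matrix X'X and 2-vector X'y in one pass,
    # so the N-row design matrix never has to be factorized whole.
    s00 = s01 = s11 = t0 = t1 = 0.0
    for (x0, x1), yi in zip(X, y):
        s00 += x0 * x0; s01 += x0 * x1; s11 += x1 * x1
        t0 += x0 * yi;  t1 += x1 * yi
    det = s00 * s11 - s01 * s01
    b0 = (t0 * s11 - s01 * t1) / det
    b1 = (s00 * t1 - t0 * s01) / det
    return b0, b1

# y = 2*x0 + 3*x1 exactly, so the fit should recover (2, 3).
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
y = [2 * a + 3 * b for a, b in X]
b0, b1 = fit_via_normal_equations(X, y)
assert abs(b0 - 2.0) < 1e-9 and abs(b1 - 3.0) < 1e-9
```

Note this trades the numerical stability of a QR factorization (what R's lm uses) for speed, which is exactly why the BLAS backend matters.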
ucto: Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers basic preprocessing steps such as case changing, to make text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …
Stars: ✭ 58 (+65.71%)
wikipron: Massively multilingual pronunciation mining
Stars: ✭ 167 (+377.14%)
subword-lstm-lm: LSTM Language Model with Subword Units Input Representations
Stars: ✭ 45 (+28.57%)
gravity: R package that provides estimation methods for gravity models
Stars: ✭ 24 (-31.43%)
mystem-scala: Morphological analyzer `mystem` (Russian language) wrapper for JVM languages
Stars: ✭ 21 (-40%)
dasher-web: Dasher text entry in HTML, CSS, JavaScript, and SVG
Stars: ✭ 34 (-2.86%)
open clip: An open source implementation of CLIP
Stars: ✭ 1,534 (+4282.86%)
cscg: Code Generation as a Dual Task of Code Summarization
Stars: ✭ 28 (-20%)
minicons: Utility for analyzing Transformer-based representations of language
Stars: ✭ 28 (-20%)
citation-function: Measuring the Evolution of a Scientific Field through Citation Frames
Stars: ✭ 40 (+14.29%)
CodeT5: Code for CodeT5, a code-aware pre-trained encoder-decoder model
Stars: ✭ 390 (+1014.29%)
gpt-j-api: API for the GPT-J language model 🦜, including a FastAPI backend and a Streamlit frontend
Stars: ✭ 248 (+608.57%)
bangla-bert: Bangla-Bert is a pretrained BERT model for the Bengali language
Stars: ✭ 41 (+17.14%)
embeddings: State-of-the-art text representations for Natural Language Processing tasks; the initial version of the library focuses on the Polish language
Stars: ✭ 27 (-22.86%)
word2vec-tsne: Google News and Leo Tolstoy: visualizing Word2Vec word embeddings using t-SNE
Stars: ✭ 59 (+68.57%)
minGPT-TF: A minimal TF2 re-implementation of OpenAI's GPT training
Stars: ✭ 36 (+2.86%)
Word-Prediction-Ngram: Next-word prediction using an n-gram probabilistic model with various smoothing techniques
Stars: ✭ 25 (-28.57%)
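The n-gram approach above can be sketched in a few lines: count bigrams, smooth the conditional probabilities, and predict the argmax next word. The corpus below is a made-up toy example, and add-one (Laplace) smoothing stands in for the project's "various smoothing techniques":

```python
from collections import Counter, defaultdict

# Bigram next-word prediction with add-one (Laplace) smoothing.
corpus = "i like tea . i like coffee . you like tea .".split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

vocab = sorted(set(corpus))

def next_word_prob(w1, w2):
    """P(w2 | w1) with add-one smoothing over the vocabulary."""
    counts = bigrams[w1]
    return (counts[w2] + 1) / (sum(counts.values()) + len(vocab))

def predict(w1):
    """Most probable next word after w1 under the smoothed model."""
    return max(vocab, key=lambda w: next_word_prob(w1, w))

assert predict("i") == "like"        # "i like" occurs twice in the corpus
assert predict("like") == "tea"      # "like tea" (x2) beats "like coffee" (x1)
```

Smoothing is what keeps unseen bigrams from getting probability zero; fancier schemes (Good-Turing, Kneser-Ney) redistribute the mass less crudely than add-one.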
wechsel: Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models
Stars: ✭ 39 (+11.43%)
CISTEM: Stemmer for German
Stars: ✭ 33 (-5.71%)
language-planner: Official code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (+140%)
KoLM: Korean text normalization and language preparation package for LMs in Kaldi-based ASR systems
Stars: ✭ 46 (+31.43%)
SentimentAnalysis: Sentiment analysis with a deep Bi-LSTM + attention model
Stars: ✭ 32 (-8.57%)
backprop: Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models
Stars: ✭ 229 (+554.29%)
lxa5: Linguistica 5: Unsupervised Learning of Linguistic Structure
Stars: ✭ 27 (-22.86%)
mlp-gpt-jax: A GPT made only of MLPs, in JAX
Stars: ✭ 53 (+51.43%)
Black-Box-Tuning: Black-Box Tuning for Language-Model-as-a-Service (ICML 2022)
Stars: ✭ 99 (+182.86%)
pyVHDLParser: Streaming-based VHDL parser
Stars: ✭ 51 (+45.71%)
datastories-semeval2017-task6: Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison"
Stars: ✭ 20 (-42.86%)
foliapy: An extensive Python library for working with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation used in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
Stars: ✭ 13 (-62.86%)
sembei: 🍘 Word embeddings without going through word segmentation 🍘
Stars: ✭ 14 (-60%)
SDLM-pytorch: Code accompanying the EMNLP 2018 paper "Language Modeling with Sparse Product of Sememe Experts"
Stars: ✭ 27 (-22.86%)