kanishkamisra / minicons

License: MIT
Utility for analyzing Transformer-based representations of language.

Programming Languages

python

Projects that are alternatives to, or similar to, minicons

Clue
Chinese Language Understanding Evaluation Benchmark (CLUE): datasets, baselines, pre-trained models, corpus, and leaderboard
Stars: ✭ 2,425 (+8560.71%)
Mutual labels:  transformers, language-model
Tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Stars: ✭ 5,077 (+18032.14%)
Mutual labels:  transformers, language-model
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+12075%)
Mutual labels:  transformers, language-model
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+289.29%)
Mutual labels:  transformers, language-model
KB-ALBERT
A Korean ALBERT model specialized for the economics/finance domain, provided by KB Kookmin Bank
Stars: ✭ 215 (+667.86%)
Mutual labels:  transformers, language-model
wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
Stars: ✭ 39 (+39.29%)
Mutual labels:  transformers, language-model
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+717.86%)
Mutual labels:  transformers, language-model
language-planner
Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (+200%)
Mutual labels:  transformers, language-model
robustness-vit
Contains code for the paper "Vision Transformers are Robust Learners" (AAAI 2022).
Stars: ✭ 78 (+178.57%)
Mutual labels:  transformers
golgotha
Contextualised Embeddings and Language Modelling with BERT and Friends, in R
Stars: ✭ 39 (+39.29%)
Mutual labels:  transformers
GoEmotions-pytorch
PyTorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+239.29%)
Mutual labels:  transformers
tying-wv-and-wc
Implementation for "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling"
Stars: ✭ 39 (+39.29%)
Mutual labels:  language-model
ttt
A package for fine-tuning Transformers with TPUs, written in TensorFlow 2.0+
Stars: ✭ 35 (+25%)
Mutual labels:  transformers
CodeT5
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.
Stars: ✭ 390 (+1292.86%)
Mutual labels:  language-model
robo-vln
PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Stars: ✭ 34 (+21.43%)
Mutual labels:  transformers
BERT-NER
Using pre-trained BERT models for Chinese and English NER with 🤗Transformers
Stars: ✭ 114 (+307.14%)
Mutual labels:  transformers
MISE
Multimodal Image Synthesis and Editing: A Survey
Stars: ✭ 214 (+664.29%)
Mutual labels:  transformers
tensorflow-with-kenlm
Tensorflow with KenLM integrated for beam search scoring
Stars: ✭ 30 (+7.14%)
Mutual labels:  language-model
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLlib models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (+39.29%)
Mutual labels:  transformers
small-text
Active Learning for Text Classification in Python
Stars: ✭ 241 (+760.71%)
Mutual labels:  transformers

minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models


This repo is a wrapper around the transformers library from Hugging Face 🤗.

Installation

Install from PyPI using:

pip install minicons

Supported Functionality

  • Extract word representations from contextualized word embeddings
  • Score sequences using language model scoring techniques, including masked language models following Salazar et al. (2020).

Examples

  1. Extract word representations from contextualized word embeddings:
from minicons import cwe

model = cwe.CWE('bert-base-uncased')

context_words = [("I went to the bank to withdraw money.", "bank"),
                 ("I was at the bank of the river Ganga!", "bank")]

print(model.extract_representation(context_words, layer = 12))

''' 
tensor([[ 0.5399, -0.2461, -0.0968,  ..., -0.4670, -0.5312, -0.0549],
        [-0.8258, -0.4308,  0.2744,  ..., -0.5987, -0.6984,  0.2087]],
       grad_fn=<MeanBackward1>)
'''
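As a follow-up (not in the original README; this sketch assumes extract_representation returns one vector per (sentence, word) pair, as shown above), the two contextual "bank" vectors can be compared with cosine similarity to check that the financial and river senses are kept apart:

import torch.nn.functional as F

reps = model.extract_representation(context_words, layer = 12)

# Cosine similarity between the two contextualized "bank" vectors;
# a value noticeably below 1.0 suggests the two senses are distinguished.
print(F.cosine_similarity(reps[0], reps[1], dim = 0).item())
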
  2. Compute sentence acceptability measures (surprisals) using word prediction models:
from minicons import scorer

mlm_model = scorer.MaskedLMScorer('bert-base-uncased', 'cpu')
ilm_model = scorer.IncrementalLMScorer('distilgpt2', 'cpu')

stimuli = ["The keys to the cabinet are on the table.",
           "The keys to the cabinet is on the table."]

# use sequence_score with different reduction options:
# Sequence Surprisal - lambda x: -x.sum(0).item()
# Sequence Log-probability - lambda x: x.sum(0).item()
# Sequence Surprisal, normalized by number of tokens - lambda x: -x.mean(0).item()
# Sequence Log-probability, normalized by number of tokens - lambda x: x.mean(0).item()
# and so on...

print(ilm_model.sequence_score(stimuli, reduction = lambda x: -x.sum(0).item()))

'''
[39.879737854003906, 42.75846481323242]
'''

# MLM scoring, inspired by Salazar et al., 2020
print(mlm_model.sequence_score(stimuli, reduction = lambda x: -x.sum(0).item()))
'''
[13.962685585021973, 23.415111541748047]
'''
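The reduction lambdas listed in the comments above can be swapped in directly. As a sketch (values not reproduced here), a per-token surprisal makes sentences of different lengths directly comparable:

# Length-normalized surprisal: average the per-token surprisals so that
# longer sentences are not penalized for length alone.
print(ilm_model.sequence_score(stimuli, reduction = lambda x: -x.mean(0).item()))
# prints one normalized surprisal per sentence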

Tutorials

Recent Updates

  • November 6, 2021: MLM scoring has been fixed! You can now use model.token_score() and model.sequence_score() with MaskedLMScorers as well!
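A minimal sketch of the token-level API mentioned above, reusing the models and stimuli from the Examples section (keyword arguments and output format may vary across minicons versions):

# Per-token scores for each sentence; surprisal = True is assumed to be
# supported, as in recent releases of minicons.
print(ilm_model.token_score(stimuli, surprisal = True))
# expected: a list of (token, score) pairs per sentence (illustrative)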

Citation

If you use minicons, please cite the following paper:

@article{misra2022minicons,
    title={minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models},
    author={Kanishka Misra},
    journal={arXiv preprint arXiv:2203.13112},
    year={2022}
}