
microsoft / COCO-LM

License: MIT license
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

Programming Languages

Python
Shell
Cuda
C++
Cython
Lua

Projects that are alternatives of or similar to COCO-LM

Tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Stars: ✭ 5,077 (+4557.8%)
Mutual labels: transformers, language-model, natural-language-understanding
Deberta
The implementation of DeBERTa
Stars: ✭ 541 (+396.33%)
Mutual labels: representation-learning, language-model, natural-language-understanding
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (-25.69%)
Mutual labels: representation-learning, pretraining, contrastive-learning
Simclr
SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
Stars: ✭ 2,720 (+2395.41%)
Mutual labels: representation-learning, contrastive-learning
simclr-pytorch
PyTorch implementation of SimCLR: supports multi-GPU training and closely reproduces results
Stars: ✭ 89 (-18.35%)
Mutual labels: representation-learning, contrastive-learning
SimCLR
PyTorch implementation of "A Simple Framework for Contrastive Learning of Visual Representations"
Stars: ✭ 65 (-40.37%)
Mutual labels: representation-learning, contrastive-learning
TCE
This repository contains the code implementation used in the paper Temporally Coherent Embeddings for Self-Supervised Video Representation Learning (TCE).
Stars: ✭ 51 (-53.21%)
Mutual labels: representation-learning, contrastive-learning
wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
Stars: ✭ 39 (-64.22%)
Mutual labels: transformers, language-model
KB-ALBERT
A Korean ALBERT model specialized for the economy/finance domain, provided by KB Kookmin Bank
Stars: ✭ 215 (+97.25%)
Mutual labels: transformers, language-model
PLBART
Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
Stars: ✭ 151 (+38.53%)
Mutual labels: representation-learning, language-model
language-planner
Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (-22.94%)
Mutual labels: transformers, language-model
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+3027.52%)
Mutual labels: transformers, language-model
CodeT5
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.
Stars: ✭ 390 (+257.8%)
Mutual labels: representation-learning, language-model
gnn-lspe
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations), ICLR 2022
Stars: ✭ 165 (+51.38%)
Mutual labels: transformers, representation-learning
minicons
Utility for analyzing Transformer based representations of language.
Stars: ✭ 28 (-74.31%)
Mutual labels: transformers, language-model
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+110.09%)
Mutual labels: transformers, language-model
object-aware-contrastive
Object-aware Contrastive Learning for Debiased Scene Representation (NeurIPS 2021)
Stars: ✭ 44 (-59.63%)
Mutual labels: representation-learning, contrastive-learning
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+2124.77%)
Mutual labels: transformers, language-model
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (+7.34%)
Mutual labels: transformers, natural-language-understanding
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-86.24%)
Mutual labels: transformers, natural-language-understanding

COCO-LM

This repository contains the scripts for fine-tuning COCO-LM pretrained models on GLUE and SQuAD 2.0 benchmarks.

Paper: COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

Overview

We provide the scripts in two versions, based on two widely used open-source codebases: the Fairseq Library and the Huggingface Transformers Library. The two code versions are mostly equivalent in functionality, and you are free to use either of them. Note, however, that the fairseq version is what we used in our experiments and will best reproduce the results in the paper; the huggingface version was implemented later to provide compatibility with the Huggingface Transformers Library and may yield slightly different results.

Please follow the README files under the two directories for running the code.
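For orientation only, the sketch below shows what GLUE fine-tuning through the standard Huggingface Transformers Trainer API generally looks like. It is not the repository's own script: the checkpoint path, task choice, and hyperparameters are placeholders rather than the settings used in the paper, so refer to the directory READMEs for the actual entry points and recommended hyperparameters.

# Illustrative sketch: fine-tuning a sequence-classification model on a GLUE task
# with the Hugging Face Trainer. The checkpoint path below is a placeholder, not a
# published COCO-LM model id; see the huggingface/ directory README for specifics.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "path/to/cocolm-checkpoint"  # placeholder path
task = "rte"  # any GLUE task name accepted by the datasets library

raw = load_dataset("glue", task)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    # RTE is a sentence-pair task; single-sentence tasks pass only one text field.
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, max_length=128)

encoded = raw.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

args = TrainingArguments(output_dir="rte-finetune",
                         per_device_train_batch_size=16,
                         learning_rate=2e-5,
                         num_train_epochs=3)
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  tokenizer=tokenizer)  # dynamic padding is handled by the default collator
trainer.train()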

GLUE Fine-Tuning Results

The General Language Understanding Evaluation (GLUE) benchmark is a collection of sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.

GLUE dev set results of COCO-LM base++ and large++ models are as follows (median of 5 different random seeds):

Model MNLI-m/mm QQP QNLI SST-2 CoLA RTE MRPC STS-B AVG
COCO-LM base++ 90.2/90.0 92.2 94.2 94.6 67.3 87.4 91.2 91.8 88.6
COCO-LM large++ 91.4/91.6 92.8 95.7 96.9 73.9 91.0 92.2 92.7 90.8

GLUE test set results of COCO-LM base++ and large++ models are as follows (no ensemble, task-specific tricks, etc.):

Model MNLI-m/mm QQP QNLI SST-2 CoLA RTE MRPC STS-B AVG
COCO-LM base++ 89.8/89.3 89.8 94.2 95.6 68.6 82.3 88.5 90.3 87.4
COCO-LM large++ 91.6/91.1 90.5 95.8 96.7 70.5 89.2 88.4 91.8 89.3
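For reference, accuracy is the standard GLUE metric for most of these tasks, with CoLA scored by Matthews correlation and STS-B by Pearson/Spearman correlation. As a small, hedged illustration (not part of this repository's scripts), the evaluate library can compute these scores from model predictions:

# Illustrative only: computing GLUE scores with the evaluate library.
# The predictions and references below are dummy values standing in for model outputs.
import evaluate

cola_metric = evaluate.load("glue", "cola")   # Matthews correlation
stsb_metric = evaluate.load("glue", "stsb")   # Pearson / Spearman correlation

print(cola_metric.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0]))
# -> {'matthews_correlation': ...}
print(stsb_metric.compute(predictions=[0.5, 2.7, 4.1], references=[1.0, 3.0, 4.0]))
# -> {'pearson': ..., 'spearmanr': ...}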

SQuAD 2.0 Fine-Tuning Results

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

SQuAD 2.0 dev set results of COCO-LM base++ and large++ models are as follows (median of 5 different random seeds):

Model EM F1
COCO-LM base++ 85.4 88.1
COCO-LM large++ 88.2 91.0
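EM (exact match) and F1 are the standard SQuAD 2.0 scores. As a small, hedged illustration (again, not part of this repository's scripts), the squad_v2 metric in the evaluate library computes them from predicted answer strings; the example id and texts below are dummy values:

# Illustrative only: SQuAD 2.0 exact-match / F1 scoring with the evaluate library.
import evaluate

squad_v2 = evaluate.load("squad_v2")
predictions = [{"id": "ex-0",
                "prediction_text": "a segment of text",
                "no_answer_probability": 0.0}]
references = [{"id": "ex-0",
               "answers": {"text": ["a segment of text"], "answer_start": [42]}}]
print(squad_v2.compute(predictions=predictions, references=references))
# -> {'exact': 100.0, 'f1': 100.0, ...}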

Citation

If you find the code and models useful for your research, please cite the following paper:

@inproceedings{meng2021cocolm,
  title={{COCO-LM}: Correcting and contrasting text sequences for language model pretraining},
  author={Meng, Yu and Xiong, Chenyan and Bajaj, Payal and Tiwary, Saurabh and Bennett, Paul and Han, Jiawei and Song, Xia},
  booktitle={Conference on Neural Information Processing Systems},
  year={2021}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].