
BM-K / KoSentenceBERT-ETRI

Licence: other
🌷 Sentence Embeddings using Siamese ETRI KoBERT-Networks

Programming Languages

Python, CSS, SCSS

Projects that are alternatives of or similar to KoSentenceBERT-ETRI

event-embedding-multitask
*SEM 2018: Learning Distributed Event Representations with a Multi-Task Approach
Stars: ✭ 22 (-84.4%)
Mutual labels:  sentence-similarity
siamese dssm
siamese dssm sentence_similarity sentence_similarity_rank tensorflow
Stars: ✭ 59 (-58.16%)
Mutual labels:  sentence-similarity
Simple-Sentence-Similarity
Exploring the simple sentence similarity measurements using word embeddings
Stars: ✭ 99 (-29.79%)
Mutual labels:  sentence-similarity
spacy-sentence-bert
Sentence transformers models for SpaCy
Stars: ✭ 88 (-37.59%)
Mutual labels:  sentence-bert
sentences-similarity-cluster
Calculate similarity of sentences & Cluster the result.
Stars: ✭ 14 (-90.07%)
Mutual labels:  sentence-similarity
abcnn pytorch
Implementation of ABCNN (Attention-Based Convolutional Neural Network) in PyTorch
Stars: ✭ 35 (-75.18%)
Mutual labels:  sentence-similarity
SentenceSimilarity
The enhanced RCNN model used for sentence similarity classification
Stars: ✭ 41 (-70.92%)
Mutual labels:  sentence-similarity
Siamese-Recurrent-Architectures
Usage of Siamese Recurrent Neural network architectures for semantic textual similarity
Stars: ✭ 19 (-86.52%)
Mutual labels:  sentence-similarity
CHIP2018
Rank-6 solution for the CHIP2018 question-matching competition
Stars: ✭ 20 (-85.82%)
Mutual labels:  sentence-similarity
MP-CNN-Variants
Variants of Multi-Perspective Convolutional Neural Networks
Stars: ✭ 22 (-84.4%)
Mutual labels:  sentence-similarity
nlp-cheat-sheet-python
NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Stars: ✭ 69 (-51.06%)
Mutual labels:  sentence-similarity

Ko-Sentence-BERT

🌷 Korean SentenceBERT : Sentence Embeddings using Siamese ETRI KoBERT-Networks

Note
For various sentence embedding models and their results, please see the following link.
[Sentence-Embedding-Is-All-You-Need]

Installation

  • ETRI KorBERT only works with transformers 2.4.1 ~ 2.8.0, while Sentence-BERT requires version 3.1.0 or later, so the libraries were modified accordingly.
  • Because the Hugging Face transformers, sentence-transformers, and tokenizers library code is patched directly, using a virtual environment is recommended.
  • The Docker image used here is provided on Docker Hub.
  • Training was done with ETRI KoBERT; this repository does not distribute ETRI KoBERT itself.
  • A version built on SKT KoBERT is available in the following repository.
git clone https://github.com/BM-K/KoSentenceBERT.git
python -m venv .KoSBERT
. .KoSBERT/bin/activate
pip install -r requirements.txt
  • Move the transformer, tokenizers, and sentence_transformers directories into .KoSBERT/lib/python3.7/site-packages/.
  • The ETRI_KoBERT model and tokenizer must be present inside the KoSentenceBERT directory.
  • The ETRI model and tokenizer are loaded as in the following example:
# Loading the ETRI checkpoint and eojeol tokenizer (this snippet is from the modified model-loading code, hence the self. attributes)
from transformers import BertModel
from ETRI_tok.tokenization_etri_eojeol import BertTokenizer

self.auto_model = BertModel.from_pretrained('./ETRI_KoBERT/003_bert_eojeol_pytorch')
self.tokenizer = BertTokenizer.from_pretrained('./ETRI_KoBERT/003_bert_eojeol_pytorch/vocab.txt', do_lower_case=False)

Train Models

  • ๋ชจ๋ธ ํ•™์Šต์„ ์›ํ•˜์‹œ๋ฉด KoSentenceBERT ๋””๋ ‰ํ† ๋ฆฌ ์•ˆ์— KorNLUDatasets์ด ์กด์žฌํ•˜์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค.
  • STS ํ•™์Šต ์‹œ ๋ชจ๋ธ ๊ตฌ์กฐ์— ๋งž๊ฒŒ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ •ํ•˜์—ฌ ์‚ฌ์šฉํ•˜์˜€์œผ๋ฉฐ, ๋ฐ์ดํ„ฐ์™€ ํ•™์Šต ๋ฐฉ๋ฒ•์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค :

    KoSentenceBERT/KorNLUDatasets/KorSTS/tune_test.tsv

    STS test ๋ฐ์ดํ„ฐ์…‹์˜ ์ผ๋ถ€
python training_nli.py      # NLI ๋ฐ์ดํ„ฐ๋กœ๋งŒ ํ•™์Šต
python training_sts.py      # STS ๋ฐ์ดํ„ฐ๋กœ๋งŒ ํ•™์Šต
python con_training_sts.py  # NLI ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต ํ›„ STS ๋ฐ์ดํ„ฐ๋กœ Fine-Tuning
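
For reference, below is a minimal sketch of what an STS fine-tuning script such as training_sts.py typically does with the sentence-transformers API. The file path, column names, and hyperparameters are illustrative assumptions, not the repository's actual script.

# Hedged sketch of STS fine-tuning with the sentence-transformers API.
# Paths, column names and hyperparameters are assumptions for illustration only.
import csv
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from the NLI-trained model (path assumed; see the directory table below)
model = SentenceTransformer('./output/training_nli_ETRI_KoBERT-003_bert_eojeol')

train_examples = []
with open('KorNLUDatasets/KorSTS/sts-train.tsv', encoding='utf-8') as f:  # assumed KorSTS layout
    for row in csv.DictReader(f, delimiter='\t', quoting=csv.QUOTE_NONE):
        score = float(row['score']) / 5.0  # normalize 0..5 gold scores to 0..1
        train_examples.append(InputExample(texts=[row['sentence1'], row['sentence2']], label=score))

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)  # regression on cosine similarity

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=4,
          warmup_steps=100,
          output_path='./output/training_nli_sts_ETRI_KoBERT-003_bert_eojeol')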

Pre-Trained Models

The pooling mode uses the MEAN strategy, and trained models are saved to the output directory.
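
For illustration, a SentenceTransformer with MEAN pooling is typically assembled from a word-embedding module and a pooling module as sketched below. This uses the stock sentence-transformers API; the repository itself builds the model with its modified Transformer/tokenizer code for ETRI KoBERT, so treat the checkpoint path and max_seq_length here as assumptions.

# Hedged sketch: BERT encoder + MEAN pooling with the standard sentence-transformers modules.
from sentence_transformers import SentenceTransformer, models

word_embedding_model = models.Transformer('./ETRI_KoBERT/003_bert_eojeol_pytorch', max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True,   # MEAN strategy
                               pooling_mode_cls_token=False,
                               pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])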

Directory | Training method
training_nli_ETRI_KoBERT-003_bert_eojeol | trained on NLI only
training_sts_ETRI_KoBERT-003_bert_eojeol | trained on STS only
training_nli_sts_ETRI_KoBERT-003_bert_eojeol | trained on NLI + STS

Performance

  • Fixed random seed; evaluated on the test set.
Model | Cosine Pearson | Cosine Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman | Dot Pearson | Dot Spearman
NLI | 67.96 | 70.45 | 71.06 | 70.48 | 71.17 | 70.51 | 64.87 | 63.04
STS | 80.43 | 79.99 | 78.18 | 78.03 | 78.13 | 77.99 | 73.73 | 73.40
STS + NLI | 80.10 | 80.42 | 79.14 | 79.28 | 79.08 | 79.22 | 74.46 | 74.16
  • Performance comparison with other models [KLUE-PLMs].
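
The Pearson/Spearman columns above are the standard sentence-transformers similarity metrics. A hedged sketch of how such an evaluation is typically run follows; the test-file path and column names are assumptions.

# Hedged sketch: cosine/Euclidean/Manhattan/dot Pearson and Spearman correlations on a test set.
import csv
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer('./output/training_nli_sts_ETRI_KoBERT-003_bert_eojeol')

sentences1, sentences2, scores = [], [], []
with open('KorNLUDatasets/KorSTS/sts-test.tsv', encoding='utf-8') as f:  # assumed KorSTS layout
    for row in csv.DictReader(f, delimiter='\t', quoting=csv.QUOTE_NONE):
        sentences1.append(row['sentence1'])
        sentences2.append(row['sentence2'])
        scores.append(float(row['score']) / 5.0)  # normalize 0..5 gold scores to 0..1

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, scores, name='sts-test')
evaluator(model, output_path='./output')  # writes per-metric Pearson/Spearman to a CSV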

Application Examples

  • ์ƒ์„ฑ ๋œ ๋ฌธ์žฅ ์ž„๋ฒ ๋”ฉ์„ ๋‹ค์šด ์ŠคํŠธ๋ฆผ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ๋ช‡ ๊ฐ€์ง€ ์˜ˆ๋ฅผ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.
  • STS + NLI pretrained ๋ชจ๋ธ์„ ํ†ตํ•ด ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

Semantic Search

SemanticSearch.py finds the sentences in a corpus that are most similar to a given query sentence.
First, an embedding is generated for every sentence in the corpus.

from sentence_transformers import SentenceTransformer, util
import numpy as np

model_path = './output/training_nli_sts_ETRI_KoBERT-003_bert_eojeol'

embedder = SentenceTransformer(model_path)

# Corpus with example sentences
corpus = ['한 남자가 음식을 먹는다.',
          '한 남자가 빵 한 조각을 먹는다.',
          '그 여자가 아이를 돌본다.',
          '한 남자가 말을 탄다.',
          '한 여자가 바이올린을 연주한다.',
          '두 남자가 수레를 숲 속으로 밀었다.',
          '한 남자가 담으로 싸인 땅에서 백마를 타고 있다.',
          '원숭이 한 마리가 드럼을 연주한다.',
          '치타 한 마리가 먹이 뒤에서 달리고 있다.']

corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)

# Query sentences:
queries = ['한 남자가 파스타를 먹는다.',
           '고릴라 의상을 입은 누군가가 드럼을 연주하고 있다.',
           '치타가 들판을 가로 질러 먹이를 쫓는다.']

# Find the closest 5 sentences of the corpus for each query sentence based on cosine similarity
top_k = 5
for query in queries:
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    cos_scores = util.pytorch_cos_sim(query_embedding, corpus_embeddings)[0]
    cos_scores = cos_scores.cpu()

    # We use np.argpartition to only partially sort the top_k results
    top_results = np.argpartition(-cos_scores, range(top_k))[0:top_k]

    print("\n\n======================\n\n")
    print("Query:", query)
    print("\nTop 5 most similar sentences in corpus:")

    for idx in top_results[0:top_k]:
        print(corpus[idx].strip(), "(Score: %.4f)" % (cos_scores[idx]))
        


The results are as follows:

======================


Query: 한 남자가 파스타를 먹는다.

Top 5 most similar sentences in corpus:
한 남자가 음식을 먹는다. (Score: 0.7557)
한 남자가 빵 한 조각을 먹는다. (Score: 0.6464)
한 남자가 담으로 싸인 땅에서 백마를 타고 있다. (Score: 0.2565)
한 남자가 말을 탄다. (Score: 0.2333)
두 남자가 수레를 숲 속으로 밀었다. (Score: 0.1792)


======================


Query: 고릴라 의상을 입은 누군가가 드럼을 연주하고 있다.

Top 5 most similar sentences in corpus:
원숭이 한 마리가 드럼을 연주한다. (Score: 0.6732)
치타 한 마리가 먹이 뒤에서 달리고 있다. (Score: 0.3401)
두 남자가 수레를 숲 속으로 밀었다. (Score: 0.1037)
한 남자가 음식을 먹는다. (Score: 0.0617)
그 여자가 아이를 돌본다. (Score: 0.0466)


======================


Query: 치타가 들판을 가로 질러 먹이를 쫓는다.

Top 5 most similar sentences in corpus:
치타 한 마리가 먹이 뒤에서 달리고 있다. (Score: 0.7164)
두 남자가 수레를 숲 속으로 밀었다. (Score: 0.3216)
원숭이 한 마리가 드럼을 연주한다. (Score: 0.2071)
한 남자가 빵 한 조각을 먹는다. (Score: 0.1089)
한 남자가 음식을 먹는다. (Score: 0.0724)
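
As a side note, the np.argpartition step in the script above can also be written with torch.topk on the cosine-score tensor, as newer sentence-transformers examples do. A minimal sketch, reusing cos_scores, corpus, and top_k from the loop above:

# Hedged alternative to np.argpartition: torch.topk returns sorted scores and indices.
import torch

top_results = torch.topk(cos_scores, k=top_k)
for score, idx in zip(top_results.values, top_results.indices):
    print(corpus[idx], "(Score: %.4f)" % score)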

Clustering

Clustering.py shows an example of clustering similar sentences based on their sentence-embedding similarity.
As before, an embedding is first computed for each sentence.

from sentence_transformers import SentenceTransformer, util
import numpy as np

model_path = './output/training_nli_sts_ETRI_KoBERT-003_bert_eojeol'

embedder = SentenceTransformer(model_path)

# Corpus with example sentences
corpus = ['한 남자가 음식을 먹는다.',
          '한 남자가 빵 한 조각을 먹는다.',
          '그 여자가 아이를 돌본다.',
          '한 남자가 말을 탄다.',
          '한 여자가 바이올린을 연주한다.',
          '두 남자가 수레를 숲 속으로 밀었다.',
          '한 남자가 담으로 싸인 땅에서 백마를 타고 있다.',
          '원숭이 한 마리가 드럼을 연주한다.',
          '치타 한 마리가 먹이 뒤에서 달리고 있다.',
          '한 남자가 파스타를 먹는다.',
          '고릴라 의상을 입은 누군가가 드럼을 연주하고 있다.',
          '치타가 들판을 가로 질러 먹이를 쫓는다.']

corpus_embeddings = embedder.encode(corpus)

# Then, we perform k-means clustering using sklearn:
from sklearn.cluster import KMeans

num_clusters = 5
clustering_model = KMeans(n_clusters=num_clusters)
clustering_model.fit(corpus_embeddings)
cluster_assignment = clustering_model.labels_

clustered_sentences = [[] for i in range(num_clusters)]
for sentence_id, cluster_id in enumerate(cluster_assignment):
    clustered_sentences[cluster_id].append(corpus[sentence_id])

for i, cluster in enumerate(clustered_sentences):
    print("Cluster ", i+1)
    print(cluster)
    print("")

The results are as follows:

Cluster  1
['두 남자가 수레를 숲 속으로 밀었다.', '치타 한 마리가 먹이 뒤에서 달리고 있다.', '치타가 들판을 가로 질러 먹이를 쫓는다.']

Cluster  2
['한 남자가 말을 탄다.', '한 남자가 담으로 싸인 땅에서 백마를 타고 있다.']

Cluster  3
['한 남자가 음식을 먹는다.', '한 남자가 빵 한 조각을 먹는다.', '한 남자가 파스타를 먹는다.']

Cluster  4
['그 여자가 아이를 돌본다.', '한 여자가 바이올린을 연주한다.']

Cluster  5
['원숭이 한 마리가 드럼을 연주한다.', '고릴라 의상을 입은 누군가가 드럼을 연주하고 있다.']
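
K-means requires fixing the number of clusters up front. As a hedged alternative sketch (not part of this repository), agglomerative clustering with a distance threshold lets the data determine the number of clusters; it reuses corpus and corpus_embeddings from the snippet above, and the threshold value is an assumption that would need tuning.

# Hedged sketch: agglomerative clustering with a distance threshold instead of a fixed k.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Normalize embeddings so Euclidean distance behaves like cosine distance.
norm_embeddings = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)

clustering_model = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0)
clustering_model.fit(norm_embeddings)

for sentence, label in zip(corpus, clustering_model.labels_):
    print(label, sentence)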

Downstream Tasks Demo




Citing

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "http://arxiv.org/abs/1908.10084",
}
@article{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    journal= "arXiv preprint arXiv:2004.09813",
    month = "04",
    year = "2020",
    url = "http://arxiv.org/abs/2004.09813",
}
@article{ham2020kornli,
  title={KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding},
  author={Ham, Jiyeon and Choe, Yo Joong and Park, Kyubyong and Choi, Ilji and Soh, Hyungjoon},
  journal={arXiv preprint arXiv:2004.03289},
  year={2020}
}