KwangKa / SIMCSE_unsup

Licence: MIT license
A PyTorch implementation of unsupervised SimCSE for Chinese

A PyTorch implementation of unsupervised SimCSE

SimCSE: Simple Contrastive Learning of Sentence Embeddings


1. Usage

Unsupervised training

python train_unsup.py ./data/news_title.txt ./path/to/huggingface_pretrained_model

Full list of arguments

python train_unsup.py -h
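
For readers unfamiliar with the training objective, here is a minimal, framework-free sketch of the unsupervised SimCSE loss as described in the paper: each sentence is encoded twice with different dropout masks, the two embeddings of the same sentence form a positive pair, and the other sentences in the batch serve as negatives. This is not taken from train_unsup.py. The function name `simcse_unsup_loss` and the toy vectors are assumptions made for illustration; the temperature of 0.05 is the default reported in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_unsup_loss(view_a, view_b, temperature=0.05):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    view_a[i] and view_b[i] are two embeddings of sentence i, obtained
    from two forward passes with different dropout masks.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over the whole batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching index i as the correct class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of three "sentences" with 2-d embeddings (made-up numbers).
view_a = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.2]]
view_b = [[0.9, 0.2], [0.2, 0.9], [-0.9, 0.1]]

aligned = simcse_unsup_loss(view_a, view_b)
shuffled = simcse_unsup_loss(view_a, view_b[::-1])
print(aligned < shuffled)  # correctly paired views give the lower loss
```

In PyTorch the same computation is usually written as a batch similarity matrix divided by the temperature and passed to `torch.nn.functional.cross_entropy` with `torch.arange(batch_size)` as the labels.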

Similar-text retrieval test

python test_unsup.py
query title:
基金亏损路未尽 后市看法仍偏谨慎

sim title:
基金亏损路未尽 后市看法仍偏谨慎
海通证券:私募对后市看法偏谨慎
连塑基本面不容乐观 后市仍有下行空间
基金谨慎看待后市行情
稳健投资者继续保持观望 市场走势还未明朗
下半年基金投资谨慎乐观
华安基金许之彦:下半年谨慎乐观
楼市主导 期指后市不容乐观
基金公司谨慎看多明年市
前期乐观预期被否 基金重归谨慎
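
The listing above is a ranked list of the titles closest to the query. A minimal sketch of that ranking step follows, assuming cosine similarity over sentence embeddings; the README does not state which measure test_unsup.py uses. The function `top_k_similar`, the placeholder titles, and the 3-d vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(query_vec, corpus, k=3):
    """Return the k (title, score) pairs most similar to query_vec.

    corpus maps each title to its sentence embedding.
    """
    scored = [(title, cosine(query_vec, vec)) for title, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; a real run would use the trained encoder's outputs.
corpus = {
    "title_a": [0.9, 0.1, 0.0],
    "title_b": [0.7, 0.6, 0.1],
    "title_c": [0.0, 0.2, 0.9],
}
query = corpus["title_a"]

results = top_k_similar(query, corpus, k=2)
for title, score in results:
    print(f"{title}\t{score:.4f}")
```

Because the query is itself part of the corpus here, it ranks first with a similarity of 1, which matches the sample output, where the query title is also the top hit.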

Training and evaluation on the STS-B dataset

Chinese STS-B dataset; see here for details.

# Train
python train_unsup.py ./data/STS-B/cnsd-sts-train_unsup.txt

# Evaluate
python eval_unsup.py

Model                      STS-B dev   STS-B test
hfl/chinese-bert-wwm-ext   0.3326      0.3209
simcse                     0.7499      0.6909

These results are close to those reported by Su Jianlin (苏剑林): 0.3465 for BERT-P1 and 0.6904 for SimCSE.
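
STS benchmarks are commonly scored with the Spearman rank correlation between the model's predicted similarities and the gold scores. The README does not name the metric behind its numbers, so treating it as Spearman is an assumption, and the sketch below is not taken from eval_unsup.py. The helper names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up cosine similarities and gold scores (STS-B scores range from 0 to 5).
predicted = [0.91, 0.35, 0.62, 0.10, 0.78]
gold = [5.0, 2.0, 3.0, 0.0, 4.0]
print(spearman(predicted, gold))
```

Only the ordering of the predictions matters for this metric, so the cosine similarities do not need to be rescaled to the 0 to 5 range of the gold scores.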

2. References
