
YC-wind / embedding_study

Licence: other
A study of generating character embeddings with Chinese pre-trained models, testing how well BERT and ELMo work on Chinese

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to embedding study

FewCLUE
FewCLUE: few-shot learning evaluation benchmark, Chinese edition
Stars: ✭ 251 (+167.02%)
Mutual labels:  chinese, bert
SentimentAnalysis
(BOW, TF-IDF, Word2Vec, BERT) Word Embeddings + (SVM, Naive Bayes, Decision Tree, Random Forest) Base Classifiers + Pre-trained BERT on Tensorflow Hub + 1-D CNN and Bi-Directional LSTM on IMDB Movie Reviews Dataset
Stars: ✭ 40 (-57.45%)
Mutual labels:  embeddings, bert
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+6980.85%)
Mutual labels:  chinese, bert
OpenDialog
An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chatbot)
Stars: ✭ 94 (+0%)
Mutual labels:  chinese, bert
Chinese Word Vectors
100+ pre-trained Chinese Word Vectors
Stars: ✭ 9,548 (+10057.45%)
Mutual labels:  embeddings, chinese
MobileQA
Offline reading-comprehension app: QA for mobile, Android & iPhone
Stars: ✭ 49 (-47.87%)
Mutual labels:  chinese, bert
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+2479.79%)
Mutual labels:  chinese, bert
NLP-paper
🎨 🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-75.53%)
Mutual labels:  bert, elmo
text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+100%)
Mutual labels:  embeddings, bert
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (-63.83%)
Mutual labels:  embeddings, bert
LightLM
High-performance small model evaluation: Shared Tasks in NLPCC 2020. Task 1 - Light Pre-Training Chinese Language Model for NLP Task
Stars: ✭ 54 (-42.55%)
Mutual labels:  chinese, bert
AiSpace
AiSpace: Better practices for deep learning model development and deployment For Tensorflow 2.0
Stars: ✭ 28 (-70.21%)
Mutual labels:  chinese, bert
BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
Stars: ✭ 92 (-2.13%)
Mutual labels:  chinese, bert
CLUE pytorch
CLUE baseline in PyTorch (the PyTorch version of the CLUE baselines)
Stars: ✭ 72 (-23.4%)
Mutual labels:  chinese, bert
ADL2019
Applied Deep Learning (2019 Spring) @ NTU
Stars: ✭ 20 (-78.72%)
Mutual labels:  bert, elmo
Roberta zh
RoBERTa for Chinese: pre-trained RoBERTa models for Chinese
Stars: ✭ 1,953 (+1977.66%)
Mutual labels:  chinese, bert
muse-as-service
REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.
Stars: ✭ 45 (-52.13%)
Mutual labels:  embeddings, bert
simple elmo
Simple library to work with pre-trained ELMo models in TensorFlow
Stars: ✭ 49 (-47.87%)
Mutual labels:  embeddings, elmo
NLPDataAugmentation
Chinese NLP Data Augmentation, BERT Contextual Augmentation
Stars: ✭ 94 (+0%)
Mutual labels:  chinese, bert
Probabilistic-RNN-DA-Classifier
Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model
Stars: ✭ 22 (-76.6%)
Mutual labels:  embeddings

bert

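Notes

Based on bert-as-service, keeping only the code that generates embeddings.

  • For the details of the BERT implementation itself, see google-bert.

  • For already pre-trained models, see the model link.

  • bert_classification: a BERT-based classification model for downstream tasks (it can be adapted for binary, multi-class, or multi-label classification).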
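The three task types mentioned for bert_classification differ mainly in how the classifier's output logits are turned into labels. The sketch below is not taken from the repository; the function names, the `task` strings, and the 0.5 threshold are assumptions chosen for illustration. It shows the usual convention: softmax plus argmax for binary and multi-class tasks, and an independent sigmoid per label for multi-label tasks.

```python
import math


def softmax(logits):
    """Probabilities over mutually exclusive classes (they sum to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(logits, task, threshold=0.5):
    """Turn logits into labels.

    task (hypothetical names):
      "binary" / "multiclass" -> index of the single most likely class
      "multilabel"            -> list of every class whose sigmoid
                                 probability reaches the threshold
    """
    if task in ("binary", "multiclass"):
        probs = softmax(logits)
        return max(range(len(probs)), key=probs.__getitem__)
    if task == "multilabel":
        return [i for i, x in enumerate(logits) if sigmoid(x) >= threshold]
    raise ValueError("unknown task: %r" % task)
```

In a training setup the same split usually shows up in the loss as well: cross-entropy over softmax for the single-label cases, and a per-label binary cross-entropy for the multi-label case.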

ELMO-tf

ELMo: Deep contextualized word representations, https://arxiv.org/abs/1802.05365

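Notes

Based on ELMO-tf, with parts of the code modified to handle Chinese corpora. The code written by its Korean author is very clear and far more readable than the original implementation. With the original implementation you would have to reorganize the code yourself and build your own Chinese processing pipeline.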

word2vec

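Chinese word vectors: https://github.com/Embedding/Chinese-Word-Vectors
Tencent word vectors: link: https://pan.baidu.com/s/1meeKUBKbGMyTGrx664F4Ng password: xfh1

Notes

Run word2vec_embedding.py in the word2vec directory.

This project uses the Tencent word vectors to compute word vectors and sentence vectors.

  • Word vectors (table lookup; if the word is not in the table, fall back to its individual characters)
  • Sentence vectors (word vectors weighted by part of speech, then averaged)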
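The two steps above can be sketched as follows. This is a minimal illustration, not the code in word2vec_embedding.py. The README does not say how character vectors are combined, what the part-of-speech weights are, or what the final average divides by, so averaging the character vectors, the `POS_WEIGHTS` values, the zero vector for unknown tokens, and dividing by the number of words are all assumptions.

```python
# Hypothetical part-of-speech weights; the project's real values are not
# given in the README.
POS_WEIGHTS = {"n": 1.0, "v": 0.8, "a": 0.6}
DEFAULT_WEIGHT = 0.3


def word_vector(word, table, dim):
    """Look the word up; if it is missing, fall back to its characters."""
    if word in table:
        return list(table[word])
    char_vecs = [table[ch] for ch in word if ch in table]
    if not char_vecs:
        return [0.0] * dim  # assumption: fully unknown tokens map to zeros
    # Assumption: combine the character vectors by averaging them.
    return [sum(col) / len(char_vecs) for col in zip(*char_vecs)]


def sentence_vector(tagged_words, table, dim):
    """Average of POS-weighted word vectors.

    tagged_words: list of (word, pos_tag) pairs from any segmenter/tagger.
    """
    total = [0.0] * dim
    for word, pos in tagged_words:
        weight = POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
        vec = word_vector(word, table, dim)
        total = [t + weight * v for t, v in zip(total, vec)]
    n = len(tagged_words)
    return [t / n for t in total] if n else total
```

Here `table` is a plain dict from token to list of floats, and the (word, tag) pairs would come from a Chinese segmenter with part-of-speech tagging.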

Usage:

Download the Tencent word vectors; tencent_45000.txt should be enough.
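Once the file is downloaded it has to be read into a lookup table. The sketch below assumes tencent_45000.txt uses the common word2vec text layout (an optional "count dim" header line, then one token per line followed by space-separated floats). The README does not confirm the layout, so treat this as an assumption and check the file before relying on it. `load_vectors` is a hypothetical helper, not a function from the project.

```python
def load_vectors(path, encoding="utf-8"):
    """Read a word2vec-style text file into {token: [float, ...]}.

    Returns (table, dim). Lines that do not parse, or whose length differs
    from the first valid vector, are skipped.
    """
    table = {}
    dim = None
    with open(path, encoding=encoding) as f:
        for line_no, line in enumerate(f):
            parts = line.rstrip().split(" ")
            # Skip an optional "vocab_size dim" header on the first line.
            if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
            if len(parts) < 2:
                continue
            try:
                vec = [float(v) for v in parts[1:]]
            except ValueError:
                continue  # malformed line, e.g. a token containing spaces
            if dim is None:
                dim = len(vec)
            if len(vec) != dim:
                continue
            table[parts[0]] = vec
    return table, dim or 0
```

A call such as `table, dim = load_vectors("tencent_45000.txt")` would then give a dict for table lookups. For large files a dedicated loader such as gensim's `KeyedVectors.load_word2vec_format` is likely the more memory-efficient choice.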

Other

#!/usr/bin/env bash

# How to reset the git history (step 2 needs adjusting: it is best to check by hand what git add will stage)
# 1. Check out a new orphan branch with no history
git checkout --orphan latest_branch
# 2. Add all the files
git add -A
# 3. Commit the changes
git commit -am "commit message"
# 4. Delete the old master branch
git branch -D master
# 5. Rename the current branch to master
git branch -m master
# 6. Finally, force-update the remote repository (this overwrites its history)
git push -f origin master