YC-wind / embedding_study

Licence: other

A study of generating character embeddings with Chinese pre-trained models, testing how well BERT and ELMo work on Chinese.

Programming Languages

python


Projects that are alternatives of or similar to embedding study

FewCLUE

FewCLUE: a few-shot learning evaluation benchmark for Chinese

Stars: ✭ 251 (+167.02%)

Mutual labels: chinese, bert

SentimentAnalysis

(BOW, TF-IDF, Word2Vec, BERT) Word Embeddings + (SVM, Naive Bayes, Decision Tree, Random Forest) Base Classifiers + Pre-trained BERT on Tensorflow Hub + 1-D CNN and Bi-Directional LSTM on IMDB Movie Reviews Dataset

Stars: ✭ 40 (-57.45%)

Mutual labels: embeddings, bert

Nlp chinese corpus

Large Scale Chinese Corpus for NLP

Stars: ✭ 6,656 (+6980.85%)

Mutual labels: chinese, bert

OpenDialog

An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chit-chat bot)

Stars: ✭ 94 (+0%)

Mutual labels: chinese, bert

Chinese Word Vectors

100+ Chinese Word Vectors (more than a hundred sets of pre-trained Chinese word vectors)

Stars: ✭ 9,548 (+10057.45%)

Mutual labels: embeddings, chinese

MobileQA

Offline reading-comprehension app: QA for mobile, Android & iPhone

Stars: ✭ 49 (-47.87%)

Mutual labels: chinese, bert

Clue

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Stars: ✭ 2,425 (+2479.79%)

Mutual labels: chinese, bert

NLP-paper

🎨🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/

Stars: ✭ 23 (-75.53%)

Mutual labels: bert, elmo

text2text

Text2Text: Cross-lingual natural language processing and generation toolkit

Stars: ✭ 188 (+100%)

Mutual labels: embeddings, bert

Text and Audio classification with Bert

Text Classification in Turkish Texts with Bert

Stars: ✭ 34 (-63.83%)

Mutual labels: embeddings, bert

LightLM

High-performance small-model evaluation: Shared Tasks in NLPCC 2020. Task 1 - Light Pre-Training Chinese Language Model for NLP Task

Stars: ✭ 54 (-42.55%)

Mutual labels: chinese, bert

AiSpace

AiSpace: Better practices for deep learning model development and deployment For Tensorflow 2.0

Stars: ✭ 28 (-70.21%)

Mutual labels: chinese, bert

BERT-chinese-text-classification-pytorch

This repo contains a PyTorch implementation of a pretrained BERT model for text classification.

Stars: ✭ 92 (-2.13%)

Mutual labels: chinese, bert

CLUE pytorch

CLUE baseline pytorch: the PyTorch version of the CLUE baselines

Stars: ✭ 72 (-23.4%)

Mutual labels: chinese, bert

ADL2019

Applied Deep Learning (2019 Spring) @ NTU

Stars: ✭ 20 (-78.72%)

Mutual labels: bert, elmo

Roberta zh

Chinese pre-trained RoBERTa models: RoBERTa for Chinese

Stars: ✭ 1,953 (+1977.66%)

Mutual labels: chinese, bert

muse-as-service

REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

Stars: ✭ 45 (-52.13%)

Mutual labels: embeddings, bert

simple elmo

Simple library to work with pre-trained ELMo models in TensorFlow

Stars: ✭ 49 (-47.87%)

Mutual labels: embeddings, elmo

NLPDataAugmentation

Chinese NLP Data Augmentation, BERT Contextual Augmentation

Stars: ✭ 94 (+0%)

Mutual labels: chinese, bert

Probabilistic-RNN-DA-Classifier

Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model

Stars: ✭ 22 (-76.6%)

Mutual labels: embeddings


bert

Description

Adapted from bert-as-service, keeping only the code that generates embeddings.
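bert-as-service turns the per-token outputs of a BERT layer into one fixed-size vector; its default pooling strategy (REDUCE_MEAN) averages the token vectors while ignoring padding positions. The sketch below shows only that masked-mean step in plain Python, assuming the token vectors and the attention mask have already been produced by the model. The function name `masked_mean_pool` is hypothetical and not part of this repo or of bert-as-service.

```python
from typing import List, Sequence


def masked_mean_pool(token_vecs: Sequence[Sequence[float]],
                     mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding (mask 0) is skipped."""
    kept = [vec for vec, m in zip(token_vecs, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors taken from one BERT layer; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
print(masked_mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```

Masking matters because batches are padded to a common length; averaging the padding vectors in would make a sentence's embedding depend on the longest sentence in its batch.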

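For details of the BERT implementation itself, see google-bert.
Pre-trained models are available via the model link.
bert_classification: a BERT-based classification model for downstream tasks; it can be adapted to binary, multi-class, or multi-label classification.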
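Switching bert_classification between these tasks mostly comes down to the output activation and the loss. A minimal sketch of the standard pattern, assuming the logits come from a linear layer over BERT's pooled output (not shown, and not necessarily how bert_classification is written): binary and multi-class use a softmax with cross-entropy over mutually exclusive classes, while multi-label uses an independent sigmoid per label with binary cross-entropy. All function names here are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Binary / multi-class prediction: one distribution over mutually exclusive classes."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_loss(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy against a single gold class index: logsumexp(z) - z[label]."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


def multilabel_probs(logits: Sequence[float]) -> List[float]:
    """Multi-label prediction: an independent, overflow-safe sigmoid per label."""
    return [1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            for z in logits]


def multilabel_loss(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over labels, in the numerically stable 'with logits' form."""
    losses = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
              for z, y in zip(logits, labels)]
    return sum(losses) / len(losses)


# Hypothetical logits for 3 classes (or 3 labels).
logits = [2.0, 0.5, -1.0]
print(softmax(logits))                     # sums to 1; predict the argmax class
print(multiclass_loss(logits, label=0))
print(multilabel_probs(logits))            # each in (0, 1); threshold, e.g. at 0.5
print(multilabel_loss(logits, [1, 0, 1]))
```

Binary classification can be treated as the two-class case of the softmax head, or equivalently as a single sigmoid output.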

ELMO-tf

ELMo: Deep contextualized word representations https://arxiv.org/abs/1802.05365

Description

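Adapted from ELMO-tf, with part of the code modified to work with Chinese corpora. The code by its author (a Korean developer) is very clear and far more readable than the original implementation, which would require you to reorganize the code yourself and build the Chinese text handling from scratch.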

For how the original ELMo works, see Deep contextualized word representations.
For other Chinese implementations based on ELMo, see ELMO chinese.
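The paper cited above collapses the biLM's layers into one vector per token with softmax-normalized layer weights s_j and a scale γ: ELMo_k = γ · Σ_j s_j · h_{k,j}, where j = 0 is the token (character-CNN) layer and j ≥ 1 are the biLSTM layers. A plain-Python sketch of that weighted sum, assuming the per-layer vectors for one token are already computed; in the paper s_j and γ are learned together with the downstream task, whereas here they are simply passed in. The name `scalar_mix` is hypothetical, and this README does not show whether ELMO-tf exposes this step the same way.

```python
import math
from typing import List, Sequence


def scalar_mix(layer_vecs: Sequence[Sequence[float]],
               raw_weights: Sequence[float],
               gamma: float) -> List[float]:
    """ELMo_k = gamma * sum_j s_j * h_{k,j}, with s = softmax(raw_weights)."""
    top = max(raw_weights)  # stable softmax over the per-layer weights
    exps = [math.exp(w - top) for w in raw_weights]
    total = sum(exps)
    s = [e / total for e in exps]
    dim = len(layer_vecs[0])
    return [gamma * sum(s_j * h[i] for s_j, h in zip(s, layer_vecs)) for i in range(dim)]


# Token k seen by three layers: char-CNN output (j=0) and two biLSTM layers (j=1, 2).
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(scalar_mix(layers, [0.0, 0.0, 0.0], gamma=1.0))  # equal weights: plain average
```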

word2vec

Chinese word vectors: https://github.com/Embedding/Chinese-Word-Vectors
Tencent word vectors: https://pan.baidu.com/s/1meeKUBKbGMyTGrx664F4Ng (password: xfh1)

Description

Run word2vec_embedding.py in the word2vec directory.

This part uses the Tencent word vectors to compute word vectors and sentence vectors.

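Word vectors: table lookup; if a word is not in the table, fall back to its characters.
Sentence vectors: weight each word vector by its part of speech, then average.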
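The two steps above can be sketched in plain Python. Several details are assumptions, not taken from word2vec_embedding.py: the character fallback averages the vectors of the word's known characters; sentences arrive already segmented and POS-tagged (for example by a segmenter such as jieba, not shown); the POS weights in `POS_WEIGHTS` are made-up placeholders; and "average" is read as a weighted mean divided by the sum of the weights. Function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]

# Hypothetical POS weights keyed by the first letter of the tag (n = noun, v = verb,
# a = adjective); the repo's real weights are not given in this README.
POS_WEIGHTS: Dict[str, float] = {"n": 1.0, "v": 0.8, "a": 0.6}
DEFAULT_WEIGHT = 0.3


def word_vector(word: str, table: Dict[str, Vector]) -> Optional[Vector]:
    """Look the word up; if it is missing, fall back to the mean of its known characters."""
    if word in table:
        return table[word]
    char_vecs = [table[ch] for ch in word if ch in table]
    if not char_vecs:
        return None  # neither the word nor any of its characters is in the table
    dim = len(char_vecs[0])
    return [sum(v[i] for v in char_vecs) / len(char_vecs) for i in range(dim)]


def sentence_vector(tagged: Sequence[Tuple[str, str]],
                    table: Dict[str, Vector],
                    weights: Dict[str, float] = POS_WEIGHTS,
                    default: float = DEFAULT_WEIGHT) -> Optional[Vector]:
    """POS-weighted mean of the word vectors of a segmented, tagged sentence."""
    acc: Optional[Vector] = None
    total_weight = 0.0
    for word, pos in tagged:
        vec = word_vector(word, table)
        if vec is None:
            continue  # skip words with no usable vector
        w = weights.get(pos[:1], default)  # "ns", "nr", ... all count as nouns
        if acc is None:
            acc = [0.0] * len(vec)
        for i, x in enumerate(vec):
            acc[i] += w * x
        total_weight += w
    if acc is None or total_weight == 0.0:
        return None
    return [x / total_weight for x in acc]


table = {"中国": [1.0, 0.0], "人": [0.0, 1.0], "民": [0.0, 3.0]}
print(word_vector("人民", table))  # character fallback: [0.0, 2.0]
print(sentence_vector([("中国", "ns"), ("人民", "n")], table))  # [0.5, 1.0]
```

Dividing by the total weight rather than the word count keeps the sentence vector on the same scale as the word vectors regardless of which parts of speech a sentence contains.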

Usage:

Download the Tencent word vectors; tencent_45000.txt should be enough.
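A minimal loader sketch, assuming the Tencent file uses the plain-text word2vec layout: one `word v1 v2 ... vd` row per line, optionally preceded by a `count dim` header (the full Tencent release has such a header; whether tencent_45000.txt keeps it is not stated here, so the loader accepts both). The function name `load_text_vectors` and its `limit` parameter are hypothetical.

```python
import io
from typing import Dict, List, Optional, TextIO


def load_text_vectors(f: TextIO, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read vectors stored as 'word v1 v2 ... vd', one word per line.

    An optional word2vec-style header line 'count dim' is detected and skipped.
    """
    table: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for i, raw in enumerate(f):
        line = raw.rstrip()
        if not line:
            continue
        if i == 0:
            head = line.split()
            if len(head) == 2 and head[0].isdigit() and head[1].isdigit():
                dim = int(head[1])
                continue
        if dim is None:
            dim = len(line.split()) - 1  # no header: infer the dimension from the first row
        parts = line.rsplit(maxsplit=dim)  # split from the right so the word stays intact
        if len(parts) != dim + 1:
            continue  # skip malformed rows
        table[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(table) >= limit:
            break
    return table


sample = io.StringIO("2 3\n中国 0.1 0.2 0.3\n人 0.4 0.5 0.6\n")
vectors = load_text_vectors(sample)
print(vectors["人"])  # [0.4, 0.5, 0.6]
```

For the real file, pass a handle from `open("tencent_45000.txt", encoding="utf-8")`. If gensim is installed, `KeyedVectors.load_word2vec_format(path, binary=False)` is an alternative for files that do have the header.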

Other

#!/usr/bin/env bash

# Reset the git history (step 2 may need adjusting; it is best to check by hand what git add stages)
# 1. Check out a new orphan branch (no history)
git checkout --orphan latest_branch
# 2. Add all the files
git add -A
# 3. Commit the changes
git commit -m "commit message"
# 4. Delete the old master branch
git branch -D master
# 5. Rename the current branch to master
git branch -m master
# 6. Finally, force-push to overwrite the remote repository
git push -f origin master
