
benywon / ChineseBert

Licence: other

This is a Chinese BERT model specialized for question answering.


Labels

natural-language-processing deep-learning chinese-nlp


ChineseBert

This is a Chinese BERT model specialized for question answering. We provide two models: a large model, a 16-layer transformer with a hidden size of 1024, and a small model with 8 layers and a hidden size of 512. Our implementation differs from the original paper (https://arxiv.org/abs/1810.04805) in that we replace the position embedding with an LSTM, which shows advantages when the text length varies a lot.
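To make the LSTM substitution concrete, here is a minimal PyTorch sketch of an input layer that gets token order from an LSTM rather than from a learned position table. It is an illustration under stated assumptions, not the code in this repository: the class name `LSTMPositionEmbedding`, the single unidirectional layer, and the choice to feed the LSTM output straight to the transformer are all hypothetical.

```python
import torch
import torch.nn as nn


class LSTMPositionEmbedding(nn.Module):
    """Token embedding whose order information comes from an LSTM.

    Hypothetical sketch: the real model may use a different LSTM
    configuration (layers, direction, residual connections).
    """

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, hidden_size)
        # Assumption: one unidirectional layer with the same width as the model.
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.token_embedding(token_ids)   # (batch, seq, hidden)
        ordered, _ = self.lstm(embedded)             # (batch, seq, hidden)
        return ordered                               # input to the transformer layers


# Sizes taken from the README: 35,000-token vocabulary, small model width 512.
layer = LSTMPositionEmbedding(vocab_size=35000, hidden_size=512)
short_batch = torch.randint(0, 35000, (2, 7))
long_batch = torch.randint(0, 35000, (2, 300))
print(layer(short_batch).shape)  # torch.Size([2, 7, 512])
print(layer(long_batch).shape)   # torch.Size([2, 300, 512])
```

Because the LSTM has no fixed-size position table, the same layer handles a 7-token and a 300-token input without any change, which is one plausible reading of why this design helps when text length varies.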

It currently runs on Python 3 and PyTorch.

#Stats:

Data: 200M Chinese question-answer pairs collected from the internet.

Tokenizer: we use the SentencePiece tokenizer with a vocabulary size of 35,000.

We train both the large and the small model for 2M steps and did not observe overfitting.

The large model takes 12 days per epoch on 8 NVLink-connected V100 GPUs. The small model takes 2 days per epoch on the same hardware.
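As a rough sanity check on these figures, the epoch times can be converted into throughput. The sketch below assumes that one epoch is a single pass over all 200M pairs and that work is spread evenly across the 8 GPUs; neither is stated explicitly in the README.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PAIRS_PER_EPOCH = 200_000_000  # assumption: one epoch covers the full dataset once
NUM_GPUS = 8


def pairs_per_second(days_per_epoch: float) -> float:
    """Average question-answer pairs processed per second across all GPUs."""
    return PAIRS_PER_EPOCH / (days_per_epoch * SECONDS_PER_DAY)


large = pairs_per_second(12)
small = pairs_per_second(2)
print(f"large: {large:.0f} pairs/s total, {large / NUM_GPUS:.0f} per GPU")
print(f"small: {small:.0f} pairs/s total, {small / NUM_GPUS:.0f} per GPU")
```

Under these assumptions the large model processes roughly 193 pairs per second and the small model roughly 1,157, a sixfold difference that matches the 12-day versus 2-day epoch times.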

#Usage:

Feed the model a Chinese question-answer pair to get their combined representation.

Refer to main.py for more detail.

The model has been tested with sequence lengths below 1024.
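One way to stay inside the tested length range is to check the combined length before calling the model. The helper below is a hypothetical sketch, not part of main.py: the function name `pack_qa_pair`, the single separator token, and the policy of trimming the answer rather than the question are all assumptions made for illustration.

```python
from typing import List

MAX_LEN = 1024  # the README reports testing only below this length


def pack_qa_pair(
    question_ids: List[int],
    answer_ids: List[int],
    sep_id: int,
    max_len: int = MAX_LEN,
) -> List[int]:
    """Join question and answer token ids with a separator.

    The result is always shorter than max_len; the answer is trimmed
    if needed (an assumed policy, the repository may do otherwise).
    """
    # Slots left for the answer after the question and one separator.
    room = (max_len - 1) - len(question_ids) - 1
    if room < 0:
        raise ValueError("question alone exceeds the tested sequence length")
    return question_ids + [sep_id] + answer_ids[:room]


packed = pack_qa_pair([11, 12, 13], [21, 22], sep_id=0)
print(packed)  # [11, 12, 13, 0, 21, 22]
```

The token ids here are placeholders; in practice they would come from the SentencePiece tokenizer shipped with the model.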

Because the PyTorch model file is very large, download it from Google Drive with get_model.sh.


Stars: ✭ 24
