
tongchangD / bert_for_corrector

Licence: other
Chinese text error correction based on BERT

Programming Languages

python

Projects that are alternatives of or similar to bert_for_corrector

BERTOverflow
A Pre-trained BERT on StackOverflow Corpus
Stars: ✭ 40 (-79.9%)
Mutual labels:  bert
ExpBERT
Code for our ACL '20 paper "Representation Engineering with Natural Language Explanations"
Stars: ✭ 28 (-85.93%)
Mutual labels:  bert
CheXbert
Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT
Stars: ✭ 51 (-74.37%)
Mutual labels:  bert
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Stars: ✭ 738 (+270.85%)
Mutual labels:  bert
rasa-bert-finetune
BERT fine-tuning support for rasa-nlu
Stars: ✭ 46 (-76.88%)
Mutual labels:  bert
BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
Stars: ✭ 92 (-53.77%)
Mutual labels:  bert
AliceMind
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
Stars: ✭ 1,479 (+643.22%)
Mutual labels:  bert
Transformer-QG-on-SQuAD
Implement Question Generator with SOTA pre-trained Language Models (RoBERTa, BERT, GPT, BART, T5, etc.)
Stars: ✭ 28 (-85.93%)
Mutual labels:  bert
R-AT
Regularized Adversarial Training
Stars: ✭ 19 (-90.45%)
Mutual labels:  bert
BERT-QE
Code and resources for the paper "BERT-QE: Contextualized Query Expansion for Document Re-ranking".
Stars: ✭ 43 (-78.39%)
Mutual labels:  bert
datagrand_bert
5th-place code for the 2019 Datagrand Cup information extraction competition
Stars: ✭ 20 (-89.95%)
Mutual labels:  bert
GEANet-BioMed-Event-Extraction
Code for the paper Biomedical Event Extraction with Hierarchical Knowledge Graphs
Stars: ✭ 52 (-73.87%)
Mutual labels:  bert
korpatbert
KorPatBERT, a Korean AI language model specialized for the patent domain
Stars: ✭ 48 (-75.88%)
Mutual labels:  bert
bert_attn_viz
Visualize BERT's self-attention layers on text classification tasks
Stars: ✭ 41 (-79.4%)
Mutual labels:  bert
BiaffineDependencyParsing
BERT + self-attention encoder; biaffine decoder; PyTorch implementation
Stars: ✭ 67 (-66.33%)
Mutual labels:  bert
LAMB_Optimizer_TF
LAMB Optimizer for Large Batch Training (TensorFlow version)
Stars: ✭ 119 (-40.2%)
Mutual labels:  bert
TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Stars: ✭ 209 (+5.03%)
Mutual labels:  bert
question_generator
An NLP system for generating reading comprehension questions
Stars: ✭ 188 (-5.53%)
Mutual labels:  bert
neural-ranking-kd
Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation
Stars: ✭ 74 (-62.81%)
Mutual labels:  bert
TriB-QA
We are serious about bragging
Stars: ✭ 45 (-77.39%)
Mutual labels:  bert

Correcting erroneous Chinese characters with a BERT model's mask feature

Sincere apologies: the project was put together in a hurry and the files were not fully uploaded, which blocked people from using it. This has now been fixed.
Some users pointed out that the model was missing. Having some free time recently, I have uploaded the BERT model (extraction code: hhxx). The previously missing files have also been uploaded, so you can use the project with confidence.

See also: entity-recognition-based error correction, ner_for_corrector

Entity-recognition-based correction works reasonably well; see the code. A detailed introduction is available at the linked address.

BERT usage instructions

  1. Place the pretrained model files under the data folder:

    ├── data
    │   ├── bert_config.json
    │   ├── config.json
    │   ├── pytorch_model.bin
    │   └── vocab.txt
    ├── bert_corrector.py
    ├── config.py
    ├── logger.py
    ├── predict_mask.py
    ├── README.md
    └── text_utils.py

  2. Run bert_corrector.py to perform error correction (a conceptual sketch of the approach follows the command):

python bert_corrector.py
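
For orientation, the sketch below shows the mask-and-predict idea such a corrector is built on: mask each character in turn and accept BERT's prediction when it confidently disagrees with the input. It assumes the HuggingFace transformers API; `correct_sentence` and the 0.9 threshold are illustrative choices, not the repository's actual code, and `bert-base-chinese` stands in for the local model under data/.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def correct_sentence(text, threshold=0.9):
    """Mask each character in turn; accept BERT's top prediction when it
    differs from the input character and is sufficiently confident."""
    chars = list(text)
    for i, original in enumerate(chars):
        masked = chars[:i] + [tokenizer.mask_token] + chars[i + 1:]
        inputs = tokenizer("".join(masked), return_tensors="pt")
        mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
        with torch.no_grad():
            probs = model(**inputs).logits[0, mask_pos].softmax(dim=-1)
        top_prob, top_id = probs.max(dim=-1)
        candidate = tokenizer.convert_ids_to_tokens(top_id.item())
        # Only accept confident, single-character replacements.
        if candidate != original and len(candidate) == 1 and top_prob.item() > threshold:
            chars[i] = candidate
    return "".join(chars)

print(correct_sentence("今天新情很好"))  # 新 here is a typo for 心
```

Masking every position costs one forward pass per character, so practical implementations batch the masked variants or only mask positions already flagged as suspicious.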
  3. Run predict_mask.py to see directly which Chinese characters could appear at a position covered by [MASK]:

python predict_mask.py
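
Without the repository script, the transformers fill-mask pipeline gives the same kind of view; this snippet assumes the hub checkpoint bert-base-chinese in place of the local model under data/:

```python
from transformers import pipeline

# Show BERT's top candidates for the [MASK] position.
fill_mask = pipeline("fill-mask", model="bert-base-chinese")
for cand in fill_mask("今天天气很[MASK]。", top_k=5):
    print(cand["token_str"], round(cand["score"], 3))
```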
  4. Evaluation: a model trained on general-domain data does not carry over to correction in a vertical domain; it needs to be retrained on in-domain text:
export CUDA_VISIBLE_DEVICES=0
python run_lm_finetuning.py \
    --output_dir=chinese_finetuned_lm \
    --model_type=bert \
    --model_name_or_path=bert-base-chinese \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE \
    --mlm \
    --num_train_epochs=3

Or use:

python -m run_lm_finetuning \
    --bert_model bert-base-uncased \
    --do_lower_case \
    --do_train \
    --train_file ./samples/sample_text.txt \
    --output_dir ./samples/samples_out \
    --num_train_epochs 5.0 \
    --learning_rate 3e-5 \
    --train_batch_size 16 \
    --max_seq_length 128

Flags can be added, removed, or adjusted to suit your hardware.
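
run_lm_finetuning.py comes from an older transformers examples release. As a rough modern equivalent of the first invocation above, masked-language-model fine-tuning can also be run through the Trainer API; this is a sketch under that assumption, with train.txt and the hyperparameters as placeholders for your domain corpus:

```python
from transformers import (BertForMaskedLM, BertTokenizer,
                          DataCollatorForLanguageModeling,
                          LineByLineTextDataset, Trainer, TrainingArguments)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

# One training sentence per line; block_size caps the sequence length.
dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="train.txt",
                                block_size=128)
# Randomly masks 15% of tokens each batch, the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="chinese_finetuned_lm",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("chinese_finetuned_lm")
```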
