
monologg / KoELECTRA-Pipeline

License: Apache-2.0
Transformers Pipeline with KoELECTRA

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to KoELECTRA-Pipeline

KoBERT-Transformers
KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)
Stars: ✭ 162 (+337.84%)
Mutual labels:  transformers, korean-nlp
KcELECTRA
🤗 Korean Comments ELECTRA: an ELECTRA model trained on Korean comments
Stars: ✭ 119 (+221.62%)
Mutual labels:  korean-nlp, electra
GoEmotions-pytorch
Pytorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+156.76%)
Mutual labels:  pipeline, transformers
KB-ALBERT
A Korean ALBERT model specialized for the economic/financial domain, provided by KB Kookmin Bank
Stars: ✭ 215 (+481.08%)
Mutual labels:  transformers, korean-nlp
eve-bot
EVE bot, a customer service chatbot to enhance virtual engagement for Twitter Apple Support
Stars: ✭ 31 (-16.22%)
Mutual labels:  transformers
effepi
Fun functional programming with pipelinable functions
Stars: ✭ 13 (-64.86%)
Mutual labels:  pipeline
BERT-NER
Using pre-trained BERT models for Chinese and English NER with 🤗Transformers
Stars: ✭ 114 (+208.11%)
Mutual labels:  transformers
MISE
Multimodal Image Synthesis and Editing: A Survey
Stars: ✭ 214 (+478.38%)
Mutual labels:  transformers
swarmci
Swarm CI - Docker Swarm-based CI system or enhancement to existing systems.
Stars: ✭ 48 (+29.73%)
Mutual labels:  pipeline
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-54.05%)
Mutual labels:  pipeline
hlatyping
Precision HLA typing from next-generation sequencing data
Stars: ✭ 28 (-24.32%)
Mutual labels:  pipeline
flamingo
FreeCAD - flamingo workbench
Stars: ✭ 30 (-18.92%)
Mutual labels:  pipeline
ParsBigBird
Persian Bert For Long-Range Sequences
Stars: ✭ 58 (+56.76%)
Mutual labels:  transformers
NGI-RNAseq
Nextflow RNA-Seq Best Practice analysis pipeline, used at the SciLifeLab National Genomics Infrastructure.
Stars: ✭ 50 (+35.14%)
Mutual labels:  pipeline
golgotha
Contextualised Embeddings and Language Modelling using BERT and Friends using R
Stars: ✭ 39 (+5.41%)
Mutual labels:  transformers
emg-viral-pipeline
VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies
Stars: ✭ 38 (+2.7%)
Mutual labels:  pipeline
text2keywords
Trained T5 and T5-large model for creating keywords from text
Stars: ✭ 53 (+43.24%)
Mutual labels:  transformers
nlp classification
Implementing nlp papers relevant to classification with PyTorch, gluonnlp
Stars: ✭ 224 (+505.41%)
Mutual labels:  korean-nlp
robustness-vit
Contains code for the paper "Vision Transformers are Robust Learners" (AAAI 2022).
Stars: ✭ 78 (+110.81%)
Mutual labels:  transformers
bacannot
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.
Stars: ✭ 51 (+37.84%)
Mutual labels:  pipeline

KoELECTRA-Pipeline

Transformers Pipeline with KoELECTRA

Available Pipeline

Subtask     Model                 Link
NSMC        koelectra-base        koelectra-base-finetuned-nsmc
            koelectra-small       koelectra-small-finetuned-nsmc
Naver-NER   koelectra-base        koelectra-base-finetuned-naver-ner
            koelectra-small       koelectra-small-finetuned-naver-ner
KorQuad     koelectra-base-v2     koelectra-base-v2-finetuned-korquad
            koelectra-small-v2    koelectra-small-v2-distilled-korquad-384

Customized NER Pipeline

A single word can be split into several wordpieces, yet the stock NerPipeline reports its results at the piece level. This becomes a problem later when the pieces have to be restored to word units (a rough sketch of such merging is shown after the list below).

  • The NerPipeline class has been partially modified and re-implemented in ner_pipeline.py.
  • An ignore_special_tokens argument was added so that the results for the [CLS] and [SEP] tokens can be discarded.
  • With ignore_labels=['O'], results are shown with the O tag excluded.
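
For illustration only, the piece-to-word merging described above can be approximated as in the sketch below. This is not the actual code in ner_pipeline.py; it simply glues "##"-prefixed WordPiece continuations back onto the preceding piece and averages their scores, which is one assumed way to combine them.

def merge_wordpieces(piece_results):
    # piece_results: list of {"word", "entity", "score"} dicts at piece level,
    # as produced by a piece-level NER pipeline.
    words = []
    for piece in piece_results:
        token, entity, score = piece["word"], piece["entity"], piece["score"]
        if token.startswith("##") and words:
            prev = words[-1]
            prev["word"] += token[2:]                    # drop the "##" continuation marker
            prev["score"] = (prev["score"] + score) / 2  # crude score combination (illustrative)
        else:
            words.append({"word": token, "entity": entity, "score": score})
    return words

In this sketch the entity label of the first piece is kept for the whole word; how conflicting piece labels are resolved is left to the real implementation.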

Requirements

  • torch>=1.4.0
  • transformers==3.0.2

Run reference code

$ python3 test_nsmc.py
$ python3 test_naver_ner.py
$ python3 test_korquad.py

Example

1. NSMC

from transformers import ElectraTokenizer, ElectraForSequenceClassification, pipeline
from pprint import pprint

tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-finetuned-nsmc")
model = ElectraForSequenceClassification.from_pretrained("monologg/koelectra-small-finetuned-nsmc")

nsmc = pipeline(
    "sentiment-analysis",
    tokenizer=tokenizer,
    model=model
)

print(nsmc("이 영화는 미쳤다. 넷플릭스가 일상화된 시대에 극장이 존재해야하는 이유를 증명해준다."))

# Out
[{'label': 'positive', 'score': 0.8729340434074402}]
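
For reference, the sentiment pipeline also accepts a list of sentences and returns one result dict per input. The second sentence below is only an illustrative example and is not from the original README.

# Pass several sentences at once; one result dict is returned per sentence.
pprint(nsmc([
    "이 영화는 미쳤다.",
    "기대했는데 많이 아쉬웠다.",
]))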

2. Naver-NER

from transformers import ElectraTokenizer, ElectraForTokenClassification
from ner_pipeline import NerPipeline
from pprint import pprint

tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-finetuned-naver-ner")
model = ElectraForTokenClassification.from_pretrained("monologg/koelectra-small-finetuned-naver-ner")

ner = NerPipeline(model=model,
                  tokenizer=tokenizer,
                  ignore_labels=[],
                  ignore_special_tokens=True)


pprint(ner("2009년 7월 FC서울을 떠나 잉글랜드 프리미어리그 볼턴 원더러스로 이적한 이청용은 크리스탈 팰리스와 독일 분데스리가2 VfL 보훔을 거쳐 지난 3월 K리그로 컴백했다. 행선지는 서울이 아닌 울산이었다"))

# Out
[{'entity': 'DAT-B', 'score': 0.9996234178543091, 'word': '2009년'},
 {'entity': 'DAT-I', 'score': 0.93541419506073, 'word': '7월'},
 {'entity': 'ORG-B', 'score': 0.9994615912437439, 'word': 'FC서울을'},
 {'entity': 'O', 'score': 0.999957799911499, 'word': '떠나'},
 {'entity': 'LOC-B', 'score': 0.9983285069465637, 'word': '잉글랜드'},
 {'entity': 'ORG-B', 'score': 0.9989873766899109, 'word': '프리미어리그'},
 {'entity': 'ORG-B', 'score': 0.9315412044525146, 'word': '볼턴'},
 {'entity': 'ORG-I', 'score': 0.9993480443954468, 'word': '원더러스로'},
 {'entity': 'O', 'score': 0.9999217987060547, 'word': '이적한'},
 {'entity': 'PER-B', 'score': 0.9994915127754211, 'word': '이청용은'},
 {'entity': 'ORG-B', 'score': 0.999463677406311, 'word': '크리스탈'},
 {'entity': 'ORG-I', 'score': 0.999179482460022, 'word': '팰리스와'},
 {'entity': 'LOC-B', 'score': 0.9977350234985352, 'word': '독일'},
 {'entity': 'ORG-B', 'score': 0.9813936352729797, 'word': '분데스리가2'},
 {'entity': 'ORG-B', 'score': 0.8733143210411072, 'word': 'VfL'},
 {'entity': 'ORG-I', 'score': 0.9937891960144043, 'word': '보훔을'},
 {'entity': 'O', 'score': 0.9999728202819824, 'word': '거쳐'},
 {'entity': 'DAT-B', 'score': 0.9963461756706238, 'word': '지난'},
 {'entity': 'DAT-I', 'score': 0.9909392595291138, 'word': '3월'},
 {'entity': 'ORG-B', 'score': 0.9995419383049011, 'word': 'K리그로'},
 {'entity': 'O', 'score': 0.9999108910560608, 'word': '컴백했다.'},
 {'entity': 'O', 'score': 0.9993030428886414, 'word': '행선지는'},
 {'entity': 'ORG-B', 'score': 0.9915705323219299, 'word': '서울이'},
 {'entity': 'O', 'score': 0.9999194741249084, 'word': '아닌'},
 {'entity': 'ORG-B', 'score': 0.9994401931762695, 'word': '울산이었다'}]
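
If complete entity spans are needed rather than per-word tags, the word-level output above can be grouped by its -B/-I suffixes. The sketch below is only an assumed post-processing step, not part of this repository; the span joining and score averaging are illustrative choices.

def group_entities(word_results):
    # Group consecutive words of the same entity type ("X-B" followed by "X-I")
    # into single spans; words tagged "O" close any open span.
    spans, current = [], None
    for item in word_results:
        entity, word, score = item["entity"], item["word"], item["score"]
        if entity == "O":
            current = None
            continue
        etype, position = entity.rsplit("-", 1)   # e.g. "ORG-B" -> ("ORG", "B")
        if position == "B" or current is None or current["type"] != etype:
            current = {"type": etype, "text": word, "scores": [score]}
            spans.append(current)
        else:                                      # "I": continuation of the open span
            current["text"] += " " + word
            current["scores"].append(score)
    return [{"type": s["type"], "text": s["text"],
             "score": sum(s["scores"]) / len(s["scores"])} for s in spans]

For the example above this would yield spans such as ('DAT', '2009년 7월') and ('ORG', '볼턴 원더러스로').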

3. KorQuad

from transformers import ElectraTokenizer, ElectraForQuestionAnswering, pipeline
from pprint import pprint

tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-v2-distilled-korquad-384")
model = ElectraForQuestionAnswering.from_pretrained("monologg/koelectra-small-v2-distilled-korquad-384")

qa = pipeline("question-answering", tokenizer=tokenizer, model=model)

pprint(qa({
    "question": "ํ•œ๊ตญ์˜ ๋Œ€ํ†ต๋ น์€ ๋ˆ„๊ตฌ์ธ๊ฐ€?",
    "context": "๋ฌธ์žฌ์ธ ๋Œ€ํ†ต๋ น์€ 28์ผ ์„œ์šธ ์ฝ”์—‘์Šค์—์„œ ์—ด๋ฆฐ โ€˜๋ฐ๋ทฐ (Deview) 2019โ€™ ํ–‰์‚ฌ์— ์ฐธ์„ํ•ด ์ Š์€ ๊ฐœ๋ฐœ์ž๋“ค์„ ๊ฒฉ๋ คํ•˜๋ฉด์„œ ์šฐ๋ฆฌ ์ •๋ถ€์˜ ์ธ๊ณต์ง€๋Šฅ ๊ธฐ๋ณธ๊ตฌ์ƒ์„ ๋‚ด๋†“์•˜๋‹ค.",
}))

# Out
{'answer': '문재인', 'end': 3, 'score': 0.9644287549022144, 'start': 0}
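
The question-answering pipeline can also be called with question and context as keyword arguments, and in transformers 3.0.2 a topk argument returns several candidate answers (later versions renamed this parameter to top_k). A minimal variant of the call above:

# Same query as before, but requesting the top 3 candidate answers.
pprint(qa(
    question="한국의 대통령은 누구인가?",
    context="문재인 대통령은 28일 서울 코엑스에서 열린 ‘데뷰 (Deview) 2019’ 행사에 참석해 "
            "젊은 개발자들을 격려하면서 우리 정부의 인공지능 기본구상을 내놓았다.",
    topk=3,   # number of candidate answers to return
))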
