haven-jeon / KoSpacing

Licence: other

Automatic Korean word spacing with R

Labels

deep-neural-networks tensorflow keras korean korean-nlp

KoSpacing

R package for automatic Korean word spacing.

A Python version can be found here.

Introduction

Word spacing is an important part of preprocessing for Korean text analysis, because spacing accuracy greatly affects the accuracy of the analysis that follows. KoSpacing offers fairly accurate automatic word spacing and works especially well on online text originating from SNS or SMS.

For example, “아버지가방에들어가신다.” can be spaced in either of the two ways below.

“아버지가 방에 들어가신다.” means “My father enters the room.”
“아버지 가방에 들어가신다.” means “My father goes into the bag.”

By common sense, the first is the correct answer.

KoSpacing is based on a deep learning model trained on a large corpus (more than 100 million news articles from Chan-Yub Park).

Performance

Test Set	Accuracy
Sejong(colloquial style) Corpus(1M)	97.1%
OOOO(literary style) Corpus(3M)	94.3%

Accuracy = (number of correctly spaced characters) / (number of characters in the test data).
- Performance might increase if compound words are normalized.
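The accuracy definition above can be illustrated with a minimal R sketch. It assumes that "correctly spaced" means that, for each non-space character, the prediction agrees with the reference on whether a space follows it. The helper names `space_labels` and `spacing_accuracy` are hypothetical and are not part of KoSpacing, and the evaluation script actually used for the table is not shown in this README.

```r
# Hypothetical helper (not part of KoSpacing): for every non-space character,
# return TRUE if the next character in the text is a space.
space_labels <- function(text) {
  chars <- strsplit(text, "")[[1]]
  is_space <- chars == " "
  # Shift by one so each position says whether the following character is a space.
  followed_by_space <- c(is_space[-1], FALSE)
  followed_by_space[!is_space]
}

# Hypothetical helper: character-level spacing accuracy, assuming both strings
# contain the same characters once spaces are removed.
spacing_accuracy <- function(reference, predicted) {
  stopifnot(identical(gsub(" ", "", reference, fixed = TRUE),
                      gsub(" ", "", predicted, fixed = TRUE)))
  ref_labels <- space_labels(reference)
  pred_labels <- space_labels(predicted)
  # Share of characters whose "space follows" label matches the reference.
  sum(ref_labels == pred_labels) / length(ref_labels)
}

# The two spacings from the introduction differ at two of twelve characters,
# so the second one scores 10/12 against the first.
spacing_accuracy("아버지가 방에 들어가신다.", "아버지 가방에 들어가신다.")
```

Under this definition a single misplaced space usually costs two characters: one where a space is missing and one where a space was added.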

Install

To install from GitHub, use:

install.packages('remotes')
remotes::install_github('haven-jeon/KoSpacing')
library(KoSpacing)
# Run set_env() once after the first install, before calling spacing()
set_env()

Example

library(KoSpacing)
#> If you install package first fime,
#> Please set_env() run before using spacing()
spacing("김형호영화시장분석가는'1987'의네이버영화정보네티즌10점평에서언급된단어들을지난해12월27일부터올해1월10일까지통계프로그램R과KoNLP패키지로텍스트마이닝하여분석했다.")
#> loaded KoSpacing model!
#> [1] "김형호 영화시장 분석가는 '1987'의 네이버 영화 정보 네티즌 10점 평에서 언급된 단어들을 지난해 12월 27일부터 올해 1월 10일까지 통계 프로그램 R과 KoNLP 패키지로 텍스트마이닝하여 분석했다."

Model Architecture

Citation

@misc{heewon2018,
author = {Heewon Jeon},
title = {KoSpacing: Automatic Korean word spacing},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/KoSpacing}}
}
