fuzzythecat / awesome-spacer

License: MIT
Automatic Korean word spacing with TensorFlow 2.0 + Keras

awesome-spacer

awesome-spacer is a project for automatic Korean word spacing, using TensorFlow 2 + Keras.

Requirements

  • Python 3.6
  • TensorFlow 2.0
  • NumPy
  • tqdm
  • scikit-learn

Getting Started

  • awesome-spacer-train-colab.ipynb: Use this notebook to train your own model on the Sejong corpus with Google Colab. To train on a custom dataset, use the CLI instead.

  • awesome-spacer-test-colab.ipynb: Use this notebook to test models pre-trained on the Sejong corpus. Links to the weights and the corresponding model configurations are included in the notebook.

Train

To train the model, provide the path to your dataset. You can import the module and train in a Jupyter Notebook (see the notebooks for an example), or train from the CLI.

Training on CPU is currently not supported.

# Train a new model from scratch. 
python train.py --data_path path/to/dataset --gpu_list 0

# Continue training from pre-trained model.
python train.py --data_path path/to/dataset --gpu_list 0 --trained_model path/to/weights.h5

# Pass GPU ids for multi-GPU training.
python train.py --data_path path/to/dataset --gpu_list 0,1,2,3
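
The commands above pass GPU ids as a comma-separated string. The sketch below shows one way such a flag could be parsed and applied. It is not taken from train.py: the flag names mirror the commands above, but `parse_gpu_list`, `build_parser`, `select_gpus`, the defaults, and the use of `CUDA_VISIBLE_DEVICES` are all assumptions made for illustration.

```python
import argparse
import os


def parse_gpu_list(value):
    """Turn a comma-separated string such as "0,1,2,3" into [0, 1, 2, 3]."""
    ids = [int(part) for part in value.split(",") if part.strip()]
    if not ids:
        raise argparse.ArgumentTypeError("expected at least one GPU id")
    return ids


def build_parser():
    # Hypothetical parser: flag names follow the commands above,
    # defaults are assumptions and may differ from the real script.
    parser = argparse.ArgumentParser(description="Sketch of a training CLI")
    parser.add_argument("--data_path", required=True)
    parser.add_argument("--gpu_list", type=parse_gpu_list, default=[0])
    parser.add_argument("--trained_model", default=None)
    return parser


def select_gpus(gpu_ids):
    # Restrict the visible CUDA devices; this must happen before
    # TensorFlow is imported for the setting to take effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere.
args = build_parser().parse_args(
    ["--data_path", "path/to/dataset", "--gpu_list", "0,1,2,3"]
)
visible = select_gpus(args.gpu_list)
```

Parsing the ids into a list of integers up front means a malformed value such as `0,a` is rejected by argparse with a usage error before any training starts.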

Logging to TensorBoard

The training script logs test results to TensorBoard along with loss, accuracy, and other metrics, so you can monitor model performance visually.

Examples

See the Jupyter Notebook examples for usage.

  • Before
내가그린기린그림은긴기린그린그림이고,네가그린기린그림은길지않은기린그린그림이다.
영국의철학자인화이트헤드는"서양의2000년철학은모두플라톤의각주에불과하다"라고말했으며,
시인에머슨은"철학은플라톤이고,플라톤은철학"이라평하였는데,플라톤은소크라테스의수제자이다. 
플라톤이20대인시절,스승소크라테스가민주주의에의해끝내사형당하는것을보고크게분개했으며, 
이는그의귀족주의"철인정치"지지의큰계기가되었다.
  • After
내가 그린 기린 그림은 긴 기린 그린 그림이고, 네가 그린 기린 그림은 길지 않은 기린 그린 그림이다.
영국의 철학자인 화이트헤드는 "서양의 2000년 철학은 모두 플라톤의 각주에 불과하다"라고 말했으며,
시인 에머슨은 "철학은 플라톤이고, 플라톤은 철학"이라 평하였는데, 플라톤은 소크라테스의 수제자이다. 
플라톤이 20대인 시절, 스승 소크라테스가 민주주의에 의해 끝내 사형당하는 것을 보고 크게 분개했으며, 
이는 그의 귀족주의 "철인정치"지지의 큰 계기가 되었다.
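
A Before/After pair like the one above can be described as one label per character: 1 if a space follows the character, 0 otherwise. The README does not say how awesome-spacer encodes its targets, so the sketch below is only a common framing of the word spacing task, and `to_labels` and `apply_labels` are hypothetical helpers, not part of the project.

```python
def to_labels(spaced):
    """Split a spaced sentence into characters and per-character labels.

    labels[i] == 1 means a space follows chars[i] in the original sentence.
    """
    chars, labels = [], []
    for ch in spaced:
        if ch == " ":
            # Mark the previous character as followed by a space.
            if labels:
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return chars, labels


def apply_labels(chars, labels):
    """Rebuild a spaced sentence from characters and space labels."""
    out = []
    for ch, label in zip(chars, labels):
        out.append(ch)
        if label == 1:
            out.append(" ")
    return "".join(out).rstrip()


spaced = "내가 그린 기린 그림은"
chars, labels = to_labels(spaced)
unspaced = "".join(chars)               # the "Before" form
restored = apply_labels(chars, labels)  # the "After" form
```

Under this framing, a model only has to predict the label sequence for the unspaced input; `apply_labels` then turns the predictions back into spaced text.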

Workshop Materials
