sngjuk / Meme Glossary
meme-glossary
- Retrieve meme images with query-sentence embeddings over zmq.
- Generate memes from comics.
python3 is required.
Install :
Client only usage :
git clone https://github.com/sngjuk/meme-glossary.git
./install.sh client
Full usage :
git clone --recurse-submodules https://github.com/sngjuk/meme-glossary.git
./install.sh all
Usage :
Please check the ./example folder.
Client :
import client
mc = client.MgClient(ip='localhost', port=5555)
# Query with sentence.
mc.dank(['Nice to meet you'], max_img=3, min_sim=0.15)
# Random meme
mc.random()
# Save as a file.
mc.save_meme(img_data, 'image.jpg')
Server :
./app.py --model_path=model.bin --meme_dir=meme_dir --xml_dir=xml_dir --vec_path=meme_voca.vec
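Under the hood, the server embeds the query sentence and ranks memes by cosine similarity against the vectors stored in the .vec file; min_sim drops weak matches and max_img caps the result count. A minimal sketch of that ranking step in pure Python (toy vectors standing in for sent2vec embeddings; not the actual server code):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_memes(query_vec, meme_vecs, max_img=3, min_sim=0.15):
    # meme_vecs: {"episode/filename": vector}, as produced by xml2vec.py.
    scored = [(name, cosine(query_vec, vec)) for name, vec in meme_vecs.items()]
    scored = [(n, s) for n, s in scored if s >= min_sim]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:max_img]

# Toy vectors; a real table comes from the sentence embedding model.
vecs = {
    "ep1/greet.jpg": [1.0, 0.1],
    "ep1/angry.jpg": [-1.0, 0.2],
    "ep2/hi.jpg": [0.9, 0.3],
}
print(rank_memes([1.0, 0.0], vecs))
```

The negative-similarity meme falls below min_sim and is dropped; the rest come back best-first.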
Example : (check the ./example folder)
Prepare Memes from a Comic Book.
1. Crawl comics from the web. (Please find your own source for memes; this script crawls Korean comics.)
Output : Comic book image files. (1_original_comics)
prepare_memes/comics_crawler.py
2. Cut comic book into scenes.
Input : Comic book image files. (1_original_comics)
Output : Cut Scenes. (2_kumiko_cut_meme)
prepare_memes/cutter.py --kumiko=/prepare_memes/kumiko --meme_dir=1_original_comics --out_dir=2_kumiko_cut_meme
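kumiko does the real panel extraction. As a toy illustration of the underlying idea only, here is the crudest possible splitter, which cuts a grayscale page wherever a fully blank gutter row occurs (a hypothetical helper, far simpler than what kumiko actually does):

```python
def split_on_gutters(image, blank=255):
    # image: 2D list of grayscale pixel rows (255 = white).
    # Split into horizontal strips wherever a run of fully blank rows occurs.
    panels, current = [], []
    for row in image:
        if all(px == blank for px in row):
            if current:
                panels.append(current)
                current = []
        else:
            current.append(row)
    if current:
        panels.append(current)
    return panels

# Two 2-row "panels" separated by a white gutter row.
page = [[0, 0], [10, 20], [255, 255], [30, 30], [0, 5]]
print(len(split_on_gutters(page)))  # 2
```

Real pages need contour detection and vertical cuts too, which is why step 2 delegates to kumiko.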
3. Filter error cuts manually. (A GUI environment is recommended.)
Input : Cut Scenes. (2_kumiko_cut_meme)
Output : Manually filtered memes. (3_manual_filtered_meme)
4-1. Label with the Google Cloud Vision API. (Please check --lang_hint and the pricing policy on this repo's wiki page.)
Input : Manually filtered memes. (3_manual_filtered_meme)
Output : Meme label xml. (4_label_xml)
prepare_memes/auto_labeler.py --meme_dir=3_manual_filtered_meme --output_dir=4_label_xml --lang_hint=' '
4-2. Or label manually.
prepare_memes/manual_labeler.py --meme_dir=3_manual_filtered_meme --output_dir=4_label_xml
4-3. Or label with RectLabel. (The xml format is shareable with RectLabel.)
https://rectlabel.com/
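Whichever labeler you use, the output is one xml file per meme mapping the image to its caption sentences. The exact schema used by the labelers isn't shown here, so the element names below (meme, filename, labels, label) are assumptions; this is just a minimal sketch of writing one label file with the stdlib:

```python
import xml.etree.ElementTree as ET

def write_label_xml(image_name, sentences, out_path):
    # Hypothetical schema: <meme><filename>..</filename><labels><label>..</label></labels></meme>
    root = ET.Element("meme")
    ET.SubElement(root, "filename").text = image_name
    labels = ET.SubElement(root, "labels")
    for s in sentences:
        ET.SubElement(labels, "label").text = s
    ET.ElementTree(root).write(out_path, encoding="unicode")

write_label_xml("001.jpg", ["Nice to meet you"], "001.xml")
```

Whatever the real schema is, it only needs to be consistent, since xml2vec.py is the sole consumer in step 5.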
5. Generate .vec for similarity search. {episode/filename : vector}
Input : Meme label xml. (4_label_xml), Sentence embedding model. (model.bin) - please check below.
Output : .vec file for similarity search. (5_meme_voca.vec)
prepare_memes/xml2vec.py --model_path=model.bin --xml_dir=4_label_xml --vec_path=5_meme_voca.vec
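Conceptually, this step embeds each meme's label sentences and stores one lookup vector per meme. A common choice when a meme has several labels is to average their sentence vectors; whether xml2vec.py does exactly that is an assumption, and the embed function below is a toy stand-in for the sent2vec model:

```python
def build_vec_table(labels, embed):
    # labels: {"episode/filename": ["sentence", ...]}
    # embed: sentence -> vector; stand-in for the sent2vec model.
    # Average the sentence vectors of each meme into one lookup vector.
    table = {}
    for name, sentences in labels.items():
        vecs = [embed(s) for s in sentences]
        dim = len(vecs[0])
        table[name] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return table

# Toy "embedding": character count and word count.
toy_embed = lambda s: [float(len(s)), float(len(s.split()))]
table = build_vec_table({"ep1/001.jpg": ["Nice to meet you", "hello"]}, toy_embed)
print(table["ep1/001.jpg"])  # [10.5, 2.5]
```

The resulting {episode/filename : vector} dict is what the server loads from 5_meme_voca.vec at query time.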
Prepare Sentence Embedding Model.
Pretrained models : Pretrained Eng model
Note : To train a new sent2vec model, you first need a large training text file. This file should contain one sentence per line. The provided code does not perform tokenization or lowercasing; you have to preprocess your input data yourself.
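The preprocessing the note describes can be as simple as lowercasing and whitespace-normalizing each sentence onto its own line. A minimal sketch with a naive regex tokenizer (real pipelines typically use a proper tokenizer; this is just illustrative, and only handles Latin-script text):

```python
import re

def preprocess_line(line):
    # Lowercase and keep simple word tokens; naive stand-in for a real tokenizer.
    return " ".join(re.findall(r"[a-z0-9']+", line.lower()))

raw = ["Nice to meet you!", "MEMES are great."]
corpus = [preprocess_line(s) for s in raw]
print(corpus)  # ['nice to meet you', 'memes are great']
```

Writing `corpus` out one line per sentence yields a file in the shape sent2vec training expects.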
*You can replace the nlp model (other than sent2vec) by simply changing /server/nlp/model.py
Korean models :
- Pretrained KR model (trained on 220 MB of preprocessed Namuwiki text; due to the small amount of training data, quite a few words are unknown after training)
- Pretrained decomposed KR model (trained after jamo decomposition; better performance than the model above, but the same OOV problem)
*To use jamo-decomposed queries, pass the --lang=ko option to xml2vec.py and app.py.
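The decomposed model requires splitting each precomposed Hangul syllable into its jamo before embedding. The standard way uses the arithmetic layout of the Unicode Hangul Syllables block (whether the repo's --lang=ko path uses exactly this scheme is an assumption):

```python
# Compatibility jamo tables for initial, medial, and final positions.
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(text):
    # Split precomposed Hangul syllables (U+AC00..U+D7A3) into jamo;
    # pass every other character through unchanged.
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code <= 0xD7A3 - 0xAC00:
            out.append(CHO[code // 588])           # 588 = 21 medials * 28 finals
            out.append(JUNG[(code % 588) // 28])
            jong = JONG[code % 28]
            if jong:                               # index 0 = no final consonant
                out.append(jong)
        else:
            out.append(ch)
    return "".join(out)

print(decompose("한국"))  # ㅎㅏㄴㄱㅜㄱ
```

Decomposing both the training corpus and the incoming queries the same way is what keeps the .vec lookup consistent, hence the --lang=ko flag on both xml2vec.py and app.py.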
Done! Execute the server :
./app.py --model_path=model.bin --meme_dir=3_manual_filtered_meme --xml_dir=4_label_xml --vec_path=5_meme_voca.vec (add --lang=ko when using the jamo-decomposed model)