
lovit / Textmining Tutorial

(Korean) Study materials for text mining

Projects that are alternatives to or similar to Textmining Tutorial

Improved Seam Carving
A numpy implementation of forward energy from the paper “Improved Seam Carving for Video Retargeting" (2008)
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Qml Mooc
Lecture notebooks and coding assignments for the quantum machine learning MOOC created by Peter Wittek on EdX in the Spring 2019
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Zanzibar Aerial Mapping
Open source notebooks to create state-of-the-art detection, segmentation, & classification of buildings on drone/aerial imagery with deep learning
Stars: ✭ 153 (+0%)
Mutual labels:  jupyter-notebook
Pytorch Pose Estimation
PyTorch Implementation of Realtime Multi-Person Pose Estimation project.
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Alphalens
Performance analysis of predictive (alpha) stock factors
Stars: ✭ 2,130 (+1292.16%)
Mutual labels:  jupyter-notebook
Python 100 Days Master
Python 100-day study materials
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Stanford Cs229
Python solutions to the problem sets of Stanford's graduate course on Machine Learning, taught by Prof. Andrew Ng
Stars: ✭ 151 (-1.31%)
Mutual labels:  jupyter-notebook
Predict Remaining Useful Life
Predict remaining useful life of a component based on historical sensor observations using automated feature engineering
Stars: ✭ 153 (+0%)
Mutual labels:  jupyter-notebook
Pyspark Pictures
Learn the pyspark API through pictures and simple examples
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Sentiment analysis
This is the code for "Sentiment Analysis - Data Lit #1" by Siraj Raval on Youtube
Stars: ✭ 153 (+0%)
Mutual labels:  jupyter-notebook
Ssd keras
A concise SSD object detection model, Keras version (traffic sign recognition; see the dev branch for the training part)
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Cognitive Vision Python
Jupyter Notebook with Python samples for the Cognitive Services Computer Vision API
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Mish
Mish Deep Learning Activation Function for PyTorch / FastAI
Stars: ✭ 153 (+0%)
Mutual labels:  jupyter-notebook
3 Min Pytorch
Example code for the book <펭귄브로의 3분 딥러닝, 파이토치맛> (Penguin Bro's 3-Minute Deep Learning, PyTorch Flavor)
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Ar Depth
Fast Depth Densification for Occlusion-Aware Augmented Reality
Stars: ✭ 153 (+0%)
Mutual labels:  jupyter-notebook
Netgan
Implementation of the paper "NetGAN: Generating Graphs via Random Walks".
Stars: ✭ 152 (-0.65%)
Mutual labels:  jupyter-notebook
Pytorch stylegan encoder
Pytorch implementation of a StyleGAN encoder. Images to latent space representation.
Stars: ✭ 151 (-1.31%)
Mutual labels:  jupyter-notebook
Pyiron
pyiron - an integrated development environment (IDE) for computational materials science.
Stars: ✭ 153 (+0%)
Mutual labels:  jupyter-notebook
Suite2p
cell detection in calcium imaging recordings
Stars: ✭ 153 (+0%)
Mutual labels:  jupyter-notebook
Code For Learn Machinelearning
Stars: ✭ 153 (+0%)
Mutual labels:  jupyter-notebook

(Korean) Tutorials for text mining

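These are materials for studying text mining. They include natural language processing and machine learning material that applies to any language, as well as material specifically for analyzing Korean text.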

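  • This material is a work in progress and includes slides and Jupyter notebook example code.
  • This material uses the soynlp package, a natural language processing library for Korean text analysis. soynlp is also a work in progress.
  • Text explaining the slide contents is being posted on the blog.
  • Hands-on code is in the code repository.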

Contents

  1. Python basics
    1. Jupyter tutorial
  2. From text to vector (KoNLPy)
    1. [x] n-gram
    2. [x] from text to vector using KoNLPy
  3. Word extraction and tokenization (Korean)
    1. [x] word extractor
    2. [x] unsupervised tokenizer
    3. [x] noun extractor
    4. [x] dictionary based pos tagger
  4. Document classification
    1. [x] Logistic Regression and Lasso regression
    2. [x] SVM (linear, RBF)
    3. [x] k-nearest neighbors classifier
    4. [x] Feed-forward neural network
    5. [x] Decision Tree
    6. [x] Naive Bayes
  5. Sequential labeling
    1. [x] Conditional Random Field
  6. Embedding for representation
    1. [x] Word2Vec / Doc2Vec
    2. [x] GloVe
    3. [x] FastText (word embedding using subword)
    4. [x] FastText (supervised word embedding)
    5. [x] Sparse Coding
    6. [x] Nonnegative Matrix Factorization (NMF) for topic modeling
  7. Embedding for vector visualization
    1. [x] MDS, ISOMAP, Locally Linear Embedding, PCA, Kernel PCA
    2. [x] t-SNE
    3. [ ] t-SNE (detailed)
  8. Keyword / Related words analysis
    1. [x] co-occurrence based keyword / related word analysis
  9. Document clustering
    1. [x] k-means is good for document clustering
    2. [x] DBSCAN, hierarchical clustering, GMM, and BGMM are not appropriate for document clustering
  10. Finding similar documents (neighbor search)
    1. [x] Random Projection
    2. [x] Locality Sensitive Hashing
    3. [x] Inverted Index
  11. Graph similarity and ranking (centrality)
    1. [x] SimRank & Random Walk with Restart
    2. [x] PageRank, HITS, WordRank, TextRank
    3. [x] kr-wordrank keyword extraction
  12. String similarity
    1. [x] Levenshtein / Cosine / Jaccard distance
  13. Convolutional Neural Network (CNN)
    1. [x] Introduction to CNN
    2. [x] Word-level CNN for sentence classification (Yoon Kim)
    3. [x] Character-level CNN (LeCun)
    4. [x] BOW-CNN
  14. Recurrent Neural Network (RNN)
    1. [x] Introduction to RNN
    2. [x] LSTM, GRU
    3. [x] Deep RNN & ELMo
    4. [x] Sequence to sequence & seq2seq with attention
    5. [x] Skip-thought vector
    6. [x] Attention mechanism for sentence classification
    7. [x] Hierarchical Attention Network (HAN) for document classification
    8. [x] Transformer & BERT
  15. Applications
    1. [x] soyspacing: heuristic Korean space correction
    2. [x] CRF-based Korean space correction
    3. [x] HMM & CRF-based part-of-speech tagger (morphological analyzer)
    4. [ ] semantic movie search using IMDB
  16. TBD
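As a small illustration of two of the topics above (n-grams in section 2 and string similarity in section 12), here is a minimal pure-Python sketch. It does not use soynlp or KoNLPy, and all function names are hypothetical. Computing the Jaccard and cosine distances over character bigrams is one common choice and an assumption here; the tutorial notebooks may define these distances differently.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Character n-grams of `text` (spaces are kept as ordinary characters)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n=2):
    """Word n-grams, as tuples, from an already tokenized sentence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def jaccard_distance(a, b, n=2):
    """1 - |A ∩ B| / |A ∪ B| over the sets of character n-grams (assumption: n=2)."""
    sa, sb = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not sa and not sb:
        return 0.0
    return 1 - len(sa & sb) / len(sa | sb)


def cosine_distance(a, b, n=2):
    """1 - cosine similarity of character n-gram count vectors (assumption: n=2)."""
    ca, cb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    if norm == 0:
        # Both empty: identical; exactly one empty: maximally distant.
        return 0.0 if not ca and not cb else 1.0
    return 1 - dot / norm
```

For example, `levenshtein("kitten", "sitting")` returns 3 and `jaccard_distance("abcd", "abce")` returns 0.5. For Korean, these functions compare whole syllable blocks (e.g. `levenshtein("아이고", "아이쿠")` is 1); decomposing syllables into jamo first would give finer-grained distances.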

Thanks to

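Many colleagues kindly review these materials and discuss them with me, and I am grateful to all of them. Special thanks to 태욱, who has put a great deal of time and care into helping.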
