jeongukjae / python-mecab

License: BSD-3-Clause
Python 3.5+ bindings for mecab, built without swig or pybind. (No longer maintained.)

This project has been moved to https://github.com/jeongukjae/mecab-bind


python-mecab

Python 3.5+ bindings for mecab, built without swig or pybind.

Supports Linux and macOS only.

Original source code: taku910/mecab

Installation

pip install python-mecab

Usage

Tagger

The example below uses the eunjeon/mecab-ko-dic dictionary.

>>> from mecab import Tagger
>>> tagger = Tagger() # or Tagger('path/to/dic')
>>> tagger.parse("안녕하세요. 이 프로젝트는 python-mecab입니다.")
(('안녕', 'NNG,행위,T,안녕,*,*,*,*'), ('하', 'XSV,*,F,하,*,*,*,*'), ('세요', 'EP+EF,*,F,세요,Inflect,EP,EF,시/EP/*+어요/EF/*'), ('.', 'SF,*,*,*,*,*,*,*'), ('이', 'MM,~명사,F,이,*,*,*,*'), ('프로젝트', 'NNG,*,F,프로젝트,*,*,*,*'), ('는', 'JX,*,T,는,*,*,*,*'), ('python', 'SL,*,*,*,*,*,*,*'), ('-', 'SY,*,*,*,*,*,*,*'), ('mecab', 'SL,*,*,*,*,*,*,*'), ('입니다', 'VCP+EF,*,F,입니다,Inflect,VCP,EF,이/VCP/*+ᄇ니다/EF/*'), ('.', 'SF,*,*,*,*,*,*,*'))
>>> parsed = tagger.parse("안녕하세요. 이 프로젝트는 python-mecab입니다.")
>>> print(*parsed, sep='\n')
('안녕', 'NNG,행위,T,안녕,*,*,*,*')
('하', 'XSV,*,F,하,*,*,*,*')
('세요', 'EP+EF,*,F,세요,Inflect,EP,EF,시/EP/*+어요/EF/*')
('.', 'SF,*,*,*,*,*,*,*')
('이', 'MM,~명사,F,이,*,*,*,*')
('프로젝트', 'NNG,*,F,프로젝트,*,*,*,*')
('는', 'JX,*,T,는,*,*,*,*')
('python', 'SL,*,*,*,*,*,*,*')
('-', 'SY,*,*,*,*,*,*,*')
('mecab', 'SL,*,*,*,*,*,*,*')
('입니다', 'VCP+EF,*,F,입니다,Inflect,VCP,EF,이/VCP/*+ᄇ니다/EF/*')
('.', 'SF,*,*,*,*,*,*,*')
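Each item returned by parse is a (surface, features) pair, and the features string is comma-separated. The sketch below shows one way to pull the part-of-speech tag out of each pair. It assumes the first field is the tag, which matches the mecab-ko-dic output above but may differ with other dictionaries. The helper names pos_tags and nouns are hypothetical and not part of python-mecab, and the sample rows are copied from the output above so that the snippet runs without mecab installed.

```python
# Rows copied from the tagger.parse(...) output shown above.
SAMPLE = (
    ('안녕', 'NNG,행위,T,안녕,*,*,*,*'),
    ('하', 'XSV,*,F,하,*,*,*,*'),
    ('세요', 'EP+EF,*,F,세요,Inflect,EP,EF,시/EP/*+어요/EF/*'),
    ('.', 'SF,*,*,*,*,*,*,*'),
)


def pos_tags(parsed):
    """Return (surface, tag) pairs.

    Assumption: the tag is the first comma-separated field of the
    features string, as in the mecab-ko-dic output above.
    """
    return [(surface, features.split(',', 1)[0]) for surface, features in parsed]


def nouns(parsed):
    """Return surfaces whose tag starts with 'NN' (hypothetical noun filter)."""
    return [surface for surface, tag in pos_tags(parsed) if tag.startswith('NN')]


if __name__ == '__main__':
    print(pos_tags(SAMPLE))
    print(nouns(SAMPLE))
```

With a real tagger, the same helpers could be called as pos_tags(tagger.parse(text)).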

Bound CLI commands

  • mecab
  • mecab-dict-index
  • mecab-dict-gen
  • mecab-test-gen
  • mecab-cost-train
  • mecab-system-eval
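
To check which of the commands listed above are reachable after installation, a short script like the following may help. It is only a sketch: it assumes pip placed the scripts in a directory on PATH, and the helper name find_commands is hypothetical. It uses the standard library only and does not run the commands.

```python
import shutil

# Command names taken from the list above.
COMMANDS = (
    "mecab",
    "mecab-dict-index",
    "mecab-dict-gen",
    "mecab-test-gen",
    "mecab-cost-train",
    "mecab-system-eval",
)


def find_commands(names=COMMANDS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_commands().items():
        print(f"{name}: {path or 'not found on PATH'}")
```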