jeongukjae / python-mecab

License: BSD-3-Clause

Python 3.5+ bindings for MeCab, built without SWIG or pybind. (No longer maintained.)

Programming Languages

C++, Python, Perl

Projects that are alternatives to or similar to python-mecab

Ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).

Stars: ✭ 433 (+1503.7%)

Mutual labels: tokenizer, text-processing

ArabicProcessingCog

A Python package that performs stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

Stars: ✭ 19 (-29.63%)

Mutual labels: tokenizer, text-processing

Text-Classification-LSTMs-PyTorch

This repository shows a baseline model for text classification, implemented as an LSTM-based model in PyTorch. To make the model easier to understand, it is applied to a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (+66.67%)

Mutual labels: tokenizer, text-processing

Open Korean Text

Open Korean Text Processor - An Open-source Korean Text Processor

Stars: ✭ 438 (+1522.22%)

Mutual labels: tokenizer, text-processing

dif

'dif' is a Linux preprocessing front end to gvimdiff/meld/kompare

Stars: ✭ 18 (-33.33%)

Mutual labels: text-processing

Kawazu

A C# library for converting Japanese sentences to Hiragana, Katakana, or Romaji, with furigana and okurigana modes supported. Inspired by the Kuroshiro project.

Stars: ✭ 33 (+22.22%)

Mutual labels: mecab

frangipanni

Program to convert lines of text into a tree structure.

Stars: ✭ 1,176 (+4255.56%)

Mutual labels: text-processing

perke

A keyphrase extractor for Persian

Stars: ✭ 60 (+122.22%)

Mutual labels: text-processing

Emotion-recognition-from-tweets

A comprehensive approach to recognizing emotion (sentiment) in a tweet, using supervised machine learning.

Stars: ✭ 17 (-37.04%)

Mutual labels: text-processing

s3-concat

Concatenate Amazon S3 files remotely using flexible patterns

Stars: ✭ 32 (+18.52%)

Mutual labels: text-processing

corpusexplorer2.0

Corpus linguistics has never been so easy...

Stars: ✭ 16 (-40.74%)

Mutual labels: text-processing

suika

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby

Stars: ✭ 31 (+14.81%)

Mutual labels: tokenizer

estratto

Parsing the content of fixed-width files made easy

Stars: ✭ 12 (-55.56%)

Mutual labels: text-processing

text2video

Text to Video Generation Problem

Stars: ✭ 28 (+3.7%)

Mutual labels: text-processing

SuperCombinators

[Deprecated] A Swift parser combinator framework

Stars: ✭ 19 (-29.63%)

Mutual labels: text-processing

ConTexto

Python library for text mining and NLP

Stars: ✭ 43 (+59.26%)

Mutual labels: text-processing

xontrib-output-search

Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.

Stars: ✭ 26 (-3.7%)

Mutual labels: tokenizer

chinese-tokenizer

Tokenizes Chinese texts into words.

Stars: ✭ 72 (+166.67%)

Mutual labels: tokenizer

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+2533.33%)

Mutual labels: text-preprocessing

mecab-python-msvc

mecab-python for mecab-ko-msvc

Stars: ✭ 23 (-14.81%)

Mutual labels: mecab

mecab-bind

python-mecab

Python 3.5+ bindings for MeCab, built without SWIG or pybind.

Supports Linux and macOS only.

Original source code: taku910/mecab

Installation

pip install python-mecab

Usage

Tagger

The example below uses the eunjeon/mecab-ko-dic dictionary.

>>> from mecab import Tagger
>>> tagger = Tagger() # or Tagger('path/to/dic')
>>> tagger.parse("안녕하세요. 이 프로젝트는 python-mecab입니다.")
(('안녕', 'NNG,행위,T,안녕,*,*,*,*'), ('하', 'XSV,*,F,하,*,*,*,*'), ('세요', 'EP+EF,*,F,세요,Inflect,EP,EF,시/EP/*+어요/EF/*'), ('.', 'SF,*,*,*,*,*,*,*'), ('이', 'MM,~명사,F,이,*,*,*,*'), ('프로젝트', 'NNG,*,F,프로젝트,*,*,*,*'), ('는', 'JX,*,T,는,*,*,*,*'), ('python', 'SL,*,*,*,*,*,*,*'), ('-', 'SY,*,*,*,*,*,*,*'), ('mecab', 'SL,*,*,*,*,*,*,*'), ('입니다', 'VCP+EF,*,F,입니다,Inflect,VCP,EF,이/VCP/*+ᄇ니다/EF/*'), ('.', 'SF,*,*,*,*,*,*,*'))
>>> parsed = tagger.parse("안녕하세요. 이 프로젝트는 python-mecab입니다.")
>>> print(*parsed, sep='\n')
('안녕', 'NNG,행위,T,안녕,*,*,*,*')
('하', 'XSV,*,F,하,*,*,*,*')
('세요', 'EP+EF,*,F,세요,Inflect,EP,EF,시/EP/*+어요/EF/*')
('.', 'SF,*,*,*,*,*,*,*')
('이', 'MM,~명사,F,이,*,*,*,*')
('프로젝트', 'NNG,*,F,프로젝트,*,*,*,*')
('는', 'JX,*,T,는,*,*,*,*')
('python', 'SL,*,*,*,*,*,*,*')
('-', 'SY,*,*,*,*,*,*,*')
('mecab', 'SL,*,*,*,*,*,*,*')
('입니다', 'VCP+EF,*,F,입니다,Inflect,VCP,EF,이/VCP/*+ᄇ니다/EF/*')
('.', 'SF,*,*,*,*,*,*,*')
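
Each item returned by parse is a (surface, feature) pair, where the feature is a single comma-separated string. The following is a minimal sketch of splitting that string into fields so the part-of-speech tag is easy to filter on. The names Morpheme and to_morpheme are hypothetical helpers, not part of python-mecab, and the sample data is copied from the output above so the snippet runs without MeCab installed. It assumes that no feature field contains a comma itself.

```python
from collections import namedtuple

# Hypothetical container for illustration; not part of python-mecab.
Morpheme = namedtuple("Morpheme", ["surface", "pos", "features"])


def to_morpheme(pair):
    """Split one (surface, feature) pair into a Morpheme.

    Assumes the feature fields themselves contain no commas.
    """
    surface, feature = pair
    fields = feature.split(",")
    # The first field is the part-of-speech tag; the rest are kept as-is.
    return Morpheme(surface, fields[0], tuple(fields[1:]))


# Sample pairs copied from the tagger output shown above.
parsed = (
    ('안녕', 'NNG,행위,T,안녕,*,*,*,*'),
    ('하', 'XSV,*,F,하,*,*,*,*'),
    ('세요', 'EP+EF,*,F,세요,Inflect,EP,EF,시/EP/*+어요/EF/*'),
    ('.', 'SF,*,*,*,*,*,*,*'),
    ('이', 'MM,~명사,F,이,*,*,*,*'),
    ('프로젝트', 'NNG,*,F,프로젝트,*,*,*,*'),
    ('는', 'JX,*,T,는,*,*,*,*'),
)

morphemes = [to_morpheme(pair) for pair in parsed]

# Keep only surfaces whose tag starts with "NN" (noun tags in this sample).
nouns = [m.surface for m in morphemes if m.pos.startswith("NN")]
print(nouns)
```

With a working installation, the same helper could be applied directly to the result of tagger.parse(...).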

Bound CLI commands

mecab
mecab-dict-index
mecab-dict-gen
mecab-test-gen
mecab-cost-train
mecab-system-eval
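
To confirm that the commands listed above ended up on the PATH after installation, a small check with the standard library may help. This is only a sketch: find_commands is a hypothetical helper name, and it merely looks up each command without running it.

```python
import shutil

# Command names taken from the list above.
MECAB_COMMANDS = (
    "mecab",
    "mecab-dict-index",
    "mecab-dict-gen",
    "mecab-test-gen",
    "mecab-cost-train",
    "mecab-system-eval",
)


def find_commands(names=MECAB_COMMANDS):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_commands().items():
        print(name, "->", path or "not found on PATH")
```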


This project has been moved to https://github.com/jeongukjae/mecab-bind
