
mocobeta / Janome

Licence: apache-2.0
Japanese morphological analysis engine written in pure Python

Projects that are alternatives of or similar to Janome

Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (-12.06%)
Mutual labels:  japanese-language, nlp-library
Yomichan
Japanese pop-up dictionary extension for Chrome and Firefox.
Stars: ✭ 464 (-26.35%)
Mutual labels:  japanese-language
rakutenma-python
Rakuten MA (Python version)
Stars: ✭ 15 (-97.62%)
Mutual labels:  japanese-language
Giveme5w1h
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
Stars: ✭ 316 (-49.84%)
Mutual labels:  nlp-library
clj-duckling
Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings. (a duckling clojure fork)
Stars: ✭ 15 (-97.62%)
Mutual labels:  nlp-library
Lingua
👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Stars: ✭ 341 (-45.87%)
Mutual labels:  nlp-library
japanese-pitch-accent-resources
Trying to consolidate japanese phonetic, and in particular pitch accent resources into one list
Stars: ✭ 64 (-89.84%)
Mutual labels:  japanese-language
Awesome Japanese
Awesome Japanese learning resource
Stars: ✭ 563 (-10.63%)
Mutual labels:  japanese-language
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+3388.57%)
Mutual labels:  nlp-library
Quick Nlp
Pytorch NLP library based on FastAI
Stars: ✭ 279 (-55.71%)
Mutual labels:  nlp-library
Chatbot ner
chatbot_ner: Named Entity Recognition for chatbots.
Stars: ✭ 273 (-56.67%)
Mutual labels:  nlp-library
NLP-tools
Useful python NLP tools (evaluation, GUI interface, tokenization)
Stars: ✭ 39 (-93.81%)
Mutual labels:  nlp-library
Pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Stars: ✭ 426 (-32.38%)
Mutual labels:  nlp-library
unofficial-jisho-api
Encapsulates the official Jisho.org API and also provides kanji, example, and stroke diagram search.
Stars: ✭ 88 (-86.03%)
Mutual labels:  japanese-language
Sudachi
A Japanese Tokenizer for Business
Stars: ✭ 496 (-21.27%)
Mutual labels:  nlp-library
ebe-dataset
Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
Stars: ✭ 16 (-97.46%)
Mutual labels:  japanese-language
Contextualized Topic Models
A python package to run contextualized topic modeling. CTMs combine BERT with topic models to get coherent topics. Also supports multilingual tasks. Cross-lingual Zero-shot model published at EACL 2021.
Stars: ✭ 318 (-49.52%)
Mutual labels:  nlp-library
Pythainlp
Thai Natural Language Processing in Python.
Stars: ✭ 582 (-7.62%)
Mutual labels:  nlp-library
Nagisa
A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (-58.73%)
Mutual labels:  nlp-library
Ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (english Wikipedia, twitter - 330mil english tweets).
Stars: ✭ 433 (-31.27%)
Mutual labels:  nlp-library

======
Janome
======

.. image:: https://github.com/mocobeta/janome/workflows/Checks/badge.svg
   :target: https://github.com/mocobeta/janome/actions?query=workflow%3AChecks

.. image:: https://coveralls.io/repos/github/mocobeta/janome/badge.svg?branch=master
   :target: https://coveralls.io/github/mocobeta/janome?branch=master

.. image:: https://badges.gitter.im/org.png
   :target: https://gitter.im/janome-python

.. image:: https://img.shields.io/pypi/dm/Janome.svg
   :target: https://pypistats.org/packages/janome

.. image:: https://img.shields.io/conda/v/conda-forge/janome
   :target: https://anaconda.org/conda-forge/janome

Janome is a Japanese morphological analysis engine written in pure Python.

General documentation:

https://mocobeta.github.io/janome/en/ (English)

https://mocobeta.github.io/janome/ (Japanese)

Requirements
------------

Python 3.6+ is required.

Install
-------

[Note] The build step consumes about 500 MB of memory.

.. code:: bash

    (venv) $ python setup.py install

Run
---

.. code:: bash

    (env) $ python
    >>> from janome.tokenizer import Tokenizer
    >>> t = Tokenizer()
    >>> for token in t.tokenize(u'すもももももももものうち'):
    ...     print(token)
    ...
    すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    の	助詞,連体化,*,*,*,*,の,ノ,ノ
    うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
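
Each printed line is the token's surface form, a tab, and nine comma-separated features: four levels of part of speech, then inflection type, inflection form, base form, reading and pronunciation, with ``*`` marking an empty field. The same values are available as attributes of each token (``token.surface``, ``token.part_of_speech``, ``token.base_form``, ``token.reading`` and so on). When only the printed text is at hand, for example saved output, a minimal sketch like the one below can split it back into fields. It assumes the nine-feature IPADIC layout shown above; ``ParsedToken`` and ``parse_token_line`` are hypothetical helpers, not part of Janome.

```python
from typing import NamedTuple


class ParsedToken(NamedTuple):
    """Hypothetical container mirroring the fields of one printed token line."""
    surface: str
    part_of_speech: str  # four comma-joined levels, e.g. "名詞,一般,*,*"
    infl_type: str
    infl_form: str
    base_form: str
    reading: str
    phonetic: str


def parse_token_line(line: str) -> ParsedToken:
    """Split "surface<TAB>f1,...,f9" (assumed IPADIC layout) into named fields."""
    surface, features = line.rstrip("\n").split("\t", 1)
    fields = features.split(",")
    if len(fields) != 9:
        raise ValueError(f"expected 9 features, got {len(fields)}: {line!r}")
    return ParsedToken(
        surface=surface,
        part_of_speech=",".join(fields[0:4]),  # POS spans the first four fields
        infl_type=fields[4],
        infl_form=fields[5],
        base_form=fields[6],
        reading=fields[7],
        phonetic=fields[8],
    )


if __name__ == "__main__":
    tok = parse_token_line("うち\t名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ")
    print(tok.surface, tok.part_of_speech, tok.reading)
```

Keeping the four part-of-speech levels joined in one string matches how they appear in the printed output, so prefix checks such as ``tok.part_of_speech.startswith("名詞")`` still work.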

.. code:: bash

    (env) $ python
    >>> from janome.tokenizer import Tokenizer
    >>> from janome.analyzer import Analyzer
    >>> from janome.charfilter import *
    >>> from janome.tokenfilter import *
    >>> text = u'蛇の目はPure Pythonな形態素解析器です。'
    >>> char_filters = [UnicodeNormalizeCharFilter(), RegexReplaceCharFilter(u'蛇の目', u'janome')]
    >>> tokenizer = Tokenizer()
    >>> token_filters = [CompoundNounFilter(), POSStopFilter(['記号','助詞']), LowerCaseFilter()]
    >>> a = Analyzer(char_filters=char_filters, tokenizer=tokenizer, token_filters=token_filters)
    >>> for token in a.analyze(text):
    ...     print(token)
    ...
    janome	名詞,固有名詞,組織,*,*,*,*,*,*
    pure	名詞,固有名詞,組織,*,*,*,*,*,*
    python	名詞,一般,*,*,*,*,*,*,*
    な	助動詞,*,*,*,特殊・ダ,体言接続,だ,ナ,ナ
    形態素解析器	名詞,複合,*,*,*,*,形態素解析器,ケイタイソカイセキキ,ケイタイソカイセキキ
    です	助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
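
The ``Analyzer`` in the session above chains three stages: each char filter rewrites the raw string in order, the tokenizer splits the result, and each token filter transforms the token stream in order. The sketch below imitates that flow in plain Python so the example can be traced by hand. Everything in it is hypothetical: tokens are ``(surface, part_of_speech)`` pairs rather than Janome ``Token`` objects, ``toy_tokenize`` is a greedy lookup over a made-up ``TOY_DICT`` (Janome itself builds a lattice and picks the lowest-cost path), and ``compound_nouns``, ``pos_stop`` and ``lower_case`` only approximate ``CompoundNounFilter``, ``POSStopFilter`` and ``LowerCaseFilter``.

```python
import re
import unicodedata
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical token type: (surface, part_of_speech), standing in for Janome's Token.
Tok = Tuple[str, str]

# Made-up dictionary for the toy tokenizer; not Janome's IPADIC data.
TOY_DICT = {
    "janome": "名詞,固有名詞", "は": "助詞,係助詞", "Pure": "名詞,固有名詞",
    " ": "記号,空白", "Python": "名詞,一般", "な": "助動詞",
    "形態素": "名詞,一般", "解析": "名詞,サ変接続", "器": "名詞,接尾",
    "です": "助動詞", "。": "記号,句点",
}


def toy_tokenize(text: str) -> Iterator[Tok]:
    """Greedy longest-match dictionary lookup (not Janome's lattice search)."""
    longest = max(len(word) for word in TOY_DICT)
    i = 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in TOY_DICT:
                yield piece, TOY_DICT[piece]
                i += n
                break
        else:
            # No dictionary entry starts here: emit one unknown character.
            yield text[i], "未知語"
            i += 1


def _merge(run: List[Tok]) -> List[Tok]:
    """Join a run of two or more nouns into one '名詞,複合' token."""
    if len(run) > 1:
        return [("".join(surface for surface, _ in run), "名詞,複合")]
    return list(run)


def compound_nouns(tokens: Iterable[Tok]) -> Iterator[Tok]:
    """Rough analogue of CompoundNounFilter: merge consecutive nouns."""
    run: List[Tok] = []
    for surface, pos in tokens:
        if pos.startswith("名詞"):
            run.append((surface, pos))
            continue
        yield from _merge(run)
        run = []
        yield surface, pos
    yield from _merge(run)  # flush a noun run at the end of the stream


def pos_stop(prefixes: List[str]) -> Callable[[Iterable[Tok]], Iterator[Tok]]:
    """Rough analogue of POSStopFilter: drop tokens whose POS starts with a prefix."""
    def apply(tokens: Iterable[Tok]) -> Iterator[Tok]:
        return (t for t in tokens if not any(t[1].startswith(p) for p in prefixes))
    return apply


def lower_case(tokens: Iterable[Tok]) -> Iterator[Tok]:
    """Rough analogue of LowerCaseFilter: lower-case each surface form."""
    return ((surface.lower(), pos) for surface, pos in tokens)


def analyze(text: str,
            char_filters: List[Callable[[str], str]],
            tokenize: Callable[[str], Iterable[Tok]],
            token_filters: List[Callable[[Iterable[Tok]], Iterable[Tok]]]) -> Iterable[Tok]:
    for char_filter in char_filters:    # 1. rewrite the raw text, in order
        text = char_filter(text)
    tokens = tokenize(text)             # 2. split the filtered text into tokens
    for token_filter in token_filters:  # 3. transform the token stream, in order
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    char_filters = [
        lambda s: unicodedata.normalize("NFKC", s),  # like UnicodeNormalizeCharFilter
        lambda s: re.sub("蛇の目", "janome", s),       # like RegexReplaceCharFilter
    ]
    token_filters = [compound_nouns, pos_stop(["記号", "助詞"]), lower_case]
    for token in analyze("蛇の目はPure Pythonな形態素解析器です。",
                         char_filters, toy_tokenize, token_filters):
        print(token)
```

Run as a script, it prints the same six surface forms as the session above (``janome``, ``pure``, ``python``, ``な``, ``形態素解析器``, ``です``), each paired with its toy POS tag. Filter order matters: if ``pos_stop`` ran before ``compound_nouns``, removing ``は`` and the space would leave ``janome``, ``Pure`` and ``Python`` adjacent, and the sketch would merge them into a single noun.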

Twitter
-------

`@janome_py <https://twitter.com/janome_py>`_

Development information for contributors
----------------------------------------

See this Wiki:

https://github.com/mocobeta/janome/wiki#for-contributors

License
-------

Janome is licensed under the Apache License 2.0 and uses the MeCab-IPADIC dictionary/statistical model.

See LICENSE.txt and NOTICE.txt for license details.

Acknowledgement
---------------

Special thanks to @ikawaha, @takuyaa, @nakagami and @janome_oekaki.

Copyright
---------

Copyright(C) 2020, Tomoko Uchida. All rights reserved.
