=======
Janome
=======

.. image:: https://github.com/mocobeta/janome/workflows/Checks/badge.svg
    :target: https://github.com/mocobeta/janome/actions?query=workflow%3AChecks

.. image:: https://coveralls.io/repos/github/mocobeta/janome/badge.svg?branch=master
    :target: https://coveralls.io/github/mocobeta/janome?branch=master

.. image:: https://badges.gitter.im/org.png
    :target: https://gitter.im/janome-python

.. image:: https://img.shields.io/pypi/dm/Janome.svg
    :target: https://pypistats.org/packages/janome

.. image:: https://img.shields.io/conda/v/conda-forge/janome
    :target: https://anaconda.org/conda-forge/janome

Janome is a Japanese morphological analysis engine written in pure Python.

General documentation:

https://mocobeta.github.io/janome/en/ (English)

https://mocobeta.github.io/janome/ (Japanese)

Requirements
============

Python 3.6+ is required.

Install
=======

[Note] Building consumes about 500 MB of memory.

.. code:: bash

  (venv) $ python setup.py install

Run
===

.. code:: bash

  (env) $ python
  >>> from janome.tokenizer import Tokenizer
  >>> t = Tokenizer()
  >>> for token in t.tokenize(u'すもももももももものうち'):
  ...     print(token)
  ...
  すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
  も	助詞,係助詞,*,*,*,*,も,モ,モ
  もも	名詞,一般,*,*,*,*,もも,モモ,モモ
  も	助詞,係助詞,*,*,*,*,も,モ,モ
  もも	名詞,一般,*,*,*,*,もも,モモ,モモ
  の	助詞,連体化,*,*,*,*,の,ノ,ノ
  うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
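
Each printed line follows a fixed layout: the surface form, a tab, then nine comma-separated features, with ``*`` marking an empty feature. As a minimal sketch that does not need Janome installed, the snippet below splits one such line into named fields. The function ``parse_token_line`` and the labels in ``FIELDS`` are hypothetical names chosen for illustration; they are not part of Janome's API.

.. code:: python

  # Illustrative labels for the nine comma-separated columns.
  # These names are assumptions made for this sketch.
  FIELDS = (
      "pos", "pos_detail1", "pos_detail2", "pos_detail3",
      "infl_type", "infl_form", "base_form", "reading", "phonetic",
  )


  def parse_token_line(line):
      """Split one printed token line into its surface form and features."""
      surface, _, features = line.partition("\t")
      values = features.split(",")
      if len(values) != len(FIELDS):
          raise ValueError(
              "expected %d features, got %d" % (len(FIELDS), len(values))
          )
      record = {"surface": surface}
      record.update(zip(FIELDS, values))
      return record


  sample = "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ"
  token = parse_token_line(sample)
  print(token["surface"], token["pos"], token["reading"])

In real code it is usually simpler to read the token object's attributes than to parse its printed form; the general documentation lists them.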

.. code:: bash

  (env) $ python
  >>> from janome.tokenizer import Tokenizer
  >>> from janome.analyzer import Analyzer
  >>> from janome.charfilter import *
  >>> from janome.tokenfilter import *
  >>> text = u'蛇の目はPure Ｐｙｔｈｏｎな形態素解析器です。'
  >>> char_filters = [UnicodeNormalizeCharFilter(), RegexReplaceCharFilter(u'蛇の目', u'janome')]
  >>> tokenizer = Tokenizer()
  >>> token_filters = [CompoundNounFilter(), POSStopFilter(['記号','助詞']), LowerCaseFilter()]
  >>> a = Analyzer(char_filters=char_filters, tokenizer=tokenizer, token_filters=token_filters)
  >>> for token in a.analyze(text):
  ...     print(token)
  ...
  janome	名詞,固有名詞,組織,*,*,*,*,*,*
  pure	名詞,固有名詞,組織,*,*,*,*,*,*
  python	名詞,一般,*,*,*,*,*,*,*
  な	助動詞,*,*,*,特殊・ダ,体言接続,だ,ナ,ナ
  形態素解析器	名詞,複合,*,*,*,*,形態素解析器,ケイタイソカイセキキ,ケイタイソカイセキキ
  です	助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
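
To see why the output begins with ``janome``, ``pure`` and ``python``, it can help to reproduce the character-filter stage with the standard library alone. The sketch below is an approximation rather than Janome's implementation: it assumes ``UnicodeNormalizeCharFilter`` applies NFKC normalization by default and that ``RegexReplaceCharFilter`` behaves like ``re.sub``. The helper name ``apply_char_filters`` is hypothetical.

.. code:: python

  import re
  import unicodedata


  def apply_char_filters(text):
      """Approximate the two character filters from the example above."""
      # Assumed equivalent of UnicodeNormalizeCharFilter(): NFKC folds
      # full-width Latin letters such as 'Ｐｙｔｈｏｎ' into ASCII 'Python'.
      normalized = unicodedata.normalize("NFKC", text)
      # Assumed equivalent of RegexReplaceCharFilter('蛇の目', 'janome').
      return re.sub("蛇の目", "janome", normalized)


  filtered = apply_char_filters("蛇の目はPure Ｐｙｔｈｏｎな形態素解析器です。")
  print(filtered)

Lowercasing and the removal of symbols and particles happen later, in the token filters, so they are not reflected in this string.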

Twitter
=======

`@janome_py <https://twitter.com/janome_py>`_

Development information for contributors
========================================

See this Wiki:

https://github.com/mocobeta/janome/wiki#for-contributors

License
=======

Licensed under Apache License 2.0 and uses the MeCab-IPADIC dictionary/statistical model.

See LICENSE.txt and NOTICE.txt for license details.

Acknowledgement
===============

Special thanks to @ikawaha, @takuyaa, @nakagami and @janome_oekaki.

Copyright
=========
