
A complete overview of the JavaScript landscape in 2016: trends about front-end and node.js frameworks, tooling... Available in English, Japanese and Chinese.

Nadesiko3

Japanese Programming Language Nadesiko v3 (JavaScript)

Nodejs Ja

Node.js Japanese localization

Ichiran

Linguistic tools for texts in Japanese language

Epub Manga Creator

A web GUI for creating Japanese EPUB manga.

The Tab Of Words

A minimal Chrome / Firefox extension to help you learn Japanese words in each new tab.

Posuto

🏣📮〠 Japanese postal code data.

Qolibri

Continuation of the qolibri EPWING dictionary/book reader

Cutlet

Japanese to romaji converter in Python

Awesome Bert Japanese

📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information

Languagepod101 Scraper

Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨

Konoha

🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.

Fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

Textlint Rule Preset Jtf Style

JTF Japanese Standard Style Guide (JTF日本語標準スタイルガイド) for textlint.



====================
Japanese NLP Library
====================

.. sectnum::

.. contents::

Requirements
============

Third Party Dependencies
  - Cabocha Japanese Morphological parser http://sourceforge.net/projects/cabocha/

Python Dependencies
  - Python 2.6.* or above

Links
=====

All code is in the jProcessing repo on GitHub_.

.. _GitHub: https://github.com/kevincobain2000/jProcessing

Documentation_, HomePage_ and Sphinx_ docs

.. _Documentation: http://www.jaist.ac.jp/~s1010205/jnlp

.. _HomePage: http://www.jaist.ac.jp/~s1010205/

.. _Sphinx: http://readthedocs.org/docs/jprocessing/en/latest/

PyPi_ Python Package

.. _PyPi: http://pypi.python.org/pypi/jProcessing/0.1

Clone the repository::

    git clone git@github.com:kevincobain2000/jProcessing.git

Install
=======

In Terminal ::

    bash$ python setup.py install

History
=======

0.2
  + Sentiment Analysis of Japanese Text

0.1
  + Morphologically tokenize Japanese sentences
  + Kanji / Hiragana / Katakana to Romaji converter
  + Edict dictionary search - borrowed
  + Edict examples search - incomplete
  + Sentence similarity between two Japanese sentences
  + Run Cabocha (ISO-8859-1 configured) in Python
  + Longest common string between sentences
  + Kanji to Katakana pronunciation
  + Hiragana, Katakana chart parser

Libraries and Modules
=====================

Tokenize `jTokenize.py`
-----------------------

In Python ::

    >>> from jNlp.jTokenize import jTokenize
    >>> input_sentence = u'私は彼を５日前、つまりこの前の金曜日に駅で見かけた'
    >>> list_of_tokens = jTokenize(input_sentence)
    >>> print list_of_tokens
    >>> print '--'.join(list_of_tokens).encode('utf-8')

Returns::

    ... [u'\u79c1', u'\u306f', u'\u5f7c', u'\u3092', u'\uff15'...]
    ... 私--は--彼--を--５--日--前--、--つまり--この--前--の--金曜日--に--駅--で--見かけ--た

Katakana Pronunciation::

    >>> from jNlp.jTokenize import jReads
    >>> print '--'.join(jReads(input_sentence)).encode('utf-8')
    ... ワタシ--ハ--カレ--ヲ--ゴ--ニチ--マエ--、--ツマリ--コノ--マエ--ノ--キンヨウビ--ニ--エキ--デ--ミカケ--タ

Cabocha `jCabocha.py`
---------------------

Run Cabocha_ configured with its original EUC-JP or ISO-8859-1 encoding from UTF-8 Python code.

.. _Cabocha: http://code.google.com/p/cabocha/

If CaboCha is configured for UTF-8 instead, see http://nltk.googlecode.com/svn/trunk/doc/book-jp/ch12.html#cabocha

.. code-block:: python

    from jNlp.jCabocha import cabocha
    print cabocha(input_sentence).encode('utf-8')

Output:

.. code-block:: xml

    私は彼を５日前、
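
CaboCha runs as an external command, so calling it from UTF-8 Python code mostly comes down to encoding the input in the charset CaboCha was configured with and decoding its output the same way. The sketch below shows that idea in Python 3. ``run_cabocha`` is a hypothetical helper, not the ``jNlp.jCabocha.cabocha`` function; it assumes a ``cabocha`` binary on the ``PATH`` and that ``-f3`` selects CaboCha's XML output (check ``cabocha --help`` for your build).

.. code-block:: python

    import subprocess


    def run_cabocha(sentence, encoding="euc_jp", cmd=("cabocha", "-f3")):
        """Pipe one sentence through CaboCha and return its decoded output.

        Hypothetical helper (not jNlp's API). ``encoding`` must match the
        charset CaboCha was configured with, e.g. "euc_jp" or "utf-8".
        """
        # Encode the Unicode input in the parser's charset; characters the
        # charset cannot represent are replaced instead of raising.
        payload = sentence.encode(encoding, errors="replace")
        # check=True raises CalledProcessError if the command exits non-zero.
        result = subprocess.run(
            list(cmd), input=payload, stdout=subprocess.PIPE, check=True
        )
        # Decode the raw bytes back into a Unicode string.
        return result.stdout.decode(encoding, errors="replace")

For example, ``print(run_cabocha(u'私は彼を５日前、'))`` would print the parse, provided CaboCha is installed and configured for EUC-JP. Passing the command as a parameter also makes it easy to swap in a different binary or output flag.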

Kanji / Katakana / Hiragana to Tokenized Romaji `jConvert.py`
---------------------------------------------------------------

Uses data/katakanaChart.txt and parses the chart. See katakanaChart_.

.. code-block:: python

    from jNlp.jConvert import *
    input_sentence = u'気象庁が２１日午前４時４８分、発表した天気概況によると、'
    print ' '.join(tokenizedRomaji(input_sentence))
    print tokenizedRomaji(input_sentence)

.. code-block:: python

    ...kisyoutyou ga ni ichi nichi gozen yon ji yon hachi hun hapyou si ta tenki gaikyou ni yoru to
    ...[u'kisyoutyou', u'ga', u'ni', u'ichi', u'nichi', u'gozen',...]
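
Chart-based conversion of this kind can be sketched as a greedy longest-match lookup. The Python 3 sketch below is an illustration, not jNlp's code: the file format of ``katakanaChart.txt`` is not reproduced here, so ``CHART`` is a tiny hypothetical dictionary using the Kunrei-style spellings visible in the output above ("si", "tyo"). Hiragana is first shifted to katakana, since the two Unicode blocks are offset by exactly 0x60. Kanji are only copied through; they would first need a kana reading (for example from ``jReads``).

.. code-block:: python

    # Hypothetical mini chart (NOT jNlp's katakanaChart.txt): katakana -> romaji
    # using the Kunrei-style spellings seen above ("si", "tyo").
    CHART = {
        "キ": "ki", "シ": "si", "ショ": "syo", "ウ": "u",
        "チ": "ti", "チョ": "tyo", "ガ": "ga",
    }


    def to_katakana(text):
        # Hiragana U+3041..U+3096 sits exactly 0x60 below katakana U+30A1..U+30F6.
        return "".join(
            chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
            for ch in text
        )


    def kana_to_romaji(text, chart=CHART):
        """Greedy longest-match transliteration of kana using a lookup chart."""
        text = to_katakana(text)
        longest = max(len(key) for key in chart)
        out, i = [], 0
        while i < len(text):
            # Try the longest keys first so digraphs like "ショ" beat "シ".
            for size in range(min(longest, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if piece in chart:
                    out.append(chart[piece])
                    i += size
                    break
            else:
                # Not in the chart (kanji, punctuation, ...): copy it unchanged.
                out.append(text[i])
                i += 1
        return "".join(out)

With this chart, ``kana_to_romaji("きしょうちょう")`` returns ``"kisyoutyou"``, the reading of 気象庁 in the output above. Matching the longest key first is what keeps a digraph such as ショ from being split into シ + ョ.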

katakanaChart.txt
~~~~~~~~~~~~~~~~~

.. _katakanaChart:

katakanaChartFile_ and hiraganaChartFile_

.. _katakanaChartFile: https://raw.github.com/kevincobain2000/jProcessing/master/src/jNlp/data/katakanaChart.txt

.. _hiraganaChartFile: https://raw.github.com/kevincobain2000/jProcessing/master/src/jNlp/data/hiraganaChart.txt

Longest Common String Japanese `jProcessing.py`
-------------------------------------------------

On English Strings ::

    >>> from jNlp.jProcessing import long_substr
    >>> a = 'Once upon a time in Italy'
    >>> b = 'There was a time in America'
    >>> print long_substr(a, b)

Output ::

    ...a time in

On Japanese Strings ::

    >>> a = u'これでアナタも冷え知らず'
    >>> b = u'これでア冷え知らずナタも'
    >>> print long_substr(a, b).encode('utf-8')

Output ::

    ...冷え知らず
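
The longest common substring problem has a standard dynamic-programming solution, sketched below in Python 3. ``longest_common_substring`` is an illustrative name; jNlp's ``long_substr`` may be implemented differently. The table entry for positions ``i`` and ``j`` holds the length of the common run ending at ``a[i-1]`` and ``b[j-1]``, and only the previous row is kept in memory.

.. code-block:: python

    def longest_common_substring(a, b):
        """Return the longest contiguous run of characters shared by a and b.

        O(len(a) * len(b)) time, O(len(b)) memory. Illustrative sketch,
        not jNlp's long_substr.
        """
        best_len, best_end = 0, 0
        # prev[j]: length of the common run ending at a[i-2] and b[j-1]
        prev = [0] * (len(b) + 1)
        for i in range(1, len(a) + 1):
            cur = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    # Extend the run that ended one character earlier in both.
                    cur[j] = prev[j - 1] + 1
                    if cur[j] > best_len:
                        best_len, best_end = cur[j], i
            prev = cur
        return a[best_end - best_len:best_end]

Applied to the two English sentences above it returns ``' a time in '`` (with the surrounding spaces), and to the two Japanese strings it returns ``'冷え知らず'``. Because it compares single characters, it works on Japanese text without any tokenization.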

Similarity between two sentences `jProcessing.py`
---------------------------------------------------

Uses MinHash to estimate the overlap between the two sentences: http://en.wikipedia.org/wiki/MinHash

:English Strings:

::

    >>> from jNlp.jProcessing import Similarities
    >>> s = Similarities()
    >>> a = 'There was'
    >>> b = 'There is'
    >>> print s.minhash(a,b)
    ...0.444444444444

:Japanese Strings:

::

    >>> from jNlp.jProcessing import *
    >>> from jNlp.jTokenize import jTokenize
    >>> s = Similarities()
    >>> a = u'これは何ですか？'
    >>> b = u'これはわからないです'
    >>> print s.minhash(' '.join(jTokenize(a)), ' '.join(jTokenize(b)))
    ...0.210526315789
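
The MinHash idea itself fits in a few lines. The Python 3 sketch below is illustrative and not jNlp's ``Similarities.minhash``: it treats each sentence as a set of tokens, simulates ``num_perm`` random permutations with seeded MD5 hashes (an assumption chosen for reproducibility), and uses the fact that two sets share the same minimum hash with probability equal to their Jaccard similarity.

.. code-block:: python

    import hashlib


    def _hash(token, seed):
        # One 64-bit hash per (seed, token); each seed acts as an
        # independent hash function standing in for a random permutation.
        digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")


    def minhash_signature(tokens, num_perm=128):
        # Keep only the smallest hash of the (non-empty) set per hash function.
        return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]


    def minhash_similarity(tokens_a, tokens_b, num_perm=128):
        """Estimate the Jaccard similarity |A & B| / |A | B| of two token sets."""
        sig_a = minhash_signature(set(tokens_a), num_perm)
        sig_b = minhash_signature(set(tokens_b), num_perm)
        # The fraction of agreeing minimums estimates the Jaccard similarity.
        return sum(x == y for x, y in zip(sig_a, sig_b)) / num_perm


    def jaccard(tokens_a, tokens_b):
        # Exact value that minhash_similarity approximates.
        a, b = set(tokens_a), set(tokens_b)
        return len(a & b) / len(a | b)

For the word sets of ``'There was'`` and ``'There is'`` the exact Jaccard similarity is 1/3, and ``minhash_similarity`` returns an estimate of it whose error shrinks as ``num_perm`` grows. jNlp's figures above need not match, since its shingling and hashing may differ. For Japanese, pass the ``jTokenize`` output as the token list.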

Edict Japanese Dictionary Search with Example sentences
=========================================================

Sample Output Demo
------------------

Edict dictionary and example sentences parser
---------------------------------------------

This package uses the EDICT_ and KANJIDIC_ dictionary files. These files are the property of the Electronic Dictionary Research and Development Group_, and are used in conformance with the Group's licence_.

.. _EDICT: http://www.csse.monash.edu.au/~jwb/edict.html
.. _KANJIDIC: http://www.csse.monash.edu.au/~jwb/kanjidic.html
.. _Group: http://www.edrdg.org/
.. _licence: http://www.edrdg.org/edrdg/licence.html

- Edict parser by Paul Goins, see `edict_search.py`.
- Edict example sentences parser (search by query) by Pulkit Kathuria, see `edict_examples.py`.
- Pickled Edict example files are provided, but the latest example files can be downloaded from the links provided.

Charset
-------

Two files:

- Example sentences file: must be UTF-8 if you are not using the bundled `src/jNlp/data/edict_examples`.
  To convert EUC-JP/ISO-8859-1 to UTF-8 ::

      iconv -f EUCJP -t UTF-8 path/to/edict_examples > path/to/save_with_utf-8

- ISO-8859-1 `edict_dictionary` file
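
If ``iconv`` is not available, the same conversion can be done with Python's built-in codecs. The Python 3 sketch below uses a hypothetical helper name, ``convert_to_utf8``; ``"euc_jp"`` and ``"latin-1"`` are the standard Python codec names for EUC-JP and ISO-8859-1.

.. code-block:: python

    def convert_to_utf8(src_path, dst_path, src_encoding="euc_jp"):
        """Re-encode a text file to UTF-8, like ``iconv -f EUCJP -t UTF-8``.

        Hypothetical helper; use src_encoding="latin-1" for ISO-8859-1 input.
        """
        # newline="" keeps the original line endings byte-for-byte.
        with open(src_path, encoding=src_encoding, newline="") as src, \
                open(dst_path, "w", encoding="utf-8", newline="") as dst:
            for line in src:  # stream line by line; the files can be large
                dst.write(line)

For example, ``convert_to_utf8('path/to/edict_examples', 'path/to/save_with_utf-8')`` mirrors the ``iconv`` command above. A byte sequence that is invalid in the source encoding raises ``UnicodeDecodeError`` rather than being silently dropped.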

Outputs example sentences for a Japanese query, only for ambiguous words.

Links
-----

The latest dictionary files can be downloaded here_.

.. _here: http://www.csse.monash.edu.au/~jwb/edict.html

`edict_search.py`
-----------------

:author: Paul Goins (license included; original code: linkToOriginal_)

.. _linkToOriginal: http://repo.or.cz/w/jbparse.git/blame/8e42831ca5f721c0320b27d7d83cb553d6e9c68f:/jbparse/edict.py

Print the sense definitions of all entries matching a query ::

    >>> from jNlp.edict_search import *
    >>> query = u'認める'
    >>> edict_path = 'src/jNlp/data/edict-yy-mm-dd'
    >>> kp = Parser(edict_path)
    >>> for i, entry in enumerate(kp.search(query)):
    ...     print entry.to_string().encode('utf-8')
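
To show what such a search involves, here is a simplified Python 3 sketch of reading EDICT lines, assuming the classic layout ``HEADWORD [READING] /gloss/gloss/.../`` (EDICT2 adds further fields, which this sketch treats as plain glosses). ``parse_edict_line`` and ``search`` are hypothetical names, not the ``Parser`` API from `edict_search.py`, and the entries used below are illustrative.

.. code-block:: python

    import re

    # Simplified EDICT layout (assumption):  HEADWORD [READING] /gloss/gloss/.../
    EDICT_LINE = re.compile(
        r"^(?P<headword>\S+)(?: \[(?P<reading>[^\]]+)\])? /(?P<glosses>.*)/$"
    )


    def parse_edict_line(line):
        """Return (headword, reading or None, [glosses]), or None if no match."""
        match = EDICT_LINE.match(line.rstrip("\n"))
        if match is None:
            return None  # header or malformed line
        glosses = [g for g in match.group("glosses").split("/") if g]
        return match.group("headword"), match.group("reading"), glosses


    def search(lines, query):
        # Yield entries whose headword or reading equals the query exactly.
        for line in lines:
            entry = parse_edict_line(line)
            if entry and query in (entry[0], entry[1]):
                yield entry

For example, ``list(search(open(edict_path, encoding='euc_jp'), u'認める'))`` would scan a dictionary file line by line, assuming the file is EUC-JP encoded. A real parser also has to split numbered senses such as ``(1)`` and ``(2)`` inside the glosses.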

`edict_examples.py`
-------------------

:Note: Only outputs example sentences for ambiguous words (words with more than one sense).

:author: Pulkit Kathuria

::

    >>> from jNlp.edict_examples import *
    >>> query = u'認める'
    >>> edict_path = 'src/jNlp/data/edict-yy-mm-dd'
    >>> edict_examples_path = 'src/jNlp/data/edict_examples'
    >>> search_with_example(edict_path, edict_examples_path, query)

Output ::

    認める

    Sense (1) to recognize;
      EX:01 我々は彼の才能を認めている。We appreciate his talent.

    Sense (2) to observe;
      EX:01 ｘ線写真で異状が認められます。We have detected an abnormality on your x-ray.

    Sense (3) to admit;
      EX:01 母は私の計画をよいと認めた。Mother approved my plan.
      EX:02 母は決して私の結婚を認めないだろう。Mother will never approve of my marriage.
      EX:03 父は決して私の結婚を認めないだろう。Father will never approve of my marriage.
      EX:04 彼は女性の喫煙をいいものだと認めない。He doesn't approve of women smoking.
      ...

Sentiment Analysis Japanese Text
================================

This section covers sentiment analysis of Japanese text using Word Sense Disambiguation, Wordnet-jp_ (Japanese WordNet, file name `wnjpn-all.tab`) and SentiWordnet_ (English SentiWordNet, file name `SentiWordNet_3.*.txt`).

.. _Wordnet-jp: http://nlpwww.nict.go.jp/wn-ja/eng/downloads.html
.. _SentiWordnet: http://sentiwordnet.isti.cnr.it/

Wordnet files download links
----------------------------

How to Use
----------

The following classifier is a baseline: it uses a simple mapping between English and Japanese via WordNet and classifies text by polarity score using SentiWordNet.

- Adnouns, nouns, verbs, etc. are all included.
- No WSD module is applied to the Japanese sentence.
- Each word is scored using its most common sense.

::

    >>> from jNlp.jSentiments import *
    >>> jp_wn = '../../../../data/wnjpn-all.tab'
    >>> en_swn = '../../../../data/SentiWordNet_3.0.0_20100908.txt'
    >>> classifier = Sentiment()
    >>> classifier.train(en_swn, jp_wn)
    >>> text = u'監督、俳優、ストーリー、演出、全部最高！'
    >>> print classifier.baseline(text)
    ...Pos Score = 0.625 Neg Score = 0.125
    ...Text is Positive
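
The baseline described above can be sketched as a lookup-and-sum: map each token to a single synset (no WSD), look up that synset's positive and negative scores, add them up, and compare the totals. The Python 3 sketch below is a guess at the shape of that computation, not jNlp's ``Sentiment.baseline``, which may aggregate differently (for example by averaging). ``baseline_polarity`` and the two dictionaries passed to it are hypothetical.

.. code-block:: python

    def baseline_polarity(tokens, jp_to_synset, synset_scores):
        """Sum SentiWordNet-style scores over tokens that map to a synset.

        jp_to_synset:  Japanese word -> one synset id (its common sense, no WSD)
        synset_scores: synset id -> (positive_score, negative_score)
        """
        pos = neg = 0.0
        for token in tokens:
            synset = jp_to_synset.get(token)
            if synset in synset_scores:  # skip words with no mapped synset
                p, n = synset_scores[synset]
                pos += p
                neg += n
        if pos > neg:
            label = "Positive"
        elif neg > pos:
            label = "Negative"
        else:
            label = "Neutral"
        return pos, neg, label

In practice ``tokens`` would come from ``jTokenize`` and the two dictionaries from the Japanese WordNet and SentiWordNet files. Words that appear in neither dictionary, such as particles, simply contribute nothing.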

Japanese Word Polarity Score
----------------------------

::

    >>> from jNlp.jSentiments import *
    >>> jp_wn = '_dicts/wnjpn-all.tab' #path to Japanese Word Net
    >>> en_swn = '_dicts/SentiWordNet_3.0.0_20100908.txt' #Path to SentiWordNet
    >>> classifier = Sentiment()
    >>> sentiwordnet, jpwordnet = classifier.train(en_swn, jp_wn)
    >>> positive_score = sentiwordnet[jpwordnet[u'全部']][0]
    >>> negative_score = sentiwordnet[jpwordnet[u'全部']][1]
    >>> print 'pos score = {0}, neg score = {1}'.format(positive_score, negative_score)
    ...pos score = 0.625, neg score = 0.0
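
The ``sentiwordnet`` mapping used above is built from the SentiWordNet 3.0 file, whose data lines are tab-separated columns (POS, synset offset, PosScore, NegScore, SynsetTerms, Gloss) and whose comment lines start with ``#``. The Python 3 sketch below loads such lines into a dictionary. The ``offset-pos`` key (for example ``00001740-a``) is an assumption meant to match Japanese WordNet synset ids; it is not necessarily how ``classifier.train`` keys its result, and ``load_sentiwordnet`` is a hypothetical name.

.. code-block:: python

    def load_sentiwordnet(lines):
        """Map "offset-pos" synset ids to (pos_score, neg_score) tuples."""
        scores = {}
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 4:
                continue  # not a data line
            pos, offset, pos_score, neg_score = fields[:4]
            # Key format "00001740-a" is an assumption (see lead-in).
            scores[f"{offset}-{pos}"] = (float(pos_score), float(neg_score))
        return scores

For example, ``load_sentiwordnet(open(en_swn, encoding='utf-8'))`` reads the whole file. A word's polarity is then ``scores[synset_id]``, where ``synset_id`` comes from the Japanese WordNet entry for that word.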

Contacts
========

:Author: pulkit[at]jaist.ac.jp (replace [at] with @)

.. include:: disqus_jnlp.html.rst
