
Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.

Stars: ✭ 229 (-3.78%)

Mutual labels: natural-language-processing

Reside

EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

Stars: ✭ 222 (-6.72%)

Mutual labels: natural-language-processing

Dilated Cnn Ner

Dilated CNNs for NER in TensorFlow

Stars: ✭ 222 (-6.72%)

Mutual labels: natural-language-processing

Textlint Rule Preset Ja Technical Writing

textlint rule preset for technical documents

Stars: ✭ 218 (-8.4%)

Mutual labels: japanese

Pytorch Bert Crf Ner

Korean named entity recognizer built with KoBERT and CRF (BERT+CRF-based Named Entity Recognition model for Korean)

Stars: ✭ 236 (-0.84%)

Mutual labels: natural-language-processing

Genki Study Resources

A collection of exercises for practicing what is taught in Genki: An Integrated Course in Elementary Japanese.

Stars: ✭ 232 (-2.52%)

Mutual labels: japanese

Catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the box support for training word and document embeddings, and flexible entity recognition models.

Stars: ✭ 224 (-5.88%)

Mutual labels: natural-language-processing

Machine Learning Notebooks

Machine Learning notebooks for refreshing concepts.

Stars: ✭ 222 (-6.72%)

Mutual labels: natural-language-processing

Wordgcn

ACL 2019: Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks

Stars: ✭ 230 (-3.36%)

Mutual labels: natural-language-processing

Bert4doc Classification

Code and source for paper ``How to Fine-Tune BERT for Text Classification?``

Stars: ✭ 220 (-7.56%)

Mutual labels: natural-language-processing

Deepnlp Models Pytorch

Pytorch implementations of various Deep NLP models in cs-224n(Stanford Univ)

Stars: ✭ 2,760 (+1059.66%)

Mutual labels: natural-language-processing

Ai Job Resume

Resume templates for AI algorithm positions

Stars: ✭ 219 (-7.98%)

Mutual labels: natural-language-processing

Ja.javascript.info

The Modern JavaScript Tutorial

Stars: ✭ 226 (-5.04%)

Mutual labels: japanese

Spacy Services

💫 REST microservices for various spaCy-related tasks

Stars: ✭ 230 (-3.36%)

Mutual labels: natural-language-processing

Text summarization with tensorflow

Implementation of a seq2seq model for summarization of textual data. Demonstrated on amazon reviews, github issues and news articles.

Stars: ✭ 226 (-5.04%)

Mutual labels: natural-language-processing

Catalyst

Accelerated deep learning R&D

Stars: ✭ 2,804 (+1078.15%)

Mutual labels: natural-language-processing


========
Pykakasi
========

Overview
========

.. image:: https://readthedocs.org/projects/pykakasi/badge/?version=latest
   :target: https://pykakasi.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://badge.fury.io/py/pykakasi.png
   :target: http://badge.fury.io/py/Pykakasi
   :alt: PyPI version

.. image:: https://github.com/miurahr/pykakasi/workflows/Run%20Tox%20tests/badge.svg
   :target: https://github.com/miurahr/pykakasi/actions?query=workflow%3A%22Run+Tox+tests%22
   :alt: Run Tox tests

.. image:: https://dev.azure.com/miurahr/github/_apis/build/status/miurahr.pykakasi?branchName=master
   :target: https://dev.azure.com/miurahr/github/_build?definitionId=13&branchName=master
   :alt: Azure-Pipelines

.. image:: https://coveralls.io/repos/miurahr/pykakasi/badge.svg?branch=master
   :target: https://coveralls.io/r/miurahr/pykakasi?branch=master
   :alt: Coverage status

pykakasi is a Python Natural Language Processing (NLP) library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet). It can handle characters in NFC form.

It is based on the kakasi_ library, which is written in C.
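Because pykakasi handles characters in NFC form, input that may contain decomposed characters (for example, a kana followed by a combining voiced sound mark) can be normalized first. The following is a minimal sketch that uses only the standard library's ``unicodedata`` module; the helper name ``to_nfc`` is hypothetical and not part of pykakasi.

.. code-block:: python

    import unicodedata


    def to_nfc(text):
        """Return text in Unicode NFC (composed) form."""
        return unicodedata.normalize("NFC", text)


    # "か" (U+304B) followed by the combining voiced sound mark (U+3099)
    decomposed = "\u304b\u3099"
    composed = to_nfc(decomposed)

    print(len(decomposed), len(composed))  # 2 1
    print(composed == "\u304c")            # True: the single code point "が"

The normalized string can then be passed to pykakasi as usual.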

* Install (from `PyPI`_): ``pip install pykakasi``
* `Documentation available on readthedocs`_

.. _PyPI: https://pypi.org/project/pykakasi/
.. _kakasi: http://kakasi.namazu.org/
.. _Documentation available on readthedocs: https://pykakasi.readthedocs.io/en/latest/index.html

Supported python versions
=========================

* pykakasi 1.2 supports Python 2.7, 3.5, 3.6, and 3.7
* pykakasi 2.0 supports Python 3.6, 3.7, 3.8, and pypy3.6-7.1.1

Usage
=====

The following example uses the new API, available in pykakasi v2.0.0 and later, to transliterate Japanese text to katakana, hiragana, and rōmaji:

.. code-block:: python

    import pykakasi

    kks = pykakasi.kakasi()
    text = "かな漢字"
    # convert() returns one item per token, each with several readings
    result = kks.convert(text)
    for item in result:
        print("{}: kana '{}', hiragana: '{}', romaji: '{}'".format(
            item['orig'], item['kana'], item['hira'], item['hepburn']))

    # Output:
    # かな: kana 'カナ', hiragana: 'かな', romaji: 'kana'
    # 漢字: kana 'カンジ', hiragana: 'かんじ', romaji: 'kanji'
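Since each item returned by ``convert()`` covers one token, a full romanized string can be assembled by joining the ``hepburn`` fields. The sketch below is a hypothetical helper (``join_romaji`` is not part of pykakasi). To stay runnable without the library it operates on sample data copied from the output shown above; in real use you would pass ``kks.convert(text)`` instead.

.. code-block:: python

    def join_romaji(items, sep=" ", capitalize=False):
        """Join the 'hepburn' field of each converted item into one string."""
        words = [item["hepburn"] for item in items]
        if capitalize:
            words = [word.capitalize() for word in words]
        return sep.join(words)


    # Sample data in the shape shown above (assumed, not produced by a live call)
    sample = [
        {"orig": "かな", "kana": "カナ", "hira": "かな", "hepburn": "kana"},
        {"orig": "漢字", "kana": "カンジ", "hira": "かんじ", "hepburn": "kanji"},
    ]

    print(join_romaji(sample))                   # kana kanji
    print(join_romaji(sample, capitalize=True))  # Kana Kanji
    print(join_romaji(sample, sep=""))           # kanakanji

The ``sep`` and ``capitalize`` parameters are loosely modelled on the old API's "s" and "C" switches.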

The following example produces output similar to furigana mode:

.. code-block:: python

    import pykakasi

    kks = pykakasi.kakasi()
    text = "かな漢字交じり文"
    result = kks.convert(text)
    for item in result:
        # Print each token followed by its capitalized reading in brackets
        print("{}[{}] ".format(item['orig'], item['hepburn'].capitalize()), end='')
    print()

    # Output:
    # かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]

Old API
-------

There is also an old API for v1.2.

Transliterate Japanese text to rōmaji:

.. code-block:: pycon

    >>> import pykakasi
    >>>
    >>> text = u"かな漢字交じり文"
    >>> kakasi = pykakasi.kakasi()
    >>> kakasi.setMode("H", "a")  # Hiragana to ascii, default: no conversion
    >>> kakasi.setMode("K", "a")  # Katakana to ascii, default: no conversion
    >>> kakasi.setMode("J", "a")  # Japanese to ascii, default: no conversion
    >>> kakasi.setMode("r", "Hepburn")  # default: use Hepburn Roman table
    >>> kakasi.setMode("s", True)  # add space, default: no separator
    >>> kakasi.setMode("C", True)  # capitalize, default: no capitalize
    >>> conv = kakasi.getConverter()
    >>> result = conv.do(text)
    >>> print(result)
    kana Kanji Majiri Bun

Tokenize Japanese text (split by word boundaries), equivalent to kakasi's wakati gaki option:

.. code-block:: pycon

    >>> wakati = pykakasi.wakati()
    >>> conv = wakati.getConverter()
    >>> result = conv.do(text)
    >>> print(result)
    かな 漢字 交じり 文

Add furigana_ (pronunciation aid) in rōmaji to text:

.. code-block:: pycon

    >>> kakasi = pykakasi.kakasi()
    >>> kakasi.setMode("J", "aF")  # Japanese to furigana
    >>> kakasi.setMode("H", "aF")  # Hiragana to furigana
    >>> conv = kakasi.getConverter()
    >>> result = conv.do(text)
    >>> print(result)
    かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]

Input mode values: "J" (Japanese: kanji, hiragana and katakana), "H" (hiragana), "K" (katakana).

Output mode values: "H" (hiragana), "K" (katakana), "a" (alphabet / rōmaji), "aF" (furigana in rōmaji).

There are other ``setMode`` switches which control output:

* "r": Romanisation table: Hepburn_ (default), Kunrei_ or Passport
* "s": Separator: False adds no spaces between words (default), True adds spaces between words
* "C": Capitalize: False adds no capital letters (default), True makes each word start with a capital letter

.. _furigana: https://en.wikipedia.org/wiki/Furigana
.. _Hepburn: https://en.wikipedia.org/wiki/Hepburn_romanization
.. _Kunrei: https://en.wikipedia.org/wiki/Kunrei-shiki_romanization
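The mode values and switches listed above can be collected into one mapping and applied in a loop. This is a sketch only: ``apply_modes`` and ``RecordingConverter`` are hypothetical names introduced here, and the stand-in class merely records ``setMode`` calls so that the example runs without pykakasi installed. With the old API you would pass a ``pykakasi.kakasi()`` instance instead.

.. code-block:: python

    def apply_modes(converter, modes):
        """Call converter.setMode(key, value) for every entry in modes."""
        for key, value in modes.items():
            converter.setMode(key, value)
        return converter


    class RecordingConverter:
        """Stand-in for an old-API kakasi object; it only records setMode calls."""

        def __init__(self):
            self.modes = {}

        def setMode(self, key, value):
            self.modes[key] = value


    modes = {
        "J": "a",        # Japanese input to rōmaji
        "r": "Hepburn",  # romanisation table
        "s": True,       # add spaces between words
        "C": True,       # capitalize each word
    }
    conv = apply_modes(RecordingConverter(), modes)
    print(conv.modes["s"], conv.modes["r"])  # True Hepburn

Keeping the settings in a single dictionary makes it easier to see which switches differ from their defaults.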

Copyright and License
=====================

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.


Stars: ✭ 238
