Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

Stars: ✭ 151 (-40.55%)

Mutual labels: word-segmentation, pos-tagging

Jptdp

Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)

Stars: ✭ 146 (-42.52%)

Mutual labels: pos-tagging, part-of-speech-tagger

KWDLC

Kyoto University Web Document Leads Corpus

Stars: ✭ 64 (-74.8%)

Mutual labels: japanese, morphological-analysis

SynThai

Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning

Stars: ✭ 41 (-83.86%)

Mutual labels: word-segmentation, pos-tagging

Fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

Stars: ✭ 125 (-50.79%)

Mutual labels: japanese, tokenizer

Vncorenlp

A Vietnamese natural language processing toolkit (NAACL 2018)

Stars: ✭ 354 (+39.37%)

Mutual labels: pos-tagging, word-segmentation

Ekphrasis

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, Twitter - 330mil English tweets).

Stars: ✭ 433 (+70.47%)

Mutual labels: tokenizer, word-segmentation

Sudachidict

A lexicon for Sudachi

Stars: ✭ 127 (-50%)

Mutual labels: morphological-analysis, pos-tagging

Lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

Stars: ✭ 2,792 (+999.21%)

Mutual labels: part-of-speech-tagger, word-segmentation

sinling

A collection of NLP tools for Sinhalese (සිංහල).

Stars: ✭ 38 (-85.04%)

Mutual labels: tokenizer, pos-tagging

simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Stars: ✭ 32 (-87.4%)

Mutual labels: tokenizer, morphological-analysis

HebPipe

An NLP pipeline for Hebrew

Stars: ✭ 15 (-94.09%)

Mutual labels: part-of-speech-tagger, morphological-analysis

udar

UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.

Stars: ✭ 15 (-94.09%)

Mutual labels: pos-tagging, morphological-analysis

Sudachipy

Python version of Sudachi, a Japanese tokenizer.

Stars: ✭ 207 (-18.5%)

Mutual labels: morphological-analysis, pos-tagging

Kuromoji

Kuromoji is a self-contained and very easy to use Japanese morphological analyzer designed for search

Stars: ✭ 745 (+193.31%)

Mutual labels: japanese, part-of-speech-tagger

Sudachi

A Japanese Tokenizer for Business

Stars: ✭ 496 (+95.28%)

Mutual labels: morphological-analysis, pos-tagging

Kiwi

Kiwi (Intelligent Korean Morphological Analyzer)

Stars: ✭ 107 (-57.87%)

Mutual labels: morphological-analysis, word-segmentation

Udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Stars: ✭ 160 (-37.01%)

Mutual labels: tokenizer, pos-tagging

Rdrpostagger

A fast and accurate POS and morphological tagging toolkit (EACL 2014)

Stars: ✭ 126 (-50.39%)

Mutual labels: pos-tagging, part-of-speech-tagger

grasp

Essential NLP & ML, short & fast pure Python code

Stars: ✭ 58 (-77.17%)

Mutual labels: tokenizer, part-of-speech-tagger

suika

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby

Stars: ✭ 31 (-87.8%)

Mutual labels: tokenizer, morphological-analysis

pytorch Joint-Word-Segmentation-and-POS-Tagging

Paper: A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging

Stars: ✭ 37 (-85.43%)

Mutual labels: word-segmentation, pos-tagging

GrammarEngine

Grammatical Dictionary of the Russian Language (+ English, Japanese, etc.)

Stars: ✭ 68 (-73.23%)

Mutual labels: part-of-speech-tagger, morphological-analysis

Monpa

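MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition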

Stars: ✭ 203 (-20.08%)

Mutual labels: pos-tagging, word-segmentation

Pytorch Pos Tagging

A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.

Stars: ✭ 96 (-62.2%)

Mutual labels: pos-tagging, part-of-speech-tagger

Toiro

A comparison tool of Japanese tokenizers

Stars: ✭ 95 (-62.6%)

Mutual labels: japanese, word-segmentation

datalinguist

Stanford CoreNLP in idiomatic Clojure.

Stars: ✭ 93 (-63.39%)

Mutual labels: pos-tagging, part-of-speech-tagger

Articutapi

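API of Articut, a Chinese word segmenter that also provides semantic part-of-speech tagging. Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.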

Stars: ✭ 252 (-0.79%)

Mutual labels: pos-tagging, part-of-speech-tagger

SymSpellCppPy

Fast SymSpell written in C++ and exposed to Python via pybind11

Stars: ✭ 28 (-88.98%)

Mutual labels: word-segmentation

Hebrew-Tokenizer

A very simple python tokenizer for Hebrew text.

Stars: ✭ 16 (-93.7%)

Mutual labels: tokenizer

customized-symspell

Java port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm

Stars: ✭ 51 (-79.92%)

Mutual labels: word-segmentation

Japanese-Words

A compiled list of Japanese N2 vocabulary (New Standard Japanese, elementary and intermediate levels)

Stars: ✭ 41 (-83.86%)

Mutual labels: japanese

knp

A Japanese Parser

Stars: ✭ 16 (-93.7%)

Mutual labels: japanese

unidic-py

Unidic packaged for installation via pip.

Stars: ✭ 17 (-93.31%)

Mutual labels: japanese

wana kana rust

Utility library for checking and converting between Japanese characters - Hiragana, Katakana - and Romaji

Stars: ✭ 46 (-81.89%)

Mutual labels: japanese

gazou

Japanese OCR for Linux & Windows

Stars: ✭ 32 (-87.4%)

Mutual labels: japanese

cang-jie

Chinese tokenizer for tantivy, based on jieba-rs

Stars: ✭ 48 (-81.1%)

Mutual labels: tokenizer

kanji-web-app

Angular.js kanji web application

Stars: ✭ 45 (-82.28%)

Mutual labels: japanese

analyze-desumasu-dearu

A JavaScript library that analyzes whether sentences use the polite style (desu/masu) or the plain style (de aru)

Stars: ✭ 15 (-94.09%)

Mutual labels: japanese

hashformers

Hashformers is a framework for hashtag segmentation with transformers.

Stars: ✭ 18 (-92.91%)

Mutual labels: word-segmentation

cws-tensorflow

A Chinese word segmentation model based on TensorFlow

Stars: ✭ 25 (-90.16%)

Mutual labels: word-segmentation

japanese-pitch-accent-resources

Trying to consolidate Japanese phonetic, and in particular pitch accent, resources into one list

Stars: ✭ 64 (-74.8%)

Mutual labels: japanese

bredon

A modern CSS value compiler in JavaScript

Stars: ✭ 39 (-84.65%)

Mutual labels: tokenizer

ATKSpy

A Python package that supports a SOAP interface for communicating with the Microsoft ATKS

Stars: ✭ 27 (-89.37%)

Mutual labels: pos-tagging

visual syntactic embedding video captioning

Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*

Stars: ✭ 23 (-90.94%)

Mutual labels: pos-tagging

KanjiRecognitionDictionary

Perfect for those who forget kanji pronunciation

Stars: ✭ 14 (-94.49%)

Mutual labels: japanese

unsupervised-pos-tagging

Unsupervised part-of-speech tag estimation

Stars: ✭ 16 (-93.7%)

Mutual labels: pos-tagging

jp-ocr-prunned-cnn

Attempting feature map pruning on a CNN trained for Japanese OCR

Stars: ✭ 15 (-94.09%)

Mutual labels: japanese

mystem-scala

Morphological analyzer `mystem` (Russian language) wrapper for JVM languages

Stars: ✭ 21 (-91.73%)

Mutual labels: tokenizer

textlint-ja

The textlint Japanese community / ideas for rules

Stars: ✭ 41 (-83.86%)

Mutual labels: japanese

pascal-interpreter

A simple interpreter for a large subset of Pascal language written for educational purposes

Stars: ✭ 21 (-91.73%)

Mutual labels: tokenizer

textlint-rule-ja-no-abusage

A textlint rule that checks for common misuses of Japanese

Stars: ✭ 21 (-91.73%)

Mutual labels: japanese

Zipangu

A library for compatibility about Japan.

Stars: ✭ 27 (-89.37%)

Mutual labels: japanese

rippletagger

RippleTagger identifies part-of-speech tags (Nouns, Verbs, and so on...). You give it a sentence, it gives you a list of tags back.

Stars: ✭ 12 (-95.28%)

Mutual labels: pos-tagging

hanzi-tools

Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.

Stars: ✭ 69 (-72.83%)

Mutual labels: word-segmentation

1-60 of 337 similar projects
