

Lindera

License: MIT | Chat: https://gitter.im/lindera-morphology/lindera

A morphological analysis library in Rust. This project is a fork of kuromoji-rs.

Lindera aims to be an easy-to-install library that provides concise APIs for various Rust applications.

The following is required to build Lindera:

  • Rust >= 1.46.0

Usage

Make sure you have activated the "full" feature of the lindera crate in your Cargo.toml:

[dependencies]
lindera = { version = "0.12.0", features = ["full"] }
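Note that the example commands later in this README pass --features=ipadic rather than full. Depending on the Lindera version you use, dictionaries may also be enabled as individual features; the fragment below assumes an ipadic feature exists in your version, so check the crate documentation for the exact feature names:

```toml
[dependencies]
lindera = { version = "0.12.0", features = ["ipadic"] }
```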

Basic example

This example covers the basic usage of Lindera.

It will:

  • Create a tokenizer in normal mode
  • Tokenize the input text
  • Output the tokens

use lindera::tokenizer::Tokenizer;
use lindera::LinderaResult;

fn main() -> LinderaResult<()> {
    // create tokenizer
    let tokenizer = Tokenizer::new()?;

    // tokenize the text
    let tokens = tokenizer.tokenize("関西国際空港限定トートバッグ")?;

    // output the tokens
    for token in tokens {
        println!("{}", token.text);
    }

    Ok(())
}

The above example can be run as follows:

% cargo run --features=ipadic --example=basic_example

You can see the result as follows:

関西国際空港
限定
トートバッグ

User dictionary example

You can provide user dictionary entries alongside the default system dictionary. The user dictionary must be a CSV file in the following format:

<surface_form>,<part_of_speech>,<reading>

For example:

% cat ./resources/simple_userdic.csv
東京スカイツリー,カスタム名詞,トウキョウスカイツリー
東武スカイツリーライン,カスタム名詞,トウブスカイツリーライン
とうきょうスカイツリー駅,カスタム名詞,トウキョウスカイツリーエキ

With a user dictionary, the Tokenizer is created as follows:

use std::path::PathBuf;

use lindera::LinderaResult;
use lindera::{
    mode::Mode,
    tokenizer::{
        DictionaryConfig, DictionaryKind, DictionarySourceType, Tokenizer, TokenizerConfig,
        UserDictionaryConfig,
    },
};

fn main() -> LinderaResult<()> {
    let dictionary = DictionaryConfig {
        kind: DictionaryKind::IPADIC,
        path: None,
    };

    let user_dictionary = Some(UserDictionaryConfig {
        kind: DictionaryKind::IPADIC,
        source_type: DictionarySourceType::Csv,
        path: PathBuf::from("./resources/ipadic_simple_userdic.csv"),
    });

    // create tokenizer
    let config = TokenizerConfig {
        dictionary,
        user_dictionary,
        mode: Mode::Normal,
    };
    let tokenizer = Tokenizer::with_config(config)?;

    // tokenize the text
    let tokens = tokenizer.tokenize("東京スカイツリーの最寄り駅はとうきょうスカイツリー駅です")?;

    // output the tokens
    for token in tokens {
        println!("{}", token.text);
    }

    Ok(())
}

The above example can be run with cargo run --example:

% cargo run --features=ipadic --example=userdic_example
東京スカイツリー
の
最寄り駅
は
とうきょうスカイツリー駅
です

API reference

The API reference is available. Please see the following URL:
