bminixhofer / Nlprule

Licence: other

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

nlprule

A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based approach to NLP using resources from LanguageTool.

Python Usage

Install: pip install nlprule

Use:

from nlprule import Tokenizer, Rules

tokenizer = Tokenizer.load("en")
rules = Rules.load("en", tokenizer)

rules.correct("He wants that you send him an email.")
# returns: 'He wants you to send him an email.'

rules.correct("I can due his homework.")
# returns: 'I can do his homework.'

for s in rules.suggest("She was not been here since Monday."):
    print(s.start, s.end, s.replacements, s.source, s.message)
# prints:
# 4 16 ['was not', 'has not been'] WAS_BEEN.1 Did you mean was not or has not been?

for sentence in tokenizer.pipe("A brief example is shown."):
    for token in sentence:
        print(
            repr(token.text).ljust(10),
            repr(token.span).ljust(10),
            repr(token.tags).ljust(24),
            repr(token.lemmas).ljust(24),
            repr(token.chunks).ljust(24),
        )
# prints:
# ''         (0, 0)     ['SENT_START']           []                       []                      
# 'A'        (0, 1)     ['DT']                   ['A', 'a']               ['B-NP-singular']       
# 'brief'    (2, 7)     ['JJ']                   ['brief']                ['I-NP-singular']       
# 'example'  (8, 15)    ['NN:UN']                ['example']              ['E-NP-singular']       
# 'is'       (16, 18)   ['VBZ']                  ['be', 'is']             ['B-VP']                
# 'shown'    (19, 24)   ['VBN']                  ['show', 'shown']        ['I-VP']                
# '.'        (24, 25)   ['.', 'PCT', 'SENT_END'] ['.']                    ['O']
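
The suggestions printed above carry `start`, `end` and `replacements`, which is enough to apply a fix by hand rather than through `rules.correct`. The following is a minimal sketch of that idea, not nlprule's own implementation. `Span` and `apply_first_replacements` are hypothetical names introduced here; `Span` is only a stand-in that mirrors the attributes shown above so the example runs without nlprule installed. It assumes the offsets are character offsets into the original text and that suggestions do not overlap.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Span:
    """Hypothetical stand-in exposing the attributes printed above."""
    start: int
    end: int
    replacements: List[str]


def apply_first_replacements(text: str, suggestions: Sequence[Span]) -> str:
    """Apply the first replacement of each suggestion to `text`."""
    # Work from the end of the text so earlier offsets stay valid.
    for s in sorted(suggestions, key=lambda s: s.start, reverse=True):
        if s.replacements:  # skip suggestions that offer no replacement
            text = text[:s.start] + s.replacements[0] + text[s.end:]
    return text


fixed = apply_first_replacements(
    "She was not been here since Monday.",
    [Span(4, 16, ["was not", "has not been"])],
)
print(fixed)  # She was not here since Monday.
```

With nlprule installed, the list returned by `rules.suggest(...)` could presumably be passed in place of the `Span` list, since only the three attributes above are read.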

Rust Usage

Recommended setup:

Cargo.toml

[dependencies]
nlprule = "<version>"

[build-dependencies]
nlprule-build = "<version>" # must be the same as the nlprule version!

build.rs

fn main() {
    println!("cargo:rerun-if-changed=build.rs");

    nlprule_build::BinaryBuilder::new(
        &["en"],
        std::env::var("OUT_DIR").expect("OUT_DIR is set when build.rs is running"),
    )
    .build()
    .validate();
}

src/main.rs

use nlprule::{Rules, Tokenizer, tokenizer_filename, rules_filename};

fn main() {
    let mut tokenizer_bytes: &'static [u8] = include_bytes!(concat!(
        env!("OUT_DIR"),
        "/",
        tokenizer_filename!("en")
    ));
    let mut rules_bytes: &'static [u8] = include_bytes!(concat!(
        env!("OUT_DIR"),
        "/",
        rules_filename!("en")
    ));

    let tokenizer = Tokenizer::from_reader(&mut tokenizer_bytes).expect("tokenizer binary is valid");
    let rules = Rules::from_reader(&mut rules_bytes).expect("rules binary is valid");

    assert_eq!(
        rules.correct("She was not been here since Monday.", &tokenizer),
        String::from("She was not here since Monday.")
    );
}

nlprule and nlprule-build versions are kept in sync.

Main features

Rule-based Grammatical Error Correction through several thousand rules.
A text processing pipeline doing sentence segmentation, part-of-speech tagging, lemmatization, chunking and disambiguation.
Support for English, German and Spanish.
Spellchecking. (in progress)

Goals

A single place to apply spellchecking and grammatical error correction for a downstream task.
Fast, low-resource NLP suited for running:
1. as a pre- / postprocessing step for more sophisticated (i.e. ML) approaches.
2. in the background of another application with low overhead.
3. client-side in the browser via WebAssembly.
100% Rust code and dependencies.
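
As an illustration of the preprocessing goal, the sketch below runs a correction function over a batch of texts before they reach a downstream step. It is only a sketch under stated assumptions: `preprocess` and `fake_correct` are hypothetical names, and `fake_correct` is a trivial stand-in so the example runs without nlprule. With nlprule installed, `rules.correct` from the Python usage section could be passed instead.

```python
from typing import Callable, Iterable, List


def preprocess(texts: Iterable[str], correct: Callable[[str], str]) -> List[str]:
    """Run a correction function over each text before a downstream step."""
    return [correct(t) for t in texts]


def fake_correct(text: str) -> str:
    # Hypothetical stand-in for `rules.correct`; handles one hard-coded case.
    return text.replace("can due", "can do")


cleaned = preprocess(["I can due his homework."], fake_correct)
print(cleaned)  # ['I can do his homework.']
```

Passing the corrector as a callable keeps the downstream code independent of how the correction is done.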

Comparison to LanguageTool

Language	Number of disambiguation rules	Number of grammar rules	LT version	nlprule time	LanguageTool time
English	843 (100%)	3725 (~85%)	5.2	1	1.7 - 2.0
German	486 (100%)	2970 (~90%)	5.2	1	2.4 - 2.8
Spanish	Experimental support. Not fully tested yet.

See the benchmark issue for details.

Projects using nlprule

prosemd: a proofreading and linting language server for markdown files with VSCode integration.
cargo-spellcheck: a tool to check all your Rust documentation for spelling and grammar mistakes.

Please submit a PR to add your project!

Acknowledgements

All credit for the resources used in nlprule goes to LanguageTool, who have made a Herculean effort to create high-quality resources for Grammatical Error Correction and broader NLP.

License

nlprule is licensed under the MIT license or Apache-2.0 license, at your option.

The nlprule binaries (*.bin) are derived from LanguageTool v5.2 and licensed under the LGPLv2.1 license. nlprule statically and dynamically links to these binaries. Under LGPLv2.1 §6(a) this does not have any implications on the license of nlprule itself.
