
bminixhofer / Nlprule

Licence: other
A fast, low-resource Natural Language Processing and Text Correction library written in Rust.


Projects that are alternatives of or similar to Nlprule

Hunspell
The most popular spellchecking library.
Stars: ✭ 1,196 (+287.06%)
Mutual labels:  natural-language-processing, spellcheck
Languagetool
Style and Grammar Checker for 25+ Languages
Stars: ✭ 5,641 (+1725.57%)
Mutual labels:  natural-language-processing, spellcheck
Nuspell
🖋️ Fast and safe spellchecking C++ library
Stars: ✭ 108 (-65.05%)
Mutual labels:  natural-language-processing, spellcheck
Kts linguistics
Spellcheck, phonetics, text processing and more
Stars: ✭ 18 (-94.17%)
Mutual labels:  natural-language-processing, spellcheck
Hunspell Dict Ko
Korean spellchecking dictionary for Hunspell
Stars: ✭ 187 (-39.48%)
Mutual labels:  natural-language-processing, spellcheck
Proofreader
Simple text proofreader based on 'write-good' (hemingway-app-like suggestions) and 'nodehun' (spelling).
Stars: ✭ 285 (-7.77%)
Mutual labels:  spellcheck
Deep Learning Nlp Rl Papers
Recent Deep Learning papers in NLU and RL
Stars: ✭ 288 (-6.8%)
Mutual labels:  natural-language-processing
Languagecrunch
LanguageCrunch NLP server docker image
Stars: ✭ 281 (-9.06%)
Mutual labels:  natural-language-processing
Awesome Distributed Deep Learning
A curated list of awesome Distributed Deep Learning resources.
Stars: ✭ 277 (-10.36%)
Mutual labels:  natural-language-processing
Nlp
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
Stars: ✭ 304 (-1.62%)
Mutual labels:  natural-language-processing
Textfooler
A Model for Natural Language Attack on Text Classification and Inference
Stars: ✭ 298 (-3.56%)
Mutual labels:  natural-language-processing
Medacy
🏥 Medical Text Mining and Information Extraction with spaCy
Stars: ✭ 287 (-7.12%)
Mutual labels:  natural-language-processing
Link Grammar
The CMU Link Grammar natural language parser
Stars: ✭ 286 (-7.44%)
Mutual labels:  natural-language-processing
Gector
Official implementation of the paper “GECToR – Grammatical Error Correction: Tag, Not Rewrite” // Published on BEA15 Workshop (co-located with ACL 2020) https://www.aclweb.org/anthology/2020.bea-1.16.pdf
Stars: ✭ 287 (-7.12%)
Mutual labels:  natural-language-processing
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-8.41%)
Mutual labels:  natural-language-processing
Pyresparser
A simple resume parser used for extracting information from resumes
Stars: ✭ 297 (-3.88%)
Mutual labels:  natural-language-processing
Swem
The Tensorflow code for this ACL 2018 paper: "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms"
Stars: ✭ 279 (-9.71%)
Mutual labels:  natural-language-processing
Trade Dst
Source code for transferable dialogue state generator (TRADE, Wu et al., 2019). https://arxiv.org/abs/1905.08743
Stars: ✭ 287 (-7.12%)
Mutual labels:  natural-language-processing
Libpostal
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Stars: ✭ 3,312 (+971.84%)
Mutual labels:  natural-language-processing
Text2sql Data
A collection of datasets that pair questions with SQL queries.
Stars: ✭ 287 (-7.12%)
Mutual labels:  natural-language-processing

nlprule


A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based approach to NLP using resources from LanguageTool.
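To make "rule- and lookup-based" concrete, here is a toy sketch in plain Python. It is not nlprule's code and not LanguageTool's rule format. `TAG_LOOKUP`, `tokenize` and `check_modal_due` are hypothetical names used only for illustration. Tags come from a dictionary lookup instead of a statistical model, and a hand-written pattern rule matches a token sequence and proposes a replacement. The example targets the "can due" → "can do" confusion used in the Python usage example.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical tag dictionary: lowercase word -> possible part-of-speech tags.
# A real system looks these up in a large lexicon; this one has three entries.
TAG_LOOKUP: Dict[str, Set[str]] = {
    "can": {"MD", "NN"},  # modal verb or noun ("a can")
    "due": {"JJ"},
    "his": {"PRP$"},
}


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split text into (word, start, end) triples, one per run of letters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"[A-Za-z]+", text)]


def check_modal_due(text: str) -> List[Tuple[int, int, str]]:
    """Toy rule: 'due' right after a possible modal verb (MD) probably meant 'do'."""
    tokens = tokenize(text)
    matches = []
    # Look at each pair of adjacent tokens.
    for (prev, _, _), (word, start, end) in zip(tokens, tokens[1:]):
        prev_tags = TAG_LOOKUP.get(prev.lower(), set())
        if "MD" in prev_tags and word.lower() == "due":
            matches.append((start, end, "do"))
    return matches


print(check_modal_due("I can due his homework."))  # [(6, 9, 'do')]
```

Real rule sets also run disambiguation first (for example, deciding whether "can" is a modal or a noun in context), which this sketch skips.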

Python Usage

Install: pip install nlprule

Use:

from nlprule import Tokenizer, Rules

tokenizer = Tokenizer.load("en")
rules = Rules.load("en", tokenizer)
rules.correct("He wants that you send him an email.")
# returns: 'He wants you to send him an email.'

rules.correct("I can due his homework.")
# returns: 'I can do his homework.'

for s in rules.suggest("She was not been here since Monday."):
    print(s.start, s.end, s.replacements, s.source, s.message)
# prints:
# 4 16 ['was not', 'has not been'] WAS_BEEN.1 Did you mean was not or has not been?
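Each suggestion carries a span and candidate replacements, which is enough to rebuild a corrected string yourself, for example to let a user choose among replacements. The sketch below is plain Python and is not how nlprule implements `correct`. The `Suggestion` dataclass and `apply_suggestions` are hypothetical stand-ins mirroring the `start`, `end` and `replacements` fields printed above, and the sketch assumes `start`/`end` are character offsets with an exclusive end (consistent with `4 16` covering "was not been" in this ASCII sentence).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Suggestion:
    # Hypothetical stand-in for the objects returned by rules.suggest(...)
    start: int  # character offset where the flagged span begins
    end: int    # character offset one past the end of the span
    replacements: List[str]


def apply_suggestions(text: str, suggestions: List[Suggestion]) -> str:
    """Apply the first replacement of each non-overlapping suggestion."""
    result = text
    last_start = len(text) + 1
    # Work right to left so the offsets of earlier suggestions stay valid.
    for s in sorted(suggestions, key=lambda s: s.start, reverse=True):
        if not s.replacements or s.end > last_start:
            continue  # skip suggestions without replacements or that overlap
        result = result[:s.start] + s.replacements[0] + result[s.end:]
        last_start = s.start
    return result


text = "She was not been here since Monday."
print(apply_suggestions(text, [Suggestion(4, 16, ["was not", "has not been"])]))
# She was not here since Monday.
```

Overlapping suggestions are skipped rather than merged, which is a simplifying choice.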

for sentence in tokenizer.pipe("A brief example is shown."):
    for token in sentence:
        print(
            repr(token.text).ljust(10),
            repr(token.span).ljust(10),
            repr(token.tags).ljust(24),
            repr(token.lemmas).ljust(24),
            repr(token.chunks).ljust(24),
        )
# prints:
# ''         (0, 0)     ['SENT_START']           []                       []                      
# 'A'        (0, 1)     ['DT']                   ['A', 'a']               ['B-NP-singular']       
# 'brief'    (2, 7)     ['JJ']                   ['brief']                ['I-NP-singular']       
# 'example'  (8, 15)    ['NN:UN']                ['example']              ['E-NP-singular']       
# 'is'       (16, 18)   ['VBZ']                  ['be', 'is']             ['B-VP']                
# 'shown'    (19, 24)   ['VBN']                  ['show', 'shown']        ['I-VP']                
# '.'        (24, 25)   ['.', 'PCT', 'SENT_END'] ['.']                    ['O']
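The `chunks` column appears to use prefixed tags: `B-` begins a chunk, `I-` continues it, `E-` ends it, and `O` marks a token outside any chunk. Note that the verb phrase here has no `E-` tag, so a chunk must also be closed by the next `B-`, an `O`, or the end of the sentence. The function `group_chunks` below is a hypothetical helper (not part of nlprule) that assumes this convention and groups the tokens from the output above into phrases.

```python
from typing import List, Optional, Tuple


def group_chunks(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (text, chunk_tag) pairs into (label, phrase) pairs.

    Assumes tags look like 'B-NP-singular', 'I-VP', 'E-NP-singular' or 'O'.
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []

    for text, tag in tokens:
        prefix, _, tag_label = tag.partition("-")
        # Close the open chunk on 'O', on a new 'B-', or when the label changes.
        if label is not None and (tag == "O" or prefix == "B" or tag_label != label):
            spans.append((label, " ".join(words)))
            label, words = None, []
        if tag == "O":
            continue
        label = tag_label
        words.append(text)
        # An 'E-' tag ends the chunk right after this token.
        if prefix == "E":
            spans.append((label, " ".join(words)))
            label, words = None, []

    if label is not None:  # a chunk still open at the end of the sentence
        spans.append((label, " ".join(words)))
    return spans


# Tokens and their chunk tags from the output above (SENT_START omitted).
tokens = [("A", "B-NP-singular"), ("brief", "I-NP-singular"),
          ("example", "E-NP-singular"), ("is", "B-VP"),
          ("shown", "I-VP"), (".", "O")]
print(group_chunks(tokens))
# [('NP-singular', 'A brief example'), ('VP', 'is shown')]
```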

Rust Usage

Recommended setup:

Cargo.toml

[dependencies]
nlprule = "<version>"

[build-dependencies]
nlprule-build = "<version>" # must be the same as the nlprule version!

build.rs

fn main() {
    println!("cargo:rerun-if-changed=build.rs");

    nlprule_build::BinaryBuilder::new(
        &["en"],
        std::env::var("OUT_DIR").expect("OUT_DIR is set when build.rs is running"),
    )
    .build()
    .validate();
}

src/main.rs

use nlprule::{Rules, Tokenizer, tokenizer_filename, rules_filename};

fn main() {
    let mut tokenizer_bytes: &'static [u8] = include_bytes!(concat!(
        env!("OUT_DIR"),
        "/",
        tokenizer_filename!("en")
    ));
    let mut rules_bytes: &'static [u8] = include_bytes!(concat!(
        env!("OUT_DIR"),
        "/",
        rules_filename!("en")
    ));

    let tokenizer = Tokenizer::from_reader(&mut tokenizer_bytes).expect("tokenizer binary is valid");
    let rules = Rules::from_reader(&mut rules_bytes).expect("rules binary is valid");

    assert_eq!(
        rules.correct("She was not been here since Monday.", &tokenizer),
        String::from("She was not here since Monday.")
    );
}

nlprule and nlprule-build versions are kept in sync.

Main features

  • Rule-based Grammatical Error Correction through several thousand rules.
  • A text processing pipeline performing sentence segmentation, part-of-speech tagging, lemmatization, chunking and disambiguation.
  • Support for English, German and Spanish (Spanish support is experimental).
  • Spellchecking (in progress).

Goals

  • A single place to apply spellchecking and grammatical error correction for a downstream task.
  • Fast, low-resource NLP suited for running:
    1. as a pre-/postprocessing step for more sophisticated (i.e. ML) approaches.
    2. in the background of another application with low overhead.
    3. client-side in the browser via WebAssembly.
  • 100% Rust code and dependencies.

Comparison to LanguageTool

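Language | Disambiguation rules | Grammar rules | LT version | nlprule time | LanguageTool time
English  | 843 (100%)           | 3725 (~85%)   | 5.2        | 1            | 1.7 - 2.0
German   | 486 (100%)           | 2970 (~90%)   | 5.2        | 1            | 2.4 - 2.8
Spanish  | Experimental support. Not fully tested yet.

The rule columns give the number of LanguageTool rules nlprule supports, with the share of LanguageTool's rules for that language in parentheses. Times are relative, normalized so that nlprule = 1.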

See the benchmark issue for details.

Projects using nlprule

  • prosemd: a proofreading and linting language server for markdown files with VSCode integration.
  • cargo-spellcheck: a tool to check all your Rust documentation for spelling and grammar mistakes.

Please submit a PR to add your project!

Acknowledgements

All credit for the resources used in nlprule goes to LanguageTool, who have made a Herculean effort to create high-quality resources for Grammatical Error Correction and broader NLP.

License

nlprule is licensed under the MIT license or Apache-2.0 license, at your option.

The nlprule binaries (*.bin) are derived from LanguageTool v5.2 and licensed under the LGPLv2.1 license. nlprule statically and dynamically links to these binaries. Under LGPLv2.1 §6(a) this does not have any implications for the license of nlprule itself.
