nuspell / Nuspell

Licence: lgpl-3.0
🖋️ Fast and safe spellchecking C++ library

About Nuspell

Nuspell is a fast and safe spelling checker software program. It is designed for languages with rich morphology and complex word compounding. Nuspell is written in modern C++ and it supports Hunspell dictionaries.

Main features of Nuspell spelling checker:

  • Provides software library and command-line tool.
  • Suggests high-quality spelling corrections.
  • Backward compatibility with Hunspell dictionary file format.
  • Up to 3 times faster than Hunspell.
  • Full Unicode support backed by ICU.
  • Twofold affix stripping (for agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.).
  • Supports complex compounds (for example, Hungarian, German and Dutch).
  • Supports advanced features, for example: special casing rules (Turkish dotted i or German sharp s), conditional affixes, circumfixes, fogemorphemes, forbidden words, pseudoroots and homonyms.
  • Free and open source software. Licensed under GNU LGPL v3 or later.

Building Nuspell

Dependencies

Build-only dependencies:

  • C++17 compiler: GCC >= v7, Clang >= v5, or MSVC >= 2017
  • CMake >= v3.8
  • Catch2 >= v2.3.0 (It is only needed when building the tests. If it is not available as a system package, the Git submodule will be used.)
  • Pandoc (optional, needed for building the manpage)

Run-time (and build-time) dependencies:

  • ICU4C

Recommended tools for developers: qtcreator, ninja, clang-format, gdb, vim, doxygen.

Building on GNU/Linux and Unixes

We first need to download the dependencies. Some may already be preinstalled.

For Ubuntu and Debian:

sudo apt install git cmake libicu-dev

Then run the following commands inside the Nuspell directory:

mkdir build
cd build
cmake ..
make
sudo make install

For a faster build, run make -j, or use Ninja instead of Make.

If you are making a Linux distribution package (deb, rpm), you need to pass some additional options to the CMake invocation. For example:

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr

Building on OSX and macOS

  1. Install Apple's Command-line tools.
  2. Install Homebrew package manager.
  3. Install dependencies with the following commands:
brew install cmake icu4c
export ICU_ROOT=$(brew --prefix icu4c)

Then run the standard cmake and make commands shown above. The ICU_ROOT variable is needed because icu4c is a keg-only package in Homebrew and CMake cannot find it by default. Alternatively, you can pass -DICU_ROOT=... on the cmake command line.

If you want to build with GCC instead of Clang, you need to pull GCC with Homebrew and rebuild all the dependencies with it. See the Homebrew manuals.

Building on Windows

Compiling with Visual C++

  1. Install Visual Studio 2017 or newer. Alternatively, you can use Visual Studio Build Tools.
  2. Install Git for Windows and CMake.
  3. Install vcpkg in some folder, e.g. in c:\vcpkg.
  4. With vcpkg install: icu.
  5. Run the commands below.
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=c:\vcpkg\scripts\buildsystems\vcpkg.cmake -A Win32
cmake --build .

Compiling with Mingw64 and MSYS2

Download MSYS2, update everything and install the following packages:

pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-icu \
          mingw-w64-x86_64-cmake

Then from inside the Nuspell folder run:

mkdir build
cd build
cmake .. -G "Unix Makefiles"
make
make install

Building in Cygwin environment

Download the above-mentioned dependencies with the Cygwin package manager. Then compile the same way as on Linux. Cygwin builds depend on Cygwin1.dll.

Building on FreeBSD

Install the following required packages:

pkg install cmake icu catch

Then run the standard cmake and make as on Linux. See above.

Using the software

Using the command-line tool

The main executable is located in src/nuspell.

After compiling and installing you can run the Nuspell spell checker with a Nuspell, Hunspell or Myspell dictionary:

nuspell -d en_US text.txt

For more details see the man-page.

Using the Library

Sample program:

#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <nuspell/dictionary.hxx>
#include <nuspell/finder.hxx>

using namespace std;

int main()
{
	auto dict_list = vector<pair<string, string>>();
	nuspell::search_default_dirs_for_dicts(dict_list);
	auto dict_name_and_path = nuspell::find_dictionary(dict_list, "en_US");
	if (dict_name_and_path == end(dict_list))
		return 1; // Return error because we can not find the requested
		          // dictionary in the list.
	auto& dict_path = dict_name_and_path->second;

	auto dict = nuspell::Dictionary::load_from_path(dict_path);

	auto word = string();
	auto sugs = vector<string>();
	while (cin >> word) {
		if (dict.spell(word)) {
			cout << "Word \"" << word << "\" is ok.\n";
			continue;
		}

		cout << "Word \"" << word << "\" is incorrect.\n";
		dict.suggest(word, sugs);
		if (sugs.empty())
			continue;
		cout << "  Suggestions are: ";
		for (auto& sug : sugs)
			cout << sug << ' ';
		cout << '\n';
	}
}
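
The sample above reads whitespace-separated tokens from cin, so a token such as "word," reaches spell() with the comma still attached. The following is a minimal sketch of one way to trim such tokens before checking them; it is not part of Nuspell. The names trim_ascii_punct, collect_misspelled and collect_misspelled_in_text are hypothetical helpers introduced only for illustration. The code assumes nothing about the dictionary type except that it has a const spell(word) member returning bool, as used in the sample, so it can also be exercised with a stub. Only ASCII punctuation is handled; proper Unicode word segmentation would need ICU.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a Nuspell function): strip ASCII punctuation
// from both ends of a token, e.g. "word," becomes "word".
// Bytes >= 0x80 (parts of multi-byte UTF-8 sequences) are never stripped.
inline std::string trim_ascii_punct(const std::string& token)
{
	auto is_punct = [](char c) {
		auto uc = static_cast<unsigned char>(c);
		return uc < 0x80 && std::ispunct(uc) != 0;
	};
	auto first = std::size_t(0);
	auto last = token.size();
	while (first < last && is_punct(token[first]))
		++first;
	while (last > first && is_punct(token[last - 1]))
		--last;
	return token.substr(first, last - first);
}

// Hypothetical helper: read whitespace-separated tokens from `in`, trim
// them, and return the words that `dict` rejects. `Dict` is assumed to be
// any type with a const `spell(word)` member returning bool.
template <class Dict>
std::vector<std::string> collect_misspelled(const Dict& dict, std::istream& in)
{
	auto result = std::vector<std::string>();
	auto token = std::string();
	while (in >> token) {
		auto word = trim_ascii_punct(token);
		if (word.empty())
			continue; // the token consisted of punctuation only
		if (!dict.spell(word))
			result.push_back(word);
	}
	return result;
}

// Convenience wrapper for checking text that is already in memory.
template <class Dict>
std::vector<std::string> collect_misspelled_in_text(const Dict& dict,
                                                    const std::string& text)
{
	auto in = std::istringstream(text);
	return collect_misspelled(dict, in);
}
```

With a loaded dictionary from the sample above, collect_misspelled(dict, cin) would return the rejected words, and each of them could then be passed to dict.suggest(). Interior punctuation, such as the apostrophe in "don't", is left in place because only the ends of a token are trimmed.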

On the command line you can link like this:

g++ example.cxx -std=c++17 -lnuspell -licuuc -licudata
# or better, use pkg-config
g++ example.cxx -std=c++17 $(pkg-config --cflags --libs nuspell)

Within CMake you can use find_package() to link. For example:

find_package(Nuspell)
add_executable(myprogram main.cpp)
target_link_libraries(myprogram Nuspell::nuspell)

Dictionaries

Myspell, Hunspell and Nuspell dictionaries:

https://github.com/nuspell/nuspell/wiki/Dictionaries-and-Contacts

Advanced topics

Debugging Nuspell

First, install the debugger:

sudo apt install gdb

For debugging we need to create a debug build and then we need to start gdb.

mkdir debug
cd debug
cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j
gdb src/nuspell/nuspell

We recommend debugging to be done with an IDE.

Testing

To run the tests, run the following command after building:

ctest

See also

Full documentation in the wiki.

API Documentation for developers can be generated from the source files by running:

doxygen

The result can be viewed by opening doxygen/html/index.html in a web browser.