
OleanderSoftware / OleanderStemmingLibrary

Licence: other
Porter stemming library (C++)

Programming languages: C++, CMake

Projects that are alternatives to, or similar to, OleanderStemmingLibrary

CISTEM
Stemmer for German
Stars: ✭ 33 (-10.81%)
Mutual labels:  stemming, stemming-algorithm
Textvec
Text vectorization tool to outperform TFIDF for classification tasks
Stars: ✭ 167 (+351.35%)
Mutual labels:  text-analysis
Lexisnexistools
📰 Working with newspaper data from 'LexisNexis'
Stars: ✭ 59 (+59.46%)
Mutual labels:  text-analysis
Stanza Old
Stanford NLP group's shared Python tools.
Stars: ✭ 142 (+283.78%)
Mutual labels:  text-analysis
Orange3 Text
🍊 📄 Text Mining add-on for Orange3
Stars: ✭ 83 (+124.32%)
Mutual labels:  text-analysis
Wikitextparser
A simple WikiText parsing library for MediaWiki
Stars: ✭ 149 (+302.7%)
Mutual labels:  text-analysis
Ore
An R interface to the Onigmo regular expression library
Stars: ✭ 54 (+45.95%)
Mutual labels:  text-analysis
Shifterator
Interpretable data visualizations for understanding how texts differ at the word level
Stars: ✭ 209 (+464.86%)
Mutual labels:  text-analysis
Textclean
Tools for cleaning and normalizing text data
Stars: ✭ 159 (+329.73%)
Mutual labels:  text-analysis
Smltar
Manuscript of the book "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge
Stars: ✭ 125 (+237.84%)
Mutual labels:  text-analysis
Padatious
A neural network intent parser
Stars: ✭ 124 (+235.14%)
Mutual labels:  text-analysis
R Text Data
List of textual data sources to be used for text mining in R
Stars: ✭ 85 (+129.73%)
Mutual labels:  text-analysis
Applied Ml
Code and Resources for "Applied Machine Learning"
Stars: ✭ 156 (+321.62%)
Mutual labels:  text-analysis
Awesome Customer Analytics
A curated list of awesome customer analytics content
Stars: ✭ 79 (+113.51%)
Mutual labels:  text-analysis
Woke
✊ Detect non-inclusive language in your source code.
Stars: ✭ 190 (+413.51%)
Mutual labels:  text-analysis
Javascript Text Expander
Expands texts as you type, naturally
Stars: ✭ 58 (+56.76%)
Mutual labels:  text-analysis
Ml Dl Scripts
The repository provides useful Python scripts for ML and data analysis
Stars: ✭ 119 (+221.62%)
Mutual labels:  text-analysis
Qdap
Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
Stars: ✭ 146 (+294.59%)
Mutual labels:  text-analysis
wordhoard
This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.
Stars: ✭ 78 (+110.81%)
Mutual labels:  text-analysis
Fake news detection
Fake News Detection in Python
Stars: ✭ 194 (+424.32%)
Mutual labels:  text-analysis

Oleander Stemming Library



About

C++ library for stemming words down to their roots.

Stemming is useful in Natural Language Processing (NLP) systems. An early step in many NLP pipelines is to reduce words to their roots (stems); afterwards, these stems can be combined, tabulated, categorized, and so on. Stemming provides this first step.

Features

  • Based on the Porter/Snowball stemming family of algorithms
  • Header-only library
  • Case insensitive
  • Includes stemmers for Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, and Swedish

Example

#include "danish_stem.h"
#include "dutch_stem.h"
#include "english_stem.h"
#include "finnish_stem.h"
#include "french_stem.h"
#include "german_stem.h"
#include "italian_stem.h"
#include "norwegian_stem.h"
#include "portuguese_stem.h"
#include "russian_stem.h"
#include "spanish_stem.h"
#include "swedish_stem.h"

#include <iostream>
#include <string>

int main()
    {
    // the word to be stemmed
    std::wstring word(L"documentation");

    /* Create an instance of the "english_stem" class. The template argument
       for each stemmer is the type of std::basic_string that you are stemming,
       which is std::wstring (Unicode strings) by default.
       Any type of basic_string can be used, as long as its char type is wchar_t.
       That is, if your basic_string has a custom char_traits or allocator,
       then just specify it in the stemmer's template argument. For example:

       using myString = std::basic_string<wchar_t, myTraits, myAllocator>;
       myString word(L"documentation");

       stemming::english_stem<myString> StemEnglish;
       StemEnglish(word);
    */

    stemming::english_stem<> StemEnglish;
    std::wcout << L"(English) Original text:\t" << word.c_str() << std::endl;

    // The "english_stem" class overloads operator(), so you can call an
    // instance as if it were a function. Here, pass in the std::wstring to
    // be stemmed. Note that this alters the original std::wstring in place,
    // so when the call returns the string holds the stem.
    StemEnglish(word);
    // now the variable "word" should equal "document"
    std::wcout << L"(English) Stemmed text:\t" << word.c_str() << std::endl;

    // try a similar word that should have the same stem
    word = L"documenting";
    std::wcout << L"(English) Original text:\t" << word.c_str() << std::endl;
    StemEnglish(word);
    // now the variable "word" should equal "document"
    std::wcout << L"(English) Stemmed text:\t" << word.c_str() << std::endl;

    // now try a French word
    stemming::french_stem<> StemFrench;
    word = L"continuellement";
    std::wcout << L"\n(French) Original text:\t" << word.c_str() << std::endl;
    StemFrench(word);
    // now the variable "word" should equal "continuel"
    std::wcout << L"(French) Stemmed text:\t" << word.c_str() << std::endl;

    // many other stemmers are also available
    stemming::danish_stem<> StemDanish;
    stemming::dutch_stem<> StemDutch;
    stemming::finnish_stem<> StemFinnish;
    stemming::italian_stem<> StemItalian;
    stemming::german_stem<> StemGerman;
    stemming::norwegian_stem<> StemNorwegian;
    stemming::portuguese_stem<> StemPortuguese;
    stemming::russian_stem<> StemRussian;
    stemming::spanish_stem<> StemSpanish;
    stemming::swedish_stem<> StemSwedish;

    return 0;
    }