OleanderSoftware / OleanderStemmingLibrary

Licence: other

Porter stemming library (C++)

Programming Languages

C++

CMake

Projects that are alternatives of or similar to OleanderStemmingLibrary

CISTEM

Stemmer for German

Stars: ✭ 33 (-10.81%)

Mutual labels: stemming, stemming-algorithm

Textvec

Text vectorization tool to outperform TFIDF for classification tasks

Stars: ✭ 167 (+351.35%)

Mutual labels: text-analysis

Lexisnexistools

📰 Working with newspaper data from 'LexisNexis'

Stars: ✭ 59 (+59.46%)

Mutual labels: text-analysis

Stanza Old

Stanford NLP group's shared Python tools.

Stars: ✭ 142 (+283.78%)

Mutual labels: text-analysis

Orange3 Text

🍊 📄 Text Mining add-on for Orange3

Stars: ✭ 83 (+124.32%)

Mutual labels: text-analysis

Wikitextparser

A simple WikiText parsing library for MediaWiki

Stars: ✭ 149 (+302.7%)

Mutual labels: text-analysis

Ore

An R interface to the Onigmo regular expression library

Stars: ✭ 54 (+45.95%)

Mutual labels: text-analysis

Shifterator

Interpretable data visualizations for understanding how texts differ at the word level

Stars: ✭ 209 (+464.86%)

Mutual labels: text-analysis

Textclean

Tools for cleaning and normalizing text data

Stars: ✭ 159 (+329.73%)

Mutual labels: text-analysis

Smltar

Manuscript of the book "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge

Stars: ✭ 125 (+237.84%)

Mutual labels: text-analysis

Padatious

A neural network intent parser

Stars: ✭ 124 (+235.14%)

Mutual labels: text-analysis

R Text Data

List of textual data sources to be used for text mining in R

Stars: ✭ 85 (+129.73%)

Mutual labels: text-analysis

Applied Ml

Code and Resources for "Applied Machine Learning"

Stars: ✭ 156 (+321.62%)

Mutual labels: text-analysis

Awesome Customer Analytics

A curated list of awesome customer analytics content

Stars: ✭ 79 (+113.51%)

Mutual labels: text-analysis

Woke

✊ Detect non-inclusive language in your source code.

Stars: ✭ 190 (+413.51%)

Mutual labels: text-analysis

Javascript Text Expander

Expands texts as you type, naturally

Stars: ✭ 58 (+56.76%)

Mutual labels: text-analysis

Ml Dl Scripts

The repository provides useful Python scripts for ML and data analysis

Stars: ✭ 119 (+221.62%)

Mutual labels: text-analysis

Qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

Stars: ✭ 146 (+294.59%)

Mutual labels: text-analysis

wordhoard

This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.

Stars: ✭ 78 (+110.81%)

Mutual labels: text-analysis

Fake news detection

Fake News Detection in Python

Stars: ✭ 194 (+424.32%)

Mutual labels: text-analysis

Oleander Stemming Library

About

C++ library for stemming words down to their roots.

Stemming is useful for Natural Language Processing (NLP) systems, where a common first step is to strip words down to their roots. Those roots can then be combined, tabulated, or categorized.
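
As a rough illustration of the tabulating step mentioned above, the sketch below counts how often each root occurs in a piece of text. The names `count_stems` and `toy_stemmer` are hypothetical and not part of the library. `toy_stemmer` is a crude suffix stripper, not a Porter stemmer, and exists only so the sketch compiles without the library headers. A real stemmer object could presumably be passed in its place, as long as it can be called with the string to stem.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a real stemmer (NOT the Porter algorithm):
// strips one of a few common English suffixes in place.
struct toy_stemmer
    {
    void operator()(std::wstring& word) const
        {
        static const std::wstring suffixes[] = { L"ation", L"ing", L"s" };
        for (const auto& suffix : suffixes)
            {
            // only strip when a reasonably long root would remain
            if (word.length() > suffix.length() + 2 &&
                word.compare(word.length() - suffix.length(),
                             suffix.length(), suffix) == 0)
                {
                word.erase(word.length() - suffix.length());
                return;
                }
            }
        }
    };

// Splits the text on whitespace, stems each token in place, and counts
// how often each resulting root occurs.
template<typename Stemmer>
std::map<std::wstring, std::size_t>
count_stems(const std::wstring& text, Stemmer stem)
    {
    std::map<std::wstring, std::size_t> counts;
    std::wistringstream tokens(text);
    std::wstring word;
    while (tokens >> word)
        {
        stem(word);
        ++counts[word];
        }
    return counts;
    }
```

For example, `count_stems(L"documentation documenting documents", toy_stemmer{})` yields a single entry that maps `document` to 3.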

Features

Based on the Porter/Snowball stemming family of algorithms
Header-only library
Case insensitive
Includes Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, and Swedish

Example

#include "danish_stem.h"
#include "dutch_stem.h"
#include "english_stem.h"
#include "finnish_stem.h"
#include "french_stem.h"
#include "german_stem.h"
#include "italian_stem.h"
#include "norwegian_stem.h"
#include "portuguese_stem.h"
#include "russian_stem.h"
#include "spanish_stem.h"
#include "swedish_stem.h"

#include <iostream>
#include <string>

int main()
    {
    // the word to be stemmed
    std::wstring word(L"documentation");

    /* Create an instance of the "english_stem" class. The template argument for
       the stemmers is the type of std::basic_string that you want to stem,
       which defaults to std::wstring (Unicode strings).
       Any type of basic_string works, as long as its char type is wchar_t.
       That is, if your basic_string has a custom char_traits or allocator,
       just specify that type as the template argument to the stemmer. For example:

       using myString = std::basic_string<wchar_t, myTraits, myAllocator>;
       myString word(L"documentation");

       stemming::english_stem<myString> StemEnglish;
       StemEnglish(word);
    */

    stemming::english_stem<> StemEnglish;
    std::wcout << L"(English) Original text:\t" << word.c_str() << std::endl;

    // The "english_stem" class overloads operator(), so you can call
    // an instance of it as if it were a function. In this case,
    // pass in the std::wstring to be stemmed. Note that the call
    // modifies the std::wstring in place, so when it returns the
    // string holds the stem.
    StemEnglish(word);
    // now the variable "word" should equal "document"
    std::wcout << L"(English) Stemmed text:\t" << word.c_str() << std::endl;

    // try a similar word that should have the same stem
    word = L"documenting";
    std::wcout << L"(English) Original text:\t" << word.c_str() << std::endl;
    StemEnglish(word);
    // now the variable "word" should equal "document"
    std::wcout << L"(English) Stemmed text:\t" << word.c_str() << std::endl;

    // now try a French word
    stemming::french_stem<> StemFrench;
    word = L"continuellement";
    std::wcout << L"\n(French) Original text:\t" << word.c_str() << std::endl;
    StemFrench(word);
    // now the variable "word" should equal "continuel"
    std::wcout << L"(French) Stemmed text:\t" << word.c_str() << std::endl;

    // many other stemmers are also available
    stemming::danish_stem<> StemDanish;
    stemming::dutch_stem<> StemDutch;
    stemming::finnish_stem<> StemFinnish;
    stemming::italian_stem<> StemItalian;
    stemming::german_stem<> StemGerman;
    stemming::norwegian_stem<> StemNorwegian;
    stemming::portuguese_stem<> StemPortuguese;
    stemming::russian_stem<> StemRussian;
    stemming::spanish_stem<> StemSpanish;
    stemming::swedish_stem<> StemSwedish;

    return 0;
    }
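
Because the stemmers in the example above modify their argument in place, code that still needs the original word has to copy it first. The sketch below wraps that copy in a small helper. `stemmed_copy` and `drop_ing_stemmer` are hypothetical names, not part of the library, and `drop_ing_stemmer` is only a stand-in that removes a trailing "ing" so the sketch compiles without the library headers.

```cpp
#include <string>

// Hypothetical stand-in for a stemmer functor (NOT a real stemming
// algorithm): removes a trailing "ing" in place.
struct drop_ing_stemmer
    {
    void operator()(std::wstring& word) const
        {
        const std::wstring suffix(L"ing");
        if (word.length() > suffix.length() &&
            word.compare(word.length() - suffix.length(),
                         suffix.length(), suffix) == 0)
            {
            word.erase(word.length() - suffix.length());
            }
        }
    };

// Returns a stemmed copy of the word. The word is taken by value, so the
// caller's string is left untouched while the copy is stemmed in place.
template<typename Stemmer, typename String>
String stemmed_copy(Stemmer&& stem, String word)
    {
    stem(word);
    return word;
    }
```

With a stemmer object `stem` and a string `original`, calling `stemmed_copy(stem, original)` returns the stem and leaves `original` unchanged.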
