jonsafari / perstem

Licence: other
Persian stemmer and morphological analyzer

Perstem: Persian stemmer, morphological analyzer, transliterator

Perstem is a Persian (Farsi) stemmer, morphological analyzer, transliterator, and partial part-of-speech tagger. Input may be encoded as Perso-Arabic script in UTF-8, ISIRI 3342, or Windows-1256, as SGML/HTML/XML-style numeric character references (ncr), or as Latin-script text in the Dehdari transliteration scheme. Use the -i flag to specify the input encoding and the -o flag to specify the output encoding; output accepts the same types plus arabtex.
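
For readers unfamiliar with numeric character references, the following Python sketch shows what text in that style looks like. It is only an illustration of the general SGML/HTML/XML convention: the helper names `to_ncr` and `from_ncr` are hypothetical, and the use of decimal (rather than hexadecimal) references is an assumption, not a statement about what perstem itself reads or writes.

```python
import html


def to_ncr(text):
    """Encode every non-ASCII character as a decimal numeric character reference.

    Hypothetical helper for illustration; not part of perstem.
    """
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


def from_ncr(text):
    """Decode character references back to Unicode text.

    html.unescape also decodes named entities such as &amp;.
    """
    return html.unescape(text)


# The Persian word "salam" (seen, lam, alef, mim), written with escapes
# so that the example does not depend on the editor's bidi rendering.
word = "\u0633\u0644\u0627\u0645"
encoded = to_ncr(word)
print(encoded)                    # &#1587;&#1604;&#1575;&#1605;
print(from_ncr(encoded) == word)  # True
```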

Usage

  perl perstem.pl [options] < input > output

Options

 -f, --form <x>         Output forms as one of the following:
                          dict: as they appear in a dictionary (default)
                          linked: show all morphemes, linked together
                          unlinked: show all morphemes as separate tokens
                          untouched: don't stem/analyze; mostly for char-set conversion
     --flush            Autoflush buffer output after every line
 -h, --help             Print this usage
 -i, --input <type>     Input character encoding type {cp1256,isiri3342,ncr,
                        translit,utf8} (default: utf8)
     --irreg-stem {0|1} Resolve irregular present-tense verb stems to their
                        past-tense stems (e.g. kon -> kar) (default: 1 == true)
 -n, --noroman          Delete all non-Arabic-script characters (e.g. HTML tags)
 -o, --output <type>    Output character encoding type {arabtex,cp1256,
                        isiri3342,ncr,translit,utf8} (default: utf8)
 -p, --pos              Tag inflected words for parts of speech
     --pos-sep <char>   Separate words from their parts of speech by <char>
                        (default: "/" )
 -r, --recall           Increase recall by parsing ambiguous affixes; may lower
                        precision
     --skip-comments    Skip commented-out lines, without printing them
 -s, --stem             Return only word stems
 -t, --tokenize {0|1}   Tokenize punctuation (default: 1 == true)
 -u, --unvowel          Remove short vowels
 -v, --version          Print version
 -z, --zwnj {0|1}       Insert Zero Width Non-Joiners where they should be
                        (default: 1 == true)
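
The options above can also be assembled programmatically. The sketch below is one possible way to call perstem from Python. The helpers `build_perstem_command` and `run_perstem` are hypothetical and not part of perstem. The sketch assumes that `perl` is on the PATH, that `perstem.pl` is in the current directory, and that option values are passed space-separated as in the option list. Only flags and values listed above are used.

```python
import subprocess

# Values copied from the option list above.
INPUT_TYPES = {"cp1256", "isiri3342", "ncr", "translit", "utf8"}
OUTPUT_TYPES = INPUT_TYPES | {"arabtex"}
FORMS = {"dict", "linked", "unlinked", "untouched"}


def build_perstem_command(script="perstem.pl", input_type="utf8",
                          output_type="utf8", form="dict",
                          pos=False, stem_only=False):
    """Return an argument list for invoking perstem (hypothetical helper)."""
    if input_type not in INPUT_TYPES:
        raise ValueError(f"unsupported input type: {input_type}")
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unsupported output type: {output_type}")
    if form not in FORMS:
        raise ValueError(f"unsupported form: {form}")
    cmd = ["perl", script,
           "--input", input_type,
           "--output", output_type,
           "--form", form]
    if pos:
        cmd.append("--pos")
    if stem_only:
        cmd.append("--stem")
    return cmd


def run_perstem(data, **options):
    """Pipe raw bytes through perstem on stdin and return its stdout as bytes.

    Bytes are used on both sides so that non-UTF-8 encodings such as
    cp1256 pass through without being re-encoded by Python.
    """
    cmd = build_perstem_command(**options)
    result = subprocess.run(cmd, input=data, capture_output=True, check=True)
    return result.stdout
```

For example, `build_perstem_command(input_type="cp1256", form="untouched")` yields a command that only converts Windows-1256 input to UTF-8 without stemming, matching the "mostly for char-set conversion" use of the untouched form.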

Acknowledgements

Thanks to Jace Livingston, David Zajic, and Corey Miller for their comprehensive error analysis and other suggestions. Thanks to Jay Ritch and Artyom Lukanin for spotting bugs.

Citation

If you use this software, please cite the following:

Dehdari, Jon, and Deryle Lonsdale. 2008. A link grammar parser for Persian. In Karimi, S., Samiian, V., and Stilo, D., editors, Aspects of Iranian Linguistics, volume 1. Cambridge Scholars Press. ISBN: 978-18-471-8639-3.

Jadidinejad, Amir Hossein, Fariborz Mahmoudi, and Jon Dehdari. 2010. Evaluation of Perstem: A Simple and Efficient Stemming Algorithm for Persian. In Peters, C., Nunzio, G. D., Kurimo, M., Mandl, T., Mostefa, D., Peñas, A., and Roda, G., editors, Multilingual Information Access Evaluation I. Text Retrieval Experiments, volume 6241 of Lecture Notes in Computer Science, pages 98–101. Springer, Heidelberg.
