gutfeeling / Word_forms

Licence: mit
Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc.

Accurately generate all possible forms of an English word

Word forms can accurately generate all possible forms of an English word. It can conjugate verbs. It can connect different parts of speech, e.g. noun to adjective, adjective to adverb, noun to verb, etc. It can pluralize singular nouns. It does all of this in one function. Enjoy!

Examples

Some very timely examples :-P

>>> from word_forms.word_forms import get_word_forms
>>> get_word_forms("president")
{'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
 'a': {'presidential'},
 'v': {'preside', 'presided', 'presiding', 'presides'},
 'r': {'presidentially'}}
>>> get_word_forms("elect")
{'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
 'a': {'eligible', 'electoral', 'elective', 'elect'},
 'v': {'electing', 'elects', 'elected', 'elect'},
 'r': set()}
>>> get_word_forms("politician")
{'n': {'politician', 'politics', 'politicians'},
 'a': {'political'},
 'v': set(),
 'r': {'politically'}}
>>> get_word_forms("am")
{'n': {'being', 'beings'},
 'a': set(),
 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
 'r': set()}
>>> get_word_forms("ran")
{'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
 'a': {'running', 'runny'},
 'v': {'running', 'run', 'ran', 'runs'},
 'r': set()}
>>> get_word_forms('continent', 0.8)  # with configurable similarity threshold
{'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
 'a': {'continental', 'continent'},
 'v': set(),
 'r': set()}

As you can see, the output is a dictionary with four keys. "r" stands for adverb, "a" for adjective, "n" for noun and "v" for verb. Don't ask me why "r" stands for adverb. This is what WordNet uses, so this is why I use it too :-)
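Because the result is a plain dictionary of sets, it is easy to post-process. Here is a minimal sketch, assuming a result shaped like the "politician" output above. The `sample` literal is copied from that example so the snippet runs without the package, and `all_forms` and `POS_NAMES` are hypothetical names, not part of word_forms.

```python
# Sample result copied from the "politician" example; in real use this
# would come from get_word_forms("politician").
sample = {
    "n": {"politician", "politics", "politicians"},
    "a": {"political"},
    "v": set(),
    "r": {"politically"},
}

# WordNet part-of-speech tags mapped to readable names.
POS_NAMES = {"n": "noun", "a": "adjective", "v": "verb", "r": "adverb"}


def all_forms(forms_by_pos):
    """Merge the per-part-of-speech sets into one flat set of words."""
    return set().union(*forms_by_pos.values())


# Print each part of speech with its forms in a stable (sorted) order.
for tag, words in sample.items():
    print(POS_NAMES[tag], sorted(words))

print(sorted(all_forms(sample)))
```

Sorting before printing is only for readability: sets have no guaranteed order, which is also why the ordering in the outputs above may differ on your machine.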

Help can be obtained at any time by typing the following:

>>> help(get_word_forms)

Why?

In Natural Language Processing and Search, one often needs to treat words like "run" and "ran", "love" and "lovable" or "politician" and "politics" as the same word. This is usually done by algorithmically reducing each word into a base word and then comparing the base words. The process is called Stemming. For example, the Porter Stemmer reduces both "love" and "lovely" into the base word "love".

Stemmers have several shortcomings. Firstly, the base word produced by the Stemmer is not always a valid English word. For example, the Porter Stemmer reduces the word "operation" to "oper". Secondly, the Stemmers have a high false negative rate. For example, "run" is reduced to "run" and "ran" is reduced to "ran". This happens because the Stemmers use a set of rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
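To make both shortcomings concrete, here is a deliberately tiny suffix-stripping stemmer. It is a toy written for illustration, not the Porter algorithm, and the suffix list and minimum stem length are assumptions chosen to keep the example short.

```python
# Toy rule-based stemmer: strips the first matching suffix from a fixed list.
# This is NOT the Porter algorithm, only an illustration of rule-based stemming.
SUFFIXES = ("ations", "ation", "ing", "ed", "s")


def toy_stem(word):
    """Return the word with the first matching suffix removed."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(toy_stem("operation"))              # the stem is not a dictionary word
print(toy_stem("runs"), toy_stem("run"))  # regular forms share a stem
print(toy_stem("ran"))                    # the irregular past tense is missed
```

No suffix rule can turn "ran" into "run", so the two words are never matched, however many rules are added.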

Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma). So the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The WordNet Lemmatizer included with NLTK fails at almost all such examples. For instance, "operations" is reduced to "operation" and "operate" is reduced to "operate", so the two are never matched.

Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations, connect noun forms to verb forms, adjective forms and adverb forms, pluralize singular forms, etc.
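One way to use such word families in search is to index every form under a shared key, so that "ran" and "runner" resolve to the same entry. This is a sketch under stated assumptions: the `families` data is abridged by hand from the example outputs instead of being fetched with get_word_forms, and `build_index` and `same_family` are hypothetical helpers, not part of the package.

```python
# Abridged word families taken from the example outputs; in real use each
# value would be the merged result of get_word_forms(key).
families = {
    "run": {"run", "ran", "runs", "running", "runner", "runners", "runny"},
    "politics": {"politician", "politicians", "politics", "political", "politically"},
}


def build_index(families):
    """Map every known form to the key of the family it belongs to."""
    index = {}
    for key, forms in families.items():
        for form in forms:
            index[form] = key
    return index


def same_family(index, word_a, word_b):
    """True if both words are known and belong to the same family."""
    key_a = index.get(word_a)
    return key_a is not None and key_a == index.get(word_b)


index = build_index(families)
print(same_family(index, "ran", "runner"))           # True
print(same_family(index, "politician", "politics"))  # True
print(same_family(index, "ran", "political"))        # False
```

The index is built once, so each comparison is a pair of dictionary lookups instead of a fresh call into the library.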

Bonus: A simple lemmatizer

We also offer a very simple lemmatizer based on word_forms. Here is how to use it.

>>> from word_forms.lemmatizer import lemmatize
>>> lemmatize("operations")
'operant'
>>> lemmatize("operate")
'operant'
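A lemmatizer of this kind can be thought of as picking one deterministic representative from a word's family. The rule below (shortest form, ties broken alphabetically) is an assumption made for illustration, not a description of what word_forms.lemmatizer actually does, and `pick_lemma` is a hypothetical name. The sample data is copied from the "president" example so the snippet runs without the package.

```python
# Forms copied from the get_word_forms("president") example output.
president_forms = {
    "n": {"presidents", "presidentships", "presidencies",
          "presidentship", "president", "presidency"},
    "a": {"presidential"},
    "v": {"preside", "presided", "presiding", "presides"},
    "r": {"presidentially"},
}


def pick_lemma(forms_by_pos):
    """Choose one representative: shortest form, ties broken alphabetically."""
    candidates = set().union(*forms_by_pos.values())
    if not candidates:
        raise ValueError("no forms to choose from")
    # Sort key is (length, word), so equal-length ties resolve alphabetically.
    return min(candidates, key=lambda form: (len(form), form))


print(pick_lemma(president_forms))  # preside
```

Any deterministic rule would do here. What matters is that two words with the same family of forms end up with the same representative.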

Enjoy!

Compatibility

Tested on Python 3

Installation

Using pip:

pip install -U word_forms

From source

Or you can install it from source:

  1. Clone the repository:
git clone https://github.com/gutfeeling/word_forms.git
  2. Install it using pip or setup.py:
pip install -e word_forms
# or
cd word_forms
python setup.py install

Acknowledgement

  1. The XTAG project for information on verb conjugations.
  2. WordNet

Maintainer

Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me at [email protected].

Contributors

  • Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
  • Sajal Sharma @sajal2692 is a major contributor.
  • Pamphile Roy @tupui is responsible for the PyPI package.

Contributions

Word Forms is not perfect. In particular, one aspect can be improved.

  1. It sometimes generates non-dictionary words like "runninesses" because the pluralization/singularization algorithm is not perfect. At the moment, I am using inflect for it.

If you like this package, feel free to contribute. Your pull requests are most welcome.
