sedthh / lara-hungarian-nlp

Licence: MIT license
NLP class for rapid ChatBot development in Hungarian language

Programming Languages

python

Projects that are alternatives of or similar to lara-hungarian-nlp

Lunr Languages
A collection of languages stemmers and stopwords for Lunr Javascript library
Stars: ✭ 296 (+996.3%)
Mutual labels:  stemmer
Kelime kok ayirici
Derin Öğrenme Tabanlı - seq2seq - Türkçe için kelime kökü bulma web uygulaması - Turkish Stemmer (tr_stemmer)
Stars: ✭ 76 (+181.48%)
Mutual labels:  stemmer
Cadmium
Natural Language Processing (NLP) library for Crystal
Stars: ✭ 172 (+537.04%)
Mutual labels:  stemmer
Word forms
Accurately generate all possible forms of an English word e.g "election" --> "elect", "electoral", "electorate" etc.
Stars: ✭ 463 (+1614.81%)
Mutual labels:  stemmer
Arabic Light Stemmer
Arabic light stemmer. Light stemming for Arabic words removes prefixes and suffixes and normalizes words
Stars: ✭ 14 (-48.15%)
Mutual labels:  stemmer
Php Stemmer
Native PHP Stemmer
Stars: ✭ 84 (+211.11%)
Mutual labels:  stemmer
gwizo
Simple Go implementation of the Porter Stemmer algorithm with powerful features.
Stars: ✭ 26 (-3.7%)
Mutual labels:  stemmer
sastrawijs
Indonesian language stemmer. Javascript port of PHP Sastrawi project.
Stars: ✭ 30 (+11.11%)
Mutual labels:  stemmer
Nlp Js Tools French
POS Tagger, lemmatizer and stemmer for french language in javascript
Stars: ✭ 32 (+18.52%)
Mutual labels:  stemmer
Stemmer
An English (Porter2) stemming implementation in Elixir.
Stars: ✭ 134 (+396.3%)
Mutual labels:  stemmer
Snowball
Snowball version of the Porter stemmer for the Lithuanian language.
Stars: ✭ 5 (-81.48%)
Mutual labels:  stemmer
Ptstem
Stemming Algorithms for the Portuguese Language
Stars: ✭ 13 (-51.85%)
Mutual labels:  stemmer
Stemmer
Fast Porter stemmer implementation
Stars: ✭ 86 (+218.52%)
Mutual labels:  stemmer
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+1603.7%)
Mutual labels:  stemmer
elasticsearch-analysis-morfologik
Morfologik Polish Lemmatizer plugin for Elasticsearch
Stars: ✭ 75 (+177.78%)
Mutual labels:  lemmatizer
Ruby Stemmer
Expose libstemmer_c to Ruby
Stars: ✭ 254 (+840.74%)
Mutual labels:  stemmer
Qutuf
Qutuf (قُطُوْف): An Arabic Morphological analyzer and Part-Of-Speech tagger as an Expert System.
Stars: ✭ 84 (+211.11%)
Mutual labels:  stemmer
golem
A lemmatizer implemented in Go
Stars: ✭ 54 (+100%)
Mutual labels:  lemmatizer
lemmy
🤘Lemmy is a lemmatizer for Danish 🇩🇰 and Swedish 🇸🇪
Stars: ✭ 68 (+151.85%)
Mutual labels:  lemmatizer
Arabicstemmer
Assem's Arabic Light Stemmer is a snowball-based stemming algorithm for Arabic aimed mainly to improve search.
Stars: ✭ 102 (+277.78%)
Mutual labels:  stemmer

Lara is a fast, lightweight Python 3 NLP library for ChatBot development in the Hungarian language.

Instead of being an all-purpose NLP tool, Lara was created to fit the quirks of (online) Hungarian as closely as possible. The library can match inflected forms of keywords in text messages written in Hungarian. It also comes with functions for text processing, and can identify common expressions and small talk topics in conversations.

Table of contents

  1. About Lara
    1. Find intents
    2. Extract information
    3. Handle common topics
    4. Create ML features
    5. And much more
  2. Licensing

About Lara

Here is a short list of things you can easily do with Lara in Hungarian. For full documentation and further examples, see the project wiki. A complete case study on how to build ChatBots and Virtual Assistants in non-English languages is also available.

Find intents

With the Intents() class, developers can match almost every possible inflected form of any keyword in Hungarian. For example:

from lara import parser

ragozott_forma	= {
	"to_do"		: [{"stem":"csinál","wordclass":"verb"}],
}
ragozott_talalat= parser.Intents(ragozott_forma)

This matches the intent "to_do" in all of the following sentences:

  • Ő mit csinál a szobában?
  • Mit fogok még csinálni?
  • Mikor csináltad meg a szekrényt?
  • Megcsináltatták a berendezést.
  • Teljesen kicsinálva érzem magamat ettől a melegtől.
  • Csinálhatott volna mást is.
  • Visszacsinalnad az ekezeteket a billentyuzetemen, kerlek?
  • Vigyázz, hogy el ne gépeld a csniálni igét!

Once you define the word class and stem of a keyword, Lara generates the possible patterns for text matching, without relying on large dictionaries:

from lara import parser

alma_intents	= {
	"alma"		: [{"stem":"alma","wordclass":"noun"}],
	"szed"		: [{"stem":"szed","wordclass":"verb"}],
	"piros"		: [{"stem":"piros","wordclass":"adjective"}]
}
alma_test	= parser.Intents(alma_intents)
print(alma_test.match("Mikor szedjük le a pirosabb almákat?"))

>>> {'alma': 1, 'szed': 2, 'piros': 2}
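
To illustrate the general idea of generating match patterns from a stem and a word class, here is a minimal, self-contained sketch. It is not Lara's implementation: the SUFFIXES and VERB_PREFIXES tables and the build_pattern and matches helpers are hypothetical, heavily abbreviated stand-ins, and real Hungarian morphology needs far more rules. Accents are stripped before matching so that unaccented input can still match.

```python
import re
import unicodedata

# Hypothetical, heavily abbreviated affix tables (an assumption, not Lara's data).
SUFFIXES = {
    "verb": ["", "ni", "ok", "sz", "tam", "tad", "ta", "nád"],
    "noun": ["", "k", "t", "kat", "ban", "nak", "ra"],
}
VERB_PREFIXES = ["meg", "ki", "el", "vissza"]


def strip_accents(text):
    """Lowercase the text and drop combining accent marks (á -> a, ő -> o)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_pattern(stem, wordclass):
    """Compile a regex for the stem plus any listed prefix and suffix."""
    suffixes = "|".join(re.escape(strip_accents(s)) for s in SUFFIXES[wordclass])
    prefix_part = ""
    if wordclass == "verb":
        # Verbal prefixes are optional and only tried for verbs.
        prefix_part = "(?:" + "|".join(VERB_PREFIXES) + ")?"
    stem_part = re.escape(strip_accents(stem))
    return re.compile(rf"\b{prefix_part}{stem_part}(?:{suffixes})\b")


def matches(text, stem, wordclass):
    """Return True if any generated form of the stem occurs in the text."""
    return build_pattern(stem, wordclass).search(strip_accents(text)) is not None


print(matches("Mit fogok még csinálni?", "csinál", "verb"))          # True
print(matches("Visszacsinalnad az ekezeteket?", "csinál", "verb"))   # True
print(matches("Az almanach a polcon van.", "alma", "noun"))          # False
```

The word boundary at the end of the pattern is what keeps "alma" from matching inside an unrelated longer word such as "almanach".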

Extract information

Lara supports simple text processing:

from lara import parser

tweet		= 'A robotok elveszik a munkát! #NLP #ChatBot'
hashtags	= parser.Extract(tweet).hashtags()
print(hashtags)

>>> ['#nlp','#chatbot']
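
For comparison, the same kind of hashtag extraction can be sketched with the standard library alone. The find_hashtags helper below is a hypothetical illustration and makes no claim about how Extract works internally; it simply collects every "#" followed by word characters and lowercases the result.

```python
import re


def find_hashtags(text):
    """Return every #tag in the text, lowercased, in order of appearance."""
    return [tag.lower() for tag in re.findall(r"#\w+", text)]


print(find_hashtags('A robotok elveszik a munkát! #NLP #ChatBot'))
# ['#nlp', '#chatbot']
```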

And normalization of extracted strings:

from lara import parser

sms		= 'Hívj fel! A számom 30/123 4567!'
info		= parser.Extract(sms)
print(info.phone_numbers(False))
print(info.phone_numbers(True))

>>> ['30/123 4567']
>>> ['+36 30 1234567']
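
The normalization step can be pictured as follows. This is a minimal sketch under stated assumptions, not Lara's code: the normalize_mobile helper and the MOBILE_PREFIXES set are hypothetical, and only nine-digit mobile numbers with an optional "06" or "36" prefix are handled.

```python
import re

# Assumption: only these Hungarian mobile prefixes are recognised in this sketch.
MOBILE_PREFIXES = {"20", "30", "31", "50", "70"}


def normalize_mobile(raw):
    """Return the number as '+36 PP NNNNNNN', or None if it is not recognised."""
    digits = re.sub(r"\D", "", raw)
    # Drop the country code (36) or the domestic trunk prefix (06) when present.
    if len(digits) == 11 and digits[:2] in ("06", "36"):
        digits = digits[2:]
    if len(digits) == 9 and digits[:2] in MOBILE_PREFIXES:
        return f"+36 {digits[:2]} {digits[2:]}"
    return None


print(normalize_mobile('30/123 4567'))     # +36 30 1234567
print(normalize_mobile('06-30-123-4567'))  # +36 30 1234567
```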

It uses Black Magic™:

from lara import parser

sorcery		= 'Hívj fel ezen a számon 2018 IV. huszadikán mondjuk délután nyolc perccel háromnegyed kettő előtt!'
info		= parser.Extract(sorcery)
print(info.dates())
print(info.times())
	
>>> ['2018-04-20']
>>> ['13:37']
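
The time in this example follows from simple arithmetic: "háromnegyed kettő" means three quarters of the way to two o'clock (1:45), "délután" moves it to the afternoon (13:45), and "nyolc perccel ... előtt" subtracts eight minutes, giving 13:37. The sketch below only reproduces that arithmetic; the quarter_time helper and its parameters are hypothetical, and parsing the words out of free text is a separate problem that is not shown here.

```python
# Minutes past the previous full hour for each Hungarian quarter-hour word.
QUARTERS = {"negyed": 15, "fél": 30, "háromnegyed": 45}


def quarter_time(fraction, hour, minutes_before=0, afternoon=False):
    """Return HH:MM for a phrase like 'nyolc perccel háromnegyed kettő előtt'."""
    # 'háromnegyed kettő' counts towards two o'clock, so the base hour is one.
    base_hour = hour - 1 + (12 if afternoon else 0)
    total = (base_hour * 60 + QUARTERS[fraction] - minutes_before) % (24 * 60)
    return f"{total // 60:02d}:{total % 60:02d}"


print(quarter_time("háromnegyed", 2, minutes_before=8, afternoon=True))
# 13:37
```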

Handle common topics

Common entities are included:

from lara import parser, entities

user_text	= 'Igen, köszönöm a segítséget!'

common	= entities.common()
print(parser.Intents(common).match_set(user_text))

>>> {'yes', 'thx', 'help'}

Several small talk topics are also automatically handled:

from lara import parser, entities

user_text	= 'Te egy ember vagy, vagy egy intelligens számítógép vagy?'

chitchat	= entities.smalltalk()
chitchat_match	= parser.Intents(chitchat).match_set(user_text)
if 'user_love' in chitchat_match:
	print('Én is téged.')
elif 'are_you_a_robot' in chitchat_match:
	print('Egy számítógépet akkor nevezhetünk intelligensnek, ha át tud verni egy embert, hogy őt is embernek higgye.')
	
>>> Egy számítógépet akkor nevezhetünk intelligensnek, ha át tud verni egy embert, hogy őt is embernek higgye.

Create ML features

Rule-based stemmers can help you create features from short Hungarian texts for Machine Learning models, without the need for large dictionaries:

from lara import stemmer, nlp

text 	= '''
	A szövegbányászat a strukturálatlan vagy kis mértékben strukturált 
	szöveges állományokból történő ismeret kinyerésének tudománya; 
	olyan különböző dokumentumforrásokból származó szöveges ismeretek
	és információk gépi intelligenciával történő kigyűjtése és 
	reprezentációja, amely a feldolgozás előtt rejtve és feltáratlanul 
	maradt az elemző előtt. 
	'''

clean	= nlp.remove_stopwords(text)
stems	= stemmer.tippmix(clean)
bigrams = nlp.ngram(stems,2)
print(bigrams)

>>> ['szövegbányász strukturál', 'strukturál kis', 'kis mér', 'mér strukturál', 'strukturál szöveg', 'szöveg állományok', ... 'mar elemz']
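
The bigram step itself is a sliding window over the list of stems. As a rough, standalone illustration (the word_ngrams helper is hypothetical and is not Lara's nlp.ngram), it can be written as:

```python
def word_ngrams(tokens, n):
    """Join every run of n consecutive tokens into one space-separated string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


stems = ['szövegbányász', 'strukturál', 'kis', 'mér']
print(word_ngrams(stems, 2))
# ['szövegbányász strukturál', 'strukturál kis', 'kis mér']
```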

And much more

Use keywords in actual sentences:

from lara import nlp, stemmer

query	= "Toto - Afrika"
	
parts	= query.split('-')
artist	= stemmer.inverse(parts[0],'től')	# "tól" and "től" are both valid
title	= stemmer.inverse(parts[1],'t')
the	= nlp.az(title)
	
print('A zenelejátszó program az alábbi számot játssza:')
print(artist,the,title)

>>> A zenelejátszó program az alábbi számot játssza:
>>> Tototól az Afrikát
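
Two rules are at work in this example: vowel harmony decides between "tól" and "től", and the definite article is "az" before a vowel and "a" otherwise. The sketch below shows simplified versions of both. The add_suffix and article helpers are hypothetical, they treat any word containing a back vowel as a back-vowel word, and they ignore stem changes such as the final vowel lengthening in "Afrikát".

```python
BACK_VOWELS = set("aáoóuú")
FRONT_VOWELS = set("eéiíöőüű")


def article(word):
    """Pick the definite article: 'az' before a vowel, 'a' otherwise."""
    first = word[:1].lower()
    return "az" if first in BACK_VOWELS | FRONT_VOWELS else "a"


def add_suffix(word, back_form, front_form):
    """Attach the suffix variant that agrees with the vowels of the word."""
    # Simplification: a single back vowel is enough to select the back form.
    has_back = any(ch in BACK_VOWELS for ch in word.lower())
    return word + (back_form if has_back else front_form)


print(add_suffix("Toto", "tól", "től"), article("Afrikát"), "Afrikát")
# Tototól az Afrikát
```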

Better understand poetry:

from lara import nlp

huszt	= ['Bús düledékeiden, Husztnak romvára megállék;',
	'Csend vala, felleg alól szállt fel az éjjeli hold.']

for line in huszt:
	print(nlp.metre(line))
	
>>> ['-', 'u', 'u', '-', 'u', 'u', '-', '-', '-', '-', '-', 'u', 'u', '-', '-']
>>> ['-', 'u', 'u', '-', 'u', 'u', '-', '-', 'u', 'u', '-', 'u', 'u', '-']
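
The marks above follow the usual rule of quantitative verse: a syllable is long ('-') if its vowel is long or if at least two consonants follow it before the next vowel, even across a word boundary, and short ('u') otherwise. The sketch below applies just that rule. It is an independent illustration rather than Lara's nlp.metre, the scan and sounds_of helpers are hypothetical, and the greedy digraph matching can mis-split some compound words.

```python
import re

LONG_VOWELS = set("áéíóőúű")
SHORT_VOWELS = set("aeiouöü")
# Multi-letter Hungarian consonants, longest first, each counted as one sound.
DIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def sounds_of(word):
    """Split a lowercase word into sounds, treating digraphs as single units."""
    sounds, i = [], 0
    while i < len(word):
        unit = next((d for d in DIGRAPHS if word.startswith(d, i)), word[i])
        sounds.append(unit)
        i += len(unit)
    return sounds


def scan(line):
    """Mark each syllable of the line as long ('-') or short ('u')."""
    sounds = []
    for word in re.findall(r"[^\W\d_]+", line.lower()):
        sounds.extend(sounds_of(word))
    vowels = LONG_VOWELS | SHORT_VOWELS
    marks = []
    for idx, sound in enumerate(sounds):
        if sound not in vowels:
            continue
        # Count the consonants between this vowel and the next one.
        following = 0
        for nxt in sounds[idx + 1:]:
            if nxt in vowels:
                break
            following += 1
        long_syllable = sound in LONG_VOWELS or following >= 2
        marks.append("-" if long_syllable else "u")
    return marks


print("".join(scan('Bús düledékeiden, Husztnak romvára megállék;')))
# -uu-uu-----uu--
```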

Licensing

Lara is available under the MIT license from version 2.0.0 onward.

Feel free to use it for your Hungarian ChatBot solutions and NLP Research purposes. Let me know if you've used it in an interesting project.
