aaaton / golem

Licence: MIT license
A lemmatizer implemented in Go

GoLem

This project is a dictionary-based lemmatizer written in Go.

Since v4, the dictionary for each language must be installed separately.

go get github.com/aaaton/golem/v4

What?

A lemmatizer is a tool that finds the base form of words.

Language  Input       Output
English   aligning    align
Swedish   sprungit    springa
French    abattaient  abattre

GoLem is based on the dictionaries found on michmech/lemmatization-lists, which are available under the Open Database License. This project would not be feasible without them.
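To make the idea of dictionary-based lemmatization concrete, here is a minimal sketch using a plain Go map. It is not golem's actual implementation: `toyDict` and `lemma` are hypothetical names, and the three entries are simply the examples from the table above.

```go
package main

import (
	"fmt"
	"strings"
)

// toyDict is a hypothetical, hand-written dictionary that maps
// inflected word forms to their base form (lemma).
var toyDict = map[string]string{
	"aligning":   "align",
	"sprungit":   "springa",
	"abattaient": "abattre",
}

// lemma lowercases the word and looks it up in the dictionary.
// Words that are not in the dictionary are returned unchanged.
func lemma(dict map[string]string, word string) string {
	if base, ok := dict[strings.ToLower(word)]; ok {
		return base
	}
	return word
}

func main() {
	for _, w := range []string{"aligning", "Sprungit", "abattaient", "unknown"} {
		fmt.Printf("%s -> %s\n", w, lemma(toyDict, w))
	}
}
```

A real dictionary holds hundreds of thousands of entries, so how it is stored and loaded matters far more than in this toy version.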

Languages

At the moment golem supports English, Swedish, French, Spanish, Italian, and German. Adding another language should take little more than obtaining a dictionary for it, and some dictionaries are already available on lexiconista. Please let me know if there is a language you would like to see here, or fork the project and create a pull request.
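As a rough illustration of what obtaining a dictionary for a new language involves, the sketch below reads a word list into a lookup map. It assumes a tab-separated text format with the lemma in the first column and the inflected form in the second; check the actual dictionary files before relying on that column order. `parseDict` and `parseDictString` are hypothetical helpers and are not part of golem.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseDict reads lines of the assumed form "lemma<TAB>inflected form"
// and returns a map from lowercased inflected form to lemma.
// If a form appears more than once, the first lemma seen is kept.
func parseDict(r io.Reader) (map[string]string, error) {
	dict := make(map[string]string)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines
		}
		parts := strings.Split(line, "\t")
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		lemma, form := parts[0], strings.ToLower(parts[1])
		if _, seen := dict[form]; !seen {
			dict[form] = lemma
		}
	}
	return dict, sc.Err()
}

// parseDictString is a convenience wrapper for in-memory data.
func parseDictString(s string) (map[string]string, error) {
	return parseDict(strings.NewReader(s))
}

func main() {
	dict, err := parseDictString("align\taligning\nalign\taligned\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(dict["aligning"], dict["aligned"])
}
```

Keeping only the first lemma for an ambiguous form is a simplification made for this sketch; a form can legitimately map to several lemmas.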

English

go get github.com/aaaton/golem/v4/dicts/en

Swedish

go get github.com/aaaton/golem/v4/dicts/sv

French

go get github.com/aaaton/golem/v4/dicts/fr

German

go get github.com/aaaton/golem/v4/dicts/de

Spanish

go get github.com/aaaton/golem/v4/dicts/es

Italian

go get github.com/aaaton/golem/v4/dicts/it

Basic usage

package main

import (
	"github.com/aaaton/golem/v4"
	"github.com/aaaton/golem/v4/dicts/en"
)

func main() {
	// The language packages are available under golem/dicts;
	// "en" is the English dictionary.
	lemmatizer, err := golem.New(en.New())
	if err != nil {
		panic(err)
	}
	word := lemmatizer.Lemma("Abducting")
	if word != "abduct" {
		panic("The output is not what is expected!")
	}
}

Contributors

  • axamon
  • charlesgiroux
  • glaslos