
sorenlind / lemmy

Licence: MIT license
🤘Lemmy is a lemmatizer for Danish 🇩🇰 and Swedish 🇸🇪



🤘 Lemmy

Lemmy is a lemmatizer for Danish 🇩🇰 and Swedish 🇸🇪. It comes ready for use. The Danish model is trained on Dansk Sprognævn's (DSN) word list (‘fuldformliste’) and the Danish Universal Dependencies. The Swedish model is trained on SALDO's morphology dataset and the Swedish Universal Dependencies (Talbanken). Lemmy also supports training on your own dataset.

The models included in Lemmy were evaluated on the respective Universal Dependencies dev datasets. The Danish model scored > 99% accuracy, while the Swedish model scored > 97%. All reported scores were obtained when supplying Lemmy with POS tags.
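As a rough illustration of what such an accuracy figure measures, the sketch below scores any callable with the `(pos, word)` argument order against gold lemmas. The helper name `lemma_accuracy`, the toy lookup table, and the rule of counting only the first returned candidate are assumptions made for this example; they are not Lemmy's own evaluation code.

```python
from typing import Callable, Iterable, List, Tuple

def lemma_accuracy(
    lemmatize: Callable[[str, str], List[str]],
    examples: Iterable[Tuple[str, str, str]],
) -> float:
    """Return the share of (pos, word, gold_lemma) examples predicted correctly.

    Assumption: the callable returns a list of candidate lemmas and only
    the first candidate is compared with the gold lemma.
    """
    total = 0
    correct = 0
    for pos, word, gold in examples:
        total += 1
        candidates = lemmatize(pos, word)
        if candidates and candidates[0] == gold:
            correct += 1
    return correct / total if total else 0.0

# Toy stand-in for a real lemmatizer (hypothetical data, not Lemmy output).
_TOY_TABLE = {
    ("NOUN", "akvariernes"): ["akvarium"],
    ("NOUN", "bilerne"): ["bil"],
}

def toy_lemmatize(pos: str, word: str) -> List[str]:
    # Unknown words fall back to the word itself.
    return _TOY_TABLE.get((pos, word), [word])

examples = [
    ("NOUN", "akvariernes", "akvarium"),
    ("NOUN", "bilerne", "bil"),
    ("VERB", "løber", "løbe"),  # not in the toy table, so counted as wrong
]
score = lemma_accuracy(toy_lemmatize, examples)
print(f"accuracy: {score:.3f}")
```

To score a real model, the toy callable would be replaced by a function that forwards to a loaded lemmatizer, and the examples would come from a tagged dev set.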

You can use Lemmy as a spaCy extension, more specifically as a spaCy pipeline component. This is highly recommended, as it makes the lemmas easily accessible from the spaCy tokens. Lemmy uses POS tags to predict the lemmas, so when it is wired into the spaCy pipeline it benefits from spaCy's built-in POS tagger.

Lemmy can also be used without spaCy, as a standalone lemmatizer. In that case, you will have to provide the POS tags yourself. Alternatively, you can use Lemmy without POS tags, though the accuracy will most likely suffer. Currently, only the Danish version of Lemmy comes with a model trained for use without POS tags. That is, if you want to use Lemmy on Swedish text without POS tags, you must train your own Lemmy model.

Lemmy is heavily inspired by the CST Lemmatizer for Danish.

Install

pip install lemmy

Basic Usage Without POS tags

import lemmy

# Create an instance of the standalone lemmatizer.
lemmatizer = lemmy.load("da")

# Find lemma for the word 'akvariernes'. First argument is an empty POS tag.
lemmatizer.lemmatize("", "akvariernes")

Basic Usage With POS tags

import lemmy

# Create an instance of the standalone lemmatizer.
# Replace 'da' with 'sv' for the Swedish lemmatizer.
lemmatizer = lemmy.load("da")

# Find lemma for the word 'akvariernes'. First argument is the user-provided POS tag.
lemmatizer.lemmatize("NOUN", "akvariernes")
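
The two standalone modes can be combined when only some words carry a POS tag. The sketch below passes the tag when one is available and falls back to the empty tag otherwise. `ToyLemmatizer` is a hypothetical stand-in so the example runs without Lemmy installed; it only mirrors the `lemmatize(pos, word)` argument order shown above and assumes a list of candidate lemmas is returned.

```python
from typing import List, Optional, Sequence, Tuple

class ToyLemmatizer:
    """Hypothetical stand-in with the same lemmatize(pos, word) call shape."""

    _table = {
        ("NOUN", "akvariernes"): ["akvarium"],
        ("", "akvariernes"): ["akvarium"],
    }

    def lemmatize(self, pos: str, word: str) -> List[str]:
        # Unknown (pos, word) pairs fall back to the word itself.
        return self._table.get((pos, word), [word])

def lemmatize_tagged(
    lemmatizer, tagged_words: Sequence[Tuple[str, Optional[str]]]
) -> List[Tuple[str, List[str]]]:
    """Lemmatize (word, pos) pairs, using the empty tag when pos is missing."""
    results = []
    for word, pos in tagged_words:
        results.append((word, lemmatizer.lemmatize(pos or "", word)))
    return results

results = lemmatize_tagged(
    ToyLemmatizer(),
    [("akvariernes", "NOUN"), ("akvariernes", None), ("hus", None)],
)
for word, lemmas in results:
    print(word, lemmas)
```

With the real package, an object returned by `lemmy.load("da")` would take the place of `ToyLemmatizer()`.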

Usage with spaCy Model

import da_custom_model as da # replace da_custom_model with name of your spaCy model
import lemmy.pipe
nlp = da.load()

# Create an instance of Lemmy's pipeline component for spaCy.
# Replace 'da' with 'sv' for the Swedish lemmatizer.
pipe = lemmy.pipe.load('da')

# Add the component to the spaCy pipeline.
nlp.add_pipe(pipe, after='tagger')

# Lemmas can now be accessed using the `._.lemmas` attribute on the tokens.
nlp("akvariernes")[0]._.lemmas
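
For longer texts, the same attribute can be read from every token in a document. The helper below is a minimal sketch: `collect_lemmas` is a name made up for this example, and the fake document is built from `SimpleNamespace` objects so the code runs without spaCy or a Danish model. With a real pipeline, the `doc` returned by `nlp(...)` would be passed in instead.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple

def collect_lemmas(doc: Iterable) -> List[Tuple[str, List[str]]]:
    """Pair each token's text with the lemmas stored on `token._.lemmas`."""
    return [(token.text, list(token._.lemmas)) for token in doc]

# Stand-in tokens that mimic the `.text` and `._.lemmas` attributes
# (hypothetical data, not real Lemmy output).
fake_doc = [
    SimpleNamespace(text="akvariernes", _=SimpleNamespace(lemmas=["akvarium"])),
    SimpleNamespace(text="vand", _=SimpleNamespace(lemmas=["vand"])),
]

pairs = collect_lemmas(fake_doc)
for text, lemmas in pairs:
    print(text, lemmas)
```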

Training

The notebooks folder contains examples showing how to train your own model using Lemmy.
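
Training and evaluation data from Universal Dependencies is distributed in the CoNLL-U format, with ten tab-separated columns per token. The sketch below extracts `(UPOS, FORM, LEMMA)` triples from such lines. The function name `read_conllu_triples` and the sample sentence are made up for illustration, and the exact input format Lemmy's training code expects is shown in the notebooks and is not assumed here.

```python
from typing import Iterable, List, Tuple

def read_conllu_triples(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Extract (upos, form, lemma) triples from CoNLL-U formatted lines."""
    triples = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            continue  # not a well-formed token line
        token_id, form, lemma, upos = fields[0], fields[1], fields[2], fields[3]
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        triples.append((upos, form, lemma))
    return triples

# Hand-written sample in CoNLL-U layout (illustrative, not from a treebank).
sample = [
    "# text = Akvariernes vand",
    "1\tAkvariernes\takvarium\tNOUN\t_\t_\t2\tnmod\t_\t_",
    "2\tvand\tvand\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
]
triples = read_conllu_triples(sample)
print(triples)
```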
