recognai / Spacy Wordnet

License: MIT
spacy-wordnet creates annotations that make it easy to use WordNet and WordNet Domains by leveraging the NLTK WordNet interface

Programming Languages

python
139,335 projects - #7 most used programming language

Projects that are alternatives of or similar to Spacy Wordnet

Go spider
[Crawler framework (Golang)] An awesome concurrent Go crawler (spider) framework. The crawler is flexible and modular; it can easily be extended into a customized crawler, or you can use only the default crawl components.
Stars: ✭ 1,745 (+1018.59%)
Mutual labels:  pipeline
Spacy Course
👩‍🏫 Advanced NLP with spaCy: A free online course
Stars: ✭ 1,920 (+1130.77%)
Mutual labels:  spacy
Pipeline Live
Pipeline Extension for Live Trading
Stars: ✭ 154 (-1.28%)
Mutual labels:  pipeline
Demo Jenkins Config As Code
Demo of Jenkins Configuration-As-Code with Docker and Groovy Hook Scripts
Stars: ✭ 143 (-8.33%)
Mutual labels:  pipeline
Wheelwright
🎡 Automated build repo for Python wheels and source packages
Stars: ✭ 148 (-5.13%)
Mutual labels:  spacy
Motorway
Cloud ready pure-python streaming data pipeline library
Stars: ✭ 150 (-3.85%)
Mutual labels:  pipeline
Jenkins Pipeline Library
wcm.io Jenkins Pipeline Library for CI/CD
Stars: ✭ 134 (-14.1%)
Mutual labels:  pipeline
Ects
Elastic Crontab System: a simple and easy-to-use distributed scheduled-task management system
Stars: ✭ 156 (+0%)
Mutual labels:  pipeline
Rangeless
C++ LINQ-like library of higher-order functions for data manipulation
Stars: ✭ 148 (-5.13%)
Mutual labels:  pipeline
Metl
mito ETL tool
Stars: ✭ 153 (-1.92%)
Mutual labels:  pipeline
Bodywork Core
Deploy machine learning projects developed in Python, to Kubernetes. Accelerated MLOps 🚀
Stars: ✭ 145 (-7.05%)
Mutual labels:  pipeline
Pipcook
Machine learning platform for Web developers
Stars: ✭ 2,186 (+1301.28%)
Mutual labels:  pipeline
Spacymoji
💙 Emoji handling and meta data for spaCy with custom extension attributes
Stars: ✭ 151 (-3.21%)
Mutual labels:  spacy
Practical Machine Learning With Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Stars: ✭ 1,868 (+1097.44%)
Mutual labels:  spacy
Open Solution Toxic Comments
Open solution to the Toxic Comment Classification Challenge
Stars: ✭ 154 (-1.28%)
Mutual labels:  pipeline
Rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Stars: ✭ 13,219 (+8373.72%)
Mutual labels:  spacy
Pyfunctional
Python library for creating data pipelines with chain functional programming
Stars: ✭ 1,943 (+1145.51%)
Mutual labels:  pipeline
Batchflow
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Stars: ✭ 156 (+0%)
Mutual labels:  pipeline
Fluids
Fluid dynamics component of Chemical Engineering Design Library (ChEDL)
Stars: ✭ 154 (-1.28%)
Mutual labels:  pipeline
Pipelinedashboard
Dashboard for your Deployment pipeline https://dashboardhub.io/
Stars: ✭ 151 (-3.21%)
Mutual labels:  pipeline

spaCy WordNet

spaCy WordNet is a simple custom component for using WordNet, MultiWordNet and WordNet Domains with spaCy.

The component combines the NLTK WordNet interface with WordNet Domains to allow users to:

  • Get all synsets for a processed token. For example, getting all the synsets (word senses) of the word bank.
  • Get and filter synsets by domain. For example, getting synonyms of the verb withdraw in the financial domain.
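The domain filter in the second bullet is conceptually simple: keep only the synsets whose domains intersect the ones you asked for. Here is a minimal, self-contained sketch of that idea; the synset names and the synset-to-domain mapping below are made up for illustration, while the real component derives them from the WordNet Domains data.

```python
# Hypothetical synset -> domains mapping; the real data comes from WordNet Domains.
SYNSET_DOMAINS = {
    "withdraw.v.01": {"finance", "banking"},
    "retire.v.02": {"military"},
    "draw.v.05": {"art"},
}

def synsets_for_domains(synsets, domains):
    """Keep only the synsets whose domain set intersects the requested domains."""
    wanted = set(domains)
    return [s for s in synsets if SYNSET_DOMAINS.get(s, set()) & wanted]

print(synsets_for_domains(list(SYNSET_DOMAINS), ["finance", "banking"]))
# ['withdraw.v.01']
```

This is the same intersection test the usage examples below rely on when they call wordnet_synsets_for_domain with a list of economy domains.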

Getting started

The spaCy WordNet component integrates easily into spaCy pipelines. You just need the following:

Prerequisites

  • Python 3.X
  • spaCy

You also need to download the following NLTK WordNet data:

python -m nltk.downloader wordnet
python -m nltk.downloader omw

Install

pip install spacy-wordnet

Supported languages

We currently support Spanish, English and Portuguese, but we welcome contributions to add and test new languages supported by spaCy and NLTK.
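One detail worth knowing when adding a language: spaCy identifies languages by two-letter codes ("es", "en", "pt"), while NLTK's WordNet methods such as lemma_names() expect ISO 639-2 codes ("spa", "eng", "por"), as the Portuguese example below shows. A small illustrative mapping between the two (the component's internal implementation may differ):

```python
# Illustrative mapping from spaCy language codes to the ISO 639-2
# codes that NLTK's WordNet interface (e.g. lemma_names) expects.
SPACY_TO_WORDNET_LANG = {"es": "spa", "en": "eng", "pt": "por"}

def wordnet_lang(spacy_lang: str) -> str:
    """Translate a spaCy language code into an NLTK WordNet language code."""
    try:
        return SPACY_TO_WORDNET_LANG[spacy_lang]
    except KeyError:
        raise ValueError(f"Language {spacy_lang!r} is not supported")

print(wordnet_lang("pt"))  # por
```

A contribution for a new language would need both sides of such a mapping to exist: a spaCy model for tagging and WordNet data for that language in NLTK.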

Usage

English example

import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator 

# Load a spaCy model (supported models are "es", "en" and "pt")
nlp = spacy.load('en')
nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')
token = nlp('prices')[0]

# The wordnet object links the spaCy token with the NLTK WordNet interface,
# giving access to synsets and lemmas
token._.wordnet.synsets()
token._.wordnet.lemmas()

# Tokens are also automatically tagged with WordNet domains
token._.wordnet.wordnet_domains()

spaCy WordNet lets you find synonyms by domain of interest, for example economy:

economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp('I want to withdraw 5,000 euros')

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names()]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> I (need|want|require) to (draw|withdraw|draw_off|take_out) 5,000 euros

Portuguese example

import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator 

# Load a spaCy model (you need to download the spaCy "pt" model first)
nlp = spacy.load('pt')
nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')
text = "Eu quero retirar 5.000 euros"
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp(text)

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names('por')]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> Eu (querer|desejar|esperar) retirar 5.000 euros

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].