ines / Spacymoji

Licence: mit
💙 Emoji handling and meta data for spaCy with custom extension attributes

spacymoji: emoji for spaCy


`spaCy v2.0 <https://spacy.io/usage/v2>`_ extension and pipeline component for adding emoji meta data to ``Doc`` objects. Detects emoji consisting of one or more unicode characters, and can optionally merge multi-char emoji (combined pictures, emoji with skin tone modifiers) into one token. Human-readable emoji descriptions are added as a custom attribute, and an optional lookup table can be provided for your own descriptions. The extension sets the custom ``Doc``, ``Token`` and ``Span`` attributes ``._.is_emoji``, ``._.emoji_desc``, ``._.has_emoji`` and ``._.emoji``. You can read more about custom pipeline components and extension attributes `here <https://spacy.io/usage/processing-pipelines>`_.

Emoji are matched using spaCy's ``PhraseMatcher``, and looked up in the data table provided by the `"emoji" package <https://github.com/carpedm20/emoji>`_.
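
To see why multi-character emoji need merging, it helps to look at the code points behind a single rendered icon. The following is a minimal sketch using only Python's standard library; it does not use spacymoji, and ``describe_code_points`` is a hypothetical helper name introduced here for illustration.

```python
import unicodedata


def describe_code_points(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]


# Thumbs up followed by a dark skin tone modifier: one icon, two code points.
thumbs_up_dark = "\U0001F44D\U0001F3FF"
# Man + zero width joiner + microphone: one icon, three code points.
man_singer = "\U0001F468\u200D\U0001F3A4"

for sample in (thumbs_up_dark, man_singer):
    print(len(sample), describe_code_points(sample))
```

Without merging, a tokenizer that splits on individual characters could end up with several tokens for what a reader sees as one emoji.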

.. image:: https://img.shields.io/github/release/ines/spacymoji.svg?style=flat-square
    :target: https://github.com/ines/spacymoji/releases
    :alt: Current Release Version

.. image:: https://img.shields.io/pypi/v/spacymoji.svg?style=flat-square
    :target: https://pypi.python.org/pypi/spacymoji
    :alt: pypi Version

⏳ Installation

spacymoji requires spacy v2.0.0 or higher.

.. code:: bash

    pip install spacymoji

☝️ Usage

Import the component and initialise it with the shared ``nlp`` object (i.e. an instance of ``Language``), which is used to initialise the ``PhraseMatcher`` with the shared vocab, and create the match patterns. Then add the component anywhere in your pipeline.

.. code:: python

    import spacy
    from spacymoji import Emoji

    nlp = spacy.load('en')
    emoji = Emoji(nlp)
    # Add the component first so emoji are merged right after tokenization
    nlp.add_pipe(emoji, first=True)

    doc = nlp(u"This is a test 😻 👍🏿")
    # Doc- and Span-level flags
    assert doc._.has_emoji == True
    assert doc[2:5]._.has_emoji == True
    # Token-level attributes
    assert doc[0]._.is_emoji == False
    assert doc[4]._.is_emoji == True
    assert doc[5]._.emoji_desc == u'thumbs up dark skin tone'
    # (emoji, index, description) tuples for the whole document
    assert len(doc._.emoji) == 2
    assert doc._.emoji[1] == (u'👍🏿', 5, u'thumbs up dark skin tone')

spacymoji only cares about the token text, so you can use it on a blank ``Language`` instance (it should work for `all available languages <https://spacy.io/usage/models#languages>`_!), or in a pipeline with a loaded model. If you're loading a model and your pipeline includes a tagger, parser and entity recognizer, make sure to add the emoji component with ``first=True``, so the spans are merged right after tokenization, and before the document is parsed. If your text contains a lot of emoji, this might even give you a nice boost in parser accuracy.

Available attributes

The extension sets attributes on the ``Doc``, ``Span`` and ``Token``. You can change the attribute names on initialisation of the extension. For more details on custom components and attributes, see the `processing pipelines documentation <https://spacy.io/usage/processing-pipelines#custom-components>`_.

====================== ======= ===
``Token._.is_emoji``   bool    Whether the token is an emoji.
``Token._.emoji_desc`` unicode A human-readable description of the emoji.
``Doc._.has_emoji``    bool    Whether the document contains emoji.
``Doc._.emoji``        list    ``(emoji, index, description)`` tuples of the document's emoji.
``Span._.has_emoji``   bool    Whether the span contains emoji.
``Span._.emoji``       list    ``(emoji, index, description)`` tuples of the span's emoji.
====================== ======= ===
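
The shape of the ``(emoji, index, description)`` tuples can be illustrated without spaCy. The sketch below is a plain-Python stand-in, not spacymoji's implementation: ``DESCRIPTIONS`` and ``collect_emoji`` are hypothetical names, the token list is written by hand, and the description strings are placeholders rather than values taken from the "emoji" package.

```python
# Hypothetical description table; placeholder strings for illustration only.
DESCRIPTIONS = {
    "\U0001F63B": "smiling cat face with heart-eyes",
    "\U0001F44D\U0001F3FF": "thumbs up dark skin tone",
}


def collect_emoji(tokens):
    """Return (emoji, index, description) tuples for tokens found in DESCRIPTIONS."""
    return [
        (token, index, DESCRIPTIONS[token])
        for index, token in enumerate(tokens)
        if token in DESCRIPTIONS
    ]


# Hand-written tokens for "This is a test" followed by two emoji.
tokens = ["This", "is", "a", "test", "\U0001F63B", "\U0001F44D\U0001F3FF"]
found = collect_emoji(tokens)
has_emoji = bool(found)  # analogous in spirit to a has_emoji flag
```

Here ``index`` is the position of the token in the list, which mirrors how the second tuple element in the usage example points at token ``5``.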

Settings

On initialisation of ``Emoji``, you can define the following settings:

=============== ============ ===
``nlp``         ``Language`` The shared ``nlp`` object. Used to initialise the matcher with the shared ``Vocab``, and create ``Doc`` match patterns.
``attrs``       tuple        Attributes to set on the ``._`` property. Defaults to ``('has_emoji', 'is_emoji', 'emoji_desc', 'emoji')``.
``pattern_id``  unicode      ID of match pattern, defaults to ``'EMOJI'``. Can be changed to avoid ID conflicts.
``merge_spans`` bool         Merge spans containing multi-character emoji, defaults to ``True``. Will only merge combined emoji resulting in one icon, not sequences.
``lookup``      dict         Optional lookup table that maps emoji unicode strings to custom descriptions, e.g. translations or other annotations.
=============== ============ ===

.. code:: python

    emoji = Emoji(nlp, attrs=('has_e', 'is_e', 'e_desc', 'e'), lookup={u'👨‍🎤': u'David Bowie'})
    nlp.add_pipe(emoji)
    doc = nlp(u"We can be 👨‍🎤 heroes")
    assert doc[3]._.is_e
    assert doc[3]._.e_desc == u'David Bowie'
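
One way to picture the ``lookup`` setting is as a table that is consulted before the default descriptions. The sketch below assumes that precedence; it is not spacymoji's actual code. ``DEFAULT_DESCRIPTIONS`` and ``describe`` are hypothetical names, and the default description string is a placeholder.

```python
# Man + zero width joiner + microphone (the "singer" emoji from the example above).
MAN_SINGER = "\U0001F468\u200D\U0001F3A4"

# Hypothetical default table; the description is a placeholder.
DEFAULT_DESCRIPTIONS = {MAN_SINGER: "man singer"}


def describe(emoji_text, lookup=None):
    """Return a custom description if one is given, else the default (or None)."""
    lookup = lookup or {}
    if emoji_text in lookup:
        return lookup[emoji_text]
    return DEFAULT_DESCRIPTIONS.get(emoji_text)


custom = describe(MAN_SINGER, lookup={MAN_SINGER: "David Bowie"})
default = describe(MAN_SINGER)
```

Keeping the custom table separate from the defaults means translations or project-specific annotations can be supplied without modifying the underlying data.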

🛣 Roadmap

This extension is still experimental, but here are some features that might be cool to add in the future:

* Add match patterns and attributes for emoji shortcodes, e.g. ``:+1:``. The shortcodes could optionally be merged into one token, and receive a ``NORM`` attribute with the unicode emoji. The ``NORM`` is used as a feature for training, so ``:+1:`` and 👍 would automatically receive similar representations.

* Add support for the Unicode Emoji Annotations project. The `JavaScript package <https://github.com/dematerializer/unicode-emoji-annotations>`_ also comes with `pre-compiled JSON data <https://github.com/dematerializer/unicode-emoji-annotations/tree/master/res>`_, including both standardised and community-contributed annotations in English and German.
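
The shortcode idea from the first roadmap item could look roughly like this in plain Python. This is only a sketch of the normalisation step, not an existing spacymoji feature: ``SHORTCODES``, ``SHORTCODE_RE`` and ``normalise_shortcodes`` are hypothetical names, and the two-entry table is illustrative.

```python
import re

# Hypothetical shortcode table; a real one would be much larger.
SHORTCODES = {
    ":+1:": "\U0001F44D",   # thumbs up
    ":cat:": "\U0001F431",  # cat face
}

# Matches colon-delimited names such as ":+1:" or ":cat:".
SHORTCODE_RE = re.compile(r":[a-z0-9_+\-]+:")


def normalise_shortcodes(text):
    """Replace known shortcodes with their unicode emoji; leave unknown ones as-is."""
    return SHORTCODE_RE.sub(lambda m: SHORTCODES.get(m.group(0), m.group(0)), text)


normalised = normalise_shortcodes("nice work :+1:")
```

Mapping both spellings to the same unicode string is what would let a shortcode and its emoji share a normalised form.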
