explosion / spacymoji

Licence: MIT license
💙 Emoji handling and meta data for spaCy with custom extension attributes

spacymoji: emoji for spaCy

spaCy extension and pipeline component for adding emoji metadata to Doc objects. It detects emoji consisting of one or more Unicode characters and can optionally merge multi-character emoji (combined pictures, emoji with skin tone modifiers) into one token. Human-readable emoji descriptions are added as a custom attribute, and you can provide an optional lookup table with your own descriptions. The extension sets the custom Doc, Token and Span attributes ._.is_emoji, ._.emoji_desc, ._.has_emoji and ._.emoji. For background on custom pipeline components and extension attributes, see the spaCy documentation.

Emoji are matched using spaCy's PhraseMatcher, and looked up in the data table provided by the emoji package.
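To make "emoji consisting of one or more Unicode characters" concrete, here is a minimal standard-library sketch that lists the code points behind two emoji used in this README. It does not use spaCy or spacymoji, and the helper name code_points is only an illustration.

```python
import unicodedata

# Escapes are used so the example does not depend on the file's encoding.
THUMBS_UP_DARK = "\U0001F44D\U0001F3FF"    # thumbs up + dark skin tone modifier
MAN_SINGER = "\U0001F468\u200D\U0001F3A4"  # man + zero width joiner + microphone


def code_points(text):
    """Return (hex code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


for emoji_text in (THUMBS_UP_DARK, MAN_SINGER):
    # Each emoji renders as one picture but is several characters long.
    print(len(emoji_text), code_points(emoji_text))
```

Because a single picture can span several characters, a tokenizer may split it into several tokens, which is the case the optional merging is meant to handle.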

⏳ Installation

spacymoji requires spaCy v3.0.0 or higher. For spaCy v2.x, install spacymoji==2.0.0.

pip install spacymoji

☝️ Usage

Add the component anywhere in your pipeline using the string name of the "emoji" component factory:

import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("emoji", first=True)
doc = nlp("This is a test 😻 👍🏿")
assert doc._.has_emoji is True
assert doc[2:5]._.has_emoji is True
assert doc[0]._.is_emoji is False
assert doc[4]._.is_emoji is True
assert doc[5]._.emoji_desc == "thumbs up dark skin tone"
assert len(doc._.emoji) == 2
assert doc._.emoji[1] == ("👍🏿", 5, "thumbs up dark skin tone")

spacymoji only looks at the token text, so you can use it on a blank Language instance (it should work for all available languages!) or in a loaded trained pipeline. If your pipeline includes a tagger, parser and entity recognizer, add the emoji component with first=True so the spans are merged right after tokenization and before the document is parsed. If your text contains a lot of emoji, this might even give you a nice boost in parser accuracy.

Available attributes

The extension sets attributes on the Doc, Span and Token. You can change the attribute names (and other parameters of the Emoji component) by passing them via the config parameter in the nlp.add_pipe(...) method. For more details on custom components and attributes, see the processing pipelines documentation.

| Attribute | Type | Description |
| --- | --- | --- |
| Token._.is_emoji | bool | Whether the token is an emoji. |
| Token._.emoji_desc | str | A human-readable description of the emoji. |
| Doc._.has_emoji | bool | Whether the document contains emoji. |
| Doc._.emoji | List[Tuple[str, int, str]] | (emoji, index, description) tuples of the document's emoji. |
| Span._.has_emoji | bool | Whether the span contains emoji. |
| Span._.emoji | List[Tuple[str, int, str]] | (emoji, index, description) tuples of the span's emoji. |
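The shape of the (emoji, index, description) tuples can be illustrated without spaCy. The sketch below is not spacymoji's implementation: collect_emoji and the DESCRIPTIONS table are hypothetical stand-ins, and the tokens are plain strings rather than Token objects.

```python
from typing import Dict, List, Tuple

# Hypothetical description table for illustration only; spacymoji takes its
# descriptions from the emoji package.
DESCRIPTIONS: Dict[str, str] = {
    "\U0001F63B": "smiling cat with heart-eyes",
    "\U0001F44D\U0001F3FF": "thumbs up dark skin tone",
}


def collect_emoji(
    tokens: List[str], descriptions: Dict[str, str]
) -> List[Tuple[str, int, str]]:
    """Build (emoji, index, description) tuples for tokens found in the table."""
    return [
        (token, index, descriptions[token])
        for index, token in enumerate(tokens)
        if token in descriptions
    ]


# Token texts of the usage example, with the multi-character emoji already merged.
tokens = ["This", "is", "a", "test", "\U0001F63B", "\U0001F44D\U0001F3FF"]
found = collect_emoji(tokens, DESCRIPTIONS)
print(found)
```

The index in each tuple is the position of the token in the sequence, which matches the value 5 asserted for the thumbs-up emoji in the usage example.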

Settings

You can configure the emoji factory by setting any of the following parameters in the config dictionary:

| Setting | Type | Description |
| --- | --- | --- |
| attrs | Tuple[str, str, str, str] | Attributes to set on the ._ property. Defaults to ('has_emoji', 'is_emoji', 'emoji_desc', 'emoji'). |
| pattern_id | str | ID of match pattern, defaults to 'EMOJI'. Can be changed to avoid ID conflicts. |
| merge_spans | bool | Merge spans containing multi-character emoji, defaults to True. Will only merge combined emoji resulting in one icon, not sequences. |
| lookup | Dict[str, str] | Optional lookup table that maps emoji strings to custom descriptions, e.g. translations or other annotations. |

# Custom attribute names plus a lookup table with a custom description
emoji_config = {"attrs": ("has_e", "is_e", "e_desc", "e"), "lookup": {"👨‍🎤": "David Bowie"}}
nlp.add_pipe("emoji", first=True, config=emoji_config)
doc = nlp("We can be 👨‍🎤 heroes")
assert doc[3]._.is_e
assert doc[3]._.e_desc == "David Bowie"
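The example above implies that an entry in the lookup table wins over the default description. A minimal sketch of that precedence follows, assuming a plain dictionary of defaults. The describe function and the "man singer" default are hypothetical and are not taken from spacymoji's source.

```python
from typing import Dict, Optional

MAN_SINGER = "\U0001F468\u200D\U0001F3A4"  # man + zero width joiner + microphone

# Hypothetical default table standing in for the descriptions spacymoji derives
# from the emoji package.
DEFAULTS: Dict[str, str] = {MAN_SINGER: "man singer"}


def describe(
    emoji_text: str,
    defaults: Dict[str, str],
    lookup: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the custom description if one exists, else the default one."""
    if lookup is not None and emoji_text in lookup:
        return lookup[emoji_text]
    return defaults.get(emoji_text)


print(describe(MAN_SINGER, DEFAULTS))
print(describe(MAN_SINGER, DEFAULTS, lookup={MAN_SINGER: "David Bowie"}))
```

Entries missing from the lookup table fall back to the default description, so a partial table of translations or annotations is enough.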

If you're training a pipeline, you can define the component config in your config.cfg:

[nlp]
pipeline = ["emoji", "ner"]
# ...

[components.emoji]
factory = "emoji"
merge_spans = false
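
To show how the values in this fragment are typed, here is a rough sketch that reads it with Python's configparser and decodes each value as JSON. This is an assumption-laden illustration only: spaCy loads config.cfg with its own config system, which also handles interpolation and validation, and none of the names below come from spacymoji.

```python
import configparser
import json

# The same fragment as above, held in a string for the sake of the example.
CONFIG_TEXT = """
[nlp]
pipeline = ["emoji", "ner"]

[components.emoji]
factory = "emoji"
merge_spans = false
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Values are written in JSON syntax, so "false" decodes to the boolean False.
pipeline = json.loads(parser["nlp"]["pipeline"])
emoji_settings = {
    key: json.loads(value)
    for key, value in parser["components.emoji"].items()
}
print(pipeline, emoji_settings)
```

Apart from factory, the keys in the [components.emoji] block are the same settings you would otherwise pass through the config dictionary of nlp.add_pipe(...).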