
swen128 / twitter-text-python

Licence: MIT license
Twitter Text Libraries for Python

Programming Languages

python
139335 projects - #7 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to twitter-text-python

Libasciidoc
A Golang library for processing Asciidoc files.
Stars: ✭ 129 (+486.36%)
Mutual labels:  text-processing
Jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku and Zenkaku
Stars: ✭ 157 (+613.64%)
Mutual labels:  text-processing
Pyarabic
pyarabic
Stars: ✭ 183 (+731.82%)
Mutual labels:  text-processing
Prenlp
Preprocessing Library for Natural Language Processing
Stars: ✭ 130 (+490.91%)
Mutual labels:  text-processing
Xioc
Extract indicators of compromise from text, including "escaped" ones.
Stars: ✭ 148 (+572.73%)
Mutual labels:  text-processing
Textvec
Text vectorization tool to outperform TFIDF for classification tasks
Stars: ✭ 167 (+659.09%)
Mutual labels:  text-processing
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+463.64%)
Mutual labels:  text-processing
Stringi
THE String Processing Package for R (with ICU)
Stars: ✭ 204 (+827.27%)
Mutual labels:  text-processing
Japanese.js
Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.
Stars: ✭ 150 (+581.82%)
Mutual labels:  text-processing
Sd
Intuitive find & replace CLI (sed alternative)
Stars: ✭ 2,755 (+12422.73%)
Mutual labels:  text-processing
Tmtoolkit
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Stars: ✭ 135 (+513.64%)
Mutual labels:  text-processing
Browsecloud
A web app to create and browse text visualizations for automated customer listening.
Stars: ✭ 143 (+550%)
Mutual labels:  text-processing
Text Detector
Tool which allows you to detect and translate text.
Stars: ✭ 173 (+686.36%)
Mutual labels:  text-processing
Konoha
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
Stars: ✭ 130 (+490.91%)
Mutual labels:  text-processing
Rust Unic
UNIC: Unicode and Internationalization Crates for Rust
Stars: ✭ 189 (+759.09%)
Mutual labels:  text-processing
Padatious
A neural network intent parser
Stars: ✭ 124 (+463.64%)
Mutual labels:  text-processing
Nlpre
Python library for Natural Language Preprocessing (NLPre)
Stars: ✭ 158 (+618.18%)
Mutual labels:  text-processing
PCF-Controls
Repos of Powerapps Component Framework (PCF) Controls
Stars: ✭ 33 (+50%)
Mutual labels:  character-counter
Regex Automata
A low level regular expression library that uses deterministic finite automata.
Stars: ✭ 203 (+822.73%)
Mutual labels:  text-processing
Fastnlp
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Stars: ✭ 2,441 (+10995.45%)
Mutual labels:  text-processing

twitter-text-python


This is a Python port of the twitter/twitter-text libraries, fully compliant with the official conformance test suite.

Features

This library calculates the length of a tweet according to the Twitter Developers documentation, so you can validate a tweet without calling the Web API. Counting characters may seem easy, but it is complicated in practice, especially when the text contains CJK characters, URLs, or emoji.
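To give a feel for why the count is not simply len(text), here is a minimal sketch of weighted counting. It is not this library's implementation: the names approx_weighted_length, char_weight, LIGHT_RANGES and NAIVE_URL are hypothetical, the constants are assumed from the published twitter-text v3 configuration, and the URL pattern is deliberately naive. It also weighs emoji one code point at a time, so multi-code-point emoji sequences are overcounted.

```python
import re
import unicodedata

# Assumed values, modeled on the twitter-text v3 configuration.
MAX_WEIGHTED_LENGTH = 280
URL_WEIGHT = 23  # every URL counts as a fixed length, whatever its real size
# Code point ranges that weigh 1; everything else (e.g. CJK) weighs 2.
LIGHT_RANGES = [(0, 4351), (8192, 8205), (8208, 8223), (8242, 8247)]

# Deliberately naive URL pattern; the real library is far stricter.
NAIVE_URL = re.compile(r'https?://\S+')


def char_weight(ch):
    """Return 1 for 'light' code points and 2 for all others."""
    cp = ord(ch)
    return 1 if any(lo <= cp <= hi for lo, hi in LIGHT_RANGES) else 2


def approx_weighted_length(text):
    """Approximate the weighted length of a tweet (sketch only)."""
    text = unicodedata.normalize('NFC', text)
    total = 0
    pos = 0
    for match in NAIVE_URL.finditer(text):
        # Weigh the plain text before the URL, then add the fixed URL weight.
        total += sum(char_weight(c) for c in text[pos:match.start()])
        total += URL_WEIGHT
        pos = match.end()
    # Weigh whatever follows the last URL.
    total += sum(char_weight(c) for c in text[pos:])
    return total


sample = 'english text 日本語 😷 https://example.com'
length = approx_weighted_length(sample)
is_valid = 0 < length <= MAX_WEIGHTED_LENGTH
```

For the sample string the sketch agrees with the weighted length of 46 reported in the example further down, but only because the emoji there is a single code point. For real validation, use parse_tweet.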

The original twitter-text libraries also provide hit highlighting and auto-linking, but this Python port does not support them yet.

Usage

Installation

$ pip install twitter-text-parser

Examples

See the API reference for more details.

from twitter_text import parse_tweet, extract_emojis_with_indices, extract_urls_with_indices

text = 'english text 日本語 😷 https://example.com'

assert parse_tweet(text).asdict() == {
    'weightedLength': 46,
    'valid': True,
    'permillage': 164,
    'validRangeStart': 0,
    'validRangeEnd': 38,
    'displayRangeStart': 0,
    'displayRangeEnd': 38
}

assert extract_urls_with_indices(text) == [{
    'url': 'https://example.com',
    'indices': [19, 38]
}]

assert extract_emojis_with_indices(text) == [{
    'emoji': '😷',
    'indices': [17, 18]
}]
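Judging from the values in the example above, the indices appear to be Python string offsets (code points) rather than UTF-16 code units, so they can be used directly for slicing. The sketch below checks that reading against the sample text. The helper utf16_offset is hypothetical and not part of this library; it only illustrates how such an offset could be converted when comparing results with a UTF-16 based environment such as JavaScript.

```python
text = 'english text 日本語 😷 https://example.com'

# Indices taken from the example above: they slice the Python string directly.
url_start, url_end = 19, 38
emoji_start, emoji_end = 17, 18

url = text[url_start:url_end]
emoji = text[emoji_start:emoji_end]


def utf16_offset(s, index):
    """Convert a code point offset into a UTF-16 code unit offset.

    Hypothetical helper: characters outside the Basic Multilingual Plane
    (such as most emoji) occupy two UTF-16 code units each.
    """
    return len(s[:index].encode('utf-16-le')) // 2


# The emoji before the URL takes two UTF-16 code units, so the URL
# starts one position later when counted in UTF-16.
url_start_utf16 = utf16_offset(text, url_start)
```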
