pudo / Normality

Licence: MIT
A tiny library for Python text normalisation. Useful for ad-hoc text processing.

Projects that are alternatives to or similar to Normality

Keytokey
Rust keyboard firmware library
Stars: ✭ 54 (-42.55%)
Mutual labels:  unicode
Knayi Myscript
Myanmar Language Script Library
Stars: ✭ 63 (-32.98%)
Mutual labels:  unicode
Demoji
Accurately find/replace/remove emojis in text strings
Stars: ✭ 82 (-12.77%)
Mutual labels:  unicode
Mdetect
Stars: ✭ 54 (-42.55%)
Mutual labels:  unicode
Sinais
🔣 Desenvolvimento passo a passo do exemplo `sinais` em Go.
Stars: ✭ 59 (-37.23%)
Mutual labels:  unicode
Locale2
💪 Try as hard as possible to detect the client's language tag ("locale") in node or the browser. Browserify and Webpack friendly!
Stars: ✭ 65 (-30.85%)
Mutual labels:  unicode
Sheenbidi
A sophisticated implementation of Unicode Bidirectional Algorithm
Stars: ✭ 52 (-44.68%)
Mutual labels:  unicode
Open Arrow
Open Arrow is an open-source font that contains 112 arrow symbols from U+2190 to U+21ff
Stars: ✭ 89 (-5.32%)
Mutual labels:  unicode
Yawysiwygee
Yet another what-you-see-is-what-you-get equation editor
Stars: ✭ 60 (-36.17%)
Mutual labels:  unicode
Unicode
Unicode normalization library. (Mirror of Yoshida-san's code base to maintain the RubyGem.)
Stars: ✭ 81 (-13.83%)
Mutual labels:  unicode
Python Myanmar
Python library for Myanmar text processing
Stars: ✭ 53 (-43.62%)
Mutual labels:  unicode
Glyphhanger
Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.
Stars: ✭ 1,099 (+1069.15%)
Mutual labels:  unicode
Ucdn
Unicode Database and Normalization
Stars: ✭ 78 (-17.02%)
Mutual labels:  unicode
Awesome Emoji Picker
Add-on/WebExtension that provides a modern emoji picker that you can use to find and copy/insert emoji into the active web page.
Stars: ✭ 54 (-42.55%)
Mutual labels:  unicode
U2c
Unicode To Chinese -- U2C : A burpsuite Extender That Convert Unicode To Chinese 【Unicode编码转中文的burp插件】
Stars: ✭ 83 (-11.7%)
Mutual labels:  unicode
Weird Json
A collection of strange encoded JSONs. For connoisseurs.
Stars: ✭ 53 (-43.62%)
Mutual labels:  unicode
Emoji Regex
A regular expression to match all Emoji-only symbols as per the Unicode Standard.
Stars: ✭ 1,134 (+1106.38%)
Mutual labels:  unicode
String Extra
Unicode/String support for Twig
Stars: ✭ 92 (-2.13%)
Mutual labels:  unicode
Ofxfontstash
Easy (and fast) unicode string rendering addon for OpenFrameworks. FontStash is made by Andreas Krinke and Mikko Mononen
Stars: ✭ 84 (-10.64%)
Mutual labels:  unicode
Lehar
Visualize data using relative ordering
Stars: ✭ 81 (-13.83%)
Mutual labels:  unicode

normality

Normality is a Python micro-package containing a small set of text normalization functions for easy reuse. These functions accept a snippet of Unicode or UTF-8 encoded text and remove various classes of characters, such as diacritics and punctuation. This is useful as preparation for further text analysis.

WARNING: This library works much better in combination with PyICU, a Python binding for the International Components for Unicode (ICU) C library. ICU provides much better text transliteration than the default fallback, text-unidecode.
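To see whether the PyICU binding is installed in the current environment, a check along these lines may help. It is a minimal sketch: the helper name `icu_available` is hypothetical, and the only assumption is that PyICU is imported under the module name `icu`. It reports only whether the module can be found, not which backend normality selects internally.

```python
import importlib.util


def icu_available() -> bool:
    """Return True if a module named `icu` (the PyICU binding) can be found."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec("icu") is not None


if icu_available():
    print("PyICU found: ICU transliteration should be usable.")
else:
    print("PyICU not found: expect the text-unidecode fallback.")
```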

Example

# coding: utf-8
from normality import normalize, slugify, collapse_spaces

text = normalize('Nie wieder "Grüne Süppchen" kochen!')
assert text == 'nie wieder grune suppchen kochen'

slug = slugify('My first blog post!')
assert slug == 'my-first-blog-post'

text = 'this \n\n\r\nhas\tlots of \nodd spacing.'
assert collapse_spaces(text) == 'this has lots of odd spacing.'
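For intuition about what this kind of normalization involves, the following is a rough standard-library sketch. It is not normality's implementation: the function names `strip_diacritics`, `sketch_collapse_spaces` and `sketch_normalize` are hypothetical, and it removes diacritics with plain Unicode decomposition rather than ICU or text-unidecode transliteration, so results for non-Latin scripts will differ.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sketch_collapse_spaces(text: str) -> str:
    """Replace every run of whitespace with a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def sketch_normalize(text: str) -> str:
    """Lowercase, strip diacritics, and turn punctuation into spaces."""
    text = strip_diacritics(text).lower()
    # Keep letters, digits and whitespace; everything else becomes a space.
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return sketch_collapse_spaces(text)


print(sketch_normalize('Nie wieder "Grüne Süppchen" kochen!'))
```

Decomposition only helps for characters that split into a base letter plus combining marks (such as ü). Characters without such a decomposition (for example ß or ø) pass through unchanged, which is one reason a real transliterator gives better results.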

License

normality is open source, licensed under a standard MIT license (included in this repository as LICENSE).