pudo / Normality
License: MIT
A tiny library for Python text normalisation. Useful for ad-hoc text processing.
Stars: ✭ 94
Programming Languages
python
Projects that are alternatives of or similar to Normality
Demoji
Accurately find/replace/remove emojis in text strings
Stars: ✭ 82 (-12.77%)
Mutual labels: unicode
Sinais
🔣 Step-by-step development of the `sinais` example in Go.
Stars: ✭ 59 (-37.23%)
Mutual labels: unicode
Locale2
💪 Try as hard as possible to detect the client's language tag ("locale") in node or the browser. Browserify and Webpack friendly!
Stars: ✭ 65 (-30.85%)
Mutual labels: unicode
Sheenbidi
A sophisticated implementation of the Unicode Bidirectional Algorithm
Stars: ✭ 52 (-44.68%)
Mutual labels: unicode
Open Arrow
Open Arrow is an open-source font that contains 112 arrow symbols from U+2190 to U+21ff
Stars: ✭ 89 (-5.32%)
Mutual labels: unicode
Yawysiwygee
Yet another what-you-see-is-what-you-get equation editor
Stars: ✭ 60 (-36.17%)
Mutual labels: unicode
Unicode
Unicode normalization library. (Mirror of Yoshida-san's code base to maintain the RubyGem.)
Stars: ✭ 81 (-13.83%)
Mutual labels: unicode
Python Myanmar
Python library for Myanmar text processing
Stars: ✭ 53 (-43.62%)
Mutual labels: unicode
Glyphhanger
Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.
Stars: ✭ 1,099 (+1069.15%)
Mutual labels: unicode
Awesome Emoji Picker
Add-on/WebExtension that provides a modern emoji picker that you can use to find and copy/insert emoji into the active web page.
Stars: ✭ 54 (-42.55%)
Mutual labels: unicode
U2c
Unicode To Chinese (U2C): a Burp Suite extender that converts Unicode-encoded text to Chinese.
Stars: ✭ 83 (-11.7%)
Mutual labels: unicode
Weird Json
A collection of strange encoded JSONs. For connoisseurs.
Stars: ✭ 53 (-43.62%)
Mutual labels: unicode
Emoji Regex
A regular expression to match all Emoji-only symbols as per the Unicode Standard.
Stars: ✭ 1,134 (+1106.38%)
Mutual labels: unicode
Ofxfontstash
Easy (and fast) unicode string rendering addon for OpenFrameworks. FontStash is made by Andreas Krinke and Mikko Mononen
Stars: ✭ 84 (-10.64%)
Mutual labels: unicode
normality
Normality is a Python micro-package that contains a small set of text normalization functions for easier re-use. These functions accept a snippet of Unicode or UTF-8 encoded text and remove various classes of characters, such as diacritics and punctuation. This is useful as preparation for further text analysis.
WARNING: This library works much better when used in combination with pyicu, a Python binding for the International Components for Unicode C library. ICU provides much better text transliteration than the default text-unidecode.
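To illustrate the idea of preferring ICU when it is available, here is a minimal sketch of an optional-backend pattern. It is not taken from normality's source: the helper name `make_ascii_transliterator`, the ICU transliterator ID `Any-Latin; Latin-ASCII`, and the final standard-library fallback are all assumptions made for this example.

```python
import unicodedata


def make_ascii_transliterator():
    """Return a str -> str function using the best backend that is installed.

    Hypothetical helper for illustration; not part of normality's API.
    """
    try:
        # PyICU exposes ICU's transliterators. The compound ID below is an
        # assumption chosen for this sketch.
        from icu import Transliterator

        transliterator = Transliterator.createInstance("Any-Latin; Latin-ASCII")
        return transliterator.transliterate
    except ImportError:
        pass

    try:
        # text-unidecode is the pure-Python fallback named in the warning.
        from text_unidecode import unidecode

        return unidecode
    except ImportError:
        pass

    def strip_marks(text):
        # Last resort: decompose and drop combining marks. This only removes
        # accents; it does not transliterate non-Latin scripts.
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    return strip_marks


to_ascii = make_ascii_transliterator()
```

The backends differ mostly on non-Latin scripts, so results for such input depend on which one is installed.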
Example
# coding: utf-8
from normality import normalize, slugify, collapse_spaces
text = normalize('Nie wieder "Grüne Süppchen" kochen!')
assert text == 'nie wieder grune suppchen kochen'
slug = slugify('My first blog post!')
assert slug == 'my-first-blog-post'
text = 'this \n\n\r\nhas\tlots of \nodd spacing.'
assert collapse_spaces(text) == 'this has lots of odd spacing.'
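For readers curious about what this kind of normalisation involves, the following is a rough standard-library approximation of the behaviour asserted in the example. It is a sketch under assumptions, not normality's implementation: `rough_normalize` is a hypothetical name, and the real library may handle many cases (non-Latin scripts in particular) differently.

```python
import unicodedata


def rough_normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace.

    Hypothetical stand-in for illustration only.
    """
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    chars = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # Drop combining marks (the diacritics split off by NFKD).
            continue
        if category[0] in "LN":
            # Keep letters and numbers, lowercased.
            chars.append(ch.lower())
        else:
            # Punctuation, symbols, separators and control characters
            # all become a space.
            chars.append(" ")
    # Collapse runs of spaces and trim both ends.
    return " ".join("".join(chars).split())
```

Replacing punctuation with a space rather than deleting it keeps words such as "odd-spacing" from fusing together before the whitespace is collapsed.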
License
normality is open source, licensed under a standard MIT license (included in this repository as LICENSE).