Voca_rs is the ultimate Rust string library inspired by Voca.js, string.py and Inflector, implemented as independent functions and on Foreign Types (String and str).

Stars: ✭ 167 (+138.57%)

Mutual labels: unicode, utf-8

Tomlplusplus

Header-only TOML config file parser and serializer for C++17 (and later!).

Stars: ✭ 403 (+475.71%)

Mutual labels: unicode, utf-8

Lehar

Visualize data using relative ordering

Stars: ✭ 81 (+15.71%)

Mutual labels: unicode, ascii

Stringz

💯 Super fast unicode-aware string manipulation Javascript library

Stars: ✭ 181 (+158.57%)

Mutual labels: unicode, utf-8

Urlify

A fast PHP slug generator and transliteration library that converts non-ascii characters for use in URLs.

Stars: ✭ 633 (+804.29%)

Mutual labels: unicode, ascii

jurl

Fast and simple URL parsing for Java, with UTF-8 and path resolving support

Stars: ✭ 84 (+20%)

Mutual labels: unicode, utf-8

Unicopy

Unicode command-line codepoint dumper

Stars: ✭ 16 (-77.14%)

Mutual labels: unicode, utf-8

Cowsay Files

A collection of additional/alternative cowsay files.

Stars: ✭ 216 (+208.57%)

Mutual labels: unicode, ascii

ocreval

Update of the ISRI Analytic Tools for OCR Evaluation with UTF-8 support

Stars: ✭ 48 (-31.43%)

Mutual labels: unicode, utf-8

table2ascii

Python library for converting lists to fancy ASCII tables for displaying in the terminal and on Discord

Stars: ✭ 31 (-55.71%)

Mutual labels: unicode, ascii

Encoding.js

Convert or detect character encoding in JavaScript

Stars: ✭ 338 (+382.86%)

Mutual labels: unicode, utf-8

Bstr

A string type for Rust that is not required to be valid UTF-8.

Stars: ✭ 348 (+397.14%)

Mutual labels: unicode, utf-8

Diagon

Interactive ASCII art diagram generators. 🌟

Stars: ✭ 189 (+170%)

Mutual labels: unicode, ascii


THE PROJECT IS ARCHIVED

Forks: https://github.com/orsinium/forks

Homoglyphs

Homoglyphs is a Python library for getting homoglyphs and converting them to ASCII.

Features

It's a smarter version of confusable_homoglyphs:

Automatic detection or manual selection of the category (aliases from ISO 15924).
Automatic or manual loading of only the needed alphabets into memory.
Converting to ASCII.
More configurable.
More stable.

Installation

sudo pip install homoglyphs

Usage

The best way to explain something is to show how it works, so let's look at real usage.

Importing:

import homoglyphs as hg

Languages

# detect languages that use a character
hg.Languages.detect('w')
# {'pl', 'da', 'nl', 'fi', 'cz', 'sr', 'pt', 'it', 'en', 'es', 'sk', 'de', 'fr', 'ro'}
hg.Languages.detect('т')
# {'mk', 'ru', 'be', 'bg', 'sr'}
hg.Languages.detect('.')
# set()

# get alphabet for languages
hg.Languages.get_alphabet(['ru'])
# {'в', 'Ё', 'К', 'Т', ..., 'Р', 'З', 'Э'}

# get all languages
hg.Languages.get_all()
# {'nl', 'lt', ..., 'de', 'mk'}
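
To make the idea behind detection concrete, here is a minimal sketch, assuming that detection amounts to checking which alphabets contain the character. The `ALPHABETS` table and the `detect_languages` helper are hypothetical and heavily truncated; they are not the library's data or implementation.

```python
# Toy alphabets: a hypothetical, truncated stand-in for real language data.
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "de": set("abcdefghijklmnopqrstuvwxyzäöüß"),
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
    "bg": set("абвгдежзийклмнопрстуфхцчшщъьюя"),
}


def detect_languages(char):
    """Return every language code whose alphabet contains `char`."""
    return {lang for lang, letters in ALPHABETS.items() if char.lower() in letters}


print(detect_languages("w"))       # languages whose alphabet has Latin "w"
print(detect_languages("\u0442"))  # languages whose alphabet has Cyrillic "т"
print(detect_languages("."))       # punctuation belongs to no alphabet here
```

Returning a set mirrors the shape of the results shown above: a character shared by several alphabets maps to several languages, and a character found in none maps to an empty set.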

Homoglyphs

Get homoglyphs:

# get homoglyphs (latin alphabet initialized by default)
hg.Homoglyphs().get_combinations('q')
# ['q', '𝐪', '𝑞', '𝒒', '𝓆', '𝓺', '𝔮', '𝕢', '𝖖', '𝗊', '𝗾', '𝘲', '𝙦', '𝚚']
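
One way to picture how combinations arise is a Cartesian product over the look-alikes of each character. The sketch below assumes that approach; the `GROUPS` table, `variants` and this `get_combinations` function are illustrative stand-ins, not the library's code or data.

```python
from itertools import product

# Hypothetical confusables table (a tiny excerpt, for illustration only).
# Each group lists characters that look alike.
GROUPS = [
    {"c", "\u0441"},  # Latin "c" and Cyrillic "с"
    {"o", "\u043e"},  # Latin "o" and Cyrillic "о"
    {"p", "\u0440"},  # Latin "p" and Cyrillic "р"
]


def variants(char):
    """Return the sorted look-alikes of `char`, or just `char` itself."""
    for group in GROUPS:
        if char in group:
            return sorted(group)
    return [char]


def get_combinations(text):
    """Build every string obtained by swapping characters for look-alikes."""
    return ["".join(chars) for chars in product(*(variants(c) for c in text))]


print(len(get_combinations("cop")))  # 2 * 2 * 2 renderings
```

The number of results is the product of the group sizes, so long strings with many confusable characters grow quickly; restricting the loaded alphabet keeps that number manageable.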

Alphabet loading:

# load alphabet on init by categories
homoglyphs = hg.Homoglyphs(categories=('LATIN', 'COMMON', 'CYRILLIC'))  # alphabet loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы', 'ꭇы', 'ꭈы', '𝐫ы', '𝑟ы', '𝒓ы', '𝓇ы', '𝓻ы', '𝔯ы', '𝕣ы', '𝖗ы', '𝗋ы', '𝗿ы', '𝘳ы', '𝙧ы', '𝚛ы']

# load alphabet on init by languages
homoglyphs = hg.Homoglyphs(languages={'ru', 'en'})  # alphabet will be loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы']

# manually set the alphabet on init (Latin letters first, then Cyrillic)
homoglyphs = hg.Homoglyphs(alphabet='abc абс')
homoglyphs.get_combinations('с')
# ['c', 'с']

# load alphabet on demand
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
# ^ alphabet will be loaded here for "en" language
homoglyphs.get_combinations('гы')
# ^ alphabet will be loaded here for "ru" language
# ['rы', 'гы']

You can combine categories, languages, alphabet and any strategies as you want. The strategies specify how to handle any characters not already loaded:

STRATEGY_LOAD: load category for this character
STRATEGY_IGNORE: add character to result
STRATEGY_REMOVE: remove character from result
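
The three behaviours listed above can be modelled with a small stand-alone function. This is a sketch under stated assumptions: the constants reuse the names from the list, but their values, the `CATEGORY_CHARS` table and the `prepare` helper are hypothetical and do not reflect the library's internals.

```python
STRATEGY_LOAD = "load"
STRATEGY_IGNORE = "ignore"
STRATEGY_REMOVE = "remove"

# Hypothetical category data: which characters each category contributes.
CATEGORY_CHARS = {
    "LATIN": set("abcdefghijklmnopqrstuvwxyz"),
    "CYRILLIC": set("\u0430\u0431\u0432\u0433\u0434"),  # а б в г д
}


def prepare(text, loaded, strategy):
    """Apply `strategy` to the characters of `text` missing from `loaded`.

    Returns the text to process further; `loaded` grows under STRATEGY_LOAD.
    """
    kept = []
    for char in text:
        if char not in loaded:
            if strategy == STRATEGY_REMOVE:
                continue  # drop the unknown character
            if strategy == STRATEGY_LOAD:
                for chars in CATEGORY_CHARS.values():
                    if char in chars:
                        loaded |= chars  # pull in the whole category
            # STRATEGY_IGNORE: fall through and keep the character as-is
        kept.append(char)
    return "".join(kept)


loaded = set(CATEGORY_CHARS["LATIN"])
print(prepare("a\u0431", loaded, STRATEGY_REMOVE))  # Cyrillic "б" is dropped
```

Only the load strategy changes the set of known characters; the other two leave it untouched and differ solely in whether the unknown character survives.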

Converting glyphs to ASCII chars

homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)

# convert
homoglyphs.to_ascii('ТЕСТ')
# ['TECT']
homoglyphs.to_ascii('ХР123.')  # this is cyrillic "х" and "р"
# ['XP123.', 'XPI23.', 'XPl23.']

# by default, a string containing chars that can't be converted yields no results
homoglyphs.to_ascii('лол')
# []

# you can set a strategy that removes unconverted non-ASCII chars from the result
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
)
homoglyphs.to_ascii('лол')
# ['o']

# you can also restrict the range of allowed ASCII char codes (range(128) by default, i.e. codes 0-127):
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
    ascii_range=range(ord('a'), ord('z') + 1),  # range() excludes its end, so add 1 to include 'z'
)
homoglyphs.to_ascii('ХР123.')
# ['l']
homoglyphs.to_ascii('хр123.')
# ['xpl']
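
The following sketch shows why a single input can produce several ASCII results: the digit "1" has more than one ASCII look-alike, so each choice yields its own string. It assumes a product-then-filter approach; the `LOOKALIKES` table and this `to_ascii` function (including its `remove_unknown` flag) are hypothetical and only mirror the examples above.

```python
from itertools import product

# Hypothetical excerpt of a confusables table, chosen to mirror the example
# above; real confusables data is far larger.
LOOKALIKES = {
    "\u0425": ["X"],       # Cyrillic "Х" -> Latin "X"
    "\u0420": ["P"],       # Cyrillic "Р" -> Latin "P"
    "1": ["1", "I", "l"],  # digit one, capital i, small L
}


def to_ascii(text, ascii_range=range(128), remove_unknown=False):
    """Return every rendering of `text` made only of allowed code points."""
    options = []
    for char in text:
        candidates = LOOKALIKES.get(char, [char])
        allowed = [c for c in candidates if ord(c) in ascii_range]
        if allowed:
            options.append(allowed)
        elif remove_unknown:
            continue   # drop the character (akin to STRATEGY_REMOVE)
        else:
            return []  # no allowed rendering, so the string is skipped
    return ["".join(combo) for combo in product(*options)]


print(to_ascii("\u0425\u0420123."))
# ['XP123.', 'XPI23.', 'XPl23.']

lowercase = range(ord("a"), ord("z") + 1)
print(to_ascii("\u0425\u0420123.", lowercase, remove_unknown=True))
# ['l']
```

Narrowing the allowed range to lowercase letters leaves only "l" as a valid rendering of "1", and every other character is dropped, which is why the second call returns a single one-letter string.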
