life4 / homoglyphs

Licence: MIT license
Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group.

THE PROJECT IS ARCHIVED

Forks: https://github.com/orsinium/forks


Homoglyphs

Homoglyphs is a Python library for getting homoglyphs and converting them to ASCII.

Features

It is a smarter version of confusable_homoglyphs:

  • Automatic detection or manual selection of categories (aliases from ISO 15924).
  • Automatic or manual loading of only the needed alphabets into memory.
  • Conversion to ASCII.
  • More configurable.
  • More stable.

Installation

pip install homoglyphs

Usage

The best way to explain something is to show how it works, so let's look at some real usage.

Importing:

import homoglyphs as hg

Languages

# detect
hg.Languages.detect('w')
# {'pl', 'da', 'nl', 'fi', 'cz', 'sr', 'pt', 'it', 'en', 'es', 'sk', 'de', 'fr', 'ro'}
hg.Languages.detect('т')
# {'mk', 'ru', 'be', 'bg', 'sr'}
hg.Languages.detect('.')
# set()

# get alphabet for languages
hg.Languages.get_alphabet(['ru'])
# {'в', 'Ё', 'К', 'Т', ..., 'Р', 'З', 'Э'}

# get all languages
hg.Languages.get_all()
# {'nl', 'lt', ..., 'de', 'mk'}
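
The detection calls above can be pictured as a reverse lookup: a character is attributed to every language whose alphabet contains it. The following is a minimal stand-alone sketch of that idea, not the library's implementation. The `ALPHABETS` table and the `detect_languages` function are hypothetical, and the table covers only three languages.

```python
# Hypothetical, heavily truncated alphabet table used only for illustration.
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "de": set("abcdefghijklmnopqrstuvwxyzäöüß"),
    "ru": set("абвгдежзийклмнопрстуфхцчшщъыьэюя"),
}


def detect_languages(char):
    """Return the codes of all languages whose alphabet contains char."""
    lowered = char.lower()
    return {lang for lang, letters in ALPHABETS.items() if lowered in letters}


print(sorted(detect_languages("w")))  # ['de', 'en']
print(sorted(detect_languages("т")))  # ['ru']
print(sorted(detect_languages(".")))  # []
```

Punctuation such as '.' appears in no alphabet, so the lookup returns an empty set, which matches the `set()` result shown for `hg.Languages.detect('.')`.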

Categories

Categories are aliases from ISO 15924.

# detect
hg.Categories.detect('w')
# 'LATIN'
hg.Categories.detect('т')
# 'CYRILLIC'
hg.Categories.detect('.')
# 'COMMON'

# get alphabet for categories
hg.Categories.get_alphabet(['CYRILLIC'])
# {'ӗ', 'Ԍ', 'Ґ', 'Я', ..., 'Э', 'ԕ', 'ӻ'}

# get all categories
hg.Categories.get_all()
# {'RUNIC', 'DESERET', ..., 'SOGDIAN', 'TAI_LE'}
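
For readers without the library installed, a rough approximation of category detection is possible with the standard `unicodedata` module, because Unicode character names usually start with the script name. This is only a heuristic sketch under that assumption, not how the library works (it relies on its own bundled data). The `KNOWN_SCRIPTS` set and the `guess_category` function are hypothetical names.

```python
import unicodedata

# Assumption: a handful of script names is enough for this illustration.
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "ARMENIAN", "HEBREW"}


def guess_category(char):
    """Guess a script-like category from the first word of the Unicode name."""
    name = unicodedata.name(char, "")  # "" for characters without a name
    first_word = name.split(" ")[0]
    # Anything that is not a recognised script is treated as shared/common.
    return first_word if first_word in KNOWN_SCRIPTS else "COMMON"


print(guess_category("w"))  # LATIN
print(guess_category("т"))  # CYRILLIC
print(guess_category("."))  # COMMON
```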

Homoglyphs

Get homoglyphs:

# get homoglyphs (latin alphabet initialized by default)
hg.Homoglyphs().get_combinations('q')
# ['q', '𝐪', '𝑞', '𝒒', '𝓆', '𝓺', '𝔮', '𝕢', '𝖖', '𝗊', '𝗾', '𝘲', '𝙦', '𝚚']

Alphabet loading:

# load alphabet on init by categories
homoglyphs = hg.Homoglyphs(categories=('LATIN', 'COMMON', 'CYRILLIC'))  # alphabet loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы', 'ꭇы', 'ꭈы', '𝐫ы', '𝑟ы', '𝒓ы', '𝓇ы', '𝓻ы', '𝔯ы', '𝕣ы', '𝖗ы', '𝗋ы', '𝗿ы', '𝘳ы', '𝙧ы', '𝚛ы']

# load alphabet on init by languages
homoglyphs = hg.Homoglyphs(languages={'ru', 'en'})  # alphabet will be loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы']

# manually set the alphabet on init (Latin "abc" and Cyrillic "абс")
homoglyphs = hg.Homoglyphs(alphabet='abc абс')
homoglyphs.get_combinations('с')  # this is Cyrillic "с"
# ['c', 'с']

# load alphabet on demand
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
# ^ alphabet will be loaded here for "en" language
homoglyphs.get_combinations('гы')
# ^ alphabet will be loaded here for "ru" language
# ['rы', 'гы']
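
The results above have the shape of a Cartesian product: each character contributes its list of look-alikes, and every combination of those choices forms one output string. The sketch below illustrates that idea with `itertools.product`. It is an assumption about the general approach, not the library's code, and the `LOOKALIKES` table and the `combinations` function are hypothetical.

```python
from itertools import product

# Hypothetical look-alike table; keys and values are written as escapes so the
# Latin and Cyrillic letters cannot be confused when reading the source.
LOOKALIKES = {
    "\u0433": ["r", "\u0433"],  # Cyrillic "г" resembles Latin "r"
    "\u0441": ["c", "\u0441"],  # Cyrillic "с" resembles Latin "c"
}


def combinations(text):
    """Return every string formed by swapping characters for look-alikes."""
    # Characters without an entry only "resemble" themselves.
    variants = [LOOKALIKES.get(char, [char]) for char in text]
    return ["".join(chars) for chars in product(*variants)]


print(combinations("гы"))  # ['rы', 'гы']
print(len(combinations("гс")))  # 4
```

The number of results is the product of the variant counts per character, so long strings with many confusable characters can produce very large lists.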

You can combine categories, languages, a manual alphabet, and any strategy as you like. The strategy specifies how to handle characters that are not in the loaded alphabet:

  • STRATEGY_LOAD: load the category for this character
  • STRATEGY_IGNORE: keep the character in the result as is
  • STRATEGY_REMOVE: remove the character from the result
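
The three behaviours can be sketched in a few lines of plain Python. Everything here is hypothetical and for illustration only: the constants `LOAD`, `IGNORE`, and `REMOVE`, the `CATEGORY_ALPHABETS` table, and the `handle_unknown` function are not part of the library. In this sketch a character that belongs to no known category is simply kept under `LOAD`, which is a choice of the sketch rather than documented library behaviour.

```python
# Hypothetical stand-ins for the library's strategy constants.
LOAD, IGNORE, REMOVE = "load", "ignore", "remove"

# Hypothetical categories with tiny alphabets.
CATEGORY_ALPHABETS = {
    "LATIN": set("abcxyz"),
    "CYRILLIC": set("абв"),
}


def handle_unknown(text, alphabet, strategy):
    """Return (result, alphabet) after applying strategy to unknown characters."""
    alphabet = set(alphabet)  # work on a copy so the caller's set is untouched
    kept = []
    for char in text:
        if char not in alphabet:
            if strategy == REMOVE:
                continue  # drop the character from the result
            if strategy == LOAD:
                # extend the alphabet with every category containing the character
                for letters in CATEGORY_ALPHABETS.values():
                    if char in letters:
                        alphabet |= letters
            # IGNORE (and LOAD) fall through and keep the character
        kept.append(char)
    return "".join(kept), alphabet


print(handle_unknown("aб", "abc", REMOVE)[0])  # a
print(handle_unknown("aб", "abc", IGNORE)[0])  # aб
print(sorted(handle_unknown("aб", "abc", LOAD)[1]))  # ['a', 'b', 'c', 'а', 'б', 'в']
```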

Converting glyphs to ASCII chars

homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)

# convert
homoglyphs.to_ascii('ТЕСТ')
# ['TECT']
homoglyphs.to_ascii('ХР123.')  # these are Cyrillic "х" and "р"
# ['XP123.', 'XPI23.', 'XPl23.']

# by default, a string containing characters that cannot be converted yields no results
homoglyphs.to_ascii('лол')
# []

# you can set a strategy that removes unconverted non-ASCII characters from the result
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
)
homoglyphs.to_ascii('лол')
# ['o']

# you can also set the range of allowed character codes for ASCII (0-128 by default)
# note: range() excludes its end value, so 'z' itself is not allowed here
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
    ascii_range=range(ord('a'), ord('z')),
)
homoglyphs.to_ascii('ХР123.')
# ['l']
homoglyphs.to_ascii('хр123.')
# ['xpl']
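
The effect of a restricted character range can be reproduced with a simple filter on code points. This is a stand-alone sketch, not the library's code, and `keep_allowed` is a hypothetical helper. It takes strings that have already been converted to ASCII look-alikes and drops every character whose code point falls outside the allowed range.

```python
def keep_allowed(text, allowed=range(128)):
    """Keep only the characters whose code point lies in the allowed range."""
    return "".join(char for char in text if ord(char) in allowed)


# range() excludes its end value, so add 1 to include "z" itself.
lowercase = range(ord("a"), ord("z") + 1)

print(keep_allowed("XPl23.", lowercase))  # l
print(keep_allowed("xpl23.", lowercase))  # xpl
print(keep_allowed("z", range(ord("a"), ord("z"))))  # prints an empty line
```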