janlelis / Unicode Confusable

Licence: MIT
Unicode::Confusable.confusable? "ℜսᖯʏ", "Ruby"

Unicode::Confusable

Checks whether two strings are visually confusable, as described in Unicode® Technical Standard #39. Both strings are transformed into a skeleton, and the skeletons are then compared. The skeleton is generated by normalizing the string (NFD), replacing each confusable character with its prototype, and normalizing the result again (NFD).
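The three skeleton steps can be sketched in plain Ruby. This is only an illustration, not the gem's implementation: `SAMPLE_CONFUSABLES`, `sketch_skeleton`, and `sketch_confusable?` are hypothetical names, and the table is a tiny hand-picked excerpt, whereas the real data set has thousands of entries.

```ruby
# Illustrative excerpt of confusable mappings (character => prototype).
# Assumption: these three entries stand in for the full Unicode data file.
SAMPLE_CONFUSABLES = {
  "\u0421" => "C",   # CYRILLIC CAPITAL LETTER ES looks like Latin "C"
  "1"      => "l",   # DIGIT ONE looks like Latin small "l"
  "\u2047" => "??"   # DOUBLE QUESTION MARK looks like two question marks
}.freeze

# Hypothetical helper: NFD, replace confusables, NFD again.
def sketch_skeleton(string)
  string
    .unicode_normalize(:nfd)
    .each_char
    .map { |char| SAMPLE_CONFUSABLES.fetch(char, char) }
    .join
    .unicode_normalize(:nfd)
end

# Two strings count as confusable when their skeletons are identical.
def sketch_confusable?(first, second)
  sketch_skeleton(first) == sketch_skeleton(second)
end
```

Because a single character may map to several characters, the comparison has to happen on whole skeletons rather than character by character.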

Unicode version: 13.0.0 (March 2020)

Supported Rubies: 2.7, 2.6, 2.5, 2.4

Old Rubies which might still work: 2.3, 2.2

Usage

Confusable?

require "unicode/confusable"

Unicode::Confusable.confusable? "a", "b" # => false
Unicode::Confusable.confusable? "C", "С" # => true
Unicode::Confusable.confusable? "ℜ𝘂ᖯʏ", "Ruby" # => true
Unicode::Confusable.confusable? "Michael", "Michae1" # => true
Unicode::Confusable.confusable? "⁇", "?" # => false
Unicode::Confusable.confusable? "⁇", "??" # => true

Skeleton

Unicode::Confusable.skeleton "ℜ𝘂ᖯʏ" # => "Ruby"

Please note: The skeleton is an intermediate representation, not meant for any other use than testing confusability, according to the standard.
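One way to stay within that restriction while checking many strings at once is to use skeletons only as temporary grouping keys. The following is a minimal sketch under assumptions: `confusable_groups` is a hypothetical helper, not part of the gem, and its default `skeleton:` callable assumes `require "unicode/confusable"` has already been run as shown above.

```ruby
# Hypothetical helper: returns groups of names that share a skeleton.
# `skeleton` is any callable that maps a string to its skeleton. The default
# delegates to the gem and is only evaluated when no callable is passed in.
def confusable_groups(names, skeleton: ->(s) { Unicode::Confusable.skeleton(s) })
  names
    .uniq                                       # identical strings are not a collision
    .group_by { |name| skeleton.call(name) }    # skeletons serve only as hash keys
    .values
    .select { |group| group.size > 1 }          # keep groups with at least two names
end
```

With the gem loaded, `confusable_groups(["Michael", "Michae1", "Ruby"])` should report the first two names as one group, since the usage examples list them as confusable. The skeletons themselves are discarded and never displayed or stored as identifiers.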

List

List all confusables of a specific character:

Unicode::Confusable.list("o", false)
# => ["ం", "ಂ", "ം", "ං", "०", "੦", "૦", "௦", "౦", "೦", "൦", "๐", "໐", "၀", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬"]

If you omit the second parameter, the list also includes confusables whose replacement merely contains the given character as one part:

Unicode::Confusable.list("o")
# => ["⒪", "ꜵ", "℅", "ᴔ", "ꭁ", "ꭂ", "ﷲ", "№", "ం", "ಂ", "ം", "ං", "०", "੦", "૦", "௦", "౦", "೦", "൦", "๐", "໐", "၀", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬", "ۿ", "ø", "ꬾ", "ɵ", "ꝋ", "ө", "ѳ", "ꮎ", "ꮻ", "ꭴ", "ﳙ", "ơ", "œ", "ɶ", "∞", "ꝏ", "ꚙ", "ﳗ", "ﱑ", "ﳘ", "ﱒ", "ﶓ", "ﶔ", "ﱓ", "ﱔ", "ൟ", "တ", "ꭣ", "ﲠ", "ﳢ", "ﲥ", "ﳤ", "ﷻ", "ﴱ", "ﳨ", "ﴲ", "ﳪ", "ﷺ", "ﷷ", "ﳍ", "ﳖ", "ﳯ", "ﳞ", "ﳱ", "ﳦ", "ﲛ", "ﳠ", "ﯭ", "ﯬ"]
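The difference between the two modes can be illustrated with a small lookup over a mapping table. This is a sketch under assumptions, not the gem's code: `SAMPLE_MAPPINGS` and `sketch_list` are hypothetical, the table is a hand-picked excerpt, and the ordering of the real output may differ.

```ruby
# Illustrative excerpt of confusable mappings (character => replacement).
SAMPLE_MAPPINGS = {
  "\u03BF" => "o",    # GREEK SMALL LETTER OMICRON
  "\u043E" => "o",    # CYRILLIC SMALL LETTER O
  "\u2105" => "c/o",  # CARE OF: the replacement merely contains "o"
  "\u2116" => "No",   # NUMERO SIGN: the replacement merely contains "o"
  "\u0421" => "C"     # CYRILLIC CAPITAL LETTER ES: unrelated to "o"
}.freeze

# Hypothetical helper mirroring the two modes described above.
# partial = false: the replacement must equal `char` exactly.
# partial = true:  the replacement only has to contain `char`.
def sketch_list(char, partial = true)
  matches = SAMPLE_MAPPINGS.select do |_source, replacement|
    partial ? replacement.include?(char) : replacement == char
  end
  matches.keys
end
```

In this sketch, `sketch_list("o", false)` returns only the two letters that map to a lone "o", while `sketch_list("o")` additionally returns the CARE OF and NUMERO signs.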

No Advanced Detection

UTS #39 also describes mechanisms for a more precise classification of confusables, including confusability within a single string:

  • Single-script confusable
  • Mixed-script confusable
  • Whole-script confusable

These are currently not supported by this gem.
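If a rough script check is needed in the meantime, Ruby's regexp script properties can report which scripts a string uses. The sketch below is a much cruder test than the mechanisms listed above: it only flags strings that mix scripts, it covers just three scripts, and `SCRIPT_PATTERNS`, `scripts_used`, and `mixed_script?` are hypothetical names, not part of this gem.

```ruby
# Assumption: only Latin, Cyrillic, and Greek are of interest here.
SCRIPT_PATTERNS = {
  latin:    /\p{Latin}/,
  cyrillic: /\p{Cyrillic}/,
  greek:    /\p{Greek}/
}.freeze

# Returns the names of the sampled scripts that occur in the string.
def scripts_used(string)
  SCRIPT_PATTERNS.select { |_name, pattern| string.match?(pattern) }.keys
end

# True when the string contains characters from more than one sampled script.
def mixed_script?(string)
  scripts_used(string).size > 1
end
```

Digits and punctuation belong to no specific script in this sketch and are therefore ignored by `scripts_used`.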

See unicode-x for more Unicode-related micro libraries.

MIT License
