nitely / Nim Unicodedb

Licence: mit
Unicode Character Database (UCD, tr44) for Nim
UnicodeDB

This library aims to bring the Unicode database to Nim. The main goals are O(1) access for every API and a small footprint.

Note: this library doesn't provide Unicode Common Locale Data Repository (CLDR) data.

Install

nimble install unicodedb

Compatibility

Nim 0.18.0, +0.19.0, +0.20.0

Usage

Properties

import unicode
import unicodedb/properties

assert Rune('A'.ord).unicodeCategory() == ctgLu  # 'L'etter, 'u'ppercase
assert Rune('A'.ord).unicodeCategory() in ctgLm+ctgLo+ctgLu+ctgLl+ctgLt
assert Rune('A'.ord).unicodeCategory() in ctgL

echo Rune(0x0660).bidirectional() # 'A'rabic, 'N'umber
# "AN"

echo Rune(0x860).combining()
# 0

echo nfcQcNo in Rune(0x0374).quickCheck()
# true

docs

Names

import unicode
import unicodedb/names

echo lookupStrict("LEFT CURLY BRACKET")  # '{'
# Rune(0x007B)

echo "/".runeAt(0).name()
# "SOLIDUS"

docs

Compositions

import unicode
import unicodedb/compositions

echo composition(Rune(108), Rune(803))
# Rune(7735)

docs

Decompositions

import unicode
import unicodedb/decompositions

echo Rune(0x0F9D).decomposition()
# @[Rune(0x0F9C), Rune(0x0FB7)]

docs

Types

import unicode
import unicodedb/types

assert utmDecimal in Rune(0x0030).unicodeTypes()
assert utmDigit in Rune(0x00B2).unicodeTypes()
assert utmNumeric in Rune(0x2CFD).unicodeTypes()
assert utmLowercase in Rune(0x1E69).unicodeTypes()
assert utmUppercase in Rune(0x0041).unicodeTypes()
assert utmCased in Rune(0x0041).unicodeTypes()
assert utmWhiteSpace in Rune(0x0009).unicodeTypes()
assert utmWord in Rune(0x1E69).unicodeTypes()

const alphaNumeric = utmLowercase + utmUppercase + utmNumeric
assert alphaNumeric in Rune(0x2CFD).unicodeTypes()
assert alphaNumeric in Rune(0x1E69).unicodeTypes()
assert alphaNumeric in Rune(0x0041).unicodeTypes()

docs

Widths

import unicode
import unicodedb/widths

assert "🕺".runeAt(0).unicodeWidth() == uwdtWide

docs

Scripts

import unicode
import unicodedb/scripts

assert "諸".runeAt(0).unicodeScript() == sptHan

docs

Casing

import sequtils
import unicode
import unicodedb/casing

assert toSeq("Ⓗ".runeAt(0).lowerCase) == @["ⓗ".runeAt(0)]
assert toSeq("İ".runeAt(0).lowerCase) == @[0x0069.Rune, 0x0307.Rune]

assert toSeq("ⓗ".runeAt(0).upperCase) == @["Ⓗ".runeAt(0)]
assert toSeq("ﬃ".runeAt(0).upperCase) == @['F'.ord.Rune, 'F'.ord.Rune, 'I'.ord.Rune]  # U+FB03, the "ffi" ligature

assert toSeq("ß".runeAt(0).titleCase) == @['S'.ord.Rune, 's'.ord.Rune]

assert toSeq("ᾈ".runeAt(0).caseFold) == @["ἀ".runeAt(0), "ι".runeAt(0)]

docs

Segmentation

import unicode
import unicodedb/segmentation

assert 0x000B.Rune.wordBreakProp == sgwNewline

docs

Related libraries

Storage

Storage is based on multi-stage tables and minimal perfect hashing data structures.
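
As a rough illustration of the multi-stage table idea, here is a minimal two-stage sketch. It is not the library's generated layout: the names `TwoStage`, `build`, `lookup` and `blockSize`, the block size of 256, and the toy property values are all assumptions made for this example. The point it tries to show is that identical blocks of code points are stored only once, while a lookup stays at two array reads.

```nim
# Illustrative two-stage table. All names are hypothetical and do not
# correspond to unicodedb's generated data.
const blockSize = 256

type TwoStage = object
  index: seq[int]           # stage 1: one entry per block of code points
  blocks: seq[seq[uint8]]   # stage 2: distinct blocks of property values

proc build(values: seq[uint8]): TwoStage =
  ## Splits `values` into fixed-size blocks and stores each distinct block once.
  doAssert values.len mod blockSize == 0
  result.index = @[]
  result.blocks = @[]
  for start in countup(0, values.len - 1, blockSize):
    let blk = values[start ..< start + blockSize]
    var pos = result.blocks.find(blk)
    if pos < 0:
      # First time this block is seen: store it and remember its position.
      result.blocks.add(blk)
      pos = result.blocks.high
    result.index.add(pos)

proc lookup(t: TwoStage, cp: int): uint8 =
  ## Two array reads, independent of how many code points are stored.
  t.blocks[t.index[cp div blockSize]][cp mod blockSize]

# Toy property: 1 for ASCII uppercase, 2 for ASCII lowercase, 0 otherwise.
var values = newSeq[uint8](0x1000)
for cp in ord('A') .. ord('Z'): values[cp] = 1'u8
for cp in ord('a') .. ord('z'): values[cp] = 2'u8

let table = build(values)
doAssert table.lookup(ord('A')) == 1'u8
doAssert table.lookup(ord('z')) == 2'u8
doAssert table.lookup(0x0FFF) == 0'u8
doAssert table.blocks.len == 2   # 16 blocks collapse into 2 distinct ones
```

In this toy input, 15 of the 16 blocks are all zeros, so they share a single stored block. Real Unicode property data has many such repeated ranges, which is where this kind of structure saves space.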

Sizes

These are the current collection sizes:

  • properties is 40KB. Used by properties(1), category(1), bidirectional(1), combining(1) and quickCheck(1)
  • compositions is 12KB. Used by composition(1)
  • decompositions is 89KB. Used by decomposition(1) and canonicalDecomposition(1)
  • names is 578KB. Used by name(1) and lookupStrict(1)
  • names (lookup) is 241KB. Used by lookupStrict(1)

Missing APIs

New APIs will be added from time to time. If you need something that's missing, please open an issue or PR (please, do mention the use-case).

Upgrading Unicode version

Note: PRs upgrading the Unicode version won't get merged; open an issue instead!

  • Run nimble gen and check that there are no changes to ./src/*_data.nim. If there are, try an older Nim version and fix the generators accordingly.
  • Run nimble gen_tests to update all test data to the current Unicode version. The tests for a new Unicode version run against the previous Unicode version.
  • Run the tests and fix all failing tests. This should only require temporarily commenting out all checks for missing code points.
  • Overwrite the ./gen/UCD data with the latest Unicode UCD.
  • Run nimble gen to generate the new data.
  • Run the tests. Add the checks for missing code points back. A handful of code points may have changed their data; check the Unicode changelog page, make sure they are correct, and skip them.

Tests

Initial tests were run against [a dump of] Python's unicodedata module to ensure correctness. The related libraries also have their own custom tests (some of the test data is provided by the Unicode Consortium).

nimble test

Contributing

I plan to work on most of the missing related libraries (case folding, etc). If you would like to work on one of those, please let me know and I'll add it to the list. If you find that required database data is missing, either open an issue or a PR.

LICENSE

MIT