aperezdc / lua-wcwidth

Licence: other
Pure Lua implementation of the wcwidth() function

lua-wcwidth

When writing output to a fixed-width output system (such as a terminal), the displayed width of a string does not always match the number of characters (also known as runes, or code points) it contains. Some characters occupy two cells (full-width characters), and others occupy none.

POSIX.1-2001 and POSIX.1-2008 specify the wcwidth(3) function, which reports how many spaces (or cells) are needed to display a Unicode code point. This Lua module contains a portable and standalone implementation based on the Unicode Standard release files.

This module is useful mainly for implementing programs which must produce output to terminals, while handling proper alignment for double-width and zero-width Unicode code points.

Usage

The following snippet defines a function which can determine the display width for a string:

local wcwidth, utf8 = require "wcwidth", require "utf8"

local function display_width(s)
  local len = 0
  for _, rune in utf8.codes(s) do
    local l = wcwidth(rune)
    if l >= 0 then
      len = len + l
    end
  end
  return len
end

The function above can be used to print any UTF-8 string properly right-aligned to a terminal:

local function alignright(s, cols)
  -- string.rep() returns "" when the count is zero or negative, so strings
  -- wider than "cols" are returned without padding.
  return string.rep(" ", cols - display_width(s)) .. s
end

print(alignright("コンニチハ", 80))
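
The same width-accumulating loop can also cut a string so that it fits a column. The following is a minimal sketch, not part of the module: the name truncate_to_width and the optional width_of parameter are hypothetical, and it assumes the Lua 5.3 utf8 library is available.

```lua
-- Returns the longest prefix of "s" that fits in "cols" cells, plus the
-- number of cells that prefix uses. "width_of" is a hypothetical hook for
-- the width function; it defaults to the wcwidth module.
local function truncate_to_width(s, cols, width_of)
  width_of = width_of or require("wcwidth")
  local used = 0
  for pos, rune in utf8.codes(s) do
    -- Treat non-printable code points (-1) as zero-width, as
    -- display_width() does by skipping them.
    local w = math.max(width_of(rune), 0)
    if used + w > cols then
      -- "pos" is the first byte of the code point that does not fit.
      return s:sub(1, pos - 1), used
    end
    used = used + w
  end
  return s, used
end
```

Cutting at a byte position taken from utf8.codes() keeps every remaining code point intact, and a double-width character is never split in half: it is either kept whole or dropped.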

The wcwidth() function takes a Unicode code point as argument, and returns one of the following values:

  • -1: Width cannot be determined (the code point is not printable).
  • 0: The code point does not advance the cursor (e.g. NULL, or a combining character).
  • 2: The character is East Asian wide (W) or East Asian full-width (F), and is displayed using two spaces.
  • 1: All other characters, which take a single space.
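
The four return values can be told apart with a small helper. This is only an illustrative sketch: describe_width and its labels are made up for the example, and the module is loaded with pcall() so that the snippet still loads where wcwidth is not installed.

```lua
-- Maps a wcwidth() result to a label (the labels are illustrative only).
local function describe_width(w)
  if w < 0 then
    return "not printable"
  elseif w == 0 then
    return "zero-width"
  elseif w == 2 then
    return "wide"
  else
    return "single cell"
  end
end

local ok, wcwidth = pcall(require, "wcwidth")
if ok then
  -- LATIN CAPITAL LETTER A, KATAKANA LETTER KO,
  -- COMBINING ACUTE ACCENT, and the BELL control character.
  for _, cp in ipairs { 0x41, 0x30B3, 0x0301, 0x07 } do
    print(string.format("U+%04X: %s", cp, describe_width(wcwidth(cp))))
  end
end
```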

Note that the wcswidth(3) companion function is deliberately not provided by this module: while Lua 5.3 provides utf8.codes() and utf8.codepoint() to convert UTF-8 byte sequences to code points, other Lua versions would need to depend on a third-party module, and that would go against the goal of wcwidth being standalone. If need be, wcswidth() can be implemented as follows using the Lua 5.3 utf8 module (or any other module which provides a compatible interface):

local wcwidth = require "wcwidth"

-- Calculates the printable length of first "n" characters of string "s"
-- on a terminal. Returns the number of cells or -1 if the string contains
-- non-printable characters. Raises an error on invalid UTF-8 input.
function wcswidth(s, n)
  local cells, count = 0, 0
  for _, rune in utf8.codes(s) do
    -- Stop once "n" characters have been measured (when "n" is given).
    if n and count >= n then break end
    local w = wcwidth(rune)
    if w < 0 then return -1 end
    cells = cells + w
    count = count + 1
  end
  return cells
end
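
For Lua versions without the utf8 library, a small decoder can stand in for utf8.codes(). The sketch below is not part of lua-wcwidth; utf8_codes is a hypothetical name, and the decoder is deliberately minimal: it rejects bad lead and continuation bytes, but it does not reject overlong encodings or surrogates.

```lua
-- Returns an iterator producing (byte position, code point) pairs for the
-- UTF-8 string "s". Uses only arithmetic, so it does not need bitwise
-- operators or integer division.
local function utf8_codes(s)
  local i, n = 1, #s
  return function()
    if i > n then return nil end
    local start, b = i, s:byte(i)
    local cp, extra
    if b < 0x80 then
      cp, extra = b, 0            -- single byte (ASCII)
    elseif b >= 0xC0 and b < 0xE0 then
      cp, extra = b - 0xC0, 1     -- two-byte sequence
    elseif b >= 0xE0 and b < 0xF0 then
      cp, extra = b - 0xE0, 2     -- three-byte sequence
    elseif b >= 0xF0 and b < 0xF8 then
      cp, extra = b - 0xF0, 3     -- four-byte sequence
    else
      error("invalid UTF-8 lead byte at position " .. i)
    end
    for k = 1, extra do
      local c = s:byte(i + k)
      if not c or c < 0x80 or c > 0xBF then
        error("invalid UTF-8 continuation byte at position " .. (i + k))
      end
      -- Each continuation byte contributes six more bits.
      cp = cp * 64 + (c - 0x80)
    end
    i = i + extra + 1
    return start, cp
  end
end
```

Because it yields the same (position, code point) pairs, a loop written as "for _, rune in utf8.codes(s) do" can use utf8_codes(s) in its place.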

Installation

LuaRocks is recommended for installation.

The stable version (recommended) can be installed with:

luarocks install wcwidth

The development version can be installed with:

luarocks install --server=https://luarocks.org/dev wcwidth

Unicode Tables

The update-tables script downloads the Unicode Standard release files listed below from the Unicode Consortium website and generates the module's table files from them.

The most current version of wcwidth uses the following versions of the Unicode Standard release files:

  • EastAsianWidth-13.0.0.txt, Date: 2020-01-21, 18:14:00 GMT [KW, LI], © 2020 Unicode®, Inc.
  • DerivedGeneralCategory-13.0.0.txt, Date: 2019-10-21, 14:30:32 GMT, © 2019 Unicode®, Inc.
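
How update-tables and the module's lookup work internally is not described here. As a rough illustration of one common approach, and under that assumption only, the sketch below collects the wide (W) and full-width (F) ranges from text in the EastAsianWidth.txt line format and answers membership queries with a binary search. The names parse_wide_ranges and in_ranges are hypothetical.

```lua
-- Collects the W and F ranges from text in the EastAsianWidth.txt format,
-- where each data line looks like "1100..115F;W  # comment" or "3000;F".
local function parse_wide_ranges(text)
  local ranges = {}
  for line in text:gmatch("[^\r\n]+") do
    local first, last, prop = line:match("^(%x+)%.%.(%x+)%s*;%s*(%a+)")
    if not first then
      -- Single code point rather than a range.
      first, prop = line:match("^(%x+)%s*;%s*(%a+)")
      last = first
    end
    if first and (prop == "W" or prop == "F") then
      ranges[#ranges + 1] = { tonumber(first, 16), tonumber(last, 16) }
    end
  end
  -- Sort by range start so that binary search can be used.
  table.sort(ranges, function(a, b) return a[1] < b[1] end)
  return ranges
end

-- Binary search over sorted, non-overlapping { first, last } ranges.
local function in_ranges(ranges, cp)
  local lo, hi = 1, #ranges
  while lo <= hi do
    local mid = math.floor((lo + hi) / 2)
    local r = ranges[mid]
    if cp < r[1] then
      hi = mid - 1
    elseif cp > r[2] then
      lo = mid + 1
    else
      return true
    end
  end
  return false
end
```

A sorted range table keeps the data compact, since a whole block of wide characters is stored as a single entry, and each lookup needs a number of comparisons that grows only logarithmically with the number of ranges.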