
dart-lang / characters

Licence: BSD-3-Clause license
A package for characters represented as unicode extended grapheme clusters

Projects that are alternatives of or similar to characters

unicode-lookup
The web's best unicode lookup tool!
Stars: ✭ 49 (-9.26%)
Mutual labels:  unicode
ara
ع Command line tool that displays Arabic text in terminal.
Stars: ✭ 27 (-50%)
Mutual labels:  unicode
android-unicode
Android unicode UTF-7 input apk
Stars: ✭ 23 (-57.41%)
Mutual labels:  unicode
unicode display width
Displayed width of UTF-8 strings in Modern C++
Stars: ✭ 30 (-44.44%)
Mutual labels:  unicode
unigem-objective-c
Unicode Gems, a Mac app, an iOS app, and an iOS keyboard for letter-like unicode.
Stars: ✭ 22 (-59.26%)
Mutual labels:  unicode
CJK-character-count
Program that counts the amount of CJK characters based on Unicode ranges and Chinese encoding standards 字体汉字计数软件
Stars: ✭ 195 (+261.11%)
Mutual labels:  unicode
ocreval
Update of the ISRI Analytic Tools for OCR Evaluation with UTF-8 support
Stars: ✭ 48 (-11.11%)
Mutual labels:  unicode
characteristics
Character info under different encodings
Stars: ✭ 25 (-53.7%)
Mutual labels:  unicode
unicode
A Flask-Based Web-App for Exploring Unicode
Stars: ✭ 12 (-77.78%)
Mutual labels:  unicode
unicodia
Encyclopedia of Unicode characters
Stars: ✭ 17 (-68.52%)
Mutual labels:  unicode
glyphhanger
Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.
Stars: ✭ 422 (+681.48%)
Mutual labels:  unicode
ruby-homograph-detector
🕵️‍♀️🕵️‍♂️ Ruby gem for determining whether a given URL is considered an IDN homograph attack
Stars: ✭ 29 (-46.3%)
Mutual labels:  unicode
utf8-validator
UTF-8 Validator
Stars: ✭ 18 (-66.67%)
Mutual labels:  unicode
unicode-data
Temporary holding place for my suggestions for future version of Unicode data files. Report bugs to https://www.unicode.org/reporting.html
Stars: ✭ 18 (-66.67%)
Mutual labels:  unicode
table2ascii
Python library for converting lists to fancy ASCII tables for displaying in the terminal and on Discord
Stars: ✭ 31 (-42.59%)
Mutual labels:  unicode
confusables
A nodejs library for removing confusable unicode characters from strings.
Stars: ✭ 50 (-7.41%)
Mutual labels:  unicode
hyphenation
Text hyphenation for Rust
Stars: ✭ 43 (-20.37%)
Mutual labels:  unicode
icu-swift
Swift APIs for ICU
Stars: ✭ 23 (-57.41%)
Mutual labels:  unicode
homoglyphs
Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group.
Stars: ✭ 70 (+29.63%)
Mutual labels:  unicode
nepali utils
A pure dart package with collection of Nepali Utilities like Date converter, Date formatter, DateTime, Nepali Numbers, Nepali Unicode, Nepali Moments and many more.
Stars: ✭ 22 (-59.26%)
Mutual labels:  unicode


Characters are strings viewed as sequences of user-perceived characters, also known as Unicode (extended) grapheme clusters.

The Characters class allows access to the individual characters of a string, and a way to navigate back and forth between them using a CharacterRange.

Unicode characters and representations

There is no such thing as plain text.

Computers only know numbers, so any "text" on a computer is represented by numbers, which are again stored as bytes in memory.

The meaning of those bytes is provided by layers of interpretation, building up to the glyphs that the computer displays on the screen.

| Abstraction | Dart Type | Usage | Example |
| --- | --- | --- | --- |
| Bytes | ByteBuffer, Uint8List | Physical layout: memory or network communication. | file.readAsBytesSync() |
| Code units | Uint8List (UTF‑8); Uint16List, String (UTF‑16) | Standard formats for encoding code points in memory. Each code unit is stored in memory using one (UTF‑8) or two (UTF‑16) bytes. One or more code units encode a code point. | string.codeUnits, string.codeUnitAt(index), utf8.encode(string) |
| Code points | Runes | The Unicode unit of meaning. | string.runes |
| Grapheme clusters | Characters | Human-perceived character. One or more code points. | string.characters |
| Glyphs | | Visual rendering of grapheme clusters. | print(string) |
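The layers in the table can be observed by measuring the same text at each level. The following is a minimal sketch, assuming the characters package is available as a dependency; the helper name layerLengths is hypothetical and used only for illustration.

```dart
import 'dart:convert' show utf8;

import 'package:characters/characters.dart';

/// Counts the units of [text] at each layer of interpretation.
/// (Hypothetical helper, not part of the package.)
Map<String, int> layerLengths(String text) => {
      'UTF-8 bytes': utf8.encode(text).length,
      'UTF-16 code units': text.codeUnits.length,
      'code points': text.runes.length,
      'grapheme clusters': text.characters.length,
    };

void main() {
  // "e" followed by a combining acute accent, then the 🇬🇧 flag
  // (two regional indicator code points).
  const text = 'e\u0301\u{1F1EC}\u{1F1E7}';
  layerLengths(text).forEach((layer, count) => print('$layer: $count'));
  // UTF-8 bytes: 11
  // UTF-16 code units: 6
  // code points: 4
  // grapheme clusters: 2
}
```

A reader sees two characters here, yet every lower layer reports a different, larger count.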

A Dart String is a sequence of UTF-16 code units, just like strings in JavaScript and Java. The runtime system decides on the underlying physical representation.

That makes plain strings inadequate for manipulating text that a user is viewing or entering, because String operations do not work at the grapheme cluster level.

For example, to abbreviate a text to, say, its first 15 characters or glyphs, a string like "A 🇬🇧 text in English" should abbreviate to "A 🇬🇧 text in Eng…" when counting characters, but becomes "A 🇬🇧 text in …" when counting code units using String operations.
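The abbreviation example can be sketched as follows. This is an illustration under the assumption that the characters package is imported; the function names abbreviate and abbreviateByCodeUnits are hypothetical.

```dart
import 'package:characters/characters.dart';

/// Abbreviates [text] to at most [max] grapheme clusters.
String abbreviate(String text, int max) {
  final chars = text.characters;
  if (chars.length <= max) return text;
  return '${chars.take(max).string}…';
}

/// Abbreviates [text] to at most [max] UTF-16 code units,
/// which is what plain String operations count.
String abbreviateByCodeUnits(String text, int max) {
  if (text.length <= max) return text;
  return '${text.substring(0, max)}…';
}

void main() {
  // The flag is written with escapes: two regional indicator code points,
  // four UTF-16 code units, one grapheme cluster.
  const text = 'A \u{1F1EC}\u{1F1E7} text in English';
  print(abbreviate(text, 15)); // A 🇬🇧 text in Eng…
  print(abbreviateByCodeUnits(text, 15)); // A 🇬🇧 text in …
}
```

The code-unit version loses three visible characters because the flag alone consumes four of the fifteen units.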

Whenever you need to manipulate strings at the character level, you should be using the Characters type, not the methods of the String class.

The Characters class

The Characters class exposes a string as a sequence of grapheme clusters. All operations on Characters operate on entire grapheme clusters, which removes the risk of splitting combined characters or emojis that is inherent in the code-unit based String operations.

You can get a Characters object for a string using either the constructor Characters(string) or the extension getter string.characters.

At its core, the class is an Iterable<String> where the element strings are single grapheme clusters. This allows sequential access to the individual grapheme clusters of the original string.
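As a brief sketch of both ways to obtain a Characters object and of iterating over it (the helper name graphemeClusters is hypothetical, and the characters package is assumed to be a dependency):

```dart
import 'package:characters/characters.dart';

/// Returns the grapheme clusters of [text] as a list of strings.
List<String> graphemeClusters(String text) {
  final fromConstructor = Characters(text); // explicit constructor
  final fromGetter = text.characters; // extension getter on String
  // Both views wrap the same underlying string.
  assert(fromConstructor.string == fromGetter.string);
  // Characters is an Iterable<String>: each element is one cluster.
  return [for (final cluster in fromGetter) cluster];
}

void main() {
  const text = 'Hi \u{1F1EC}\u{1F1E7}!'; // "Hi 🇬🇧!"
  final clusters = graphemeClusters(text);
  print(clusters.length); // 5
  print(text.length); // 8 (UTF-16 code units)
  print(clusters[3] == '\u{1F1EC}\u{1F1E7}'); // true: the flag is one element
}
```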

On top of that, there are operations mirroring the operations of String that are not index, code-unit or code-point based, like startsWith or replaceAll. There are some differences between these and the String operations. For example, the replace methods only accept Characters as the pattern. Regular expressions are not grapheme cluster aware, so they cannot be used safely on a sequence of characters.
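One visible difference from the String operations is that Characters patterns only match whole grapheme clusters. The sketch below illustrates this with a decomposed "é"; the function names are hypothetical, and the characters package is assumed to be imported.

```dart
import 'package:characters/characters.dart';

/// Replaces "e" with "a" at the code-unit level (plain String).
String replaceInString(String text) => text.replaceAll('e', 'a');

/// Replaces "e" with "a" only where "e" is a complete grapheme cluster.
String replaceInCharacters(String text) =>
    text.characters.replaceAll('e'.characters, 'a'.characters).string;

void main() {
  // "née" with the accent written as a separate combining code point.
  const text = 'ne\u0301e';

  // String.replaceAll also rewrites the base letter under the accent.
  print(replaceInString(text) == 'na\u0301a'); // true

  // Characters.replaceAll leaves the accented cluster untouched.
  print(replaceInCharacters(text) == 'ne\u0301a'); // true

  // The same holds for prefix tests.
  print('e\u0301'.startsWith('e')); // true
  print('e\u0301'.characters.startsWith('e'.characters)); // false
}
```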

Grapheme clusters have varying length in the underlying representation, so operations on a Characters sequence cannot be index based. Instead, the CharacterRange iterator provided by Characters.iterator has been greatly enhanced. It can move both forwards and backwards, and it can span a range of grapheme clusters. Most operations that can be performed on a full Characters can also be performed on the grapheme clusters in the range of a CharacterRange. The range can be contracted, expanded or moved in various ways, and is not restricted to using moveNext to move to the next grapheme cluster.
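A short sketch of moving and expanding a CharacterRange, assuming the characters package is imported (the helper name walk is hypothetical and exists only to record what the range covers after each step):

```dart
import 'package:characters/characters.dart';

/// Records what a CharacterRange over [text] covers after each step.
List<String> walk(String text) {
  final seen = <String>[];
  // The iterator starts as an empty range before the first character.
  final range = text.characters.iterator;

  range.moveNext(); // range is the first grapheme cluster
  seen.add(range.current);

  range.moveNext(); // range is the second grapheme cluster
  seen.add(range.current);

  range.moveBack(); // range moves back to the first cluster
  seen.add(range.current);

  range.expandNext(); // range grows to cover the first two clusters
  seen.add(range.current);

  seen.add(range.stringAfter); // text following the range
  return seen;
}

void main() {
  const flag = '\u{1F1EC}\u{1F1E7}'; // 🇬🇧
  final steps = walk('a${flag}b');
  print(steps.length); // 5
  print(steps[1] == flag); // true: the flag is visited as a single unit
  print(steps[3] == 'a$flag'); // true: the range spans two clusters
  print(steps[4]); // b
}
```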

Example:

// Requires: import 'package:characters/characters.dart';

// Using String indices.
// Returns the text between the first "<" and the next ">", or null.
String? firstTagString(String source) {
  var start = source.indexOf("<") + 1;
  if (start > 0) {
    var end = source.indexOf(">", start);
    if (end >= 0) {
      return source.substring(start, end);
    }
  }
  return null;
}

// Using CharacterRange operations.
// Returns the characters between the first "<" and the next ">", or null.
Characters? firstTagCharacters(Characters source) {
  var range = source.findFirst("<".characters);
  if (range != null && range.moveUntil(">".characters)) {
    return range.currentCharacters;
  }
  return null;
}