rivo / Uniseg

Licence: MIT
Unicode Text Segmentation for Go (or: How to Count Characters in a String)

Projects that are alternatives of or similar to Uniseg

Yawysiwygee
Yet another what-you-see-is-what-you-get equation editor
Stars: ✭ 60 (-46.9%)
Mutual labels:  unicode
U2c
Unicode To Chinese -- U2C: a Burp Suite extender that converts Unicode to Chinese [Burp plugin for converting Unicode encoding to Chinese]
Stars: ✭ 83 (-26.55%)
Mutual labels:  unicode
Unicode Display width
Monospace Unicode character width in Ruby
Stars: ✭ 98 (-13.27%)
Mutual labels:  unicode
Emoji Regex
A regular expression to match all Emoji-only symbols as per the Unicode Standard.
Stars: ✭ 1,134 (+903.54%)
Mutual labels:  unicode
Unicode
Unicode normalization library. (Mirror of Yoshida-san's code base to maintain the RubyGem.)
Stars: ✭ 81 (-28.32%)
Mutual labels:  unicode
Open Arrow
Open Arrow is an open-source font that contains 112 arrow symbols from U+2190 to U+21ff
Stars: ✭ 89 (-21.24%)
Mutual labels:  unicode
Glyphhanger
Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.
Stars: ✭ 1,099 (+872.57%)
Mutual labels:  unicode
Proposal Regexp Unicode Property Escapes
Proposal to add Unicode property escapes `\p{…}` and `\P{…}` to regular expressions in ECMAScript.
Stars: ✭ 112 (-0.88%)
Mutual labels:  unicode
Demoji
Accurately find/replace/remove emojis in text strings
Stars: ✭ 82 (-27.43%)
Mutual labels:  unicode
Pythonimproved
The best Python language definition for Sublime Text - ever. Includes full support for Unicode, as well as both Python 2 and Python 3 syntax. Check out the Neon Color Scheme for highlighting.
Stars: ✭ 95 (-15.93%)
Mutual labels:  unicode
Locale2
💪 Try as hard as possible to detect the client's language tag ("locale") in node or the browser. Browserify and Webpack friendly!
Stars: ✭ 65 (-42.48%)
Mutual labels:  unicode
Lehar
Visualize data using relative ordering
Stars: ✭ 81 (-28.32%)
Mutual labels:  unicode
String Extra
Unicode/String support for Twig
Stars: ✭ 92 (-18.58%)
Mutual labels:  unicode
Knayi Myscript
Myanmar Language Script Library
Stars: ✭ 63 (-44.25%)
Mutual labels:  unicode
Plotille
Plot in the terminal using braille dots.
Stars: ✭ 99 (-12.39%)
Mutual labels:  unicode
Sinais
🔣 Step-by-step development of the `sinais` example in Go.
Stars: ✭ 59 (-47.79%)
Mutual labels:  unicode
Ofxfontstash
Easy (and fast) unicode string rendering addon for OpenFrameworks. FontStash is made by Andreas Krinke and Mikko Mononen
Stars: ✭ 84 (-25.66%)
Mutual labels:  unicode
Tendo
Official repository of python tendo library, always welcoming new contributions.
Stars: ✭ 113 (+0%)
Mutual labels:  unicode
Hybrid Fonts
Monospaced fonts patched with Chinese characters and extra glyphs from Nerd Fonts
Stars: ✭ 102 (-9.73%)
Mutual labels:  unicode
Normality
A tiny library for Python text normalisation. Useful for ad-hoc text processing.
Stars: ✭ 94 (-16.81%)
Mutual labels:  unicode

Unicode Text Segmentation for Go

This Go package implements Unicode Text Segmentation according to Unicode Standard Annex #29 (Unicode version 12.0.0).

At this point, only the determination of grapheme cluster boundaries is implemented.

Background

In Go, strings are read-only slices of bytes. They can be decoded into Unicode code points (runes) with a for ... range loop or with the conversion []rune(str). However, multiple code points may be combined into one user-perceived character, which the Unicode specification calls a "grapheme cluster". Here are some examples:

| String | Bytes (UTF-8) | Code points (runes) | Grapheme clusters |
|---|---|---|---|
| Käse | 6 bytes: 4b 61 cc 88 73 65 | 5 code points: 4b 61 308 73 65 | 4 clusters: [4b],[61 308],[73],[65] |
| 🏳️‍🌈 | 14 bytes: f0 9f 8f b3 ef b8 8f e2 80 8d f0 9f 8c 88 | 4 code points: 1f3f3 fe0f 200d 1f308 | 1 cluster: [1f3f3 fe0f 200d 1f308] |
| 🇩🇪 | 8 bytes: f0 9f 87 a9 f0 9f 87 aa | 2 code points: 1f1e9 1f1ea | 1 cluster: [1f1e9 1f1ea] |

This package provides a tool to iterate over these grapheme clusters. This may be used to determine the number of user-perceived characters, to split strings in their intended places, or to extract individual characters which form a unit.

Installation

go get github.com/rivo/uniseg

Basic Example

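package main

import (
	"fmt"

	"github.com/rivo/uniseg"
)

func main() {
	// Iterate over the grapheme clusters of the string.
	gr := uniseg.NewGraphemes("👍🏼!")
	for gr.Next() {
		// Runes returns the code points of the current cluster.
		fmt.Printf("%x ", gr.Runes())
	}
	// Output: [1f44d 1f3fc] [21]
}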

Documentation

Refer to https://godoc.org/github.com/rivo/uniseg for the package's documentation.

Dependencies

This package does not depend on any packages outside the standard library.

Your Feedback

Add your issue here on GitHub. Feel free to get in touch if you have any questions.

Version

Version tags will be introduced once Go modules are official. Consider this version 0.1.