benkasminbullock / unicode-c

A C library for handling Unicode, UTF-8, surrogate pairs, etc.

This is a Unicode library in the C programming language which deals
with conversions to and from the UTF-8 format.

* Author: 

    Ben Bullock <[email protected]>, <[email protected]>

* Repository: 

    https://github.com/benkasminbullock/unicode-c

* Licence: 

You can use this C code under any one of the BSD three-clause licence,
the GNU General Public Licence (version 2 or later), or the Perl
Artistic Licence.

* Version:

There are no version numbers for this library. Please refer to the git
commit identifiers instead.

* Documentation:

Documentation consists of the comments in the source code, a manual
page in doc/unicode.3, and an HTML version in doc/unicode.html.

The mdoc manual page doc/unicode.3 and the HTML page doc/unicode.html
are both generated from the comments using Perl scripts in the "doc"
subdirectory of the repository. Running these scripts may require you
to install some Perl modules from CPAN. The following commands install
cpanminus and then use it to download and install all the necessary
modules:

    curl -L https://cpanmin.us | perl - App::cpanminus
    cpanm C::Tokenize Convert::Moji File::Slurper HTML::Make JSON::Create JSON::Parse List::UtilsBy Template Text::LineNumber

* Conventions

All of the functions except "unicode_code_to_error" return values of
type int32_t (a 32-bit signed integer). All of the UTF-8 inputs are of
the type "unsigned char".
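
As an illustration of this convention, here is a minimal sketch of a
decoder that takes "unsigned char" input and returns an int32_t which
is either a code point or a negative error value. It is not the
library's code: the names "sketch_utf8_decode" and "SKETCH_BAD_UTF8"
and the error value -1 are assumptions made for this example. Please
refer to unicode.c for the real functions and error codes.

```c
#include <stdint.h>

/* Hypothetical error value for this sketch, not taken from the library. */
#define SKETCH_BAD_UTF8 (-1)

/* Decode the UTF-8 sequence starting at "in", which is assumed to be
   nul-terminated. Returns the code point (zero or more) on success and
   sets "*len" to the number of bytes used. Returns SKETCH_BAD_UTF8
   (negative) on malformed input. */

int32_t sketch_utf8_decode (const unsigned char * in, int * len)
{
    int32_t c;
    int n;
    int i;

    if (in[0] < 0x80) {
        /* ASCII, one byte. */
        *len = 1;
        return in[0];
    }
    else if (in[0] >= 0xC2 && in[0] <= 0xDF) {
        n = 2;
        c = in[0] & 0x1F;
    }
    else if (in[0] >= 0xE0 && in[0] <= 0xEF) {
        n = 3;
        c = in[0] & 0x0F;
    }
    else if (in[0] >= 0xF0 && in[0] <= 0xF4) {
        n = 4;
        c = in[0] & 0x07;
    }
    else {
        /* Continuation byte, or a byte which never starts a sequence. */
        return SKETCH_BAD_UTF8;
    }
    for (i = 1; i < n; i++) {
        /* Each following byte must look like 10xxxxxx. A nul
           terminator fails this check, so we never read past it. */
        if ((in[i] & 0xC0) != 0x80) {
            return SKETCH_BAD_UTF8;
        }
        c = (c << 6) | (in[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if ((n == 3 && c < 0x800) || (n == 4 && c < 0x10000) ||
        (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF) {
        return SKETCH_BAD_UTF8;
    }
    *len = n;
    return c;
}
```

Because valid code points are never negative, a single signed 32-bit
return value can carry both results and errors, and the caller only
needs to test whether the value is less than zero.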

* Testing:

Compile with -DTEST or use "make test" to run the tests. The tests are
contained in "unicode.c" itself; please refer to the source code for
details. Running the tests requires the "prove" utility, which is part
of Perl.

* Dependencies

This C code uses the definitions from the standard header file
"stdint.h" to get consistent integer types, and functions from
"string.h" to get lengths of strings, and for testing.

* Bugs:

Either send email or use the GitHub "issues" page to report bugs.

* Known problems:

** The library uses the name UCS2 in places where it should say
   UTF-16. There are a few similar misnamings.

** 0xFF is regarded as a valid UTF-8 first byte by some routines.
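
To make the two problems above concrete, here is a minimal sketch,
independent of the library, of a strict first-byte check and of the
surrogate-pair split which distinguishes UTF-16 from UCS-2. The
function names "sketch_utf8_first_byte_length" and
"sketch_to_surrogate_pair" are hypothetical and do not exist in
unicode.c. The byte ranges are those of well-formed UTF-8, in which
0xC0, 0xC1 and 0xF5 to 0xFF never appear.

```c
#include <stdint.h>

/* Return the length in bytes (1 to 4) of the UTF-8 sequence which
   starts with "b", or 0 if "b" can never start a valid sequence. */

int sketch_utf8_first_byte_length (unsigned char b)
{
    if (b <= 0x7F) {
        return 1;
    }
    if (b >= 0xC2 && b <= 0xDF) {
        return 2;
    }
    if (b >= 0xE0 && b <= 0xEF) {
        return 3;
    }
    if (b >= 0xF0 && b <= 0xF4) {
        return 4;
    }
    /* 0x80 to 0xC1 and 0xF5 to 0xFF, so 0xFF is rejected here. */
    return 0;
}

/* Split a code point in the range U+10000 to U+10FFFF into a UTF-16
   surrogate pair. UCS-2 has no such pairs and cannot represent these
   code points. Returns 0 on success, or -1 if "cp" is outside the
   range. */

int sketch_to_surrogate_pair (int32_t cp, uint16_t * hi, uint16_t * lo)
{
    int32_t v;

    if (cp < 0x10000 || cp > 0x10FFFF) {
        return -1;
    }
    v = cp - 0x10000;
    /* The top ten bits go in the high surrogate, the bottom ten in
       the low surrogate. */
    *hi = (uint16_t) (0xD800 + (v >> 10));
    *lo = (uint16_t) (0xDC00 + (v & 0x3FF));
    return 0;
}
```

For example, U+1F602 splits into the pair 0xD83D, 0xDE02, while the
byte 0xFF gives a length of 0 and so is not accepted as a first byte.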

* Online version

There is an online web version of this software here:

    http://www.lemoda.net/tools/uniconvert/index.html