sanette / ubase

Licence: LGPL-3.0
Remove accents from UTF-8 strings.

Ubase

OCaml library for removing diacritics (accents, etc.) from Latin letters in UTF-8 strings.

It should work for all UTF-8 strings, regardless of their normalization form (NFC, NFD, NFKD, or NFKC).

Please don't use this library to store your strings without accents! On the contrary, store them in full UTF8 encoding, and use this library to simplify searching and comparison.
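
One way to follow this advice is to keep the original strings untouched and compute the accent-free form only as a lookup key. The sketch below assumes the ubase library is linked; `key`, `index`, and `lookup` are hypothetical helper names that are not part of Ubase, and lowercasing with `String.lowercase_ascii` is an extra choice made here for illustration.

```ocaml
(* Requires the ubase library (for instance `#require "ubase";;` in utop). *)

(* Hypothetical helper: the lookup key is the accent-free, lowercased form. *)
let key s = String.lowercase_ascii (Ubase.from_utf8 s)

(* Pair every original string with its key; the originals are stored unchanged. *)
let index names = List.map (fun name -> (key name, name)) names

(* Return all original strings whose key equals the key of the query. *)
let lookup idx query =
  let k = key query in
  List.filter_map
    (fun (k', name) -> if String.equal k k' then Some name else None)
    idx

let () =
  (* The same name in NFC and in NFD. *)
  let idx =
    index [ "V\197\169 Ng\225\187\141c Phan"; "Vu\204\131 Ngo\204\163c Phan" ]
  in
  (* A plain ASCII query finds both spellings, printed with their accents. *)
  List.iter print_endline (lookup idx "vu ngoc phan")
```

The accented strings remain the stored data; the folded key exists only to make matching easier.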

Example

let nfc = "V\197\169 Ng\225\187\141c Phan";; 
let nfd = "Vu\204\131 Ngo\204\163c Phan";;

print_endline nfc;; 
Vũ Ngọc Phan

print_endline nfd;; 
Vũ Ngọc Phan

Ubase.from_utf8 nfc;;
- : string = "Vu Ngoc Phan"

Ubase.from_utf8 nfd;; 
- : string = "Vu Ngoc Phan"
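
The two inputs above look identical when printed but are different byte sequences, which is why a plain string comparison cannot match them. The following check uses only the OCaml standard library (no Ubase call) to make that visible.

```ocaml
(* The same text in two Unicode normalization forms, as in the example above. *)
let nfc = "V\197\169 Ng\225\187\141c Phan" (* precomposed letters *)
let nfd = "Vu\204\131 Ngo\204\163c Phan" (* base letters + combining marks *)

let () =
  (* The byte lengths differ: 15 for NFC, 16 for NFD. *)
  Printf.printf "nfc: %d bytes, nfd: %d bytes\n"
    (String.length nfc) (String.length nfd);
  (* Byte-wise equality therefore fails, even though both render the same. *)
  Printf.printf "equal as bytes: %b\n" (String.equal nfc nfd)
```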

Usage

val from_utf8 : ?malformed:string -> ?strip:string -> string -> string
(** Remove all diacritics on Latin letters from a standard string containing
   UTF8 text. Any malformed UTF8 will be replaced by the [malformed] parameter
   (by default "?"). If the optional parameter [strip] is present, all
   non-ASCII, non-Latin unicode characters will be replaced by the [strip]
   string (which can be empty). If both [malformed] and [strip] contain only
   ASCII characters, then the result of [from_utf8] is guaranteed to
   contain only ASCII characters. *)
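
As an illustration of the two optional parameters, the sketch below passes ASCII-only values for both and checks the documented guarantee that the result is then pure ASCII. It assumes the ubase library is linked; `is_ascii` and `ascii_fold` are hypothetical helpers, and the sample input (Latin text, two Greek letters, and a stray byte that is not valid UTF-8) is made up for the demonstration.

```ocaml
(* Requires the ubase library. *)

(* Hypothetical helper: true when every byte of [s] is below 128. *)
let is_ascii s =
  let ok = ref true in
  String.iter (fun c -> if Char.code c >= 128 then ok := false) s;
  !ok

(* Replace malformed input by "_" and strip non-ASCII, non-Latin characters. *)
let ascii_fold s = Ubase.from_utf8 ~malformed:"_" ~strip:"" s

let () =
  (* "café", a space, Greek alpha and beta, a space, then a lone \255 byte,
     which can never appear in valid UTF-8. *)
  let input = "caf\195\169 \206\177\206\178 \255" in
  let output = ascii_fold input in
  print_endline output;
  (* Both [malformed] and [strip] are ASCII, so the result must be ASCII. *)
  assert (is_ascii output)
```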

If your accented string is encoded in ISO Latin rather than UTF-8, you first have to convert it to UTF-8 using isolatin_to_utf8 mystring, and then pass the result to from_utf8.
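
For readers unfamiliar with the conversion itself, here is a standard-library sketch of what re-encoding such a string involves, assuming that "isolatin" means ISO-8859-1 (Latin-1). It is not Ubase's implementation, and `latin1_to_utf8` is a hypothetical name; in practice the library's own isolatin_to_utf8 is the function to call.

```ocaml
(* In ISO-8859-1 every byte value is also the Unicode code point of the
   character, so each byte can be re-encoded directly as UTF-8. *)
let latin1_to_utf8 s =
  let buf = Buffer.create 16 in
  String.iter (fun c -> Buffer.add_utf_8_uchar buf (Uchar.of_char c)) s;
  Buffer.contents buf

let () =
  (* "déjà" in Latin-1: byte \233 is é and byte \224 is à. *)
  print_endline (latin1_to_utf8 "d\233j\224")
```

ASCII bytes pass through unchanged; bytes from 128 to 255 each become a two-byte UTF-8 sequence.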

Install

ubase is available on opam:

opam install ubase

That's it!

If you prefer to build a local version, download the repository, move into the ubase directory, and run:

dune build
opam install .

Ubase depends on uutf.

Quick test

Before installing

From the ubase directory:

dune utop

From the command line

Once you have installed the library, you can run the ubase program from a terminal:

$ ubase Déjà vu !
Deja vu !

$ ubase "et grønt træ"
et gront trae

$ ubase Anh xin lỗi các em bé vì đã đề tặng cuốn sách này cho một ông người lớn.
Anh xin loi cac em be vi da de tang cuon sach nay cho mot ong nguoi lon.

(Note that the quotes "" are not required.)

Doc

Documentation and API are available here.

To build the docs manually, from the ubase directory:

dune build @doc
firefox ./_build/default/_doc/_html/ubase/Ubase/index.html

Using Ubase for accent-insensitive searching

Have a look at Ufind, a small search engine based on Ubase.
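
To give a rough idea of what accent-insensitive searching means, the sketch below folds both the query and each candidate with Ubase.from_utf8 and then runs a naive substring test. This is not Ufind's API or algorithm; `contains`, `fold`, and `search` are hypothetical helpers and the sample data is invented. It assumes the ubase library is linked.

```ocaml
(* Requires the ubase library. *)

(* Naive byte-wise substring test; good enough for a sketch on short strings. *)
let contains ~sub s =
  let n = String.length s and m = String.length sub in
  let rec at i = i + m <= n && (String.sub s i m = sub || at (i + 1)) in
  at 0

(* Accent-free, lowercased form, used only for matching. *)
let fold s = String.lowercase_ascii (Ubase.from_utf8 s)

(* Keep the original (accented) entries whose folded form contains
   the folded query. *)
let search query entries =
  let q = fold query in
  List.filter (fun e -> contains ~sub:q (fold e)) entries

let () =
  (* "Déjà vu", "et grønt træ", and "Hello", written as UTF-8 bytes. *)
  let titles =
    [ "D\195\169j\195\160 vu"; "et gr\195\184nt tr\195\166"; "Hello" ]
  in
  List.iter print_endline (search "deja" titles)
```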
