
polygonplanet / Encoding.js

Licence: mit
Convert or detect character encoding in JavaScript


encoding.js

Converts character encoding in JavaScript.

README(Japanese)

Installation

In Browser:

<script src="encoding.js"></script>

or

<script src="encoding.min.js"></script>

The Encoding object is defined in the global scope.

Conversion and detection operate on an Array (or an array-like object).

In Node.js:

encoding.js is published on npm under the module name encoding-japanese.

npm install encoding-japanese
var encoding = require('encoding-japanese');

Each method also accepts a Buffer in Node.js.

bower:

bower install encoding-japanese

CDN

encoding.js is available on cdnjs.com.

Convert character encoding (convert):

  • {Array.<number>|string} Encoding.convert ( data, to_encoding [, from_encoding ] )
    Converts character encoding.
    @param {Array.<number>|TypedArray|Buffer|string} data The target data.
    @param {(string|Object)} to_encoding The encoding name of conversion destination.
    @param {(string|Array.<string>)=} [from_encoding] The encoding name of source or 'AUTO'.
    @return {Array|string} Return the converted array/string.
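
// Convert character encoding to Shift_JIS from UTF-8.
// The input may be an Array, a Uint8Array, or a Buffer of byte values.
// Example bytes: 'こんにちは' encoded as UTF-8.
var utf8Array = [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175];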
var sjisArray = Encoding.convert(utf8Array, 'SJIS', 'UTF8');

// Convert character encoding by automatic detection (AUTO detect).
var sjisArray = Encoding.convert(utf8Array, 'SJIS');
// or  
var sjisArray = Encoding.convert(utf8Array, 'SJIS', 'AUTO');

// Detect the character encoding.
// The return value is one of the "Available Encodings" below.
var detected = Encoding.detect(utf8Array);
if (detected === 'UTF8') {
  console.log('Encoding is UTF-8');
}

Available Encodings:
  • 'UTF32' (detect only)
  • 'UTF16'
  • 'UTF16BE'
  • 'UTF16LE'
  • 'BINARY' (detect only)
  • 'ASCII' (detect only)
  • 'JIS'
  • 'UTF8'
  • 'EUCJP'
  • 'SJIS'
  • 'UNICODE' (JavaScript Unicode Array)

Note: UNICODE is an array of the values that String.prototype.charCodeAt() returns in JavaScript.
(Values in the array may be greater than 255.)
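
To make the note concrete, the following sketch builds such an array by hand. The helper name toCodeUnits is hypothetical and is not part of encoding.js; it only illustrates what a 'UNICODE' array contains.

```javascript
// Hypothetical helper (not part of encoding.js): collect one
// charCodeAt() value per UTF-16 code unit of the string.
function toCodeUnits(str) {
  var codes = [];
  for (var i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

var sampleCodes = toCodeUnits('A\u3053'); // 'A' followed by 'こ'
console.log(sampleCodes); // [65, 12371]
```

The second value does not fit in a single byte, which is why a UNICODE array is not a byte array.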

Specify the Object argument
var sjisArray = Encoding.convert(utf8Array, {
  to: 'SJIS', // to_encoding
  from: 'UTF8' // from_encoding
});

Passing an object as the second argument improves readability.

Specify the string argument and 'type' option
var utf8String = 'ã\u0081\u0093ã\u0082\u0093ã\u0081«ã\u0081¡ã\u0081¯';
var unicodeString = Encoding.convert(utf8String, {
  to: 'UNICODE',
  from: 'UTF8',
  type: 'string' // Specify 'string' type. (Return as string)
});
console.log(unicodeString); // こんにちは

The following 'type' options are available:

  • 'string': Return as string.
  • 'arraybuffer': Return as ArrayBuffer.
  • 'array': Return as Array (default).

Specify BOM in UTF-16

It's possible to add the UTF16 BOM by specifying the bom option for conversion.

var utf16Array = Encoding.convert(utf8Array, {
  to: 'UTF16', // to_encoding
  from: 'UTF8', // from_encoding
  bom: true // With BOM
});

The byte order of UTF16 is big-endian by default.

Specify 'LE' for the bom option to convert as little-endian.

var utf16leArray = Encoding.convert(utf8Array, {
  to: 'UTF16', // to_encoding
  from: 'UTF8', // from_encoding
  bom: 'LE' // With BOM (little-endian)
});

You can specify UTF16LE or UTF16BE if the BOM is not required.

var utf16beArray = Encoding.convert(utf8Array, {
  to: 'UTF16BE',
  from: 'UTF8'
});

Note: UTF16, UTF16BE and UTF16LE are not JavaScript's internal encoding; they are byte arrays.
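
The difference between the byte orders and the BOM can be seen in a small sketch that serializes UTF-16 code units to bytes. The function codeUnitsToUtf16Bytes is a hypothetical illustration, not the library's implementation; its bom parameter only mirrors the option values described above (true for big-endian with BOM, 'LE' for little-endian with BOM, false for big-endian without BOM).

```javascript
// Hypothetical helper (not part of encoding.js): turn UTF-16 code
// units into a byte array, optionally prefixed with a BOM (U+FEFF).
function codeUnitsToUtf16Bytes(codes, bom) {
  var littleEndian = bom === 'LE';
  var bytes = [];
  if (bom) {
    // The BOM is U+FEFF written in the chosen byte order.
    if (littleEndian) {
      bytes.push(0xFF, 0xFE);
    } else {
      bytes.push(0xFE, 0xFF);
    }
  }
  for (var i = 0; i < codes.length; i++) {
    var high = (codes[i] >> 8) & 0xFF;
    var low = codes[i] & 0xFF;
    if (littleEndian) {
      bytes.push(low, high);
    } else {
      bytes.push(high, low);
    }
  }
  return bytes;
}

var koUnits = [0x3053]; // 'こ'
console.log(codeUnitsToUtf16Bytes(koUnits, true));  // [254, 255, 48, 83]
console.log(codeUnitsToUtf16Bytes(koUnits, 'LE'));  // [255, 254, 83, 48]
console.log(codeUnitsToUtf16Bytes(koUnits, false)); // [48, 83]
```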

Detect character encoding (detect):

  • {string|boolean} Encoding.detect ( data [, encodings ] )
    Detect character encoding.
    @param {Array.<number>|TypedArray|string} data Target data
    @param {(string|Array.<string>)} [encodings] The encoding name(s) to restrict detection to.
    @return {string|boolean} Return the detected character encoding, or false.
// Detect character encoding automatically. (AUTO detect).
var detected = Encoding.detect(utf8Array);
if (detected === 'UTF8') {
  console.log('Encoding is UTF-8');
}

// Detect character encoding by specific encoding name.
var isSJIS = Encoding.detect(sjisArray, 'SJIS');
if (isSJIS) {
  console.log('Encoding is SJIS');
}
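
As a rough intuition for how byte-level detection can work, the sketch below checks whether a byte array is structurally plausible UTF-8. It is a simplified illustration only: the function name looksLikeUtf8 is hypothetical, the check ignores some invalid cases (such as overlong three- and four-byte forms), and it is not the algorithm encoding.js uses.

```javascript
// Hypothetical, simplified check (not the encoding.js algorithm):
// every lead byte must be followed by the right number of
// continuation bytes in the range 0x80..0xBF.
function looksLikeUtf8(bytes) {
  var i = 0;
  while (i < bytes.length) {
    var lead = bytes[i];
    var extra;
    if (lead <= 0x7F) {
      extra = 0;
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      extra = 1;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      extra = 2;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      extra = 3;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }
    for (var k = 1; k <= extra; k++) {
      var next = bytes[i + k];
      if (next === undefined || next < 0x80 || next > 0xBF) {
        return false;
      }
    }
    i += extra + 1;
  }
  return true;
}

console.log(looksLikeUtf8([227, 129, 147, 227, 130, 147])); // true  (UTF-8 'こん')
console.log(looksLikeUtf8([130, 177, 130, 241]));           // false (Shift_JIS 'こん')
```

Note that pure ASCII input passes this check too, so a real detector has to try several encodings and rank the candidates.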

URL Encode/Decode:
  • {string} Encoding.urlEncode ( data )
    URL(percent) encode.
    @param {Array.<number>|TypedArray} data Target data.
    @return {string} Return the encoded string.

  • {Array.<number>} Encoding.urlDecode ( string )
    URL(percent) decode.
    @param {string} string Target data.
    @return {Array.<number>} Return the decoded array.

// URL-encode an array of character codes.
var sjisArray = [
  130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129,
  65, 130, 217, 130, 176, 129, 153, 130, 210, 130, 230
];

var encoded = Encoding.urlEncode(sjisArray);
console.log(encoded);
// output:
// '%82%B1%82%F1%82%C9%82%BF%82%CD%81A%82%D9%82%B0%81%99%82%D2%82%E6'

var decoded = Encoding.urlDecode(encoded);
console.log(decoded);
// output: [
//   130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129,
//    65, 130, 217, 130, 176, 129, 153, 130, 210, 130, 230
// ]
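
The output above follows the usual percent-encoding pattern: bytes that are safe ASCII characters stay as they are (65 becomes 'A'), and every other byte becomes '%' plus two uppercase hex digits. The sketch below reproduces that pattern. The function names and the exact set of unescaped characters (letters, digits, '-', '_', '.', '~') are assumptions for illustration; the set encoding.js actually leaves unescaped may differ.

```javascript
// Hypothetical helpers (not part of encoding.js).
// Assumption: only A-Z, a-z, 0-9, '-', '_', '.', '~' are left unescaped.
function percentEncodeBytes(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    var ch = String.fromCharCode(b);
    if (/[A-Za-z0-9_.~-]/.test(ch)) {
      out += ch;
    } else {
      out += '%' + (b < 16 ? '0' : '') + b.toString(16).toUpperCase();
    }
  }
  return out;
}

function percentDecodeToBytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    if (str.charAt(i) === '%') {
      // The two characters after '%' are the byte value in hex.
      bytes.push(parseInt(str.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      bytes.push(str.charCodeAt(i));
    }
  }
  return bytes;
}

var sjisBytes = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129, 65];
var escaped = percentEncodeBytes(sjisBytes);
console.log(escaped); // '%82%B1%82%F1%82%C9%82%BF%82%CD%81A'
console.log(percentDecodeToBytes(escaped)); // same values as sjisBytes
```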

Base64 Encode/Decode:
  • {string} Encoding.base64Encode ( data )
    Base64 encode.
    @param {Array.<number>|TypedArray} data Target data.
    @return {string} Return the Base64 encoded string.

  • {Array.<number>} Encoding.base64Decode ( string )
    Base64 decode.
    @param {string} string Target data.
    @return {Array.<number>} Return the Base64 decoded array.

var sjisArray = [
  130, 177, 130, 241, 130, 201, 130, 191, 130, 205
];
var encoded = Encoding.base64Encode(sjisArray);
console.log(encoded); // 'grGC8YLJgr+CzQ=='

var decoded = Encoding.base64Decode(encoded);
console.log(decoded);
// [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
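
In Node.js, the same round trip can be cross-checked with the built-in Buffer class, which is independent of encoding.js. This is only a sanity-check sketch; the variable names are illustrative.

```javascript
// Cross-check with Node.js's built-in Buffer (not encoding.js).
var sjisSample = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205];

// Bytes -> Base64 string.
var base64Text = Buffer.from(sjisSample).toString('base64');
console.log(base64Text); // 'grGC8YLJgr+CzQ=='

// Base64 string -> plain array of byte values.
var roundTrip = Array.prototype.slice.call(Buffer.from(base64Text, 'base64'));
console.log(roundTrip); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
```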

Example:

Example using the XMLHttpRequest and Typed arrays (Uint8Array):

This sample reads a text file written in Shift_JIS as binary data, converts it to Unicode with Encoding.convert, and displays the resulting string.

var req = new XMLHttpRequest();
req.open('GET', '/my-shift_jis.txt', true);
req.responseType = 'arraybuffer';

req.onload = function (event) {
  var buffer = req.response;
  if (buffer) {
    // Shift_JIS Array
    var sjisArray = new Uint8Array(buffer);

    // Convert encoding to UNICODE (JavaScript Unicode Array).
    var unicodeArray = Encoding.convert(sjisArray, {
      to: 'UNICODE',
      from: 'SJIS'
    });

    // Join to string.
    var unicodeString = Encoding.codeToString(unicodeArray);
    console.log(unicodeString);
  }
};

req.send(null);

Convert encoding for file using the File APIs:

Reads a file using the File APIs, detects its encoding, converts it to Unicode, and displays the result.

<input type="file" id="file">
<div id="encoding"></div>
<textarea id="result" rows="5" cols="80"></textarea>

<script>
function onFileSelect(event) {
  var file = event.target.files[0];

  var reader = new FileReader();
  reader.onload = function(e) {
    var codes = new Uint8Array(e.target.result);
    var encoding = Encoding.detect(codes);
    document.getElementById('encoding').textContent = encoding;

    // Convert encoding to unicode
    var unicodeString = Encoding.convert(codes, {
      to: 'unicode',
      from: encoding,
      type: 'string'
    });
    document.getElementById('result').value = unicodeString;
  };

  reader.readAsArrayBuffer(file);
}

document.getElementById('file').addEventListener('change', onFileSelect, false);
</script>

Demo

Example of the character encoding conversion:
var eucjpArray = [
  164, 179, 164, 243, 164, 203, 164, 193, 164, 207, 161,
  162, 164, 219, 164, 178, 161, 249, 164, 212, 164, 232
];

var utf8Array = Encoding.convert(eucjpArray, {
  to: 'UTF8',
  from: 'EUCJP'
});
console.log( utf8Array );
// output: [
//   227, 129, 147, 227, 130, 147, 227, 129, 171,
//   227, 129, 161, 227, 129, 175, 227, 128, 129,
//   227, 129, 187, 227, 129, 146, 226, 152, 134,
//   227, 129, 180, 227, 130, 136
// ]
//   => 'こんにちは、ほげ☆ぴよ'

Example of converting a character code by automatic detection (Auto detect):
var sjisArray = [
  130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129,
   65, 130, 217, 130, 176, 129, 153, 130, 210, 130, 230
];
var unicodeArray = Encoding.convert(sjisArray, {
  to: 'UNICODE',
  from: 'AUTO'
});
// codeToString is a utility method that joins a character code array into a string.
console.log( Encoding.codeToString(unicodeArray) );
// output: 'こんにちは、ほげ☆ぴよ'

Utilities

  • {string} Encoding.codeToString ( {Array.<number>|TypedArray} data )
    Joins a character code array to string.

  • {Array.<number>} Encoding.stringToCode ( {string} string )
    Splits string to an array of character codes.
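
For intuition, the two utilities behave roughly like the sketch below. The names codesToString and stringToCodes are hypothetical stand-ins, not the library's code. The chunking is an assumption about how one might avoid argument-count limits when passing a large array to String.fromCharCode.apply.

```javascript
// Hypothetical stand-ins (not the encoding.js implementation).
function codesToString(codes) {
  var CHUNK = 8192; // keep each apply() call to a bounded argument count
  var out = '';
  for (var i = 0; i < codes.length; i += CHUNK) {
    // slice.call also works for typed arrays and other array-likes.
    var part = Array.prototype.slice.call(codes, i, i + CHUNK);
    out += String.fromCharCode.apply(null, part);
  }
  return out;
}

function stringToCodes(str) {
  var codes = [];
  for (var i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

var greetingCodes = [12371, 12435, 12395, 12385, 12399];
console.log(codesToString(greetingCodes)); // 'こんにちは'
console.log(stringToCodes('こんにちは'));   // [12371, 12435, 12395, 12385, 12399]
```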

Japanese Zenkaku/Hankaku

  • {Array.<number>|string} Encoding.toHankakuCase ( {Array.<number>|string} data )
    Convert zenkaku (full-width) symbols and alphanumeric characters to hankaku (ASCII) symbols and alphanumeric characters.

  • {Array.<number>|string} Encoding.toZenkakuCase ( {Array.<number>|string} data )
    Convert hankaku (ASCII) symbols and alphanumeric characters to zenkaku (full-width) symbols and alphanumeric characters.

  • {Array.<number>|string} Encoding.toHiraganaCase ( {Array.<number>|string} data )
    Convert to the zenkaku hiragana from the zenkaku katakana.

  • {Array.<number>|string} Encoding.toKatakanaCase ( {Array.<number>|string} data )
    Convert to the zenkaku katakana from the zenkaku hiragana.

  • {Array.<number>|string} Encoding.toHankanaCase ( {Array.<number>|string} data )
    Convert to the hankaku katakana from the zenkaku katakana.

  • {Array.<number>|string} Encoding.toZenkanaCase ( {Array.<number>|string} data )
    Convert to the zenkaku katakana from the hankaku katakana.

  • {Array.<number>|string} Encoding.toHankakuSpace ( {Array.<number>|string} data )
    Convert the full-width (ideographic) space (U+3000) to the single space (U+0020).

  • {Array.<number>|string} Encoding.toZenkakuSpace ( {Array.<number>|string} data )
    Convert the single space (U+0020) to the full-width (ideographic) space (U+3000).
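
Several of these conversions come down to fixed code point offsets in Unicode. The sketch below shows the idea for three of them. All function names are hypothetical, and the library may treat individual symbols differently, so this is an illustration of the principle rather than a drop-in replacement.

```javascript
// Hypothetical sketches (not the encoding.js implementation).

// Full-width forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E.
function fullWidthToAscii(str) {
  return str.replace(/[\uFF01-\uFF5E]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) - 0xFEE0);
  });
}

function asciiToFullWidth(str) {
  return str.replace(/[\u0021-\u007E]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) + 0xFEE0);
  });
}

// Hiragana U+3041..U+3096 sit exactly 0x60 below katakana U+30A1..U+30F6.
function hiraganaToKatakana(str) {
  return str.replace(/[\u3041-\u3096]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) + 0x60);
  });
}

// The full-width space U+3000 maps to the ASCII space U+0020.
function fullWidthSpaceToAscii(str) {
  return str.replace(/\u3000/g, ' ');
}

console.log(fullWidthToAscii('ＡＢＣ１２３'));   // 'ABC123'
console.log(asciiToFullWidth('ABC123'));        // 'ＡＢＣ１２３'
console.log(hiraganaToKatakana('こんにちは'));  // 'コンニチハ'
```

Half-width katakana (hankaku kana) does not follow a single offset because of voiced-sound marks, so it needs a lookup table instead.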

Demo

Contributing

Pull requests and issues are welcome. Please run npm run test before submitting a pull request; only requests that pass without errors are accepted.

License

MIT
