sharkdp / content_inspector

License: Apache-2.0 (LICENSE-APACHE) or MIT (LICENSE-MIT)
Fast inspection of binary buffers to guess/determine the type of content

Programming Languages

Rust, Python

Projects that are alternatives to or similar to content_inspector

readtext
an R package for reading text files
Stars: ✭ 102 (+264.29%)
Mutual labels:  encoding, text
Scodec
Scala combinator library for working with binary data
Stars: ✭ 709 (+2432.14%)
Mutual labels:  encoding, binary
Lingo
Text encoding for modern C++
Stars: ✭ 28 (+0%)
Mutual labels:  encoding, text
Render
Go package for easily rendering JSON, XML, binary data, and HTML templates responses.
Stars: ✭ 1,562 (+5478.57%)
Mutual labels:  binary, text
Js Codepage
💱 Codepages for JS
Stars: ✭ 119 (+325%)
Mutual labels:  encoding, text
sirdez
Glorious Binary Serialization and Deserialization for TypeScript.
Stars: ✭ 20 (-28.57%)
Mutual labels:  encoding, binary
Pbf
A low-level, lightweight protocol buffers implementation in JavaScript.
Stars: ✭ 618 (+2107.14%)
Mutual labels:  encoding, binary
Snodge
Randomly mutate JSON, XML, HTML forms, text and binary data for fuzz testing
Stars: ✭ 121 (+332.14%)
Mutual labels:  binary, text
Binary
Generic and fast binary serializer for Go
Stars: ✭ 86 (+207.14%)
Mutual labels:  encoding, binary
Bitmatch
A Rust crate that allows you to match, bind, and pack the individual bits of integers.
Stars: ✭ 82 (+192.86%)
Mutual labels:  encoding, binary
sia
Sia - Binary serialisation and deserialisation
Stars: ✭ 52 (+85.71%)
Mutual labels:  encoding, binary
jomini
Low level, performance oriented parser for save and game files from EU4, CK3, HOI4, Vic3, Imperator, and other PDS titles.
Stars: ✭ 40 (+42.86%)
Mutual labels:  binary, text
Bincode
A binary encoder / decoder implementation in Rust.
Stars: ✭ 1,100 (+3828.57%)
Mutual labels:  encoding, binary
Phpasn1
A PHP library to encode and decode arbitrary ASN.1 structures using ITU-T X.690 encoding rules.
Stars: ✭ 136 (+385.71%)
Mutual labels:  encoding, binary
ronin-support
A support library for Ronin. Like activesupport, but for hacking!
Stars: ✭ 23 (-17.86%)
Mutual labels:  encoding, binary
vue-scrollin
🎰 Scroll-in text component for Vue
Stars: ✭ 61 (+117.86%)
Mutual labels:  text
harmony-ecs
A small archetypal ECS focused on compatibility and performance
Stars: ✭ 33 (+17.86%)
Mutual labels:  binary
classy
Super simple text classifier using Naive Bayes. Plug-and-play, no dependencies
Stars: ✭ 12 (-57.14%)
Mutual labels:  text
Bois
Salar.Bois is a compact, fast and powerful binary serializer for .NET Framework. With Bois you can serialize your existing objects with almost no change.
Stars: ✭ 53 (+89.29%)
Mutual labels:  binary
joern
Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs
Stars: ✭ 968 (+3357.14%)
Mutual labels:  binary

content_inspector

A simple library for fast inspection of binary buffers to guess the type of content.

This is mainly intended to quickly determine whether a given buffer contains "binary" or "text" data. Programs like grep or git diff use similar mechanisms to decide whether to treat some files as "binary data" or not.

The analysis is based on a very simple heuristic: searching for NULL bytes (indicating "binary" content) and detecting special byte order marks (indicating a particular kind of textual encoding). Note that this analysis can fail. For example, even if unlikely, UTF-8-encoded text can legally contain NULL bytes. Conversely, some binary formats (like binary PGM) may not contain NULL bytes. Also, for performance reasons, only the first 1024 bytes are checked for NULL bytes (if no BOM was detected).
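The heuristic described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's actual implementation: the `Guess` enum, the `guess` function, and the `MAX_SCAN` constant are hypothetical names chosen for this sketch, and the BOM byte sequences are the standard Unicode ones.

```rust
/// Hypothetical result type for this sketch (not the crate's `ContentType`).
#[derive(Debug, PartialEq)]
pub enum Guess {
    Utf8Bom,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
    Binary,
    Text,
}

/// Only this many leading bytes are scanned for NULL bytes (assumed from the text above).
pub const MAX_SCAN: usize = 1024;

/// Guess the kind of content in `buffer`: BOM check first, then a NULL-byte scan.
pub fn guess(buffer: &[u8]) -> Guess {
    // The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE),
    // so the longer patterns must be tested before the shorter ones.
    if buffer.starts_with(&[0xEF, 0xBB, 0xBF]) {
        return Guess::Utf8Bom;
    }
    if buffer.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        return Guess::Utf32Le;
    }
    if buffer.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        return Guess::Utf32Be;
    }
    if buffer.starts_with(&[0xFF, 0xFE]) {
        return Guess::Utf16Le;
    }
    if buffer.starts_with(&[0xFE, 0xFF]) {
        return Guess::Utf16Be;
    }

    // No BOM found: look for a NULL byte in the first MAX_SCAN bytes only.
    let scan = &buffer[..buffer.len().min(MAX_SCAN)];
    if scan.contains(&0) {
        Guess::Binary
    } else {
        Guess::Text
    }
}
```

In this sketch, `guess(b"Hello")` yields `Guess::Text`, while a buffer whose only NULL byte lies beyond the first 1024 bytes is also reported as `Guess::Text`, which is exactly the kind of misclassification the paragraph above warns about.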

If this library reports a certain type of encoding (say UTF_16LE), there is no guarantee that the binary buffer can actually be decoded as UTF-16LE.
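To make this caveat concrete, here is a small standard-library sketch of a buffer that starts with a UTF-16LE BOM yet cannot be decoded as UTF-16LE. The helper `decode_utf16le_with_bom` is a hypothetical name introduced for illustration; it is not part of content_inspector.

```rust
/// Hypothetical helper: strictly decode a buffer that begins with the
/// UTF-16LE BOM (FF FE). Returns `None` if the BOM is missing, the payload
/// has an odd number of bytes, or it contains an unpaired surrogate.
pub fn decode_utf16le_with_bom(buffer: &[u8]) -> Option<String> {
    if !buffer.starts_with(&[0xFF, 0xFE]) {
        return None;
    }
    let payload = &buffer[2..];
    if payload.len() % 2 != 0 {
        return None;
    }

    // Reassemble little-endian 16-bit code units from byte pairs.
    let units = payload
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    // `char::decode_utf16` yields an error for every unpaired surrogate;
    // collecting into `Result` stops at the first one.
    char::decode_utf16(units).collect::<Result<String, _>>().ok()
}
```

A BOM followed by the bytes `00 D8` (the lone high surrogate U+D800) makes this function return `None`, even though a BOM-based check would label the buffer as UTF-16LE. A caller that needs the text should therefore still handle decoding errors.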

Usage

use content_inspector::{ContentType, inspect};

assert_eq!(ContentType::UTF_8, inspect(b"Hello"));
assert_eq!(ContentType::BINARY, inspect(b"\xFF\xE0\x00\x10\x4A\x46\x49\x46\x00"));

assert!(inspect(b"Hello").is_text());

CLI example

This crate also comes with a small example command-line program (see examples/inspect.rs) that demonstrates the usage:

> inspect
USAGE: inspect FILE [FILE...]

> inspect testdata/*
testdata/create_text_files.py: UTF-8
testdata/file_sources.md: UTF-8
testdata/test.jpg: binary
testdata/test.pdf: binary
testdata/test.png: binary
testdata/text_UTF-16BE-BOM.txt: UTF-16BE
testdata/text_UTF-16LE-BOM.txt: UTF-16LE
testdata/text_UTF-32BE-BOM.txt: UTF-32BE
testdata/text_UTF-32LE-BOM.txt: UTF-32LE
testdata/text_UTF-8-BOM.txt: UTF-8-BOM
testdata/text_UTF-8.txt: UTF-8

If you only want to detect whether something is a binary or text file, this is roughly 250 times faster than file --mime ....

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE)
MIT license (LICENSE-MIT)

at your option.