All Projects → evincarofautumn → protodata

evincarofautumn / protodata

Licence: other
A textual language for binary data.

Programming Languages

C++
36643 projects - #6 most used programming language
shell
77523 projects
Makefile
30231 projects
c
50402 projects - #5 most used programming language

Projects that are alternatives of or similar to protodata

Binaryserializer
A declarative serialization framework for controlling formatting of data at the byte and bit level using field bindings, converters, and code.
Stars: ✭ 197 (+462.86%)
Mutual labels:  serialization, binary
parco
🏇🏻 generalist, fast and tiny binary parser and compiler generator, powered by Go 1.18+ Generics
Stars: ✭ 57 (+62.86%)
Mutual labels:  serialization, binary
Chronicle Wire
A Java Serialisation Library that supports multiple formats
Stars: ✭ 204 (+482.86%)
Mutual labels:  serialization, binary
Bincode
A binary encoder / decoder implementation in Rust.
Stars: ✭ 1,100 (+3042.86%)
Mutual labels:  serialization, binary
Bois
Salar.Bois is a compact, fast and powerful binary serializer for .NET Framework. With Bois you can serialize your existing objects with almost no change.
Stars: ✭ 53 (+51.43%)
Mutual labels:  serialization, binary
Binary
Generic and fast binary serializer for Go
Stars: ✭ 86 (+145.71%)
Mutual labels:  serialization, binary
GenericProtocol
⚡️ A fast TCP event based buffered server/client protocol for transferring data over the (inter)net in .NET 🌐
Stars: ✭ 38 (+8.57%)
Mutual labels:  serialization, binary
Fastbinaryencoding
Fast Binary Encoding is ultra fast and universal serialization solution for C++, C#, Go, Java, JavaScript, Kotlin, Python, Ruby, Swift
Stars: ✭ 421 (+1102.86%)
Mutual labels:  serialization, binary
GroBuf
Fast binary serializer
Stars: ✭ 56 (+60%)
Mutual labels:  serialization, binary
Apex.Serialization
High performance contract-less binary serializer for .NET
Stars: ✭ 82 (+134.29%)
Mutual labels:  serialization, binary
Beeschema
Binary Schema Library for C#
Stars: ✭ 46 (+31.43%)
Mutual labels:  serialization, binary
persistity
A persistence framework for game developers
Stars: ✭ 34 (-2.86%)
Mutual labels:  serialization, binary
Pbf
A low-level, lightweight protocol buffers implementation in JavaScript.
Stars: ✭ 618 (+1665.71%)
Mutual labels:  serialization, binary
Borer
Efficient CBOR and JSON (de)serialization in Scala
Stars: ✭ 131 (+274.29%)
Mutual labels:  serialization, binary
Iguana
universal serialization engine
Stars: ✭ 481 (+1274.29%)
Mutual labels:  serialization, binary
surge
Simple, specialised, and efficient binary marshaling
Stars: ✭ 36 (+2.86%)
Mutual labels:  serialization, binary
Fspickler
A fast multi-format message serializer for .NET
Stars: ✭ 299 (+754.29%)
Mutual labels:  serialization, binary
Ceras
Universal binary serializer for a wide variety of scenarios https://discord.gg/FGaCX4c
Stars: ✭ 374 (+968.57%)
Mutual labels:  serialization, binary
nason
🗜 Ultra tiny serializer / encoder with plugin-support. Useful to build binary files containing images, strings, numbers and more!
Stars: ✭ 30 (-14.29%)
Mutual labels:  serialization, binary
sirdez
Glorious Binary Serialization and Deserialization for TypeScript.
Stars: ✭ 20 (-42.86%)
Mutual labels:  serialization, binary

Protodata

Protodata is a language for describing binary data, originally written for prototyping and reverse engineering binary file formats. It is particularly useful for generating annotated data in a custom format, without having to write custom editing tools.

After Unix tradition, Protodata is designed to be lightweight and composable, and interact well with other utilities.

Building

git clone git://github.com/evincarofautumn/protodata.git
cd protodata
make
./pd --help

Protodata has been tested with G++ 4.6 and Clang 3.2 on Linux.

Examples

For the purposes of illustration, assume the following alias:

alias dump="hexdump -v -e '/1 \"%02X \"'"

Floating-point

View the representation of a floating-point number:

$ pd -e 'f64 1.0' | dump; echo

$ pd -e 'big f64 -inf' | dump; echo

Unicode

Convert from UTF-8 to different transmission formats:

$ pd -e 'utf16 "Some UTF-8 text I’d like to reëncode."' | dump; echo

$ pd -e 'ucs2 "Because who needs surrogate pairs?"' | dump; echo

$ pd -e 'u21 "Mind the bits and the bytes will look after themselves."' | dump; echo

More!

Suppose you have some raw vertex data in input.raw:

+1.0 +1.0 -1.0
+1.0 -1.0 -1.0
-1.0 -1.0 -1.0
-1.0 +1.0 -1.0
+1.0 +1.0 +1.0
-1.0 +1.0 +1.0
-1.0 -1.0 +1.0
+1.0 -1.0 +1.0

You want to pack these values as single-precision floats into output.bin in such a way that you can mmap() the output and index it as an array. If you didn’t have Protodata, you might write a one-off C program like this:

cat <<EOF | gcc -x c - && ./a.out
#include <stdio.h>
int main(int argc, char** argv) {
  FILE* in = fopen("input.raw", "r");
  FILE* out = fopen("output.bin", "w");
  float value;
  while (fscanf(in, " %f", &value) == 1)
    fwrite(&value, sizeof(value), 1, out);
  return 0;
}
EOF

To do the same thing with Protodata is straightforward:

$ pd -e 'f32' input.raw -o output.bin

Here, f32 is a Protodata command, which sets the output mode to 32-bit float. Now, both the C version and the Protodata version use platform endianness. If we wanted to change that to big-endian in the C version, we would have to detect endianness and manually swap byte order. In Protodata, however, we can simply use the big command:

$ pd -e 'big f32' input.raw -o output.bin

Like the C version, Protodata consumes input lazily and produces output eagerly, making it suitable for streaming:

$ cat input.raw | pd -e 'f32' - -o output.bin

Command-line Options

  • -h, --help

    Print a help message and exit.

  • FILE

    Read and evaluate FILE as Protodata source.

  • -e, --eval

    Evaluate a string as Protodata source.

  • -o, --output

    Specify an output file; default is standard output.

Multiple -e and FILE options may be specified; they are all concatenated in the order they appeared on the command line. This means any individual source file or -e string is allowed to contain semantically invalid Protodata source, as long as the concatenated source is semantically valid.

The Language

Features marked with a dagger (†) are subject to change.

A Protodata source file consists of a series of values and commands, delimited by whitespace and punctuated by comments. Comments begin with # and continue to the end of the line:

# A useless comment for demonstration purposes.

Source files must be encoded in UTF-8.

Values

Integer

 0      # Decimal, technically.
-12     # Decimal in the usual sense.
 020    # Still decimal.
+0b1010 # Binary.
-0o777  # Octal.
+0xFE   # Hexadecimal.

Integers with a sign character (+ or -) are interpreted as signed values—go figure—and must fit within the range of int64_t; unsigned integers are those without a sign character, and must fit within the range of uint64_t. Out-of-range values are reported as errors, e.g.:

s32 +0xFFFFFFFF

Floating-point

 0.0
-0.0
 0.5
+0.5
 6.28318
 inf
-inf
 nan
 epsilon

Floating-point values cannot be represented by utf or integer types. Unlike for integers, sign characters have no effect on floating-point representation, only sign bit.

String

""
"salut"

A string is syntactic sugar for a sequence of integer values corresponding to Unicode code points. Strings are not null-terminated; an empty string is a no-op. The following common escapes are allowed:

EscapeCharacterUnicode
\aBELU+0007
\bBSU+0008
\tHTU+0009
\nLFU+000A
\vVTU+000B
\fFFU+000C
\rCRU+000D
\eESCU+001B
\""U+0022
\\\U+005C

Commands

Size and Encoding

  • sN

    A signed integer with N bits, where 1 ≤ N ≤ 64.

  • uN

    An unsigned integer with N bits, where 1 ≤ N ≤ 64.

  • f32, f64

    A single- or double-precision floating-point number.

  • utf8, utf16, utf32, ucs2

    A Unicode code point in the given transmission format. utf32 is an alias for u32, and ucs2 for u16.

    The following produces three copies of the same (two-byte) sequence:

     u8 0xC3 0xA7
     utf8 "ç"
     utf8 231
    

If the whole output ends on a non-8-bit boundary, it will be padded with trailing zero bits. Individual values are not automatically aligned.

The default type is unsigned int, however wide int happens to be on the current platform.†

Endianness

  • native

  • little

  • big

The default endianness is native.

Endianness affects only 16-, 32-, and 64-bit types!

Compiler State

The compiler state consists of the current type and endianness. It can be saved and restored with the following commands:

CommandDescription
{Push the compiler state.
}Pop the compiler state.

These are perhaps best explained by example. Here, 10, 20, and 30 will be written as u32, while 1 through 6 will all be written as u8:

u32
10 { u8 1 }
20 { u8 2 3 }
30 { u8 4 5 6 }
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].