velveteer / hermes

Licence: MIT license
A Haskell library for fast, memory-efficient decoding of JSON documents using the simdjson C++ library

hermes

A Haskell interface over the simdjson C++ library for decoding JSON documents. Hermes, messenger of the gods, was the maternal great-grandfather of Jason, son of Aeson.

Overview

This library exposes functions that can be used to write decoders for JSON documents using the simdjson On Demand API. From the simdjson On Demand design documentation:

Good applications for the On Demand API might be:

  • You are working from pre-existing large JSON files that have been vetted. You expect them to be well formed according to a known JSON dialect and to have a consistent layout. For example, you might be doing biomedical research or machine learning on top of static data dumps in JSON.
  • Both the generation and the consumption of JSON data is within your system. Your team controls both the software that produces the JSON and the software the parses it, your team knows and control the hardware. Thus you can fully test your system.
  • You are working with stable JSON APIs which have a consistent layout and JSON dialect.

With this in mind, Data.Hermes parsers can decode Haskell types faster than traditional Data.Aeson.FromJSON instances, especially in cases where you only need to decode a subset of the document. This is because Data.Aeson.FromJSON converts the entire document into a Data.Aeson.Value, which means memory usage increases linearly with the input size. The simdjson::ondemand API does not have this constraint because it iterates over the JSON string in memory without constructing an intermediate tree. This means decoders are truly lazy and you only pay for what you use.

For an incremental JSON parser in Haskell, see json-stream.

Usage

This library does not offer a Haskell API over the entire simdjson On Demand API. It currently binds only to what is needed for defining and running a Decoder. You can see the tests and benchmarks for example usage. simdjson::ondemand exceptions will be caught and re-thrown with enough information to troubleshoot. In the worst case you may run into a segmentation fault that is not caught, which you are encouraged to report as a bug.

Decoders

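{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.Hermes as H
import Data.Scientific (Scientific)
import Data.Text (Text)

-- Record assumed for this example; field types follow the decoders below.
data Person = Person
  { personId :: Text
  , personIndex :: Int
  , personGuid :: Text
  , personIsActive :: Bool
  , personBalance :: Text
  , personPicture :: Maybe Text
  , personLatitude :: Scientific
  } deriving Show

-- Decode a single JSON object into a Person, one key per field.
personDecoder :: H.Decoder Person
personDecoder = H.withObject $ \obj ->
  Person
    <$> H.atKey "_id" H.text obj
    <*> H.atKey "index" H.int obj
    <*> H.atKey "guid" H.text obj
    <*> H.atKey "isActive" H.bool obj
    <*> H.atKey "balance" H.text obj
    <*> H.atKey "picture" (H.nullable H.text) obj
    <*> H.atKey "latitude" H.scientific obj

-- Decode a strict ByteString.
decodePersons :: BS.ByteString -> Either H.HermesException [Person]
decodePersons = H.decodeEither $ H.list personDecoder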
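
Because a decoder only visits the keys it names, a partial decoder is simply a smaller decoder. The following is a minimal sketch that reuses only the functions already shown above; guidsDecoder and decodeGuids are names made up for this example, and the input is assumed to be the same array of person objects.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.Hermes as H
import Data.Text (Text)

-- Hypothetical partial decoder: for each object in the top-level array,
-- read only the "guid" key. No other field is converted to a Haskell value.
guidsDecoder :: H.Decoder [Text]
guidsDecoder = H.list . H.withObject $ H.atKey "guid" H.text

-- Run the partial decoder against a strict ByteString.
decodeGuids :: BS.ByteString -> Either H.HermesException [Text]
decodeGuids = H.decodeEither guidsDecoder
```

This is the shape of decoder that the partial Twitter benchmark is meant to highlight: the cost tracks the fields requested rather than the size of each object.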

Aeson Integration

While it is not recommended to use hermes if you need the full DOM, we still provide a performant interface to decode aeson Values. See an example of this in the hermes-aeson subpackage. Ideally, you could use hermes to selectively decode aeson Values on demand, for example:

> H.decodeEither (H.atPointer "/statuses/99/user/screen_name" H.hValueToAeson) twitter
Right (String "2no38mae")

Exceptions

When decoding fails for a known reason, you will get a Left HermesException indicating if the error came from simdjson or from an internal hermes call.

> decodeEither (withObject . atKey "hello" $ list text) "{ \"hello\": [\"world\", false] }"
Left (SIMDException (DocumentError {path = "/hello/1", errorMsg = "Error while getting value of type text. The JSON element does not have the requested type."}))
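
In application code you would usually branch on the result rather than print it. The sketch below assumes that the SIMDException constructor seen in the output above is exported from Data.Hermes and that its payload has a Show instance (both are suggested by that output, but check the Haddocks for your version). The names helloDecoder and report are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.Hermes as H
import Data.Text (Text)

-- Same decoder as in the session above: a list of strings under "hello".
helloDecoder :: H.Decoder [Text]
helloDecoder = H.withObject . H.atKey "hello" $ H.list H.text

-- Hypothetical helper: turn a decode result into a one-line report.
report :: BS.ByteString -> String
report input =
  case H.decodeEither helloDecoder input of
    Right values -> "decoded " <> show (length values) <> " strings"
    -- Raised by simdjson, e.g. a type mismatch at a known document path.
    Left (H.SIMDException docErr) -> "simdjson error: " <> show docErr
    -- Any other constructor is treated as coming from hermes itself.
    Left other -> "hermes error: " <> show other

main :: IO ()
main = putStrLn (report "{ \"hello\": [\"world\", false] }")
```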

Benchmarks

We benchmark the following operations using both hermes-json and aeson strict ByteString decoders:

  • Decode an array of 1 million 3-element arrays of doubles
  • Full decoding of a large-ish (12 MB) JSON array of Person objects
  • Partial decoding of Twitter status objects to highlight the on-demand benefits
  • Decoding entire documents into Data.Aeson.Value

Specs

  • GHC 9.4.4
  • aeson-2.1.2.1 (using Data.Aeson.Decoding) with text-2.0.2
  • Apple M1 Pro

Name                                         Mean (ps)  2*Stdev (ps)   Allocated      Copied  Peak Memory
All.Decode.Arrays.Hermes                  267914650000   10610366160   503599934   439150544    541065216
All.Decode.Arrays.Aeson                  2214928800000  160279563772  7094759111  2392723388   1166016512
All.Decode.Persons.Hermes                  47338175000    4290343628   144901928    57032737   1166016512
All.Decode.Persons.Aeson                  132864400000    9509102680   357269946   188529742   1166016512
All.Decode.Partial Twitter.Hermes            241083593      13856196      348540        3088   1166016512
All.Decode.Partial Twitter.JsonStream       2116192187     158907568    15259526      273821   1166016512
All.Decode.Partial Twitter.Aeson            4254060937     262619196    12538003     4634594   1166016512
All.Decode.Persons (Aeson Value).Hermes   106420425000    3747538126   303886293   135388183   1166016512
All.Decode.Persons (Aeson Value).Aeson    119489550000    9713032080   286148916   177027852   1166016512
All.Decode.Twitter (Aeson Value).Hermes     4164246875     240020934    12368752     4149211   1166016512
All.Decode.Twitter (Aeson Value).Aeson      4810817187     345165042    12539421     5527424   1166016512
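
The means are reported in picoseconds, which makes the rows hard to compare at a glance. As a reading aid, this small standalone sketch converts the Hermes and aeson means to milliseconds and prints the ratio between them. The figures are copied from the table; the names results, psToMs and speedup are made up for this example, and the ratio says nothing about variance.

```haskell
import Text.Printf (printf)

-- (benchmark, Hermes mean in ps, aeson mean in ps), copied from the table.
results :: [(String, Double, Double)]
results =
  [ ("Arrays", 267914650000, 2214928800000)
  , ("Persons", 47338175000, 132864400000)
  , ("Partial Twitter", 241083593, 4254060937)
  , ("Persons (Aeson Value)", 106420425000, 119489550000)
  , ("Twitter (Aeson Value)", 4164246875, 4810817187)
  ]

-- One millisecond is 1e9 picoseconds.
psToMs :: Double -> Double
psToMs ps = ps / 1e9

-- How many times longer the aeson mean is than the Hermes mean.
speedup :: Double -> Double -> Double
speedup hermesPs aesonPs = aesonPs / hermesPs

-- Print one row: name, Hermes ms, aeson ms, ratio.
row :: (String, Double, Double) -> IO ()
row (name, h, a) =
  printf "%-24s %10.2f ms %10.2f ms %6.2fx\n" name (psToMs h) (psToMs a) (speedup h a)

main :: IO ()
main = mapM_ row results
```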

Performance Tips

  • Use text >= 2.0 to benefit from its UTF-8 implementation.
  • Decode to Text instead of String wherever possible!
  • Decode to Int or Double instead of Scientific if you can.
  • Decode your object fields in order. If encoding with aeson, you can leverage toEncoding to enforce ordering.
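
For the last tip, here is a minimal sketch of how a producer could pin field order with aeson. The Account record is hypothetical; the relevant part is that toEncoding is built with pairs, which writes the keys in the order they are listed, whereas toJSON goes through an intermediate Value.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (ToJSON (..), encode, object, pairs, (.=))
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- Hypothetical record used only for this example.
data Account = Account
  { accountId :: Text
  , accountIndex :: Int
  , accountActive :: Bool
  }

instance ToJSON Account where
  -- Value-based encoding stores keys in a map, so source order is not kept.
  toJSON a =
    object
      [ "_id" .= accountId a
      , "index" .= accountIndex a
      , "isActive" .= accountActive a
      ]
  -- Direct encoding writes the pairs in exactly the order given here.
  toEncoding a =
    pairs
      ( "_id" .= accountId a
          <> "index" .= accountIndex a
          <> "isActive" .= accountActive a
      )

main :: IO ()
main = BL.putStrLn (encode (Account "a1" 0 True))
```

A decoder that then reads "_id", "index" and "isActive" in that same order matches the layout the encoder produced.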

If you need to decode in tight loops or long-running processes (like a server), consider using the withHermesEnv/mkHermesEnv and parseByteString functions instead of decodeEither. This ensures the simdjson instances are not re-created on each decode. Please see the simdjson performance docs for more info. But please ensure that you use one HermesEnv per thread, as simdjson is single-threaded by default.
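
A rough sketch of that pattern follows. It assumes mkHermesEnv takes an optional parser capacity (Maybe Int) and that parseByteString takes the environment, the decoder and the input in that order. Neither signature is stated in this README, so verify them against the Haddocks for your version. decodeBatch is a hypothetical helper.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as BS
import qualified Data.Hermes as H

-- Hypothetical helper: decode a batch of inputs with a single HermesEnv.
-- Assumed signatures (not confirmed by this README):
--   mkHermesEnv     :: Maybe Int -> IO HermesEnv
--   parseByteString :: HermesEnv -> Decoder a -> ByteString
--                   -> Either HermesException a
decodeBatch :: H.Decoder a -> [BS.ByteString] -> IO [Either H.HermesException a]
decodeBatch decoder inputs = do
  -- Create the simdjson instances once, on the thread that will use them.
  env <- H.mkHermesEnv Nothing
  -- `evaluate` forces each decode to run here, on the calling thread, so the
  -- env is never touched later from another thread by a deferred thunk.
  traverse (evaluate . H.parseByteString env decoder) inputs
```

In a multi-threaded server, each worker thread would call something like decodeBatch (or hold its own env) rather than sharing one environment across threads.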

Limitations

Because the On Demand API uses a forward-only iterator (except for object fields), you must take care not to access values out of order. This library tries to rule out such mistakes where it can, for example by making it impossible to write a Decoder Value.

Because the On Demand API does not validate the entire document upon creating the iterator (besides UTF-8 validation and basic well-formed checks), it is possible to parse an invalid JSON document but not realize it until later. If you need the entire document to be validated up front then a DOM parser is a better fit for you.

The On Demand approach is less safe than DOM: we only validate the components of the JSON document that are used and it is possible to begin ingesting an invalid document only to find out later that the document is invalid. Are you fine ingesting a large JSON document that starts with well formed JSON but ends with invalid JSON content?

This library currently cannot decode scalar documents, e.g. a single string, number, boolean, or null as a JSON document.

Portability

Per the simdjson documentation:

A recent compiler (LLVM clang6 or better, GNU GCC 7.4 or better, Xcode 11 or better) on a 64-bit (PPC, ARM or x64 Intel/AMD) POSIX systems such as macOS, freeBSD or Linux. We require that the compiler supports the C++11 standard or better.

However, this library relies on std::string_view without a shim, so C++17 or better is highly recommended.