All Projects → JuliaData → PooledArrays.jl

JuliaData / PooledArrays.jl

Licence: other
A pooled representation for arrays with few unique elements

Programming Languages

julia
2034 projects

PooledArrays.jl

CI codecov deps version pkgeval

A pooled representation of arrays for purposes of compression when there are few unique elements.

Installation: at the Julia REPL, import Pkg; Pkg.add("PooledArrays")

Usage:

Working with PooledArray objects does not differ from working with general AbstractArray objects, with two exceptions:

  • If you hold mutable objects in PooledArray it is not allowed to modify them after they are stored in it.
  • In multi-threaded context it is not safe to assign values that are not already present in a PooledArray's pool from one thread while either reading or writing to the same array from another thread.

Keeping in mind these two restrictions, as a user, the only thing you need to learn is how to create PooledArray objects. This is accomplished by passing an AbstractArray to the PooledArray constructor:

julia> using PooledArrays

julia> PooledArray(["a" "b"; "c" "d"])
2×2 PooledMatrix{String, UInt32, Matrix{UInt32}}:
 "a"  "b"
 "c"  "d"

PooledArray performs compression by storing an array of reference integers and a mapping from integers to its elements in a dictionary. In this way, if the size of the reference integer is smaller than the size of the actual elements the resulting PooledArray has a smaller memory footprint than the equivalent Array. By default UInt32 is used as a type of reference integers. However, you can specify the reference integer type you want to use by passing it as a second argument to the constructor. This is usually done when you know that you will have only a few unique elements in the PooledArray.

julia> PooledArray(["a", "b", "c", "d"], UInt8)
4-element PooledVector{String, UInt8, Vector{UInt8}}:
 "a"
 "b"
 "c"
 "d"

Alternatively you can pass the compress and signed keyword arguments to the PooledArray constructor to automatically select the reference integer type. When you pass compress=true then the reference integer type is chosen to be the smallest type that is large enough to hold all unique values in array. When you pass signed=true the reference type is signed (by default it is unsigned).

julia> PooledArray(["a", "b", "c", "d"]; compress=true, signed=true)
4-element PooledVector{String, Int8, Vector{Int8}}:
 "a"
 "b"
 "c"
 "d"

Maintenance: PooledArrays is maintained collectively by the JuliaData collaborators. Responsiveness to pull requests and issues can vary, depending on the availability of key collaborators.

Related Packages

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].