MultiKDE
A kernel density estimation library. What sets it apart from other Julia KDE libraries:
- Multidimensional: uses a product kernel to estimate multi-dimensional kernel densities.
- Lazy evaluation: does not pre-initialize a KDE; points are evaluated only when necessary.
- Categorical distributions: supports categorical KDE through two kernel functions, Wang-Ryzin and Aitchison-Aitken. The former is for ordered categorical distributions (age, amount...), the latter for unordered ones (sex, the face of a coin...). For unordered categorical distributions, non-numeric objects are also supported.
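To make the two categorical kernels concrete, here is a minimal sketch using their standard textbook definitions. It is not taken from MultiKDE's source: the function names `aitchison_aitken` and `wang_ryzin`, the argument order, and the bandwidth symbol `λ` are assumptions chosen for illustration.

```julia
# Illustrative definitions only; MultiKDE's own implementation may differ.
# λ is the bandwidth. Both kernels return 1 - λ when query and observation match.

# Aitchison-Aitken kernel for unordered categories.
# c is the number of distinct categories; λ should lie in [0, (c - 1) / c].
function aitchison_aitken(x, xi, λ::Real, c::Integer)
    return x == xi ? 1 - λ : λ / (c - 1)
end

# Wang-Ryzin kernel for ordered categories encoded as integers; λ should lie in [0, 1].
# The weight decays geometrically with the distance between categories.
function wang_ryzin(x::Integer, xi::Integer, λ::Real)
    return x == xi ? 1 - λ : 0.5 * (1 - λ) * λ^abs(x - xi)
end

# Unordered example: non-numeric labels work because only equality is tested.
coin = ["heads", "tails", "heads", "heads"]
p_heads = sum(aitchison_aitken("heads", xi, 0.2, 2) for xi in coin) / length(coin)
p_tails = sum(aitchison_aitken("tails", xi, 0.2, 2) for xi in coin) / length(coin)
```

With a bandwidth of 0 the estimate reduces to the raw category frequencies; larger bandwidths move probability mass toward the other categories.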
Use
[notebook]
Example
One-dimensional KDE
using MultiKDE
using Distributions, Random, Plots
# Simulation
bws = [0.05 0.1 0.5]
d = Normal(0, 1)
observations = rand(d, 50)
granularity_1d = 100
x = Vector(LinRange(minimum(observations), maximum(observations), granularity_1d))
ys = []
for bw in bws
    kde = KDEUniv(ContinuousDim(), bw, observations, MultiKDE.gaussian)
    y = [MultiKDE.pdf(kde, _x, keep_all=false) for _x in x]
    push!(ys, y)
end
# Plot
highest = maximum([maximum(y) for y in ys])
plot(x, ys, label=bws, fmt=:svg)
# One marker per observation, drawn just above the tallest curve
plot!(observations, [highest+0.05 for _ in 1:length(observations)], seriestype=:scatter, label="observations", size=(900, 450), legend=:outertopright)
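The effect of the bandwidth in the example above can be seen by writing a Gaussian KDE out by hand. The sketch below is independent of MultiKDE: `gaussian_kernel` and `kde_at` are hypothetical helper names, and the four observations are made-up values.

```julia
# Hypothetical helpers, not part of MultiKDE's API.
# Standard normal density used as the kernel.
gaussian_kernel(u) = exp(-u^2 / 2) / sqrt(2π)

# Density estimate at a single point x: the average of kernels centred on
# each observation, each scaled by the bandwidth bw.
function kde_at(x::Real, observations::AbstractVector{<:Real}, bw::Real)
    return sum(gaussian_kernel((x - xi) / bw) for xi in observations) / (length(observations) * bw)
end

obs = [-1.0, 0.0, 0.5, 2.0]      # made-up sample
narrow = kde_at(0.0, obs, 0.05)  # small bandwidth: sharp spike at each observation
wide = kde_at(0.0, obs, 0.5)     # large bandwidth: smoother, flatter curve
```

A small bandwidth concentrates the density on the observations themselves, while a larger one spreads it out; both estimates still integrate to one.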
Multi-dimensional KDE
using MultiKDE
using Distributions, Random, Plots
# Simulation
dims = [ContinuousDim(), ContinuousDim()]
bws = [[0.3, 0.3], [0.5, 0.5], [1, 1]]
mn = MvNormal([0, 0], [1, 1])
observations = rand(mn, 50)
observations = [observations[:, i] for i in 1:size(observations, 2)]
observations_x1 = [_obs[1] for _obs in observations]
observations_x2 = [_obs[2] for _obs in observations]
granularity_2d = 100
x1_range = LinRange(minimum(observations_x1), maximum(observations_x1), granularity_2d)
x2_range = LinRange(minimum(observations_x2), maximum(observations_x2), granularity_2d)
x_grid = [[_x1, _x2] for _x1 in x1_range for _x2 in x2_range]
y_grid = []
for bw in bws
    kde = KDEMulti(dims, bw, observations)
    y = [MultiKDE.pdf(kde, _x) for _x in x_grid]
    push!(y_grid, y)
end
# Plot
highest = maximum([maximum(y) for y in y_grid])
plot([_x[1] for _x in x_grid], [_x[2] for _x in x_grid], y_grid, label=[bw[1] for bw in bws][:, :]', size=(900, 450), legend=:outertopright)
plot!(observations_x1, observations_x2, [highest for _ in 1:length(observations)], seriestype=:scatter, label="observations")
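A product kernel multiplies one univariate kernel per dimension, then averages over the observations. The following sketch shows that computation for continuous dimensions with Gaussian kernels. It is an illustration under those assumptions rather than MultiKDE's implementation, and `product_kde_at` is a hypothetical name.

```julia
# Hypothetical helpers, not part of MultiKDE's API.
gaussian_kernel(u) = exp(-u^2 / 2) / sqrt(2π)

# Product-kernel density at a single query point x.
# bws holds one bandwidth per dimension.
function product_kde_at(x::AbstractVector{<:Real},
                        observations::AbstractVector{<:AbstractVector{<:Real}},
                        bws::AbstractVector{<:Real})
    total = 0.0
    for obs in observations
        # Multiply the scaled per-dimension kernel values for this observation.
        total += prod(gaussian_kernel((x[d] - obs[d]) / bws[d]) / bws[d] for d in eachindex(x))
    end
    return total / length(observations)
end

obs2d = [[0.0, 0.0], [1.0, -1.0], [-0.5, 0.5]]  # made-up sample
density = product_kde_at([0.0, 0.0], obs2d, [0.5, 0.5])
```

Nothing is precomputed here: each call walks the observations for the requested point only, which is one way to read the lazy evaluation described at the top of this README.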
Post
MultiKDE.jl: A Lazy Evaluation Multivariate Kernel Density Estimator
License
Licensed under the MIT License.