Compares draws regenerated from a compressed posterior against a reference set of draws (typically the original MCMC output) using two distribution-free, backend-independent two-sample diagnostics: the energy distance and a classifier two-sample test (C2ST).
A posterior_compressed object (or a path to an .rds
file containing one).
A draws matrix, data.frame, or
posterior::draws_* object whose columns include all parameters
in comp. Treated as the ground-truth distribution.
Character vector of metrics to compute. One or more of
"energy" and "c2st". Default: both.
Integer number of draws to regenerate from comp.
Defaults to min(max_n, nrow(reference_draws)).
Cap on the number of points used in any pairwise
distance / classifier computation. Both samples are subsampled to
at most this many rows. Default 2000.
Number of self-baseline replicates for the energy
metric. Default 20.
Classifier for C2ST. "ranger" (default) uses a
random forest (ranger::ranger()). "knn" uses a k-NN probability
estimate instead (no random forest).
Cross-validation folds for C2ST. Default 5.
Optional integer seed for reproducibility.
Logical; print progress messages.
A compression_fidelity object.
Currently unused.
An S3 object of class compression_fidelity containing:
reproduction_pct: Headline reproduction score in [0, 100],
averaged across requested metrics.
metrics: Named list with detailed per-metric results.
n_reference, n_eval, n_params: Sample sizes and number of parameters used.
Energy distance (Szekely & Rizzo, 2013): a
scale-invariant distance between two samples that is zero
iff their distributions match. The raw distance is anchored
against a self-baseline obtained by bootstrapping
reference_draws to two independent samples matched in
size to the reconstructed sample. The 90th percentile of
this baseline distribution defines the Monte Carlo noise
envelope; the reproduction score is
100 * min(1, noise_envelope / max(distance, noise_envelope)),
so the score is 100% when the distance lies within the noise
envelope, and it decays as 1/ratio (ratio = distance / noise_envelope)
for larger distances.
Classifier two-sample test (C2ST) (Lopez-Paz &
Oquab, 2017): trains a classifier (by default a random forest via
ranger::ranger()) under cross-validation
to discriminate reference draws from reconstructed draws and
reports the out-of-fold ROC AUC. AUC = 0.5 means the two
samples are indistinguishable; the score is mapped to
100 * (1 - 2 * |AUC - 0.5|).
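The C2ST idea above can be sketched in base R. This is only an illustration under stated assumptions: logistic regression (`glm()`) stands in for the random forest or k-NN classifier, and the helper names `auc_rank()`, `c2st_auc()`, and `c2st_score()` are hypothetical, not functions exported by the package.

```r
# Rank-based (Mann-Whitney) estimate of the ROC AUC; `label` is 0/1.
auc_rank <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Out-of-fold AUC for "reference (0) vs reconstructed (1)".
# Assumption: logistic regression replaces the package's classifier.
c2st_auc <- function(reference, reconstructed, cv_folds = 5) {
  dat <- as.data.frame(rbind(reference, reconstructed))
  dat$label <- rep(c(0, 1), c(nrow(reference), nrow(reconstructed)))
  fold <- sample(rep_len(seq_len(cv_folds), nrow(dat)))
  pred <- numeric(nrow(dat))
  for (k in seq_len(cv_folds)) {
    fit <- glm(label ~ ., data = dat[fold != k, ], family = binomial())
    pred[fold == k] <- predict(fit, newdata = dat[fold == k, ],
                               type = "response")
  }
  auc_rank(pred, dat$label)
}

# Score mapping quoted in the text: 100 * (1 - 2 * |AUC - 0.5|).
c2st_score <- function(auc) 100 * (1 - 2 * abs(auc - 0.5))

set.seed(1)
ref     <- matrix(rnorm(500 * 3), ncol = 3)
same    <- matrix(rnorm(500 * 3), ncol = 3)  # same distribution as ref
shifted <- same + 1                          # mean shifted in every column

auc_same    <- c2st_auc(ref, same)
auc_shifted <- c2st_auc(ref, shifted)
c2st_score(auc_same)     # high: the classifier cannot tell the samples apart
c2st_score(auc_shifted)  # low: the shift is easy to detect
```

Because the mapping uses the absolute deviation from 0.5, an AUC below 0.5 lowers the score just as much as the same deviation above it.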
Both metrics are computed only from samples and are independent of
the compression backend (mclust, mvdens_gmm, mvdens_kde), so
the score is not "circular": it does not depend on the same density
model that produced the compression.
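For the energy metric, the sample-only computation can be sketched as follows. This is a minimal base-R illustration, not the package's implementation: it assumes plain Euclidean distances and the V-statistic form of the estimator, and the helper names `energy_distance()`, `self_baseline()`, and `energy_score()` are hypothetical. Any standardisation the package applies before computing distances is not reproduced here.

```r
# Sample energy distance (V-statistic form) with Euclidean distances:
# 2 * E|X - Y| - E|X - X'| - E|Y - Y'|, each expectation replaced by a mean.
energy_distance <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n <- nrow(x)
  m <- nrow(y)
  d <- as.matrix(dist(rbind(x, y)))  # all pairwise distances
  ix <- seq_len(n)
  iy <- n + seq_len(m)
  2 * mean(d[ix, iy]) - mean(d[ix, ix]) - mean(d[iy, iy])
}

# Self-baseline: distances between two bootstrap samples of the reference,
# each of size n (the size of the reconstructed sample).
self_baseline <- function(reference, n, n_rep = 20) {
  replicate(n_rep, {
    a <- reference[sample.int(nrow(reference), n, replace = TRUE), , drop = FALSE]
    b <- reference[sample.int(nrow(reference), n, replace = TRUE), , drop = FALSE]
    energy_distance(a, b)
  })
}

# Score mapping quoted in the text.
energy_score <- function(distance, noise_envelope) {
  100 * min(1, noise_envelope / max(distance, noise_envelope))
}

set.seed(1)
ref        <- matrix(rnorm(500 * 3), ncol = 3)
recon_good <- matrix(rnorm(200 * 3), ncol = 3)  # same distribution as ref
recon_bad  <- recon_good + 1                    # shifted in every column

baseline       <- self_baseline(ref, n = nrow(recon_good), n_rep = 20)
noise_envelope <- quantile(baseline, 0.9, names = FALSE)

score_good <- energy_score(energy_distance(ref, recon_good), noise_envelope)
score_bad  <- energy_score(energy_distance(ref, recon_bad), noise_envelope)
```

Nothing in this sketch touches a density model: only the two sets of draws enter the computation, which is what makes the diagnostic independent of the compression backend.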
For a strict out-of-sample evaluation, hold out a fraction of the draws before fitting:
idx <- sample.int(nrow(draws), size = floor(0.8 * nrow(draws)))
comp <- compress_posterior(draws[idx, ], method = "mclust")
evaluate_compression(comp, reference_draws = draws[-idx, ])
Szekely, G. J. & Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8), 1249-1272.
Lopez-Paz, D. & Oquab, M. (2017). Revisiting Classifier Two-Sample Tests. ICLR.
set.seed(1)
draws <- matrix(rnorm(2000 * 3), ncol = 3,
dimnames = list(NULL, c("alpha", "beta", "sigma")))
comp <- compress_posterior(draws, method = "mclust", n_components = 2)
#> ℹ mclust: trying all 14 covariance models c(EII, VII, EEI, VEI, EVI, VVI, EEE, VEE, EVE, VVE, EEV, VEV, EVV, VVV) and picking the best by BIC (n = 2000, d = 3).
#> ℹ mclust: selected model 'EII' with G = 2 (BIC = -17,298.09) out of 14 candidate models: EII, VII, EEI, VEI, EVI, VVI, EEE, VEE, EVE, VVE, EEV, VEV, EVV, VVV.
fidelity <- evaluate_compression(comp, reference_draws = draws, seed = 1L)
fidelity
#> <compression_fidelity>
#> method : mclust
#> parameters : 3
#> reference n : 2000
#> eval n : 2000
#> ----------------------------------------
#> energy : 100.0% reproduction
#> distance : 0.0417 noise envelope (q90): 0.0613 ratio: 0.68x
#> C2ST : 98.7% reproduction
#> AUC : 0.506 classifier: ranger cv_folds: 5
#> ----------------------------------------
#> reproduction : 99.4%