Evaluate the fidelity of a compressed posterior — evaluate

Compares draws regenerated from a compressed posterior against a reference set of draws (typically the original MCMC output) using two distribution-free, backend-independent two-sample diagnostics: energy distance and a classifier two-sample test (C2ST).

evaluate_compression(
  comp,
  reference_draws,
  metric = c("energy", "c2st"),
  n_eval = NULL,
  max_n = 2000L,
  n_self_reps = 20L,
  classifier = c("ranger", "knn"),
  cv_folds = 5L,
  seed = NULL,
  verbose = FALSE
)

# S3 method for class 'compression_fidelity'
print(x, ...)

Arguments

comp: A posterior_compressed object (or a path to an .rds file containing one).
reference_draws: A draws matrix, data.frame, or posterior::draws_* object whose columns include all parameters in comp. Treated as the ground-truth distribution.
metric: Character vector of metrics to compute. One or more of "energy" and "c2st". Default: both.
n_eval: Integer number of draws to regenerate from comp. Defaults to min(max_n, nrow(reference_draws)).
max_n: Cap on the number of points used in any pairwise distance / classifier computation. Both samples are subsampled to at most this many rows. Default 2000.
n_self_reps: Number of self-baseline replicates for the energy metric. Default 20.
classifier: Classifier for C2ST. "ranger" (default) uses a random forest (ranger::ranger()). "knn" uses a k-NN probability estimate instead (no random forest).
cv_folds: Cross-validation folds for C2ST. Default 5.
seed: Optional integer seed for reproducibility.
verbose: Logical; print progress messages.
x: A compression_fidelity object.
...: Currently unused.

Value

An S3 object of class compression_fidelity containing:

reproduction_pct: Headline reproduction score in [0, 100], averaged across requested metrics.
metrics: Named list with detailed per-metric results.
n_reference, n_eval, n_params: Sample sizes used.

Details

Energy distance (Szekely & Rizzo, 2013): a scale-invariant distance between two samples that is zero iff their distributions match. The raw distance is anchored against a self-baseline obtained by bootstrapping reference_draws into two independent samples, each matched in size to the reconstructed sample, repeated n_self_reps times. The 90th percentile of this baseline distribution defines the Monte Carlo noise envelope; the reproduction score is 100 * min(1, noise_envelope / max(distance, noise_envelope)), so the score is 100% whenever the distance falls within the envelope and decays as 1/ratio, where ratio = distance / noise_envelope, for larger distances.
Classifier two-sample test (C2ST) (Lopez-Paz & Oquab, 2017): trains a classifier (by default a random forest via ranger::ranger(); see classifier) under cross-validation to discriminate reference draws from reconstructed draws and reports the out-of-fold ROC AUC. AUC = 0.5 means the classifier cannot tell the two samples apart; the AUC is mapped to a score of 100 * (1 - 2 * |AUC - 0.5|).
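
The energy metric and the two score mappings described above can be sketched in base R. This is an illustrative sketch only, not the package's internal code: the helper names (energy_distance, self_baseline, energy_score, c2st_score) are hypothetical, and the V-statistic form of the energy distance on unscaled inputs is an assumption about details the text does not specify.

```r
# Hypothetical helpers for illustration; not part of the package API.

# Sample energy distance (V-statistic form) between the rows of x and y:
# 2 * mean|X - Y| - mean|X - X'| - mean|Y - Y'|, with Euclidean norms.
energy_distance <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n <- nrow(x)
  m <- nrow(y)
  d <- as.matrix(dist(rbind(x, y)))  # all pairwise Euclidean distances
  ix <- seq_len(n)
  iy <- n + seq_len(m)
  2 * mean(d[ix, iy]) - mean(d[ix, ix]) - mean(d[iy, iy])
}

# Self-baseline: energy distances between pairs of bootstrap samples of
# size n drawn independently from the reference draws.
self_baseline <- function(ref, n, reps = 20L) {
  replicate(reps, {
    a <- ref[sample.int(nrow(ref), n, replace = TRUE), , drop = FALSE]
    b <- ref[sample.int(nrow(ref), n, replace = TRUE), , drop = FALSE]
    energy_distance(a, b)
  })
}

# Score mappings, written exactly as the formulas in the text.
energy_score <- function(distance, noise_envelope) {
  100 * min(1, noise_envelope / max(distance, noise_envelope))
}
c2st_score <- function(auc) 100 * (1 - 2 * abs(auc - 0.5))

set.seed(1)
ref   <- matrix(rnorm(300 * 2), ncol = 2)  # stand-in for reference draws
recon <- matrix(rnorm(300 * 2), ncol = 2)  # stand-in for regenerated draws

envelope <- unname(quantile(self_baseline(ref, nrow(recon)), 0.90))
d <- energy_distance(ref, recon)
energy_score(d, envelope)

energy_score(0.0417, 0.0613)  # distance inside the envelope: 100
energy_score(0.2, 0.1)        # ratio 2: 50
c2st_score(0.5)               # indistinguishable samples: 100
```

Because the denominator is max(distance, noise_envelope), any distance inside the envelope scores exactly 100, and a distance twice the envelope scores 50.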

Both metrics are computed only from samples and are independent of the compression backend (mclust, mvdens_gmm, mvdens_kde), so the score is not "circular": it does not depend on the same density model that produced the compression.
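
To illustrate that C2ST needs nothing but the two samples, the following base-R sketch computes an out-of-fold AUC and maps it to the score defined above. It is a sketch under stated assumptions, not the package's implementation: the helper c2st_auc is hypothetical, and logistic regression (stats::glm) is swapped in for the ranger random forest so that the example has no dependencies. A linear classifier mainly detects location shifts, so it is weaker than a random forest at picking up differences in shape or covariance.

```r
# Hypothetical helper for illustration; not part of the package API.
# Out-of-fold ROC AUC of a classifier separating `ref` (label 0)
# from `recon` (label 1). Logistic regression stands in for ranger.
c2st_auc <- function(ref, recon, folds = 5L) {
  dat <- data.frame(rbind(as.matrix(ref), as.matrix(recon)))
  dat$label <- rep(c(0, 1), c(nrow(ref), nrow(recon)))
  fold <- sample(rep_len(seq_len(folds), nrow(dat)))  # random fold labels
  prob <- numeric(nrow(dat))
  for (k in seq_len(folds)) {
    held_out <- fold == k
    fit <- glm(label ~ ., data = dat[!held_out, ], family = binomial())
    prob[held_out] <- predict(fit, newdata = dat[held_out, ],
                              type = "response")
  }
  # AUC via the rank-sum (Mann-Whitney) identity.
  r  <- rank(prob)
  n1 <- sum(dat$label == 1)
  n0 <- sum(dat$label == 0)
  (sum(r[dat$label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
ref     <- matrix(rnorm(400 * 2), ncol = 2)
same    <- matrix(rnorm(400 * 2), ncol = 2)            # same distribution
shifted <- matrix(rnorm(400 * 2, mean = 1), ncol = 2)  # mean shifted by 1

auc_same    <- c2st_auc(ref, same)     # expected to be close to 0.5
auc_shifted <- c2st_auc(ref, shifted)  # expected to be well above 0.5

# Score mapping from the text: 100 * (1 - 2 * |AUC - 0.5|)
100 * (1 - 2 * abs(auc_same - 0.5))
100 * (1 - 2 * abs(auc_shifted - 0.5))
```

The labels come only from which sample a row belongs to, so no density model from the compression backend enters the computation.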

For a strict out-of-sample evaluation, hold out a fraction of the draws before fitting:


  # Fit on a random 80% of the draws, then evaluate against the held-out 20%
  idx  <- sample.int(nrow(draws), size = floor(0.8 * nrow(draws)))
  comp <- compress_posterior(draws[idx, ], method = "mclust")
  evaluate_compression(comp, reference_draws = draws[-idx, ])

References

Szekely, G. J. & Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8), 1249-1272.

Lopez-Paz, D. & Oquab, M. (2017). Revisiting Classifier Two-Sample Tests. ICLR.

Examples

set.seed(1)
draws <- matrix(rnorm(2000 * 3), ncol = 3,
                dimnames = list(NULL, c("alpha", "beta", "sigma")))
comp <- compress_posterior(draws, method = "mclust", n_components = 2)
#> ℹ mclust: trying all 14 covariance models c(EII, VII, EEI, VEI, EVI, VVI, EEE, VEE, EVE, VVE, EEV, VEV, EVV, VVV) and picking the best by BIC (n = 2000, d = 3).
#> ℹ mclust: selected model 'EII' with G = 2 (BIC = -17,298.09) out of 14 candidate models: EII, VII, EEI, VEI, EVI, VVI, EEE, VEE, EVE, VVE, EEV, VEV, EVV, VVV.
fidelity <- evaluate_compression(comp, reference_draws = draws, seed = 1L)
fidelity
#> <compression_fidelity>
#>   method        : mclust
#>   parameters    : 3
#>   reference n   : 2000
#>   eval n        : 2000
#>   ----------------------------------------
#>   energy        : 100.0% reproduction
#>     distance    : 0.0417   noise envelope (q90): 0.0613   ratio: 0.68x
#>   C2ST          :  98.7% reproduction
#>     AUC         : 0.506   classifier: ranger   cv_folds: 5
#>   ----------------------------------------
#>   reproduction  : 99.4%