Evaluation
Model-comparison evidence margins
pyrecest.evaluation.model_comparison contains lightweight, dataframe-oriented
helpers for comparing model-evidence tables. The helpers are intentionally
domain-neutral: callers provide the model labels, grouping columns, thresholds,
and cluster/group identifiers.
Useful entry points include:
- paired_model_margin_decisions, which compares a positive model against a reference model and separates positive claims, reference claims, and ambiguous events using a symmetric log-evidence threshold;
- paired_model_margin_summary, which summarizes raw wins, confident claims, ambiguous events, and mean/median log-evidence margins;
- paired_model_margin_threshold_sweep, which evaluates the paired decision rule over multiple candidate thresholds;
- select_paired_model_margin_threshold, which selects the smallest threshold satisfying synthetic false-positive and recall constraints;
- leave_one_group_out_summary, a generic leave-one-group-out wrapper for grouped robustness checks;
- cluster_bootstrap_margin_summary, which returns cluster-resampled uncertainty intervals for raw-win fractions, claim fractions, and evidence margins; and
- grouped_claim_gate_summary, which summarizes whether every group satisfies majority/no-forbidden-claim/positive-margin gates.
These utilities are useful for model comparison, parameter selection, and paper-quality diagnostics whenever multiple filters, smoothers, or trackers emit comparable log marginal likelihoods.
Point-set geometry metrics
pyrecest.evaluation.point_set_metrics contains deterministic, NumPy/SciPy
helpers for evaluating sampled shapes, point-cloud estimates, and
extended-object geometry diagnostics. The functions provide nearest-neighbor
distances, symmetric Chamfer-L1/L2 distances, threshold precision/recall/F-score,
distance quantiles, and reproducible subsampling.
These helpers are intended for evaluation pipelines rather than differentiable
model code. They use scipy.spatial.cKDTree when available and a deterministic
chunked dense fallback otherwise.
Pareto and equal-quality selection
pyrecest.evaluation.pareto contains small dataframe-oriented utilities for
rate-distortion and equal-quality comparisons. Callers provide objective
columns and objective directions, so the helpers are intentionally domain-neutral
and can be used for particle-count, runtime, storage-size, or accuracy/quality
trade-offs.
Useful entry points include pareto_front_indices, is_pareto_front,
record_dominates, constraint_mask, select_under_constraints, and
equal_quality_selection.
Implicit-surface helpers
pyrecest.evaluation.implicit_surfaces contains lightweight helpers for
backend-neutral scalar-field and implicit-surface evaluation. These helpers cover
residual extraction from any object that exposes a value(points) method, surface-band
masks, inside/outside classification, and surface-band probabilities from signed
distance means and standard deviations. They are useful for shape-estimation and
extended-object diagnostics without requiring implementations to inherit from a
PyRecEst base class.
Protected-tail selection helpers
pyrecest.evaluation.selection contains deterministic, domain-neutral helpers
for selecting a fixed-size subset under reliability or confidence constraints.
The helpers are useful when an evaluation or ablation should preserve a bounded
number of low-reliability hypotheses, measurements, particles, or shape samples
while still ranking each region by a primary score. They intentionally avoid
domain-specific names such as visibility, splats, or rendering; callers provide
the primary scores, tail scores, reliability scores, retention fractions, and
tail quantiles.
__all__ = ['generate_groundtruth', 'generate_measurements', 'simulation_database', 'check_and_fix_config', 'configure_for_filter', 'perform_predict_update_cycles', 'iterate_configs_and_runs', 'determine_all_deviations', 'get_axis_label', 'get_distance_function', 'get_extract_mean', 'summarize_filter_results', 'generate_simulated_scenarios', 'plot_results', 'evaluate_for_file', 'evaluate_for_simulation_config', 'evaluate_for_variables', 'classify_inside_outside', 'surface_band_mask', 'surface_band_probability_from_signed_distance', 'surface_gradients', 'surface_residuals', 'surface_variances', 'add_evidence_margin_columns', 'classify_evidence_margin', 'cluster_bootstrap_margin_summary', 'evidence_margin_table', 'grouped_claim_gate_summary', 'grouped_paired_model_margin_summary', 'infer_paired_model_group_cols', 'leave_one_group_out_summary', 'paired_model_margin_decisions', 'paired_model_margin_summary', 'paired_model_margin_threshold_sweep', 'select_paired_model_margin_threshold', 'constraint_mask', 'equal_quality_selection', 'is_pareto_front', 'pareto_front_indices', 'record_dominates', 'select_under_constraints', 'as_point_set', 'chamfer_distance', 'deterministic_subsample', 'distance_quantiles', 'nearest_neighbor_distances', 'point_set_geometry_summary', 'precision_recall_curve', 'precision_recall_fscore', 'protected_tail_topk_mask', 'quantile_tail_mask', 'quantile_tail_threshold', 'retained_count_from_fraction', 'sanitized_score_vector', 'tail_rescue_quota_count', 'tail_rescue_topk_mask', 'top_count_mask', 'top_fraction_mask']
module-attribute
classify_inside_outside(values, *, negative_inside=True)
Classify signed scalar-field values as inside/outside.
Returns -1 for inside, +1 for outside, and 0 for exact zeros or
non-finite values. Set negative_inside=False for the opposite sign
convention.
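The sign convention described above can be illustrated with a dependency-free sketch. The function name classify_inside_outside_sketch is hypothetical and this is not the library's implementation; it only mirrors the documented return values.

```python
import math


def classify_inside_outside_sketch(values, negative_inside=True):
    """Hypothetical re-statement of the documented rule: -1 inside, +1 outside, 0 otherwise."""
    labels = []
    for value in values:
        if not math.isfinite(value) or value == 0.0:
            # Exact zeros and non-finite values are left unclassified.
            labels.append(0)
        elif (value < 0.0) == negative_inside:
            labels.append(-1)  # inside
        else:
            labels.append(1)  # outside
    return labels


field_values = [-0.3, 0.0, 1.2, float("nan"), float("inf")]
default_labels = classify_inside_outside_sketch(field_values)
flipped_labels = classify_inside_outside_sketch(field_values, negative_inside=False)
```

With negative_inside=False only the finite non-zero entries change sign; zeros and non-finite values stay at 0.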
surface_band_mask(values, threshold)
Return a mask for values within [-threshold, threshold].
surface_band_probability_from_signed_distance(distance, distance_std, epsilon, *, min_std=0.0001, normal_cdf=None)
Probability that a normal signed distance lies within a surface band.
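For a signed distance modeled as normal with mean mu and standard deviation sigma, the probability of lying within a band of half-width epsilon is Phi((epsilon - mu) / sigma) - Phi((-epsilon - mu) / sigma). The sketch below evaluates that formula with the standard library. The function names are hypothetical, and flooring the standard deviation with max(distance_std, min_std) is an assumption about how min_std is used.

```python
import math


def standard_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def surface_band_probability_sketch(distance, distance_std, epsilon, min_std=1e-4):
    """P(-epsilon <= d <= epsilon) for d ~ N(distance, sigma^2) (illustrative only)."""
    # Assumption: min_std acts as a floor so that a zero std does not divide by zero.
    sigma = max(distance_std, min_std)
    upper = (epsilon - distance) / sigma
    lower = (-epsilon - distance) / sigma
    return standard_normal_cdf(upper) - standard_normal_cdf(lower)


p_one_sigma = surface_band_probability_sketch(0.0, 1.0, 1.0)
p_far_away = surface_band_probability_sketch(5.0, 0.0, 0.1)
p_on_surface = surface_band_probability_sketch(0.0, 0.0, 0.1)
```

A point with mean distance zero and a band of one standard deviation yields the familiar one-sigma mass of roughly 0.68, while a confidently distant point yields a probability near zero.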
surface_gradients(surface, points)
Return scalar-field gradients for points via surface.gradient.
surface_residuals(surface, points)
Return scalar-field residuals for points via surface.value.
surface_variances(surface, points)
Return predictive field variances for points via surface.variance_at.
add_evidence_margin_columns(scores, *, group_cols=('session', 'event_index'))
Merge event-level evidence-margin diagnostics back into score rows.
classify_evidence_margin(delta_log_evidence)
Classify a non-negative log-evidence margin into qualitative buckets.
cluster_bootstrap_margin_summary(frame, *, cluster_col, delta_col, positive_claim_col, n_bootstrap=DEFAULT_BOOTSTRAP_REPLICATES, random_seed=DEFAULT_BOOTSTRAP_RANDOM_SEED)
Return cluster-bootstrap intervals for paired evidence-margin summaries.
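As a rough illustration of cluster resampling, the sketch below draws whole clusters with replacement and reports a percentile interval for the mean margin. Everything here is an assumption made for illustration: the function name, the replicate count of 1000, the seed, and the percentile method. The library's defaults (DEFAULT_BOOTSTRAP_REPLICATES, DEFAULT_BOOTSTRAP_RANDOM_SEED) and interval construction may differ.

```python
import random
import statistics


def cluster_bootstrap_interval_sketch(values_by_cluster, n_bootstrap=1000, seed=0, alpha=0.05):
    """Percentile interval for the mean, resampling clusters rather than rows."""
    rng = random.Random(seed)
    clusters = sorted(values_by_cluster)
    replicate_means = []
    for _ in range(n_bootstrap):
        # Draw as many clusters as exist, with replacement, keeping each cluster intact.
        resampled = [rng.choice(clusters) for _ in clusters]
        pooled = [v for name in resampled for v in values_by_cluster[name]]
        replicate_means.append(statistics.fmean(pooled))
    replicate_means.sort()
    lower = replicate_means[int((alpha / 2) * (n_bootstrap - 1))]
    upper = replicate_means[int((1 - alpha / 2) * (n_bootstrap - 1))]
    return lower, upper


# Hypothetical log-evidence margins grouped by session.
deltas_by_session = {
    "s1": [1.0, 2.0, 0.5],
    "s2": [3.0, 2.5],
    "s3": [-0.5, 0.2, 0.1],
}
interval = cluster_bootstrap_interval_sketch(deltas_by_session)
```

Resampling whole clusters keeps within-cluster correlation intact, which is why the interval is usually wider than a row-level bootstrap would suggest.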
evidence_margin_table(scores, *, group_cols=('session', 'event_index'), evidence_col='log_evidence', model_col='model')
Return one best-vs-runner-up evidence-margin row per group.
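The idea of one best-vs-runner-up row per group can be sketched without pandas. The function name and the output keys best_model, runner_up_model, and delta_log_evidence are hypothetical; only the input column names follow the documented defaults.

```python
from collections import defaultdict

# Hypothetical score rows: one row per (session, event_index, model).
score_rows = [
    {"session": "s1", "event_index": 0, "model": "A", "log_evidence": -10.0},
    {"session": "s1", "event_index": 0, "model": "B", "log_evidence": -12.5},
    {"session": "s1", "event_index": 0, "model": "C", "log_evidence": -11.0},
    {"session": "s1", "event_index": 1, "model": "A", "log_evidence": -7.0},
    {"session": "s1", "event_index": 1, "model": "B", "log_evidence": -6.0},
]


def best_vs_runner_up_sketch(rows, group_cols=("session", "event_index")):
    """Return one margin row per group (illustrative, not the library code)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in group_cols)].append(row)
    table = []
    for key, members in groups.items():
        ranked = sorted(members, key=lambda r: r["log_evidence"], reverse=True)
        if len(ranked) < 2:
            continue  # a margin needs at least two competing models
        best, runner_up = ranked[0], ranked[1]
        table.append(
            {
                **dict(zip(group_cols, key)),
                "best_model": best["model"],
                "runner_up_model": runner_up["model"],
                "delta_log_evidence": best["log_evidence"] - runner_up["log_evidence"],
            }
        )
    return table


margin_rows = best_vs_runner_up_sketch(score_rows)
```

The resulting margin is non-negative by construction, which matches the input expected by classify_evidence_margin.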
grouped_claim_gate_summary(summary, *, group_col, claim_fraction_col, forbidden_claims_col=None, mean_delta_col=None, median_delta_col=None, min_claim_fraction=0.5)
Return generic pass/fail gates for grouped claim robustness.
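A minimal sketch of per-group gates follows. The function name, the input keys, the output gate names, and the use of strict inequalities are all assumptions; the source only states that majority, no-forbidden-claim, and positive-margin gates are summarized.

```python
def grouped_gate_sketch(group_rows, min_claim_fraction=0.5):
    """Evaluate three illustrative gates per group and an overall pass flag."""
    results = []
    for row in group_rows:
        gates = {
            # Assumption: "majority" means strictly above the minimum claim fraction.
            "majority_gate": row["claim_fraction"] > min_claim_fraction,
            "no_forbidden_gate": row["forbidden_claims"] == 0,
            "positive_margin_gate": row["mean_delta"] > 0.0,
        }
        results.append({"group": row["group"], **gates, "passes": all(gates.values())})
    return results


# Hypothetical per-group summary rows.
group_summary = [
    {"group": "s1", "claim_fraction": 0.8, "forbidden_claims": 0, "mean_delta": 2.1},
    {"group": "s2", "claim_fraction": 0.4, "forbidden_claims": 0, "mean_delta": 1.0},
    {"group": "s3", "claim_fraction": 0.9, "forbidden_claims": 1, "mean_delta": 3.0},
]
gate_rows = grouped_gate_sketch(group_summary)
every_group_passes = all(row["passes"] for row in gate_rows)
```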
grouped_paired_model_margin_summary(decisions, *, group_cols)
Alias for grouped paired-margin summaries used by downstream reports.
infer_paired_model_group_cols(scores)
Infer columns that define one paired model-decision unit.
leave_one_group_out_summary(frame, *, group_col, summary_fn, held_out_col='held_out_group')
Apply a summary after holding out each value of group_col.
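The hold-out loop can be sketched on plain dictionaries. The function name and the example summary function are hypothetical; only the parameter names group_col, summary_fn, and held_out_col come from the signature above.

```python
import statistics


def leave_one_group_out_sketch(rows, group_col, summary_fn, held_out_col="held_out_group"):
    """Apply summary_fn to the rows that remain after dropping each group in turn."""
    summaries = []
    for held_out in sorted({row[group_col] for row in rows}):
        kept = [row for row in rows if row[group_col] != held_out]
        summary = dict(summary_fn(kept))
        summary[held_out_col] = held_out
        summaries.append(summary)
    return summaries


def mean_delta_summary(rows):
    """Hypothetical summary: mean log-evidence margin of the remaining rows."""
    return {"mean_delta": statistics.fmean(row["delta"] for row in rows)}


decision_rows = [
    {"session": "s1", "delta": 2.0},
    {"session": "s1", "delta": 4.0},
    {"session": "s2", "delta": -1.0},
    {"session": "s3", "delta": 3.0},
]
logo_rows = leave_one_group_out_sketch(decision_rows, "session", mean_delta_summary)
```

If the summary changes sign or collapses when a single group is held out, the overall conclusion depends heavily on that group.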
paired_model_margin_decisions(scores, *, positive_model, reference_model, margin_threshold=0.0, group_cols=('session', 'event_index'), evidence_col='log_evidence', model_col='model', true_model_col=None, positive_true_label=None)
Classify paired model wins using a symmetric log-evidence margin.
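The symmetric rule can be sketched for a single event as follows. The function name and the label strings are hypothetical, and the use of strict inequalities at the threshold is an assumption; the library operates on score tables rather than scalar pairs.

```python
def paired_margin_decision_sketch(log_evidence_positive, log_evidence_reference, margin_threshold=0.0):
    """Classify one paired event by its log-evidence margin (illustrative only)."""
    delta = log_evidence_positive - log_evidence_reference
    if delta > margin_threshold:
        return "positive_claim"
    if delta < -margin_threshold:
        return "reference_claim"
    # Margins inside [-threshold, threshold] are not confident either way.
    return "ambiguous"


# Hypothetical (positive, reference) log-evidence pairs.
paired_events = [(-10.0, -14.5), (-12.0, -11.2), (-9.0, -20.0), (-8.0, -4.0)]
decisions = [
    paired_margin_decision_sketch(positive, reference, margin_threshold=3.0)
    for positive, reference in paired_events
]
```

With margin_threshold=0.0 the ambiguous band shrinks to exact ties, so every non-tied event becomes a claim for one side.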
paired_model_margin_summary(decisions, *, group_cols=(), true_model_col=None)
Summarize a paired margin-decision table, optionally by groups.
paired_model_margin_threshold_sweep(scores, *, positive_model, reference_model, thresholds, group_cols=None, summary_group_cols=(), evidence_col='log_evidence', model_col='model', true_model_col=None, positive_true_label=None)
Summarize paired margin decisions over candidate thresholds.
select_paired_model_margin_threshold(threshold_sweep, *, max_false_positive_claims=0, min_positive_claim_recall=0.0)
Select the smallest threshold satisfying synthetic specificity gates.
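Selecting the smallest feasible threshold from a sweep can be sketched as a filter followed by a minimum. The sweep keys false_positive_claims and positive_claim_recall are hypothetical column names chosen to match the keyword arguments, and returning None when nothing is feasible is an assumption.

```python
# Hypothetical sweep output: one row per candidate threshold.
threshold_sweep = [
    {"threshold": 0.0, "false_positive_claims": 3, "positive_claim_recall": 0.95},
    {"threshold": 1.0, "false_positive_claims": 1, "positive_claim_recall": 0.90},
    {"threshold": 2.0, "false_positive_claims": 0, "positive_claim_recall": 0.80},
    {"threshold": 3.0, "false_positive_claims": 0, "positive_claim_recall": 0.60},
]


def select_threshold_sketch(sweep, max_false_positive_claims=0, min_positive_claim_recall=0.0):
    """Return the feasible row with the smallest threshold, or None if none qualifies."""
    feasible = [
        row
        for row in sweep
        if row["false_positive_claims"] <= max_false_positive_claims
        and row["positive_claim_recall"] >= min_positive_claim_recall
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda row: row["threshold"])


selected = select_threshold_sketch(threshold_sweep)
unreachable = select_threshold_sketch(threshold_sweep, min_positive_claim_recall=0.99)
```

Preferring the smallest feasible threshold keeps as many confident claims as possible while still meeting the specificity gate.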
constraint_mask(table, constraints, *, eps=1e-12)
Return rows satisfying all scalar constraints.
Constraint values may be either ("<=", value) tuples or mappings with
{"op": "<=", "value": value} keys.
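Both constraint spellings can be normalized to an (operator, value) pair before evaluation. In the sketch below the function name, the column names, support for ">=", and the way eps loosens each bound are assumptions; the source only shows the "<=" operator and the two accepted shapes.

```python
def satisfies_constraints_sketch(row, constraints, eps=1e-12):
    """Check one row against tuple-style or mapping-style constraints (illustrative)."""
    for column, spec in constraints.items():
        if isinstance(spec, dict):
            op, bound = spec["op"], spec["value"]
        else:
            op, bound = spec
        value = row[column]
        if op == "<=":
            ok = value <= bound + eps
        elif op == ">=":  # assumed to be supported alongside "<="
            ok = value >= bound - eps
        else:
            raise ValueError(f"unsupported operator in this sketch: {op!r}")
        if not ok:
            return False
    return True


# Hypothetical candidates with a quality column and a size column.
candidates = [
    {"quality": 30.0, "size_mb": 1.0},
    {"quality": 28.0, "size_mb": 0.5},
    {"quality": 31.5, "size_mb": 2.0},
]
constraints = {"quality": (">=", 30.0), "size_mb": {"op": "<=", "value": 2.0}}
feasible_mask = [satisfies_constraints_sketch(row, constraints) for row in candidates]
```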
equal_quality_selection(table, *, quality_constraints, compression_objective, compression_direction='min', tie_breakers=(), eps=1e-12)
Select candidates under fixed quality constraints.
This is a named wrapper around select_under_constraints for common
equal-quality compression/compaction comparisons.
is_pareto_front(table, objectives, *, directions, feasible_mask=None, eps=1e-12, allow_missing=True)
Return a boolean Series marking non-dominated rows.
pareto_front_indices(table, objectives, *, directions, feasible_mask=None, eps=1e-12, allow_missing=True)
Return indices of non-dominated rows.
Parameters
table:
DataFrame whose rows are candidates.
objectives:
Objective column names to compare.
directions:
Either a mapping from objective name to "min"/"max" or a
sequence aligned with objectives.
feasible_mask:
Optional row mask. If supplied, only feasible rows can be on the front.
eps:
Numerical tolerance for weak/strict comparisons.
allow_missing:
If true, objective values that are missing in either row are skipped for
that pair. A row can dominate another only when at least one comparable
objective remains and one comparable objective is strictly better.
record_dominates(left, right, objectives, *, directions, eps=1e-12, allow_missing=True)
Return whether left Pareto-dominates right.
left dominates right if it is at least as good on all comparable
objectives and strictly better on at least one comparable objective.
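The dominance definition, including the skipping of missing objectives, can be sketched on plain dictionaries. The function names and example columns are hypothetical, and the exact placement of eps in the comparisons is an assumption.

```python
import math


def dominates_sketch(left, right, objectives, directions, eps=1e-12):
    """Return True if left Pareto-dominates right on the comparable objectives."""
    comparable = 0
    strictly_better = False
    for name in objectives:
        a, b = left.get(name), right.get(name)
        if a is None or b is None or math.isnan(a) or math.isnan(b):
            continue  # missing in either row: skip this objective for the pair
        comparable += 1
        if directions[name] == "max":
            a, b = -a, -b  # convert maximization into minimization
        if a > b + eps:
            return False  # left is worse on this objective
        if a < b - eps:
            strictly_better = True
    return comparable > 0 and strictly_better


def pareto_front_indices_sketch(records, objectives, directions):
    """Indices of records that no other record dominates (quadratic, for illustration)."""
    return [
        i
        for i, candidate in enumerate(records)
        if not any(
            dominates_sketch(other, candidate, objectives, directions)
            for j, other in enumerate(records)
            if j != i
        )
    ]


records = [
    {"runtime": 1.0, "accuracy": 0.70},
    {"runtime": 2.0, "accuracy": 0.80},
    {"runtime": 2.5, "accuracy": 0.75},  # slower and less accurate than the previous row
    {"runtime": 4.0, "accuracy": 0.90},
]
directions = {"runtime": "min", "accuracy": "max"}
front = pareto_front_indices_sketch(records, ["runtime", "accuracy"], directions)
```

The pairwise scan is quadratic in the number of rows, which is acceptable for small comparison tables but is not a claim about how the library computes the front.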
select_under_constraints(table, *, constraints, objective, direction, tie_breakers=(), eps=1e-12)
Return feasible rows sorted by one objective plus optional tie-breakers.
as_point_set(points, *, name='points', expected_dim=None)
Return points as a finite float64 array with shape (N, D).
Parameters
points:
Array-like point set.
name:
Name used in validation error messages.
expected_dim:
Optional required point dimensionality.
chamfer_distance(points_a, points_b, *, squared=False, symmetric=True, query_chunk_size=_DEFAULT_CHUNK_SIZE)
Return directed or symmetric Chamfer distance between two point sets.
symmetric=True returns the sum of the two directed mean nearest-neighbor
distances, matching the convention used by common 3D reconstruction
diagnostics. squared=True applies the same convention to squared
nearest-neighbor distances.
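The sum-of-directed-means convention can be checked on a tiny example with a brute-force sketch. The function names are hypothetical and the quadratic search stands in for the KD-tree lookup that the library uses when SciPy is available.

```python
import math


def directed_mean_nn_sketch(query, reference, squared=False):
    """Mean (optionally squared) distance from each query point to its nearest reference point."""
    total = 0.0
    for q in query:
        nearest = min(math.dist(q, r) for r in reference)
        total += nearest * nearest if squared else nearest
    return total / len(query)


def chamfer_distance_sketch(points_a, points_b, squared=False, symmetric=True):
    """Directed a-to-b distance, or the sum of both directions when symmetric."""
    forward = directed_mean_nn_sketch(points_a, points_b, squared)
    if not symmetric:
        return forward
    return forward + directed_mean_nn_sketch(points_b, points_a, squared)


points_a = [(0.0, 0.0), (1.0, 0.0)]
points_b = [(0.0, 0.0), (1.0, 2.0)]
chamfer_l1 = chamfer_distance_sketch(points_a, points_b)
chamfer_l2 = chamfer_distance_sketch(points_a, points_b, squared=True)
directed_a_to_b = chamfer_distance_sketch(points_a, points_b, symmetric=False)
```

Because the symmetric value is a sum rather than an average of the two directions, it is twice as large as the value reported by tools that average them.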
deterministic_subsample(points, *, max_points, seed=0)
Return a deterministic random subset of points and its indices.
If max_points is None, non-positive, or at least the number of
points, the original order is preserved and all indices are returned.
Otherwise, max_points unique indices are sampled without replacement and
sorted so downstream computations are reproducible and do not depend on the
chunk size.
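The pass-through and sorted-sample behavior can be sketched with the standard library. The function name is hypothetical, and random.Random is used here only as a stand-in; the library's random generator, and therefore the specific indices drawn for a given seed, will differ.

```python
import random


def deterministic_subsample_sketch(points, max_points, seed=0):
    """Return (subset, indices); indices are sorted so the original order is preserved."""
    count = len(points)
    if max_points is None or max_points <= 0 or max_points >= count:
        indices = list(range(count))
    else:
        # Sample without replacement from a seeded generator, then sort the indices.
        indices = sorted(random.Random(seed).sample(range(count), max_points))
    return [points[i] for i in indices], indices


cloud = [(float(i), float(i) ** 2) for i in range(10)]
subset, subset_indices = deterministic_subsample_sketch(cloud, max_points=3, seed=0)
everything, all_indices = deterministic_subsample_sketch(cloud, max_points=None)
```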
distance_quantiles(query, reference, *, quantiles=(0.5, 0.9, 0.95, 0.99), query_chunk_size=_DEFAULT_CHUNK_SIZE)
Return quantiles of directed nearest-neighbor distances.
nearest_neighbor_distances(query, reference, *, query_chunk_size=_DEFAULT_CHUNK_SIZE, return_indices=False)
Distance from each query point to its nearest reference point.
The function uses scipy.spatial.cKDTree when available and falls
back to a deterministic chunked dense implementation otherwise.
point_set_geometry_summary(estimate_points, reference_points, *, thresholds=(0.1,), query_chunk_size=_DEFAULT_CHUNK_SIZE)
Return standard directed/symmetric point-set geometry metrics.
The summary keys follow reconstruction terminology:
- accuracy_*: estimate-to-reference nearest-neighbor distances.
- completion_*: reference-to-estimate nearest-neighbor distances.
- chamfer_l1: sum of directed mean distances.
- chamfer_l2: sum of directed mean squared distances.
precision_recall_curve(estimate_points, reference_points, thresholds, *, query_chunk_size=_DEFAULT_CHUNK_SIZE)
Return precision/recall/F-score rows for multiple thresholds.
precision_recall_fscore(estimate_points, reference_points, threshold, *, query_chunk_size=_DEFAULT_CHUNK_SIZE)
Return precision, recall, and F-score for a distance threshold.
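A common reconstruction convention, assumed here, takes precision as the fraction of estimate points within the threshold of the reference, recall as the fraction of reference points within the threshold of the estimate, and the F-score as their harmonic mean. The sketch below follows that convention with a brute-force search; the function names and the inclusive comparison at the threshold are assumptions.

```python
import math


def fraction_within_sketch(query, reference, threshold):
    """Fraction of query points whose nearest reference point is within threshold."""
    hits = sum(1 for q in query if min(math.dist(q, r) for r in reference) <= threshold)
    return hits / len(query)


def precision_recall_fscore_sketch(estimate_points, reference_points, threshold):
    precision = fraction_within_sketch(estimate_points, reference_points, threshold)
    recall = fraction_within_sketch(reference_points, estimate_points, threshold)
    if precision + recall == 0.0:
        return precision, recall, 0.0
    fscore = 2.0 * precision * recall / (precision + recall)
    return precision, recall, fscore


estimate_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
reference_points = [(0.0, 0.05), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
precision, recall, fscore = precision_recall_fscore_sketch(estimate_points, reference_points, 0.1)
```

Here one estimate point is a far outlier, which lowers precision, and two reference points are not covered, which lowers recall.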
protected_tail_topk_mask(primary_scores, tail_scores, reliability_scores, retention_fraction, *, tail_quantile, tail='lower', min_count=1, sanitize_nonnegative=True)
Select a fixed-size subset while preserving proportional tail capacity.
The candidate set is split at tail_quantile of reliability_scores.
The tail receives the same retention fraction as the full set, ranked by
tail_scores. The complement receives the remaining retained budget,
ranked by primary_scores. If the tail is empty the function falls back
to ordinary top-fraction selection by primary_scores; if the complement
is empty, all candidates are ranked by tail_scores.
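The split-and-budget logic described above can be sketched as follows. This is a simplified illustration, not the library's algorithm: the function names, the nearest-rank quantile, the strict comparison that defines the lower tail, the round-half-up count with a minimum of one, and index-based tie-breaking are all assumptions, and only the lower tail is handled.

```python
import math


def _count_sketch(item_count, fraction):
    # Assumption: round half up, keep at least one item, never more than available.
    return min(item_count, max(1, math.floor(item_count * fraction + 0.5)))


def _top_k_sketch(indices, scores, k):
    # Higher score first; ties resolved by the smaller original index.
    return sorted(indices, key=lambda i: (-scores[i], i))[:k]


def protected_tail_indices_sketch(primary, tail_scores, reliability, retention_fraction, tail_quantile):
    """Return sorted retained indices with a proportional share reserved for the tail."""
    n = len(primary)
    budget = _count_sketch(n, retention_fraction)
    threshold = sorted(reliability)[min(n - 1, int(tail_quantile * n))]
    tail = [i for i in range(n) if reliability[i] < threshold]
    rest = [i for i in range(n) if reliability[i] >= threshold]
    if not tail:
        return sorted(_top_k_sketch(range(n), primary, budget))
    if not rest:
        return sorted(_top_k_sketch(range(n), tail_scores, budget))
    tail_budget = min(budget, _count_sketch(len(tail), retention_fraction))
    kept = _top_k_sketch(tail, tail_scores, tail_budget)
    kept += _top_k_sketch(rest, primary, budget - tail_budget)
    return sorted(kept)


primary = [0.9, 0.8, 0.05, 0.7, 0.1, 0.95, 0.2, 0.6, 0.5, 0.4]
tail_scores = [0.0, 0.0, 0.3, 0.0, 0.9, 0.0, 0.5, 0.0, 0.0, 0.0]
reliability = [0.9, 0.8, 0.1, 0.7, 0.2, 0.95, 0.3, 0.85, 0.6, 0.75]
protected = protected_tail_indices_sketch(primary, tail_scores, reliability, 0.4, 0.3)
plain_top = sorted(_top_k_sketch(range(len(primary)), primary, 4))
```

In this example plain top-4 selection keeps only high-reliability items, whereas the protected variant gives up one of them to retain the best-scoring item from the low-reliability tail.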
quantile_tail_mask(reliability_scores, quantile, *, tail='lower', sanitize_nonnegative=True)
Return a boolean mask for the lower or upper reliability-score tail.
quantile_tail_threshold(reliability_scores, quantile, *, tail='lower', sanitize_nonnegative=True)
Return the threshold separating a reliability-score quantile tail.
retained_count_from_fraction(item_count, retention_fraction, *, min_count=1)
Convert a retention fraction to a deterministic retained count.
min_count is applied only when item_count > 0 and
retention_fraction > 0. Set min_count=0 for exact zero-retention
behavior.
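The interaction between the fraction and min_count can be sketched as below. The function name is hypothetical and the rounding rule (round half up) is an assumption, since the source does not state how the fractional count is rounded; capping at item_count is also assumed.

```python
import math


def retained_count_sketch(item_count, retention_fraction, min_count=1):
    """Convert a fraction into a count, applying min_count only for non-empty, non-zero requests."""
    if item_count <= 0 or retention_fraction <= 0:
        return 0
    # Assumption: round half up; the library's rounding rule may differ.
    count = math.floor(item_count * retention_fraction + 0.5)
    return min(item_count, max(count, min_count))


tiny_fraction_default = retained_count_sketch(100, 0.001)
tiny_fraction_exact = retained_count_sketch(100, 0.001, min_count=0)
zero_fraction = retained_count_sketch(100, 0.0)
thirty_percent = retained_count_sketch(10, 0.3)
```

The first two calls show why min_count matters: a very small fraction rounds to zero, and only the default min_count=1 keeps a single item.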
sanitized_score_vector(values, *, nonnegative=True)
Return a finite one-dimensional float64 score vector.
Non-finite entries are mapped to zero. By default negative values are also clipped to zero, matching the common interpretation of scores as confidence, reliability, probability, or non-negative utility.
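The sanitization rule can be restated as a short pure-Python sketch. The function name is hypothetical, and the library returns a NumPy float64 vector rather than a list.

```python
import math


def sanitized_scores_sketch(values, nonnegative=True):
    """Map non-finite entries to zero and optionally clip negatives to zero."""
    cleaned = []
    for value in values:
        value = float(value)
        if not math.isfinite(value):
            value = 0.0
        elif nonnegative and value < 0.0:
            value = 0.0
        cleaned.append(value)
    return cleaned


raw_scores = [0.5, float("nan"), -2.0, float("inf"), 3]
clipped = sanitized_scores_sketch(raw_scores)
signed = sanitized_scores_sketch(raw_scores, nonnegative=False)
```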
tail_rescue_quota_count(retained_count, *, rescue_fraction)
Return a bounded tail-rescue quota inside a retained budget.
tail_rescue_topk_mask(primary_scores, tail_scores, reliability_scores, retention_fraction, *, tail_quantile, rescue_fraction, tail='lower', min_count=1, sanitize_nonnegative=True)
Top-k selection with a bounded quota rescued from a reliability tail.
top_count_mask(scores, retained_count, *, tie_break_scores=None, largest=True, sanitize_nonnegative=True)
Return a deterministic mask selecting retained_count score entries.
Ties are resolved by the optional tie_break_scores and then by increasing
original index, making the selection reproducible across NumPy versions.
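The tie-breaking order can be made explicit with a composite sort key. The function name is hypothetical, and applying the largest flag to the tie-break scores as well as the primary scores is an assumption.

```python
def top_count_mask_sketch(scores, retained_count, tie_break_scores=None, largest=True):
    """Boolean mask selecting retained_count entries with fully deterministic ties."""
    n = len(scores)
    k = max(0, min(retained_count, n))
    ties = tie_break_scores if tie_break_scores is not None else [0.0] * n
    sign = -1.0 if largest else 1.0
    # Sort by primary score, then tie-break score, then increasing original index.
    order = sorted(range(n), key=lambda i: (sign * scores[i], sign * ties[i], i))
    chosen = set(order[:k])
    return [i in chosen for i in range(n)]


scores = [0.5, 0.9, 0.5, 0.5, 0.1]
mask_by_index = top_count_mask_sketch(scores, 2)
mask_by_tie_break = top_count_mask_sketch(scores, 2, tie_break_scores=[0.1, 0.0, 0.7, 0.3, 0.0])
```

Without tie-break scores the three entries tied at 0.5 are resolved by position, so the earliest one is selected; supplying tie-break scores moves the choice to the entry with the largest tie-break value.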
top_fraction_mask(scores, retention_fraction, *, tie_break_scores=None, largest=True, min_count=1, sanitize_nonnegative=True)
Return a top-k mask whose size is derived from a retention fraction.