Evaluation

Model-comparison evidence margins

pyrecest.evaluation.model_comparison contains lightweight, dataframe-oriented helpers for comparing model-evidence tables. The helpers are intentionally domain-neutral: callers provide the model labels, grouping columns, thresholds, and cluster/group identifiers.

Useful entry points include:

  • paired_model_margin_decisions, which compares a positive model against a reference model and separates positive claims, reference claims, and ambiguous events using a symmetric log-evidence threshold;
  • paired_model_margin_summary, which summarizes raw wins, confident claims, ambiguous events, and mean/median log-evidence margins;
  • paired_model_margin_threshold_sweep, which evaluates the paired decision rule over multiple candidate thresholds;
  • select_paired_model_margin_threshold, which selects the smallest threshold satisfying synthetic false-positive and recall constraints;
  • leave_one_group_out_summary, a generic leave-one-group-out wrapper for grouped robustness checks;
  • cluster_bootstrap_margin_summary, which returns cluster-resampled uncertainty intervals for raw-win fractions, claim fractions, and evidence margins; and
  • grouped_claim_gate_summary, which summarizes whether every group satisfies majority/no-forbidden-claim/positive-margin gates.

These utilities are useful for model comparison, parameter selection, and paper-quality diagnostics whenever multiple filters, smoothers, or trackers emit comparable log marginal likelihoods.
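The symmetric decision rule behind paired_model_margin_decisions can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper name paired_margin_labels, the label strings, and the choice of strict inequalities at the threshold are assumptions made for illustration.

```python
import numpy as np


def paired_margin_labels(log_ev_positive, log_ev_reference, margin_threshold=0.0):
    """Hypothetical helper: label events by a symmetric log-evidence margin."""
    delta = np.asarray(log_ev_positive, dtype=float) - np.asarray(
        log_ev_reference, dtype=float
    )
    labels = np.full(delta.shape, "ambiguous", dtype=object)
    # Assumption: a claim needs the margin to strictly exceed the threshold.
    labels[delta > margin_threshold] = "positive_claim"
    labels[delta < -margin_threshold] = "reference_claim"
    return delta, labels


delta, labels = paired_margin_labels(
    [-10.0, -12.0, -11.0], [-13.0, -11.5, -8.0], margin_threshold=1.0
)
for d, label in zip(delta, labels):
    print(f"delta={d:+.1f} -> {label}")
```

Events whose margin falls inside the band [-margin_threshold, margin_threshold] stay ambiguous, which is what separates raw wins from confident claims in the summaries.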

Point-set geometry metrics

pyrecest.evaluation.point_set_metrics contains deterministic, NumPy/SciPy helpers for evaluating sampled shapes, point-cloud estimates, and extended-object geometry diagnostics. The functions provide nearest-neighbor distances, symmetric Chamfer-L1/L2 distances, threshold precision/recall/F-score, distance quantiles, and reproducible subsampling.

These helpers are intended for evaluation pipelines rather than differentiable model code. They use scipy.spatial.cKDTree when available and a deterministic chunked dense fallback otherwise.

Pareto and equal-quality selection

pyrecest.evaluation.pareto contains small dataframe-oriented utilities for rate-distortion and equal-quality comparisons. Callers provide objective columns and objective directions, so the helpers are intentionally domain-neutral and can be used for particle-count, runtime, storage-size, or accuracy/quality trade-offs.

Useful entry points include pareto_front_indices, is_pareto_front, record_dominates, constraint_mask, select_under_constraints, and equal_quality_selection.

Implicit-surface helpers

pyrecest.evaluation.implicit_surfaces contains lightweight helpers for backend-neutral scalar-field and implicit-surface evaluation. These helpers cover residual extraction from any object that exposes a value(points) method, surface-band masks, inside/outside classification, and surface-band probabilities from signed distance means and standard deviations. Because the surface interface is structural, the helpers are useful for shape-estimation and extended-object diagnostics without requiring implementations to inherit from a PyRecEst base class.

Protected-tail selection helpers

pyrecest.evaluation.selection contains deterministic, domain-neutral helpers for selecting a fixed-size subset under reliability or confidence constraints. The helpers are useful when an evaluation or ablation should preserve a bounded number of low-reliability hypotheses, measurements, particles, or shape samples while still ranking each region by a primary score. They intentionally avoid domain-specific names such as visibility, splats, or rendering; callers provide the primary scores, tail scores, reliability scores, retention fractions, and tail quantiles.

__all__ = ['generate_groundtruth', 'generate_measurements', 'simulation_database', 'check_and_fix_config', 'configure_for_filter', 'perform_predict_update_cycles', 'iterate_configs_and_runs', 'determine_all_deviations', 'get_axis_label', 'get_distance_function', 'get_extract_mean', 'summarize_filter_results', 'generate_simulated_scenarios', 'plot_results', 'evaluate_for_file', 'evaluate_for_simulation_config', 'evaluate_for_variables', 'classify_inside_outside', 'surface_band_mask', 'surface_band_probability_from_signed_distance', 'surface_gradients', 'surface_residuals', 'surface_variances', 'add_evidence_margin_columns', 'classify_evidence_margin', 'cluster_bootstrap_margin_summary', 'evidence_margin_table', 'grouped_claim_gate_summary', 'grouped_paired_model_margin_summary', 'infer_paired_model_group_cols', 'leave_one_group_out_summary', 'paired_model_margin_decisions', 'paired_model_margin_summary', 'paired_model_margin_threshold_sweep', 'select_paired_model_margin_threshold', 'constraint_mask', 'equal_quality_selection', 'is_pareto_front', 'pareto_front_indices', 'record_dominates', 'select_under_constraints', 'as_point_set', 'chamfer_distance', 'deterministic_subsample', 'distance_quantiles', 'nearest_neighbor_distances', 'point_set_geometry_summary', 'precision_recall_curve', 'precision_recall_fscore', 'protected_tail_topk_mask', 'quantile_tail_mask', 'quantile_tail_threshold', 'retained_count_from_fraction', 'sanitized_score_vector', 'tail_rescue_quota_count', 'tail_rescue_topk_mask', 'top_count_mask', 'top_fraction_mask']

classify_inside_outside(values, *, negative_inside=True)

Classify signed scalar-field values as inside/outside.

Returns -1 for inside, +1 for outside, and 0 for exact zeros or non-finite values. Set negative_inside=False for the opposite sign convention.
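The sign convention can be sketched in a few lines of NumPy. The helper name classify_sketch is hypothetical, and the code only mirrors the behaviour described above; it is not taken from the library source.

```python
import numpy as np


def classify_sketch(values, negative_inside=True):
    """Hypothetical helper: -1 inside, +1 outside, 0 for zeros and non-finite."""
    values = np.asarray(values, dtype=float)
    finite = np.isfinite(values)
    signs = np.zeros(values.shape, dtype=int)
    # Only finite entries get a sign; NaN and +/-inf stay at 0.
    signs[finite] = np.sign(values[finite]).astype(int)
    # With negative_inside=True the raw sign already encodes inside as -1.
    return signs if negative_inside else -signs


field_values = [-0.5, 0.0, 2.0, np.nan, np.inf]
print(classify_sketch(field_values))
print(classify_sketch(field_values, negative_inside=False))
```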

surface_band_mask(values, threshold)

Return a mask for values within [-threshold, threshold].

surface_band_probability_from_signed_distance(distance, distance_std, epsilon, *, min_std=0.0001, normal_cdf=None)

Probability that a normal signed distance lies within a surface band.
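For a signed distance modelled as a normal variable with mean distance and standard deviation distance_std, the band probability is the normal mass between -epsilon and +epsilon. The sketch below computes that quantity with the standard library only. The helper names are hypothetical, and treating min_std as a lower floor on the standard deviation is an assumption about how the parameter is used.

```python
import math


def standard_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def band_probability(distance, distance_std, epsilon, min_std=1e-4):
    """Hypothetical helper: P(-epsilon <= X <= epsilon) for X ~ N(distance, std^2)."""
    # Assumption: min_std acts as a floor that avoids division by zero.
    sigma = max(distance_std, min_std)
    upper = standard_normal_cdf((epsilon - distance) / sigma)
    lower = standard_normal_cdf((-epsilon - distance) / sigma)
    return upper - lower


# A point whose mean lies on the surface, with a one-sigma band.
print(band_probability(0.0, 1.0, 1.0))
```

With distance=0, distance_std=1, and epsilon=1 this reduces to the familiar one-sigma mass of a standard normal, roughly 0.683.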

surface_gradients(surface, points)

Return scalar-field gradients for points via surface.gradient.

surface_residuals(surface, points)

Return scalar-field residuals for points via surface.value.

surface_variances(surface, points)

Return predictive field variances for points via surface.variance_at.

add_evidence_margin_columns(scores, *, group_cols=('session', 'event_index'))

Merge event-level evidence-margin diagnostics back into score rows.

classify_evidence_margin(delta_log_evidence)

Classify a non-negative log-evidence margin into qualitative buckets.

cluster_bootstrap_margin_summary(frame, *, cluster_col, delta_col, positive_claim_col, n_bootstrap=DEFAULT_BOOTSTRAP_REPLICATES, random_seed=DEFAULT_BOOTSTRAP_RANDOM_SEED)

Return cluster-bootstrap intervals for paired evidence-margin summaries.
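A cluster bootstrap resamples whole clusters with replacement rather than individual rows, so within-cluster correlation is preserved. The following sketch shows the idea for a mean margin. The helper name, the percentile interval, and the use of numpy.random.default_rng are assumptions; the library's defaults and interval construction may differ.

```python
import numpy as np


def cluster_bootstrap_mean_interval(values, clusters, n_bootstrap=500, seed=0, level=0.95):
    """Hypothetical helper: percentile interval for a cluster-resampled mean."""
    values = np.asarray(values, dtype=float)
    clusters = np.asarray(clusters)
    # Group the values once so each replicate only concatenates arrays.
    members = [values[clusters == c] for c in np.unique(clusters)]
    rng = np.random.default_rng(seed)
    means = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Draw as many clusters as exist, with replacement.
        picks = rng.integers(0, len(members), size=len(members))
        means[b] = np.concatenate([members[i] for i in picks]).mean()
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(means, [alpha, 1.0 - alpha])
    return float(values.mean()), float(low), float(high)


point, low, high = cluster_bootstrap_mean_interval(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], ["a", "a", "b", "b", "c", "c"]
)
print(point, low, high)
```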

evidence_margin_table(scores, *, group_cols=('session', 'event_index'), evidence_col='log_evidence', model_col='model')

Return one best-vs-runner-up evidence-margin row per group.
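The best-vs-runner-up margin can be sketched with pandas as follows. The input column names match the documented defaults, but the helper name and the output columns (best_model, runner_up_model, delta_log_evidence) are assumptions, and the sketch assumes every group contains at least two models.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "session": ["a", "a", "a", "b", "b"],
        "event_index": [0, 0, 0, 0, 0],
        "model": ["m1", "m2", "m3", "m1", "m2"],
        "log_evidence": [-10.0, -12.5, -11.0, -7.0, -6.0],
    }
)


def best_vs_runner_up(frame, group_cols, evidence_col="log_evidence", model_col="model"):
    """Hypothetical helper: one margin row per group (needs >= 2 models per group)."""
    rows = []
    for key, group in frame.groupby(list(group_cols), sort=True):
        # Stable sort so equal evidences keep their original row order.
        ranked = group.sort_values(evidence_col, ascending=False, kind="mergesort")
        best, runner_up = ranked.iloc[0], ranked.iloc[1]
        rows.append(
            {
                **dict(zip(group_cols, key)),
                "best_model": best[model_col],
                "runner_up_model": runner_up[model_col],
                "delta_log_evidence": best[evidence_col] - runner_up[evidence_col],
            }
        )
    return pd.DataFrame(rows)


table = best_vs_runner_up(scores, ("session", "event_index"))
print(table)
```

The resulting margin is non-negative by construction, which is the kind of input classify_evidence_margin expects.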

grouped_claim_gate_summary(summary, *, group_col, claim_fraction_col, forbidden_claims_col=None, mean_delta_col=None, median_delta_col=None, min_claim_fraction=0.5)

Return generic pass/fail gates for grouped claim robustness.

grouped_paired_model_margin_summary(decisions, *, group_cols)

Alias for grouped paired-margin summaries used by downstream reports.

infer_paired_model_group_cols(scores)

Infer columns that define one paired model-decision unit.

leave_one_group_out_summary(frame, *, group_col, summary_fn, held_out_col='held_out_group')

Apply a summary after holding out each value of group_col.
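The leave-one-group-out pattern is simple enough to sketch directly. The helper below is illustrative only: its name is hypothetical, and it assumes summary_fn returns a mapping of summary names to scalar values, which the documentation above does not specify.

```python
import pandas as pd


def leave_one_group_out(frame, group_col, summary_fn, held_out_col="held_out_group"):
    """Hypothetical helper: summarize the data once per held-out group."""
    rows = []
    for held_out in sorted(frame[group_col].unique()):
        kept = frame[frame[group_col] != held_out]
        # Assumption: summary_fn returns a mapping of scalar summaries.
        row = dict(summary_fn(kept))
        row[held_out_col] = held_out
        rows.append(row)
    return pd.DataFrame(rows)


frame = pd.DataFrame(
    {"session": ["a", "a", "b", "c"], "delta": [1.0, 3.0, -2.0, 4.0]}
)
result = leave_one_group_out(
    frame, "session", lambda f: {"mean_delta": f["delta"].mean(), "n": len(f)}
)
print(result)
```

If a conclusion flips when a single group is held out, that group is driving the aggregate result.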

paired_model_margin_decisions(scores, *, positive_model, reference_model, margin_threshold=0.0, group_cols=('session', 'event_index'), evidence_col='log_evidence', model_col='model', true_model_col=None, positive_true_label=None)

Classify paired model wins using a symmetric log-evidence margin.

paired_model_margin_summary(decisions, *, group_cols=(), true_model_col=None)

Summarize a paired margin-decision table, optionally by groups.

paired_model_margin_threshold_sweep(scores, *, positive_model, reference_model, thresholds, group_cols=None, summary_group_cols=(), evidence_col='log_evidence', model_col='model', true_model_col=None, positive_true_label=None)

Summarize paired margin decisions over candidate thresholds.

select_paired_model_margin_threshold(threshold_sweep, *, max_false_positive_claims=0, min_positive_claim_recall=0.0)

Select the smallest threshold satisfying synthetic specificity gates.
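The selection rule can be sketched in plain Python over a list of sweep rows. The row keys used here (threshold, false_positive_claims, positive_claim_recall) and the helper name are assumptions for illustration; the real sweep table may use different column names.

```python
def select_smallest_threshold(sweep_rows, max_false_positive_claims=0, min_positive_claim_recall=0.0):
    """Hypothetical helper: smallest threshold that passes both gates, else None."""
    feasible = [
        row
        for row in sweep_rows
        if row["false_positive_claims"] <= max_false_positive_claims
        and row["positive_claim_recall"] >= min_positive_claim_recall
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda row: row["threshold"])


sweep = [
    {"threshold": 0.0, "false_positive_claims": 3, "positive_claim_recall": 0.95},
    {"threshold": 1.0, "false_positive_claims": 1, "positive_claim_recall": 0.90},
    {"threshold": 2.0, "false_positive_claims": 0, "positive_claim_recall": 0.80},
    {"threshold": 3.0, "false_positive_claims": 0, "positive_claim_recall": 0.60},
]
chosen = select_smallest_threshold(sweep, min_positive_claim_recall=0.7)
print(chosen)
```

Preferring the smallest feasible threshold keeps as many confident claims as possible while still meeting the false-positive gate.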

constraint_mask(table, constraints, *, eps=1e-12)

Return rows satisfying all scalar constraints.

Constraint values may be either ("<=", value) tuples or mappings with {"op": "<=", "value": value} keys.
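The two accepted constraint spellings can be normalized to a common (operator, value) pair before evaluation. The sketch below assumes that constraints is a mapping from column name to constraint, that only the four inequality operators are supported, and it ignores the eps tolerance; all three points are assumptions, and the helper names are hypothetical.

```python
import operator

import numpy as np

# Assumption: only these inequality operators are supported.
_OPERATORS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}


def normalize_constraint(spec):
    """Accept either ("<=", value) or {"op": "<=", "value": value}."""
    if isinstance(spec, dict):
        return spec["op"], spec["value"]
    op, value = spec
    return op, value


def constraint_mask_sketch(columns, constraints):
    """Hypothetical helper: rows that satisfy every constraint."""
    n_rows = len(next(iter(columns.values())))
    mask = np.ones(n_rows, dtype=bool)
    for name, spec in constraints.items():
        op, value = normalize_constraint(spec)
        mask &= _OPERATORS[op](np.asarray(columns[name], dtype=float), value)
    return mask


columns = {"error": [0.1, 0.3, 0.2], "runtime": [5.0, 1.0, 2.0]}
constraints = {"error": ("<=", 0.2), "runtime": {"op": "<", "value": 5.0}}
print(constraint_mask_sketch(columns, constraints))
```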

equal_quality_selection(table, *, quality_constraints, compression_objective, compression_direction='min', tie_breakers=(), eps=1e-12)

Select candidates under fixed quality constraints.

This is a named wrapper around select_under_constraints for common equal-quality compression/compaction comparisons.

is_pareto_front(table, objectives, *, directions, feasible_mask=None, eps=1e-12, allow_missing=True)

Return a boolean Series marking non-dominated rows.

pareto_front_indices(table, objectives, *, directions, feasible_mask=None, eps=1e-12, allow_missing=True)

Return indices of non-dominated rows.

Parameters

  • table: DataFrame whose rows are candidates.
  • objectives: Objective column names to compare.
  • directions: Either a mapping from objective name to "min"/"max" or a sequence aligned with objectives.
  • feasible_mask: Optional row mask. If supplied, only feasible rows can be on the front.
  • eps: Numerical tolerance for weak/strict comparisons.
  • allow_missing: If true, objective values that are missing in either row are skipped for that pair. A row can dominate another only when at least one comparable objective remains and one comparable objective is strictly better.

record_dominates(left, right, objectives, *, directions, eps=1e-12, allow_missing=True)

Return whether left Pareto-dominates right.

left dominates right if it is at least as good on all comparable objectives and strictly better on at least one comparable objective.
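The dominance test can be sketched in plain Python over dictionaries. This is an illustration of the rule as stated above, not the library's code: the helper names are hypothetical, directions is taken to be a mapping, and missing values are represented by None or NaN.

```python
import math


def dominates(left, right, objectives, directions, eps=1e-12):
    """Hypothetical helper: does `left` Pareto-dominate `right`?"""
    strictly_better = False
    for name in objectives:
        l, r = left[name], right[name]
        if l is None or r is None or math.isnan(l) or math.isnan(r):
            continue  # missing objective: skipped for this pair
        # Orient the difference so that negative always means "left is better".
        sign = 1.0 if directions[name] == "min" else -1.0
        diff = sign * (l - r)
        if diff > eps:
            return False  # left is worse on a comparable objective
        if diff < -eps:
            strictly_better = True
    # Strict improvement implies at least one comparable objective existed.
    return strictly_better


def pareto_front(records, objectives, directions):
    """Indices of records that no other record dominates (brute force)."""
    return [
        i
        for i, candidate in enumerate(records)
        if not any(
            dominates(other, candidate, objectives, directions)
            for j, other in enumerate(records)
            if j != i
        )
    ]


records = [
    {"error": 0.10, "runtime": 2.0},
    {"error": 0.20, "runtime": 2.0},
    {"error": 0.05, "runtime": 5.0},
]
directions = {"error": "min", "runtime": "min"}
print(pareto_front(records, ["error", "runtime"], directions))
```

The second record is dominated by the first (same runtime, higher error), while the first and third trade error against runtime and both remain on the front.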

select_under_constraints(table, *, constraints, objective, direction, tie_breakers=(), eps=1e-12)

Return feasible rows sorted by one objective plus optional tie-breakers.

as_point_set(points, *, name='points', expected_dim=None)

Return points as a finite float64 array with shape (N, D).

Parameters

  • points: Array-like point set.
  • name: Name used in validation error messages.
  • expected_dim: Optional required point dimensionality.

chamfer_distance(points_a, points_b, *, squared=False, symmetric=True, query_chunk_size=_DEFAULT_CHUNK_SIZE)

Return directed or symmetric Chamfer distance between two point sets.

symmetric=True returns the sum of the two directed mean nearest-neighbor distances, matching the convention used by common 3D reconstruction diagnostics. squared=True applies the same convention to squared nearest-neighbor distances.
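The sum-of-directed-means convention can be checked on a tiny example with a dense NumPy sketch. The helper names are hypothetical, the sketch uses a full pairwise distance matrix instead of a KD-tree or chunking, and taking the a-to-b direction when symmetric=False is an assumption.

```python
import numpy as np


def directed_nn_distances(query, reference):
    """Dense nearest-neighbor distances from each query point to the reference set."""
    diff = query[:, None, :] - reference[None, :, :]
    return np.sqrt((diff**2).sum(axis=2)).min(axis=1)


def chamfer_sketch(points_a, points_b, squared=False, symmetric=True):
    """Hypothetical helper: directed or symmetric Chamfer distance."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d_ab = directed_nn_distances(a, b)
    total = (d_ab**2 if squared else d_ab).mean()
    if symmetric:
        d_ba = directed_nn_distances(b, a)
        # Sum (not average) of the two directed means.
        total += (d_ba**2 if squared else d_ba).mean()
    return float(total)


a = [[0.0, 0.0], [1.0, 0.0]]
b = [[0.0, 0.0], [1.0, 1.0]]
print(chamfer_sketch(a, b))
print(chamfer_sketch(a, b, symmetric=False))
```

Because the symmetric value is a sum rather than an average, it is twice as large as conventions that average the two directions; this matters when comparing numbers across papers or toolkits.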

deterministic_subsample(points, *, max_points, seed=0)

Return a deterministic random subset of points and its indices.

If max_points is None, non-positive, or at least the number of points, the original order is preserved and all indices are returned. Otherwise, max_points unique indices are sampled without replacement and sorted, so downstream computations are reproducible and do not depend on the chunk size used.
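A minimal sketch of this behaviour is shown below. The helper name is hypothetical and the use of numpy.random.default_rng is an assumption; the library may use a different generator, so the selected indices need not match.

```python
import numpy as np


def subsample_sketch(points, max_points, seed=0):
    """Hypothetical helper: seeded subset of rows plus the sorted indices used."""
    points = np.asarray(points)
    n_points = points.shape[0]
    if max_points is None or max_points <= 0 or max_points >= n_points:
        indices = np.arange(n_points)  # keep everything in original order
    else:
        rng = np.random.default_rng(seed)
        # Sorting keeps the retained points in their original relative order.
        indices = np.sort(rng.choice(n_points, size=max_points, replace=False))
    return points[indices], indices


cloud = np.arange(20, dtype=float).reshape(10, 2)
subset, indices = subsample_sketch(cloud, max_points=3, seed=42)
print(indices)
```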

distance_quantiles(query, reference, *, quantiles=(0.5, 0.9, 0.95, 0.99), query_chunk_size=_DEFAULT_CHUNK_SIZE)

Return quantiles of directed nearest-neighbor distances.

nearest_neighbor_distances(query, reference, *, query_chunk_size=_DEFAULT_CHUNK_SIZE, return_indices=False)

Distance from each query point to its nearest reference point.

The function uses scipy.spatial.cKDTree when available and falls back to a deterministic chunked dense implementation otherwise.

point_set_geometry_summary(estimate_points, reference_points, *, thresholds=(0.1,), query_chunk_size=_DEFAULT_CHUNK_SIZE)

Return standard directed/symmetric point-set geometry metrics.

The summary keys follow reconstruction terminology:

  • accuracy_*: estimate-to-reference nearest-neighbor distances.
  • completion_*: reference-to-estimate nearest-neighbor distances.
  • chamfer_l1: sum of directed mean distances.
  • chamfer_l2: sum of directed mean squared distances.

precision_recall_curve(estimate_points, reference_points, thresholds, *, query_chunk_size=_DEFAULT_CHUNK_SIZE)

Return precision/recall/F-score rows for multiple thresholds.

precision_recall_fscore(estimate_points, reference_points, threshold, *, query_chunk_size=_DEFAULT_CHUNK_SIZE)

Return precision, recall, and F-score for a distance threshold.
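In the usual reconstruction convention, precision is the fraction of estimate points within the threshold of the reference, recall is the fraction of reference points within the threshold of the estimate, and the F-score is their harmonic mean. The dense NumPy sketch below follows that convention. The helper names are hypothetical, and both the inclusive comparison at the threshold and the zero F-score for the degenerate case are assumptions.

```python
import numpy as np


def nn_distances(query, reference):
    """Dense nearest-neighbor distances from each query point to the reference set."""
    diff = query[:, None, :] - reference[None, :, :]
    return np.sqrt((diff**2).sum(axis=2)).min(axis=1)


def precision_recall_fscore_sketch(estimate_points, reference_points, threshold):
    """Hypothetical helper: threshold precision, recall, and F-score."""
    estimate = np.asarray(estimate_points, dtype=float)
    reference = np.asarray(reference_points, dtype=float)
    # Assumption: points exactly at the threshold count as matched.
    precision = float((nn_distances(estimate, reference) <= threshold).mean())
    recall = float((nn_distances(reference, estimate) <= threshold).mean())
    denominator = precision + recall
    fscore = 2.0 * precision * recall / denominator if denominator > 0 else 0.0
    return precision, recall, fscore


estimate = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
reference = [[0.0, 0.0], [1.0, 0.05]]
precision, recall, fscore = precision_recall_fscore_sketch(estimate, reference, 0.1)
print(precision, recall, fscore)
```

The outlier at (5, 5) lowers precision but not recall, because every reference point still has a nearby estimate point.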

protected_tail_topk_mask(primary_scores, tail_scores, reliability_scores, retention_fraction, *, tail_quantile, tail='lower', min_count=1, sanitize_nonnegative=True)

Select a fixed-size subset while preserving proportional tail capacity.

The candidate set is split at tail_quantile of reliability_scores. The tail receives the same retention fraction as the full set, ranked by tail_scores. The complement receives the remaining retained budget, ranked by primary_scores. If the tail is empty the function falls back to ordinary top-fraction selection by primary_scores; if the complement is empty, all candidates are ranked by tail_scores.

quantile_tail_mask(reliability_scores, quantile, *, tail='lower', sanitize_nonnegative=True)

Return a boolean mask for the lower or upper reliability-score tail.

quantile_tail_threshold(reliability_scores, quantile, *, tail='lower', sanitize_nonnegative=True)

Return the threshold separating a reliability-score quantile tail.

retained_count_from_fraction(item_count, retention_fraction, *, min_count=1)

Convert a retention fraction to a deterministic retained count.

min_count is applied only when item_count > 0 and retention_fraction > 0. Set min_count=0 for exact zero-retention behavior.

sanitized_score_vector(values, *, nonnegative=True)

Return a finite one-dimensional float64 score vector.

Non-finite entries are mapped to zero. By default negative values are also clipped to zero, matching the common interpretation of scores as confidence, reliability, probability, or non-negative utility.
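The sanitization rule can be sketched as follows. The helper name is hypothetical, and flattening the input to one dimension (rather than rejecting multi-dimensional input) is an assumption.

```python
import numpy as np


def sanitize_sketch(values, nonnegative=True):
    """Hypothetical helper: finite float64 vector, optionally clipped at zero."""
    # Assumption: inputs are flattened rather than validated as 1-D.
    vector = np.asarray(values, dtype=np.float64).reshape(-1).copy()
    vector[~np.isfinite(vector)] = 0.0  # NaN and +/-inf become zero
    if nonnegative:
        vector = np.clip(vector, 0.0, None)
    return vector


raw = [1.0, -2.0, float("nan"), float("inf"), float("-inf")]
print(sanitize_sketch(raw))
print(sanitize_sketch(raw, nonnegative=False))
```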

tail_rescue_quota_count(retained_count, *, rescue_fraction)

Return a bounded tail-rescue quota inside a retained budget.

tail_rescue_topk_mask(primary_scores, tail_scores, reliability_scores, retention_fraction, *, tail_quantile, rescue_fraction, tail='lower', min_count=1, sanitize_nonnegative=True)

Top-k selection with a bounded quota rescued from a reliability tail.

top_count_mask(scores, retained_count, *, tie_break_scores=None, largest=True, sanitize_nonnegative=True)

Return a deterministic mask selecting retained_count score entries.

Ties are resolved by the optional tie_break_scores and then by increasing original index, making the selection reproducible across NumPy versions.
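One way to obtain this kind of fully deterministic ordering is numpy.lexsort with the original index as the last-resort key. The sketch below illustrates the idea; the helper name is hypothetical, score sanitization is omitted, and ranking tie_break_scores in the same direction as the primary scores is an assumption.

```python
import numpy as np


def top_count_mask_sketch(scores, retained_count, tie_break_scores=None, largest=True):
    """Hypothetical helper: mask of the top `retained_count` entries."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    ties = np.zeros(n) if tie_break_scores is None else np.asarray(tie_break_scores, dtype=float)
    sign = -1.0 if largest else 1.0
    # lexsort treats the LAST key as primary: score, then tie-break, then index.
    order = np.lexsort((np.arange(n), sign * ties, sign * scores))
    mask = np.zeros(n, dtype=bool)
    mask[order[: max(0, min(retained_count, n))]] = True
    return mask


scores = [0.5, 0.9, 0.5, 0.1]
print(top_count_mask_sketch(scores, 2))
print(top_count_mask_sketch(scores, 2, tie_break_scores=[0.0, 0.0, 1.0, 0.0]))
```

Without tie-break scores, the tie between entries 0 and 2 goes to the lower index; with them, entry 2 wins because its tie-break score is larger.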

top_fraction_mask(scores, retention_fraction, *, tie_break_scores=None, largest=True, min_count=1, sanitize_nonnegative=True)

Return a top-k mask whose size is derived from a retention fraction.