Evaluation

Model-comparison evidence margins

pyrecest.evaluation.model_comparison contains lightweight, dataframe-oriented helpers for comparing model-evidence tables. The helpers are intentionally domain-neutral: callers provide the model labels, grouping columns, thresholds, and cluster/group identifiers.

Useful entry points include:

paired_model_margin_decisions, which compares a positive model against a reference model and separates positive claims, reference claims, and ambiguous events using a symmetric log-evidence threshold;
paired_model_margin_summary, which summarizes raw wins, confident claims, ambiguous events, and mean/median log-evidence margins;
paired_model_margin_threshold_sweep, which evaluates the paired decision rule over multiple candidate thresholds;
select_paired_model_margin_threshold, which selects the smallest threshold satisfying synthetic false-positive and recall constraints;
leave_one_group_out_summary, a generic leave-one-group-out wrapper for grouped robustness checks;
cluster_bootstrap_margin_summary, which returns cluster-resampled uncertainty intervals for raw-win fractions, claim fractions, and evidence margins; and
grouped_claim_gate_summary, which summarizes whether every group satisfies majority/no-forbidden-claim/positive-margin gates.

These utilities are useful for model comparison, parameter selection, and paper-quality diagnostics whenever multiple filters, smoothers, or trackers emit comparable log marginal likelihoods.

Point-set geometry metrics

pyrecest.evaluation.point_set_metrics contains deterministic, NumPy/SciPy helpers for evaluating sampled shapes, point-cloud estimates, and extended-object geometry diagnostics. The functions provide nearest-neighbor distances, symmetric Chamfer-L1/L2 distances, threshold precision/recall/F-score, distance quantiles, and reproducible subsampling.

These helpers are intended for evaluation pipelines rather than differentiable model code. They use scipy.spatial.cKDTree when available and a deterministic chunked dense fallback otherwise.

Pareto and equal-quality selection

pyrecest.evaluation.pareto contains small dataframe-oriented utilities for rate-distortion and equal-quality comparisons. Callers provide objective columns and objective directions, so the helpers are intentionally domain-neutral and can be used for particle-count, runtime, storage-size, or accuracy/quality trade-offs.

Useful entry points include pareto_front_indices, is_pareto_front, record_dominates, constraint_mask, select_under_constraints, and equal_quality_selection.

Implicit-surface helpers

pyrecest.evaluation.implicit_surfaces contains lightweight helpers for backend-neutral scalar-field and implicit-surface evaluation. These helpers cover residual extraction through structural value(points) objects, surface-band masks, inside/outside classification, and surface-band probabilities from signed distance means and standard deviations. They are useful for shape-estimation and extended-object diagnostics without requiring implementations to inherit from a PyRecEst base class.

Protected-tail selection helpers

pyrecest.evaluation.selection contains deterministic, domain-neutral helpers for selecting a fixed-size subset under reliability or confidence constraints. The helpers are useful when an evaluation or ablation should preserve a bounded number of low-reliability hypotheses, measurements, particles, or shape samples while still ranking each region by a primary score. They intentionally avoid domain-specific names such as visibility, splats, or rendering; callers provide the primary scores, tail scores, reliability scores, retention fractions, and tail quantiles.

__all__ = ['generate_groundtruth', 'generate_measurements', 'simulation_database', 'check_and_fix_config', 'configure_for_filter', 'perform_predict_update_cycles', 'iterate_configs_and_runs', 'determine_all_deviations', 'get_axis_label', 'get_distance_function', 'get_extract_mean', 'summarize_filter_results', 'generate_simulated_scenarios', 'plot_results', 'evaluate_for_file', 'evaluate_for_simulation_config', 'evaluate_for_variables', 'classify_inside_outside', 'surface_band_mask', 'surface_band_probability_from_signed_distance', 'surface_gradients', 'surface_residuals', 'surface_variances', 'add_evidence_margin_columns', 'classify_evidence_margin', 'cluster_bootstrap_margin_summary', 'evidence_margin_table', 'grouped_claim_gate_summary', 'grouped_paired_model_margin_summary', 'infer_paired_model_group_cols', 'leave_one_group_out_summary', 'paired_model_margin_decisions', 'paired_model_margin_summary', 'paired_model_margin_threshold_sweep', 'select_paired_model_margin_threshold', 'constraint_mask', 'equal_quality_selection', 'is_pareto_front', 'pareto_front_indices', 'record_dominates', 'select_under_constraints', 'as_point_set', 'chamfer_distance', 'deterministic_subsample', 'distance_quantiles', 'nearest_neighbor_distances', 'point_set_geometry_summary', 'precision_recall_curve', 'precision_recall_fscore', 'protected_tail_topk_mask', 'quantile_tail_mask', 'quantile_tail_threshold', 'retained_count_from_fraction', 'sanitized_score_vector', 'tail_rescue_quota_count', 'tail_rescue_topk_mask', 'top_count_mask', 'top_fraction_mask'] `module-attribute`

`classify_inside_outside(values, *, negative_inside=True)`

Classify signed scalar-field values as inside/outside.

Returns -1 for inside, +1 for outside, and 0 for exact zeros or non-finite values. Set negative_inside=False for the opposite sign convention.
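The sign convention above can be illustrated with a few lines of NumPy. This is a minimal sketch of the documented behavior, not the library's implementation; the function name classify_inside_outside_sketch and the integer return dtype are assumptions.

```python
import numpy as np


def classify_inside_outside_sketch(values, *, negative_inside=True):
    """Illustrative stand-in for the documented sign convention (hypothetical)."""
    values = np.asarray(values, dtype=float)
    finite = np.isfinite(values)
    # Replace non-finite entries by 0 so that np.sign maps them to 0.
    signs = np.sign(np.where(finite, values, 0.0))
    # negative_inside=True: negative -> -1 (inside), positive -> +1 (outside).
    labels = signs if negative_inside else -signs
    return labels.astype(int)


labels = classify_inside_outside_sketch([-0.3, 0.0, 0.7, np.nan, np.inf])
flipped = classify_inside_outside_sketch([-2.0, 3.0], negative_inside=False)
```

Here labels marks the first entry as inside, the third as outside, and the exact zero and both non-finite entries as 0; flipped shows the opposite convention.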

`surface_band_mask(values, threshold)`

Return a mask for values within [-threshold, threshold].

`surface_band_probability_from_signed_distance(distance, distance_std, epsilon, *, min_std=0.0001, normal_cdf=None)`

Probability that a normal signed distance lies within a surface band.
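One plausible reading of this quantity is P(|d| <= epsilon) for d ~ Normal(distance, distance_std^2). The sketch below computes it with the standard-library error function, assuming that the band is the symmetric interval [-epsilon, epsilon] and that min_std acts as a lower clamp on the standard deviation; neither assumption is confirmed by the signature alone, and band_probability_sketch is a hypothetical name.

```python
import math


def band_probability_sketch(distance, distance_std, epsilon, *, min_std=1e-4):
    """P(-epsilon <= d <= epsilon) for a normally distributed signed distance."""
    # Assumed: min_std guards against division by a vanishing standard deviation.
    std = max(float(distance_std), min_std)

    def normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    upper = (epsilon - distance) / std
    lower = (-epsilon - distance) / std
    return normal_cdf(upper) - normal_cdf(lower)


p_centered = band_probability_sketch(0.0, 1.0, 1.0)  # mean on the surface
p_far = band_probability_sketch(5.0, 1.0, 1.0)       # mean far from the surface
```

With the mean on the surface and epsilon equal to one standard deviation, the result is the familiar one-sigma mass of about 0.683; a mean five standard deviations away yields a probability close to zero.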

`surface_gradients(surface, points)`

Return scalar-field gradients for points via surface.gradient.

`surface_residuals(surface, points)`

Return scalar-field residuals for points via surface.value.

`surface_variances(surface, points)`

Return predictive field variances for points via surface.variance_at.

`add_evidence_margin_columns(scores, *, group_cols=('session', 'event_index'))`

Merge event-level evidence-margin diagnostics back into score rows.

`classify_evidence_margin(delta_log_evidence)`

Classify a non-negative log-evidence margin into qualitative buckets.

`cluster_bootstrap_margin_summary(frame, *, cluster_col, delta_col, positive_claim_col, n_bootstrap=DEFAULT_BOOTSTRAP_REPLICATES, random_seed=DEFAULT_BOOTSTRAP_RANDOM_SEED)`

Return cluster-bootstrap intervals for paired evidence-margin summaries.
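The idea behind a cluster bootstrap is to resample whole clusters with replacement, so that correlated events inside one cluster stay together. The following is a minimal sketch for the mean margin only, using a percentile interval; the function name, the level argument, and the percentile method are assumptions rather than a description of the library's internals.

```python
import numpy as np


def cluster_bootstrap_mean_interval(deltas, clusters, *, n_bootstrap=1000, seed=0, level=0.95):
    """Percentile interval for the mean margin under cluster resampling (sketch)."""
    deltas = np.asarray(deltas, dtype=float)
    clusters = np.asarray(clusters)
    cluster_ids = np.unique(clusters)
    # Group rows once so each drawn cluster contributes all of its events.
    rows_by_cluster = [deltas[clusters == cid] for cid in cluster_ids]
    rng = np.random.default_rng(seed)
    means = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        drawn = rng.integers(0, len(cluster_ids), size=len(cluster_ids))
        means[b] = np.concatenate([rows_by_cluster[i] for i in drawn]).mean()
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(means, [alpha, 1.0 - alpha])
    return float(deltas.mean()), float(low), float(high)


point, low, high = cluster_bootstrap_mean_interval(
    deltas=[1.2, 0.8, -0.1, 2.0, 1.5, 0.3],
    clusters=["a", "a", "b", "b", "c", "c"],
)
```

Fixing the seed makes the interval reproducible, which mirrors the random_seed argument in the signature above.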

`evidence_margin_table(scores, *, group_cols=('session', 'event_index'), evidence_col='log_evidence', model_col='model')`

Return one best-vs-runner-up evidence-margin row per group.

`grouped_claim_gate_summary(summary, *, group_col, claim_fraction_col, forbidden_claims_col=None, mean_delta_col=None, median_delta_col=None, min_claim_fraction=0.5)`

Return generic pass/fail gates for grouped claim robustness.

`grouped_paired_model_margin_summary(decisions, *, group_cols)`

Alias for grouped paired-margin summaries used by downstream reports.

`infer_paired_model_group_cols(scores)`

Infer columns that define one paired model-decision unit.

`leave_one_group_out_summary(frame, *, group_col, summary_fn, held_out_col='held_out_group')`

Apply a summary after holding out each value of group_col.
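The leave-one-group-out pattern can be sketched without pandas by treating the frame as a list of row dictionaries. Everything here (the function name, the row layout, and the mean_delta key) is hypothetical and only illustrates the control flow.

```python
def leave_one_group_out_sketch(rows, group_key, summary_fn, held_out_key="held_out_group"):
    """Apply summary_fn to the rows that remain after dropping each group in turn."""
    groups = sorted({row[group_key] for row in rows})
    results = []
    for held_out in groups:
        kept = [row for row in rows if row[group_key] != held_out]
        summary = dict(summary_fn(kept))
        summary[held_out_key] = held_out  # record which group was removed
        results.append(summary)
    return results


rows = [
    {"session": "s1", "delta": 1.0},
    {"session": "s1", "delta": 3.0},
    {"session": "s2", "delta": -1.0},
    {"session": "s3", "delta": 5.0},
]
loo = leave_one_group_out_sketch(
    rows,
    "session",
    lambda kept: {"mean_delta": sum(r["delta"] for r in kept) / len(kept)},
)
```

If the summary changes sign or collapses when a single group is held out, the overall result depends heavily on that group.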

`paired_model_margin_decisions(scores, *, positive_model, reference_model, margin_threshold=0.0, group_cols=('session', 'event_index'), evidence_col='log_evidence', model_col='model', true_model_col=None, positive_true_label=None)`

Classify paired model wins using a symmetric log-evidence margin.
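A symmetric margin rule of this kind can be sketched for a single event as follows. The label strings and the treatment of margins exactly at the threshold (counted as ambiguous here) are assumptions; the library may resolve the boundary differently.

```python
def paired_margin_decision(log_evidence_positive, log_evidence_reference, margin_threshold=0.0):
    """Three-way decision from a log-evidence difference (illustrative only)."""
    delta = log_evidence_positive - log_evidence_reference
    if delta > margin_threshold:
        return "positive_claim"
    if delta < -margin_threshold:
        return "reference_claim"
    return "ambiguous"  # |delta| does not exceed the threshold


events = [(-10.0, -13.0), (-10.0, -10.4), (-12.5, -10.0)]
decisions = [paired_margin_decision(pos, ref, margin_threshold=1.0) for pos, ref in events]
```

The middle event has a raw win for the positive model (delta = 0.4) but is not a confident claim at threshold 1.0, which is the distinction between raw wins and claims drawn in the summaries.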

`paired_model_margin_summary(decisions, *, group_cols=(), true_model_col=None)`

Summarize a paired margin-decision table, optionally by groups.

`paired_model_margin_threshold_sweep(scores, *, positive_model, reference_model, thresholds, group_cols=None, summary_group_cols=(), evidence_col='log_evidence', model_col='model', true_model_col=None, positive_true_label=None)`

Summarize paired margin decisions over candidate thresholds.

`select_paired_model_margin_threshold(threshold_sweep, *, max_false_positive_claims=0, min_positive_claim_recall=0.0)`

Select the smallest threshold satisfying synthetic specificity gates.
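The selection step can be pictured as a filter followed by a minimum. The sketch below works on a list of row dictionaries; the column names threshold, false_positive_claims, and positive_claim_recall are hypothetical stand-ins for whatever the sweep table actually contains, and returning None when nothing qualifies is likewise an assumption.

```python
def select_smallest_threshold(sweep_rows, *, max_false_positive_claims=0, min_positive_claim_recall=0.0):
    """Return the feasible sweep row with the smallest threshold, or None."""
    feasible = [
        row
        for row in sweep_rows
        if row["false_positive_claims"] <= max_false_positive_claims
        and row["positive_claim_recall"] >= min_positive_claim_recall
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda row: row["threshold"])


sweep = [
    {"threshold": 0.0, "false_positive_claims": 3, "positive_claim_recall": 0.95},
    {"threshold": 1.0, "false_positive_claims": 0, "positive_claim_recall": 0.80},
    {"threshold": 2.0, "false_positive_claims": 0, "positive_claim_recall": 0.60},
]
chosen = select_smallest_threshold(sweep, min_positive_claim_recall=0.7)
none_feasible = select_smallest_threshold(sweep, min_positive_claim_recall=0.99)
```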

`constraint_mask(table, constraints, *, eps=1e-12)`

Return rows satisfying all scalar constraints.

Constraint values may be either ("<=", value) tuples or mappings with {"op": "<=", "value": value} keys.
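The two accepted constraint forms can be normalized to a common (op, value) pair before evaluation. The sketch below assumes that constraints is a mapping from column name to constraint, that only "<=" and ">=" need handling, and that eps loosens each bound slightly; the real function may support more operators and apply eps differently.

```python
from collections.abc import Mapping


def normalize_constraint(spec):
    """Accept ("<=", value) tuples or {"op": ..., "value": ...} mappings."""
    if isinstance(spec, Mapping):
        return spec["op"], spec["value"]
    op, value = spec
    return op, value


def satisfies(value, spec, eps=1e-12):
    op, bound = normalize_constraint(spec)
    if op == "<=":
        return value <= bound + eps
    if op == ">=":
        return value >= bound - eps
    raise ValueError(f"unsupported operator in this sketch: {op!r}")


def constraint_mask_sketch(rows, constraints, eps=1e-12):
    """One boolean per row: True when every constraint holds."""
    return [
        all(satisfies(row[col], spec, eps) for col, spec in constraints.items())
        for row in rows
    ]


rows = [
    {"rmse": 0.10, "runtime_s": 2.0},
    {"rmse": 0.25, "runtime_s": 0.5},
    {"rmse": 0.12, "runtime_s": 9.0},
]
constraints = {"rmse": ("<=", 0.15), "runtime_s": {"op": "<=", "value": 5.0}}
mask = constraint_mask_sketch(rows, constraints)
```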

`equal_quality_selection(table, *, quality_constraints, compression_objective, compression_direction='min', tie_breakers=(), eps=1e-12)`

Select candidates under fixed quality constraints.

This is a named wrapper around select_under_constraints for common equal-quality compression/compaction comparisons.

`is_pareto_front(table, objectives, *, directions, feasible_mask=None, eps=1e-12, allow_missing=True)`

Return a boolean Series marking non-dominated rows.

`pareto_front_indices(table, objectives, *, directions, feasible_mask=None, eps=1e-12, allow_missing=True)`

Return indices of non-dominated rows.

Parameters

table: DataFrame whose rows are candidates.
objectives: Objective column names to compare.
directions: Either a mapping from objective name to "min"/"max" or a sequence aligned with objectives.
feasible_mask: Optional row mask. If supplied, only feasible rows can be on the front.
eps: Numerical tolerance for weak/strict comparisons.
allow_missing: If true, objective values that are missing in either row are skipped for that pair. A row can dominate another only when at least one comparable objective remains and one comparable objective is strictly better.

`record_dominates(left, right, objectives, *, directions, eps=1e-12, allow_missing=True)`

Return whether left Pareto-dominates right.

left dominates right if it is at least as good on all comparable objectives and strictly better on at least one comparable objective.
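The dominance rule, including the handling of missing objectives, can be sketched on plain dictionaries. The way eps enters the comparisons and the use of None/NaN as "missing" are assumptions; the names dominates_sketch and pareto_front_sketch are hypothetical.

```python
import math


def dominates_sketch(left, right, objectives, directions, eps=1e-12):
    """True if left is no worse on all comparable objectives and better on one."""
    strictly_better = False
    for name in objectives:
        a, b = left.get(name), right.get(name)
        if a is None or b is None or math.isnan(a) or math.isnan(b):
            continue  # objective missing in either record: skip for this pair
        if directions[name] == "max":
            a, b = -a, -b  # treat maximization as minimization of the negative
        if a > b + eps:
            return False  # left is worse on a comparable objective
        if a < b - eps:
            strictly_better = True
    return strictly_better


def pareto_front_sketch(records, objectives, directions):
    """Indices of records that no other record dominates."""
    return [
        i
        for i, candidate in enumerate(records)
        if not any(
            dominates_sketch(other, candidate, objectives, directions)
            for j, other in enumerate(records)
            if j != i
        )
    ]


records = [
    {"runtime": 1.0, "error": 0.30},
    {"runtime": 2.0, "error": 0.10},
    {"runtime": 3.0, "error": 0.20},
]
directions = {"runtime": "min", "error": "min"}
front = pareto_front_sketch(records, ["runtime", "error"], directions)
```

The third record is slower and less accurate than the second, so it drops off the front. When no objective is comparable, dominates_sketch returns False, matching the requirement that at least one comparable objective remains.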

`select_under_constraints(table, *, constraints, objective, direction, tie_breakers=(), eps=1e-12)`

Return feasible rows sorted by one objective plus optional tie-breakers.

`as_point_set(points, *, name='points', expected_dim=None)`

Return points as a finite float64 array with shape (N, D).

Parameters

points: Array-like point set.
name: Name used in validation error messages.
expected_dim: Optional required point dimensionality.

`chamfer_distance(points_a, points_b, *, squared=False, symmetric=True, query_chunk_size=_DEFAULT_CHUNK_SIZE)`

Return directed or symmetric Chamfer distance between two point sets.

symmetric=True returns the sum of the two directed mean nearest-neighbor distances, matching the convention used by common 3D reconstruction diagnostics. squared=True applies the same convention to squared nearest-neighbor distances.
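The sum-of-directed-means convention can be checked on a tiny example with a dense NumPy computation. This sketch ignores chunking and KD-trees entirely, and it assumes that symmetric=False returns the points_a-to-points_b direction; the helper names are hypothetical.

```python
import numpy as np


def directed_mean_nn_distance(query, reference, *, squared=False):
    """Mean over query points of the (squared) distance to the nearest reference point."""
    diffs = query[:, None, :] - reference[None, :, :]
    nearest_sq = np.sum(diffs**2, axis=-1).min(axis=1)
    nearest = nearest_sq if squared else np.sqrt(nearest_sq)
    return float(nearest.mean())


def chamfer_sketch(points_a, points_b, *, squared=False, symmetric=True):
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    forward = directed_mean_nn_distance(a, b, squared=squared)
    if not symmetric:
        return forward
    # Symmetric variant: sum (not average) of the two directed means.
    return forward + directed_mean_nn_distance(b, a, squared=squared)


a = [[0.0, 0.0], [1.0, 0.0]]
b = [[0.0, 0.0], [1.0, 1.0]]
symmetric_l1 = chamfer_sketch(a, b)
directed_l1 = chamfer_sketch(a, b, symmetric=False)
```

Each direction has one exact match and one point at distance 1, so each directed mean is 0.5 and the symmetric sum is 1.0. Some references average the two directions instead, which would halve the value.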

`deterministic_subsample(points, *, max_points, seed=0)`

Return a deterministic random subset of points and its indices.

If max_points is None, non-positive, or at least the number of points, the original order is preserved and all indices are returned. Otherwise, max_points unique indices are sampled without replacement and sorted, so downstream computations are reproducible and do not depend on the chunk size.
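The described behavior can be sketched with NumPy's Generator API. Which random generator the library actually uses is not stated here, so np.random.default_rng is an assumption, as is the function name.

```python
import numpy as np


def deterministic_subsample_sketch(points, *, max_points, seed=0):
    """Seeded subsample with sorted indices (illustrative only)."""
    points = np.asarray(points)
    n = points.shape[0]
    if max_points is None or max_points <= 0 or max_points >= n:
        indices = np.arange(n)  # keep everything in the original order
    else:
        rng = np.random.default_rng(seed)
        # Unique indices, then sorted so the subset keeps the input ordering.
        indices = np.sort(rng.choice(n, size=max_points, replace=False))
    return points[indices], indices


points = np.arange(20).reshape(10, 2)
subset, idx = deterministic_subsample_sketch(points, max_points=4, seed=0)
subset_again, idx_again = deterministic_subsample_sketch(points, max_points=4, seed=0)
everything, idx_all = deterministic_subsample_sketch(points, max_points=None)
```

Sorting the sampled indices means the subset is always a subsequence of the input, so order-sensitive downstream steps see the same relative ordering on every run.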

`distance_quantiles(query, reference, *, quantiles=(0.5, 0.9, 0.95, 0.99), query_chunk_size=_DEFAULT_CHUNK_SIZE)`

Return quantiles of directed nearest-neighbor distances.

`nearest_neighbor_distances(query, reference, *, query_chunk_size=_DEFAULT_CHUNK_SIZE, return_indices=False)`

Distance from each query point to its nearest reference point.

The function uses scipy.spatial.cKDTree when available and falls back to a deterministic chunked dense implementation otherwise.

`point_set_geometry_summary(estimate_points, reference_points, *, thresholds=(0.1,), query_chunk_size=_DEFAULT_CHUNK_SIZE)`

Return standard directed/symmetric point-set geometry metrics.

The summary keys follow reconstruction terminology:

accuracy_*: estimate-to-reference nearest-neighbor distances.
completion_*: reference-to-estimate nearest-neighbor distances.
chamfer_l1: sum of directed mean distances.
chamfer_l2: sum of directed mean squared distances.

`precision_recall_curve(estimate_points, reference_points, thresholds, *, query_chunk_size=_DEFAULT_CHUNK_SIZE)`

Return precision/recall/F-score rows for multiple thresholds.

`precision_recall_fscore(estimate_points, reference_points, threshold, *, query_chunk_size=_DEFAULT_CHUNK_SIZE)`

Return precision, recall, and F-score for a distance threshold.
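In reconstruction-style evaluation, precision is commonly the fraction of estimate points within the threshold of the reference, and recall the fraction of reference points within the threshold of the estimate. The sketch below follows that convention with a dense NumPy distance computation; the inclusive comparison (<=) and the zero F-score when both terms vanish are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def nn_distances_dense(query, reference):
    """Distance from each query point to its nearest reference point (dense)."""
    diffs = query[:, None, :] - reference[None, :, :]
    return np.sqrt(np.sum(diffs**2, axis=-1).min(axis=1))


def precision_recall_fscore_sketch(estimate_points, reference_points, threshold):
    est = np.asarray(estimate_points, dtype=float)
    ref = np.asarray(reference_points, dtype=float)
    precision = float(np.mean(nn_distances_dense(est, ref) <= threshold))
    recall = float(np.mean(nn_distances_dense(ref, est) <= threshold))
    total = precision + recall
    fscore = 2.0 * precision * recall / total if total > 0 else 0.0
    return precision, recall, fscore


estimate = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]   # last point is an outlier
reference = [[0.0, 0.0], [1.0, 0.05]]
precision, recall, fscore = precision_recall_fscore_sketch(estimate, reference, 0.1)
```

The outlier in the estimate lowers precision to 2/3 while recall stays at 1, giving an F-score of 0.8.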

`protected_tail_topk_mask(primary_scores, tail_scores, reliability_scores, retention_fraction, *, tail_quantile, tail='lower', min_count=1, sanitize_nonnegative=True)`

Select a fixed-size subset while preserving proportional tail capacity.

The candidate set is split at tail_quantile of reliability_scores. The tail receives the same retention fraction as the full set, ranked by tail_scores. The complement receives the remaining retained budget, ranked by primary_scores. If the tail is empty the function falls back to ordinary top-fraction selection by primary_scores; if the complement is empty, all candidates are ranked by tail_scores.

`quantile_tail_mask(reliability_scores, quantile, *, tail='lower', sanitize_nonnegative=True)`

Return a boolean mask for the lower or upper reliability-score tail.

`quantile_tail_threshold(reliability_scores, quantile, *, tail='lower', sanitize_nonnegative=True)`

Return the threshold separating a reliability-score quantile tail.

`retained_count_from_fraction(item_count, retention_fraction, *, min_count=1)`

Convert a retention fraction to a deterministic retained count.

min_count is applied only when item_count > 0 and retention_fraction > 0. Set min_count=0 for exact zero-retention behavior.

`sanitized_score_vector(values, *, nonnegative=True)`

Return a finite one-dimensional float64 score vector.

Non-finite entries are mapped to zero. By default negative values are also clipped to zero, matching the common interpretation of scores as confidence, reliability, probability, or non-negative utility.

`tail_rescue_quota_count(retained_count, *, rescue_fraction)`

Return a bounded tail-rescue quota inside a retained budget.

`tail_rescue_topk_mask(primary_scores, tail_scores, reliability_scores, retention_fraction, *, tail_quantile, rescue_fraction, tail='lower', min_count=1, sanitize_nonnegative=True)`

Top-k selection with a bounded quota rescued from a reliability tail.

`top_count_mask(scores, retained_count, *, tie_break_scores=None, largest=True, sanitize_nonnegative=True)`

Return a deterministic mask selecting retained_count score entries.

Ties are resolved by the optional tie_break_scores and then by increasing original index, making the selection reproducible across NumPy versions.
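A deterministic ordering of this kind can be obtained with np.lexsort, which sorts by its last key first. The sketch below assumes that largest also governs the direction of the tie-break scores and omits score sanitization; top_count_mask_sketch is a hypothetical name.

```python
import numpy as np


def top_count_mask_sketch(scores, retained_count, *, tie_break_scores=None, largest=True):
    """Select retained_count entries by score, tie-break score, then index."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    tie = np.zeros(n) if tie_break_scores is None else np.asarray(tie_break_scores, dtype=float)
    sign = -1.0 if largest else 1.0
    # Key priority (last key is primary): score, then tie-break, then original index.
    order = np.lexsort((np.arange(n), sign * tie, sign * scores))
    mask = np.zeros(n, dtype=bool)
    mask[order[: max(0, min(int(retained_count), n))]] = True
    return mask


scores = [0.9, 0.5, 0.9, 0.5]
by_index = top_count_mask_sketch(scores, 3)
by_tie_break = top_count_mask_sketch(scores, 3, tie_break_scores=[0.0, 1.0, 0.0, 2.0])
```

Without tie-break scores, the lower index wins the tie at 0.5; with them, the entry with the larger tie-break score is kept instead.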

`top_fraction_mask(scores, retention_fraction, *, tie_break_scores=None, largest=True, min_count=1, sanitize_nonnegative=True)`

Return a top-k mask whose size is derived from a retention fraction.
