Performance And Benchmarking
Benchmarks should report both numerical outputs and timing information. The
script benchmarks/basic_regressions.py emits JSON that can be archived by CI and
compared across releases. The companion script
scripts/check_benchmark_results.py compares deterministic benchmark outputs
against JSON baselines under benchmarks/baselines/.
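As a rough illustration of a record that carries both numerical outputs and timing information, the sketch below times one function and prints a JSON document. It does not reproduce benchmarks/basic_regressions.py: the `run_benchmark` helper, the `sum_of_squares` workload, and the `name` and `outputs` keys are assumptions made for this example.

```python
import json
import time


def run_benchmark(name, func, *args):
    """Run func once and return a JSON-serializable record (hypothetical schema)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        # Deterministic numerical output, suitable for baseline comparison.
        "outputs": {"result": float(result)},
        # Wall-clock timing, which varies between runs and machines.
        "elapsed_seconds": elapsed,
    }


def sum_of_squares(n):
    """Placeholder workload with a deterministic result."""
    return sum(i * i for i in range(n))


record = run_benchmark("sum_of_squares", sum_of_squares, 1000)
print(json.dumps(record, indent=2, sort_keys=True))
```

Keeping the deterministic outputs in their own key makes it straightforward to compare them exactly while treating the timing field more leniently.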
Baseline checks should start by enforcing numerical outputs only. Runtime
thresholds can be added later in one of three ways: max_elapsed_seconds for an
absolute limit, elapsed_seconds plus --max-runtime-ratio for a limit relative
to the baseline, or --warn-only-runtime when the goal is to collect early
signals without failing CI on shared-runner noise.
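The policy above can be sketched as a single comparison function. This is a minimal illustration, not the logic of scripts/check_benchmark_results.py: the `check_record` function, its keyword arguments, and the assumption that max_elapsed_seconds and elapsed_seconds are stored in the baseline are all hypothetical.

```python
import math


def check_record(record, baseline, max_runtime_ratio=None,
                 warn_only_runtime=False, rel_tol=1e-9, abs_tol=0.0):
    """Return (failures, warnings) for one record against its baseline."""
    failures, warnings = [], []

    # Numerical outputs are always enforced.
    for key, expected in baseline["outputs"].items():
        actual = record["outputs"].get(key)
        if actual is None or not math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            failures.append(f"{key}: expected {expected!r}, got {actual!r}")

    # Runtime checks apply only when a threshold is available.
    runtime_problems = []
    elapsed = record.get("elapsed_seconds")
    if elapsed is not None:
        limit = baseline.get("max_elapsed_seconds")
        if limit is not None and elapsed > limit:
            runtime_problems.append(
                f"elapsed {elapsed:.3f}s exceeds absolute limit {limit:.3f}s"
            )
        base_elapsed = baseline.get("elapsed_seconds")
        if max_runtime_ratio is not None and base_elapsed:
            if elapsed > base_elapsed * max_runtime_ratio:
                runtime_problems.append(
                    f"elapsed {elapsed:.3f}s exceeds "
                    f"{max_runtime_ratio}x baseline {base_elapsed:.3f}s"
                )

    # In warn-only mode, runtime problems are reported but never fail the check.
    (warnings if warn_only_runtime else failures).extend(runtime_problems)
    return failures, warnings
```

With no runtime arguments and no max_elapsed_seconds in the baseline, only numerical mismatches can fail the check, which matches the suggested starting point.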
Backend-specific targets should be explicit:
| Backend | Performance target |
|---|---|
| NumPy | Reliable default behavior and SciPy-heavy workflows. |
| PyTorch | Tensor/autodiff workflows and GPU-capable native paths where implemented. |
| JAX | Pure functional and vectorized workflows where JIT is practical. |
Avoid optimizing a backend-specific path until its dtype, device, and autodiff semantics are documented in the capability matrix.
Historical Benchmarks
Use asv for trend tracking across commits and releases:
poetry run asv run --quick
poetry run asv publish
poetry run asv preview
The lightweight JSON benchmark remains useful for CI smoke checks, while asv is better suited to longitudinal analysis of algorithmic changes. Add benchmarks for new performance-sensitive APIs before optimizing them, so that each optimization can be measured against a recorded starting point.
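For reference, an asv benchmark is an ordinary Python module in the directory named by `benchmark_dir` in asv.conf.json. asv collects functions and methods whose names start with `time_` and times them, calling `setup` first. The sketch below uses a placeholder workload (sorting random floats); the class name and the workload are hypothetical and would be replaced by the API under test.

```python
import random


class TimeSorting:
    """Hypothetical asv suite: each time_* method is timed once per parameter."""

    # asv runs the suite for every value in params and labels it with param_names.
    params = [1_000, 10_000]
    param_names = ["n"]

    def setup(self, n):
        # Build inputs outside the timed region, with a fixed seed so that
        # every run sorts the same data.
        rng = random.Random(0)
        self.values = [rng.random() for _ in range(n)]

    def time_sorted(self, n):
        # Only the body of this method is timed.
        sorted(self.values)
```

Doing the input construction in `setup` keeps allocation and data generation out of the measurement, so trends across commits reflect the code under test.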