Paper Figures

The current paper figures are split between two bundles:

the evaluation_v2 core bundle under output/paper/evaluation_v2/pack/figures/
the evaluation_v3 strengthening bundle under output/paper/evaluation_v3/pack/figures/

Active Figure Roots

core main-paper figures: output/paper/evaluation_v2/pack/figures/main/
strengthening figures: output/paper/evaluation_v3/pack/figures/main/
appendix/supporting figures: output/paper/evaluation_v2/pack/figures/appendix/
generated-figure manifest: output/paper/evaluation_v2/pack/figures/manifest.json
figure index: output/paper/evaluation_v2/pack/figures/README.md
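
Before regenerating or citing figures, it can help to confirm that these roots are present in a checkout. The following is a minimal sketch, not part of the repository: the helper name `missing_roots` is hypothetical, and it assumes the paths above are relative to the repository root. It checks existence only and makes no assumption about the contents of manifest.json.

```python
from pathlib import Path

# Paths copied from the list above; assumed to be relative to the repository root.
ACTIVE_ROOTS = {
    "core main-paper figures": "output/paper/evaluation_v2/pack/figures/main",
    "strengthening figures": "output/paper/evaluation_v3/pack/figures/main",
    "appendix/supporting figures": "output/paper/evaluation_v2/pack/figures/appendix",
    "generated-figure manifest": "output/paper/evaluation_v2/pack/figures/manifest.json",
    "figure index": "output/paper/evaluation_v2/pack/figures/README.md",
}


def missing_roots(repo_root="."):
    """Return the labels of active figure roots that do not exist on disk."""
    base = Path(repo_root)
    return [label for label, rel in ACTIVE_ROOTS.items()
            if not (base / rel).exists()]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root and report gaps.
    for label in missing_roots():
        print(f"missing: {label}")
```

An empty result means all five locations exist; it does not say whether their contents are up to date.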

Canonical Regeneration Steps

Run the active evaluation bundle:

python paper/experiments/scripts/reproduce_evaluation_v2.py
python paper/experiments/scripts/exp_rq4_evaluation_v2.py --out output/paper/evaluation_v2/runs/E5_policy_comparison

Derive the paper-facing summaries:

python paper/experiments/scripts/derive_paper_evaluation.py --root output/paper/evaluation_v2

Regenerate the publication-focused figure set:

python paper/experiments/scripts/generate_eval_v2_focus_figures.py --root output/paper/evaluation_v2
python -m claimstab.figures.plot_rq4_adaptive \
  --input output/paper/evaluation_v2/runs/E5_policy_comparison/rq4_policy_summary.json \
  --out output/paper/evaluation_v2/runs/E5_policy_comparison/figures
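
The three stages above can also be chained from one driver so that a failure in an early step stops the run. This is a sketch under stated assumptions, not a script that ships with the project: `build_commands`, `regenerate`, and the `--run` switch are hypothetical names, and `sys.executable` stands in for the `python` command shown above. The script paths and flags are copied from the steps as listed.

```python
import subprocess
import sys

ROOT = "output/paper/evaluation_v2"
SCRIPTS = "paper/experiments/scripts"
E5 = f"{ROOT}/runs/E5_policy_comparison"


def build_commands():
    """Return the regeneration commands in the order they are listed above."""
    py = sys.executable  # assumption: the current interpreter replaces `python`
    return [
        # 1. Run the active evaluation bundle.
        [py, f"{SCRIPTS}/reproduce_evaluation_v2.py"],
        [py, f"{SCRIPTS}/exp_rq4_evaluation_v2.py", "--out", E5],
        # 2. Derive the paper-facing summaries.
        [py, f"{SCRIPTS}/derive_paper_evaluation.py", "--root", ROOT],
        # 3. Regenerate the publication-focused figure set.
        [py, f"{SCRIPTS}/generate_eval_v2_focus_figures.py", "--root", ROOT],
        [py, "-m", "claimstab.figures.plot_rq4_adaptive",
         "--input", f"{E5}/rq4_policy_summary.json",
         "--out", f"{E5}/figures"],
    ]


def regenerate(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical switch: pass --run to execute instead of only printing.
    regenerate(dry_run="--run" not in sys.argv)
```

Defaulting to a dry run keeps the sketch safe to try: it prints the five commands without touching any output directory.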

Main-Paper Figure Set

The current ICSE-style main figure set is:

fig1_stability_profile
fig2_robustness_cells_by_delta
fig3_claim_distribution
fig4_e1_prevalence_by_scope
fig5_claim_metric_mismatch
fig6_claim_family_verdicts
fig_rq4_ci_width_vs_cost

The publication-ready PNG/PDF copies live in:

output/paper/evaluation_v2/pack/figures/main/

The strengthening bundle adds:

fig_w1_second_family_verdicts
fig_w3_metric_baseline_sensitivity
fig_w5_near_boundary_tradeoff

These live in:

output/paper/evaluation_v3/pack/figures/main/
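
A quick completeness check for the two figure sets can be sketched as follows. The file naming is an assumption: the sketch supposes each figure is stored as `<name>.png` and `<name>.pdf`, and that the strengthening figures use the same two formats as the core set. The helper names `expected_files` and `missing_files` are hypothetical.

```python
from pathlib import Path

CORE_DIR = "output/paper/evaluation_v2/pack/figures/main"
STRENGTHENING_DIR = "output/paper/evaluation_v3/pack/figures/main"

# Figure names copied from the lists above.
CORE_FIGURES = [
    "fig1_stability_profile",
    "fig2_robustness_cells_by_delta",
    "fig3_claim_distribution",
    "fig4_e1_prevalence_by_scope",
    "fig5_claim_metric_mismatch",
    "fig6_claim_family_verdicts",
    "fig_rq4_ci_width_vs_cost",
]
STRENGTHENING_FIGURES = [
    "fig_w1_second_family_verdicts",
    "fig_w3_metric_baseline_sensitivity",
    "fig_w5_near_boundary_tradeoff",
]


def expected_files(extensions=(".png", ".pdf")):
    """List every expected figure path, assuming files are named <name><ext>."""
    groups = [(CORE_DIR, CORE_FIGURES), (STRENGTHENING_DIR, STRENGTHENING_FIGURES)]
    return [Path(directory) / f"{name}{ext}"
            for directory, names in groups
            for name in names
            for ext in extensions]


def missing_files(repo_root="."):
    """Return the expected figure paths that are not present under repo_root."""
    base = Path(repo_root)
    return [path for path in expected_files() if not (base / path).is_file()]
```

If the actual files carry suffixes or a different naming scheme, the name-to-path mapping in `expected_files` is the only part that needs to change.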

Figure Roles

fig4_e1_prevalence_by_scope: RQ1 prevalence in the main E1 battleground.
fig5_claim_metric_mismatch: icon figure showing that a supportive metric summary does not imply a stable claim.
fig6_claim_family_verdicts: RQ2 semantic discrimination across ranking, decision, and distribution claims.
fig_rq4_ci_width_vs_cost: RQ4 cost-agreement tradeoff, highlighting adaptive_ci_tuned.

Supporting / Appendix Figures

Supporting figures remain staged under:

output/paper/evaluation_v2/pack/figures/appendix/

These cover:

E2 GHZ structural calibration
E3 BV decision calibration
E4 Grover fragile distribution case
S2 boundary stress
QEC portability illustration
per-experiment heatmaps and robustness/supporting diagnostics

Scope Note

Legacy figure roots such as output/paper/artifact/figures/ and output/paper/pack/figures/ are retired from the active workflow. The current website and paper narrative should refer to the evaluation_v2 core bundle plus the evaluation_v3 strengthening bundle.
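
One way to catch stale references is to scan paper or website sources for the retired roots. The sketch below is illustrative only: `find_legacy_references` is a hypothetical helper, and its list holds just the two legacy roots named above, which the note presents as examples rather than an exhaustive set.

```python
# The two retired roots named in the scope note; other legacy roots may exist.
LEGACY_ROOTS = (
    "output/paper/artifact/figures/",
    "output/paper/pack/figures/",
)


def find_legacy_references(text):
    """Return (line_number, legacy_root) pairs for each retired root found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for root in LEGACY_ROOTS:
            if root in line:
                hits.append((lineno, root))
    return hits
```

Plain substring matching is sufficient here because neither retired root is a substring of the active evaluation_v2 or evaluation_v3 paths.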