Paper Figures
The current paper figures are split between:
- the evaluation_v2 core bundle under output/paper/evaluation_v2/pack/figures/
- the evaluation_v3 strengthening bundle under output/paper/evaluation_v3/pack/figures/
Active Figure Roots
- core main-paper figures: output/paper/evaluation_v2/pack/figures/main/
- strengthening figures: output/paper/evaluation_v3/pack/figures/main/
- appendix/supporting figures: output/paper/evaluation_v2/pack/figures/appendix/
- generated-figure manifest: output/paper/evaluation_v2/pack/figures/manifest.json
- figure index: output/paper/evaluation_v2/pack/figures/README.md
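A quick sanity check of the locations above can be sketched in Python. This is a minimal sketch and not part of the repository: the helper name `missing_figure_roots` is hypothetical, and it assumes the paths are resolved relative to the repository root.

```python
from pathlib import Path

# Paths copied from the list above; assumed to be relative to the repository root.
ACTIVE_FIGURE_PATHS = [
    "output/paper/evaluation_v2/pack/figures/main/",
    "output/paper/evaluation_v3/pack/figures/main/",
    "output/paper/evaluation_v2/pack/figures/appendix/",
    "output/paper/evaluation_v2/pack/figures/manifest.json",
    "output/paper/evaluation_v2/pack/figures/README.md",
]


def missing_figure_roots(repo_root="."):
    """Return the active figure paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in ACTIVE_FIGURE_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Hypothetical usage: report anything that has not been generated yet.
    for path in missing_figure_roots():
        print(f"missing: {path}")
```

An empty result means every active root, the manifest, and the figure index are present. It says nothing about whether their contents are up to date.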
Canonical Regeneration Steps
Run the active evaluation bundle:
python paper/experiments/scripts/reproduce_evaluation_v2.py
python paper/experiments/scripts/exp_rq4_evaluation_v2.py --out output/paper/evaluation_v2/runs/E5_policy_comparison
Derive the paper-facing summaries:
python paper/experiments/scripts/derive_paper_evaluation.py --root output/paper/evaluation_v2
Regenerate the publication-focused figure set:
python paper/experiments/scripts/generate_eval_v2_focus_figures.py --root output/paper/evaluation_v2
python -m claimstab.figures.plot_rq4_adaptive \
--input output/paper/evaluation_v2/runs/E5_policy_comparison/rq4_policy_summary.json \
--out output/paper/evaluation_v2/runs/E5_policy_comparison/figures
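The steps above can also be chained from one driver script, which keeps the ordering fixed and stops at the first failure. The following is a hedged sketch, not a script shipped with the repository: the names `STEPS` and `regenerate` and the `--run` flag are hypothetical, and it substitutes `sys.executable` for the bare `python` command so that the same interpreter is used throughout.

```python
import subprocess
import sys

ROOT = "output/paper/evaluation_v2"
E5_DIR = f"{ROOT}/runs/E5_policy_comparison"
SCRIPTS = "paper/experiments/scripts"

# Commands mirror the regeneration steps listed above, in the same order.
STEPS = [
    [sys.executable, f"{SCRIPTS}/reproduce_evaluation_v2.py"],
    [sys.executable, f"{SCRIPTS}/exp_rq4_evaluation_v2.py", "--out", E5_DIR],
    [sys.executable, f"{SCRIPTS}/derive_paper_evaluation.py", "--root", ROOT],
    [sys.executable, f"{SCRIPTS}/generate_eval_v2_focus_figures.py", "--root", ROOT],
    [
        sys.executable, "-m", "claimstab.figures.plot_rq4_adaptive",
        "--input", f"{E5_DIR}/rq4_policy_summary.json",
        "--out", f"{E5_DIR}/figures",
    ],
]


def regenerate(dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd in STEPS:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps never
            # run against the output of a failed earlier step.
            subprocess.run(cmd, check=True)
    return STEPS


if __name__ == "__main__":
    # "--run" is a flag of this sketch only; without it, commands are just printed.
    regenerate(dry_run="--run" not in sys.argv)
```

The dry-run default is deliberate: the evaluation bundle may take a long time to run, so printing the commands first makes it easy to confirm the paths before executing anything.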
Main-Paper Figure Set
The current ICSE-style main figure set is:
- fig1_stability_profile
- fig2_robustness_cells_by_delta
- fig3_claim_distribution
- fig4_e1_prevalence_by_scope
- fig5_claim_metric_mismatch
- fig6_claim_family_verdicts
- fig_rq4_ci_width_vs_cost
The publication-ready PNG/PDF copies live in:
output/paper/evaluation_v2/pack/figures/main/
The strengthening bundle adds:
- fig_w1_second_family_verdicts
- fig_w3_metric_baseline_sensitivity
- fig_w5_near_boundary_tradeoff
These live in:
output/paper/evaluation_v3/pack/figures/main/
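After regeneration, one way to confirm that both figure sets are complete is to check for each expected file. This is a minimal sketch under two assumptions that the text above does not confirm: each figure is stored as `<name>.png` and `<name>.pdf`, and paths are relative to the repository root. The helper name `missing_figures` is hypothetical.

```python
from pathlib import Path

MAIN_DIR = "output/paper/evaluation_v2/pack/figures/main"
STRENGTHENING_DIR = "output/paper/evaluation_v3/pack/figures/main"

# Figure names as listed above, grouped by the directory they live in.
EXPECTED = {
    MAIN_DIR: [
        "fig1_stability_profile",
        "fig2_robustness_cells_by_delta",
        "fig3_claim_distribution",
        "fig4_e1_prevalence_by_scope",
        "fig5_claim_metric_mismatch",
        "fig6_claim_family_verdicts",
        "fig_rq4_ci_width_vs_cost",
    ],
    STRENGTHENING_DIR: [
        "fig_w1_second_family_verdicts",
        "fig_w3_metric_baseline_sensitivity",
        "fig_w5_near_boundary_tradeoff",
    ],
}

# Assumption: file names follow the pattern <figure name><extension>.
FORMATS = (".png", ".pdf")


def missing_figures(repo_root="."):
    """Return the relative paths of expected figure files that are absent."""
    root = Path(repo_root)
    missing = []
    for directory, names in EXPECTED.items():
        for name in names:
            for ext in FORMATS:
                relative = f"{directory}/{name}{ext}"
                if not (root / relative).exists():
                    missing.append(relative)
    return missing


if __name__ == "__main__":
    for relative in missing_figures():
        print(f"missing: {relative}")
```

If the real file naming differs (for example, suffixes added by the plotting scripts), only `FORMATS` and the path pattern need to change.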
Figure Roles
- fig4_e1_prevalence_by_scope: RQ1 prevalence in the main E1 battleground.
- fig5_claim_metric_mismatch: icon figure showing that a supportive metric summary does not imply a stable claim.
- fig6_claim_family_verdicts: RQ2 semantic discrimination across ranking, decision, and distribution claims.
- fig_rq4_ci_width_vs_cost: RQ4 cost-agreement tradeoff, highlighting adaptive_ci_tuned.
Supporting / Appendix Figures
Supporting figures remain staged under:
output/paper/evaluation_v2/pack/figures/appendix/
These cover:
- E2 GHZ structural calibration
- E3 BV decision calibration
- E4 Grover fragile distribution case
- S2 boundary stress
- QEC portability illustration
- per-experiment heatmaps and robustness/supporting diagnostics
Scope Note
Legacy figure roots such as output/paper/artifact/figures/ and output/paper/pack/figures/ are retired from the active workflow.
The current website and paper narrative should refer to the evaluation_v2 core bundle plus the evaluation_v3 strengthening bundle.
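To catch leftover references to the retired roots in the website or paper sources, a plain text scan is usually enough. The sketch below is illustrative only: the helper name `find_legacy_references` is hypothetical, and it matches the two legacy paths exactly as written above, including the trailing slash.

```python
# Retired figure roots, copied from the scope note above.
LEGACY_ROOTS = (
    "output/paper/artifact/figures/",
    "output/paper/pack/figures/",
)


def find_legacy_references(text):
    """Return (line_number, legacy_root) pairs for every retired root found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for root in LEGACY_ROOTS:
            if root in line:
                hits.append((lineno, root))
    return hits


if __name__ == "__main__":
    sample = (
        "![fig](output/paper/pack/figures/fig1.png)\n"
        "![fig](output/paper/evaluation_v2/pack/figures/main/fig1_stability_profile.png)\n"
    )
    for lineno, root in find_legacy_references(sample):
        print(f"line {lineno}: references retired root {root}")
```

Active paths under evaluation_v2 and evaluation_v3 do not contain either legacy string, so they are not flagged.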