use-smart-humanize-text · iteration 1

Runner cursor-agent · Generated by humblskills eval. Pass rate and tokens are aggregated per session index (longitudinal runs compound in session order for smart_skill).

adaptive-brand-voice-discovery

What to read first

No narrative bullets were generated. Open report.json for full trajectory and benchmark objects.

Charts

Pass rate

Tokens (mean per session) — bars

Violations (rule-checker count per session) — lines

Pass^k (all assertions pass this session)

Brain: patterns.md entries (smart_skill)

The patterns series tracks cumulative entries in references/patterns.md after each session (flat and no_skill are typically flat at zero). Rising values indicate the brain is accumulating lessons across sessions. The violations series is the sum of count fields across any *-check.json sidecars the agent produced; lower is better.

Derived metrics

arm	learning velocity	token ratio (last/first)

Longitudinal: first vs last session (per scenario)

scenario	arm	first pass	last pass	delta	sessions

Cross-section (mean over all runs)

arm	pass_rate	tokens	time s	cost $

Deltas (smart_skill baseline comparisons)

pair	Δ pass_rate	% change	Δ tokens	% change	Δ time s	% change	Δ cost $	% change

Δ is the absolute difference (smart − baseline). % change is Δ / baseline × 100. For tokens / time / cost, smaller is better, so a negative % is an improvement and colors green.

Arms: flat_skill · no_skill · smart_skill ·
humblSKILLS longitudinal evals: smart_skill restores prior brain state between sessions so patterns.md and related files can compound. Compare cross-section deltas and longitudinal tables to argue skill + brain vs flat instructions vs no skill.