use-smart-humanize-text · iteration 1
Runner cursor-agent · Generated by humblskills eval. Pass rate and tokens are aggregated per session index (longitudinal runs compound in session order for smart_skill).
adaptive-brand-voice-discovery
What to read first
No narrative bullets were generated. Open report.json for full trajectory and benchmark objects.
Charts
Tokens (mean per session) — bars
Violations (rule-checker count per session) — lines
Pass^k (all assertions pass this session)
Brain: patterns.md entries (smart_skill)
The patterns series tracks the cumulative number of entries in references/patterns.md after each session (the flat and no_skill arms typically stay at zero). Rising values indicate the brain is accumulating lessons across sessions. The violations series is the sum of the count fields across any *-check.json sidecars the agent produced; lower is better.
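As a rough illustration of how the violations number could be recomputed by hand, the sketch below sums every numeric "count" field found in the *-check.json files of one session directory. The sidecar schema is not specified in this report, so the recursive walk over nested objects and the helper names sum_counts and session_violations are assumptions made for illustration, not the tool's actual implementation.

```python
import json
from pathlib import Path


def sum_counts(node):
    """Recursively sum numeric values stored under a "count" key.

    Assumption: sidecars may nest "count" fields at any depth.
    """
    if isinstance(node, dict):
        total = 0
        for key, value in node.items():
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if key == "count" and is_number:
                total += value
            else:
                total += sum_counts(value)
        return total
    if isinstance(node, list):
        return sum(sum_counts(item) for item in node)
    return 0


def session_violations(session_dir):
    """Sum "count" fields across every *-check.json file in one directory."""
    total = 0
    for path in sorted(Path(session_dir).glob("*-check.json")):
        with path.open(encoding="utf-8") as fh:
            total += sum_counts(json.load(fh))
    return total
```

Calling session_violations on a session's output directory (a hypothetical path) would give one point on the violations line; files that do not match *-check.json are ignored.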
Derived metrics
| arm | learning velocity | token ratio (last/first) |
Longitudinal: first vs last session (per scenario)
| scenario | arm | first pass | last pass | delta | sessions |
Cross-section (mean over all runs)
| arm | pass_rate | tokens | time s | cost $ |
Deltas (smart_skill baseline comparisons)
| pair | Δ pass_rate | % change | Δ tokens | % change | Δ time s | % change | Δ cost $ | % change |
Δ is the absolute difference (smart − baseline). % change is Δ / baseline × 100. For tokens / time / cost, smaller is better, so a negative % is an improvement and is shown in green.
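The two definitions above can be written out as a small helper. This is a minimal sketch: the function name delta_and_pct, the lower_is_better flag, and the choice to return None for a zero baseline are assumptions for illustration, and the numbers in the usage note are made up rather than taken from this report.

```python
def delta_and_pct(smart, baseline, lower_is_better=True):
    """Return (delta, pct_change, improved) for one metric.

    delta      = smart - baseline
    pct_change = delta / baseline * 100 (None when baseline is 0)
    improved   = True when the change moves in the better direction
    """
    delta = smart - baseline
    pct_change = delta / baseline * 100 if baseline != 0 else None
    improved = delta < 0 if lower_is_better else delta > 0
    return delta, pct_change, improved
```

For example, delta_and_pct(750, 1000) describes a token count that dropped from 1000 to 750: the delta is -250, the % change is -25.0, and the result counts as an improvement. For pass_rate, where a higher value is presumably better, pass lower_is_better=False.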