Eval artifacts

Each iteration writes artifacts under:

$XDG_STATE_HOME/humblskills/evals/<skill>/iteration-N/
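A minimal sketch of resolving that directory from a script. The helper names `eval_root` and `iteration_dir` are hypothetical, and the fallback to `~/.local/state` when `XDG_STATE_HOME` is unset follows the XDG Base Directory convention; whether humblskills itself applies the same fallback is an assumption.

```python
import os
from pathlib import Path


def eval_root(skill, env=None):
    """Return the eval directory for one skill (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: fall back to the XDG default when the variable is unset or empty.
    state_home = env.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    return Path(state_home) / "humblskills" / "evals" / skill


def iteration_dir(skill, n, env=None):
    """Return the directory for iteration number n of a skill."""
    return eval_root(skill, env) / f"iteration-{n}"
```

Passing an explicit `env` mapping keeps the helper easy to exercise without touching the real environment.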

Typical layout:

iteration-N/
├── benchmark.json      # cross-section stats + deltas
├── trajectory.json     # per-session time series (smart arm compounds here)
├── report.html         # single-file Plotly dashboard
├── report.md           # plaintext mirror (PR-friendly)
├── report.json         # machine-readable
├── smart_skill/
│   └── session-NN/
│       ├── outputs/            # files the agent wrote
│       ├── transcript.txt      # full agent transcript
│       ├── timing.json         # tokens, duration, cost
│       ├── metrics.json        # tool-call counts + brain reads
│       ├── brain-snapshot-before/
│       └── brain-snapshot-after/ # feeds session N+1 for smart arm
├── flat_skill/...
└── no_skill/...
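The layout above can be walked with a short script, for example to collect one JSON file per session across the three arms. This is only a sketch: `load_session_json` is a hypothetical helper, it assumes nothing about the keys inside `timing.json` or `metrics.json`, and it skips arms or sessions where the directory or file is missing rather than assuming every arm has the same contents.

```python
import json
from pathlib import Path

# Arm directory names as they appear in the layout above.
ARMS = ("smart_skill", "flat_skill", "no_skill")


def load_session_json(iteration_dir, filename="timing.json"):
    """Map each arm to a list of (session_name, parsed_json) pairs."""
    results = {}
    for arm in ARMS:
        arm_dir = Path(iteration_dir) / arm
        sessions = []
        if arm_dir.is_dir():
            # Lexical sort matches session order for zero-padded names (session-01, ...).
            for session_dir in sorted(arm_dir.glob("session-*")):
                path = session_dir / filename
                if path.is_file():
                    sessions.append((session_dir.name, json.loads(path.read_text())))
        results[arm] = sessions
    return results
```

Calling `load_session_json(path, "metrics.json")` reads the per-session metrics file instead of the timing file.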

Iterations are persistent and append-only. Use humblskills eval prune to cap how many iterations you keep per skill.
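To see how many iterations have accumulated for a skill before pruning, a read-only sketch like the following may help. `list_iterations` and `beyond_cap` are hypothetical helpers, not part of humblskills; the sketch deletes nothing, and the idea that a cap keeps the highest-numbered iterations is an assumption about the intended behavior, not a description of what `humblskills eval prune` does.

```python
import re
from pathlib import Path

_ITERATION = re.compile(r"iteration-(\d+)")


def list_iterations(skill_dir):
    """Return iteration numbers found under a skill's eval directory, ascending."""
    skill_dir = Path(skill_dir)
    if not skill_dir.is_dir():
        return []
    numbers = []
    for child in skill_dir.iterdir():
        match = _ITERATION.fullmatch(child.name)
        if child.is_dir() and match:
            # Parse N as an integer so iteration-10 sorts after iteration-2.
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def beyond_cap(skill_dir, keep):
    """Return the iteration numbers that fall outside the newest `keep` (assumed policy)."""
    numbers = list_iterations(skill_dir)
    return numbers[:-keep] if keep > 0 else numbers
```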