Abstract
Intervention-aware evaluation treats every pause, rollback, and resumed command as evidence about autonomy, safety, and reproducibility.
This site frames the project as an academic discussion artifact: it states the adaptation problem, proposes a design stance, and lists evaluation lenses that can be expanded into experiments.
Research Question
What empirical protocol can separate productive Mano autonomy from unobserved drift in persistent terminal sessions?
Adaptation Notes
- Tasks vary in duration, dependency depth, and need for manual interruption.
- Interventions are logged as first-class events rather than external notes.
- Final artifacts are scored together with trace quality and recovery rationale.
Evaluation Lens
- Autonomy duration before intervention
- Rollback correctness
- Trace-grounded justification quality
Open Discussion
The central methodological risk is mistaking terminal completion for agent understanding. The project therefore treats tmux as both infrastructure and evidence: pane state, focus movement, command output, and recovery behavior all become part of the argument.
Future work can connect this static discussion to executable harnesses, trace viewers, and standardized task suites for cross-agent comparison.