tmux for Mano: Autonomy and Intervention Evaluation

Abstract

Intervention-aware evaluation treats every pause, rollback, and resumed command as evidence about autonomy, safety, and reproducibility.

This site frames the project as an academic discussion artifact: it states the adaptation problem, proposes a design stance, and lists evaluation lenses that can be expanded into experiments.

Research Question

What empirical protocol can separate productive Mano autonomy from unobserved drift in persistent terminal sessions?

Adaptation Notes

Tasks vary in duration, dependency depth, and need for manual interruption.
Interventions are logged as first-class events rather than external notes.
Final artifacts are scored together with trace quality and recovery rationale.
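
One way to make interventions first-class events is to store them in the same ordered log as agent commands. The sketch below is illustrative only: the `TraceEvent` record, its field names, and the set of event kinds are assumptions drawn from the wording of this page (pause, rollback, resume), not a schema the project defines.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical event kinds, borrowed from the abstract's wording.
EVENT_KINDS = {"command", "pause", "rollback", "resume"}


@dataclass(frozen=True)
class TraceEvent:
    t: float          # seconds since session start
    kind: str         # one of EVENT_KINDS
    actor: str        # "agent" or "human"
    pane: str         # tmux pane target, e.g. "%1"
    detail: str = ""  # command text or intervention rationale

    def __post_init__(self):
        # Reject kinds outside the assumed vocabulary so the log stays uniform.
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {self.kind}")


def to_jsonl(events):
    """Serialize events in order, one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def from_jsonl(text):
    """Parse a JSON Lines log back into TraceEvent records."""
    return [TraceEvent(**json.loads(line)) for line in text.splitlines() if line.strip()]
```

Keeping agent commands and human interventions in one stream means the recovery rationale (the `detail` field here) is scored from the same trace as the final artifact, rather than from notes kept elsewhere.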

Evaluation Lens

Autonomy duration before intervention
Rollback correctness
Trace-grounded justification quality
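
The first lens can be made concrete with a small calculation over a timestamped trace. The following is a minimal sketch under stated assumptions: events are `(time, kind)` pairs sorted by time, the agent is autonomous from time 0, a `pause` or `rollback` counts as an intervention that ends an autonomous span, and only a `resume` starts the next one. The function name `autonomy_spans` and these rules are hypothetical, not definitions given by the project.

```python
def autonomy_spans(events, session_end):
    """Return the lengths, in seconds, of uninterrupted autonomous stretches.

    events: list of (t, kind) tuples sorted by t.
    session_end: time at which the session was closed.
    """
    interventions = {"pause", "rollback"}  # assumed intervention kinds
    spans = []
    start = 0.0  # assumption: the agent starts out autonomous
    for t, kind in events:
        if kind in interventions and start is not None:
            spans.append(t - start)  # close the current autonomous span
            start = None
        elif kind == "resume" and start is None:
            start = t  # autonomy resumes here
    if start is not None:
        spans.append(session_end - start)  # span still open at session end
    return spans


trace = [
    (0.0, "command"),
    (40.0, "pause"),
    (55.0, "rollback"),
    (60.0, "resume"),
    (150.0, "command"),
]
spans = autonomy_spans(trace, session_end=200.0)
time_to_first_intervention = spans[0]
mean_span = sum(spans) / len(spans)
```

For this trace the spans are 40 and 140 seconds. Reporting the time to first intervention and the mean span separately keeps a short initial run from being hidden by a long post-recovery run.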

Open Discussion

The central methodological risk is mistaking a task that runs to completion in the terminal for evidence that the agent understood it. The project therefore treats tmux as both infrastructure and evidence: pane state, focus movement, command output, and recovery behavior all become part of the argument.
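
As a rough illustration of tmux serving as evidence, pane output and pane state can be snapshotted with standard tmux commands. The sketch below assumes a running tmux server and a valid pane target such as `%1`; the helper names are hypothetical. It relies on `tmux capture-pane -p -t` (print pane contents to stdout, with `-S` giving a start line in the scrollback) and `tmux display-message -p -t` with the format variables `pane_id`, `pane_active`, and `pane_current_command`.

```python
import subprocess


def capture_pane_argv(target, history_lines=0):
    """Build the argv for printing a pane's contents to stdout."""
    argv = ["tmux", "capture-pane", "-p", "-t", target]
    if history_lines > 0:
        # A negative start line reaches back into the scrollback history.
        argv += ["-S", f"-{history_lines}"]
    return argv


def pane_state_argv(target):
    """Build the argv for printing the pane id, focus flag, and foreground command."""
    fmt = "#{pane_id} #{pane_active} #{pane_current_command}"
    return ["tmux", "display-message", "-p", "-t", target, fmt]


def run_tmux(argv):
    """Run a tmux command and return its stdout; raises if tmux exits non-zero."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def snapshot(target, history_lines=0):
    """Collect command output and pane state for one pane as a dict."""
    return {
        "target": target,
        "output": run_tmux(capture_pane_argv(target, history_lines)),
        "state": run_tmux(pane_state_argv(target)).strip(),
    }
```

Taking a snapshot at each intervention, for example `snapshot("%1", history_lines=200)`, would tie a pause or rollback to the exact pane contents and focus state that prompted it.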

Future work can connect this static discussion to executable harnesses, trace viewers, and standardized task suites for cross-agent comparison.