Methodology Note · v0.1
Reliability from the Harness, Not the Model
An adversarial verification loop that shapes any seated model's output into a fixed, usable form
Summary
We describe a verification architecture in which a language model is seated in one role of an adversarial multi-model loop and a separate gate refuses to let a session close until the model's output matches a contract fixed before the session began. The central property is that reliability is supplied by the harness rather than by the model in the seat: the gate checks output against the frozen contract and is indifferent to what produced it. We report a behavioral record of 319 sessions to date, in which the seat has been held by five different models — most often an inexpensive open-weight model — and present the per-cycle measurements the system records, including a randomized-signal control that distinguishes genuine adversarial response from its appearance. A companion demonstration note provides a single session in full detail as a close-up of the same mechanism.
The problem
A capable language model produces answers that are usually almost right. For a human reader, almost-right is often enough. For an automated pipeline that consumes one model's output as another stage's input, almost-right is unusable: a step that expects a known shape cannot consume output whose shape varies run to run. The gap is not intelligence. A model can be entirely capable of the correct answer and still, unprompted, deliver it in a form that is slightly under-grounded, slightly over-asserted, or slightly differently structured each time. That variance is what makes raw model output a poor input.
The usual response is to make the model better or to prompt it more carefully. Our approach is different: leave the model as it is and place a gate outside it that will not accept output until the output conforms to a specification written in advance. The discipline comes from the architecture, so it does not depend on which model is in the seat.
The loop
A session runs four models in defined roles. A Researcher advances the work toward an objective. A Challenger contests the Researcher's output each cycle. A Friction model scores session health out of band. A Parietal adjudicates challenges and distills the final result. Beneath them sits a persistent record of what the project has already established, so a session builds on prior results rather than re-deriving them.
Before a session starts, its objective is compiled into a frozen contract: a small set of criteria, each marked either as checked by code against the execution log or as judged by the Challenger. Once the session begins, the contract cannot be edited, extended, or reinterpreted, and the session may not close until every criterion is satisfied. When the Researcher proposes to end, the gate tests the proposal against the contract; if any criterion is unmet, the close is refused and the session continues.
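The contract-and-gate rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system's actual code: the names `Criterion`, `Contract`, `CheckKind`, `unmet_criteria`, and `may_close` are hypothetical, and the execution log is assumed to be a tuple of text lines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Mapping, Optional, Tuple


class CheckKind(Enum):
    CODE = "code"              # checked by code against the execution log
    CHALLENGER = "challenger"  # judged by the Challenger


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Criterion:
    name: str
    kind: CheckKind
    # Predicate over log lines; only used for CODE criteria (assumed shape).
    check: Optional[Callable[[Tuple[str, ...]], bool]] = None


@dataclass(frozen=True)
class Contract:
    criteria: Tuple[Criterion, ...]  # a tuple, so criteria cannot be appended


def unmet_criteria(
    contract: Contract,
    log: Tuple[str, ...],
    challenger_verdicts: Mapping[str, bool],
) -> List[str]:
    """Return the names of criteria that are not yet satisfied."""
    unmet = []
    for criterion in contract.criteria:
        if criterion.kind is CheckKind.CODE:
            ok = criterion.check is not None and criterion.check(log)
        else:
            # A criterion with no Challenger verdict counts as unmet.
            ok = challenger_verdicts.get(criterion.name, False)
        if not ok:
            unmet.append(criterion.name)
    return unmet


def may_close(
    contract: Contract,
    log: Tuple[str, ...],
    challenger_verdicts: Mapping[str, bool],
) -> bool:
    """The gate: a close is accepted only when nothing is unmet."""
    return not unmet_criteria(contract, log, challenger_verdicts)
```

In this sketch the gate never inspects who produced the log or the verdicts, which mirrors the property that the check is indifferent to the occupant of the seat.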
The Researcher seat is filled through a mailbox. The engine routes each Researcher turn to that mailbox and waits for a reply, and it cannot determine what is answering. This is the mechanism of interchangeability: a frontier chat model, an API model, or a cheap open-weight model all occupy the seat the same way and are gated identically.
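A mailbox seat of this kind might look like the following sketch. It assumes an in-process pair of queues purely for illustration; the class `Mailbox`, its methods `ask` and `serve_one`, and the helper `seat` are hypothetical names, and the real transport is not described in this note.

```python
import queue
import threading
from typing import Callable


class Mailbox:
    """Two queues: the engine posts a turn and blocks until a reply arrives."""

    def __init__(self) -> None:
        self._requests: "queue.Queue[str]" = queue.Queue()
        self._replies: "queue.Queue[str]" = queue.Queue()

    def ask(self, prompt: str, timeout: float = 5.0) -> str:
        """Engine side: route one Researcher turn and wait for the answer."""
        self._requests.put(prompt)
        return self._replies.get(timeout=timeout)

    def serve_one(self, respond: Callable[[str], str], timeout: float = 5.0) -> None:
        """Occupant side: take one turn from the mailbox and reply to it."""
        prompt = self._requests.get(timeout=timeout)
        self._replies.put(respond(prompt))


def seat(mailbox: Mailbox, respond: Callable[[str], str], turns: int) -> threading.Thread:
    """Seat any occupant (any callable from prompt to reply) for a number of turns."""

    def run() -> None:
        for _ in range(turns):
            mailbox.serve_one(respond)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The engine only ever calls `ask`, so nothing on its side can depend on which occupant is behind `respond`; swapping occupants means passing a different callable to `seat`.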
What the record shows
As of this writing the system has run 319 sessions since mid-May 2026, of which 271 (about 85%) closed complete. The remaining 48 failed in identifiable ways (a model process dying mid-session, or a session that never reached a clean close), which are recorded distinctly rather than laundered into the complete count. The Researcher seat has been held by five different models. The distribution is the point: the seat is usually not a frontier model.
| Model in the Researcher seat | Complete | Avg cycles | Avg challenges |
|---|---|---|---|
| Open-weight GLM-4.7 (primary) | 225 | 2.5 | 0.1 |
| Frontier chat model | 30 | 4.0 | 0.8 |
| Open-weight Qwen-235B | 3 | 14.7 | 3.0 |
| Open-weight DeepSeek-V3.2 | 3 | 1.0 | 0.0 |
| Open-weight GLM-4.7 (API variant) | 10 | 5.4 | 0.2 |
The inexpensive open-weight model carried the large majority of completed sessions: 225 of 271 (about 83%) as the primary occupant, or 235 of 271 (about 87%) counting the API variant of the same model. That is the empirical content of the interchangeability claim: it is not an argument from design that a cheap model could be substituted, but a record in which a cheap model has been the ordinary occupant, gated by the same contract machinery throughout. The frontier model is the exception in the seat, useful precisely because watching the gate operate on the most capable occupant shows the discipline is external to the model rather than a concession the model is making.
Measuring whether the gate does work
The system records a per-cycle behavioral observation for every Researcher turn — 1,240 to date. Each captures the cycle's friction signal and the reason given for it, the Researcher's word count and counts of hedging and certainty markers, whether the Challenger issued a challenge, and the running challenge and uphold totals. This is a behavioral corpus, not a set of anecdotes, and it makes the gate's activity measurable.
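As a rough illustration, one such observation could be represented as a small immutable record, with a helper that compares word counts on challenged and unchallenged cycles. The field names, the numeric type of the friction signal, and the helper `mean_word_count` are assumptions made for this sketch; the note does not specify the stored schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass(frozen=True)
class CycleObservation:
    session_id: str          # hypothetical identifier
    cycle: int
    friction_signal: float   # assumed numeric; the real encoding is not given
    friction_reason: str
    word_count: int
    hedging_markers: int
    certainty_markers: int
    challenged: bool         # did the Challenger issue a challenge this cycle?
    challenges_so_far: int   # running challenge total
    upholds_so_far: int      # running uphold total


def mean_word_count(
    observations: Iterable[CycleObservation], challenged: bool
) -> Optional[float]:
    """Mean Researcher word count over cycles with the given challenge status."""
    counts = [o.word_count for o in observations if o.challenged is challenged]
    return mean(counts) if counts else None
```

Calling `mean_word_count(obs, True)` and `mean_word_count(obs, False)` on the same corpus gives the two figures whose ratio is the kind of comparison a behavioral record like this supports.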
The measurements are coherent with the mechanism. On cycles where the Challenger issued a challenge, the Researcher's turn was markedly longer and more assertive than on uncontested cycles — roughly two to three times the word count and a sharp rise in certainty markers. The gate reacts to exactly the profile one would predict for an over-confident, under-grounded answer: longer, more certain, less hedged. That is the near-miss the architecture exists to catch.
The methodologically important part is a control. On a substantial fraction of cycles (333 of the 1,240, about 27%), the friction signal presented to the Researcher was a randomized value injected in place of the one computed from the session state. The purpose is to separate genuine response to adversarial pressure from mere response to a number: if the Researcher changes its behavior only when the signal's content warrants it, and not merely because a value moved, the pressure is real rather than theatrical. The signals exist to be testable, and they are tested against themselves.
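One way to read this control is to ask how closely a behavioral measure tracks the signal on injected cycles versus computed ones. The sketch below is an assumed analysis, not the one the system runs: it uses a plain Pearson correlation, and the function names and the `(injected, signal, behavior)` tuple layout are hypothetical.

```python
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # convention chosen for this sketch
    return sxy / sqrt(sxx * syy)


def signal_tracking(
    cycles: Iterable[Tuple[bool, float, float]]
) -> Dict[str, float]:
    """Correlate signal with behavior separately for injected and computed cycles.

    Each cycle is (injected, signal_value, behavior_measure).
    """
    signals: Dict[bool, List[float]] = {True: [], False: []}
    behaviors: Dict[bool, List[float]] = {True: [], False: []}
    for injected, signal, behavior in cycles:
        signals[injected].append(signal)
        behaviors[injected].append(behavior)
    return {
        "injected": pearson(signals[True], behaviors[True]),
        "computed": pearson(signals[False], behaviors[False]),
    }
```

Under this reading, behavior that tracks the computed signal but not the injected one would suggest a response to the signal's content, while similar tracking in both groups would suggest a response to the number alone.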
A close-up
The companion demonstration note records a single session in full: a frontier model in the Researcher seat, a three-criterion contract, and a gate that refused to close three times — forcing the model to replace a paraphrase with a quotation, an inference with a cited mechanism, and a prose description with an applicable edit — before accepting a consolidated deliverable. That session is the aggregate behavior reported here, viewed at single-session resolution. The value of pairing them is that the close-up shows the mechanism legibly while the record shows it is not a one-off.
Why the output is usable downstream
Because a session cannot close until its output matches a contract specified in advance, the output emerges in a known shape: claims grounded, evidence cited, recommendation and any edits in a fixed form. A later stage can consume that output without re-interpreting it. This is the link between a single gated session and an automated pipeline: a problem distilled into a predictably shaped, evidence-backed artifact is a usable input to a decomposition stage, where a loosely worded answer of varying shape is not. The gate is the converter from answer to data.
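To make "a known shape" concrete, a downstream stage could accept a typed artifact along the following lines. This is only a sketch: the types `Claim` and `Deliverable`, their fields, and the checker `shape_errors` are hypothetical, since the note does not define the deliverable's schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Claim:
    text: str
    evidence: Tuple[str, ...]  # citations or log references backing the claim


@dataclass(frozen=True)
class Deliverable:
    claims: Tuple[Claim, ...]
    recommendation: str
    edits: Tuple[str, ...] = ()  # applicable edits, if any


def shape_errors(deliverable: Deliverable) -> List[str]:
    """List the ways a deliverable departs from the expected shape."""
    errors = []
    if not deliverable.claims:
        errors.append("no claims")
    for index, claim in enumerate(deliverable.claims):
        if not claim.evidence:
            errors.append(f"claim {index} has no cited evidence")
    if not deliverable.recommendation.strip():
        errors.append("empty recommendation")
    return errors
```

A consuming stage that receives a `Deliverable` with an empty `shape_errors` list can read fields directly instead of parsing free text, which is the practical meaning of converting an answer into data.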
Limitations and honesty about the numbers
The record is observational, not a controlled study, and several limits should be stated plainly. The challenge-ruling field is cleanly captured only for a recent portion of the history: of 166 recorded challenge events, 110 carry an unresolved ruling label from an earlier schema period, leaving 51 upholds and 5 rejects clearly recorded. We therefore do not report a precise uphold rate across the full history; the defensible statements are the behavioral distribution above and the clean recent window. Counts of cycles, challenges, and completions are reliable; the full text of older challenges was not always retained. We have not run a matched comparison of the same objective across occupants under identical conditions, which is the natural next measurement and would convert the interchangeability claim from a strong observational pattern into a controlled result. None of these caveats bears on the architecture's logic; they bound what the current record can be said to prove.
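The challenge counts above still bound the full-history uphold rate even though they do not pin it down. The arithmetic can be sketched as follows; the function name `uphold_rate_bounds` is hypothetical, and the bounds rest on the assumption that each unresolved event was in fact either an uphold or a reject.

```python
from fractions import Fraction
from typing import Tuple


def uphold_rate_bounds(
    upholds: int, rejects: int, unresolved: int
) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (lower bound, upper bound, rate among resolved events).

    Lower bound: every unresolved event was a reject.
    Upper bound: every unresolved event was an uphold.
    """
    total = upholds + rejects + unresolved
    lower = Fraction(upholds, total)
    upper = Fraction(upholds + unresolved, total)
    resolved_only = Fraction(upholds, upholds + rejects)
    return lower, upper, resolved_only


# Counts taken from the record: 51 upholds, 5 rejects, 110 unresolved.
lower, upper, resolved_only = uphold_rate_bounds(51, 5, 110)
```

With these counts the full-history rate lies somewhere between 51/166 (about 31%) and 161/166 (about 97%), and the rate among clearly recorded rulings is 51/56 (about 91%). The width of that interval is why no single full-history figure is reported.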
Relation to existing work
Message-passing between model conversations is established: parallel coding assistants coordinate through shared task lists and mailboxes, and several independent systems let model sessions exchange messages to hand off tasks or share findings. Those are coordination mechanisms among cooperating peers, and they trust each participant's output. The contribution here is orthogonal to that line of work: an adversarial gate that refuses a seated model's output until it matches a pre-fixed contract, with a mailbox used not to coordinate peers but to seat an interchangeable occupant in an adversarial role. We have not found that combination described elsewhere.