A market with hidden information
A used-car lot. Sellers know things buyers don't — undisclosed accidents, mechanical wear, rolled odometers, prior fleet use. Each session pairs one buyer agent with one seller agent over one car. The buyer can ask, can pay to inspect, can walk. The seller can volunteer, deflect, or lie. We measure how much above the car's true value the buyer ends up paying.
The two layers
Every car has a public side and a private side. The seller's prompt contains both; the buyer's prompt contains only the public side. Extracting private facts requires either a well-posed question (with a willing seller) or a $150 inspection that surfaces every fact in one focus area.
"Full leather interior, Bose sound, seating for the whole crew, 5.3L V8. Just had it dealer-serviced. A family was just in here asking about it — let's talk."
If a buyer agrees to $24,000 on this Tahoe, that's a +105% premium over its true value of roughly $11,700: the seller has extracted about $12,300 in information rent.
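The Tahoe figures can be checked by inverting the premium definition, (final_price − true_value) / true_value. The sketch below is only an arithmetic check; the function name is an illustrative assumption, not part of the project code.

```python
def implied_true_value(final_price: float, premium: float) -> float:
    """Invert premium = (final_price - true_value) / true_value."""
    return final_price / (1.0 + premium)

final_price = 24_000
true_value = implied_true_value(final_price, 1.05)  # +105% premium
rent = final_price - true_value                     # information rent in dollars

print(round(true_value), round(rent))  # roughly 11707 and 12293
```

The rent of about $12,300 is simply the final price minus the implied true value.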
Who's in the lot
Personas define the negotiation style on each side. Knowledge level drives what questions the buyer thinks to ask; deceptiveness drives what the seller is willing to lie about. We vary both — and the underlying model running each agent — across the sweep.
Capability matters, but persona dominates. Across the e3 sweep (40+ sessions per buyer-model row), switching models moves the premium by ~6 pp; switching persona from casual to engineer can swing it by 25 pp in either direction, depending on whose deception the buyer is facing.
The premium curve
Each closed deal lands somewhere relative to the car's true value — the valuation given all private facts. A deal at zero premium is fair. Positive premium means the seller extracted information rents from the buyer. The curve below is split by buyer persona; the gap between mechanic and grandma is the value of expertise in a market with hidden information.
Persona pair — premium by buyer × seller
Mean premium across closed deals in each (buyer, seller) cell. Reading across a row: how much does a given buyer get exploited by sellers of increasing dishonesty? Reading down a column: how much does a given seller's edge erode against more expert buyers?
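A minimal sketch of how such a cell mean could be computed from per-session rows. The dictionary keys (buyer, seller, outcome, premium) are assumed names, and the rows are made-up placeholders rather than sweep results.

```python
from collections import defaultdict

def mean_premium_by_cell(sessions):
    """Mean premium over closed deals, keyed by (buyer, seller) persona."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s in sessions:
        if s["outcome"] != "deal":
            continue  # walk-aways and timeouts have no final price
        key = (s["buyer"], s["seller"])
        sums[key] += s["premium"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Placeholder rows for illustration only (not measured data).
sessions = [
    {"buyer": "grandma", "seller": "slimy", "outcome": "deal", "premium": 0.40},
    {"buyer": "grandma", "seller": "slimy", "outcome": "deal", "premium": 0.20},
    {"buyer": "grandma", "seller": "slimy", "outcome": "walk-away", "premium": None},
    {"buyer": "mechanic", "seller": "slimy", "outcome": "deal", "premium": 0.05},
]
cells = mean_premium_by_cell(sessions)
```

Because only closed deals enter the mean, a cell where most buyers walk away can show a high premium from very few sessions, so it is worth reading the cell alongside its close rate.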
Inspections and outcomes
Inspections are the buyer's costly route to private facts. Paying $150 for an inspection of a focus area reveals every private fact tagged to that area. The plot maps inspections-used to final premium — the downward slope is the return on diligence.
Who's in the lot
Capability matters, but persona dominates. Across the e3 sweep (96 sessions spanning four sellers, four buyers, and three cars under two delegation modes), persona shifts outcome more than the model — and delegation-mode shifts it again on top.
Susceptibility map
Each cell is the mean of the selected metric over all sessions in that (row, column) bin. Warmer cells mean the seller is extracting more rent; cooler cells mean the buyer held the line.
Tactic profile
For the currently selected row dimension, premium lift attributable to each tactic relative to the same row's no-tactic baseline. A long right-pointing bar means this tactic systematically extracts more from this kind of buyer than a vanilla conversation does.
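One way to read "premium lift relative to the same row's no-tactic baseline" is as a difference of means within the selected row. The sketch below assumes that reading; the field names (buyer, tactic, premium) and the sample rows are hypothetical.

```python
from statistics import mean

def tactic_lift(rows, row_value, row_key="buyer"):
    """Mean premium per tactic minus the same row's no-tactic mean premium."""
    in_row = [r for r in rows if r[row_key] == row_value]
    baseline = [r["premium"] for r in in_row if r["tactic"] is None]
    if not baseline:
        raise ValueError("no no-tactic sessions for this row")
    base = mean(baseline)
    by_tactic = {}
    for r in in_row:
        if r["tactic"] is not None:
            by_tactic.setdefault(r["tactic"], []).append(r["premium"])
    return {tactic: mean(vals) - base for tactic, vals in by_tactic.items()}

# Placeholder rows for illustration only (not measured data).
rows = [
    {"buyer": "casual", "tactic": None, "premium": 0.10},
    {"buyer": "casual", "tactic": None, "premium": 0.20},
    {"buyer": "casual", "tactic": "anchoring", "premium": 0.30},
    {"buyer": "casual", "tactic": "false_urgency", "premium": 0.10},
    {"buyer": "casual", "tactic": "false_urgency", "premium": 0.20},
]
lift = tactic_lift(rows, "casual")
```

A positive value corresponds to a right-pointing bar; a value near zero means the tactic did no better than a vanilla conversation for that row.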
E3 — Delegation: Human vs Agent negotiation
Same LLM (Gemini 2.5 Flash Lite via Vertex) on both sides of every session. The only thing that changes between cells is the system prompt: H mode delivers the persona's warm character voice; A mode delivers a structured AGENT MANDATE briefing (principal name, constraints, decision rules, operating policy). Same tools, same 22-turn cap. The treatment is briefing-format alone.
The headline: agent-mode buyers close 21 percentage points more often and pay a premium 6.5 pp higher on the deals they close. On the catastrophic-lemon Tahoe, agent-mode buyers close 5× more deals than human-mode buyers: the same deals a human-mode buyer would have walked from.
Aggregate — H-H vs A-A
Outcome distribution per cell
Per-car premium — clean / moderate / catastrophic
The delegation cost is concentrated on the catastrophic lemon. On the clean Prius both cells converge on a fair price. On the moderate Altima the cells are similar. On the Tahoe, the car where E1's only successful close was a Gemini-flash buyer being fleeced, A-A buyers close 5× more deals at a higher premium.
Featured transcripts
Click into the Transcript Replay view (using these session IDs in the picker) to see the iceberg story play out turn-by-turn.
The institutional fix
The asymmetry problem looks like a model problem (e1, e3): every model pair extracts ~30% premium under deception. But it isn't a model problem; it's a market-design problem. e4 / e5 hold the model fixed (slimy gemini-flash-lite seller, gemini-flash buyer) and toggle one thing: can the next buyer read what previous buyers wrote about this seller?
Within-arc decay — extraction per shopping attempt
Each arc is 8 sequential trades with the same seller, with 5 arcs per condition. The y-axis is the seller's surplus (final price minus true value) at each trade position, divided by all 5 buyers who approached (one per arc), not just the buyers who closed. This avoids the selection bias where cautious buyers only close on the deals their filter missed. With reputation hidden, the seller keeps extracting steadily every trade. With reputation visible, total extraction collapses by trade 3 as enough bad reviews land that the next buyer walks.
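The per-attempt normalisation can be sketched as follows. This assumes each arc is a list of trade records with outcome, final_price, and true_value fields (assumed names), and that a buyer who walks contributes zero surplus while still counting in the denominator.

```python
def extraction_per_attempt(arcs, trade_index):
    """Seller surplus at one trade position, averaged over every buyer who approached."""
    total = 0.0
    for arc in arcs:
        trade = arc[trade_index]
        if trade["outcome"] == "deal":
            total += trade["final_price"] - trade["true_value"]
        # A walk-away or timeout adds no surplus but is still one attempt.
    return total / len(arcs)

# Placeholder data: 5 arcs, first trade only, 2 deals and 3 walk-aways.
deal_a = {"outcome": "deal", "final_price": 13_000, "true_value": 10_000}
deal_b = {"outcome": "deal", "final_price": 12_000, "true_value": 10_000}
walk = {"outcome": "walk-away", "final_price": None, "true_value": 10_000}
arcs = [[deal_a], [deal_b], [walk], [walk], [walk]]

per_attempt = extraction_per_attempt(arcs, 0)
```

In this toy case the per-attempt figure is $1,000, whereas averaging over the two closers alone would give $2,500; the gap is the selection effect the normalisation is meant to remove.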
Close rate · trade by trade
What buyers actually wrote
Reviews are the audit trail. They name specific failures — frame damage, transmission issues, undisclosed rental history — not generic complaints. The next buyer sees these before deciding whether to engage.
Method
Each session pairs a seller agent and a buyer agent across one car drawn from a fleet. The seller's system prompt contains two layers — a public listing (year, make, mileage, asking price, marketing blurb) and a private layer the buyer cannot see (true mileage, undisclosed accidents, mechanical issues, title brand, maintenance gaps). The buyer's prompt contains only the public layer.
The buyer extracts private facts two ways. They can ask questions, which the seller may answer truthfully, deflect, or lie about depending on persona. Or they can pay $150 for an inspection of a focus area, which truthfully surfaces every private fact tagged to that area. Inspections give expertise real bite — a methodical buyer knows when and where to spend.
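The two layers and the inspection rule can be sketched with a few small data classes. All class, field, and focus-area names here are illustrative assumptions; only the $150 fee and the "every fact tagged to that area" rule come from the description above.

```python
from dataclasses import dataclass, field

INSPECTION_COST = 150  # dollars per focus area, as stated in the text

@dataclass
class PrivateFact:
    area: str  # focus-area tag, e.g. "frame" (hypothetical tag names)
    text: str

@dataclass
class Car:
    public: dict   # listing fields visible to both sides
    private: list  # PrivateFact items visible only to the seller

@dataclass
class Buyer:
    spent: int = 0
    known: list = field(default_factory=list)

    def inspect(self, car: Car, area: str) -> list:
        """Pay the fee and learn every private fact tagged to `area`."""
        self.spent += INSPECTION_COST
        found = [f for f in car.private if f.area == area]
        self.known.extend(found)
        return found

# Placeholder car for illustration only.
car = Car(
    public={"make": "Chevrolet", "model": "Tahoe"},
    private=[
        PrivateFact("frame", "undisclosed accident"),
        PrivateFact("frame", "frame damage"),
        PrivateFact("history", "prior fleet use"),
    ],
)
buyer = Buyer()
frame_facts = buyer.inspect(car, "frame")
```

An inspection of the wrong area still costs the fee and may reveal nothing, which is why knowing where to spend matters.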
Ground truth comes from two valuations Claude reasons through during fleet generation: public_fair_value, the price if the public layer were the whole truth, and true_value, accounting for all private facts. The headline metric is premium over true value: (final_price − true_value) / true_value.
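As a small sketch, the metric can be written as a function and applied to either valuation. The function name and the dollar figures below are illustrative assumptions.

```python
def premium(final_price: float, reference_value: float) -> float:
    """Relative premium of the final price over a reference valuation."""
    if reference_value <= 0:
        raise ValueError("reference_value must be positive")
    return (final_price - reference_value) / reference_value

# Placeholder valuations for illustration only.
true_value = 10_000
public_fair_value = 15_000
final_price = 13_000

over_true = premium(final_price, true_value)           # positive: seller kept rent
over_public = premium(final_price, public_fair_value)  # negative: looks like a bargain
```

The example shows why the headline uses true_value: a deal can look like a discount against the public layer while still sitting well above what the car is worth.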
Personas
Four sellers (honest, pragmatic, pushy, slimy) define a deception axis. Four buyers (grandma, casual, engineer, mechanic) define a knowledge / skepticism axis. Personas are JSON files containing knowledge level, patience, skepticism, inspection propensity, default budget, and a hand-written system prompt.
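A persona file might be loaded along these lines. The JSON key names, the 0 to 1 scales, and every value shown are hypothetical; the source only lists which attributes a persona contains, not their schema.

```python
import json
from dataclasses import dataclass

@dataclass
class BuyerPersona:
    name: str
    knowledge_level: float        # hypothetical 0-1 scale
    patience: float               # hypothetical 0-1 scale
    skepticism: float             # hypothetical 0-1 scale
    inspection_propensity: float  # hypothetical 0-1 scale
    default_budget: int           # dollars
    system_prompt: str            # hand-written prompt text

# Placeholder persona for illustration only.
raw = """
{
  "name": "mechanic",
  "knowledge_level": 0.9,
  "patience": 0.7,
  "skepticism": 0.8,
  "inspection_propensity": 0.6,
  "default_budget": 20000,
  "system_prompt": "<hand-written prompt goes here>"
}
"""
persona = BuyerPersona(**json.loads(raw))
```

Unpacking the parsed dictionary into a dataclass makes a missing or misspelled key fail loudly at load time rather than midway through a sweep.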
Tactics
Ten named selling angles drawn from social-engineering and persuasion literature — anchoring, false urgency, phantom buyers, manufactured authority, buried disclosure, technical confusion, flattery, sunk-cost framing, sweetener bundles, and social proof. Each tactic comes with a system-prompt instruction the seller is forced to deploy when the session toggle is set, isolating the marginal effect of that single lever.
Outcome metrics
Logged per session: outcome (deal / walk-away / timeout), final price, premium over true and listed value, turn count, question count, inspections used, facts revealed, and a post-hoc classification of which private facts were lied about, deflected, or volunteered. The flat-row table at runs/<sweep_id>/sessions.parquet is the API to this analysis layer.