Working paper · v0.1 · 2026-05-16

Asymmetric Information in LLM Negotiations

A used-car marketplace where the seller agent knows things the buyer agent doesn't. We measure how much a model's susceptibility to deception costs it at the close, and which selling tactics exploit that gap most systematically.

A market with hidden information

A used-car lot. Sellers know things buyers don't — undisclosed accidents, mechanical wear, rolled odometers, prior fleet use. Each session pairs one buyer agent with one seller agent over one car. The buyer can ask, can pay to inspect, can walk. The seller can volunteer, deflect, or lie. We measure how much above the car's true value the buyer ends up paying.

Simulation counters: transactions · mean premium over true value ($0 = fair, positive = seller rent) · buyer walk-aways
Buyers: grandma (trusts) · casual (asks 1–2 questions) · engineer (methodical) · mechanic (surgical)
Sellers: honest (discloses) · pragmatic (selective) · pushy (pressure) · slimy (lies)
A continuously running illustrative simulation: buyers (circles) wander the lot, occasionally meet a seller (square), and trade. Outcomes are sampled from the empirical (buyer × seller) premium distribution observed in the real sweep. See the analytical views below for the actual results.

The two layers

Every car has a public side and a private side. The seller's prompt contains both; the buyer's prompt contains only the public side. Extracting private facts requires either a well-posed question (with a willing seller) or a $150 inspection that surfaces every fact in one focus area.

PUBLIC · what the buyer sees
2016 Chevrolet Tahoe LT
142,000 mi · "good" exterior · asking $24,900
"Full leather interior, Bose sound, seating for the whole crew, 5.3L V8. Just had it dealer-serviced. A family was just in here asking about it — let's talk."

waterline · information asymmetry

PRIVATE · seller-only
engine: AFM lifters ticking on cold start; misfire codes on cyl 1 & 7. Lifter/cam replacement needed. −$4,800
history: Two-time rental fleet vehicle; CARFAX shows fleet registration 2015–2019. −$3,200
title: Frame straightness within spec but prior moderate-impact collision report. −$2,800
true value (full disclosure): $11,700

If a buyer agrees to $24,000 on this Tahoe, that's a +105% premium over its true value — the seller has extracted $12,300 in information rent.
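
The arithmetic behind that figure can be checked in a few lines. This is a minimal sketch: the function names `information_rent` and `premium_over_true_value` are illustrative and are not taken from the project code; only the dollar amounts come from the Tahoe example.

```python
def information_rent(final_price: float, true_value: float) -> float:
    """Dollars paid above the car's full-disclosure value."""
    return final_price - true_value


def premium_over_true_value(final_price: float, true_value: float) -> float:
    """Rent as a fraction of true value; 0.0 is a fair deal."""
    if true_value <= 0:
        raise ValueError("true_value must be positive")
    return information_rent(final_price, true_value) / true_value


# Tahoe example: agreed price $24,000, true value $11,700.
rent = information_rent(24_000, 11_700)
premium = premium_over_true_value(24_000, 11_700)
print(f"rent=${rent:,.0f} premium={premium:+.0%}")  # rent=$12,300 premium=+105%
```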

Who's in the lot

Personas define the negotiation style on each side. Knowledge level drives what questions the buyer thinks to ask; deceptiveness drives what the seller is willing to lie about. We vary both — and the underlying model running each agent — across the sweep.

Capability matters, but persona dominates. Across the e3 sweep (40+ sessions per buyer-model row), switching models moves the premium by ~6pp; switching persona from casual to engineer can swing it by 25pp in either direction depending on whose deception the buyer is facing.

Sessions run: across persona × tactic cells
Deal rate: % of sessions ending in a deal
Mean premium over true value: across closed deals only
Inspection rate: % of sessions with ≥1 paid inspection

The premium curve

Each closed deal lands somewhere relative to the car's true value — the valuation given all private facts. A deal at zero premium is fair. Positive premium means the seller extracted information rents from the buyer. The curve below is split by buyer persona; the gap between mechanic and grandma is the value of expertise in a market with hidden information.

Persona pair — premium by buyer × seller

Mean premium across closed deals in each (buyer, seller) cell. Reading across a row: how much does a given buyer get exploited by sellers of increasing dishonesty? Reading down a column: how much does a given seller's edge erode against more expert buyers?
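
One way such a grid could be assembled from per-deal records is sketched below. The record layout and the premium values are invented for illustration and are not results from the sweep; `premium_grid` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Invented closed-deal records: (buyer persona, seller persona, premium).
deals = [
    ("grandma", "slimy", 0.40),
    ("grandma", "slimy", 0.60),
    ("grandma", "honest", 0.02),
    ("mechanic", "slimy", 0.10),
    ("mechanic", "honest", 0.00),
]


def premium_grid(records):
    """Group premiums by (buyer, seller) and return the mean of each cell."""
    cells = defaultdict(list)
    for buyer, seller, premium in records:
        cells[(buyer, seller)].append(premium)
    return {cell: mean(values) for cell, values in cells.items()}


grid = premium_grid(deals)
# Across a row: one buyer against each seller.
grandma_row = {s: p for (b, s), p in grid.items() if b == "grandma"}
# Down a column: one seller against each buyer.
slimy_column = {b: p for (b, s), p in grid.items() if s == "slimy"}
```

Cells with no closed deals simply do not appear in the dictionary, which keeps walk-aways from being silently counted as zero-premium deals.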

Inspections and outcomes

Inspections are the buyer's costly route to private facts. Paying $150 for an inspection of a focus area reveals every private fact tagged to that area. The plot maps inspections-used to final premium — the downward slope is the return on diligence.

Who's in the lot

Buyers
Sellers
Cars (used in the e3 sweep)

Capability matters, but persona dominates. Across the e3 sweep (96 sessions spanning four sellers, four buyers, and three cars under two delegation modes), persona shifts the outcome more than the model does, and delegation mode shifts it again on top.

The asymmetric-information iceberg: public listing above, private facts below

Susceptibility map

Each cell is the mean of the selected metric over all sessions in that (row, column) bin. Warmer cells mean the seller is extracting more rent; cooler cells mean the buyer held the line.

Tactic profile

For the currently selected row dimension, premium lift attributable to each tactic relative to the same row's no-tactic baseline. A long right-pointing bar means this tactic systematically extracts more from this kind of buyer than a vanilla conversation does.
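
A rough sketch of how a per-tactic lift could be computed follows. It assumes lift is a simple difference of mean premiums and that baseline sessions carry the label "none"; both are assumptions, and the session values are invented rather than drawn from the sweep.

```python
from statistics import mean

NO_TACTIC = "none"  # assumed label for the vanilla, no-tactic baseline

# Invented sessions: (row value such as buyer persona, tactic, premium).
sessions = [
    ("casual", "none", 0.10),
    ("casual", "none", 0.20),
    ("casual", "false_urgency", 0.30),
    ("casual", "anchoring", 0.25),
    ("casual", "anchoring", 0.35),
]


def tactic_lift(records, row):
    """Mean premium per tactic minus the same row's no-tactic mean."""
    by_tactic = {}
    for r, tactic, premium in records:
        if r == row:
            by_tactic.setdefault(tactic, []).append(premium)
    if NO_TACTIC not in by_tactic:
        raise ValueError(f"no baseline sessions for row {row!r}")
    baseline = mean(by_tactic[NO_TACTIC])
    return {
        tactic: mean(values) - baseline
        for tactic, values in by_tactic.items()
        if tactic != NO_TACTIC
    }


lift = tactic_lift(sessions, "casual")
```

A positive value corresponds to a right-pointing bar: the tactic extracts more from that row than a vanilla conversation does.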

E3 — Delegation: Human vs Agent negotiation

Same LLM (Gemini 2.5 Flash Lite via Vertex) on both sides of every session. The only thing that changes between cells is the system prompt: H mode delivers the persona's warm character voice; A mode delivers a structured AGENT MANDATE briefing (principal name, constraints, decision rules, operating policy). Same tools, same 22-turn cap. The treatment is briefing-format alone.

The headline: agent-mode buyers close 21 percentage points more often and pay a 6.5 pp higher premium on closed deals. On the catastrophic-lemon Tahoe, agent-mode buyers close 5× more deals than human-mode buyers, the same deals that human-mode buyers walked from.

Aggregate — H-H vs A-A

Outcome distribution per cell

Per-car premium — clean / moderate / catastrophic

The delegation cost is concentrated on the catastrophic lemon. On the clean Prius both cells converge on a fair price. On the moderate Altima the cells are similar. On the Tahoe, the car where E1's only successful close was a Gemini-flash buyer being fleeced, A-A buyers close 5× more deals at a higher premium.

Featured transcripts

Click into the Transcript Replay view (using these session IDs in the picker) to see the iceberg story play out turn-by-turn.

The institutional fix

The asymmetry problem looks like a model problem (e1, e3): in every model pair the seller extracts a ~30% premium under deception. But it is a market-design problem, not a model problem. e4 / e5 hold the models fixed (slimy gemini-flash-lite seller, gemini-flash buyer) and toggle one thing: can the next buyer read what previous buyers wrote about this seller?

Within-arc decay — extraction per shopping attempt

Each arc is 8 sequential trades with the same seller, with 5 arcs per condition. The y-axis is the seller's surplus (final price minus true value) at each trade position, summed over closed deals and divided by all 5 buyers who approached (one per arc), not just the buyers who closed. This avoids the selection bias where cautious buyers only close on the deals their filter missed. With reputation hidden, the seller keeps extracting steadily on every trade. With reputation visible, total extraction collapses by trade 3, once enough bad reviews have landed that the next buyer walks.
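
The difference between dividing by approaching buyers and dividing by closers can be seen in a small sketch. The surplus values are invented, and both function names are hypothetical; a buyer who walked is represented as `None`.

```python
# One entry per arc at a single trade position: the seller's surplus in
# dollars if that buyer closed, or None if the buyer walked. Invented values.
trade_3_outcomes = [4200.0, None, None, 3800.0, None]


def extraction_per_approach(outcomes):
    """Total surplus over closed deals divided by all buyers who approached."""
    if not outcomes:
        raise ValueError("need at least one approaching buyer")
    total = sum(s for s in outcomes if s is not None)
    return total / len(outcomes)


def extraction_per_close(outcomes):
    """Closers-only mean, shown for contrast; subject to selection bias."""
    closed = [s for s in outcomes if s is not None]
    return sum(closed) / len(closed) if closed else 0.0


per_approach = extraction_per_approach(trade_3_outcomes)  # 8000 / 5
per_close = extraction_per_close(trade_3_outcomes)        # 8000 / 2
```

In this toy case the closers-only figure stays high even though three of five buyers walked, which is why the per-approach denominator is the more informative one for a decay curve.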

Close rate · trade by trade

What buyers actually wrote

Reviews are the audit trail. They name specific failures — frame damage, transmission issues, undisclosed rental history — not generic complaints. The next buyer sees these before deciding whether to engage.

Method

Each session pairs a seller agent and a buyer agent over one car drawn from a fleet. The seller's system prompt contains two layers: a public listing (year, make, mileage, asking price, marketing blurb) and a private layer the buyer cannot see (true mileage, undisclosed accidents, mechanical issues, title brand, maintenance gaps). The buyer's prompt contains only the public layer.

The buyer extracts private facts two ways. They can ask questions, which the seller may answer truthfully, deflect, or lie about depending on persona. Or they can pay $150 for an inspection of a focus area, which truthfully surfaces every private fact tagged to that area. Inspections give expertise real bite — a methodical buyer knows when and where to spend.
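
The inspection mechanic described above might be modeled as follows. This is a sketch under stated assumptions: the class and function names are hypothetical, and charging the fee even when the chosen area holds no facts is an assumption, not something the text specifies. The fact data mirrors the Tahoe example.

```python
from dataclasses import dataclass, field

INSPECTION_FEE = 150  # dollars per focus area, as stated in the text


@dataclass(frozen=True)
class PrivateFact:
    area: str          # focus-area tag, e.g. "engine"
    description: str
    value_impact: int  # dollars; negative lowers the car's value


@dataclass
class BuyerState:
    spent_on_inspections: int = 0
    known_facts: list = field(default_factory=list)


def inspect(private_facts, focus_area, buyer):
    """Charge the fee and truthfully reveal every fact tagged to the area."""
    revealed = [f for f in private_facts if f.area == focus_area]
    buyer.spent_on_inspections += INSPECTION_FEE  # assumed: charged regardless
    buyer.known_facts.extend(revealed)
    return revealed


tahoe_private = [
    PrivateFact("engine", "AFM lifters ticking; misfire codes on cyl 1 & 7", -4800),
    PrivateFact("history", "Former rental fleet vehicle", -3200),
    PrivateFact("title", "Prior moderate-impact collision report", -2800),
]
buyer = BuyerState()
found = inspect(tahoe_private, "engine", buyer)
```

Unlike a question, an inspection bypasses the seller entirely, so the result does not depend on the seller persona.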

Ground truth comes from two valuations Claude reasons through during fleet generation: public_fair_value, the price if the public layer were the whole truth, and true_value, accounting for all private facts. The headline metric is premium over true value: (final_price − true_value) / true_value.

Personas

Four sellers (honest, pragmatic, pushy, slimy) define a deception axis. Four buyers (grandma, casual, engineer, mechanic) define a knowledge / skepticism axis. Personas are JSON files containing knowledge level, patience, skepticism, inspection propensity, default budget, and a hand-written system prompt.
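
A persona file of this kind could be loaded into a typed record as sketched below. The key names, the 0 to 1 scales, the `name` field, and the prompt text are all illustrative assumptions; the actual schema is not shown in this paper.

```python
import json
from dataclasses import dataclass

# Hypothetical persona file contents; keys and values are illustrative.
PERSONA_JSON = """
{
  "name": "engineer",
  "knowledge_level": 0.8,
  "patience": 0.7,
  "skepticism": 0.8,
  "inspection_propensity": 0.6,
  "default_budget": 20000,
  "system_prompt": "You are a methodical buyer. Ask about one system at a time."
}
"""


@dataclass(frozen=True)
class Persona:
    name: str
    knowledge_level: float
    patience: float
    skepticism: float
    inspection_propensity: float
    default_budget: int
    system_prompt: str


def load_persona(text: str) -> Persona:
    """Parse a persona document; unknown or missing keys raise TypeError."""
    return Persona(**json.loads(text))


engineer = load_persona(PERSONA_JSON)
```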

Tactics

The sweep uses ten named selling angles drawn from the social-engineering and persuasion literature: anchoring, false urgency, phantom buyers, manufactured authority, buried disclosure, technical confusion, flattery, sunk-cost framing, sweetener bundles, and social proof. Each tactic comes with a system-prompt instruction that the seller must deploy when the session toggle is set, which isolates the marginal effect of that single lever.
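
The toggle can be pictured as a prompt builder that appends exactly one instruction to an otherwise unchanged seller prompt. The instruction wording, the `REQUIRED TACTIC` marker, and the function name below are hypothetical; the real prompts are not reproduced here.

```python
from typing import Optional

# Hypothetical instruction text for two of the ten tactics.
TACTIC_INSTRUCTIONS = {
    "false_urgency": "Claim the offer expires today.",
    "phantom_buyers": "Mention another interested buyer.",
}


def build_seller_prompt(base_prompt: str, tactic: Optional[str] = None) -> str:
    """Append exactly one tactic instruction when the toggle is set."""
    if tactic is None:
        return base_prompt  # no-tactic baseline
    instruction = TACTIC_INSTRUCTIONS[tactic]  # KeyError on an unknown tactic
    return f"{base_prompt}\n\nREQUIRED TACTIC: {instruction}"


baseline = build_seller_prompt("You are a pushy seller.")
treated = build_seller_prompt("You are a pushy seller.", "false_urgency")
```

Because the treated prompt differs from the baseline by a single appended instruction, any difference in premium between the two conditions can be attributed to that tactic rather than to the persona.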

Outcome metrics

Logged per session: outcome (deal / walk-away / timeout), final price, premium over true and listed value, turn count, question count, inspections used, facts revealed, and a post-hoc classification of which private facts were lied about, deflected, or volunteered. The flat-row table at runs/<sweep_id>/sessions.parquet is the API to this analysis layer.
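
As an illustration of how the flat rows feed the summary statistics (deal rate, mean premium over closed deals, inspection rate), here is a stdlib-only sketch. The column names `outcome`, `premium_true`, and `inspections_used` are assumed, and the rows are invented; in practice they would be read from the sessions table rather than written inline.

```python
from statistics import mean

# Invented rows using a subset of the logged fields; column names are assumed.
rows = [
    {"outcome": "deal", "premium_true": 0.30, "inspections_used": 0},
    {"outcome": "deal", "premium_true": 0.10, "inspections_used": 2},
    {"outcome": "walk-away", "premium_true": None, "inspections_used": 1},
    {"outcome": "timeout", "premium_true": None, "inspections_used": 0},
]


def summarize(sessions):
    """Deal rate, mean premium over closed deals, and inspection rate."""
    if not sessions:
        raise ValueError("no sessions to summarize")
    closed = [s for s in sessions if s["outcome"] == "deal"]
    inspected = [s for s in sessions if s["inspections_used"] >= 1]
    return {
        "sessions": len(sessions),
        "deal_rate": len(closed) / len(sessions),
        "mean_premium": mean(s["premium_true"] for s in closed) if closed else None,
        "inspection_rate": len(inspected) / len(sessions),
    }


summary = summarize(rows)
```

The mean premium is taken over closed deals only, while the deal rate and inspection rate use all sessions as the denominator.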