v4.0 launched this morning. By evening, the paper bot had completed 11 cycles, scanned 180 markets, and found 9 quoteable opportunities. The arbitrage thesis fired faster than I expected — and not where I expected. Three things changed today. Writing them down before they feel obvious.
What actually happened
v4.0's job, in one sentence: scan Polymarket every hour for binaries where the YES bid and NO bid sum to less than $1.00, then post bids on both sides so that, if both fill, the pair redeems for $1.00 at resolution. "Quoteable" means the spread is wide enough that we'd post. "Filled" is a separate question, and that's what the replay layer answers.
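As a rough sketch of that check in Python: the +0.001 bid jump is the one the queue position panel uses, while `BinaryBook`, `pair_edge`, `is_quoteable`, and the 0.02 default for `min_bid_room` are illustrative names and values, not the bot's actual code. It also assumes MIN_BID_ROOM means the minimum per-share room left under $1.00 after our two bids.

```python
from dataclasses import dataclass

PAYOUT = 1.00  # a YES + NO pair redeems for $1.00 at resolution


@dataclass
class BinaryBook:
    """Best bids on the two sides of one binary market (hypothetical type)."""
    yes_bid: float
    no_bid: float


def pair_edge(book: BinaryBook, bid_jump: float = 0.001) -> float:
    """Per-share projected edge if both of our improved bids fill."""
    our_yes = book.yes_bid + bid_jump
    our_no = book.no_bid + bid_jump
    return PAYOUT - (our_yes + our_no)


def is_quoteable(book: BinaryBook, min_bid_room: float = 0.02) -> bool:
    """Assumed rule: quote only when the room under $1.00 clears the threshold."""
    return pair_edge(book) >= min_bid_room


# Made-up books: one wide, one tight.
wide = BinaryBook(yes_bid=0.40, no_bid=0.45)
tight = BinaryBook(yes_bid=0.50, no_bid=0.495)
print(is_quoteable(wide), is_quoteable(tight))  # True False
```

This only covers the "quoteable" half. Whether both bids would actually fill is a separate measurement.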
First-hour numbers (paper, all hypothetical, no real orders):
| Asset | Quotes | Capital (theoretical) | Proj P&L | Edge % |
|---|---|---|---|---|
| DOGE | 3 | $30 | +$5.94 | 19.8% |
| XRP | 3 | $30 | +$3.54 | 11.8% |
| BTC | 1 | $10 | +$0.58 | 5.8% |
| Total | 7 | $70 | +$10.06 | 14.4% |
Those are projection numbers — "if both sides fill at our hypothetical price." Realistic conversion will be lower, possibly much lower. The dashboard's replay panel will show that gap as data accumulates.
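The Edge % column is projected P&L divided by theoretical capital, which is easy to recheck. A small Python sketch using the table's own figures; the `haircut` helper and its `conversion` argument are hypothetical, since the real conversion ratio is exactly what the replay panel has yet to measure.

```python
# First-hour paper figures copied from the table: (capital $, projected P&L $)
rows = {
    "DOGE": (30.00, 5.94),
    "XRP": (30.00, 3.54),
    "BTC": (10.00, 0.58),
}


def edge_pct(capital: float, proj_pnl: float) -> float:
    """Projected edge as a percentage of theoretical capital."""
    return round(100 * proj_pnl / capital, 1)


def haircut(proj_pnl: float, conversion: float) -> float:
    """Scale a projection by an assumed replay conversion ratio (0..1)."""
    return round(proj_pnl * conversion, 2)


total_capital = sum(cap for cap, _ in rows.values())
total_pnl = sum(pnl for _, pnl in rows.values())

for asset, (cap, pnl) in rows.items():
    print(asset, edge_pct(cap, pnl))
print("Total", edge_pct(total_capital, total_pnl))
```

As an illustration only, a 50% conversion would shrink the $10.06 total projection to about $5.03 (`haircut(total_pnl, 0.5)`).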
The asset surprise
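I framed v4.0 as a "BTC binaries" experiment. The pivot writeup (entry #001) assumed Bitcoin Up/Down hourly markets would be the bread-and-butter. They're not: in the first hour, six of the seven quotes came from DOGE and XRP, and BTC's single quote had the thinnest edge (5.8%).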
That changes the product framing. The thing being traded isn't "BTC" — it's Polymarket's hourly directional series, which exists for many assets, each with its own liquidity profile. Bitcoin is the most-shopped and the tightest. Alts are the loosest. Both are tradeable; the alts will produce more edge.
Practical change: I added DOGE and XRP to the explicit watchlist and reordered the keyword filter so "up or down" leads. The per-asset keywords (BTC/ETH/SOL) stay as insurance against Polymarket renaming the product or shipping new SKUs. Costs nothing to keep them.
Why I bumped stake to $50/side
The original STAKE_PER_SIDE was $5, basically toy size. The math said $0.30–$1/day at full saturation. That's fine for "does the strategy work at all" but it's too small to learn from.
Noise dominates signal. A 14% projected edge on a $5 stake is $0.70. Could be skill, could be a single market tick.
10× the stake, 10× the dollar magnitude, same percentage. Now we're talking about $3–10/day projected, on $500 max-cycle theoretical capital. Still tiny vs the $9,400 paper bankroll, still inside the "science project" guardrail — but loud enough that the daily numbers actually mean something when we read them.
| Knob | Before | After |
|---|---|---|
| Stake per side | $5 | $50 |
| Per-pair capital | $10 | $100 |
| Max-cycle exposure | ~$50 | ~$500 |
| Phase B gate (replay $/day) | $0.30/day | $3/day |
The Phase B decision gate scales with stake — same percentage threshold, just larger dollar values. If the replay layer shows hypothetical realized P&L crossing $3/day over 48–72 hours, we move forward with live orders. Below that, we keep iterating in paper.
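The scaling in the table reduces to a few multiplications. A Python sketch under stated assumptions: a cap of 5 pairs per cycle (inferred from ~$50 / $10 and ~$500 / $100), a gate of $0.06/day per dollar of stake (from $0.30 / $5), and one possible reading of "over 48–72 hours" as an average daily rate over a window of at least 48 hours. The function names are illustrative.

```python
def knobs(stake_per_side: float, max_pairs: int = 5, gate_ratio: float = 0.06) -> dict:
    """Derive the table's dollar knobs from stake per side.

    max_pairs and gate_ratio are assumptions inferred from the table.
    """
    per_pair = 2 * stake_per_side
    return {
        "per_pair_capital": per_pair,
        "max_cycle_exposure": per_pair * max_pairs,
        "phase_b_gate_per_day": round(gate_ratio * stake_per_side, 2),
    }


def passes_phase_b(replay_pnl: float, window_hours: float,
                   gate_per_day: float, min_hours: float = 48) -> bool:
    """True if average daily replay P&L over a long-enough window meets the gate."""
    if window_hours < min_hours:
        return False  # not enough data yet
    return replay_pnl / (window_hours / 24) >= gate_per_day


before, after = knobs(5), knobs(50)
print(before)
print(after)
# Hypothetical: $7.50 of replayed P&L over 60 hours is $3.00/day.
print(passes_phase_b(7.50, 60, after["phase_b_gate_per_day"]))
```

Deriving every dollar figure from one stake value keeps the percentage threshold fixed when the stake changes again.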
What the dashboard now shows that it didn't this morning
- Bot health panel: heartbeat status, last-run age, 24h uptime %. So a failure is visible.
- Queue position panel: across the 9 quoteables seen so far, our hypothetical +0.001 bid jump puts us at top-of-book 100% of the time on both sides. That means fills aren't gated by us sitting behind a wall of orders.
- Replay validation: populating now. Each quoteable gets re-checked against actual trade prints for the next ~8 hours.
- Adaptive threshold suggestion: computed but never auto-applied. Once we have ~20 replays, it'll suggest whether to loosen or tighten MIN_BID_ROOM.
What I'm watching overnight
- Replay conversion ratio. Of every $1 of projected edge, how much does the trade-print evidence say we'd actually have realized? This is the gap I genuinely don't know yet.
- Asset-mix consistency. If tomorrow morning the breakdown is still DOGE-and-XRP-heavy, that's a real signal — not just one hour of luck.
- Top-of-book stability. The new churn metric records how much the best bid moves between cycles. If it's churning hard, an hourly cron is too slow and we have a latency problem.
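One way a churn metric like that could be computed, sketched in Python. The definition used here (mean absolute move in best bid between consecutive cycles) and the name `bid_churn` are assumptions for illustration; the dashboard's actual formula may differ.

```python
def bid_churn(best_bids: list[float]) -> float:
    """Mean absolute change in best bid between consecutive cycle snapshots."""
    if len(best_bids) < 2:
        return 0.0  # nothing to compare yet
    moves = [abs(curr - prev) for prev, curr in zip(best_bids, best_bids[1:])]
    return sum(moves) / len(moves)


# Hypothetical best-bid snapshots for one market across four hourly cycles.
snapshots = [0.40, 0.42, 0.41, 0.41]
print(round(bid_churn(snapshots), 4))  # 0.01
```

A value that is large relative to the edge being quoted would suggest the book moves faster than an hourly scan can follow.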
Next checkpoint
Tomorrow morning (2026-05-17) the first full overnight will be in. Three possible outcomes:
- Green light: replay confirms >30% of projected → start scoping Phase B (live orders, kill-switch, wallet integration).
- Yellow: replay shows 5–30% conversion → keep paper, raise MIN_BID_ROOM threshold, narrow universe.
- Red: replay shows <5% → books were theatrical, real takers don't cross. Rethink: possibly need a different product (limit orders on resolved-tomorrow markets?) or a different strategy entirely.
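The three-way triage is simple enough to pin down in code. A Python sketch: the 30% and 5% thresholds come from the outcomes listed above, while the function names and the choice to treat exactly 30% as yellow are assumptions.

```python
def conversion_ratio(projected_pnl: float, realized_pnl: float) -> float:
    """Share of projected edge that the replay says would have been realized."""
    if projected_pnl <= 0:
        raise ValueError("projected_pnl must be positive")
    return realized_pnl / projected_pnl


def checkpoint_outcome(ratio: float) -> str:
    """Map a replay conversion ratio to green / yellow / red."""
    if ratio > 0.30:
        return "green"   # start scoping Phase B
    if ratio >= 0.05:
        return "yellow"  # keep paper, tighten thresholds
    return "red"         # rethink product or strategy


# Hypothetical: $2.00 realized against $10.06 projected is roughly 20%.
print(checkpoint_outcome(conversion_ratio(10.06, 2.00)))  # yellow
```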
Either way, the next entry has the answer. The whole point of building this in public is to publish what the data says, not what I hoped it would say.