Phase A (single-pass paper scanner) showed $65.98 hypothetical realized P&L over four days on the combined-cost CTF arbitrage thesis. Encouraging, but not enough: a paper run that assumes our quote sat at the front of the queue forever doesn't tell us whether the strategy survives contact with real execution.
Today I shipped Phase B (12 tasks) and Phase B.2 Stage 1 (3 tasks) — 15 dispatched implementations, two stages of review per task, 137 tests, two pre-existing bug fixes, and one big strategic re-frame after some honest research on what other people running this strategy have actually learned. The wallet stays closed. The machinery is now real.
What shipped
Phase B — stateful order machinery on paper
Phase A was a single-pass scan: each cron run wakes up, looks for opportunities with combined cost < 0.97, writes a "would-quote" row, and exits. Stateless. Phase B is the real engine, running on paper:
- OrderClient interface: abstract base class with PaperOrderClient as the current impl and LiveOrderClient as a stub that raises NotImplementedError on every method. The seam is live-compatible: when we eventually wire py-clob-client + a funded wallet, the swap is one constructor change, not an architectural rewrite.
- Inventory state class: JSON file (polydoge_v4_inventory.json) holding cumulative_pnl_all_time, day_pnl, open_positions, paused_until. Atomic save via .tmp + os.replace. Day-roll at UTC midnight resets only the day counter and never touches the cumulative seed.
- Migration: one-time CLI that reads Phase A replay results and seeds the inventory file with cumulative_pnl_all_time = $65.98 across 62 replay records. Write-once protected. We carry forward our paper history into the new state machine; Phase B doesn't start from zero.
- 5 kill switches reusing the v3.x signal stack: whale order >$5K, BTC 24h range >4%, Binance funding spike ±0.005%/5min, day P&L < -$20, top-50 smart-money wallet entering one of our markets. Severity-ordered so only the highest-severity trip pauses the bot, but every trip logs to polydoge_v4_killswitches.jsonl.
- Stateful match loop in main(): on each cron, reconcile fills against the trades API, close any resolved one-sided positions, run pre-resolution cancel on markets resolving within the next 35 minutes, check kill switches, cancel-and-replace any order whose target price drifted >2¢ since posting, place new quotes via place_maker_gtc.
- Discord alerts: per-fill (✅ paired / 🟡 one-sided), kill-switch trips (🚨), and a daily 📊 summary on the first cron of each new UTC day, using the previous day's data captured just before the roll.
- GHA cron bumped to */30 (was hourly): the new state machine needs sub-hour cadence to actually fire cancel-and-replace, pre-resolution cancel, and kill-switch reactivity. About 48 runs/day, well within free tier. Workflow also now commits the three new state files back to the repo so they survive across crons.
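The order-client seam and the inventory state file might look roughly like the following. This is a minimal sketch, not the bot's actual code: the place_maker_gtc signature, the in-memory order dict on the paper client, the load helper, and the day field used to detect the UTC roll are all assumptions made for illustration.

```python
import json
import os
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


class OrderClient(ABC):
    """Seam between the match loop and whatever actually places orders."""

    @abstractmethod
    def place_maker_gtc(self, market_id: str, side: str, price: float, size: float) -> str:
        """Post a resting maker order and return its id (signature is assumed)."""


class PaperOrderClient(OrderClient):
    """Records orders in memory instead of sending them anywhere."""

    def __init__(self) -> None:
        self.orders: dict = {}

    def place_maker_gtc(self, market_id: str, side: str, price: float, size: float) -> str:
        order_id = f"paper-{len(self.orders) + 1}"
        self.orders[order_id] = {"market_id": market_id, "side": side, "price": price, "size": size}
        return order_id


class LiveOrderClient(OrderClient):
    """Stub: refuses to run until a wallet is wired in."""

    def place_maker_gtc(self, market_id: str, side: str, price: float, size: float) -> str:
        raise NotImplementedError("live order placement is not wired up yet")


@dataclass
class Inventory:
    cumulative_pnl_all_time: float = 0.0
    day_pnl: float = 0.0
    day: str = ""  # assumed field: UTC date the day counter belongs to
    open_positions: list = field(default_factory=list)
    paused_until: Optional[str] = None

    def roll_day(self, now: datetime) -> bool:
        """Reset only the day counter when the UTC date changes (expects an aware datetime)."""
        today = now.astimezone(timezone.utc).date().isoformat()
        if today == self.day:
            return False
        self.day = today
        self.day_pnl = 0.0  # cumulative_pnl_all_time is deliberately untouched
        return True

    def save(self, path: str) -> None:
        """Write to a sibling .tmp file, then atomically swap it into place."""
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(asdict(self), fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)

    @classmethod
    def load(cls, path: str) -> "Inventory":
        with open(path, encoding="utf-8") as fh:
            return cls(**json.load(fh))
```

With this shape, going live means constructing LiveOrderClient where PaperOrderClient is constructed today, and a crash mid-save leaves either the old file or the new one on disk, never a half-written one.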
Phase B.2 Stage 1 — universe expansion + paper realism
After Phase B shipped, the research changed how I think about the strategy. Three tasks:
- U1: Multi-category scanner. Was crypto-only (BTC/ETH/SOL/XRP/DOGE Up/Down). Now 5 categories with their own time windows and volume floors: crypto (0.5-8hr), politics (24-168hr), sports (2-72hr), econ (12-336hr), geopolitics (24-720hr). Each market gets tagged with its category and that tag rides all the way through to the ledger, dashboard, and P&L attribution.
- U2: Category-aware rebate model. Phase B had MAKER_REBATE_RATE = 0.001 (0.10%), a guess. Polymarket's published maker-rebate program is 20-50% of taker fees, which given 1.0-1.8% taker fees translates to 0.20-0.90% rebates by category. I was undercounting rebates by 2-5x. The corrected dict: {finance: 0.005, politics/sports/econ/geo: 0.0025, crypto: 0.002}. Pair P&L on a YES@0.45 + NO@0.52 size=10 fill: politics now books $0.30425; crypto books $0.2994; finance $0.3285. Same trade, different real economics depending on what we were quoting on.
- U3: Queue-aware fill simulation. The load-bearing change. analyze() already captures yes_queue_ahead and no_queue_ahead, the existing maker depth at our target price level. Phase B paper ignored those numbers entirely; if a taker SELL crossed our price, we credited a fill. That's not how real maker queues work. The new reconcile_fills drains queue_ahead_at_post from cumulative taker volume BEFORE crediting fills to our order. If 500 shares of qualifying SELL volume arrive but there were 800 shares of maker depth ahead of us, we get nothing.
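The U2 arithmetic and the U3 queue rule can both be sketched in a few lines. Treat this as an illustration under stated assumptions rather than the real reconcile_fills: the function names are hypothetical, the rebate is assumed to apply to the combined notional of both legs, and the flat $0.02 per-pair cost is inferred (presumably gas) because it is what makes the three quoted P&L figures reproduce. Crediting a partial fill once the queue ahead is exhausted is also an assumption.

```python
# Rebate rates per category, taken from the corrected dict in U2.
REBATE_RATES = {
    "finance": 0.005,
    "politics": 0.0025,
    "sports": 0.0025,
    "econ": 0.0025,
    "geopolitics": 0.0025,
    "crypto": 0.002,
}

# Assumption: flat $0.02 cost per completed pair (e.g. gas). Inferred from the
# quoted figures; the post does not state it explicitly.
ASSUMED_COST_PER_PAIR = 0.02


def pair_pnl(yes_price: float, no_price: float, size: float, category: str) -> float:
    """P&L of a filled YES+NO pair that redeems at $1 per share."""
    notional = (yes_price + no_price) * size
    redemption = 1.0 * size
    rebate = REBATE_RATES[category] * notional
    return redemption - notional + rebate - ASSUMED_COST_PER_PAIR


def queue_aware_fill(order_size: float, queue_ahead_at_post: float,
                     cumulative_taker_volume: float) -> float:
    """Shares credited to our order once the maker depth ahead of us is drained."""
    volume_reaching_us = max(0.0, cumulative_taker_volume - queue_ahead_at_post)
    return min(order_size, volume_reaching_us)


if __name__ == "__main__":
    for category in ("politics", "crypto", "finance"):
        print(category, round(pair_pnl(0.45, 0.52, 10, category), 5))
    # 500 shares of taker volume against 800 shares queued ahead: no fill.
    print(queue_aware_fill(10, 800, 500))
```

Under those assumptions the YES@0.45 + NO@0.52 size=10 pair comes out at the $0.30425, $0.2994, and $0.3285 figures quoted in U2, up to floating-point rounding.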
The two findings that reframed targeting
I asked an agent to find every public Polymarket market-making algorithm and tell me what other people running this strategy have actually learned. Two things came back that mattered.
1. We were quoting in the HFT lane
Polymarket introduced dynamic fees on 5-minute and 15-minute crypto markets in 2025
specifically to curb latency arbitrage. Those markets are dominated by sub-10ms Rust
bots running on Polymarket API co-location. Our 30-minute cron cannot win there. The empirical
data confirmed it: 5-min and 15-min crypto markets sat at bid_combined ≈ 0.99 — 1¢
of room — for our entire Phase A window. Our actual wins came from 4-hour alt markets
(XRP best fill at $12.80, DOGE and SOL in the $5-8 range) where bid_combined opened
up to 0.95.
The structural sweet spot for a 30-min cron is politics, sports, econ, and geopolitics: zero taker fees on most, higher rebate tiers, slower price movement, and HFTs ignore them below $10K depth. That's the entire premise behind U1.
2. The strategy is academically validated
Othman & Sandholm (CMU 2010, 2012) prove that pure spread capture in prediction markets is negative-EV against informed flow: you lose to adverse selection every time. But CTF redemption is the structural edge that lets combined-cost arbitrage stay positive-EV even against informed takers. We're not running a generic MM strategy. We're running the one strategy on Polymarket the academic literature says can work without an information edge, because of how the protocol settles.
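Why settlement matters can be shown with a toy payoff table. The sketch below assumes each winning share redeems for $1 and ignores fees, rebates, and gas; the function name is hypothetical and the prices reuse the YES@0.45 + NO@0.52 example from U2.

```python
def settlement_pnl(yes_shares: float, yes_price: float,
                   no_shares: float, no_price: float) -> dict:
    """P&L at resolution for each outcome, ignoring fees, rebates, and gas."""
    cost = yes_shares * yes_price + no_shares * no_price
    return {
        "YES": yes_shares * 1.0 - cost,  # YES shares redeem at $1, NO shares at $0
        "NO": no_shares * 1.0 - cost,    # NO shares redeem at $1, YES shares at $0
    }


# Both legs filled: the result is the same whichever way the market resolves.
paired = settlement_pnl(10, 0.45, 10, 0.52)

# Only the YES leg filled: the position is now a directional bet.
one_sided = settlement_pnl(10, 0.45, 0, 0.52)

if __name__ == "__main__":
    print("paired:", paired)
    print("one-sided:", one_sided)
```

The paired position earns the same amount under either outcome, so an informed taker has nothing to pick off once both legs are filled. The one-sided case is where the exposure lives, which is why fills are tracked as paired versus one-sided.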
Other published numbers: $20M+ in collective Polymarket MM profits in 2024. One profiled bot made $2.2M in two months — though that bot used an ensemble probability model + news signal, a different (directional) strategy than ours. The combined-cost arb cohort is smaller but real.
Bugs caught along the way
I found two real bugs while building Phase B. Both pre-dated this session and would have silently broken parts of the system:
- CoinGecko funding rate was under-reported 100x. The Binance code path in binance.py correctly multiplied fundingRate by 100 to convert decimal to percentage. The CoinGecko fallback path had a comment saying "CoinGecko returns as percentage already", which was wrong; CoinGecko returns a decimal too. The funding-spike kill switch was effectively disabled on any IP that hit the CoinGecko path. Fixed.
- Migration could be poisoned by NaN/Inf. _compute_cumulative_pnl summed every hypothetical_realized_pnl value through float(), which happily accepts "nan" and "inf". A single corrupt row would have made cumulative_pnl_all_time a non-finite value that json.dump would write as a non-standard JSON token. Added a math.isfinite guard that skips the row with a stderr warning.
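The second fix follows a common pattern, sketched here under assumptions: the record shape (a list of dicts) and the handling of missing or unparseable values are guesses, and passing allow_nan=False to json.dumps is an extra safeguard added for illustration, not something the fix is stated to include.

```python
import json
import math
import sys


def compute_cumulative_pnl(records: list) -> float:
    """Sum hypothetical_realized_pnl, skipping rows that are unusable or non-finite."""
    total = 0.0
    for index, record in enumerate(records):
        try:
            value = float(record["hypothetical_realized_pnl"])
        except (KeyError, TypeError, ValueError):
            print(f"warning: row {index} has no usable P&L, skipping", file=sys.stderr)
            continue
        if not math.isfinite(value):
            print(f"warning: row {index} has non-finite P&L {value!r}, skipping", file=sys.stderr)
            continue
        total += value
    return total


rows = [
    {"hypothetical_realized_pnl": "1.25"},
    {"hypothetical_realized_pnl": "nan"},         # float("nan") parses without error
    {"hypothetical_realized_pnl": float("inf")},  # already non-finite
    {"hypothetical_realized_pnl": 0.75},
]
total = compute_cumulative_pnl(rows)

# allow_nan=False makes json.dumps raise ValueError instead of emitting the
# non-standard NaN / Infinity tokens it writes by default.
payload = json.dumps({"cumulative_pnl_all_time": total}, allow_nan=False)
```

Skipping with a warning keeps the migration running on otherwise good data, while the strict dump turns any non-finite value that still slips through into a loud failure instead of a corrupt state file.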
What's now observable
The dashboard JSON at /polydoge/data/v4-paper-stats.json gained four new keys
(all additive — old keys preserved): phase_b (current inventory state),
phase_b_pnl_history (cumulative line with Phase A→B switchover marker),
phase_b_fills_summary (24h + 7d fill counts, rebate sums, gas), and
killswitches_recent (last 10 trips). Schema version bumped to 2.
RUN_LOG rows now carry: pre_resolution_cancelled,
one_sided_resolved_count, day_rebates_earned, day_gas_paid,
fills_count, phase_b_active, posting_paused. The bot is
self-attesting about whether each step actually ran.
The gates
| Date | Check | Decision |
|---|---|---|
| 2026-05-28 | 1-week check | Funnel widening >60% of crons quoteable? Category mix? Realistic P&L direction? |
| 2026-06-04 | Phase C gate | Pot >$200 OR 14d from switchover. Ship live with $50 wallet, or shelve. |
If queue-aware paper collapses the $16/day implied trajectory below $0.50/day after rest-of-May data lands, that's the honest answer that combined-cost arb at our latency tier was phantom all along — and the past five days of work was tuition I'd rather pay before funding a wallet than after.
If it survives, Phase B.2 Stage 2 ships next: UMA dispute screen (real existential risk per the research — UMA oracle was successfully attacked in March 2025), news-triggered quote withdrawal (the single biggest adverse-selection defense pro MMs use, not in any public bot), auto-redeem on pair completion, per-category dashboard breakdown.
The lesson
The deeper research synthesis is what surfaced the rebate undercount (up to 5x), the wrong market universe, the opportunity-density bottleneck (68% of crons found zero quoteables, so we're constrained by opportunity flow, not execution quality), and the realization that paper P&L is mostly machinery validation, not strategy validation. Without that pushback, the wallet might already be funded and bleeding.
Paper continues at */30 through the rest of May. The next decision lands June 4.