Today PolyDoge pivoted from directional prediction to liquidity provision on Polymarket. The directional engine ran for 83 days, made 6,267 predictions, and lost $582. We learned a lot. We're not going to keep losing money to learn the same lesson again.
This is the honest write-up. What didn't work, why, and what we're trying instead.
What the prediction engine actually proved
v3.x was a learning loop on top of an LLM trader. CoinGecko price, Binance funding, Fear & Greed, smart-money leaderboards, order books, whale detection — seven signal providers feeding a scoring prompt. Position-aware gates, Kelly stake sizing, per-signal lift tracking, auto-pruning of dead categories. All of it auto-calibrated from the ledger every cycle.
None of it found edge.
| Window | Resolved | Hit rate | $P&L |
|---|---|---|---|
| All-time (BTC) | 457 | 37% | −$582 |
| Post-Exp-#013 (Mar 24+) | 32 | 41% | −$14 |
| Last 30 days | 8 | 50% | −$5 |
Best-case interpretation: we're flat, not bleeding. Honest interpretation: the engine is a coin flip with extra steps. The 85+ confidence bucket is the only well-calibrated one (88.5% actual vs 90% expected on 96 samples). Everything below 85% is dramatically overconfident: the 55-69% bucket hits 12.7% on 220 samples, far worse than a coin flip.
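For readers who want to run the same calibration check on their own ledger, here is a minimal sketch. It assumes each resolved prediction is a `(confidence_pct, won)` pair; the bucket edges are hypothetical (the 55-69 and 85+ buckets come from the numbers above, the 70-84 bucket is a guess), and none of the names are taken from the PolyDoge codebase.

```python
from collections import defaultdict

# Hypothetical bucket edges (inclusive). Only 55-69 and 85+ are named in the post.
BUCKETS = [(55, 69), (70, 84), (85, 100)]


def calibration_table(predictions):
    """Return [(bucket, sample_count, actual_hit_rate)] for non-empty buckets.

    predictions: iterable of (confidence_pct, won) pairs, where won is a bool.
    Predictions outside every bucket are ignored.
    """
    stats = defaultdict(lambda: [0, 0])  # bucket -> [wins, total]
    for confidence, won in predictions:
        for lo, hi in BUCKETS:
            if lo <= confidence <= hi:
                stats[(lo, hi)][0] += int(won)
                stats[(lo, hi)][1] += 1
                break

    rows = []
    for bucket in BUCKETS:
        wins, total = stats[bucket]
        if total:
            rows.append((bucket, total, wins / total))
    return rows
```

A bucket is well calibrated when its actual hit rate lands near the confidence values it contains; a large gap in either direction is the over- or under-confidence described above.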
The structural diagnosis
We ran one final test before pulling the plug. Across 4,294 post-fix shadow predictions, how many times was the algo at least 80% confident, on a market priced between $0.20 and $0.80, while disagreeing with the market?
0 of 4,294.
The algorithm never confidently disagrees with the market. It only gets confident when the market is already confident. When the market is uncertain (0.20–0.80 range), so is the algo.
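The test is easy to reproduce in spirit. The sketch below is one way to count "confident disagreements"; the record shape and the definition of disagreement (the algo picks the side the market prices below $0.50) are assumptions, since the post does not spell out the exact query.

```python
from dataclasses import dataclass


@dataclass
class ShadowPrediction:
    # Hypothetical record shape, not the actual ledger schema.
    side: str          # "YES" or "NO": the side the algo picked
    confidence: float  # algo confidence in that side, 0-100
    yes_price: float   # market price of YES at prediction time, 0-1


def is_confident_disagreement(p, min_conf=80.0, lo=0.20, hi=0.80):
    """True if the algo is confident on a contested market AND picks the market's underdog."""
    if not (lo <= p.yes_price <= hi):
        return False  # market itself is already near-certain
    if p.confidence < min_conf:
        return False  # algo is not confident
    picked_price = p.yes_price if p.side == "YES" else 1.0 - p.yes_price
    return picked_price < 0.5  # assumption: disagreement = backing the cheaper side


def count_confident_disagreements(predictions):
    return sum(is_confident_disagreement(p) for p in predictions)
```

A directional engine with real edge should produce a nonzero count here. A count of zero means every confident call was one the market had already made.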
The shadow mirage
One thing that confused us for weeks: PolyDoge's shadow bets (predictions that didn't clear the live-trading gates but got tracked anyway) had a 95.4% hit rate with +$93 in hypothetical P&L. It looked like there was alpha being filtered out.
There wasn't. 79% of shadow bets were on near-certain markets ($0.05 or $0.95 pricing) where we agreed with the market — those naturally hit 95% because the market is right 95% of the time. The Kelly stake collapses to micro-pennies on those because there's no edge to size into. We simulated trading the entire shadow universe at a fixed $5 stake:
4,294 trades · 89.8% hit rate · -3.65% ROI · -$784.69 net
High hit rate, negative P&L. Classic picking-up-nickels-in-front-of-a-steamroller. The 10% of losses on "near-certain" markets each cost ~$4.75 (you risk $4.75 to win $0.25), and they wiped out the wins.
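The arithmetic behind that result is worth making explicit. A minimal sketch, assuming a share bought at `price` pays $1.00 on a win and $0 on a loss, with no fees: the break-even hit rate equals the price paid, and the Kelly fraction goes to zero when your estimated win probability matches the market. The function names are illustrative. Note that the $4.75 / $0.25 figures above correspond to buying 5 shares at $0.95, while this sketch uses a fixed dollar stake, so the numbers differ slightly.

```python
def pnl_per_trade(price, stake, won):
    """P&L of spending `stake` dollars on $1-payout shares bought at `price`."""
    shares = stake / price
    return shares * 1.0 - stake if won else -stake


def expected_pnl(price, stake, hit_rate):
    """Expected P&L per trade at a given realized hit rate."""
    win = pnl_per_trade(price, stake, True)
    loss = pnl_per_trade(price, stake, False)
    return hit_rate * win + (1.0 - hit_rate) * loss


def kelly_fraction(p_win, price):
    """Kelly fraction of bankroll for a binary share: (p - price) / (1 - price), floored at 0."""
    return max(0.0, (p_win - price) / (1.0 - price))


if __name__ == "__main__":
    # At $0.95 you need a 95% hit rate just to break even.
    print(expected_pnl(0.95, 5.0, 0.95))
    # At a 90% hit rate each $5 trade loses money on average.
    print(expected_pnl(0.95, 5.0, 0.90))
    # Agreeing with the market means no edge, so Kelly sizes to zero.
    print(kelly_fraction(0.95, 0.95))
```

This is why a hit rate near 90% can still lose money: on near-certain markets the wins are small and each loss costs the full stake.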
What we're trying instead: combined-cost arbitrage
A Polymarket binary market pays out exactly $1.00 per YES/NO pair: the winning side redeems for $1.00, the other goes to zero. So if you can buy both sides for under $1.00 combined, you have a guaranteed profit when you redeem the pair via the CTF (Conditional Token Framework) contract.
This is market-neutral. We don't predict anything. We post maker bids on YES and NO simultaneously, try to capture both sides at a combined cost under $1.00, and book the difference.
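The core calculation is small enough to sketch. This assumes both legs fill at the quoted prices and in equal size, which is not guaranteed in practice; `fee_per_pair` and `min_edge` are hypothetical parameters, not values from the v4.0 bot.

```python
def pair_profit(yes_fill, no_fill, size, fee_per_pair=0.0):
    """Locked-in profit from holding `size` YES and `size` NO shares to redemption.

    Each YES/NO pair redeems for exactly $1.00, so profit per pair is
    1.00 minus the combined purchase cost (and any assumed per-pair fee).
    """
    combined_cost = yes_fill + no_fill
    return (1.0 - combined_cost - fee_per_pair) * size


def should_quote(yes_bid, no_bid, min_edge=0.01):
    """Quote both sides only if the combined bid leaves at least `min_edge` dollars of room."""
    return 1.0 - (yes_bid + no_bid) >= min_edge


if __name__ == "__main__":
    # Bids of $0.47 (YES) and $0.48 (NO) leave 5 cents of room per pair.
    print(should_quote(0.47, 0.48))
    print(pair_profit(0.47, 0.48, size=10))
```

The sketch ignores the main practical risk: if only one leg fills, the position is directional until the other side is bought or the market resolves.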
The reality check (also today)
We ran the v4.0 paper bot against live Polymarket data immediately after committing. The first scan revealed the second hard lesson of the day.
| Market type | bid_combined | Room to $1 | Edge per cycle |
|---|---|---|---|
| 5-min BTC Up/Down | 0.9900 | 1.0¢ | ~$0.10/cycle |
| 15-min Up/Down | 0.9900 | 1.0¢ | ~$0.10/cycle |
| 4-hour Up/Down | 0.9500 | 5.0¢ | ~$0.50/cycle |
| Daily BTC binaries | 0.9990 | 0.1¢ | ~$0.01/cycle |
The 5-min and daily markets are heavily MM'd — competing bots have already collapsed the spread to 1¢ or less. The sweet spot is the 4-hour Up/Down markets, where fewer competitors mean wider spreads. Only 1-2 of those are active at any given time though, so realistic projection is small.
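The table's columns follow directly from `bid_combined`. The sketch below recomputes them; the 10-pairs-per-cycle size is an assumption inferred from the ratio between "Room to $1" and "Edge per cycle" in the table, not a documented bot setting.

```python
# bid_combined values copied from the scan table above.
SCAN = {
    "5-min BTC Up/Down": 0.9900,
    "15-min Up/Down": 0.9900,
    "4-hour Up/Down": 0.9500,
    "Daily BTC binaries": 0.9990,
}

PAIRS_PER_CYCLE = 10  # assumption: inferred from the table, not confirmed


def room_cents(bid_combined):
    """Distance from the combined bid to the $1.00 payout, in cents."""
    return round((1.0 - bid_combined) * 100, 1)


def edge_per_cycle(bid_combined, pairs=PAIRS_PER_CYCLE):
    """Dollar edge per cycle if `pairs` YES/NO pairs fill at the combined bid."""
    return round((1.0 - bid_combined) * pairs, 2)


def widest_market(scan):
    """Name of the market with the most room under $1.00."""
    return max(scan, key=lambda name: 1.0 - scan[name])


if __name__ == "__main__":
    for name, bid in SCAN.items():
        print(f"{name}: {room_cents(bid)}c room, ~${edge_per_cycle(bid):.2f}/cycle")
    print("widest:", widest_market(SCAN))
```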
What shipped today
- v3.3 killed: confidence floors raised to 101 in algorithm/config.py. All picks force-skip to shadow status. No new Discord posts. 8 open paper positions resolve naturally by May 21. (commit)
- v4.0 paper bot live: dogelord/v4/mm_bot.py runs hourly via GitHub Actions. Scans 1-4hr BTC binaries, computes combined-cost arbitrage decisions, logs would-be quotes. No wallet, no orders placed (yet).
- Phase plan compressed from 5 phases to 3:
  - Phase A (now): paper-only on GHA hourly, 48-72hr of data
  - Phase B: wire py-clob-client-v2 + wallet, $50 starting capital, $20 daily loss cap
  - Phase C: scale or shelve based on 14-day P&L
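The v3.3 kill switch works because confidence tops out at 100, so a floor of 101 can never be cleared. A minimal sketch of that gate, assuming confidence is a 0-100 score; `route_pick` is an illustrative name, not the function in algorithm/config.py.

```python
def route_pick(confidence, confidence_floor):
    """Return "live" if the pick clears the floor, otherwise "shadow".

    Assumes confidence is a 0-100 score, so any floor above 100
    routes every pick to shadow status without deleting any code paths.
    """
    return "live" if confidence >= confidence_floor else "shadow"


if __name__ == "__main__":
    # With the floor at 101, even a 100-confidence pick is shadowed.
    print(route_pick(100, 101))
    # With an ordinary floor, high-confidence picks still go live.
    print(route_pick(90, 85))
```

Raising a threshold rather than removing the live path keeps the engine's predictions flowing into the ledger as shadow entries, which matches the goal of preserving the historical record.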
What we're explicitly NOT doing
- Touching the v3.x ledger, lab reports, or pick history. They're historical record. Rewriting the past is dishonest and breaks the learning trail.
- Renting paid infrastructure. v4.0 runs on the GitHub Actions free tier: hourly cron, ~30min/day of compute. Total v3.x + v4.0 budget: $0 incremental cost.
- Over-engineering. polymarket-terminal is a 300-line Node.js script doing the same thing. v4.0 will stay similarly minimal until we have data justifying complexity.
- Promising P&L. The honest reality is this might not work at all. If 48 hours of paper data shows $0.10/day projected, we shelve v4.0 and PolyDoge becomes an archived experiment.
What to watch
- Live picks page — v3.x history preserved; new entries reflect v4.0 paper mode
- About — updated to reflect v4.0 thesis
- Lab dashboard — bankroll chart now has a clear "v3.x → v4.0" phase marker where the prediction engine was retired
- Future journal entries — each major decision or finding gets its own entry going forward. No more silent pivots.
Why we're posting this
Building in public means publishing the losses, not just the wins. v3.x lost money and we're saying so. v4.0 might also lose money and we'll say that too. The point of the experiment isn't to be right — it's to find out what's true.
Polymarket is more efficient than we thought. The algorithm we built was a sophisticated way to confirm what the market already knew. We learned that by watching the loss column compound. Now we try the inverse: stop predicting, start providing liquidity, see if the rebate structure pays better than the prediction structure did.
If it works, great. If it doesn't, we'll write another journal entry explaining why, and PolyDoge will retire as a completed experiment with everything documented. Either way, the public record stands.
c717519 · v4.0 first run: 15:49 UTC