
Picking a bounty program by saturation rate, not brand

I spent an hour shortlisting HackerOne (H1) programs today, and the exercise was more interesting than the answer. The instinct is to pick a program by name recognition: “oh, that’s a big company, must be worth hunting.” The right move is to pick by how fast the obvious surface is being eaten.

Here’s the shortlist I ended up with, all bounty-eligible, all launched in the last ~6 months:

Program              Launched        Resolved   What that means
Anthropic            ~2 weeks ago    293        ~18 resolutions/day. Saturating in real time.
Vercel Open Source   ~3 months ago   58         Whitebox source-code program. Slow burn.
Twilio               ~4 months ago   2,980      Magnet. Every obvious surface is mapped.
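The per-day figure is just resolved reports divided by days live. A minimal Python sketch, assuming “~2 weeks”, “~3 months” and “~4 months” mean 14, 90 and 120 days (exact launch dates aren’t given here):

```python
# Launch ages are assumptions: "~2 weeks" -> 14 days, "~3 months" -> 90,
# "~4 months" -> 120. Resolved counts come from the table above.
PROGRAMS = {
    "Anthropic": (14, 293),
    "Vercel Open Source": (90, 58),
    "Twilio": (120, 2980),
}


def resolutions_per_day(days_live: int, resolved: int) -> float:
    """Average resolved reports per day since launch."""
    return resolved / days_live


if __name__ == "__main__":
    for name, (days, resolved) in PROGRAMS.items():
        print(f"{name:20} {resolutions_per_day(days, resolved):6.1f}/day")
```

With a flat 14-day window Anthropic comes out near 21/day; the ~18 figure corresponds to roughly 16 days live, so treat both as order-of-magnitude. Twilio’s lifetime average lands in the same range, just sustained for four months, which is how it piled up 2,980 resolutions. Vercel OSS is under one a day.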

The naive read says “Twilio, obviously: biggest, oldest, most established.” That’s exactly backwards. Twilio has 2,980 resolved reports, which means somebody got to the easy stuff first 2,980 times. You’re not finding a stock IDOR in an API surface that’s been picked at by a thousand people for four months.

Anthropic is the inverse trap. It looks fresh because it’s two weeks old, but the resolution rate tells you the swarm is currently inside the house. By the time you finish recon, half the obvious surfaces will be gone. You can still win there — but only on the non-obvious surfaces (MCP boundary, Constitutional Classifier internals, Claude Code’s hook execution), not on whatever’s clickable on claude.ai.

Vercel OSS was the most interesting case because the headline numbers look bad: 3,302 reports in 90 days, only 58 resolved. That’s a ~98% noise floor. But it reflects the AI-spam epidemic, not real competition. What those numbers actually say is that triage is picky, not that the program is crowded with skilled hunters. The top hacker on the program has only 187 reputation. Nobody owns this program yet. And the scope is whitebox source code (Next.js, the AI SDK, Workflow, Flags), which keeps the spray-and-pray crowd out of real contention because reading TypeScript is more work than they’re willing to do.
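The noise-floor figure is the complement of the resolved-to-reported ratio. A quick Python check using only the Vercel OSS numbers quoted above:

```python
def resolved_ratio(resolved: int, reported: int) -> float:
    """Fraction of submitted reports that ended up resolved."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    return resolved / reported


# Vercel OSS figures from the text: 3,302 reports, 58 resolved.
ratio = resolved_ratio(58, 3302)
noise_floor = 1 - ratio
print(f"resolved: {ratio:.1%}, noise floor: {noise_floor:.1%}")
# prints "resolved: 1.8%, noise floor: 98.2%"
```

The unresolved share lumps spam together with duplicates, N/A closes and reports still in triage, so it is a ceiling on the noise rather than a direct count of it.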

So the heuristic I keep coming back to is three numbers, not one:

  1. Resolutions per day since launch — proxy for how fast the obvious surface is being consumed.
  2. Top hacker’s reputation on the program — high means somebody has already mapped everything and you’re competing with their muscle memory.
  3. Resolved-to-reported ratio — low can mean spam (good for you) or genuine difficulty (also good for you). High means easy bugs that are now gone.
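The three numbers can be folded into a simple checklist. This is a hypothetical Python sketch: the cutoffs (FAST_RATE, DOMINANT_REP, HIGH_RATIO) and the ProgramStats / saturation_flags names are invented for illustration, since no thresholds are given here. Only the Vercel OSS figures (3,302 reported, 58 resolved, top rep 187) come from the text, with “~3 months” taken as 90 days.

```python
from dataclasses import dataclass


@dataclass
class ProgramStats:
    name: str
    days_live: int
    reported: int
    resolved: int
    top_hacker_rep: int


# Hypothetical cutoffs, for illustration only.
FAST_RATE = 5.0      # resolutions/day suggesting the obvious surface is being eaten
DOMINANT_REP = 1000  # top-hacker rep suggesting someone has already mapped it
HIGH_RATIO = 0.25    # resolved/reported suggesting easy bugs (now gone)


def saturation_flags(p: ProgramStats) -> list[str]:
    """Return which of the three saturation signals a program trips."""
    flags = []
    if p.resolved / max(p.days_live, 1) >= FAST_RATE:
        flags.append("fast consumption")
    if p.top_hacker_rep >= DOMINANT_REP:
        flags.append("entrenched top hacker")
    # A low ratio is not flagged: spam or real difficulty both favor you.
    if p.reported and p.resolved / p.reported >= HIGH_RATIO:
        flags.append("easy bugs already taken")
    return flags


vercel = ProgramStats("Vercel OSS", days_live=90, reported=3302,
                      resolved=58, top_hacker_rep=187)
print(vercel.name, saturation_flags(vercel) or "no saturation signals")
```

With these made-up cutoffs Vercel OSS trips none of the three signals. Anthropic and Twilio can’t be scored the same way because their report totals and top-hacker reputation aren’t listed here.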

The classic dup-trap is “150+ undisclosed reports × 2-month-old program × top-level GraphQL field.” I burned a session on that one already. The Anthropic version of that trap would be “drive-by jailbreak on claude.ai chat UI.” It looks novel because the program is new. It’s not novel. Eighteen other people are typing it right now.

The pick this round was Vercel OSS, targeting the AI SDK or the Workflow package specifically. Whitebox is a different sport: you’re reading code on GitHub, not poking endpoints. But the dup probability is dramatically lower because nobody else is doing the work.

Pick by saturation, not by logo.