Triaging a 754-skill pack down to 11

Someone pointed me at a community repo of 754 cybersecurity “skills” for AI agents: MITRE/NIST/D3FEND-mapped, Apache-licensed, the works. The obvious move is to git clone the repo and let the agent have everything. I didn’t, and the reason is worth writing down.

The always-on tax

Claude Code loads every skill’s name and description into the system prompt on every turn so the model can decide which to invoke. The body of a skill is lazy-loaded, but the index isn’t. 754 entries means a permanent context tax on every message, and — more importantly — a much harder selection problem for the model. When ten skills plausibly match a query, the right one gets picked less often than when two do.

So the question isn’t “are these skills good?” It’s “is each one worth a permanent slot in the always-loaded index?” Most aren’t.
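To put a rough number on that tax, here is a back-of-the-envelope sketch. Everything in it is an assumption: the `SkillEntry` type is hypothetical, the 40-character name and 200-character description are invented placeholders rather than measurements from the repo, and four characters per token is a crude rule of thumb, not a real tokenizer. Swap in your own index to get a figure that means something.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillEntry:
    """One row of the always-loaded index (hypothetical shape)."""
    name: str
    description: str


def estimate_index_tokens(entries, chars_per_token=4.0):
    """Rough token count for the name + description of every skill.

    chars_per_token is a rule-of-thumb assumption, not a tokenizer.
    """
    total_chars = sum(len(e.name) + len(e.description) for e in entries)
    return round(total_chars / chars_per_token)


# Placeholder entry: 40-char name, 200-char description (invented sizes).
sample = SkillEntry(name="n" * 40, description="d" * 200)

full_pack = estimate_index_tokens([sample] * 754)
tier1_only = estimate_index_tokens([sample] * 11)
print(full_pack, tier1_only)  # prints: 45240 660
```

The absolute numbers are only as good as the placeholder sizes. The ratio is what matters: the index cost scales linearly with the number of installed skills and is paid on every turn.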

The three-tier filter

I keyword-grepped the index for the surface area I actually work on (mobile, APK, Frida, GraphQL, WebSocket, JWT, OAuth, IDOR, SSRF, recon). 78 of 754 matched. Then I sorted those into three buckets:

Tier 1 — fills a gap I don’t already have docs for. Things like performing-dynamic-analysis-of-android-app (I have zero Frida notes), performing-mobile-app-certificate-pinning-bypass (needed the moment an OkHttp-pinned app shows up), testing-android-intents-for-vulnerabilities. About 10 skills. These earn their context slot.

Tier 2 — generic bounty playbooks. BOLA, mass assignment, JWT-none, OAuth misconfig, request smuggling, rate-limit bypass. Useful, but not always-loaded useful. I can gh api them on demand when the situation arises. ~15 skills.

Tier 3 — off-scope or redundant. Anything implementing-*, securing-*, detecting-* (defensive/blue-team). Forensics and Cellebrite. AD red-teaming. iOS-only Frida flows when I have no iOS device. Subdomain enumeration with subfinder, which is already two lines in my CLAUDE.md quickstart. ~53 skills. Skip entirely.
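The filter above can be sketched in a few lines. This is an approximation, not the procedure I actually ran: the keyword list and the three defensive prefixes come from the text, but the function names, the `gap_fillers` set, the descriptions, and the entries marked hypothetical are made up for illustration. Only the prefix rule for Tier 3 is automated here; off-scope items like forensics or iOS-only flows, and the Tier 1 "fills a gap" call, still need a human.

```python
KEYWORDS = ("mobile", "apk", "frida", "graphql", "websocket",
            "jwt", "oauth", "idor", "ssrf", "recon")
DEFENSIVE_PREFIXES = ("implementing-", "securing-", "detecting-")


def matches_keywords(name, description):
    """Case-insensitive substring match, roughly what a keyword grep does."""
    text = f"{name} {description}".lower()
    return any(keyword in text for keyword in KEYWORDS)


def assign_tier(name, gap_fillers):
    """Return 1, 2, or 3 for a skill that already passed the keyword filter.

    gap_fillers is a hand-maintained set of names covering something the
    existing docs do not; that judgment is not automated in this sketch.
    """
    if name.startswith(DEFENSIVE_PREFIXES):
        return 3  # defensive/blue-team: off-scope
    if name in gap_fillers:
        return 1  # earns a slot in the always-loaded index
    return 2      # useful, but fetched on demand


# All descriptions are invented; names marked "hypothetical" are too.
index = [
    ("performing-mobile-app-certificate-pinning-bypass", "Bypass pinning with Frida."),
    ("testing-jwt-none-algorithm", "Check JWT alg=none handling."),  # hypothetical
    ("securing-graphql-endpoints", "Harden a GraphQL API."),         # hypothetical
    ("analyzing-windows-event-logs", "Forensic log review."),        # hypothetical
]
gap_fillers = {"performing-mobile-app-certificate-pinning-bypass"}

tiers = {
    name: assign_tier(name, gap_fillers)
    for name, description in index
    if matches_keywords(name, description)
}
print(tiers)
```

The last entry never reaches `assign_tier` because it matches no keyword, which mirrors the 754-to-78 cut happening before any tiering.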

The cheap heuristic

Before going through them one by one I spot-checked one Tier 1 skill (performing-dynamic-analysis-of-android-app/SKILL.md) for quality. Real commands, prerequisites, OWASP MASVS framing. Solid. I extrapolated, on the assumption that quality within a community pack tends to be uniform, and skipped reading the other ten. If the spot-check had been bad I would have walked away from the whole repo.

The selection rule that fell out:

Install a community skill only if it fills a gap your existing docs don’t cover and saves you something you’d otherwise have to look up at runtime.

“Useful playbook I might want someday” is not enough. That goes in a bookmarked clone, not in the always-loaded index.

What I shipped

A re-runnable scripts/install-tier1-skills.sh that pulls 11 SKILL.md files into .claude/skills/<name>/. Re-running overwrites, so upstream updates flow in. No git submodule, no vendoring of the whole repo, no clever symlink dance — just curl into the directory Claude Code already auto-discovers.
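For readers who want the shape of it, here is the same idea rendered in Python instead of shell. It is a sketch, not the script itself: `RAW_BASE` is a placeholder because this post doesn't name the upstream repo or its directory layout, and `install_skills`, `fetch_url`, and `fake_fetch` are hypothetical names. The demo uses a stub fetcher and a temp directory so it runs offline.

```python
import tempfile
import urllib.request
from pathlib import Path

# Placeholder: the upstream repo URL and layout are assumptions.
RAW_BASE = "https://example.invalid/skills"


def fetch_url(url):
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def install_skills(names, dest_root, fetch=fetch_url, base_url=RAW_BASE):
    """Write <dest_root>/<name>/SKILL.md for each name, overwriting old copies."""
    installed = []
    for name in names:
        body = fetch(f"{base_url}/{name}/SKILL.md")
        skill_dir = Path(dest_root) / name
        skill_dir.mkdir(parents=True, exist_ok=True)
        target = skill_dir / "SKILL.md"
        target.write_bytes(body)  # overwrite so upstream updates flow in
        installed.append(target)
    return installed


def fake_fetch(url):
    """Stub fetcher for the offline demo."""
    return f"# stub for {url}\n".encode()


# A real run would use the default fetch and dest_root=".claude/skills".
with tempfile.TemporaryDirectory() as tmp:
    paths = install_skills(
        ["performing-dynamic-analysis-of-android-app"], tmp, fetch=fake_fetch
    )
    demo_text = paths[0].read_text()
    demo_relpath = paths[0].relative_to(tmp).as_posix()
    print(demo_relpath)
```

Passing the fetcher as a parameter keeps the download step swappable, which is also what makes the overwrite behaviour easy to check without touching the network.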

754 → 11 is a 98.5% rejection rate. That feels about right for any community pack against a narrow personal workflow. The interesting number isn’t how many you install. It’s how confidently you can say no to the rest.