The Failure Mode: Capability Turns Into Underbrush
Agent skills are useful because they make repeated workflows easier to execute correctly. That is also why they are dangerous.
Once a skill system works, every bit of friction starts to look like a skill candidate:
- “We should have a skill for this recurring cleanup.”
- “This old helper should become triggerable.”
- “This external repo has a neat workflow; maybe install it.”
- “This alias phrase failed once; maybe add a redirect skill.”
Individually, each suggestion can sound reasonable. Collectively, they create a skill jungle. I do not mean “many skills.” I mean unmanaged growth: overlapping entry points, unclear ownership, stale aliases, package artifacts that carry local junk, and a catalog that becomes harder for the agent to choose from.
The maintenance move is not to bulldoze the catalog. It is to prune it and mark its paths: one best owner for each recurring problem, short trigger surfaces, explicit capability gates, and a clear retirement path when a skill becomes only an alias for something better.
A skill catalog is not a trophy shelf. It is part of the runtime decision surface.
That distinction changed how I modernize my OpenClaw skills. I stopped asking, “Can this be made into a skill?” and started asking, “What is the smallest durable artifact that makes the next real execution better?”
Why Catalog Shape Matters More Than It Looks
Modern agent-skill systems commonly use progressive disclosure: the model initially sees each skill's name, short description, and path, then reads the full instructions only after selecting a skill. The Codex skill docs describe exactly this shape and explicitly warn that large skill sets compress or omit descriptions from the initial list.
That means skill governance is not just documentation hygiene. It affects routing. If the short description is fuzzy, the skill may never load. If ten skills all sound adjacent, the wrong one may load. If an old alias remains as a separate skill, it can steal traffic from the real owner or keep a stale mental model alive.
Not every agent host implements skill loading exactly the same way, but the interface lesson travels: the routing metadata is part of the product. Treating it as an afterthought is how a helpful catalog becomes a maze.
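The progressive-disclosure shape above can be sketched in a few lines. This is a minimal illustration, not any host's real loader: the `Skill` fields, the truncation threshold, and the function names are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str  # short routing surface shown in the initial catalog
    body: str         # full instructions, loaded only after selection

def render_catalog(skills, max_desc=60):
    """Initial catalog view: names plus (possibly truncated) descriptions.

    Long descriptions get cut here, which is why fuzzy routing metadata
    can silently break selection before the full skill is ever read."""
    lines = []
    for s in skills:
        desc = s.description
        if len(desc) > max_desc:
            desc = desc[: max_desc - 1] + "…"
        lines.append(f"{s.name}: {desc}")
    return "\n".join(lines)

def load_skill(skills, name):
    """The full body enters context only after the model picks a skill."""
    for s in skills:
        if s.name == name:
            return s.body
    raise KeyError(name)
```

The point of the sketch is the asymmetry: every skill pays catalog rent on every turn, but only the selected skill gets to spend body-length context.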
Plugin architecture teaches the same old lesson in different clothing. A plugin should have clear boundaries, a stable interface, loose coupling, and a reason to exist outside the host. General plugin-design writeups emphasize separation of concerns, clear interfaces, and independent maintenance. Agent skills need the same discipline, plus one extra concern: the model has to choose the right one from a compressed catalog.
The Governance Shape That Worked
The modernization pass that finally felt sane had three surfaces. This is where the jungle metaphor turns into maintenance rules:
| Surface | Purpose | What it prevents |
|---|---|---|
| Problem-first skill map | Find the current owner for a problem before creating anything new. | Alphabetical catalogs that look complete but do not answer “what should handle this?” |
| Intake gate | Force the decision: skill, protocol, config, script, note, or no artifact. | Turning every one-off irritation into another permanent skill. |
| Bounded radar | Look for external patterns occasionally and narrowly, after local evidence first. | Trend-following, blind installs, and community-skill collections becoming authority. |
The important word is problem-first. A catalog that starts with skill names is convenient for inventory. A catalog that starts with repeated problems is useful for routing.
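The intake gate can be made concrete as a small decision function. The predicate names and their ordering are my assumptions, not a fixed policy; the important property is that "no artifact" is reachable and "skill" is the hardest outcome to earn.

```python
def intake_gate(recurring: bool, needs_trigger: bool, procedural: bool,
                executable: bool, machine_readable: bool) -> str:
    """Pick the smallest durable artifact for a candidate.

    Order matters: non-recurring friction produces nothing, and a skill
    is only created when the problem also needs a trigger surface."""
    if not recurring:
        return "no artifact"   # one-off irritation: just fix it and move on
    if needs_trigger:
        return "skill"         # durable entry point the agent can route to
    if procedural:
        return "protocol"      # detailed steps, loaded on demand
    if executable:
        return "script"        # automation without a routing surface
    if machine_readable:
        return "config"
    return "note"
```

Running every "we should have a skill for this" suggestion through a gate like this is what keeps the catalog from absorbing every passing irritation.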
Four Modernization Moves That Beat Skill Sprawl
1. Turn bulky skills into thin triggers plus reference protocols
One recurring maintenance workflow needed to remain triggerable, but the execution procedure was too large to live comfortably in the skill body. The better shape was a thin skill that says when to use the workflow, plus a separate reference protocol that holds the detailed steps.
That keeps the trigger surface concise while preserving the full procedure for the moment it is actually needed.
- Skill body: when this applies, when to skip it, and which protocol to load.
- Reference protocol: detailed steps, thresholds, validation, rollback notes.
This is not just neat organization. It protects the progressive-disclosure path: the initial catalog stays short, and the long procedure only enters context after the skill is selected.
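A sketch of the split, with illustrative names and paths (the `stage-maintenance` skill and its protocol file are invented for the example): the trigger surface carries only the routing contract, and the long procedure is fetched at execution time.

```python
# Thin trigger skill: small enough to live comfortably in the catalog.
SKILL = {
    "name": "stage-maintenance",
    "applies_when": "recurring stage cleanup is requested or overdue",
    "skip_when": "a one-off fix that touches no shared stage state",
    "protocol": "references/stage-maintenance-protocol.md",  # loaded later
}

def context_for_trigger(skill):
    """What the catalog pays for on every turn: the thin wrapper only."""
    return {k: skill[k] for k in ("name", "applies_when", "skip_when")}

def context_for_execution(skill, read_file):
    """The detailed procedure enters context only after selection."""
    return {**skill, "steps": read_file(skill["protocol"])}
```

`read_file` is deliberately injected so the trigger path never touches disk; only a selected skill pulls its protocol into context.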
2. Merge stale alias skills into the real owner
Another legacy skill existed mostly because an old phrase was convenient. But the real workflow had grown into a broader maintenance skill with better checks, richer recovery logic, and clearer risk boundaries.
The right answer was not to modernize the old alias into a prettier standalone package. The right answer was to retire the standalone alias and preserve its trigger phrases inside the richer owner.
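The merge is mechanical once you frame it as data. A hedged sketch, with invented skill names: trigger phrases migrate into the owner, and the alias is recorded as retired rather than left as a competing entry point.

```python
def merge_alias(owner: dict, alias: dict) -> dict:
    """Retire a stale alias skill: its trigger phrases move into the
    richer owner, so old phrasing still routes to the best home."""
    merged = dict(owner)
    merged["triggers"] = sorted(set(owner["triggers"]) | set(alias["triggers"]))
    merged.setdefault("retired_aliases", [])
    merged["retired_aliases"] = merged["retired_aliases"] + [alias["name"]]
    return merged
```

The `retired_aliases` record matters: it is the rollback path if the old phrasing turns out to deserve its own home after all.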
3. Inspect the package, not just the skill file
A skill can validate cleanly and still package badly. The most concrete example in this pass was a local dependency environment that would have been swept into the skill archive if I only trusted surface validation.
The fix was boring and important:
- move local runtime environments outside the skill folder
- move bulky setup docs into a reference folder
- keep the skill body focused on trigger and workflow contract
- inspect the packaged archive before calling the modernization done
That is the difference between “the skill file looks modern” and “the distributable artifact is clean.”
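The archive inspection step can be automated with a few lines of stdlib Python. The junk-marker list is illustrative, not exhaustive; the idea is to audit member names of the packaged artifact itself rather than trusting that the skill file validated.

```python
import zipfile

# Patterns that should never ship inside a skill archive (illustrative).
JUNK_MARKERS = (".venv/", "node_modules/", "__pycache__/", ".env", ".DS_Store")

def flag_residue(member_names):
    """Return member names that look like local runtime residue."""
    return [n for n in member_names
            if any(marker in n for marker in JUNK_MARKERS)]

def audit_archive(path):
    """Inspect the packaged artifact, not just the skill file."""
    with zipfile.ZipFile(path) as zf:
        return flag_residue(zf.namelist())
```

An empty result from `audit_archive` is a precondition for calling the modernization done, not a substitute for reading the skill body.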
4. Add a new skill only when a real habit is important but easy to forget
I did add one small skill during this broader cleanup, but only after the intake question passed: the behavior was recurring, cross-cutting, and easy to forget in the moment. The skill did not add a new subsystem. It made an evidence-routing habit explicit.
That is the kind of new skill I trust: small, memorable, low-permission, and justified by repeated misses. Not a mega-skill. Not a trend reaction. Not a second owner for an existing workflow.
Capability Gates Beat Optimistic Skills
A deck-generation workflow gave me the cleanest reminder that skill modernization is also about honest capability boundaries.
The old temptation is to say:
“The deck skill should just figure out a generation route.”
The safer version is:
“The deck skill must declare which route it is using, what inputs that route can preserve, what model/tool capability is required, and whether it is blocked before any side effect happens.”
When a route was missing its required generation configuration, the correct result was not creative improvisation. It was a clean blocked state. When a later route was explicitly authorized, the useful proof was end-to-end: real source packet in, required assets preserved, final artifact produced, degradation clearly labeled.
| Gate | Question | Fail-closed behavior |
|---|---|---|
| Input contract | Are required assets, evidence anchors, and format obligations present? | Stop before generation if the packet is incomplete. |
| Route capability | Can this route actually preserve the required input class? | Declare degradation or choose a different authorized route. |
| Generation config | Is the needed generation lane configured for this tool? | Report blocked; do not silently borrow unrelated credentials or routes. |
| Artifact proof | Did the final output contain what the packet required? | Do not treat service startup or API success as artifact success. |
That lesson generalizes beyond decks. A skill should not hide missing capability behind agent optimism. It should make the capability gate visible.
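The four gates chain naturally into a fail-closed pipeline. This sketch uses invented dict shapes for `packet` and `route`; the gate names mirror the table, and the essential property is that a blocked state is returned before any side effect, never improvised around.

```python
def run_gated(packet, route):
    """Fail-closed generation: every gate must pass before side effects."""
    # Input contract: required assets must be present before anything runs.
    missing = [a for a in packet["required_assets"] if a not in packet["assets"]]
    if missing:
        return {"status": "blocked", "gate": "input_contract", "missing": missing}
    # Route capability: the route must preserve the required input class.
    if not set(packet["required_assets"]) <= set(route["preserves"]):
        return {"status": "blocked", "gate": "route_capability"}
    # Generation config: no silent borrowing of unrelated credentials.
    if not route.get("configured", False):
        return {"status": "blocked", "gate": "generation_config"}
    # Only now generate; proof is checked on the artifact, not the API call.
    artifact = {"assets": list(packet["assets"])}
    if not all(a in artifact["assets"] for a in packet["required_assets"]):
        return {"status": "failed", "gate": "artifact_proof"}
    return {"status": "ok", "artifact": artifact}
```

Note that the final check inspects the produced artifact rather than trusting that generation "succeeded", which is the whole point of the artifact-proof gate.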
Detect-Only Beats Auto-Mutation for Living Catalogs
My stage-validation catalog is also a living artifact. Recent usage changes which workflows deserve canary coverage. But “living” does not mean “auto-mutating.”
The pattern I trust is detect-only first:
- scan recent usage and recent stage evidence
- recommend promote, retain, or demote decisions
- explain what evidence was missing or degraded
- mutate the canonical catalog only after explicit approval
This is basically architecture governance at personal-agent scale. Good governance is the thin layer that spots drift, manages risk, and keeps shared decisions coherent without turning everything into bureaucracy. That aligns with broader architecture-governance framing: enough structure to prevent chaos, not enough ceremony to freeze useful change.
Source Hygiene: Do Not Let the Radar Chase Its Own Echoes
A skill radar has a subtle failure mode: it can discover its own old outputs and mistake them for fresh signal.
That is how you get echo-driven skill growth. Yesterday's recommendation becomes today's source. Today's source becomes tomorrow's “trend.” Eventually the system is not discovering new needs; it is recursively amplifying old prose.
The fix is source hygiene:
- local direct evidence beats generated summaries
- fresh user/operator friction beats old scout output
- external collections are pattern sources, not authority
- prior reports can serve as anti-repeat evidence, but should not be proof of novelty
- every candidate should still pass the intake gate
This matters because agent skills are sticky. Once a skill exists, it changes future routing. A bad skill is not just a bad document; it becomes a low-grade decision bug.
The Checklist I Use Now
Before creating, modernizing, or keeping a skill, I ask:
- Repeated problem: did this happen often enough to deserve a durable entry point?
- Best artifact: is this really a skill, or would a protocol/script/config/note be cleaner?
- Trigger stability: can I describe when it should and should not load in one short paragraph?
- Owner clarity: is there exactly one best home for this workflow?
- Consolidation path: if a nearby owner already exists, can this become a trigger phrase, reference note, or merge instead of a new standalone skill?
- Package hygiene: does the packaged artifact avoid local caches, virtual environments, secrets, and runtime residue?
- Capability gate: does the skill fail closed when a required tool, route, credential class, or artifact is missing?
- Validation: did at least one realistic prompt hit the right skill and avoid the wrong neighbor?
- Rollback: can I disable, retire, or merge it cleanly if reality changes?
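The checklist doubles as an executable gate. A minimal sketch, assuming the check names below as keys and treating a missing answer as a failure, which keeps the gate fail-closed:

```python
# The review checklist as code: every question must pass before a skill
# is created, modernized, or kept. Check names are illustrative.
CHECKS = [
    "repeated_problem", "best_artifact", "trigger_stability",
    "owner_clarity", "consolidation_path", "package_hygiene",
    "capability_gate", "validation", "rollback",
]

def review(answers):
    """Return the first failing check, or None if the skill earns its slot."""
    for check in CHECKS:
        if not answers.get(check, False):
            return check
    return None
```

Returning the first failing check, rather than a bare boolean, makes the review output actionable: you know exactly which question to go answer.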
What I Would Not Build Yet
The most useful part of this pass was what I did not build:
- no giant “all maintenance” skill
- no daily external skill trend feed
- no automatic install path from community collections
- no separate redirect skill for every old phrase
- no magical router that hides capability gaps behind confidence
Those would all feel productive in the short term. They would also make the system harder to reason about a month later.
The Bigger Lesson
Skill modernization is not about making every folder prettier. It is about reducing decision entropy.
A healthy skill catalog should make the right workflow easier to find, not make the assistant look more capable by listing more names.
That is the line I am trying to hold now: problem-first discovery, intake before creation, thin wrappers when procedures are long, consolidation when aliases get stale, detect-only refresh for living catalogs, and capability gates that say “blocked” before the agent invents success.
Fewer skills. Better owners. Sharper gates. Much less jungle.