I've been building my AI assistant setup with OpenClaw, and recently tried to create a custom "blog-publishing" skill to automate my workflow. After spending hours configuring everything, I discovered it simply doesn't work—and it's not just me.
OpenClaw's skill system allows you to extend your agent with custom skills. According to the docs, you can place custom skills in:
- `~/.openclaw/workspace/skills/` (highest priority)
- `~/.openclaw/skills/` (shared across agents)
- directories listed under `skills.load.extraDirs` in the config

But when I placed my custom skill there and restarted the gateway, it never appeared in the skills list. Only the bundled skills (5 out of 51) loaded.
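Assuming the documented precedence holds, the discovery order can be modeled as a simple ordered search path. This is a sketch of mine, not OpenClaw's actual loader; only the directory names and the `skills.load.extraDirs` key come from the docs:

```python
from pathlib import Path


def skill_search_dirs(config: dict) -> list[Path]:
    """Return skill directories in the documented precedence order:
    workspace skills first, then shared skills, then extraDirs."""
    home = Path.home()
    dirs = [
        home / ".openclaw" / "workspace" / "skills",
        home / ".openclaw" / "skills",
    ]
    extra = config.get("skills", {}).get("load", {}).get("extraDirs", [])
    dirs.extend(Path(d).expanduser() for d in extra)
    return dirs
```

A loader built this way makes the precedence testable: the first directory that contains a given skill name wins, and a regression test can assert the order directly.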
After some research, I found GitHub Issue #10386:
"The agent is unable to discover or register custom skills located in the workspace or defined via extraDirs. While the agent can 'see' the files via directory listing commands in a chat session, the system registry fails to ingest them, only showing the 50 default bundled skills."
This bug affects multiple users running different setups (Docker, bare metal, various Linux distributions). The issue has been open since version 2026.2.3-1, and persists in the latest version (2026.2.22-2 as of this writing).
While waiting for a fix, there is a workaround that works, which I describe below.

Custom skills are essential for power users who want to automate and extend their agents beyond the bundled set. Without custom skills, OpenClaw becomes much less customizable. I hope this gets fixed soon; it's a critical feature for the platform's extensibility.
Despite the bug, I created my blog-publishing skill anyway:
I wrote a `SKILL.md` and placed it under `~/.openclaw/workspace/skills/blog-publishing/`. It works; you're reading this blog post! But it's not as elegant as it should be.
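For reference, this is roughly the layout involved. The directory structure and the `SKILL.md` filename come from the docs above; the helper function and the stub contents are placeholders of mine:

```python
from pathlib import Path


def scaffold_skill(base: Path, name: str, description: str) -> Path:
    """Create a minimal skill folder containing a SKILL.md stub."""
    skill_dir = base / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(
        f"# {name}\n\n{description}\n", encoding="utf-8"
    )
    return skill_dir


# Example: scaffold under the workspace skills directory
# scaffold_skill(Path.home() / ".openclaw/workspace/skills",
#                "blog-publishing", "Publish a draft to my blog.")
```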
This post started as a complaint about custom skills not loading. That was a real operator problem at the time, and later OpenClaw releases made workspace skill discovery much less fragile for this setup.
The more durable lesson showed up in a different place: generated Markdown. A nightly generated-memory report produced a heading-level jump that tripped a Markdown structure rule and broke the automation even though the underlying summary content was fine.
The first tempting fix was too simple: make the generator emit one heading depth everywhere. The real fix had to respect two output shapes: the same fragment rendered as an inline note block inside a larger document, and the same fragment rendered as a standalone report.
That is the same kind of contract thinking custom skills need. A skill is not only a folder that loads. It is also an artifact producer, and the artifacts need tests at the boundary where users actually consume them.
For generated agent output, test the rendered artifact shape, not just the prompt intent.
The useful regression test was not “did the agent mention the right topic?” It was “does the generated Markdown have a valid heading hierarchy in each supported context?” That is a small distinction, but it is the difference between a pretty summary and a reliable automation surface.
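As a minimal sketch of that kind of check (the names are mine, and it assumes ATX-style `#` headings, which is what the nightly report used):

```python
import re

HEADING = re.compile(r"^(#{1,6})\s", re.MULTILINE)


def heading_jumps(markdown: str) -> list[tuple[int, int]]:
    """Return (previous_level, level) pairs where a heading skips
    more than one level down, e.g. an H2 followed directly by an H4."""
    levels = [len(m.group(1)) for m in HEADING.finditer(markdown)]
    return [
        (prev, cur)
        for prev, cur in zip(levels, levels[1:])
        if cur > prev + 1
    ]


def has_valid_hierarchy(markdown: str) -> bool:
    return not heading_jumps(markdown)
```

Run against the rendered artifact, not the prompt or the intermediate template, this catches exactly the class of bug described above: content that reads fine but fails the structure rule.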
The later closeout made the generated-artifact lesson sharper. The bug was small: one generated report jumped heading levels in a way the Markdown checker correctly rejected. The fix was not to make every generated heading globally shallower or deeper. That would only move the bug.
The real contract had two valid shapes: the fragment embedded inline in a host document, and the fragment published as a standalone report.
Those two surfaces need different heading depths. A useful regression test therefore has to render both surfaces and validate the final artifact, not just inspect the shared prose generator.
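A sketch of how a shared fragment can be re-depthed for each surface before validation (the function name and the two target depths are assumptions of mine, not an OpenClaw API):

```python
import re


def shift_headings(fragment: str, offset: int) -> str:
    """Demote (or promote) every ATX heading in a Markdown fragment
    by `offset` levels, clamped to the valid range 1-6."""
    def bump(match: re.Match) -> str:
        level = min(6, max(1, len(match.group(1)) + offset))
        return "#" * level + " "
    return re.sub(r"^(#{1,6})\s", bump, fragment, flags=re.MULTILINE)


# The same fragment, rendered for two surfaces:
fragment = "# Nightly summary\n\n## Highlights\n"
standalone = shift_headings(fragment, 0)  # report: keeps its own H1
inline = shift_headings(fragment, 2)      # note block nested under an H2 host
```

The regression test then renders both `standalone` and `inline` and validates each final artifact, which is the point made above: test the surfaces, not the shared generator.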
Generated content has to satisfy the consumer's structure, not the generator's internal convenience.
The generated-Markdown regression eventually deserved its own treatment because it is no longer only a skill-system footnote. The deeper lesson is about output contracts: one generated fragment had to be valid both as an inline note block and as a standalone report, so the final validation needed to test both rendered surfaces.
I wrote that version here: One Heading Level Broke the Nightly Build. This original post stays useful as historical context for extension discovery; the new post focuses on generated artifacts as software interfaces.
If you are reading this as a current OpenClaw operator, treat the original custom-skill loading bug as historical context rather than a fresh diagnosis. The durable takeaway is broader: extension systems need both discovery checks and artifact-contract tests.
If a skill loads but the files it generates are structurally invalid, the extensibility story is still incomplete.