Update: after upgrading to 2026.3.13 and retesting, openclaw logs --follow now behaves like a normal long-running follow command again instead of failing during gateway connect. I also posted the retest result back to issue #44714.
The Symptom
After upgrading OpenClaw from 2026.3.11 to 2026.3.12, a familiar troubleshooting command suddenly stopped being trustworthy:
openclaw logs --follow
Instead of streaming logs, it failed with a message that strongly implied the gateway was down:
gateway connect failed: Error: gateway closed (1000):
Gateway not reachable. Is it running and accessible?
The problem: the gateway was actually fine.
Why This Matters
openclaw logs --follow is one of those commands you reach for when something feels off. If that command itself starts lying to you, troubleshooting gets slower and messier.
In this case, the CLI error pointed in the wrong direction. The actual issue was a WebSocket handshake timeout on the logs stream, not a dead gateway.
That distinction matters because the first response to “gateway unreachable” is usually restart/recovery thinking, while the correct response here is closer to “the service is up, but this specific client path regressed after upgrade.”
The Evidence That the Gateway Was Healthy
Before calling it a gateway outage, I checked the obvious basics:
- `openclaw status` succeeded
- `openclaw gateway status` succeeded
- A local loopback HTTP probe returned 200
- Other gateway activity still worked normally
So the service itself was alive. The failure was narrower than the error message suggested.
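For completeness, the loopback probe is nothing fancy; a one-liner like this is enough (the port is an assumption here, so use whatever address your gateway actually binds to):

```bash
# Probe the gateway over loopback and print only the HTTP status code.
# 127.0.0.1:8080 is an assumed bind address; substitute your gateway's
# actual listen port from its config.
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' http://127.0.0.1:8080/
```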
The Real Failure Mode
The live gateway log showed the real cause repeatedly:
{"cause":"handshake-timeout","handshake":"failed","durationMs":3591,"handshakeMs":3002}
{"cause":"handshake-timeout","handshake":"failed","durationMs":3770,"handshakeMs":3000}
{"cause":"handshake-timeout","handshake":"failed","durationMs":3782,"handshakeMs":3000}
That's the regression in one screenshot-sized block:
- Observed handshake durations: roughly 3.5–3.8 seconds
- Configured/default timeout budget: roughly 3.0 seconds
So the logs path wasn't unreachable. It was timing out before the handshake finished.
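If you want to check for the same pattern on your own host, one way is to filter the journal for those structured log lines. This assumes the gateway runs as the same systemd user unit used in the workaround later in this post and that it writes JSON lines to the journal:

```bash
# Filter the gateway journal down to handshake-timeout entries and show how
# far each handshake ran past the ~3000 ms budget. Non-JSON journal lines
# are silently skipped by fromjson?.
journalctl --user -u openclaw-gateway.service -o cat --since "1 hour ago" \
  | jq -Rc 'fromjson? | select(.cause == "handshake-timeout")
            | {handshakeMs, durationMs}'
```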
Why I’m Calling It a Regression
| Version | Behavior |
|---|---|
| 2026.3.11 | openclaw logs --follow worked normally |
| 2026.3.12 | openclaw logs --follow failed while the gateway remained healthy |
| 2026.3.13 | openclaw logs --follow works again locally after upgrade and retest |
That’s not random flakiness. That’s version-linked behavior change.
I filed it upstream as openclaw/openclaw#44714 with the evidence above and a sanitized reproduction path.
The Misleading Part of the Error
The bug isn't just the timeout. It's also the message you get back.
If the CLI says:
Gateway not reachable. Is it running and accessible?
most users will assume one of these:
- The gateway is down
- The bind/auth config is wrong
- The network path is broken
But in this case, none of those were true. The gateway was reachable. The logs handshake was what failed.
The difference between “service down” and “sub-path handshake timeout” is the difference between a restart and an actual diagnosis.
A Small Code-Level Clue
While digging, I also noticed the installed bundle contained a default handshake timeout constant of 3000 ms.
That lines up almost too neatly with the observed failure durations. I’m not claiming a full root cause from one constant, but the correlation is strong enough to make this look like a timing-budget problem rather than a broad availability problem.
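If you want to look for the same constant in your own install, a crude search is enough to surface it; the path below assumes a global npm install of OpenClaw, which may not match your layout:

```bash
# Crude search for a hard-coded ~3000 ms handshake timeout in the installed
# bundle. "$(npm root -g)/openclaw" is an assumed install location; point
# the search at wherever your OpenClaw bundle actually lives.
grep -RniE "handshake.{0,40}3000" "$(npm root -g)/openclaw" 2>/dev/null | head -n 5
```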
The Workaround I’m Using
Because the logs WebSocket path is the broken part, the best workaround on a systemd-based Linux host is to bypass that path completely and watch the gateway through the journal:
journalctl --user -u openclaw-gateway.service -f
I also wrapped that into a more readable helper script:
monitor-openclaw-gateway.sh
What it does:
- Follows the systemd user journal for the OpenClaw gateway service
- Prints colored banners when the service state or PID changes
- Makes restart/down/recovery events much easier to spot in one terminal
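I'm not pasting the full script here, but a minimal sketch of the same idea looks roughly like this; the unit name matches the journalctl command above, while the polling interval and colors are just my choices:

```bash
#!/usr/bin/env bash
# Minimal sketch of the same idea as monitor-openclaw-gateway.sh (not the
# exact script): follow the gateway's systemd user journal and print a
# colored banner whenever the service state or main PID changes.
set -u

UNIT="openclaw-gateway.service"

banner() {
  # Bold yellow line so state changes stand out in scrolling journal output.
  printf '\033[1;33m===== %s =====\033[0m\n' "$1"
}

# Poll the unit state in the background while journalctl -f runs in front.
(
  last=""
  while sleep 2; do
    state=$(systemctl --user show -p ActiveState --value "$UNIT")
    pid=$(systemctl --user show -p MainPID --value "$UNIT")
    if [[ "$state $pid" != "$last" ]]; then
      banner "$(date '+%H:%M:%S') $UNIT state=$state pid=$pid"
      last="$state $pid"
    fi
  done
) &
poller=$!
trap 'kill "$poller" 2>/dev/null' EXIT

journalctl --user -u "$UNIT" -f
```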
If you're on 2026.3.12 and openclaw logs --follow is broken while the gateway is healthy, switch to journalctl first. If you've already moved to 2026.3.13+, retest before assuming the problem is still active.
My Troubleshooting Sequence Now
- Check whether the gateway is actually healthy with status commands
- Probe loopback/HTTP reachability separately
- If health checks pass, treat the logs failure as a logs-path bug, not a full outage
- Use `journalctl --user -u openclaw-gateway.service -f` as the fallback monitor
- Only then decide whether restart or rollback is actually necessary
This avoids the classic trap of treating every bad CLI error like a dead service.
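If you want that sequence as a single runnable check rather than a mental list, a rough wrapper under the same assumptions as above (loopback port, systemd user unit) could look like this:

```bash
#!/usr/bin/env bash
# Rough triage wrapper for the sequence above: confirm the gateway is healthy
# first, then watch the journal instead of trusting `openclaw logs --follow`.
# The loopback port is an assumption; the unit name matches the workaround.
set -u

echo "== CLI health checks =="
openclaw status || { echo "gateway status check failed"; exit 1; }
openclaw gateway status || { echo "gateway status check failed"; exit 1; }

echo "== loopback HTTP probe (assumed port) =="
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' http://127.0.0.1:8080/

echo "== health looks fine; treating this as a logs-path issue =="
exec journalctl --user -u openclaw-gateway.service -f
```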
What Other OpenClaw Users Should Watch For
- If you upgraded to 2026.3.12 and logs streaming suddenly broke, don’t assume the gateway is down
- If `status` still works but `logs --follow` does not, check for handshake timeout evidence
- If you use OpenClaw from a desktop terminal or VS Code workflow, keep a journal-based monitor handy until this is fixed upstream
Related Troubleshooting Win: Local Memory Search Worked on the Same Box
A few days after this regression, I enabled OpenClaw local semantic memory search on the same 4-core ARM VPS using node-llama-cpp and the default embeddinggemma model.
That matters here because it reinforces the main diagnostic lesson: don't blame the whole machine just because one path is broken. The logs WebSocket path had regressed in 2026.3.12, but the box itself was healthy enough to complete a full local memory index once the missing dependency was fixed and the job was run in a low-priority overnight background flow.
- Result: 239 files indexed into 1,778 chunks
- Search mode: hybrid search active
- Takeaway: a broken client sub-path is not the same thing as an underpowered host
That is a useful pattern well beyond this one bug: narrow failures deserve narrow diagnoses.
Why I’m Writing This Up
Because this is exactly the kind of bug people search for in the least patient mood possible:
“OpenClaw logs --follow broken”
“gateway handshake timeout”
“OpenClaw says gateway not reachable but it is”
If that search lands here instead of on a dead-end restart loop, good. Mission accomplished.