Last week, my blog scout cron job started crashing with a cryptic error:
```
InternalError.Algo.InvalidParameter: Range of input length should be [1, 73728]
```
At first, I was confused. I'm using GLM-5 through Alibaba Cloud Bailian, which advertises a 202,752-token context window. My agent was only loading about 60K tokens of context—well under the limit. What was going on?
The blog scout job reads a lot of context, and all of it adds up. But with a 202K context window, I should have plenty of room, right?
After digging through documentation (in Chinese, which added to the challenge), I found the real limits. Alibaba Cloud Bailian imposes a platform-level input limit that's separate from the model's native context window:
| Model | Native Context (tokens) | Bailian Max Input (tokens) | Gap |
|---|---|---|---|
| GLM-5 | 202,752 | 73,728 | -64% |
| GLM-4.7 | 169,984 | 73,728 | -57% |
| GLM-4.5 | 131,072 | 98,304 | -25% |
That's right—the API only accepts 73K tokens, even though the model can theoretically handle 202K. This is documented in the official English docs (also available in Chinese), but it's easy to miss if you're skimming.
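If you'd rather verify the ceiling empirically than trust the docs, you can oversend on purpose and read the rejection. Here's a minimal sketch, assuming Bailian's OpenAI-compatible endpoint; the base URL and the `glm-5` model id are placeholders you should confirm in your own console:

```python
from openai import OpenAI, APIStatusError

# Assumption: Bailian exposes an OpenAI-compatible endpoint. Both the
# base_url and the model id below are placeholders -- confirm yours.
client = OpenAI(
    api_key="YOUR_BAILIAN_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    # Deliberately oversized: ~100K repeated words blows past a 73K-token cap.
    client.chat.completions.create(
        model="glm-5",
        messages=[{"role": "user", "content": "word " * 100_000}],
    )
except APIStatusError as exc:
    # The rejection should quote the real input ceiling, e.g.
    # "Range of input length should be [1, 73728]".
    print(exc)
```

One throwaway request like this is a lot cheaper than a few days of debugging.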
If you're building AI agents that assemble large prompts from many sources, you might silently exceed the platform limit even though the model claims to support a much larger context.
And the error message doesn't help—it just says "input length should be [1, 73728]" without explaining why.
Configure your agent framework to respect the platform limit, not the model's advertised context window. In OpenClaw, that looks like this:
```json
{
  "agents": {
    "defaults": {
      "models": {
        "bailian/glm-5": {
          "contextWindow": 65000
        }
      }
    }
  }
}
```
Notice I set it to 65K, not 73K, leaving a buffer below the hard cap. If you do hit the wall anyway, the error message itself tells you the real ceiling:

```
InternalError.Algo.InvalidParameter: Range of input length should be [1, 73728]
```

The number at the end (73728) is your actual platform limit, not the model's native context.
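Since the message is the only place the real limit shows up, it's worth extracting programmatically. A small sketch; `platform_limit_from_error` is my own helper name, not an SDK function:

```python
import re

def platform_limit_from_error(message: str) -> int | None:
    """Extract the enforced input ceiling from a Bailian-style error string."""
    match = re.search(r"input length should be \[1, (\d+)\]", message)
    return int(match.group(1)) if match else None

print(platform_limit_from_error(
    "InternalError.Algo.InvalidParameter: Range of input length should be [1, 73728]"
))  # 73728
```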
Check both the model's native specs AND the API provider's limits:
```bash
# Check the model's native context
curl -s https://www.alibabacloud.com/help/en/model-studio/glm | grep -i "context"

# Check for platform-specific limits in the API docs
curl -s https://www.alibabacloud.com/help/en/model-studio/api-reference | grep -i "input\|limit"
```
Add logging to see what you're actually sending:
```python
import tiktoken

# cl100k_base is OpenAI's tokenizer, not GLM's, so treat this
# as a rough estimate rather than an exact count.
encoder = tiktoken.get_encoding("cl100k_base")
tokens = encoder.encode(your_input_text)
print(f"Input tokens: {len(tokens)}")
```
Some providers also show input/output token stats in their dashboard.
Set your configured context window to 80-90% of the platform limit:
```
# Platform limit: 73,728 tokens
# Conservative config: 65,000 tokens (~88% of the limit)
"bailian/glm-5": {
  "contextWindow": 65000
}
```
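If you'd rather derive that number than eyeball it, the arithmetic is one line; `conservative_window` is just illustrative:

```python
def conservative_window(platform_limit: int, margin: float = 0.88) -> int:
    """Apply the safety margin, then round to the nearest thousand."""
    return round(platform_limit * margin / 1000) * 1000

print(conservative_window(73_728))  # 65000
```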
API providers often impose their own limits. The model might support 202K, but the platform you're accessing it through might only accept 73K.
Read the documentation carefully, especially when it's translated. The limits were clearly documented—I just didn't look closely enough.
Set your context windows conservatively. Better to trigger compaction early than to hit a hard platform limit mid-request.
My logs showed "compaction wait aborted" multiple times—the framework was trying to compact context but timing out before the API call. This was a hint that something was wrong with the context sizing.
I suspect this isn't unique to Bailian. Many API providers likely impose platform limits that differ from the model's native capabilities.
When debugging context issues, check both the model specs and the API documentation.
On March 14, Anthropic updated its pricing docs to say that Claude Opus 4.6 and Sonnet 4.6 now include the full 1M-token context window at standard pricing. Anthropic's own example is blunt: a 900K-token request is billed at the same per-token rate as a 9K-token request.
That does not invalidate the main point of this post. My debugging problem here was not "long context is expensive." It was "the platform advertised one number and enforced another." Those are different failure modes: one is a pricing question you can budget around, the other a hard rejection you can't pay your way past.
So yes, Anthropic just made long documents and big codebase passes more attractive on Claude. But the operational lesson still holds across the industry: the limit that matters is the one the serving platform enforces, not the one on the model card.
If anything, Anthropic's pricing shift makes the contrast sharper. Once one provider removes the long-context premium, the next constraint you notice is often not price but the hidden ceiling in the toolchain around the model.
The "202K context" marketing number doesn't tell the whole story. When using models through intermediary platforms, there's often a hidden input limit that can bite you.
Platform limits exist. Check them before you ship.
After adjusting my configuration, the blog scout job runs smoothly again. But this was a frustrating few days of debugging that could have been avoided with clearer documentation—or just reading it more carefully in the first place.