The Real Question Wasn't “What Model Next?”
When Gemini preview capacity collapses, the obvious question is: what should I try next?
The more important question is: where should fallback policy live?
I run Gemini through an ACP-backed path inside OpenClaw—a wrapper/runtime layer that lets Gemini act as a consultant inside a larger agent workflow. When that path fails, the user-facing symptom can be something generic like acpx exited with code 1, while the actual cause is upstream 429, RESOURCE_EXHAUSTED, or MODEL_CAPACITY_EXHAUSTED.
That distinction matters because a model-capacity problem, an auth problem, and a local wrapper bug should not trigger the same fallback behavior. If your system collapses all of them into “Gemini failed,” the fallback strategy is already wrong.
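To make the distinction concrete, here is a minimal classification sketch in TypeScript. It assumes the upstream error string survives somewhere in the wrapper's captured stderr; the function name and error shapes are illustrative, not OpenClaw's real API.

```typescript
type FailureClass = "capacity" | "auth" | "wrapper" | "unknown";

// Classify a failed run from its exit code and captured stderr.
// Assumes upstream codes like RESOURCE_EXHAUSTED leak through into
// stderr; if they don't, the honest answer is "wrapper", not "capacity".
function classifyFailure(exitCode: number, stderr: string): FailureClass {
  if (/429|RESOURCE_EXHAUSTED|MODEL_CAPACITY_EXHAUSTED/.test(stderr)) {
    return "capacity";
  }
  if (/401|403|UNAUTHENTICATED|PERMISSION_DENIED/.test(stderr)) {
    return "auth";
  }
  if (exitCode !== 0) {
    return "wrapper"; // e.g. a bare "acpx exited with code 1"
  }
  return "unknown";
}
```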
The design problem was not “which model should come second?” It was “how do I keep model policy out of the wrong layers?”
The Three Wrong Places to Put Fallback Policy
Once I knew I needed fallback behavior, there were three obvious places to put it. All three were bad.
1. Gemini CLI settings
This is the tempting shortcut: repoint one alias when preview gets unreliable and call it a day.
That works once. It fails as a design. The provider settings should stay thin and concrete: alias-to-model mapping, auth mode, and tool-local defaults. The moment they start carrying lane order, rescue-lane logic, and failure triggers, they stop being a clean adapter surface and start becoming an orchestration layer in disguise.
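As a sketch of what "thin" means, the whole settings surface should look roughly like this; the alias keys match the lane stack below, and the model names are illustrative rather than the literal Gemini CLI schema:

```json
{
  "aliases": {
    "gemini-preview": "gemini-2.5-pro-preview",
    "gemini-stable": "gemini-2.5-pro",
    "gemini-fast": "gemini-2.5-flash"
  }
}
```

No ordering, no triggers, no rescue semantics: just names resolving to concrete models.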
2. Every workflow that happens to use Gemini
This is worse. If panel review code, summarization code, and research code all start manually walking preview -> stable -> fast, model policy gets scattered across business logic. A future model change becomes archaeology.
3. Operator memory
This is the classic 2 AM trap: “I know what to do when preview is down.”
That remains true until the failure mode turns out not to be preview capacity, or until someone else touches the system, or until yesterday’s outage shape stops matching today’s reality.
The Design That Survived the Internal Panel Review
The design that survived an internal five-model panel review was simple:
- define semantic lanes
- keep lane order and failure triggers in OpenClaw-owned config
- keep Gemini settings limited to alias -> concrete model mapping
- use a small Gemini-specific wrapper to execute that lane plan
- let higher-level workflows ask only for the logical Gemini consultant
The conceptual lane stack looked like this:
```json
{
  "profiles": {
    "consult-primary": {
      "order": ["preview-primary", "stable-primary"],
      "rescueOrder": ["fast-fallback"],
      "enableRescueLane": false,
      "fallbackOn": ["429", "RESOURCE_EXHAUSTED", "MODEL_CAPACITY_EXHAUSTED"]
    }
  },
  "lanes": {
    "preview-primary": { "alias": "gemini-preview" },
    "stable-primary": { "alias": "gemini-stable" },
    "fast-fallback": { "alias": "gemini-fast", "disabledByDefault": true }
  }
}
```
That separation gave each layer one job:
- Gemini settings answer: what concrete model does this alias mean?
- OpenClaw lane config answers: in what order should I try lanes, and under what failure class?
- The wrapper answers: how do I execute that policy and log what happened?
- The workflow answers only: I need the Gemini consultant role.
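In TypeScript terms, that separation is small enough to write down. These types are my own sketch of the shape, not OpenClaw's actual config surface:

```typescript
// Provider settings: alias -> concrete model. Nothing else lives here.
type AliasMap = Record<string, string>;

// OpenClaw-owned policy: lane order and failure triggers.
interface LaneProfile {
  order: string[];            // quality-preserving route, best lane first
  rescueOrder: string[];      // emergency lanes, opt-in only
  enableRescueLane: boolean;  // rescue lanes never activate silently
  fallbackOn: string[];       // failure classes that permit falling through
}

// The workflow names a role; it never names a model tier or a lane.
interface ConsultRequest {
  role: "gemini-consultant";
  prompt: string;
}
```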
The original conceptual order was preview-primary -> stable-primary -> fast-fallback. The current live deployment now runs stable-primary -> preview-primary as the normal route, with the fast lane still defined but disabled by default. That change happened because lane-health probes mattered more than design preference.
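The useful property of that swap: it was a one-line edit to the policy file, not a change to any workflow or wrapper. Under the same illustrative schema as the lane stack above:

```json
{
  "profiles": {
    "consult-primary": {
      "order": ["stable-primary", "preview-primary"],
      "rescueOrder": ["fast-fallback"],
      "enableRescueLane": false
    }
  }
}
```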
Why “Quality-Preserving Degradation” Was the Right Mental Model
Not every fallback is equivalent.
If a high-end preview lane is overloaded, the next question should not be “what is the fastest thing that still returns text?” The next question should be:
What is the best available downgrade that preserves answer quality for the kind of work I am doing?
For my use case, the conceptual answer was:
- preview-primary — highest-quality, less stable lane
- stable-primary — quality-preserving fallback
- fast-fallback — emergency rescue lane, intentionally lower tier
The important rule was that the fast lane should exist, but it should not silently become the everyday default just because it happens to be available.
That is still the design principle I like most here: make the downgrade legible. If you are using a fast rescue lane, that should mean something operationally. It should not masquerade as a normal quality path.
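One way to keep the downgrade legible is to emit an explicit event whenever a rescue lane produces output. The event name and fields here are assumptions for illustration:

```typescript
// Rescue-lane output should be visibly marked as degraded quality,
// not blended into the normal result stream.
function recordLaneOutcome(lane: string, isRescueLane: boolean): void {
  if (isRescueLane) {
    console.warn(
      JSON.stringify({
        event: "gemini.rescue_lane_used", // hypothetical event name
        lane,
        quality: "degraded",
      })
    );
  }
}
```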
Failure Classes Should Map to Different Behaviors
This was the most reusable lesson in the whole exercise.
| Failure shape | What it probably means | What I want the system to do |
|---|---|---|
| 429 / MODEL_CAPACITY_EXHAUSTED on preview | Temporary capacity pressure on the preferred lane | Try the stable lane next; use fast only if rescue is intentionally enabled |
| 429 on stable too | The quality lanes are unhealthy right now | Either use an explicit fast rescue route or mark Gemini degraded |
| Auth / permission error | Configuration or account problem | Stop immediately; do not pretend another lane solves it |
| Generic ACP exit with no useful classification | Wrapper symptom, not root cause | Run direct lane-health probes before blaming local ACP plumbing |
| All lanes unhealthy | Gemini is effectively unavailable for this workflow | Continue the wider panel in degraded mode and say so plainly |
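That table translates almost mechanically into code. A sketch, reusing the failure classes from the classifier above; the action names are my own labels:

```typescript
type FailureClass = "capacity" | "auth" | "wrapper" | "unknown";
type Action = "try_next_lane" | "stop_and_surface" | "probe_lanes" | "mark_degraded";

function nextAction(failure: FailureClass, lanesRemaining: number): Action {
  switch (failure) {
    case "capacity":
      // Capacity pressure: walk the quality lanes before anything else.
      return lanesRemaining > 0 ? "try_next_lane" : "mark_degraded";
    case "auth":
      // Configuration or account problem: no lane will fix this.
      return "stop_and_surface";
    case "wrapper":
      // Generic ACP exit: probe lanes directly before blaming plumbing.
      return "probe_lanes";
    default:
      return "stop_and_surface";
  }
}
```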
What the Wrapper Solved — and What It Didn't
The wrapper itself stayed intentionally boring:
```
resolve profile -> lane order
for each lane:
    run Gemini with that alias
    if success: return
    if error is capacity-only: try next lane
    else: stop and surface the failure
if all lanes fail: mark Gemini degraded/unavailable
```
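A minimal runnable version of that loop in TypeScript, with the lane runner injected so the policy stays testable; in the real system the runner would shell out to Gemini over the ACP path, and the exact shapes here are assumptions:

```typescript
type LaneResult =
  | { ok: true; output: string }
  | { ok: false; capacityOnly: boolean; error: string };

type LaneRunner = (alias: string) => Promise<LaneResult>;

async function runWithFallback(
  laneAliases: string[], // already resolved from the profile's lane order
  run: LaneRunner
): Promise<string> {
  for (const alias of laneAliases) {
    const result = await run(alias);
    if (result.ok) return result.output;
    if (!result.capacityOnly) {
      // Auth or wrapper failures surface immediately; they never cascade.
      throw new Error(`lane ${alias} failed hard: ${result.error}`);
    }
    // Capacity-only failure: log the fallback so the run stays explainable.
    console.warn(`lane ${alias} capacity-exhausted, trying next lane`);
  }
  throw new Error("all lanes exhausted: mark Gemini degraded");
}
```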
That design did a few useful things immediately:
- fallback became capacity-aware at startup, instead of blanket retry-everything chaos
- event logging could explain whether a run succeeded on the first lane or fell back
- the rest of the consultation workflow stopped hardcoding Gemini model tiers
- lane policy became editable in one place
One later correction mattered more than the original implementation.
The first version solved startup-layer lane choice cleanly. Later live failures showed that some Gemini ACP failures happened after the ACP process started successfully, during the prompt/session phase. At that point, the startup wrapper was already out of the decision path. The failure surfaced outward as a generic ACP exit, and the fallback chain never had a chance to run.
Fallback at startup is an operational improvement. Fallback at prompt/session time is the real long-term reliability fix.
That is why I now think the right architecture description is:
- wrapper-layer fallback is a good first operational layer
- prompt/session-layer fallback is still the real engineering target
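Concretely, that means the spawn step and the prompt step have to route through the same policy. A hypothetical shape, not the current implementation:

```typescript
// Startup fallback only covers start(); late failures happen in prompt().
interface LaneSession {
  start(): Promise<void>;                 // ACP process spawn
  prompt(text: string): Promise<string>;  // where session-phase failures occur
}

async function promptWithFallback(
  sessions: LaneSession[], // one per lane, in policy order
  text: string
): Promise<string> {
  for (const session of sessions) {
    try {
      await session.start();
      return await session.prompt(text); // fallback now covers this phase too
    } catch {
      continue; // real code would gate this on failure class, as above
    }
  }
  throw new Error("all lanes failed at prompt/session time");
}
```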
Health Probes Beat Preference
The original conceptual story was preview -> stable -> fast. The live story got messier.
At one point, direct lane probes showed both quality lanes unhealthy while the fast lane still worked. That is exactly the sort of moment when design preference has to lose to operational evidence.
A sanitized failure-state probe looks like this:
- preview-primary: capacity_exhausted
- stable-primary: capacity_exhausted
- fast-fallback: ok
That is the sort of result that justifies a temporary emergency route. But it is not the whole story forever. Before publishing this post, a fresh probe showed all three lane aliases healthy again. The point of the health gate is not to memorialize one outage. The point is to keep route changes tied to current evidence instead of static preference.
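A probe like that does not need to be elaborate. A sketch, with the lane runner injected so nothing here pretends to be real Gemini CLI flags:

```typescript
type LaneHealth = "ok" | "capacity_exhausted" | "auth_error" | "unknown_error";

// Send a trivial prompt down each lane alias and classify the outcome.
// Probing directly (bypassing the ACP wrapper) keeps wrapper symptoms
// from being mistaken for lane health.
async function probeLanes(
  aliases: string[],
  run: (alias: string, prompt: string) => Promise<{ ok: boolean; error?: string }>
): Promise<Record<string, LaneHealth>> {
  const report: Record<string, LaneHealth> = {};
  for (const alias of aliases) {
    const r = await run(alias, "ping"); // cheap, content-free probe prompt
    if (r.ok) report[alias] = "ok";
    else if (/429|EXHAUSTED/.test(r.error ?? "")) report[alias] = "capacity_exhausted";
    else if (/401|403|PERMISSION/.test(r.error ?? "")) report[alias] = "auth_error";
    else report[alias] = "unknown_error";
  }
  return report;
}
```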
May 2026 Follow-Up: Drift Closure Is a State Machine
A later Gemini ACP closeout made the fallback lesson more concrete. The route had multiple possible failure meanings: capacity pressure, auth drift, upstream dependency movement, and a local adapter compatibility seam. Treating all of those as “Gemini is broken” would have pushed the system toward the wrong fix.
The useful closeout split was:
| State | What it means | Best next action |
|---|---|---|
| Passive watch | Readiness is green and the remaining risk is future drift. | Close active implementation work, keep the watch surface, and write reopen criteria. |
| Upstream wait | The meaningful movement belongs to an upstream issue, pull request, or provider surface. | Monitor for maintainer feedback, check failures, or fresh provider evidence instead of patching locally. |
| Local repair | A narrow wrapper or adapter incompatibility is reproducible at the integration boundary. | Patch the compatibility seam in an isolated checkout and preserve the broader lane policy. |
That distinction kept the system from turning every degraded coding-agent lane into a configuration change. If the health probe is green and the remaining dependency is already watched, the honest state is passive monitoring. If a backend rejects a control message before work starts, that is a local adapter contract bug. Different state, different response.
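The state machine is small enough to encode directly. The signal names here are illustrative; the point is that the decision keys off observable evidence:

```typescript
type DriftState = "passive_watch" | "upstream_wait" | "local_repair";

interface CloseoutSignals {
  readinessGreen: boolean;        // current health probe is green
  upstreamIssueOpen: boolean;     // movement belongs to a provider or maintainer
  localReproAtBoundary: boolean;  // adapter bug reproducible in isolation
}

function classifyCloseout(s: CloseoutSignals): DriftState {
  if (s.localReproAtBoundary) return "local_repair";   // patch the seam
  if (s.upstreamIssueOpen) return "upstream_wait";     // monitor, don't patch
  if (s.readinessGreen) return "passive_watch";        // write reopen criteria
  return "upstream_wait"; // default to gathering evidence over local edits
}
```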
The Design Rule I Trust Now
The most durable lesson was not Gemini-specific. It was this:
Keep policy separate from mapping, and separate again from execution.
In plain English:
- do not bury lane policy inside the provider’s local settings file
- do not scatter tier-walking logic across workflows
- do not let a wrapper become your only explanation of runtime health
- do keep one owner-controlled policy file that defines the semantic route
- do keep one adapter layer that executes that policy
- do keep one probe path that tells you whether the preferred route still matches reality
That turns model fallback from folklore into an operating surface.
What I'd Tell Anyone Building Agent Fallback Now
- Classify failure shapes before you classify models.
- Prefer the best downgrade, not the first downgrade.
- Put fallback policy in one owner-controlled place.
- Make lane order cheap to change.
- Separate startup fallback from prompt-time fallback in your head and in your code.
- Use live health probes, not yesterday’s theory.
The real mistake is not failing over from a preview model. The real mistake is building a system where model policy lives in random places and then acting surprised when reliability turns into a scavenger hunt.