3 Months of OpenClaw — 4 Hidden Traps and How AI Helped Optimize Them
Three months running OpenClaw AI Trading — found 4 hidden bottlenecks. AI helped analyze, optimize multi-model routing, and cut costs while improving quality.

Quick Summary — Read This First
- What are the 4 hidden traps in the system?
- How did the overall architecture change?
- How were all 4 issues fixed?
- How does Multi-Model Routing work?
- Before vs After — what changed?
The AI Agent Team running on OpenClaw + Lark + n8n had been performing well for 3 months — 10 Bots, 53 Workflows, 25 Skills covering every department. But after a full system audit, 4 hidden traps were discovered that caused overspending on API costs, crash loop risks, and preventable errors.
After fixing all 4 issues — API costs dropped by ~50% (measured by token usage over 30 days before/after optimization), response speed improved by ~35% (measured by average response time over 30 days), errors reduced by half (measured from error logs over 30 days), and crash loops were prevented entirely.
What are the 4 hidden traps in the system?
A system that "works" does not mean it "works well" — like a car driven daily but never serviced. After a thorough audit, 4 hidden issues were found that would only get worse over time.
Trap #1 — One Model for Everything 💸
Claude Sonnet 4 ($3/million tokens) was used for all 25 skills, whether a simple task like "late attendance notification" or a complex one like "Sales Pipeline analysis." That meant overpaying by roughly 30x on the 76% of tasks that were routine.
Trap #2 — Infinite Restart Loop 🔄
Every container was set to restart: unless-stopped — if a service crashed, it would restart indefinitely. Each restart cycle called the API again, silently burning money while consuming CPU/RAM and slowing down other services.
Trap #3 — Context Bloat (Memory Never Cleaned) 🧠
The memory system had 22 files / 705 lines accumulated over 3 months with no summarization or cleanup. Every time a Bot started a task, it loaded the entire memory — the larger the memory, the slower the response and the higher the token cost.
Trap #4 — False Confidence (No Verification) ✅❌
Validation scripts existed (card_validator, verify-deployment) but required manual execution every time. When forgotten, Bots would report "done" while card formats were broken — forcing the team to fix and resend, wasting time on rework.
All 4 issues share a common root cause: "configured once and never revisited" — a universal trap for any AI system that scales quickly.
How did the overall architecture change?
The diagram below shows the flow before and after optimization.
How were all 4 issues fixed?
Every fix follows the same principle — prevention before occurrence is better than correction after the fact, because once the problem happens, API costs have already spiked and it is too late.
01 Multi-Model Routing
Routing rules were added to openclaw.json to assign models by task type — 6 skills use Sonnet 4 (analysis, complex reports) | 15 skills use Gemini Flash (notifications, summaries, routine). 76% of routine tasks now use a model that costs 30x less.
02 Anti-Loop Protection
Docker restart policy was changed from unless-stopped to on-failure:5 with an additional rule in CLAUDE.md — "if a task fails 2 consecutive times, stop immediately and notify Admin." This prevents runaway bills from infinite restart loops.
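The docker-compose change can be sketched as below. This is a minimal fragment for illustration only; the service name is taken from the container table later in this article, and the rest of the service definition is omitted:

```yaml
services:
  openclaw:
    # Before: restart: unless-stopped  (crash -> restart -> API call, forever)
    # After: give up after 5 consecutive failures instead of looping
    restart: on-failure:5
```

With `on-failure`, Docker only restarts the container when it exits with a non-zero code, so normal operation and manual stops are unaffected.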
03 Memory Hygiene Rules
New rules were established — memory files over 100 lines must be summarized, files older than 30 days must be reviewed, the index must stay under 50 lines, and duplicate memory creation is prohibited. This prevents context bloat and reduces tokens sent with every request.
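The hygiene rules above can be expressed as a small audit script. This is a sketch under assumptions: the memory directory layout and function name are hypothetical, while the thresholds (100 lines, 30 days) follow the article:

```python
import os
import time

MAX_LINES = 100      # files over 100 lines must be summarized
MAX_AGE_DAYS = 30    # files older than 30 days must be reviewed

def audit_memory(memory_dir: str) -> list[str]:
    """Return human-readable flags for memory files that break the hygiene rules."""
    flags = []
    now = time.time()
    for name in sorted(os.listdir(memory_dir)):
        path = os.path.join(memory_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            line_count = sum(1 for _ in f)
        age_days = (now - os.path.getmtime(path)) / 86400
        if line_count > MAX_LINES:
            flags.append(f"{name}: {line_count} lines -> summarize")
        if age_days > MAX_AGE_DAYS:
            flags.append(f"{name}: {age_days:.0f} days old -> review")
    return flags
```

Run nightly (e.g. from an n8n workflow), this turns the rules into alerts instead of relying on memory discipline.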
04 Auto-Verify Pipeline
A mandatory 5-point checklist was added before every delivery — card_validator.py passes → JSON/YAML syntax valid → numbers have commas → text is not duplicated → links are working. Errors are caught before reaching the team.
How does Multi-Model Routing work?
The key to reducing costs is sending tasks to the right model, not always the most expensive one. The Router analyzes skill type and keywords in the message, then decides automatically.
- Claude Sonnet 4, $3.00 / M tokens: sales-pipeline, pm-task, exec-brief + keywords like "analyze," "strategy" — 6 skills (24%)
- Gemini Flash, $0.10 / M tokens: hr-attendance, admin-task, dev-task + keywords like "summarize," "notify" — 15 skills (76%)
| Model | Price / M tokens | Used For | Skill Count | Share |
|---|---|---|---|---|
| Claude Sonnet 4 | $3.00 Premium | Pipeline analysis, Delivery, Exec Brief | 6 | 24% |
| Gemini Flash | $0.10 Budget | Notifications, summaries, health checks | 15 | 76% |
Not every task requires an expensive model — 76% of tasks in this system are routine work where Gemini Flash at $0.10 performs just as well.
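The routing decision described above can be sketched in a few lines. The skill names and keywords mirror the table; the function name and exact matching logic are assumptions, since the article does not show the internal rule syntax of openclaw.json:

```python
# Hypothetical sketch of the skill/keyword router described above.
PREMIUM_SKILLS = {"sales-pipeline", "pm-task", "exec-brief"}  # Sonnet 4, $3.00/M
PREMIUM_KEYWORDS = {"analyze", "strategy"}

def pick_model(skill: str, message: str) -> str:
    """Route premium skills, or messages containing analysis keywords,
    to Sonnet 4; everything else goes to the 30x cheaper Gemini Flash."""
    text = message.lower()
    if skill in PREMIUM_SKILLS or any(kw in text for kw in PREMIUM_KEYWORDS):
        return "claude-sonnet-4"
    return "gemini-flash"

print(pick_model("hr-attendance", "Notify the team about late check-ins"))  # gemini-flash
print(pick_model("sales-pipeline", "Weekly pipeline review"))               # claude-sonnet-4
```

The keyword check acts as an escape hatch: even a "budget" skill gets the premium model when the message asks for analysis or strategy.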
Before vs After — what actually changed?
The table below compares every aspect that changed — the "Improvement" column provides a quick overview at a glance.
| Aspect | ❌ Before | ✅ After | Improvement |
|---|---|---|---|
| Model Used | Sonnet 4 for all tasks (25 skills), $3/M for every skill | Sonnet 4 → 6 skills, Flash → 15 skills, $0.10/M for 76% | API cost down 40-60% |
| Restart Policy | unless-stopped on all containers, unlimited restarts | on-failure:5 for API services, stops after 5 failures | Crash loop prevention |
| Anti-Loop Rules | No fail-stop rule, unlimited retries | Task fails 2x → stop + notify Admin (circuit breaker) | Lower error-state bills |
| Memory | 22 files / 705 lines, never summarized, growing endlessly | Rule: > 100 lines → summarize, > 30 days → review (auto-managed) | Token use down 15-25% |
| Session Pruning | None, context accumulated without limit, prompt growing endlessly | Context trimmed after 30 turns, max 20 tool results, compact prompt | Token use down 20-30% |
| Verification | Scripts exist but require manual runs, easy to forget | 5-point checklist mandatory before delivery (auto-verify) | Errors down 40-60% |
| Auto-start | None, reboot required manual start (downtime) | crontab @reboot → docker compose up, auto in 30 seconds | Zero-touch reboot |
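The auto-start row corresponds to a single cron entry. The project path below is a hypothetical example, not the actual path from the deployment:

```
# crontab entry: runs once after every reboot (path is hypothetical)
@reboot cd /opt/openclaw-stack && docker compose up -d
```

Using `-d` detaches the stack so the cron job exits immediately while the containers keep running.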
What are the measurable results?
The numbers below represent combined results from fixing all 4 issues — Multi-Model Routing had the largest impact as it directly reduces costs.
Container Status After Deploy
| Container | Status | Restart Policy | Before | Reason |
|---|---|---|---|---|
| 🤖 openclaw | ✅ healthy | on-failure:5 | unless-stopped | Uses API → must limit restarts |
| 🔗 lark-mcp | ✅ healthy | on-failure:5 | unless-stopped | Connects to Lark API → must limit |
| ⚙️ n8n | ✅ healthy | on-failure:5 | unless-stopped | Runs workflows → must limit |
| 🏛️ egp-solver | ✅ healthy | on-failure:3 | unless-stopped | Uses heavy Chromium → stricter limit |
| 📊 dashboard | ✅ Up | unless-stopped | unless-stopped | Lightweight nginx, no API → unchanged |
| 💾 duplicati | ✅ Up | unless-stopped | unless-stopped | Backup service → unchanged |
How does the Auto-Verify Pipeline work?
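The article lists the pipeline's five checkpoints without showing its internals, so the sketch below is an illustration under assumptions: the function names are hypothetical, and the card_validator.py and link checks (the project's own tooling) are left out:

```python
import json
import re

def check_syntax(payload: str) -> bool:
    """Point 2: JSON must parse (YAML would be checked the same way)."""
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False

def check_number_commas(text: str) -> bool:
    """Point 3: numbers of 5+ digits must use thousands separators."""
    return not re.search(r"\b\d{5,}\b", text)

def check_no_duplicates(text: str) -> bool:
    """Point 4: no sentence should appear twice in the card text."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return len(sentences) == len(set(sentences))

def verify_before_delivery(payload: str, text: str) -> bool:
    """Gate delivery on every check; a single failure blocks the send."""
    return (check_syntax(payload)
            and check_number_commas(text)
            and check_no_duplicates(text))
```

The key design point is the gate itself: the checks run automatically on every delivery, so a forgotten manual script can no longer let a broken card through.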
What lessons were learned from this audit?
Key takeaways from this audit — applicable to any AI system, not just OpenClaw.
Not every task needs an expensive model
76% of tasks in this system are routine — notifications, summaries, health checks. Gemini Flash at $0.10 handles them just as well. Paying $3.00 every time is unnecessary. Choosing the model that "fits" the task matters more than choosing the "best" model.
A system that "works" is not a "good" system
The Bots worked every day, but hidden costs, hidden risks, and hidden latency were accumulating. Regular audits are like car maintenance — waiting until something breaks means the repair bill will be many times higher.
Prevention beats correction
Anti-Loop + Auto-Verify = preventing problems before they occur. Once an issue happens, API costs have already spiked and the team has already wasted time on rework. By then, it is too late.
Memory needs housekeeping
Without regular cleanup, memory bloats like a house that is never organized — the longer it goes, the slower, more expensive, and harder it becomes to find the information that actually matters.
Building a good AI Agent Team is not just about "creating many Bots" — it is about managing context, cost, and workflow to work together in harmony, just like a real team that needs rules, verification, and the right people assigned to the right tasks.
Files Modified
| File | Changes Made | Purpose |
|---|---|---|
| config/openclaw.json | Added routing rules (2 rules) + session pruning config | Multi-Model Routing + context trimming |
| docker-compose.yml | Changed restart policy for 4 containers + removed duplicate volume | Anti-Loop + bug fix |
| CLAUDE.md | Added sections 9 (Anti-Loop), 10 (Memory), 11 (Auto-Verify) | Rules preventing all 4 issues |
What is planned next?
Three additional improvements are planned to take the system even further.
| Priority | Planned Work | Difficulty | Impact |
|---|---|---|---|
| 1 | Summarizer Agent — n8n workflow to auto-summarize memory every night | Medium | Token use down 15-25% |
| 2 | Auto-Test Pipeline — n8n workflow to test cards before actual delivery | Medium | Zero-defect delivery |
| 3 | Sub-Agent Spawning — Enabling Agents to handle multiple tasks in parallel | Hard | Throughput up 30-50% |
Frequently Asked Questions
Why was Gemini Flash chosen over Haiku or other models?
Gemini Flash at $0.10/M tokens is the most affordable option at this quality tier. It responds quickly with low latency, making it ideal for routine tasks like notifications and data summaries that do not require complex reasoning. If quality falls short, the task gets routed to Sonnet 4 instead.
Did changing the restart policy cause more container downtime?
No — on-failure:5 means the container can restart up to 5 times only when it fails. Normal operation is unaffected. It only stops when there are 5 consecutive failures, which indicates a real bug that needs fixing.
Is 705 lines of memory really too much?
Yes — every time a Bot starts a task, the entire memory is loaded into the context window (consuming roughly 2,000-3,000 tokens). Summarizing down to 200 lines retains all critical information while using 3x fewer tokens.
How is the 50% API cost reduction calculated?
76% of skills (15 out of 25) were switched from Sonnet 4 ($3.00/M) to Flash ($0.10/M) = 30x cheaper for that portion. Averaging across the entire system (including the 24% still on Sonnet 4) yields approximately 40-60% savings depending on request volume per skill.
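A quick sanity check of that arithmetic: with Sonnet 4 at $3.00/M and Flash at $0.10/M, the overall saving depends on what share of total tokens the premium skills still consume. The token-share values below are assumptions used only to show how the 40-60% band arises:

```python
SONNET = 3.00   # $ per M tokens
FLASH = 0.10    # $ per M tokens

def savings(premium_token_share: float) -> float:
    """Fraction saved vs. running everything on Sonnet 4, given the
    share of total token volume still routed to Sonnet 4."""
    blended = premium_token_share * SONNET + (1 - premium_token_share) * FLASH
    return 1 - blended / SONNET

# If premium skills consume ~40-50% of total tokens (an assumption),
# savings land inside the article's 40-60% band:
print(round(savings(0.50), 2))  # 0.48
print(round(savings(0.40), 2))  # 0.58
```

This also shows why the figure is a range rather than a single number: heavier token volume on the analysis skills pushes savings toward the low end.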
Last updated: March 2026
Related Articles
- AI Server Security Review — One Person, Five Roles, Done in 3 Hours
- 8 AI Bots Running an Entire Team — Behind the Scenes of a Real AI Operations Center
- Claude Code Security — Permission, Sandbox, and Hooks Guide
#API #Chatbot #Server #TeamWork #Technology #AITools #PromptEngineering #Automation #n8n #Deployment #CursorAI #Docker