Building Multi-Agent Trading Systems with Claude (the Subscription-First Way)
Calling the LLM API 32 times per cycle is a great way to burn $4,000 a month. Calling Claude via subscription CLI and falling back to the API only when needed is how iQntX runs an institutional-grade brain on a retail budget.
The cost problem nobody talks about
When you build a multi-agent AI trading system, every agent that needs reasoning makes an LLM call. iQntX has 32 agents. A complete decision cycle — from chart capture to executed order — typically fires 5–15 LLM calls (regime classification, trade decision, fact check, double check, journal, etc.).
At 5 LLM calls per cycle, running 24×7 with new market data every minute, you're looking at ~7,000 LLM calls per day. On Claude Sonnet at typical retail rates, that's ~$50–80/day, or $1,500–2,400 per month. On Opus, it's 3–5x that.
This is the cost ceiling that has kept multi-agent trading systems out of retail hands. iQntX gets around it by routing subscriptions first, API last.
The four-route architecture
iQntX defines four LLM routes with explicit priorities:
| Priority | Route | What it is | Cost model |
|---|---|---|---|
| 1 | Claude CLI | claude command line, signed into Max x20 | Flat subscription (~$200/mo for Max x20) |
| 2 | Codex CLI | codex command line, signed into Codex subscription | Flat subscription (~$20/mo) |
| 3 | Claude API | Anthropic API direct | Per-token, ~$3/M input / $15/M output (Sonnet 4.6) |
| 4 | OpenAI API | OpenAI direct | Per-token, varies by model |
The LLM Router picks the highest-priority route whose circuit breaker is closed.
Why CLI first
Two reasons: cost and rate limits.
Cost. The Claude Max x20 subscription is a flat $200/month and (subject to fair use) is essentially unlimited for the kinds of small, fast reasoning calls iQntX makes. The Anthropic API meters by token; a 32-agent system at 7,000 daily calls would cost roughly $1,500–2,400/month at Sonnet rates. The subscription is at least 10x cheaper for this workload.
Rate limits. API rate limits are per-token per minute. Subscription CLI is gated by the subscription's own throttling, which for Max x20 is generous enough that a single multi-agent system never hits it under normal operation.
The trade-off is that subscription CLI is slightly slower (2–5s vs 1–4s for the API), but trading-system decisions are not HFT — adding 2 seconds to a 30-second end-to-end decision cycle is in the noise.
Why API fallback
Three failure modes happen, occasionally:
- Subscription CLI crashes. A subprocess can hang. The router needs to recover without manual intervention.
- Anthropic / OpenAI rolls out a CLI update that breaks compatibility. Rare but real.
- The operator wants to burst above subscription limits. During a black-swan day, a 32-agent system might want to fire 50+ decisions per minute — beyond the subscription's comfortable throughput.
The API routes handle all three. They're slower and more expensive per call, but they're always there, and the cost ledger logs exactly how much each fallback cost.
Circuit breakers, properly implemented
A circuit breaker is one of the most important patterns in distributed systems and one of the most commonly misimplemented. Done right, it prevents 32 agents from hammering a broken endpoint and creating an outage cascade.
The implementation pattern iQntX uses:
class CircuitBreaker:
def __init__(self, failure_threshold=5, recovery_seconds=60):
self.failure_threshold = failure_threshold
self.recovery_seconds = recovery_seconds
self.failures = 0
self.opened_at = None
self.state = "closed" # closed | half_open | open
def before(self):
if self.state == "open":
if time.time() - self.opened_at > self.recovery_seconds:
self.state = "half_open"
return True # trial request
return False # short-circuit; don't call
return True # closed or half_open
def after(self, success: bool):
if success:
self.failures = 0
self.state = "closed"
else:
self.failures += 1
if self.failures >= self.failure_threshold or self.state == "half_open":
self.state = "open"
self.opened_at = time.time()
Three crucial details most implementations get wrong:
- Half-open allows exactly one trial. Some implementations let several requests through during half-open, which defeats the purpose — you want exactly one trial to test recovery.
- Counter resets on success. If you decrement on success instead of resetting, intermittent failures keep the breaker permanently dangerous.
- Recovery is exponential. Successive open-state durations should grow (e.g., 60s → 120s → 240s → 480s) until a real human intervention happens. iQntX caps at 1 hour to bound damage.
Naive retries make outages worse. If 32 agents each retry 3 times on a failing endpoint, you've turned a 32-call problem into a 96-call problem. Circuit breakers stop retries during outages, which is the only correct behavior.
The cost ledger
Every LLM call writes one row to llm_cost_ledger:
| Column | Purpose |
|---|---|
id | UUID |
agent | Which agent made the call |
route | claude_cli / codex_cli / anthropic_api / openai_api |
model | Specific model used |
input_tokens | Prompt tokens |
output_tokens | Completion tokens |
cost_usd | Computed cost (0 for subscription routes) |
latency_ms | Round-trip time |
success | Boolean |
created_at | Timestamp |
You can SELECT against this table to see exactly where the money went:
SELECT
route,
COUNT(*) AS calls,
SUM(cost_usd) AS total_cost,
AVG(latency_ms) AS avg_latency
FROM llm_cost_ledger
WHERE created_at >= NOW() - INTERVAL '1 day'
GROUP BY route
ORDER BY total_cost DESC;
For a healthy subscription-first deployment, you'll typically see:
- claude_cli: 4,000 calls, $0.00, avg 2.8s
- codex_cli: 1,200 calls, $0.00, avg 3.1s
- anthropic_api: 80 calls, $1.20, avg 1.6s
- openai_api: 5 calls, $0.04, avg 1.4s
Total daily LLM cost: $1.24. Total monthly LLM cost: ~$37 plus the two subscriptions ($220).
Skill-based prompting
Each iQntX agent has one or more skills, defined as markdown files under skills/iqntx-*/SKILL.md. A skill is a structured prompt with three sections:
- Persona — who is reasoning (e.g., "You are the RiskGate agent of a 32-agent AI hedge fund").
- Inputs — what arrives in the prompt (e.g., "stance, proposed trade, current exposure, drawdown headroom").
- Output schema — what the response must look like (e.g., a JSON object with
decision,reasoning,caveats).
Skills are loaded by the SkillLoader at boot and reloaded on file change — so an operator can iterate on prompting without restarting the service.
Skills go in version control and become the single most readable artifact in the codebase. Anyone (operator, auditor, future engineer) can read the skill files and understand exactly what each agent is being asked to do.
What this looks like in production
A complete decision cycle for an EURUSD setup:
| Step | Agent | Route used | Tokens | Cost |
|---|---|---|---|---|
| 1. Regime classification | Phase1Pipeline | claude_cli | 1.2K in / 200 out | $0 |
| 2. Macro check | MacroOfficer | claude_cli | 0.8K in / 150 out | $0 |
| 3. Setup proposal | StrategistAgent | claude_cli | 2.5K in / 400 out | $0 |
| 4. Risk gate | RiskGateAgent | claude_cli | 1.0K in / 100 out | $0 |
| 5. Fact check | FactCheckerAgent | claude_cli | 1.2K in / 150 out | $0 |
| 6. Double check | DoubleCheckerAgent | claude_cli | 1.5K in / 200 out | $0 |
| 7. Journal | JournalAgent | claude_cli | 0.5K in / 300 out | $0 |
Total cost: $0. All seven calls hit the Claude CLI subscription. At API rates, that single decision cycle would cost ~$0.06–0.10. Over 100 cycles a day, that's $6–10/day vs $0 on subscription.
What you need to replicate this
- Claude Max x20 subscription + the
claudeCLI installed and signed in on your trading host. - Codex CLI + a Codex subscription (optional but cheap and useful).
- Anthropic API key for fallback.
- OpenAI API key for fallback.
- A subprocess wrapper that spawns the CLI, sends prompts on stdin, parses stdout, and times out cleanly.
- The circuit breaker + cost ledger described above.
That's the entire LLM-routing stack. Maybe 400 lines of Python. The rest of the system (the 32 agents, the message bus, the EA) is a much larger investment, but the routing layer is small and replicable.
Why this matters for retail multi-agent systems
The honest truth about multi-agent trading systems is that they've been institutional-only until 2025 because the LLM cost made retail economics impossible. A 32-agent fund running on per-token billing costs $1,500+/month at minimum. Most retail traders won't pay that on top of the brokerage costs.
Subscription-first routing collapses that cost ceiling. Now a retail operator with a Max x20 subscription can run an institutional-grade agent topology for $220–280/month all-in (subscription + VPS + small fallback). That puts hedge-fund-grade discipline within reach of a serious retail trader for the first time.
The architecture is the unlock. The architecture is the moat.
Keep reading
- How a 32-Agent AI Hedge Fund Beats a Single-Model Bot — the org chart.
- The Anatomy of a Drawdown — what survival actually costs.
- Why Most MT5 EAs Fail — the layer the LLMs sit on top of.
Writes about multi-agent AI trading architecture, hedge-fund operations, and risk discipline for retail and prop-firm traders.
Questions readers ask about this
If you find a question we should add, send it to hello@iqntx.com.
Why CLI-first and not just API-first?
Cost. The Anthropic Claude API meters by token; the Claude Max x20 subscription is a flat monthly fee. A 32-agent system making thousands of small reasoning calls per day costs $1–3 per call on the API but is essentially free on the subscription (within fair-use). CLI-first lets us reason cheaply for the common case and pay per-token only for the bursts.
What about latency? Isn't the API faster than the CLI?
Marginally — typical Claude API latency is 1–4 seconds for a small reasoning call; CLI is 2–5 seconds. For a trading decision that takes 30+ seconds end-to-end (chart capture → ingress → pipeline → router → reasoning → fact check → execute), the CLI overhead is in the noise. We're not HFT.
What's a circuit breaker in this context?
A pattern from distributed systems: if a route fails N consecutive times, the breaker 'opens' and routes are skipped until a cooldown elapses. Then a 'half-open' state lets one trial request through. If it succeeds, the breaker closes. If it fails, the cooldown extends. This prevents 32 agents from hammering a broken endpoint.
Why both Claude and OpenAI?
Different models have different strengths. Claude is preferred for reasoning-heavy decisions (regime classification, trade reasoning, journal writing). Codex / GPT is preferred for code-shaped tasks (parsing JSON, validating regex, generating MQL5 fragments). Having two providers is also a hedge against provider downtime.
How does the router decide which route to use?
Each route has a priority (CLI > API), a circuit breaker state (closed / half-open / open), and a per-route cost ledger. The router picks the highest-priority route whose breaker is closed. If all are open, it raises an exception that surfaces to the EmergencyAgent. Cost-aware routing is configurable: an operator can cap monthly API spend, and the router refuses API calls once the cap is hit.
What's the rough cost of running iQntX per month?
If you have a Claude Max x20 subscription and a Codex CLI subscription (both ~$20/month each as of 2026), most reasoning runs on those. The Anthropic API and OpenAI API are fallbacks; expect $20–80/month if your CLIs are stable, more during outages. Plus a Windows VPS (~$30–60/month) and Supabase free tier.
Can I run this without subscriptions, on API only?
Yes, but expect $300–800/month in API costs for an active 32-agent setup. The architecture is designed for subscription-first economics; pure-API is a degraded mode the router supports but doesn't optimize for.
Keep reading
RelatedClaude Trading Bot: From Single Agent to Production Multi-Agent System
How a 32-Agent AI Hedge Fund Beats a Single-Model Bot
What Is Multi-Agent Trading? (And Why It Beats Single-Model Bots)
Ready to put this on autopilot?
The waitlist is your fastest path to a private cohort. We open in waves so the system never gets in front of itself.