architecture · May 12, 2026 · 8 min read

Building Multi-Agent Trading Systems with Claude (the Subscription-First Way)

Calling the LLM API 32 times per cycle is a great way to burn $4,000 a month. Calling Claude via subscription CLI and falling back to the API only when needed is how iQntX runs an institutional-grade brain on a retail budget.

By iQntX Engineering

The cost problem nobody talks about

When you build a multi-agent AI trading system, every agent that needs reasoning makes an LLM call. iQntX has 32 agents. A complete decision cycle — from chart capture to executed order — typically fires 5–15 LLM calls (regime classification, trade decision, fact check, double check, journal, etc.).

At 5 LLM calls per cycle, running 24×7 with new market data every minute, you're looking at ~7,000 LLM calls per day. On Claude Sonnet at typical retail rates, that's ~$50–80/day, or $1,500–2,400 per month. On Opus, it's 3–5x that.
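The arithmetic behind those figures can be checked with a quick back-of-envelope script. This is a rough sketch, not iQntX's billing model: it borrows the Sonnet rates from the route table ($3/M input, $15/M output) and the per-call token counts from the seven-call production example later in this article, and it assumes a 30-day month.

```python
# Back-of-envelope estimate. Token counts come from the seven-call production
# example later in the article; the 30-day month is an assumption.
INPUT_USD_PER_M = 3.0    # Sonnet input rate quoted in the route table
OUTPUT_USD_PER_M = 15.0  # Sonnet output rate quoted in the route table

# (input_tokens, output_tokens) for each call in the example cycle
EXAMPLE_CALLS = [
    (1200, 200), (800, 150), (2500, 400), (1000, 100),
    (1200, 150), (1500, 200), (500, 300),
]


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Per-token API cost of one call, in USD."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


avg_call_cost = sum(call_cost(i, o) for i, o in EXAMPLE_CALLS) / len(EXAMPLE_CALLS)

calls_per_cycle = 5
cycles_per_day = 24 * 60              # one cycle per minute, around the clock
calls_per_day = calls_per_cycle * cycles_per_day

daily_usd = calls_per_day * avg_call_cost
monthly_usd = daily_usd * 30

print(f"{calls_per_day} calls/day, ${daily_usd:.2f}/day, ${monthly_usd:.0f}/month")
```

Under these assumptions the script lands at about $50/day and $1,500/month, the low end of the range quoted above. Larger prompts or more calls per cycle push it toward the high end.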

This is the cost ceiling that has kept multi-agent trading systems out of retail hands. iQntX gets around it by routing to subscriptions first and to the API last.

The four-route architecture

iQntX defines four LLM routes with explicit priorities:

| Priority | Route | What it is | Cost model |
|---|---|---|---|
| 1 | Claude CLI | claude command line, signed into Max x20 | Flat subscription (~$200/mo for Max x20) |
| 2 | Codex CLI | codex command line, signed into Codex subscription | Flat subscription (~$20/mo) |
| 3 | Claude API | Anthropic API direct | Per-token, ~$3/M input / $15/M output (Sonnet 4.6) |
| 4 | OpenAI API | OpenAI direct | Per-token, varies by model |

The LLM Router picks the highest-priority route whose circuit breaker is closed.
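As a minimal sketch of that selection rule (the `Route` dataclass and `pick_route` function are illustrative names, not iQntX's actual code), the router only needs each route's priority and breaker state:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    name: str
    priority: int        # 1 = try first
    breaker_state: str   # "closed" | "half_open" | "open"


def pick_route(routes: list[Route]) -> Optional[Route]:
    """Return the highest-priority route whose breaker is closed, else None."""
    candidates = [r for r in routes if r.breaker_state == "closed"]
    return min(candidates, key=lambda r: r.priority, default=None)


routes = [
    Route("claude_cli", 1, "open"),       # pretend the Claude CLI is down
    Route("codex_cli", 2, "closed"),
    Route("anthropic_api", 3, "closed"),
    Route("openai_api", 4, "closed"),
]
selected = pick_route(routes)
print(selected.name)  # codex_cli
```

Returning `None` instead of raising keeps the selection step pure; the caller decides how to escalate when every breaker is open.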

  • Subscription routes: 2 (Claude CLI + Codex CLI)
  • API fallbacks: 2 (Anthropic + OpenAI)
  • Breaker states: 3 (closed · half-open · open)
  • Typical $/month: $40-80 (subscription + small fallback)

Why CLI first

Two reasons: cost and rate limits.

Cost. The Claude Max x20 subscription is a flat $200/month and (subject to fair use) is essentially unlimited for the kinds of small, fast reasoning calls iQntX makes. The Anthropic API meters by token; a 32-agent system at 7,000 daily calls would cost roughly $1,500–2,400/month at Sonnet rates. The subscription is roughly 7–12x cheaper for this workload.

Rate limits. API rate limits are enforced per minute, on both requests and tokens. Subscription CLI is gated by the subscription's own throttling, which for Max x20 is generous enough that a single multi-agent system never hits it under normal operation.

The trade-off is that subscription CLI is slightly slower (2–5s vs 1–4s for the API), but trading-system decisions are not HFT — adding 2 seconds to a 30-second end-to-end decision cycle is in the noise.

Why API fallback

Three situations push the router off the subscription routes from time to time:

  1. Subscription CLI crashes. A subprocess can hang. The router needs to recover without manual intervention.
  2. Anthropic / OpenAI rolls out a CLI update that breaks compatibility. Rare but real.
  3. The operator wants to burst above subscription limits. During a black-swan day, a 32-agent system might want to fire 50+ decisions per minute — beyond the subscription's comfortable throughput.

The API routes handle all three. They're more expensive per call, but they're always there, and the cost ledger logs exactly how much each fallback cost.
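A failover loop for this could look like the sketch below. The names `call_with_fallback`, `AllRoutesFailed`, and `DemoBreaker` are hypothetical, and the only thing assumed about a breaker is that it offers `before()` and `after(success)`.

```python
class AllRoutesFailed(RuntimeError):
    """Raised when every route was skipped or failed (hypothetical name)."""


def call_with_fallback(prompt, routes):
    """Try routes in priority order and return (route_name, reply).

    `routes` is a list of (name, breaker, call_fn) tuples, already sorted by
    priority. Each breaker must offer before() -> bool and after(success).
    """
    errors = {}
    for name, breaker, call_fn in routes:
        if not breaker.before():
            errors[name] = "breaker open"
            continue
        try:
            reply = call_fn(prompt)
        except Exception as exc:          # hung CLI, broken update, API error...
            breaker.after(False)
            errors[name] = repr(exc)
            continue
        breaker.after(True)
        return name, reply
    raise AllRoutesFailed(errors)


class DemoBreaker:
    """Stand-in breaker that always allows the call and records outcomes."""

    def __init__(self):
        self.results = []

    def before(self):
        return True

    def after(self, success):
        self.results.append(success)


def broken_cli(prompt):
    raise TimeoutError("CLI subprocess hung")


def working_api(prompt):
    return "HOLD"


cli_breaker, api_breaker = DemoBreaker(), DemoBreaker()
result = call_with_fallback(
    "classify regime",
    [("claude_cli", cli_breaker, broken_cli), ("anthropic_api", api_breaker, working_api)],
)
print(result)  # ('anthropic_api', 'HOLD')
```

The collected `errors` dictionary travels with the exception, so whoever handles a total outage can see why each route was skipped.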

Circuit breakers, properly implemented

A circuit breaker is one of the most important patterns in distributed systems and one of the most commonly misimplemented. Done right, it prevents 32 agents from hammering a broken endpoint and creating an outage cascade.

The implementation pattern iQntX uses:

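import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_seconds=60, max_recovery_seconds=3600):
        self.failure_threshold = failure_threshold
        self.base_recovery_seconds = recovery_seconds
        self.recovery_seconds = recovery_seconds          # current cooldown; doubles on a failed trial
        self.max_recovery_seconds = max_recovery_seconds  # 1-hour cap
        self.failures = 0
        self.opened_at = None
        self.state = "closed"  # closed | half_open | open

    def before(self):
        """Return True if the caller may send a request on this route."""
        if self.state == "open":
            if time.time() - self.opened_at > self.recovery_seconds:
                self.state = "half_open"
                return True   # the single trial request
            return False      # short-circuit; don't call
        if self.state == "half_open":
            return False      # a trial is already in flight
        return True           # closed

    def after(self, success: bool):
        """Record the outcome of a request that before() allowed."""
        if success:
            self.failures = 0
            self.recovery_seconds = self.base_recovery_seconds  # reset the backoff
            self.state = "closed"
            return
        self.failures += 1
        if self.state == "half_open":
            # Failed trial: reopen with a doubled cooldown, capped.
            self.recovery_seconds = min(self.recovery_seconds * 2, self.max_recovery_seconds)
            self._open()
        elif self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = time.time()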

Three crucial details most implementations get wrong:

  1. Half-open allows exactly one trial. Some implementations let several requests through during half-open, which defeats the purpose — you want exactly one trial to test recovery.
  2. Counter resets on success. If you decrement on success instead of resetting, intermittent failures leave the counter hovering just below the threshold, so the breaker trips on the next small burst of errors even when the route is mostly healthy.
  3. Recovery is exponential. Successive open-state durations should grow (e.g., 60s → 120s → 240s → 480s) until a human intervenes. iQntX caps at 1 hour to bound damage.
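The schedule in item 3 is plain doubling with a ceiling. A small sketch, where the function name and the 0-based counter are illustrative choices rather than anything from the iQntX codebase:

```python
def cooldown_seconds(consecutive_opens: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Cooldown for the Nth consecutive open (0-based), doubling up to `cap`."""
    return min(base * (2 ** consecutive_opens), cap)


schedule = [cooldown_seconds(n) for n in range(8)]
print(schedule)  # [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3600.0, 3600.0]
```

With a 60-second base the one-hour cap is reached on the seventh consecutive open, after which the cooldown stays flat.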

The cost ledger

Every LLM call writes one row to llm_cost_ledger:

| Column | Purpose |
|---|---|
| id | UUID |
| agent | Which agent made the call |
| route | claude_cli / codex_cli / anthropic_api / openai_api |
| model | Specific model used |
| input_tokens | Prompt tokens |
| output_tokens | Completion tokens |
| cost_usd | Computed cost (0 for subscription routes) |
| latency_ms | Round-trip time |
| success | Boolean |
| created_at | Timestamp |

You can SELECT against this table to see exactly where the money went:

SELECT
  route,
  COUNT(*) AS calls,
  SUM(cost_usd) AS total_cost,
  AVG(latency_ms) AS avg_latency
FROM llm_cost_ledger
WHERE created_at >= NOW() - INTERVAL '1 day'
GROUP BY route
ORDER BY total_cost DESC;

For a healthy subscription-first deployment, you'll typically see:

  • claude_cli: 4,000 calls, $0.00, avg 2.8s
  • codex_cli: 1,200 calls, $0.00, avg 3.1s
  • anthropic_api: 80 calls, $1.20, avg 1.6s
  • openai_api: 5 calls, $0.04, avg 1.4s

Total daily LLM cost: $1.24. Total monthly LLM cost: ~$37 plus the two subscriptions ($220).
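To try the ledger end to end without a database server, here is a sketch using Python's built-in sqlite3 module. It is an approximation under stated assumptions: the column types, the `record_call` helper, the placeholder model name, and the default Sonnet rates are illustrative, and the time filter uses SQLite's `datetime('now', '-1 day')` in place of `NOW() - INTERVAL '1 day'`.

```python
import sqlite3
import uuid

SUBSCRIPTION_ROUTES = {"claude_cli", "codex_cli"}   # flat fee, logged at $0

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_cost_ledger (
        id TEXT PRIMARY KEY,
        agent TEXT, route TEXT, model TEXT,
        input_tokens INTEGER, output_tokens INTEGER,
        cost_usd REAL, latency_ms INTEGER, success INTEGER,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")


def record_call(agent, route, model, input_tokens, output_tokens,
                latency_ms, success, usd_per_m_in=3.0, usd_per_m_out=15.0):
    """Write one ledger row; subscription routes are logged at zero cost."""
    if route in SUBSCRIPTION_ROUTES:
        cost = 0.0
    else:
        cost = (input_tokens * usd_per_m_in + output_tokens * usd_per_m_out) / 1_000_000
    conn.execute(
        "INSERT INTO llm_cost_ledger "
        "(id, agent, route, model, input_tokens, output_tokens, cost_usd, latency_ms, success) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), agent, route, model, input_tokens, output_tokens,
         cost, latency_ms, int(success)),
    )


# "demo-model" is a placeholder, not a real model identifier.
record_call("RiskGateAgent", "claude_cli", "demo-model", 1000, 100, 2800, True)
record_call("RiskGateAgent", "anthropic_api", "demo-model", 1000, 100, 1600, True)

rows = conn.execute("""
    SELECT route, COUNT(*) AS calls, SUM(cost_usd) AS total_cost, AVG(latency_ms) AS avg_latency
    FROM llm_cost_ledger
    WHERE created_at >= datetime('now', '-1 day')
    GROUP BY route
    ORDER BY total_cost DESC
""").fetchall()
for row in rows:
    print(row)
```

The API row sorts first because it is the only one with a non-zero cost; the subscription row still shows up with its call count and latency, which is what makes the ledger useful for spotting a route that is slow rather than expensive.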

Skill-based prompting

Each iQntX agent has one or more skills, defined as markdown files under skills/iqntx-*/SKILL.md. A skill is a structured prompt with three sections:

  1. Persona — who is reasoning (e.g., "You are the RiskGate agent of a 32-agent AI hedge fund").
  2. Inputs — what arrives in the prompt (e.g., "stance, proposed trade, current exposure, drawdown headroom").
  3. Output schema — what the response must look like (e.g., a JSON object with decision, reasoning, caveats).
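An output schema only helps if the reply is checked against it. A minimal validation sketch, assuming the three keys named in the example above and nothing about their value types:

```python
import json

REQUIRED_KEYS = {"decision", "reasoning", "caveats"}


def parse_agent_reply(raw: str) -> dict:
    """Parse a model reply and check it carries the keys the skill asked for."""
    data = json.loads(raw)                 # raises json.JSONDecodeError on bad JSON
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


reply = parse_agent_reply(
    '{"decision": "reject", "reasoning": "drawdown headroom too thin", "caveats": []}'
)
print(reply["decision"])  # reject
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch one exception type for both malformed JSON and missing keys.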

Skills are loaded by the SkillLoader at boot and reloaded on file change — so an operator can iterate on prompting without restarting the service.
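One simple way to get reload-on-change is to poll file modification times. The sketch below is an assumption about how such a loader could work, not the actual SkillLoader; it only relies on the skills/iqntx-*/SKILL.md layout described above.

```python
import tempfile
from pathlib import Path


class SkillLoader:
    """Loads SKILL.md files and re-reads any whose mtime has changed."""

    def __init__(self, root, pattern: str = "iqntx-*/SKILL.md"):
        self.root = Path(root)
        self.pattern = pattern
        self._cache = {}   # skill directory name -> (mtime_ns, text)

    def refresh(self) -> list[str]:
        """Scan the skills directory; return names that were loaded or reloaded."""
        changed = []
        for path in sorted(self.root.glob(self.pattern)):
            name = path.parent.name
            mtime = path.stat().st_mtime_ns
            cached = self._cache.get(name)
            if cached is None or cached[0] != mtime:
                self._cache[name] = (mtime, path.read_text(encoding="utf-8"))
                changed.append(name)
        return changed

    def get(self, name: str) -> str:
        """Return the cached text of a skill by its directory name."""
        return self._cache[name][1]


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "iqntx-riskgate" / "SKILL.md"
    skill.parent.mkdir()
    skill.write_text("You are the RiskGate agent...", encoding="utf-8")
    loader = SkillLoader(tmp)
    print(loader.refresh())   # ['iqntx-riskgate']
    print(loader.refresh())   # []
```

Calling `refresh()` once at boot and then on a short timer gives the iterate-without-restart behavior; a filesystem-event watcher would react faster but adds a dependency.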

Skills go in version control and become the single most readable artifact in the codebase. Anyone (operator, auditor, future engineer) can read the skill files and understand exactly what each agent is being asked to do.

What this looks like in production

A complete decision cycle for an EURUSD setup:

| Step | Agent | Route used | Tokens | Cost |
|---|---|---|---|---|
| 1. Regime classification | Phase1Pipeline | claude_cli | 1.2K in / 200 out | $0 |
| 2. Macro check | MacroOfficer | claude_cli | 0.8K in / 150 out | $0 |
| 3. Setup proposal | StrategistAgent | claude_cli | 2.5K in / 400 out | $0 |
| 4. Risk gate | RiskGateAgent | claude_cli | 1.0K in / 100 out | $0 |
| 5. Fact check | FactCheckerAgent | claude_cli | 1.2K in / 150 out | $0 |
| 6. Double check | DoubleCheckerAgent | claude_cli | 1.5K in / 200 out | $0 |
| 7. Journal | JournalAgent | claude_cli | 0.5K in / 300 out | $0 |

Total cost: $0. All seven calls hit the Claude CLI subscription. At API rates, that single decision cycle would cost ~$0.06–0.10. Over 100 cycles a day, that's $6–10/day vs $0 on subscription.

The system, fully subscription-routed
A trading day where every LLM call hit the Claude CLI subscription. Cost: $0. Calls: 1,847. Latency p95: 4.1s. (Illustrative.)
iQntX 32-agent baseline (illustrative): total return +51.40%, Sharpe ratio 6.14, win rate 63.1%, max drawdown -2.08%.

What you need to replicate this

  1. Claude Max x20 subscription + the claude CLI installed and signed in on your trading host.
  2. Codex CLI + a Codex subscription (optional but cheap and useful).
  3. Anthropic API key for fallback.
  4. OpenAI API key for fallback.
  5. A subprocess wrapper that spawns the CLI, sends prompts on stdin, parses stdout, and times out cleanly.
  6. The circuit breaker + cost ledger described above.

That's the entire LLM-routing stack. Maybe 400 lines of Python. The rest of the system (the 32 agents, the message bus, the EA) is a much larger investment, but the routing layer is small and replicable.
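The subprocess wrapper from item 5 is the piece most likely to bite, so here is a hedged sketch of it. The command is left as a parameter because this article does not specify how the CLIs are invoked; `CliResult` and `run_cli` are illustrative names, and the demo uses the Python interpreter as a stand-in CLI.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliResult:
    ok: bool
    output: str
    error: str = ""


def run_cli(cmd: list[str], prompt: str, timeout_s: float = 30.0) -> CliResult:
    """Run a CLI once: prompt on stdin, reply on stdout, hard timeout."""
    try:
        proc = subprocess.run(
            cmd, input=prompt, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left hanging.
        return CliResult(False, "", f"timed out after {timeout_s}s")
    except OSError as exc:                 # e.g. the binary is not installed
        return CliResult(False, "", str(exc))
    if proc.returncode != 0:
        return CliResult(False, proc.stdout, proc.stderr.strip())
    return CliResult(True, proc.stdout.strip())


# Stand-in command so the sketch runs anywhere; swap in the real CLI invocation.
echo_cmd = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
result = run_cli(echo_cmd, "hold")
print(result.ok, result.output)  # True HOLD
```

Returning a result object instead of raising lets the router feed `ok` straight into the circuit breaker and the cost ledger's success column.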

Why this matters for retail multi-agent systems

The honest truth about multi-agent trading systems is that they've been institutional-only until 2025 because the LLM cost made retail economics impossible. A 32-agent fund running on per-token billing costs $1,500+/month at minimum. Most retail traders won't pay that on top of the brokerage costs.

Subscription-first routing collapses that cost ceiling. Now a retail operator with a Max x20 subscription can run an institutional-grade agent topology for $220–280/month all-in (subscription + VPS + small fallback). That puts hedge-fund-grade discipline within reach of a serious retail trader for the first time.

The architecture is the unlock. The architecture is the moat.


#claude #llm-routing #multi-agent #cost-optimization #circuit-breaker
iQntX Engineering
Founder & Head of AI Trading Architecture · iQntX

Writes about multi-agent AI trading architecture, hedge-fund operations, and risk discipline for retail and prop-firm traders.

FAQ

Questions readers ask about this

If you find a question we should add, send it to hello@iqntx.com.

Why CLI-first and not just API-first?

Cost. The Anthropic Claude API meters by token; the Claude Max x20 subscription is a flat monthly fee. A small reasoning call costs only about a cent on the API, but a 32-agent system making thousands of them per day adds up to $50–80 a day, while the same calls are essentially free on the subscription (within fair use). CLI-first lets us reason cheaply for the common case and pay per-token only for the bursts.

What about latency? Isn't the API faster than the CLI?

Marginally — typical Claude API latency is 1–4 seconds for a small reasoning call; CLI is 2–5 seconds. For a trading decision that takes 30+ seconds end-to-end (chart capture → ingress → pipeline → router → reasoning → fact check → execute), the CLI overhead is in the noise. We're not HFT.

What's a circuit breaker in this context?

A pattern from distributed systems: if a route fails N consecutive times, the breaker 'opens' and the route is skipped until a cooldown elapses. Then a 'half-open' state lets one trial request through. If it succeeds, the breaker closes. If it fails, the cooldown extends. This prevents 32 agents from hammering a broken endpoint.

Why both Claude and OpenAI?

Different models have different strengths. Claude is preferred for reasoning-heavy decisions (regime classification, trade reasoning, journal writing). Codex / GPT is preferred for code-shaped tasks (parsing JSON, validating regex, generating MQL5 fragments). Having two providers is also a hedge against provider downtime.

How does the router decide which route to use?

Each route has a priority (CLI > API), a circuit breaker state (closed / half-open / open), and a per-route cost ledger. The router picks the highest-priority route whose breaker is closed. If all are open, it raises an exception that surfaces to the EmergencyAgent. Cost-aware routing is configurable: an operator can cap monthly API spend, and the router refuses API calls once the cap is hit.
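The monthly cap can be sketched as a small guard the router consults before an API call. This is an illustration under assumptions: the `ApiBudget` class is hypothetical, and resetting the counter at the start of each month is left out.

```python
API_ROUTES = {"anthropic_api", "openai_api"}


class ApiBudget:
    """Tracks month-to-date API spend against an operator-set cap."""

    def __init__(self, monthly_cap_usd: float):
        self.monthly_cap_usd = monthly_cap_usd
        self.spent_usd = 0.0

    def allows(self, route: str) -> bool:
        # Subscription routes are never blocked by the cap.
        return route not in API_ROUTES or self.spent_usd < self.monthly_cap_usd

    def record(self, route: str, cost_usd: float) -> None:
        # Only per-token routes count toward the cap.
        if route in API_ROUTES:
            self.spent_usd += cost_usd


budget = ApiBudget(monthly_cap_usd=50.0)
budget.record("anthropic_api", 49.0)
print(budget.allows("anthropic_api"))  # True
budget.record("openai_api", 2.0)
print(budget.allows("anthropic_api"))  # False
print(budget.allows("claude_cli"))     # True
```

In a real deployment the month-to-date figure would more naturally come from a SUM over the cost ledger than from an in-memory counter.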

What's the rough cost of running iQntX per month?

If you have a Claude Max x20 subscription (~$200/month) and a Codex CLI subscription (~$20/month, both as of 2026), most reasoning runs on those. The Anthropic API and OpenAI API are fallbacks; expect $20–80/month if your CLIs are stable, more during outages. Plus a Windows VPS (~$30–60/month) and Supabase free tier.

Can I run this without subscriptions, on API only?

Yes, but expect roughly $1,500–2,400/month in API costs at Sonnet rates for an active 32-agent setup. The architecture is designed for subscription-first economics; pure-API is a degraded mode the router supports but doesn't optimize for.
