Why CLI-first and not just API-first?

Cost. The Anthropic Claude API meters by token; the Claude Max x20 subscription is a flat monthly fee. A 32-agent system making thousands of small reasoning calls per day costs $1–3 per call on the API but is essentially free on the subscription (within fair-use). CLI-first lets us reason cheaply for the common case and pay per-token only for the bursts.

What about latency? Isn't the API faster than the CLI?

Marginally — typical Claude API latency is 1–4 seconds for a small reasoning call; CLI is 2–5 seconds. For a trading decision that takes 30+ seconds end-to-end (chart capture → ingress → pipeline → router → reasoning → fact check → execute), the CLI overhead is in the noise. We're not HFT.

What's a circuit breaker in this context?

A pattern from distributed systems: if a route fails N consecutive times, the breaker 'opens' and routes are skipped until a cooldown elapses. Then a 'half-open' state lets one trial request through. If it succeeds, the breaker closes. If it fails, the cooldown extends. This prevents 32 agents from hammering a broken endpoint.

Why both Claude and OpenAI?

Different models have different strengths. Claude is preferred for reasoning-heavy decisions (regime classification, trade reasoning, journal writing). Codex / GPT is preferred for code-shaped tasks (parsing JSON, validating regex, generating MQL5 fragments). Having two providers is also a hedge against provider downtime.

How does the router decide which route to use?

Each route has a priority (CLI > API), a circuit breaker state (closed / half-open / open), and a per-route cost ledger. The router picks the highest-priority route whose breaker is closed. If all are open, it raises an exception that surfaces to the EmergencyAgent. Cost-aware routing is configurable: an operator can cap monthly API spend, and the router refuses API calls once the cap is hit.

What's the rough cost of running iQntX per month?

If you have a Claude Max x20 subscription and a Codex CLI subscription (both ~$20/month each as of 2026), most reasoning runs on those. The Anthropic API and OpenAI API are fallbacks; expect $20–80/month if your CLIs are stable, more during outages. Plus a Windows VPS (~$30–60/month) and Supabase free tier.

Can I run this without subscriptions, on API only?

Yes, but expect $300–800/month in API costs for an active 32-agent setup. The architecture is designed for subscription-first economics; pure-API is a degraded mode the router supports but doesn't optimize for.

Why CLI-first and not just API-first?

Cost. The Anthropic Claude API meters by token; the Claude Max x20 subscription is a flat monthly fee. A 32-agent system making thousands of small reasoning calls per day costs $1–3 per call on the API but is essentially free on the subscription (within fair-use). CLI-first lets us reason cheaply for the common case and pay per-token only for the bursts.

What about latency? Isn't the API faster than the CLI?

Marginally — typical Claude API latency is 1–4 seconds for a small reasoning call; CLI is 2–5 seconds. For a trading decision that takes 30+ seconds end-to-end (chart capture → ingress → pipeline → router → reasoning → fact check → execute), the CLI overhead is in the noise. We're not HFT.

What's a circuit breaker in this context?

A pattern from distributed systems: if a route fails N consecutive times, the breaker 'opens' and routes are skipped until a cooldown elapses. Then a 'half-open' state lets one trial request through. If it succeeds, the breaker closes. If it fails, the cooldown extends. This prevents 32 agents from hammering a broken endpoint.

Why both Claude and OpenAI?

Different models have different strengths. Claude is preferred for reasoning-heavy decisions (regime classification, trade reasoning, journal writing). Codex / GPT is preferred for code-shaped tasks (parsing JSON, validating regex, generating MQL5 fragments). Having two providers is also a hedge against provider downtime.

How does the router decide which route to use?

Each route has a priority (CLI > API), a circuit breaker state (closed / half-open / open), and a per-route cost ledger. The router picks the highest-priority route whose breaker is closed. If all are open, it raises an exception that surfaces to the EmergencyAgent. Cost-aware routing is configurable: an operator can cap monthly API spend, and the router refuses API calls once the cap is hit.

What's the rough cost of running iQntX per month?

If you have a Claude Max x20 subscription and a Codex CLI subscription (both ~$20/month each as of 2026), most reasoning runs on those. The Anthropic API and OpenAI API are fallbacks; expect $20–80/month if your CLIs are stable, more during outages. Plus a Windows VPS (~$30–60/month) and Supabase free tier.

Can I run this without subscriptions, on API only?

Yes, but expect $300–800/month in API costs for an active 32-agent setup. The architecture is designed for subscription-first economics; pure-API is a degraded mode the router supports but doesn't optimize for.

Building Multi-Agent Trading Systems with Claude (the Subscription-First Way)

The cost problem nobody talks about

When you build a multi-agent AI trading system, every agent that needs reasoning makes an LLM call. iQntX has 32 agents. A complete decision cycle — from chart capture to executed order — typically fires 5–15 LLM calls (regime classification, trade decision, fact check, double check, journal, etc.).

At 5 LLM calls per cycle, running 24×7 with new market data every minute, you're looking at ~7,000 LLM calls per day. On Claude Sonnet at typical retail rates, that's ~$50–80/day, or $1,500–2,400 per month. On Opus, it's 3–5x that.

This is the cost ceiling that has kept multi-agent trading systems out of retail hands. iQntX gets around it by routing subscriptions first, API last.

The four-route architecture

iQntX defines four LLM routes with explicit priorities:

Priority	Route	What it is	Cost model
1	Claude CLI	`claude` command line, signed into Max x20	Flat subscription (~$200/mo for Max x20)
2	Codex CLI	`codex` command line, signed into Codex subscription	Flat subscription (~$20/mo)
3	Claude API	Anthropic API direct	Per-token, ~$3/M input / $15/M output (Sonnet 4.6)
4	OpenAI API	OpenAI direct	Per-token, varies by model

The LLM Router picks the highest-priority route whose circuit breaker is closed.

Subscription routes

Claude CLI + Codex CLI

API fallbacks

Anthropic + OpenAI

Breaker states

closed · half-open · open

Typical $/month

$40-80

Subscription + small fallback

Why CLI first

Two reasons: cost and rate limits.

Cost. The Claude Max x20 subscription is a flat $200/month and (subject to fair use) is essentially unlimited for the kinds of small, fast reasoning calls iQntX makes. The Anthropic API meters by token; a 32-agent system at 7,000 daily calls would cost roughly $1,500–2,400/month at Sonnet rates. The subscription is at least 10x cheaper for this workload.

Rate limits. API rate limits are per-token per minute. Subscription CLI is gated by the subscription's own throttling, which for Max x20 is generous enough that a single multi-agent system never hits it under normal operation.

The trade-off is that subscription CLI is slightly slower (2–5s vs 1–4s for the API), but trading-system decisions are not HFT — adding 2 seconds to a 30-second end-to-end decision cycle is in the noise.

Why API fallback

Three failure modes happen, occasionally:

Subscription CLI crashes. A subprocess can hang. The router needs to recover without manual intervention.
Anthropic / OpenAI rolls out a CLI update that breaks compatibility. Rare but real.
The operator wants to burst above subscription limits. During a black-swan day, a 32-agent system might want to fire 50+ decisions per minute — beyond the subscription's comfortable throughput.

The API routes handle all three. They're slower and more expensive per call, but they're always there, and the cost ledger logs exactly how much each fallback cost.

Circuit breakers, properly implemented

A circuit breaker is one of the most important patterns in distributed systems and one of the most commonly misimplemented. Done right, it prevents 32 agents from hammering a broken endpoint and creating an outage cascade.

The implementation pattern iQntX uses:

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_seconds=60):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.failures = 0
        self.opened_at = None
        self.state = "closed"  # closed | half_open | open

    def before(self):
        if self.state == "open":
            if time.time() - self.opened_at > self.recovery_seconds:
                self.state = "half_open"
                return True   # trial request
            return False      # short-circuit; don't call
        return True           # closed or half_open

    def after(self, success: bool):
        if success:
            self.failures = 0
            self.state = "closed"
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold or self.state == "half_open":
                self.state = "open"
                self.opened_at = time.time()

Three crucial details most implementations get wrong:

Half-open allows exactly one trial. Some implementations let several requests through during half-open, which defeats the purpose — you want exactly one trial to test recovery.
Counter resets on success. If you decrement on success instead of resetting, intermittent failures keep the breaker permanently dangerous.
Recovery is exponential. Successive open-state durations should grow (e.g., 60s → 120s → 240s → 480s) until a real human intervention happens. iQntX caps at 1 hour to bound damage.

The cost ledger

Every LLM call writes one row to llm_cost_ledger:

Column	Purpose
`id`	UUID
`agent`	Which agent made the call
`route`	claude_cli / codex_cli / anthropic_api / openai_api
`model`	Specific model used
`input_tokens`	Prompt tokens
`output_tokens`	Completion tokens
`cost_usd`	Computed cost (0 for subscription routes)
`latency_ms`	Round-trip time
`success`	Boolean
`created_at`	Timestamp

You can SELECT against this table to see exactly where the money went:

SELECT
  route,
  COUNT(*) AS calls,
  SUM(cost_usd) AS total_cost,
  AVG(latency_ms) AS avg_latency
FROM llm_cost_ledger
WHERE created_at >= NOW() - INTERVAL '1 day'
GROUP BY route
ORDER BY total_cost DESC;

For a healthy subscription-first deployment, you'll typically see:

claude_cli: 4,000 calls, $0.00, avg 2.8s
codex_cli: 1,200 calls, $0.00, avg 3.1s
anthropic_api: 80 calls, $1.20, avg 1.6s
openai_api: 5 calls, $0.04, avg 1.4s

Total daily LLM cost: $1.24. Total monthly LLM cost: ~$37 plus the two subscriptions ($220).

Skill-based prompting

Each iQntX agent has one or more skills, defined as markdown files under skills/iqntx-*/SKILL.md. A skill is a structured prompt with three sections:

Persona — who is reasoning (e.g., "You are the RiskGate agent of a 32-agent AI hedge fund").
Inputs — what arrives in the prompt (e.g., "stance, proposed trade, current exposure, drawdown headroom").
Output schema — what the response must look like (e.g., a JSON object with decision, reasoning, caveats).

Skills are loaded by the SkillLoader at boot and reloaded on file change — so an operator can iterate on prompting without restarting the service.

Skills go in version control and become the single most readable artifact in the codebase. Anyone (operator, auditor, future engineer) can read the skill files and understand exactly what each agent is being asked to do.

What this looks like in production

A complete decision cycle for an EURUSD setup:

Step	Agent	Route used	Tokens	Cost
1. Regime classification	Phase1Pipeline	claude_cli	1.2K in / 200 out	$0
2. Macro check	MacroOfficer	claude_cli	0.8K in / 150 out	$0
3. Setup proposal	StrategistAgent	claude_cli	2.5K in / 400 out	$0
4. Risk gate	RiskGateAgent	claude_cli	1.0K in / 100 out	$0
5. Fact check	FactCheckerAgent	claude_cli	1.2K in / 150 out	$0
6. Double check	DoubleCheckerAgent	claude_cli	1.5K in / 200 out	$0
7. Journal	JournalAgent	claude_cli	0.5K in / 300 out	$0

Total cost: $0. All seven calls hit the Claude CLI subscription. At API rates, that single decision cycle would cost ~$0.06–0.10. Over 100 cycles a day, that's $6–10/day vs $0 on subscription.

The system, fully subscription-routed

A trading day where every LLM call hit the Claude CLI subscription. Cost: $0. Calls: 1,847. Latency p95: 4.1s. Illustrative.

illustrative

iQntX multi-agent baseline (illustrative)

Total return

+51.40%

Sharpe ratio

6.14

Win rate

63.1%

Max drawdown

-2.08%

What you need to replicate this

Claude Max x20 subscription + the claude CLI installed and signed in on your trading host.
Codex CLI + a Codex subscription (optional but cheap and useful).
Anthropic API key for fallback.
OpenAI API key for fallback.
A subprocess wrapper that spawns the CLI, sends prompts on stdin, parses stdout, and times out cleanly.
The circuit breaker + cost ledger described above.

That's the entire LLM-routing stack. Maybe 400 lines of Python. The rest of the system (the 32 agents, the message bus, the EA) is a much larger investment, but the routing layer is small and replicable.

Why this matters for retail multi-agent systems

The honest truth about multi-agent trading systems is that they've been institutional-only until 2025 because the LLM cost made retail economics impossible. A 32-agent fund running on per-token billing costs $1,500+/month at minimum. Most retail traders won't pay that on top of the brokerage costs.

Subscription-first routing collapses that cost ceiling. Now a retail operator with a Max x20 subscription can run an institutional-grade agent topology for $220–280/month all-in (subscription + VPS + small fallback). That puts hedge-fund-grade discipline within reach of a serious retail trader for the first time.

The architecture is the unlock. The architecture is the moat.

Keep reading

How a 32-Agent AI Hedge Fund Beats a Single-Model Bot — the org chart.
The Anatomy of a Drawdown — what survival actually costs.
Why Most MT5 EAs Fail — the layer the LLMs sit on top of.

Building Multi-Agent Trading Systems with Claude (the Subscription-First Way)

The cost problem nobody talks about

The four-route architecture

Why CLI first

Why API fallback

Circuit breakers, properly implemented

The cost ledger

Skill-based prompting

What this looks like in production

What you need to replicate this

Why this matters for retail multi-agent systems

Keep reading

Questions readers ask about this

Keep reading

Claude Trading Bot: From Single Agent to Production Multi-Agent System

How a 32-Agent AI Hedge Fund Beats a Single-Model Bot

What Is Multi-Agent Trading? (And Why It Beats Single-Model Bots)

Ready to put this on autopilot?