Inferventis
Systems Architecture · Living Document · April 2026

End-to-end system schema

How a client agent's intent becomes a tool call, and how we make our tools win. Read left → right.

Status key: Live in production · In development · Planned
Client agent
Client Agent
An autonomous AI agent running locally or in the cloud. Receives a user task and determines that an external tool is needed.
· Name & description
· System prompt — behaviour & persona
· LLM — Claude / GPT / Gemini / other
· Tool list — MCP server connected
· Memory — context window + history
· Trigger — user / schedule / event
· Output — user / agent / system
Any MCP client
intent (natural language)
Orchestrator
Local Orchestrator
Receives agent intent. Embeds it as a vector. Broadcasts semantic payload to registered MCP registries simultaneously. Collects manifests. Selects by discovery score.
Live — tool_finder
Framework default registry
LangChain / AutoGen / CrewAI register Inferventis as default endpoint. Option A — concurrent development track.
Planned
semantic broadcast (embedding)
MCP server · discovery
MCP Server
Cloud Run europe-west1. Streamable HTTP. Two endpoints: /mcp (API-key + Stripe billing, existing customers) and /mcp-pay (x402/MPP micropayments, no API key). tools/list returns manifest. tools/call routes to handler. Fires billing + telemetry per call.
Live
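For orientation, the two MCP methods named above are JSON-RPC 2.0 requests. The sketch below only builds the request bodies; the argument names passed to `currency_convert` (`from`, `to`, `amount`) are hypothetical and not taken from the actual manifest.

```python
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def tools_list():
    """Ask the server for its tool manifest."""
    return jsonrpc_request("tools/list")


def tools_call(name, arguments):
    """Invoke one tool by name; the server routes this to the tool's handler."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# Hypothetical arguments, for illustration only.
example_call = tools_call("currency_convert",
                          {"from": "GBP", "to": "EUR", "amount": 100})
```

The same body would be posted to either /mcp or /mcp-pay; only the authentication or payment headers differ.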
tool_finder
Semantic discovery engine. MiniLM-L6-v2 embeddings. Discovery score: 80% cosine similarity + 20% manifest quality bonus. Returns ranked tool list. Warm latency: 133 ms.
Live
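The 80/20 weighting can be written out as a small function. This is a sketch under stated assumptions: the quality bonus is taken to lie in [0, 1], the result is scaled to 0–100 to match the scores shown in the registry, and negative similarities are clamped to zero. None of these details is confirmed by this document.

```python
import math

SIMILARITY_WEIGHT = 0.8  # the 80% cosine-similarity share stated above
QUALITY_WEIGHT = 0.2     # the 20% manifest-quality share stated above


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def discovery_score(intent_vec, manifest_vec, quality):
    """Blend similarity and quality; 0-100 scale is an assumption."""
    similarity = max(0.0, cosine_similarity(intent_vec, manifest_vec))
    return 100.0 * (SIMILARITY_WEIGHT * similarity + QUALITY_WEIGHT * quality)
```

Under this form, two tools with identical handlers and near-identical similarity can still differ by up to 20 points on manifest quality alone, which is the lever the optimised manifests rely on.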
manifest request
Network layer
Edge nodes
Cloudflare Workers deployed globally. They intercept manifest requests before they reach origin and return cached responses in under 20 ms from the node nearest to the calling agent.
Planned
Semantic path cache
Recognises semantically equivalent intents across different agents. "Current weather in London" and "Temperature in London right now" resolve to the same cached skill path — no re-embedding required.
Planned
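One way such a cache could work is a nearest-neighbour lookup over the intent vector that the orchestrator has already computed, so the edge node does no embedding of its own. The sketch below is an assumption about the design, not a description of it: the 0.9 threshold, the linear scan and the `weather/london` skill path are all hypothetical.

```python
import math


class SemanticPathCache:
    """Maps intent vectors to cached skill paths via a cosine threshold."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # hypothetical similarity cut-off
        self._entries = []          # list of (unit_vector, skill_path)

    @staticmethod
    def _unit(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("zero vector cannot be normalised")
        return [x / norm for x in vec]

    def put(self, vec, skill_path):
        self._entries.append((self._unit(vec), skill_path))

    def get(self, vec):
        """Return the best cached path at or above the threshold, else None."""
        query = self._unit(vec)
        best_path, best_sim = None, self.threshold
        for stored, path in self._entries:
            sim = sum(a * b for a, b in zip(query, stored))
            if sim >= best_sim:
                best_path, best_sim = path, sim
        return best_path
```

A production version would replace the linear scan with an approximate nearest-neighbour index; the threshold trades hit rate against the risk of serving a path for a subtly different intent.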
Regional replicas
Manifest index replicated across global regions — Europe, North America, APAC, Middle East. Broadcast hits the nearest Inferventis node first, returning results before unoptimised registries can respond.
Planned
Micropayment layer
x402 protocol live on /mcp-pay. HTTP 402 challenge issued per tool call → agent pays USDC on Base → Coinbase facilitator verifies on-chain → tool served. Stripe MPP implemented (pending merchant token). Base Sepolia testnet live and verified. Mainnet: swap two env vars.
Live — testnet
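The challenge, pay, verify, serve sequence can be sketched from the server's side. This is an illustration only: the header name, the shape of the 402 body and the `verify` callback (standing in for the facilitator check) are assumptions rather than the deployed /mcp-pay code.

```python
from typing import Callable

PAYMENT_HEADER = "X-PAYMENT"  # assumption: header carrying the payment payload


def handle_tool_call(headers: dict, price_usdc: str,
                     verify: Callable[[str, str], bool],
                     run_tool: Callable[[], dict]) -> tuple[int, dict]:
    """Return (status, body): 402 until a verified payment accompanies the call."""
    payment = headers.get(PAYMENT_HEADER)
    if payment is None:
        # First attempt: issue the challenge describing what must be paid.
        requirements = {"asset": "USDC", "network": "base-sepolia",
                        "amount": price_usdc}  # illustrative shape
        return 402, {"error": "payment required", "accepts": [requirements]}
    if not verify(payment, price_usdc):
        # Facilitator rejected the payment: keep the tool locked.
        return 402, {"error": "payment verification failed"}
    return 200, run_tool()
```

The agent is expected to retry the same call with the payment attached after receiving the first 402; the tool handler runs only on the verified retry.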
optimised manifests written
Optimisation engine — core IP
Manifest quality scoring
Every manifest scored on field completeness, description richness, example coverage, and precondition clarity. Quality bonus (20% weight) applied per call. Richer manifests win by design.
Live
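A toy heuristic shows how the four criteria could combine into a single bonus in [0, 1]. Everything specific here is hypothetical: the field names, the equal weighting, and the saturation points (30 words of description, 3 examples) are chosen for illustration and are not the production scoring rules.

```python
# Hypothetical manifest fields; the real schema is not given in this document.
EXPECTED_FIELDS = ("name", "description", "input_schema", "examples", "preconditions")


def quality_score(manifest: dict) -> float:
    """Average of four sub-scores, each in [0, 1]."""
    completeness = sum(1 for f in EXPECTED_FIELDS if manifest.get(f)) / len(EXPECTED_FIELDS)
    word_count = len(str(manifest.get("description", "")).split())
    richness = min(word_count / 30, 1.0)           # saturates at 30 words
    example_coverage = min(len(manifest.get("examples") or []) / 3, 1.0)
    precondition_clarity = 1.0 if manifest.get("preconditions") else 0.0
    return (completeness + richness + example_coverage + precondition_clarity) / 4


weak_manifest = {"name": "fx_converter", "description": "Converts currency."}
rich_manifest = {
    "name": "currency_convert",
    "description": ("Convert an amount between two ISO 4217 currencies using live "
                    "FX rates and return the rate, converted amount and timestamp."),
    "input_schema": {"type": "object"},
    "examples": ["100 GBP to EUR"],
    "preconditions": ["Currency codes must be valid ISO 4217"],
}
```

With identical handlers behind both manifests, the richer one earns the larger bonus, which is the effect the demo foils are meant to expose.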
Telemetry capture
Every tool call logs: calling model, tool selected, intent query, latency, success/fail. Accumulates as proprietary dataset per LLM × tool pair. The raw material for all optimisation.
Live
A/B variant testing
Multiple description variants run simultaneously per tool. Selection rate measured per variant per calling model. Winning variant promoted. Challenger variants continuously introduced.
In development
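Measuring selection rate per variant per calling model reduces to counting. The sketch below assumes observations arrive as `(model, variant_id, was_selected)` tuples and uses a hypothetical minimum sample size before a variant is eligible; it applies no statistical significance test, which a real promotion rule would likely need.

```python
from collections import defaultdict

MIN_CALLS = 100  # hypothetical minimum sample before a variant can win


def winners_by_model(observations, min_calls=MIN_CALLS):
    """Return {model: (variant_id, selection_rate)} for the best eligible variant."""
    served = defaultdict(int)
    selected = defaultdict(int)
    for model, variant_id, was_selected in observations:
        served[(model, variant_id)] += 1
        selected[(model, variant_id)] += int(was_selected)

    winners = {}
    for (model, variant_id), calls in served.items():
        if calls < min_calls:
            continue  # not enough traffic to judge this variant yet
        rate = selected[(model, variant_id)] / calls
        if model not in winners or rate > winners[model][1]:
            winners[model] = (variant_id, rate)
    return winners
```

Keeping the winner per model, rather than one global winner, matches the idea that each LLM may respond best to a different description style.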
Judge LLM eval loop
Synthetic eval pipeline. Judge LLM tests variants against thousands of generated intents. Measures trigger rate and hallucination rate per tool × model pair. Runs on every model version update.
Planned
Per-model manifest variants
For each tool × LLM pair a separately tuned manifest is maintained. Claude: precise bounded descriptions. GPT-4: action-first natural language. Gemini: schema-focused. Mistral: concise enterprise. 16 variant files live across 4 tools × 4 models.
Live
Dynamic manifest serving
Calling model detected from model_hint in request. Correct per-model variant served automatically. Falls back to base manifest if no variant exists. variant_id returned in results for tracking.
Live
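The lookup-with-fallback behaviour can be sketched as a dictionary lookup keyed on tool and model. The key normalisation and the `tool:model` format of `variant_id` are assumptions made for the example.

```python
BASE_VARIANT = "base"  # hypothetical label for the fallback manifest


def select_manifest(tool, model_hint, variants, base_manifests):
    """Serve the per-model variant if one exists, otherwise the base manifest.

    variants:       {(tool, model): manifest}
    base_manifests: {tool: manifest}
    """
    model = (model_hint or "").strip().lower()
    variant = variants.get((tool, model))
    if variant is not None:
        return {"variant_id": f"{tool}:{model}", "manifest": variant}
    return {"variant_id": f"{tool}:{BASE_VARIANT}", "manifest": base_manifests[tool]}
```

Returning `variant_id` alongside the manifest is what lets later telemetry attribute each selection to the variant that was actually served.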
ranked manifests returned
MCR — Multi-model Contextual Registry
currency_convert · score 62.7
Live FX via open.er-api.com. No auth. ISO 4217. Rate + converted amount + timestamp.
Optimised · standard
stripe_payments · score 59.5
Stripe test mode. Payments, charges, customers, subscriptions. Real data.
Optimised · premium
open_banking · score 59.3
TrueLayer. 300+ UK/EU banks. PSD2. Sandbox blocked; realistic mock data active.
Optimised · premium
finnhub_stock_quote · score 49.2
Real-time stock prices. Company info, sector, intraday high/low, % change.
Optimised · standard
fx_converter, currency_basic · score 42–45
Demo foils: identical handlers, weak manifests. Prove the optimisation delta.
Unoptimised
payment_tool, bank_data, stock_tool · score 27–29
Demo foils: identical handlers, weak manifests. Prove the optimisation delta.
Unoptimised
handler called
Connectors
open.er-api.com
Free FX rates. No auth required. 60+ currencies.
Live
Stripe API
Test mode. sk_test key in Secret Manager.
Live
Finnhub
Free tier. Real-time stock quotes. API key in Secret Manager.
Live
TrueLayer
Open Banking. 300+ UK/EU banks. Sandbox incident March 2026.
Blocked
Companies House
UK company registry. Free API. No auth required.
Planned
result returned
Platform services
Auth
x-api-key header. Secret Manager. SHA256 log hashing. 401 on failure.
Live
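A minimal sketch of the auth check follows. It assumes that "SHA256 log hashing" means keys are hashed before they appear in logs, that headers arrive as a dict with lower-cased names, and that valid keys have already been loaded (for example from Secret Manager); none of this is confirmed here.

```python
import hashlib
import hmac


def hash_for_log(api_key: str) -> str:
    """SHA-256 hex digest, so the raw key never reaches the logs."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def authenticate(headers: dict, valid_keys: set) -> tuple:
    """Return (200, hashed_key) on success or (401, None) on failure."""
    supplied = headers.get("x-api-key", "")
    if not supplied:
        return 401, None
    # compare_digest avoids leaking key contents through comparison timing.
    matched = any(hmac.compare_digest(supplied.encode("utf-8"), key.encode("utf-8"))
                  for key in valid_keys)
    if not matched:
        return 401, None
    return 200, hash_for_log(supplied)
```

The hashed value is stable per key, so calls can still be grouped by customer in logs without exposing the credential.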
Stripe billing
Metered. api_transaction + agent_task meters. €0.05/unit. Non-blocking.
Live
x402 / MPP payments
Per-tool payment mode via env vars (none/x402/mpp). USDC on Base Sepolia testnet verified 2026-04-28. Coinbase facilitator. Stripe MPP ready — pending merchant token. All tools default to none.
Live — testnet
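Per-tool payment mode selection could look like the sketch below. The variable naming scheme `PAYMENT_MODE_<TOOL>` is hypothetical; only the three mode values and the default of `none` come from the text above.

```python
import os

VALID_MODES = {"none", "x402", "mpp"}


def payment_mode(tool: str, env=None) -> str:
    """Read a tool's payment mode from the environment, defaulting to 'none'."""
    env = os.environ if env is None else env
    # Hypothetical naming scheme: PAYMENT_MODE_CURRENCY_CONVERT=x402
    raw = env.get(f"PAYMENT_MODE_{tool.upper()}", "none").strip().lower()
    if raw not in VALID_MODES:
        raise ValueError(f"unknown payment mode {raw!r} for tool {tool!r}")
    return raw
```

Failing loudly on an unknown value is a deliberate choice in this sketch: silently falling back to `none` would serve a paid tool for free after a typo.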
Telemetry
Cloud Logging. Two event types: (1) call events — tool, latency, success. (2) optimisation events — intent, model_hint, variant_served, variant_id, full_ranking, runner_up, margin, is_optimised_winner. Intent corpus feeds A/B loop.
Live
CI/CD
Cloud Build on push to main. Artifact Registry → Cloud Run auto-deploy.
Live
Stripe Connect
Automatic developer revenue splits. 80/20 early, 70/30 standard tier.
Planned
result to agent
Output
Client Agent
Receives structured tool result. Continues reasoning. Delivers answer to user or downstream system.
Task complete
Data layer — three stores, one flywheel
Hot events — Cloud Logging
Every tool_finder call emits a structured optimisation_event.
Fields: intent · model_hint · variant_served · variant_id · tool_selected · discovery_score · runner_up · margin · full_ranking · is_optimised_winner

The intent field is the critical data point — it tells us exactly what agents asked for when they selected or bypassed a tool.
Live
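The event fields listed above can be assembled from a ranked result list. In this sketch `margin` is assumed to be the winner's score minus the runner-up's, and `is_optimised_winner` is assumed to mean the winning tool belongs to the optimised set; the 43.0 foil score used in the usage example is illustrative.

```python
import json


def build_optimisation_event(intent, model_hint, variant_served, variant_id,
                             ranking, optimised_tools):
    """ranking: list of (tool, discovery_score), already sorted best-first."""
    if not ranking:
        raise ValueError("ranking must contain at least one tool")
    top_tool, top_score = ranking[0]
    runner_up = ranking[1] if len(ranking) > 1 else None
    return {
        "intent": intent,
        "model_hint": model_hint,
        "variant_served": variant_served,
        "variant_id": variant_id,
        "tool_selected": top_tool,
        "discovery_score": top_score,
        "runner_up": runner_up[0] if runner_up else None,
        # Assumed definition: gap between winner and runner-up.
        "margin": round(top_score - runner_up[1], 2) if runner_up else None,
        "full_ranking": [{"tool": t, "score": s} for t, s in ranking],
        "is_optimised_winner": top_tool in optimised_tools,
    }


def emit(event):
    """Write the event as one JSON line, assuming stdout is the log sink."""
    print(json.dumps(event, ensure_ascii=False))
```

For example, a ranking of `[("currency_convert", 62.7), ("fx_converter", 43.0)]` yields a margin of 19.7 with `fx_converter` recorded as runner-up.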
Intent corpus — Cloud Storage
Daily JSONL files: intents_YYYY_MM_DD.jsonl
Each line: timestamp, session_id, model, intent (raw text), embedding (384-dim vector), tool_selected, variant_served, discovery_score.

Feeds the Judge LLM eval loop. Enables semantic clustering, surfacing findings of the form "78% of FX intents contain invoice language."
Planned
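Since this store is planned rather than built, the following is only a sketch of how one corpus line and its daily filename could be produced from the fields listed above. The ISO 8601 timestamp format is an assumption.

```python
import json
from datetime import datetime, timezone

EMBEDDING_DIM = 384  # the 384-dim vector stated above


def corpus_filename(ts: datetime) -> str:
    """Daily file name in the intents_YYYY_MM_DD.jsonl pattern."""
    return ts.strftime("intents_%Y_%m_%d.jsonl")


def corpus_line(ts, session_id, model, intent, embedding,
                tool_selected, variant_served, discovery_score) -> str:
    """Serialise one intent record as a single JSONL line."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dim embedding, got {len(embedding)}")
    record = {
        "timestamp": ts.isoformat(),  # assumed format
        "session_id": session_id,
        "model": model,
        "intent": intent,
        "embedding": list(embedding),
        "tool_selected": tool_selected,
        "variant_served": variant_served,
        "discovery_score": discovery_score,
    }
    return json.dumps(record, ensure_ascii=False)
```

Storing the embedding next to the raw text means later clustering jobs can run without re-embedding the corpus, at the cost of larger files.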
Variant ledger — Firestore
Per variant per model: calls_served, times_selected_rank_1, selection_rate, avg_discovery_score, avg_margin, status (active/challenger/retired).

Drives A/B winner promotion. Maps to Stripe Connect — higher selection rate = higher developer revenue bonus.
Planned
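The ledger fields can be maintained incrementally, one call at a time. This in-memory sketch stands in for a Firestore document and assumes the caller supplies the margin for every served call (negative when the variant lost); how margin is actually defined for losing variants is not stated here.

```python
from dataclasses import dataclass


@dataclass
class LedgerEntry:
    """One row per variant per model, mirroring the fields listed above."""
    calls_served: int = 0
    times_selected_rank_1: int = 0
    avg_discovery_score: float = 0.0
    avg_margin: float = 0.0
    status: str = "challenger"  # active / challenger / retired

    @property
    def selection_rate(self) -> float:
        if self.calls_served == 0:
            return 0.0
        return self.times_selected_rank_1 / self.calls_served

    def record(self, discovery_score: float, margin: float, ranked_first: bool) -> None:
        """Fold one served call into the running counts and averages."""
        self.calls_served += 1
        n = self.calls_served
        # Incremental mean: avoids storing every observation.
        self.avg_discovery_score += (discovery_score - self.avg_discovery_score) / n
        self.avg_margin += (margin - self.avg_margin) / n
        if ranked_first:
            self.times_selected_rank_1 += 1
```

Deriving `selection_rate` on read rather than storing it keeps the two counters as the single source of truth.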
Optimisation engine — the core IP (runs continuously beneath every call)
Intent received → Model detected → Variant selected → Vector embedded → Similarity scored → Quality bonus applied → Tool selected → Optimisation event logged → Intent + variant stored → A/B data accumulated → Variant ledger updated → Winner promoted → Score improves → ↩ repeat
Commercial flow
Stripe metered — human operators / developers
Agent operator: pays £0.02/call
Inferventis margin: £0.01/call (50% of the call price, i.e. a 100% markup on the developer payout)
Developer receives: £0.01/call
Developer earns: 10× vs self-hosted
Cloud licence: Huawei / AWS / Azure
Inferventis takes: 30% of their margin
x402 micropayments — autonomous agents (live testnet · mainnet ready)
Autonomous agent: pays $0.001 USDC
x402 on Base: settles in ~2s
CDP wallet receives: USDC on-chain
Coinbase converts: USDC → GBP
Revolut Business: fiat treasury
Spend via card or bank transfer
Legend: Client agent · Platform core · Network layer · Optimised tools (MCR) · Unoptimised foils · Optimisation engine · Data layer · Connectors · Platform services · Billing · x402 / Micropayments