Productised AI for operations-heavy and regulated teams.
Audit · Pilot · Multi-agent system · Operations retainer. Built by a senior UK engineering team — twenty-three years of shipping production software for operations-heavy customers, including the Claude-powered AI agent live on Bison Track for Siem Car Carriers.
What a production-grade agent looks like, on paper.
An illustrative sketch: typed definition, hybrid retrieval, tool calling, guardrails, evals, deploy step. This is the shape we scope into Pilot Builds and Multi-Agent Systems, not a screenshot of a deployed Team Bison system. For what we have actually shipped today, see Currently Shipping below.
// illustrative · agent-shape sketch
// What a production-grade agent looks like on paper:
// retrieval, tool use, guardrails, evals, deploy step.
// Not a snapshot of a deployed Team Bison system.
import { Agent, tool, retriever } from '@teambison/agents';
import { policies, precedent, vehicles } from './stores';
import { extract, page } from './tools'; // hypothetical helpers, imported so the sketch resolves

// Hybrid retriever over the policy store: 40% lexical, 60% vector, top 8 hits.
const policyStore = retriever({
  store: policies,
  hybrid: { lexical: 0.4, vector: 0.6 },
  topK: 8,
});

export const claimsTriage = new Agent({
  name: 'claims.triage',
  model: 'claude-sonnet-4-7',
  system: 'You triage vehicle damage claims. Cite policy and precedent.',
  tools: [
    tool('searchPolicies', policyStore),
    tool('extractClaimFields', extract),
    tool('lookupVehicle', vehicles),
    tool('escalateAdjuster', page),
  ],
  guardrails: {
    pii: true,
    audit: true,
    maxCostUsd: 0.20,
    handoffBelow: 0.85, // confidence
  },
  evals: ['./evals/claims-triage.jsonl'],
});

await claimsTriage.deploy({ env: 'prod' });
// → shape only; see Currently Shipping for what we run
Three shapes of AI work. The pattern, not a fleet count.
These are the three shapes of work we scope into Pilot Builds and Multi-Agent Systems. We bring twenty-three years of integration depth and a senior UK engineer on every Tier 1 build. Production guardrails — evals, audit trails, cost ceilings — are configured into each engagement to the depth the system requires, not bolted on as a marketing claim.
Document-heavy retrieval
Hybrid lexical + vector retrieval over policy documents, claims precedent, OEM specs, compliance frameworks (DVSA, GDPR, IMO), and operational SOPs. Citations attached, not hallucinated. The same retrieval pattern generalises to TMS, ERP and operations-document corpora.
- Hybrid retrieval: lexical + vector
- Reranking where the corpus rewards it
- Eval harness for retrieval quality before launch
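One way to picture hybrid retrieval is as a weighted blend of a lexical score and a vector similarity. The sketch below is illustrative only: the 0.4 / 0.6 weights and the top-8 cut-off mirror the agent-shape sketch at the top of this page, while the `Scored` type, the min-max normalisation and the function names are assumptions made for the example, not a description of a deployed system.

```typescript
// Illustrative only: weighted fusion of lexical and vector scores.
// All names are hypothetical; weights mirror the agent-shape sketch.
type Scored = { id: string; lexical: number; vector: number };

// Min-max normalise so the two score scales are comparable before blending.
function normalise(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0);
  return values.map((v) => (v - min) / (max - min));
}

function hybridRank(
  docs: Scored[],
  weights = { lexical: 0.4, vector: 0.6 },
  topK = 8,
): { id: string; score: number }[] {
  const lex = normalise(docs.map((d) => d.lexical));
  const vec = normalise(docs.map((d) => d.vector));
  return docs
    .map((d, i) => ({
      id: d.id,
      score: weights.lexical * (lex[i] ?? 0) + weights.vector * (vec[i] ?? 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Made-up scores: one document is strong lexically, another semantically.
const ranked = hybridRank([
  { id: 'policy-12', lexical: 7.1, vector: 0.62 },
  { id: 'policy-40', lexical: 2.3, vector: 0.91 },
  { id: 'sop-03', lexical: 0.4, vector: 0.35 },
]);
```

With these made-up numbers the semantically closer document edges ahead of the keyword-heavy one, which is the kind of trade-off a retrieval eval harness would measure before launch.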
Multi-step operations agents
Agents that do more than one round-trip — tool use, memory, supervisor loops, and a clear human-in-the-loop handoff when confidence drops. The Bison Track AI agent at Siem Car Carriers is the live reference. We scope what we build with bounded tool surfaces, audited traces and human escalation built in from day one.
- Tool calling with typed schemas
- Audit traces from the first run
- Human-in-the-loop handoff on confidence threshold
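A typed tool schema can be sketched as a tool whose raw arguments are parsed and validated before the handler ever runs. The example below is a minimal illustration under stated assumptions: `lookupVehicle` is borrowed from the sketch at the top of this page, but the `vin` argument, the hand-written validator and the `callTool` helper are hypothetical, and a real build would more likely use a schema library.

```typescript
// Illustrative only: a tool whose arguments are validated before the handler runs.
type Tool<A, R> = {
  name: string;
  parse: (raw: unknown) => A; // throws on malformed input
  run: (args: A) => R;
};

type LookupArgs = { vin: string };

const lookupVehicle: Tool<LookupArgs, { vin: string; found: boolean }> = {
  name: 'lookupVehicle',
  parse(raw) {
    if (typeof raw !== 'object' || raw === null) {
      throw new Error('args must be an object');
    }
    const vin = (raw as Record<string, unknown>)['vin'];
    if (typeof vin !== 'string' || vin.length !== 17) {
      throw new Error('vin must be a 17-character string');
    }
    return { vin };
  },
  // Stand-in for a real lookup against a vehicle store.
  run: ({ vin }) => ({ vin, found: vin.startsWith('W') }),
};

type ToolResult<R> = { ok: true; result: R } | { ok: false; error: string };

// Bad arguments from the model come back as a structured error, not a crash.
function callTool<A, R>(tool: Tool<A, R>, raw: unknown): ToolResult<R> {
  try {
    return { ok: true, result: tool.run(tool.parse(raw)) };
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

const good = callTool(lookupVehicle, { vin: 'WVWZZZ1JZXW000001' });
const bad = callTool(lookupVehicle, { vin: 42 });
```

Keeping validation at the tool boundary is what keeps the tool surface bounded: the handler only ever sees arguments of the declared shape.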
Inbox and ticket automation
Agents for triage, document extraction, supplier queries and freight-rate variance — measured against the human baseline before they take live work, then rolled out in shadow mode before full cutover. Cost ceilings per run. No surprise invoices.
- Async batch and real-time pipelines
- Cost ceilings per run
- Shadow-mode rollout, then cutover
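Shadow mode can be pictured as the agent answering alongside the human while only the human decision takes effect, with agreement measured before cutover. The sketch below assumes a simple exact-match comparison; the `ShadowRecord` type and the 0.95 agreement and 200-sample thresholds are hypothetical placeholders, not figures from any engagement.

```typescript
// Illustrative only: measuring agent-vs-human agreement during shadow mode.
type ShadowRecord = { ticketId: string; human: string; agent: string };

function agreementRate(records: ShadowRecord[]): number {
  if (records.length === 0) return 0;
  const matches = records.filter((r) => r.human === r.agent).length;
  return matches / records.length;
}

// Cutover gate: enough samples AND enough agreement (thresholds are assumptions).
function readyForCutover(
  records: ShadowRecord[],
  minAgreement = 0.95,
  minSamples = 200,
): boolean {
  return records.length >= minSamples && agreementRate(records) >= minAgreement;
}

const shadowLog: ShadowRecord[] = [
  { ticketId: 'T-1', human: 'billing', agent: 'billing' },
  { ticketId: 'T-2', human: 'damage', agent: 'damage' },
  { ticketId: 'T-3', human: 'supplier', agent: 'billing' },
  { ticketId: 'T-4', human: 'damage', agent: 'damage' },
];

const rate = agreementRate(shadowLog);
```

In practice the comparison is rarely a plain string match, so the scoring function is usually the part that gets calibrated against the human baseline.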
AI is 30% of the value. The other 70% is integration.
Off-the-shelf AI products rarely fit logistics workflows out of the box. We integrate AI — ours or yours — into your TMS, WMS, ERP, OEM EDI, fleet telematics, ePOD and carrier APIs. The customisation around the model is where the value lives.
Built on Anthropic. Multi-substrate by design.
Pilot Builds and Multi-Agent Systems are delivered on Anthropic infrastructure — Managed Agents where session state, agent loop and sandboxed tool execution should be platform responsibilities; direct Claude API where data residency or single-tenant requirements call for it. Bison Insights at Siem Car Carriers runs on direct API today. Multi-substrate capability is itself a credential: we deliver on whichever substrate fits your procurement and operational constraints. Buyers who’d prefer to hold the Anthropic relationship directly are supported via that route — we bill only for Pilot Build, Multi-Agent or Retainer scope. The work that doesn’t compress is twenty-three years of logistics integration depth, evals calibrated against your operations, human-in-the-loop design and named-customer accountability — the layers that sit on top of the substrate, not the substrate itself.
What we set up before an AI build goes near production.
Every item below is a default we configure into AI engagements at Pilot Build tier and above. Not a feature list of a productised framework — a checklist of the things we treat as table stakes for putting AI on operations data.
PII scrubbing
Detected & redacted pre-LLM
Sensitive fields are detected and redacted before the LLM sees them. This is switched on at the start of an engagement, not after the first incident.
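As a rough illustration of redaction before the model call, the sketch below replaces email addresses and phone-like numbers with placeholders. The two regular expressions are deliberately simple assumptions chosen for the example; they are not a complete PII detector and not the detector used in any engagement.

```typescript
// Illustrative only: regex-based redaction applied before text reaches a model.
// The patterns are simple assumptions, not a complete PII detector.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: 'EMAIL', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: 'PHONE', pattern: /\+?\d[\d\s-]{8,}\d/g },
];

function redact(text: string): { text: string; redactions: number } {
  let redactions = 0;
  let out = text;
  for (const { label, pattern } of PII_PATTERNS) {
    out = out.replace(pattern, () => {
      redactions += 1; // counted so redaction volume can be reported
      return `[${label}]`;
    });
  }
  return { text: out, redactions };
}

const scrubbed = redact(
  'Contact jane.doe@example.com or +44 20 7946 0958 about claim 1234.',
);
```

Short identifiers such as the claim number pass through untouched, which is usually what a triage agent needs in order to do its job.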
Cost ceilings
Hard stop per run
A maximum spend per run is set against the cost envelope you signed off. Spend goes to a dashboard, not into a surprise invoice.
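A hard stop per run can be sketched as a budget object that refuses any charge that would cross the ceiling. The 0.20 ceiling echoes `maxCostUsd` in the sketch at the top of this page; `createRunBudget`, the per-call costs and the result shape are hypothetical.

```typescript
// Illustrative only: a per-run budget that refuses charges past the ceiling.
type ChargeResult =
  | { ok: true; spentUsd: number }
  | { ok: false; reason: string };

function createRunBudget(maxCostUsd: number) {
  let spentUsd = 0;
  return {
    // Call before each model or tool invocation with its estimated cost.
    charge(costUsd: number): ChargeResult {
      if (spentUsd + costUsd > maxCostUsd) {
        return {
          ok: false,
          reason: `charge of ${costUsd} would exceed ceiling of ${maxCostUsd}`,
        };
      }
      spentUsd += costUsd;
      return { ok: true, spentUsd };
    },
    spent: (): number => spentUsd,
  };
}

const budget = createRunBudget(0.2);
const first = budget.charge(0.08);
const second = budget.charge(0.08);
const third = budget.charge(0.08); // would take the run to 0.24, so it is refused
```

The refused charge leaves the running total unchanged, so the run stops at the last affordable step and the recorded spend stays under the ceiling.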
Audit traces
Every tool call logged
Every tool call, every prompt, every retrieval — written to a destination you control, not a black box.
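One minimal way to get a trace for every tool call is to wrap each tool so that an event is written to a caller-supplied sink whether the call succeeds or throws. The sketch below covers tool calls only; `TraceEvent`, `TraceSink` and `traced` are hypothetical names, and the in-memory array stands in for whatever destination the customer controls.

```typescript
// Illustrative only: wrap a tool so every call is written to a caller-owned sink.
type TraceEvent = {
  kind: 'tool_call';
  tool: string;
  args: unknown;
  ok: boolean;
  at: string; // ISO timestamp
};
type TraceSink = (event: TraceEvent) => void;

function traced<A, R>(
  tool: string,
  fn: (args: A) => R,
  sink: TraceSink,
): (args: A) => R {
  return (args: A): R => {
    const at = new Date().toISOString();
    let ok = false;
    try {
      const result = fn(args);
      ok = true;
      return result;
    } finally {
      // Runs on success and on failure, so no call goes unrecorded.
      sink({ kind: 'tool_call', tool, args, ok, at });
    }
  };
}

const events: TraceEvent[] = [];
const divide = traced(
  'divide',
  ({ a, b }: { a: number; b: number }) => {
    if (b === 0) throw new Error('division by zero');
    return a / b;
  },
  (e) => events.push(e),
);

const quotient = divide({ a: 6, b: 3 });
let failed = false;
try {
  divide({ a: 1, b: 0 });
} catch {
  failed = true;
}
```

Because the sink is passed in rather than hard-coded, the same wrapper can write to a file, a queue or a database that sits inside the customer's own estate.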
Confidence handoff
Threshold → human
Below the configured confidence threshold, the agent pages a human rather than acting. Agents that know when not to act are the only ones worth deploying.
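The handoff rule reduces to a small routing function. In the sketch below the 0.85 default echoes `handoffBelow` in the sketch at the top of this page; the `Proposal` and `Outcome` types and the `route` function are assumptions for illustration, and how the confidence score itself is produced is out of scope here.

```typescript
// Illustrative only: act above the threshold, hand off to a human below it.
type Proposal = { action: string; confidence: number };
type Outcome =
  | { kind: 'act'; action: string }
  | { kind: 'handoff'; action: string; reason: string };

function route(p: Proposal, handoffBelow = 0.85): Outcome {
  // Written as "not at or above" so a NaN score also hands off.
  if (!(p.confidence >= handoffBelow)) {
    return {
      kind: 'handoff',
      action: p.action,
      reason: `confidence ${p.confidence} below ${handoffBelow}`,
    };
  }
  return { kind: 'act', action: p.action };
}

const confident = route({ action: 'approve-claim', confidence: 0.93 });
const unsure = route({ action: 'approve-claim', confidence: 0.61 });
```

Failing closed on a missing or malformed score is the design choice worth noting: the default path is the human, not the action.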
Eval suites
JSONL test cases
Regression test cases run on every change. Failures block deploy. Built up across the engagement, not promised at the end.
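A JSONL eval suite can be sketched as one JSON test case per line, replayed against the agent on every change, with any failure blocking deploy. Everything below is an assumption for illustration: the `input` / `expected` case shape, the exact-match scoring and the keyword classifier standing in for a real agent.

```typescript
// Illustrative only: replay JSONL cases against an agent and gate deployment.
type EvalCase = { input: string; expected: string };

function parseJsonl(text: string): EvalCase[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as EvalCase);
}

function runEvals(cases: EvalCase[], agent: (input: string) => string) {
  const failures = cases.filter((c) => agent(c.input) !== c.expected);
  return {
    total: cases.length,
    failed: failures.length,
    // An empty suite never counts as a pass.
    deployAllowed: cases.length > 0 && failures.length === 0,
  };
}

const suite = parseJsonl(`
{"input": "cracked windscreen on delivery", "expected": "damage"}
{"input": "invoice total does not match rate card", "expected": "billing"}
`);

// Stand-ins for an agent before and after a regression.
const currentAgent = (input: string): string =>
  input.includes('invoice') ? 'billing' : 'damage';
const regressedAgent = (_input: string): string => 'damage';

const passing = runEvals(suite, currentAgent);
const blocked = runEvals(suite, regressedAgent);
```

Real suites tend to need fuzzier scoring than string equality, but the gate stays the same: a failed case means no deploy.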
Region pinning
Inference stays in-region
Data and inference stay in the region you committed to. No silent egress, no surprise transatlantic round-trips.
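Region pinning can be checked mechanically at configuration time by comparing every endpoint an agent touches with the committed region. In the sketch below the region labels and endpoint names are placeholders, and `regionViolations` is a hypothetical helper, not part of any platform API.

```typescript
// Illustrative only: flag any configured endpoint outside the committed region.
type Endpoint = { name: string; region: string };

function regionViolations(
  committedRegion: string,
  endpoints: Endpoint[],
): string[] {
  return endpoints
    .filter((e) => e.region !== committedRegion)
    .map((e) => `${e.name} is in ${e.region}, expected ${committedRegion}`);
}

// Placeholder regions and endpoint names.
const violations = regionViolations('eu-west-2', [
  { name: 'inference', region: 'eu-west-2' },
  { name: 'vector-store', region: 'eu-west-2' },
  { name: 'trace-sink', region: 'us-east-1' },
]);
```

A check like this only covers declared configuration; confirming that no traffic leaves the region at runtime is a separate, network-level control.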
AI we’ve shipped, with named customers.
A Claude-powered AI agent is in production on the Bison Track Vessel Tracking module, the first deployed module of Bison Insights, our modular operational insights platform. It uses tools against the ingested voyage, finance, operations and tracking data; operations teams ask questions in plain English and get cited answers without leaving the dashboard.
Public case studies in development. Reference calls available on request after a discovery conversation.
Four tiers. The audit is the wedge.
AI Operations Audit
Two-week paid discovery sprint. Stack assessment, AI use-case identification against your operations data, security posture review, written 90-day plan with feasibility scoring and ROI estimates against named use cases. Credited against any follow-on build over £25k.
Single Agent Build (Pilot)
One agent in production. Document extraction, triage, retrieval or a similarly bounded use case. 4–6 weeks. Audit credit applied where one was run.
Multi-Agent System
Coordinated agents with hand-offs, supervisor loops, and human-in-the-loop on the boundaries that matter. Production deployment, observability and ops baseline included.
Production AI Programme
Custom agents on your operations stack. Eval suites, ops, retraining cycles, ongoing capacity. Scoped per-engagement.
AI Operations Retainer
Ongoing AI ops once an agent is live. Eval refresh, prompt and tool maintenance, drift monitoring, light scope changes. Added on top of any of the build tiers above.
Have an AI use case that needs more than a demo? Start with the Audit.
Most AI engagements start with the Operations Audit: two weeks, fixed £3,500, ending with feasibility scoring and cost and ROI estimates for the use cases we identify. The fee is credited against any follow-on build over £25k. Not ready to commit yet? The 30-minute consultation is the better starting point.
We sell AI horizontally but lead with operations-heavy and regulated buyers. If you’re looking for a horizontal AI strategy deck without engineering, we’ll politely refer you elsewhere.