Productised AI for operations-heavy and regulated teams.
Audit · Pilot · Multi-agent system · Operations retainer. Built by a senior UK engineering team that has shipped production software for operations-heavy customers since 2003 — including Bison Press, our own AI marketing automation plugin live with three named external customers, and the Claude-powered chat agent live on Bison Track for Siem Car Carriers.
What a production-grade agent looks like, on paper.
An illustrative sketch: typed definition, hybrid retrieval, tool calling, guardrails, evals, deploy step. This is the shape we scope into Pilot Builds and Multi-Agent Systems, not a screenshot of a deployed Team Bison system. For what we have actually shipped today, see Currently Shipping below.
// illustrative · agent-shape sketch
// What a production-grade agent looks like on paper:
// retrieval, tool use, guardrails, evals, deploy step.
// Not a snapshot of a deployed Team Bison system.
import { Agent, tool, retriever } from '@teambison/agents';
import { policies, precedent, vehicles } from './stores';
// Illustrative module for the two handlers the sketch uses below.
import { extract, page } from './tools';

// Hybrid retriever over the policy store: 40% lexical, 60% vector, top 8 hits.
const policyStore = retriever({
  store: policies,
  hybrid: { lexical: 0.4, vector: 0.6 },
  topK: 8,
});

export const claimsTriage = new Agent({
  name: 'claims.triage',
  model: 'claude-sonnet-4-7',
  system: 'You triage vehicle damage claims. Cite policy and precedent.',
  tools: [
    tool('searchPolicies', policyStore),
    tool('extractClaimFields', extract),
    tool('lookupVehicle', vehicles),
    tool('escalateAdjuster', page),
  ],
  guardrails: {
    pii: true,          // scrub sensitive fields before the model sees them
    audit: true,        // log every prompt, tool call and retrieval
    maxCostUsd: 0.20,   // hard stop per run
    handoffBelow: 0.85, // confidence threshold for human handoff
  },
  evals: ['./evals/claims-triage.jsonl'],
});

await claimsTriage.deploy({ env: 'prod' });
// → shape only; see Currently Shipping for what we run
Three shapes of AI work. The pattern, not a fleet count.
These are the three shapes of work we scope into Pilot Builds and Multi-Agent Systems. We bring twenty-three years of integration depth and a senior UK engineer on every Tier 1 build. Production guardrails — evals, audit trails, cost ceilings — are configured into each engagement to the depth the system requires, not bolted on as a marketing claim.
Document-heavy retrieval
Hybrid lexical + vector retrieval over policy documents, claims precedent, CDS documentation, OEM specs, and operational SOPs. Citations attached, not hallucinated. The same retrieval pattern sits inside Bison Press; it generalises to TMS-, ERP- and customs-document corpora.
- Hybrid retrieval: lexical + vector
- Reranking where the corpus rewards it
- Eval harness for retrieval quality before launch
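As a minimal sketch of the weighted blend behind hybrid retrieval, assuming each retriever returns scores already normalised to the 0 to 1 range. The `fuseScores` function, the `Scored` type and the document ids are hypothetical, and the 0.4 / 0.6 weights simply mirror the illustrative agent sketch above.

```typescript
// Hypothetical types and names; not the API of any shipped package.
type Scored = { id: string; score: number }; // score assumed normalised to [0, 1]

interface HybridWeights {
  lexical: number;
  vector: number;
}

// Blend lexical and vector scores per document, then keep the topK highest.
function fuseScores(
  lexicalHits: Scored[],
  vectorHits: Scored[],
  weights: HybridWeights,
  topK: number,
): Scored[] {
  const combined = new Map<string, number>();
  for (const hit of lexicalHits) {
    combined.set(hit.id, (combined.get(hit.id) ?? 0) + weights.lexical * hit.score);
  }
  for (const hit of vectorHits) {
    combined.set(hit.id, (combined.get(hit.id) ?? 0) + weights.vector * hit.score);
  }
  return Array.from(combined, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const fused = fuseScores(
  [{ id: 'policy-12', score: 0.9 }, { id: 'policy-07', score: 0.5 }],
  [{ id: 'policy-07', score: 1.0 }, { id: 'policy-31', score: 0.5 }],
  { lexical: 0.4, vector: 0.6 },
  2,
);
console.log(fused.map((hit) => hit.id)); // [ 'policy-07', 'policy-12' ]
```

A document found by both retrievers (here `policy-07`) accumulates both weighted scores, which is why it outranks a strong lexical-only match.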
Multi-step operations agents
Agents that do more than one round-trip — tool use, memory, supervisor loops, and a clear human-in-the-loop handoff when confidence drops. The Bison Track AI chat agent at Siem Car Carriers is the live reference. We scope what we build with bounded tool surfaces, audited traces and human escalation built in from day one.
- Tool calling with typed schemas
- Audit traces from the first run
- Human-in-the-loop handoff on confidence threshold
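One way to picture a bounded tool surface with typed schemas is a validator that rejects any call whose arguments do not match what the tool declared. This is a hand-rolled sketch under assumptions: `ToolDef`, `validateArgs`, `callTool` and the sample `vehicleLookupTool` are hypothetical names, and a real build would more likely use an established schema library.

```typescript
// Hypothetical tool definition; the names are placeholders, not a shipped API.
type FieldType = 'string' | 'number' | 'boolean';
type Schema = Record<string, FieldType>;

interface ToolDef<Args, Result> {
  name: string;
  schema: Schema;
  run: (args: Args) => Result;
}

// Collect every mismatch between the declared schema and the supplied arguments.
function validateArgs(schema: Schema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(schema)) {
    if (typeof args[key] !== schema[key]) {
      errors.push(`${key}: expected ${schema[key]}, got ${typeof args[key]}`);
    }
  }
  for (const key of Object.keys(args)) {
    if (!(key in schema)) errors.push(`${key}: not part of the tool surface`);
  }
  return errors;
}

// Only run the tool when validation passes; otherwise return the errors.
function callTool<Args extends Record<string, unknown>, Result>(
  tool: ToolDef<Args, Result>,
  rawArgs: Record<string, unknown>,
): { ok: true; result: Result } | { ok: false; errors: string[] } {
  const errors = validateArgs(tool.schema, rawArgs);
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, result: tool.run(rawArgs as Args) };
}

const vehicleLookupTool: ToolDef<{ vin: string }, { vin: string; found: boolean }> = {
  name: 'lookupVehicle',
  schema: { vin: 'string' },
  run: ({ vin }) => ({ vin, found: vin.length === 17 }), // stand-in lookup
};

const goodCall = callTool(vehicleLookupTool, { vin: 'WVWZZZ1JZXW000001' });
const badCall = callTool(vehicleLookupTool, { vin: 12345, note: 'extra' });
console.log(goodCall, badCall);
```

The second call never reaches `run`: it fails on the wrong type for `vin` and on the undeclared `note` field.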
Inbox and ticket automation
Agents for triage, document extraction, supplier queries and freight-rate variance — measured against the human baseline before they take live work, then rolled out in shadow mode before full cutover. Cost ceilings per run. No surprise invoices.
- Async batch and real-time pipelines
- Cost ceilings per run
- Shadow-mode rollout, then cutover
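Shadow mode can be sketched as a log in which the agent's decision is recorded next to the human's and never acted on. The record shape, the function names and the sample thresholds below are assumptions for illustration; the actual bar for cutover is agreed per engagement.

```typescript
// Hypothetical shadow-mode harness: the agent's decision is recorded, never acted on.
interface ShadowRecord {
  ticketId: string;
  humanDecision: string;
  agentDecision: string;
}

// Share of tickets where the agent matched the human baseline.
function agreementRate(records: ShadowRecord[]): number {
  if (records.length === 0) return 0;
  const matches = records.filter((r) => r.humanDecision === r.agentDecision).length;
  return matches / records.length;
}

// Cutover is only suggested once enough tickets have been observed
// and the agent agrees with the human baseline often enough.
function readyForCutover(
  records: ShadowRecord[],
  minSamples: number,
  minAgreement: number,
): boolean {
  return records.length >= minSamples && agreementRate(records) >= minAgreement;
}

const shadowLog: ShadowRecord[] = [
  { ticketId: 'T-1', humanDecision: 'approve', agentDecision: 'approve' },
  { ticketId: 'T-2', humanDecision: 'escalate', agentDecision: 'escalate' },
  { ticketId: 'T-3', humanDecision: 'reject', agentDecision: 'approve' },
  { ticketId: 'T-4', humanDecision: 'approve', agentDecision: 'approve' },
];

console.log(agreementRate(shadowLog));           // 0.75
console.log(readyForCutover(shadowLog, 4, 0.9)); // false
```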
AI is 30% of the value. The other 70% is integration.
Off-the-shelf AI products rarely fit logistics workflows out of the box. We integrate AI — ours or yours — into your TMS, WMS, ERP, HMRC CDS, OEM EDI, fleet telematics, ePOD and carrier APIs. The customisation around the model is where the value lives.
What we set up before an AI build goes near production.
Every item below is a default we configure into AI engagements at Pilot Build tier and above. Not a feature list of a productised framework — a checklist of the things we treat as table stakes for putting AI on operations data.
PII scrubbing
Detected & redacted pre-LLM
Sensitive fields are detected and redacted before the LLM sees them. Switched on at the start of an engagement, not after the first incident.
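A minimal sketch of the redaction step, assuming two deliberately simple patterns (email addresses and UK-style phone numbers). The pattern list and the `scrubPii` name are hypothetical; a production detector covers far more field types than this.

```typescript
// Deliberately simple, assumed patterns; real engagements need a fuller detector.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: 'EMAIL', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: 'PHONE', pattern: /(?:\+44\s?|0)\d{4}\s?\d{6}/g },
];

// Replace each match with a labelled placeholder and count the redactions.
function scrubPii(text: string): { scrubbed: string; redactions: number } {
  let scrubbed = text;
  let redactions = 0;
  for (const { label, pattern } of PII_PATTERNS) {
    scrubbed = scrubbed.replace(pattern, () => {
      redactions += 1;
      return `[${label}]`;
    });
  }
  return { scrubbed, redactions };
}

const scrubResult = scrubPii(
  'Claimant jane.doe@example.com called from 07700 900123 about the bumper.',
);
console.log(scrubResult.scrubbed);
// Claimant [EMAIL] called from [PHONE] about the bumper.
```

Only the scrubbed string would be passed on to the model; the redaction count is there so the step itself can be audited.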
Cost ceilings
Hard stop per run
A maximum spend per run is set against the cost envelope you signed off. Spend goes to a dashboard, not into a surprise invoice.
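The hard stop can be pictured as a small budget guard that refuses any charge that would take a run past its ceiling. `RunBudget` and the amounts are hypothetical; the 0.20 ceiling echoes the `maxCostUsd` value in the illustrative agent sketch.

```typescript
// Hypothetical per-run budget guard; prices and names are placeholders.
class RunBudget {
  private spentUsd = 0;
  private readonly ceilingUsd: number;

  constructor(ceilingUsd: number) {
    this.ceilingUsd = ceilingUsd;
  }

  // Record a charge, refusing it if it would take the run past its ceiling.
  charge(amountUsd: number): void {
    if (this.spentUsd + amountUsd > this.ceilingUsd) {
      throw new Error(
        `charge of $${amountUsd} would exceed the $${this.ceilingUsd} ceiling`,
      );
    }
    this.spentUsd += amountUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }
}

const budget = new RunBudget(0.20);
budget.charge(0.08); // first model call
budget.charge(0.08); // second model call

let stopped = false;
try {
  budget.charge(0.08); // would bring the total to 0.24, so the run stops here
} catch {
  stopped = true;
}
console.log(stopped, budget.spent);
```

The check runs before the spend is recorded, so a refused charge leaves the running total untouched.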
Audit traces
Every tool call logged
Every tool call, every prompt, every retrieval — written to a destination you control, not a black box.
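As a sketch of what a trace might look like, the wrapper below emits one JSON line per call to a sink the caller supplies. The `TraceEvent` shape, the `traced` helper and the in-memory `auditLog` are assumptions; in practice the sink would be whatever destination you control.

```typescript
// Hypothetical trace shape; the sink stands in for a destination you control.
interface TraceEvent {
  kind: 'prompt' | 'tool_call' | 'retrieval';
  name: string;
  input: unknown;
  output: unknown;
  at: string; // ISO 8601 timestamp
}

type TraceSink = (line: string) => void;

// Wrap a function so every completed invocation emits one JSON line to the sink.
function traced<In, Out>(
  kind: TraceEvent['kind'],
  name: string,
  fn: (input: In) => Out,
  sink: TraceSink,
): (input: In) => Out {
  return (input: In): Out => {
    const output = fn(input);
    const event: TraceEvent = { kind, name, input, output, at: new Date().toISOString() };
    sink(JSON.stringify(event));
    return output;
  };
}

const auditLog: string[] = [];
const tracedLookup = traced(
  'tool_call',
  'lookupVehicle',
  (vin: string) => ({ vin, make: 'unknown' }), // stand-in tool
  (line) => auditLog.push(line),
);

tracedLookup('WVWZZZ1JZXW000001');
console.log(auditLog.length); // 1
```

This version records completed calls only; a fuller wrapper would also write a trace when the wrapped function throws.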
Confidence handoff
Threshold → human
Below the configured confidence threshold, the agent pages a human rather than acting. Agents that know when not to act are the only ones worth deploying.
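The handoff rule is small enough to sketch directly. The `AgentDecision` and `Routed` types are hypothetical, and 0.85 is the illustrative `handoffBelow` value from the agent sketch rather than a recommended setting.

```typescript
// Hypothetical decision shape; the threshold value is an assumption.
interface AgentDecision {
  action: string;
  confidence: number; // 0 to 1, as reported by the agent
}

type Routed =
  | { route: 'act'; action: string }
  | { route: 'human'; reason: string };

// Act only when confidence clearly meets the threshold. Anything else,
// including a missing or NaN confidence, goes to a human.
function routeDecision(decision: AgentDecision, handoffBelow: number): Routed {
  if (!(decision.confidence >= handoffBelow)) {
    return {
      route: 'human',
      reason: `confidence ${decision.confidence} is below ${handoffBelow}`,
    };
  }
  return { route: 'act', action: decision.action };
}

console.log(routeDecision({ action: 'approveClaim', confidence: 0.91 }, 0.85).route); // act
console.log(routeDecision({ action: 'approveClaim', confidence: 0.62 }, 0.85).route); // human
```

Writing the check as "not at or above the threshold" makes the rule fail closed: a malformed confidence value pages a human instead of slipping through.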
Eval suites
JSONL test cases
Regression test cases run on every change. Failures block deploy. Built up across the engagement, not promised at the end.
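A sketch of the deploy gate, assuming one JSON object per line with `input` and `expected` fields and an exact-match check. The case format, the `runEvals` name and the keyword-based `stubAgent` are hypothetical; real suites usually need richer scoring than string equality.

```typescript
// Hypothetical eval case format: one JSON object per line.
interface EvalCase {
  input: string;
  expected: string;
}

// Parse JSONL text, skipping blank lines.
function parseJsonl(text: string): EvalCase[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as EvalCase);
}

// Run every case; deploy is allowed only when nothing fails.
function runEvals(cases: EvalCase[], agent: (input: string) => string) {
  const failures = cases.filter((c) => agent(c.input) !== c.expected);
  return {
    total: cases.length,
    failed: failures.length,
    deployAllowed: failures.length === 0,
  };
}

const suite = parseJsonl(`
{"input": "cracked windscreen, no injuries", "expected": "glass"}
{"input": "rear bumper dent in car park", "expected": "bodywork"}
`);

// Stand-in for the real agent: a keyword rule, used only to make the sketch runnable.
const stubAgent = (input: string): string =>
  input.includes('windscreen') ? 'glass' : 'bodywork';

const report = runEvals(suite, stubAgent);
console.log(report); // { total: 2, failed: 0, deployAllowed: true }
```

In a pipeline, `deployAllowed` being false is the signal that stops the release step.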
Region pinning
Inference stays in-region
Data and inference stay in the region you committed to. No silent egress, no surprise transatlantic round-trips.
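Region pinning can be sketched as a filter that refuses to fall back to another region when the committed one has no endpoint. The `Endpoint` shape, the region labels and the example.com hostnames are placeholders, not real infrastructure.

```typescript
// Hypothetical endpoint registry; hostnames and region names are placeholders.
interface Endpoint {
  name: string;
  region: string;
  url: string;
}

// Return only endpoints in the committed region; throw rather than route elsewhere.
function pinToRegion(endpoints: Endpoint[], committedRegion: string): Endpoint[] {
  const inRegion = endpoints.filter((e) => e.region === committedRegion);
  if (inRegion.length === 0) {
    throw new Error(
      `no endpoint available in ${committedRegion}; refusing to route elsewhere`,
    );
  }
  return inRegion;
}

const registry: Endpoint[] = [
  { name: 'inference-uk', region: 'uk', url: 'https://inference.uk.example.com' },
  { name: 'inference-us', region: 'us', url: 'https://inference.us.example.com' },
];

const pinned = pinToRegion(registry, 'uk');
console.log(pinned.map((e) => e.name)); // [ 'inference-uk' ]
```

Throwing on an empty result is the point of the sketch: an outage in the committed region surfaces as an error, not as silent egress to another one.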
AI we’ve shipped, with named customers.
Bison Press: a WordPress plugin shipping AI-driven marketing automation. Live in production with three external customers: Herd Group, New Team Services and Siem Car Carriers. Built, deployed and kept running by us.
A Claude-powered chat agent in production on the Bison Track Vessel Tracking module — the first deployed module of Bison Insights, our modular operational dashboards platform. Operations teams ask questions of their own data and get cited answers without leaving the dashboard.
Public case studies in development. Reference calls available on request after a discovery conversation.
Four tiers. The audit is the wedge.
AI Operations Audit
Two-week paid discovery sprint. Stack assessment, AI use-case identification against your operations data, security posture review, written 90-day plan with feasibility scoring and ROI estimates against named use cases. Credited against any follow-on build over £25k.
Single Agent Build (Pilot)
One agent in production. Document extraction, triage, retrieval or a similarly bounded use case. 4–6 weeks. Audit credit applied where one was run.
Multi-Agent System
Coordinated agents with hand-offs, supervisor loops, and human-in-the-loop on the boundaries that matter. Production deployment, observability and ops baseline included.
Production AI Programme
Custom agents on your operations stack. Eval suites, ops, retraining cycles, ongoing capacity. Scoped per-engagement.
AI Operations Retainer
Ongoing AI ops once an agent is live. Eval refresh, prompt and tool maintenance, drift monitoring, light scope changes. Added on top of any of the build tiers above.
Have an AI use case that needs more than a demo? Start with the Audit.
Most AI engagements start with the Operations Audit: two weeks, fixed £3,500, ending with a written 90-day plan that scores each use case we identify for feasibility and estimates its cost and ROI. The fee is credited against any follow-on build over £25k. Not ready to commit yet? The 30-minute consultation is the better starting point.
We sell AI horizontally but lead with operations-heavy and regulated buyers. If you’re looking for a horizontal AI strategy deck without engineering, we’ll politely refer you elsewhere.