Service · AI Workflows & Agents

Productised AI for operations-heavy and regulated teams.

Audit · Pilot · Multi-agent system · Operations retainer. Built by a senior UK engineering team that has shipped production software for operations-heavy customers since 2003. Our live AI work includes Bison Press, our own AI marketing automation plugin with three named external customers, and the Claude-powered chat agent on Bison Track for Siem Car Carriers.

— Audit · £3,500 — Pilot · £8–20k — Multi-agent · £25–80k — Retainer · £1.5–3.5k/mo
Illustrative · agent shape

What a production-grade agent looks like, on paper.

An illustrative sketch: typed definition, hybrid retrieval, tool calling, guardrails, evals, deploy step. This is the shape we scope into Pilot Builds and Multi-Agent Systems, not a screenshot of a deployed Team Bison system. For what we have actually shipped, see Currently Shipping below.

Hybrid retrieval · lexical + vector
Tool calling · with audit traces
Guardrails · PII, cost, handoff
Evals as tests · CI-blocking
example.agent.ts • EXAMPLE
// illustrative · agent-shape sketch
// What a production-grade agent looks like on paper —
// retrieval, tool use, guardrails, evals, deploy step.
// Not a snapshot of a deployed Team Bison system.
import { Agent, tool, retriever } from '@teambison/agents';
import { policies, precedent, vehicles } from './stores';
import { extract, page } from './tools'; // illustrative module for the extract and page handlers

const policyStore = retriever({
  store: policies,
  hybrid: { lexical: 0.4, vector: 0.6 },
  topK:   8,
});

export const claimsTriage = new Agent({
  name:  'claims.triage',
  model: 'claude-sonnet-4-7',
  system: 'You triage vehicle damage claims. Cite policy and precedent.',
  tools: [
    tool('searchPolicies',     policyStore),
    tool('extractClaimFields', extract),
    tool('lookupVehicle',      vehicles),
    tool('escalateAdjuster',   page),
  ],
  guardrails: {
    pii:           true,
    audit:         true,
    maxCostUsd:    0.20,
    handoffBelow:  0.85,    // confidence
  },
  evals: ['./evals/claims-triage.jsonl'],
});

await claimsTriage.deploy({ env: 'prod' });
// → shape only — see Currently Shipping for what we run
Capabilities

Three shapes of AI work. The pattern, not a fleet count.

These are the three shapes of work we scope into Pilot Builds and Multi-Agent Systems. We bring twenty-three years of integration depth and a senior UK engineer on every Tier 1 build. Production guardrails — evals, audit trails, cost ceilings — are configured into each engagement to the depth the system requires, not bolted on as a marketing claim.

RAG

Document-heavy retrieval

Hybrid lexical + vector retrieval over policy documents, claims precedent, CDS documentation, OEM specs, and operational SOPs. Citations attached, not hallucinated. The same retrieval pattern sits inside Bison Press; it generalises to TMS-, ERP- and customs-document corpora.

  • Hybrid retrieval: lexical + vector
  • Reranking where the corpus rewards it
  • Eval harness for retrieval quality before launch
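
To make the hybrid retrieval pattern concrete, here is a minimal sketch of weighted score fusion. It assumes lexical and vector scores have already been computed for each document. `Scored`, `normalise` and `fuse` are hypothetical names rather than a Team Bison API, and the 0.4 / 0.6 weights simply mirror the illustrative agent sketch earlier on this page. Depending on the corpus, rank-based fusion or a reranker may serve better.

```typescript
// Illustrative sketch only: weighted fusion of lexical and vector scores.
// `Scored`, `normalise` and `fuse` are hypothetical names, not a Team Bison API.
type Scored = { id: string; score: number };

// Min-max normalise scores into [0, 1] so the two scales are comparable.
function normalise(hits: Scored[]): Map<string, number> {
  const out = new Map<string, number>();
  if (hits.length === 0) return out;
  const scores = hits.map((h) => h.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const range = max - min;
  for (const h of hits) {
    // If every score is identical, treat each hit as a full match.
    out.set(h.id, range === 0 ? 1 : (h.score - min) / range);
  }
  return out;
}

function fuse(
  lexical: Scored[],
  vector: Scored[],
  weights = { lexical: 0.4, vector: 0.6 }, // assumed weights, as in the sketch above
  topK = 8,
): Scored[] {
  const lex = normalise(lexical);
  const vec = normalise(vector);
  const ids = new Set<string>();
  lexical.forEach((h) => ids.add(h.id));
  vector.forEach((h) => ids.add(h.id));
  const fused: Scored[] = [];
  ids.forEach((id) => {
    // A document missing from one result list contributes 0 from that side.
    const score =
      weights.lexical * (lex.get(id) ?? 0) + weights.vector * (vec.get(id) ?? 0);
    fused.push({ id, score });
  });
  // Highest fused score first, truncated to the requested number of results.
  return fused.sort((a, b) => b.score - a.score).slice(0, topK);
}
```

Normalising before weighting matters because lexical and vector scores live on different scales; without it, one side can dominate regardless of the weights.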
AGENTS

Multi-step operations agents

Agents that do more than one round-trip — tool use, memory, supervisor loops, and a clear human-in-the-loop handoff when confidence drops. The Bison Track AI chat agent at Siem Car Carriers is the live reference. We scope what we build with bounded tool surfaces, audited traces and human escalation built in from day one.

  • Tool calling with typed schemas
  • Audit traces from the first run
  • Human-in-the-loop handoff on confidence threshold
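
As a rough illustration of a bounded tool surface with typed schemas, the sketch below validates model-supplied arguments before a tool runs. Everything here is an assumption for illustration: `Tool`, `callTool`, `lookupVehicle` and the in-memory `knownVins` set are hypothetical, and a real build would more likely use a schema library than hand-written checks.

```typescript
// Illustrative sketch only: a tool whose arguments are validated before it runs.
// All names are hypothetical; this is not a Team Bison or vendor API.
interface Tool<I, O> {
  name: string;
  parse: (input: unknown) => I; // throws on invalid input
  run: (input: I) => O;
}

type ToolResult<O> = { ok: true; output: O } | { ok: false; error: string };

// Stand-in for a real vehicle store.
const knownVins = new Set<string>(['1HGCM82633A004352']);

interface VehicleLookupInput { vin: string }
interface VehicleLookupOutput { vin: string; found: boolean }

const lookupVehicle: Tool<VehicleLookupInput, VehicleLookupOutput> = {
  name: 'lookupVehicle',
  parse(input) {
    if (typeof input !== 'object' || input === null) {
      throw new Error('input must be an object');
    }
    const vin = (input as Record<string, unknown>).vin;
    // A VIN is 17 characters; anything else is rejected before the tool runs.
    if (typeof vin !== 'string' || vin.length !== 17) {
      throw new Error('vin must be a 17-character string');
    }
    return { vin };
  },
  run: ({ vin }) => ({ vin, found: knownVins.has(vin) }),
};

// Arguments proposed by the model are untrusted until parsed.
function callTool<I, O>(tool: Tool<I, O>, rawArgs: unknown): ToolResult<O> {
  try {
    return { ok: true, output: tool.run(tool.parse(rawArgs)) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning a result value rather than throwing lets the agent loop feed a validation error back to the model or escalate to a human, instead of crashing the run.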
AUTOMATION

Inbox and ticket automation

Agents for triage, document extraction, supplier queries and freight-rate variance — measured against the human baseline before they take live work, then rolled out in shadow mode before full cutover. Cost ceilings per run. No surprise invoices.

  • Async batch and real-time pipelines
  • Cost ceilings per run
  • Shadow-mode rollout, then cutover
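
One way to picture the shadow-mode step: the agent's decisions are recorded next to the human's but never acted on, and cutover is gated on how often the two agree. The sketch below is a simplified assumption of that gate. `ShadowRecord`, `agreementRate` and `readyForCutover` are hypothetical names, and the default thresholds are placeholders to be agreed per engagement.

```typescript
// Illustrative sketch only: measuring an agent against the human baseline
// while it runs in shadow mode. Names and thresholds are assumptions.
type ShadowRecord = { ticketId: string; human: string; agent: string };

// Share of tickets where the agent's shadow decision matched the human's.
function agreementRate(records: ShadowRecord[]): number {
  if (records.length === 0) return 0;
  const agreed = records.filter((r) => r.human === r.agent).length;
  return agreed / records.length;
}

// Cutover is only considered with enough samples AND enough agreement.
function readyForCutover(
  records: ShadowRecord[],
  minAgreement = 0.95, // placeholder threshold
  minSamples = 200,    // placeholder sample size
): boolean {
  return records.length >= minSamples && agreementRate(records) >= minAgreement;
}
```

Exact-match agreement suits categorical decisions such as triage labels; extraction or free-text tasks would need a task-specific comparison.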
Integration

AI is 30% of the value. The other 70% is integration.

Off-the-shelf AI products rarely fit logistics workflows out of the box. We integrate AI — ours or yours — into your TMS, WMS, ERP, HMRC CDS, OEM EDI, fleet telematics, ePOD and carrier APIs. The customisation around the model is where the value lives.

TMS · WMS · ERP · HMRC CDS · OEM EDI · Royal Mail OBA · DPD · Evri · DHL · Yodel · Telematics · ePOD
Guardrails & observability

What we set up before an AI build goes near production.

Every item below is a default we configure into AI engagements at Pilot Build tier and above. Not a feature list of a productised framework — a checklist of the things we treat as table stakes for putting AI on operations data.

Default on

PII scrubbing

Detected & redacted pre-LLM

Sensitive fields detected and redacted before the LLM sees them. Configured on at the start of an engagement, not after the first incident.
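
A minimal sketch of redaction before the model call, assuming simple pattern matching. The two patterns and the `scrubPii` name are illustrative assumptions; pattern matching alone misses names, addresses and free-text identifiers, so a production setup would typically add a dedicated PII detector.

```typescript
// Illustrative sketch only: pattern-based redaction before text reaches the LLM.
// The patterns are deliberately simple assumptions, not a complete PII detector.
const PII_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: 'EMAIL', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // Ten or more digits, optionally separated by spaces or hyphens.
  { label: 'PHONE', pattern: /\+?\d[\d\s-]{8,}\d/g },
];

function scrubPii(text: string): { text: string; redactions: number } {
  let redactions = 0;
  let scrubbed = text;
  for (const { label, pattern } of PII_PATTERNS) {
    // Replace each match with a labelled placeholder and count it.
    scrubbed = scrubbed.replace(pattern, () => {
      redactions += 1;
      return `[${label}]`;
    });
  }
  return { text: scrubbed, redactions };
}
```

Keeping the redaction count makes it easy to log how much was removed per run without logging the sensitive values themselves.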

Default on

Cost ceilings

Hard stop per run

A maximum spend per run is set against the cost envelope you signed off. Spend goes to a dashboard, not into a surprise invoice.
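
A hard stop per run can be as small as a budget object that every model or tool call must charge against. The sketch below is an assumption of how that might look: `RunBudget` is a hypothetical name, and the 0.20 USD figure in the usage comment mirrors the illustrative agent sketch earlier on this page.

```typescript
// Illustrative sketch only: a hard per-run spend ceiling.
// `RunBudget` is a hypothetical name, not a Team Bison API.
class RunBudget {
  private spentMicroUsd = 0;
  private readonly ceilingMicroUsd: number;

  constructor(ceilingUsd: number) {
    // Track integer micro-dollars to avoid floating-point drift at the boundary.
    this.ceilingMicroUsd = Math.round(ceilingUsd * 1e6);
  }

  // Records the charge and returns true, or returns false without recording
  // anything if the charge is invalid or would take the run past its ceiling.
  tryCharge(costUsd: number): boolean {
    const cost = Math.round(costUsd * 1e6);
    if (!(cost >= 0) || this.spentMicroUsd + cost > this.ceilingMicroUsd) {
      return false;
    }
    this.spentMicroUsd += cost;
    return true;
  }

  get spentUsd(): number {
    return this.spentMicroUsd / 1e6;
  }
}

// Usage: check the budget BEFORE making the call it pays for.
// const budget = new RunBudget(0.20);
// if (!budget.tryCharge(estimatedCallCostUsd)) { /* stop the run, hand off */ }
```

Charging before the call, using an estimate, is what makes the ceiling a hard stop rather than an after-the-fact report.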

Default on

Audit traces

Every tool call logged

Every tool call, every prompt, every retrieval — written to a destination you control, not a black box.
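
As a sketch of what "every tool call logged" can mean in practice, the wrapper below writes one JSON line per call to a sink the customer controls. It assumes synchronous tools for brevity; `TraceEvent`, `TraceSink` and `withAudit` are hypothetical names, and the sink could be a file, a queue or a log pipeline.

```typescript
// Illustrative sketch only: wrapping a tool so every call leaves a trace line.
// Names are hypothetical; synchronous tools are assumed to keep it short.
interface TraceEvent {
  runId: string;
  tool: string;
  args: unknown;
  result?: unknown;
  error?: string;
  startedAt: string; // ISO 8601 timestamp
  durationMs: number;
}

type TraceSink = (line: string) => void;

function withAudit<A, R>(
  runId: string,
  toolName: string,
  fn: (args: A) => R,
  sink: TraceSink,
): (args: A) => R {
  return (args: A): R => {
    const started = Date.now();
    const event: TraceEvent = {
      runId,
      tool: toolName,
      args,
      startedAt: new Date(started).toISOString(),
      durationMs: 0,
    };
    try {
      const result = fn(args);
      event.result = result;
      return result;
    } catch (err) {
      event.error = err instanceof Error ? err.message : String(err);
      throw err; // the failure still surfaces to the caller
    } finally {
      // Success or failure, exactly one trace line is written per call.
      event.durationMs = Date.now() - started;
      sink(JSON.stringify(event));
    }
  };
}
```

Writing the trace in `finally` means a failed call is logged as reliably as a successful one.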

Default on

Confidence handoff

Threshold → human

Below the configured confidence threshold, the agent pages a human rather than acting. Agents that know when not to act are the only ones worth deploying.
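
The handoff rule itself is small. The sketch below assumes the agent produces a proposed action with a confidence score between 0 and 1; `Decision` and `route` are hypothetical names, and the 0.85 default mirrors the illustrative agent sketch earlier on this page. How the confidence score is produced and calibrated is the harder problem and is not shown here.

```typescript
// Illustrative sketch only: route a proposed action to a human when the
// confidence score is below the configured threshold. Names are assumptions.
type Decision<T> =
  | { kind: 'act'; action: T }
  | { kind: 'handoff'; reason: string };

function route<T>(
  proposal: { action: T; confidence: number },
  handoffBelow = 0.85, // assumed threshold, as in the sketch above
): Decision<T> {
  // Written as "not clearly at or above the threshold" so that a missing or
  // NaN confidence also hands off rather than acting.
  if (!(proposal.confidence >= handoffBelow)) {
    return {
      kind: 'handoff',
      reason: `confidence ${proposal.confidence} below threshold ${handoffBelow}`,
    };
  }
  return { kind: 'act', action: proposal.action };
}
```

Defaulting to handoff on any malformed score keeps the failure mode on the safe side: a human gets paged, the agent does not act.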

Default on

Eval suites

JSONL test cases

Regression test cases run on every change. Failures block deploy. Built up across the engagement, not promised at the end.
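
To show the shape of a JSONL eval suite, here is a minimal sketch that parses one test case per line, runs each against an agent function and reports whether the suite passed. The `EvalCase` fields, `parseJsonl` and `runEvals` are assumptions for illustration, and exact string matching stands in for whatever scoring a real suite would use.

```typescript
// Illustrative sketch only: JSONL regression cases run against an agent.
// Field names and exact-match scoring are assumptions for illustration.
type EvalCase = { id: string; input: string; expected: string };
type EvalReport = { passed: number; failed: string[]; ok: boolean };

// One JSON object per non-empty line.
function parseJsonl(jsonl: string): EvalCase[] {
  return jsonl
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as EvalCase);
}

function runEvals(cases: EvalCase[], agent: (input: string) => string): EvalReport {
  const failed = cases
    .filter((c) => agent(c.input) !== c.expected)
    .map((c) => c.id);
  // `ok` is the single flag a CI step can use to block a deploy.
  return { passed: cases.length - failed.length, failed, ok: failed.length === 0 };
}
```

In a CI job, the step would exit with a non-zero status whenever `ok` is false, which is what makes a failing eval block the deploy.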

Default on

Region pinning

Inference stays in-region

Data and inference stay in the region you committed to. No silent egress, no surprise transatlantic round-trips.
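
One simple way to guard against silent egress is a pre-deploy check that every configured endpoint sits in a committed region. The sketch below assumes each endpoint declares its region in configuration; the `Region` values, `Endpoint` and `regionViolations` are hypothetical, and a check like this complements provider-level region controls rather than replacing them.

```typescript
// Illustrative sketch only: a pre-deploy check that configured endpoints
// stay within the committed regions. Names and region values are assumptions.
type Region = 'uk' | 'eu' | 'us';

interface Endpoint {
  name: string;
  region: Region;
}

// Returns the names of endpoints that fall outside the committed regions.
function regionViolations(endpoints: Endpoint[], committed: Region[]): string[] {
  return endpoints
    .filter((e) => committed.indexOf(e.region) === -1)
    .map((e) => e.name);
}
```

An empty result means the configuration is consistent with the commitment; any name returned is a reason to stop the deploy.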

Currently shipping · deployed AI

AI we’ve shipped, with named customers.

Bison Press
Our own product · three named customers

A WordPress plugin shipping AI-driven marketing automation. Live in production with three external customers: Herd Group, New Team Services and Siem Car Carriers. Built, deployed and kept running by us.

Bison Track AI chat agent
Claude-powered · live at Siem Car Carriers

A Claude-powered chat agent in production on the Bison Track Vessel Tracking module — the first deployed module of Bison Insights, our modular operational dashboards platform. Operations teams ask questions of their own data and get cited answers without leaving the dashboard.

Public case studies in development. Reference calls available on request after a discovery conversation.

Pricing

Four tiers. The audit is the wedge.

£8k–£20k fixed

Single Agent Build (Pilot)

One agent in production. Document extraction, triage, retrieval or a similarly bounded use case. 4–6 weeks. Audit credit applied where one was run.

£25k–£80k

Multi-Agent System

Coordinated agents with hand-offs, supervisor loops, and human-in-the-loop on the boundaries that matter. Production deployment, observability and ops baseline included.

£80k+

Production AI Programme

Custom agents on your operations stack. Eval suites, ops, retraining cycles, ongoing capacity. Scoped per-engagement.

£1,500–£3,500 / month

AI Operations Retainer

Ongoing AI ops once an agent is live. Eval refresh, prompt and tool maintenance, drift monitoring, light scope changes. Added on top of any of the build tiers above.

Have an AI use case that needs more than a demo? Start with the Audit.

Most AI engagements start with the Operations Audit: two weeks, fixed £3,500, ending with cost and ROI estimates and a feasibility score for each use case we identify. The fee is credited against any follow-on build over £25k. Not ready to commit yet? The 30-minute consultation is the better starting point.

We sell AI horizontally but lead with operations-heavy and regulated buyers. If you’re looking for a horizontal AI strategy deck without engineering, we’ll politely refer you elsewhere.