A sovereign voice AI appliance — 2U, on-prem — ships the entire voice AI platform inside: the same model, the same pipeline, and the same APIs across Auricus Voice 8 / 16 / 32.

Two paths, one stack

  • Live and near-real-time — windowed streaming for contact centers, supervision, and compliance workflows. ~48–64 concurrent near-real-time streams per Auricus Voice 32 appliance before queue limits dominate.
  • Batch at scale — sustained ~100 files/min on Auricus Voice 32; end-to-end (ingest, language ID, delivery) ~240–300 files/hour under typical overhead.

Same pipeline serves both — no separate “real-time” and “batch” SKUs, no second integration to build.

Quality observability

  • Per-language WER dashboards — separate tracking for 16 kHz vs 8 kHz so phone-band quality is never hidden behind wideband averages.
  • Continuous evaluation harness — same suite used for customer pilots and internal regression checks, so the numbers we share with you are directly comparable to the numbers we run against.
  • Customer-shared benchmarks — published quality figures will be added here as we complete the next round of cross-language reference runs.

Language reach

  • ~99 transcription languages.
  • 107 automatic language-ID classes (93.3% benchmark accuracy).
  • Mixed-language deployments handled by per-language priors and regional defaults.
  • Language ID (LID) runs in the sub-second band on the primary path; CPU fallback available for resilience.

Integration

  • REST + JSON ingestion over HTTPS.
  • Bearer-token authentication.
  • Async worker model — submit a job, then either poll or receive a webhook callback with the result.
  • Webhook events: transcript (raw transcript) · wer (corrected + raw + measured WER when ground truth is supplied).
  • HTTP 429 backpressure with Retry-After semantics for both per-job and queue-depth limits.
  • Webhook retries with dead-letter handling for terminal failures.

→ Public spec sheet: Specifications.

Observability

Auricus Voice ships with the operational surface area of a modern platform — not a black-box appliance.

  • Prometheus metrics endpoint in standard exposition format. Stage durations, device utilisation, queue depth, jobs in flight, jobs by language, WER ratio, language-detection counters, canary metrics.
  • Grafana dashboards: system overview, pipeline performance (P50 / P95 / P99 + per-stage breakdown), quality (WER trends + per-language), device health (temperature, power, utilisation, errors), queue management, SLO tracking, language analytics.
  • Structured audit logs — every job submission, completion, and failure emitted with a request ID for downstream SIEM ingestion.
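Because the metrics endpoint uses the standard Prometheus exposition format, any scraper or ad-hoc script can consume it. A small sketch, assuming hypothetical metric names (`auricus_queue_depth`, `auricus_jobs_in_flight`) — the names are illustrative, only the format is standard:

```python
# Parse a sample scrape in Prometheus exposition format and aggregate a gauge.
SAMPLE_SCRAPE = """\
# HELP auricus_queue_depth Jobs waiting in the ingest queue.
# TYPE auricus_queue_depth gauge
auricus_queue_depth 17
# HELP auricus_jobs_in_flight Jobs currently being processed.
# TYPE auricus_jobs_in_flight gauge
auricus_jobs_in_flight{language="en"} 6
auricus_jobs_in_flight{language="de"} 2
"""

def parse_exposition(text: str) -> dict[str, float]:
    """Return {metric_name_with_labels: value}, skipping comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)  # value is the last whitespace-separated field
        samples[name] = float(value)
    return samples

samples = parse_exposition(SAMPLE_SCRAPE)
in_flight = sum(v for k, v in samples.items()
                if k.startswith("auricus_jobs_in_flight"))
```

In practice you would point Prometheus itself at the endpoint and let the bundled Grafana dashboards do this aggregation; the sketch only shows that the output is plain, tool-agnostic text.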

Service-level objectives

| SLO | Target |
| --- | --- |
| Availability | 99.9% (1-hour completed-vs-all-events ratio); ~43.2 min downtime budget per 30 days |
| Latency | 95% of pipeline stages complete within ≤ 10 s (1-hour rolling window) |
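The ~43.2-minute figure follows directly from the 99.9% target over a 30-day window:

```python
# Error-budget arithmetic behind the availability SLO.
target = 0.999                 # 99.9% availability target
window_minutes = 30 * 24 * 60  # 30-day window = 43,200 minutes
budget_minutes = (1 - target) * window_minutes  # 0.1% of the window
```

So 0.1% of 43,200 minutes is 43.2 minutes of allowable downtime per 30 days.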

Sustainability

  • Auricus Voice 32 at full load: ~600 W typical appliance draw — vs comparable GPU-based racks measured in kilowatts.
  • 2U chassis replaces multi-server GPU racks, cutting embodied materials, cooling load, and end-of-life e-waste.
  • End-of-life take-back under the EPR framework — vendor-managed return logistics for retired appliances; no customer responsibility for downstream WEEE handling.

SKU comparison

| SKU | Concurrent live calls | Matched annual audio (M min/yr) | Batch (files/min) | End-to-end (files/hr) | Peak accelerator power | Target use case |
| --- | --- | --- | --- | --- | --- | --- |
| Auricus Voice 8 | 12 | 2 | 25 | 60–75 | 80 W | Mid-size contact center, departmental fleet |
| Auricus Voice 16 | 24 | 4 | 50 | 120–150 | 160 W | Large enterprise contact center, regional carrier |
| Auricus Voice 32 | 48 | 8 | 100 | 240–300 | 320 W | National contact center, telco / government scale |

  • Concurrent live calls: sustained concurrent near-real-time conversations per appliance.
  • Matched annual audio: realistic operational capacity per appliance under a typical mixed real-time + batch duty cycle.
  • Batch (files/min): sustained transcription throughput per appliance.
  • End-to-end (files/hr): end-to-end throughput including ingest, language ID, decode, and delivery overhead.
  • Peak accelerator power: scales linearly across the family; add ~120 W chassis baseline for total appliance draw.

→ See the matched-workload savings comparison vs cloud STT.