A sovereign voice AI appliance — 2U, on-prem — ships the entire voice AI platform inside: the same model, the same pipeline, and the same APIs across Auricus Voice 8 / 16 / 32.
Two paths, one stack
- Live and near-real-time — windowed streaming for contact centers, supervision, and compliance workflows. ~48–64 concurrent near-real-time streams per Auricus Voice 32 appliance before queue limits dominate.
- Batch at scale — sustained ~100 files/min on Auricus Voice 32; end-to-end (ingest, language ID, delivery) ~240–300 files/hour under typical overhead.
Same pipeline serves both — no separate “real-time” and “batch” SKUs, no second integration to build.
Quality observability
- Per-language WER dashboards — separate tracking for 16 kHz vs 8 kHz so phone-band quality is never hidden behind wideband averages.
- Continuous evaluation harness — same suite used for customer pilots and internal regression checks, so the numbers we share with you are directly comparable to the numbers we run against.
- Customer-shared benchmarks — published quality figures will be added here as we complete the next round of cross-language reference runs.
Language reach
- ~99 transcription languages.
- 107 automatic language-ID classes (93.3% benchmark accuracy).
- Mixed-language deployments handled by per-language priors and regional defaults.
- Language ID runs in the sub-second band on the primary path; a CPU fallback is available for resilience.
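The per-language priors and regional defaults mentioned above can be sketched as a simple re-weighting of raw language-ID scores. A minimal sketch, assuming a score-dictionary interface; the language codes, scores, and prior values below are illustrative, not the product's actual API:

```python
# Sketch: re-weighting language-ID scores with per-language deployment priors.
# All scores, language codes, and prior values here are illustrative.

def apply_priors(lid_scores: dict[str, float], priors: dict[str, float]) -> dict[str, float]:
    """Multiply each raw LID score by a deployment prior and renormalise."""
    weighted = {lang: score * priors.get(lang, 1.0) for lang, score in lid_scores.items()}
    total = sum(weighted.values())
    return {lang: w / total for lang, w in weighted.items()}

# Raw classifier output: Spanish and Portuguese are near-ties.
raw = {"es": 0.40, "pt": 0.38, "en": 0.22}
# A Spain-region deployment biases toward Spanish and away from Portuguese.
regional_prior = {"es": 2.0, "pt": 0.5, "en": 1.0}

posterior = apply_priors(raw, regional_prior)
best = max(posterior, key=posterior.get)  # → "es" under this prior
```

The same mechanism covers mixed-language fleets: each site ships its own prior table while the underlying classifier stays identical.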
Integration
- REST + JSON ingestion over HTTPS.
- Bearer-token authentication.
- Async worker model — submit a job, then either poll or receive a webhook callback with the result.
- Webhook events: transcript (raw transcript) and wer (corrected transcript, raw transcript, and measured WER when ground truth is supplied).
- HTTP 429 backpressure with Retry-After semantics for both per-job and queue-depth limits.
- Webhook retries with dead-letter handling for terminal failures.
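The submit-then-poll half of the async worker model, including the 429 / Retry-After backpressure handling, can be sketched as a small polling loop. This is a sketch only: the endpoint path, field names, and response shape are assumptions, and `fetch` stands in for a real HTTP client such as `requests`:

```python
# Sketch of the poll side of the async job flow, honouring HTTP 429 backpressure.
# Endpoint path, field names, and response shapes are assumptions, not the
# documented API; `fetch` stands in for a real HTTP client.
import time

def poll_job(fetch, job_id: str, max_attempts: int = 10, poll_interval: float = 1.0) -> dict:
    """Poll a submitted job until it completes, waiting out Retry-After on 429."""
    for _ in range(max_attempts):
        status, headers, body = fetch(f"/v1/jobs/{job_id}")
        if status == 429:
            # Backpressure: wait the server-suggested interval, then retry.
            time.sleep(float(headers.get("Retry-After", 1)))
            continue
        if status == 200 and body.get("state") == "done":
            return body
        time.sleep(poll_interval)  # still running; poll again
    raise TimeoutError(f"job {job_id} did not finish within {max_attempts} attempts")

# Demo with a canned response sequence instead of a live appliance.
responses = iter([
    (429, {"Retry-After": "0"}, {}),                      # queue-depth limit hit
    (200, {}, {"state": "running"}),                      # job still in flight
    (200, {}, {"state": "done", "transcript": "hello"}),  # finished
])
result = poll_job(lambda path: next(responses), "job-123", poll_interval=0)
```

In production the same loop is usually replaced by the webhook callback; polling remains useful as a fallback when inbound webhooks are blocked by network policy.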
→ Public spec sheet: Specifications.
Observability
Auricus Voice ships with the operational surface area of a modern platform — not a black-box appliance.
- Prometheus metrics endpoint in standard exposition format. Stage durations, device utilisation, queue depth, jobs in flight, jobs by language, WER ratio, language-detection counters, canary metrics.
- Grafana dashboards: system overview, pipeline performance (P50 / P95 / P99 + per-stage breakdown), quality (WER trends + per-language), device health (temperature, power, utilisation, errors), queue management, SLO tracking, language analytics.
- Structured audit logs — every job submission, completion, and failure emitted with a request ID for downstream SIEM ingestion.
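Because the metrics endpoint uses the standard Prometheus text exposition format, any scraper or a few lines of code can read it. A minimal sketch of parsing simple gauge lines; the metric names below are illustrative stand-ins, not the appliance's documented names:

```python
# Sketch: reading gauges out of a Prometheus text-format scrape.
# Metric names below are illustrative; the real endpoint's names may differ.

def parse_exposition(text: str) -> dict[str, float]:
    """Parse simple `name value` / `name{labels} value` lines, skipping comments."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_part, value = line.rsplit(" ", 1)
        metrics[name_part] = float(value)
    return metrics

sample = """\
# HELP queue_depth Jobs waiting in the queue
# TYPE queue_depth gauge
queue_depth 7
jobs_in_flight 12
jobs_total{language="es"} 341
"""
m = parse_exposition(sample)
```

In practice you would point Prometheus itself at the endpoint rather than hand-parsing; the sketch only shows that the format is plain text and trivially machine-readable.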
Service-level objectives
| SLO | Target |
| --- | --- |
| Availability | 99.9% (1-hour completed-vs-all-events ratio); ~43.2 min downtime budget per 30 days |
| Latency | 95% of pipeline stages complete within ≤ 10 s (1-hour rolling window) |
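The ~43.2-minute downtime budget follows directly from the 99.9% target over a 30-day window:

```python
# Error budget for a 99.9% availability target over a 30-day window.
target = 0.999
window_minutes = 30 * 24 * 60              # 43,200 minutes in 30 days
budget_minutes = (1 - target) * window_minutes
# budget_minutes ≈ 43.2, the downtime budget quoted for the availability SLO.
```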
Sustainability
- Auricus Voice 32 at full load: ~600 W typical appliance draw — vs comparable GPU-based racks measured in kilowatts.
- 2U chassis replaces multi-server GPU racks, cutting embodied materials, cooling load, and end-of-life e-waste.
- End-of-life take-back under the EPR framework — vendor-managed return logistics for retired appliances; no customer responsibility for downstream WEEE handling.
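The power gap compounds over a year of continuous operation. A back-of-envelope sketch, assuming 24/7 duty; the 4 kW rack figure is an illustrative comparison point, not a measured competitor number:

```python
# Back-of-envelope annual energy at the ~600 W full-load figure, assuming
# 24/7 operation. The 4 kW GPU-rack figure is a hypothetical comparison point.
hours_per_year = 365 * 24                  # 8,760 h
appliance_kwh = 0.6 * hours_per_year       # ~5,256 kWh/yr at 600 W
gpu_rack_kwh = 4.0 * hours_per_year        # a hypothetical 4 kW GPU rack
ratio = gpu_rack_kwh / appliance_kwh       # rack draws ~6.7x the energy
```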
SKU comparison
| SKU | Concurrent live calls | Matched annual audio (M min/yr) | Batch (files/min) | End-to-end (files/hr) | Peak accelerator power | Target use case |
| --- | --- | --- | --- | --- | --- | --- |
| Auricus Voice 8 | 12 | 2 | 25 | 60–75 | 80 W | Mid-size contact center, departmental fleet |
| Auricus Voice 16 | 24 | 4 | 50 | 120–150 | 160 W | Large enterprise contact center, regional carrier |
| Auricus Voice 32 | 48 | 8 | 100 | 240–300 | 320 W | National contact center, telco / government scale |
- Concurrent live calls = sustained concurrent near-real-time conversations per appliance.
- Matched annual audio = realistic operational capacity per appliance under a typical mixed real-time + batch duty cycle.
- Batch (files/min) = sustained transcription throughput per appliance.
- End-to-end (files/hr) = end-to-end throughput including ingest, language ID, decode, and delivery overhead.
- Peak accelerator power scales linearly across the family; add ~120 W chassis baseline for total appliance draw.
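The total-draw rule in the footnotes is simple arithmetic per SKU:

```python
# Total appliance draw per SKU under the footnote's rule:
# peak accelerator power plus the ~120 W chassis baseline.
CHASSIS_BASELINE_W = 120
accelerator_w = {
    "Auricus Voice 8": 80,
    "Auricus Voice 16": 160,
    "Auricus Voice 32": 320,
}
total_draw_w = {sku: w + CHASSIS_BASELINE_W for sku, w in accelerator_w.items()}
# e.g. Auricus Voice 32 → 440 W nominal from this rule.
```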
→ See the matched-workload savings comparison vs cloud STT.