Human conversation operates on a precise rhythm. When you ask someone a question, you expect a response in roughly 200–400 milliseconds. Anything longer feels like hesitation. Anything much longer feels like the other person isn't listening, doesn't understand, or has a bad connection.
Voice AI in India operates in an environment where this timing is everything — and where most global platforms structurally fail.
What "Latency" Actually Measures
In voice AI, latency typically refers to Time-to-First-Byte (TTFB): the time between the moment the caller stops speaking and the moment the first byte of response audio is sent back. This metric combines:
- VAD delay: How long the system waits to confirm the caller has stopped (typically 200–400ms)
- STT processing: Transcribing what was said (80–200ms on optimized systems)
- LLM inference: Generating the response (100–500ms depending on model size and hardware)
- TTS synthesis: Converting response text to audio (50–150ms with streaming)
- Network round-trip: Data traveling to and from servers
For a US-based server handling an Indian call, the network round-trip alone adds 120–200ms. Before the AI processes anything, you've already spent roughly half of a 300ms latency budget on the network.
The India advantage: Running AI inference on India-based servers cuts network latency to 10–30ms for Indian callers. This isn't a minor optimization — it's the difference between a natural conversation and an awkward one.
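To make the arithmetic concrete, here is a minimal sketch that subtracts the network round-trip from an overall target and shows what is left for VAD, STT, LLM and TTS. The round-trip ranges are the ones quoted above. The 300ms target is an assumption taken from the sub-300ms goal, and the function and dictionary names are illustrative only.

```python
# Network round-trip ranges quoted above, in milliseconds: (best case, worst case).
NETWORK_RTT_MS = {
    "us": (120, 200),
    "india": (10, 30),
}

TARGET_MS = 300  # assumed overall budget, taken from the sub-300ms goal


def remaining_budget(region: str, target_ms: int = TARGET_MS) -> tuple[int, int]:
    """Return (worst-case, best-case) milliseconds left after the network round-trip."""
    rtt_best, rtt_worst = NETWORK_RTT_MS[region]
    return target_ms - rtt_worst, target_ms - rtt_best


for region in NETWORK_RTT_MS:
    low, high = remaining_budget(region)
    print(f"{region}: {low}-{high} ms left for processing")
```

Under these assumptions a US-routed call leaves 100–180ms for everything else, while an India-hosted call leaves 270–290ms.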
Why India's Mobile Network Compounds the Problem
India's mobile network is predominantly 4G, with significant 3G pockets in Tier-2 and Tier-3 cities. On 3G networks, audio packet loss rates can reach 3–5%, and the retransmission or jitter buffering needed to cope with it adds variable latency.
Voice AI systems that aren't designed for variable-quality Indian connections struggle disproportionately in these environments. The system cannot tell whether the caller paused intentionally or audio was lost in transit, so it either starts responding too early or waits through an excessively long VAD delay.
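One way to reason about the pause-versus-loss ambiguity is to look at packet sequence numbers: silence in frames that actually arrived is evidence the caller stopped, while a gap in the sequence is simply unknown audio. The sketch below is a hypothetical illustration of that idea, not a description of any particular product's VAD. The `Frame` type, the 300ms base timeout and the 100ms loss penalty are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    seq: int              # packet sequence number (for example, from RTP)
    is_speech: bool       # verdict of some per-frame speech classifier (assumed)
    duration_ms: int = 20


def end_of_speech_index(frames, base_timeout_ms=300, loss_penalty_ms=100):
    """Return the index of the frame where end of speech is declared, or None.

    Silence observed in received frames counts toward the timeout. A gap in
    sequence numbers is treated as unknown audio: the silence counter restarts
    and the timeout is extended until speech is heard again.
    """
    silence_ms = 0
    timeout_ms = base_timeout_ms
    prev_seq = None
    heard_speech = False
    for i, frame in enumerate(frames):
        lost = prev_seq is not None and frame.seq != prev_seq + 1
        prev_seq = frame.seq
        if frame.is_speech:
            heard_speech = True
            silence_ms = 0
            timeout_ms = base_timeout_ms
            continue
        if not heard_speech:
            continue  # nothing to end yet
        if lost:
            # Packets went missing, so the caller may still have been talking.
            silence_ms = 0
            timeout_ms = base_timeout_ms + loss_penalty_ms
        silence_ms += frame.duration_ms
        if silence_ms >= timeout_ms:
            return i
    return None
```

On a clean connection this fires after the base timeout. When a gap appears mid-pause, it waits longer instead of responding over a caller whose audio was merely dropped.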
What Sub-300ms Feels Like in a Real Call
At 250ms: The AI responds immediately after the caller finishes. The conversation feels natural and intelligent.
At 500ms: There's a perceptible pause. The caller often starts speaking again — creating an interruption cycle.
At 800ms+: The caller assumes the system didn't understand and either repeats themselves or hangs up. This is the latency range of most US-routed AI calls to India.
How Agni Achieves Sub-300ms in India
Three infrastructure choices combine to achieve consistent sub-300ms latency:
- India-based inference servers: LLM and TTS run on hardware co-located in Indian data centres
- Streaming TTS: The system starts synthesizing and transmitting audio before the full response is generated
- Adaptive VAD: End-of-speech detection is calibrated for Indian mobile network patterns, including 3G packet loss profiles
The result: a median TTFB of 240ms for Hindi/Hinglish calls and 270ms for other Indian language variants, consistent across network conditions.
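The streaming TTS choice listed above can be illustrated with a small sketch: rather than waiting for the full LLM response, tokens are grouped into clause-sized chunks that can be handed to a synthesizer as soon as each one is complete. This is a generic illustration under stated assumptions, not Agni's actual implementation. The function name, the punctuation set and the `min_chars` threshold are hypothetical.

```python
from typing import Iterable, Iterator

# Characters treated as clause boundaries; includes the Devanagari danda used in Hindi.
CLAUSE_END = ".!?,;:।"


def clause_chunks(tokens: Iterable[str], min_chars: int = 12) -> Iterator[str]:
    """Group streamed LLM tokens into clause-sized chunks for early synthesis."""
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.rstrip()
        # Emit once the buffer is long enough and ends at a clause boundary.
        if len(stripped) >= min_chars and stripped[-1] in CLAUSE_END:
            yield stripped.lstrip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when generation ends


tokens = ["Your ", "order ", "has ", "shipped", ", ", "and ", "will ",
          "arrive ", "on ", "Monday", "."]
for chunk in clause_chunks(tokens):
    print(chunk)  # each chunk would be passed to a TTS engine at this point
```

With the sample tokens, the first chunk ("Your order has shipped,") is ready for synthesis while the rest of the sentence is still being generated, which is where the time-to-first-audio saving comes from.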