How I Built Voice AI That Calls 101 Government Offices
March 2026 · 6 min read
The Problem Nobody Talks About
Personal injury law firms need CHP traffic collision reports to build cases. There's a state portal for requesting them, but here's the catch — the portal requires fields that the firm almost never has at intake: the exact crash time, the responding officer's badge number, the NCIC code for the jurisdiction. Without those fields, the request gets rejected.
So what happens? A paralegal picks up the phone, calls the local CHP area office, sits on hold, talks to whoever answers, and asks for the missing info. They do this dozens of times a week. Across 101 CHP offices statewide.
I've been building the system that replaces those calls.
Why Voice AI Is the Right Tool Here
This isn't a chatbot problem. Government offices don't have APIs. They don't have portals for third-party data lookups. What they do have is a phone number and a desk officer who handles attorney inquiries all day long.
The interaction is structured and predictable: identify yourself, provide a case number or incident details, request specific data fields, confirm what you heard, hang up. It's exactly the kind of conversation that a well-designed voice agent can handle — scripted enough to stay on track, flexible enough to handle the officer's phrasing.
I built this on VAPI, which handles the real-time speech pipeline. VAPI manages the telephony, speech-to-text, the LLM turn-taking, and text-to-speech in a single integrated loop. My job was designing what the agent actually says and does.
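To make the pipeline concrete, here is a minimal sketch of an assistant configuration. The key names reflect my reading of VAPI's assistant schema and the provider choices are illustrative; verify field names and providers against VAPI's current API docs before using this.

```python
# Hedged sketch of a VAPI assistant payload. Key names ("firstMessage",
# "model", "transcriber", "voice") are my best understanding of VAPI's
# schema; providers and values here are placeholders, not the real config.
assistant = {
    "firstMessage": (
        "Hi, I'm calling on behalf of a law firm "
        "regarding a traffic collision report."
    ),
    "model": {
        "provider": "openai",
        "model": "gpt-4o",
        "messages": [{
            "role": "system",
            "content": (
                "You are calling a CHP area office to request missing "
                "report fields. Identify yourself, provide the incident "
                "details you have, ask for the missing fields, confirm "
                "what you heard, then end the call politely."
            ),
        }],
    },
    "transcriber": {"provider": "deepgram", "model": "nova-2"},
    "voice": {"provider": "11labs", "voiceId": "..."},
}
```

The point of the single payload is that telephony, STT, LLM, and TTS are configured in one place; the design work lives almost entirely in the system prompt.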
The Conversation Flow
The agent has a clear mission: call a specific CHP office, identify itself as calling on behalf of the law firm, provide whatever incident details we already have, and ask for the missing fields.
The tricky part isn't the happy path. It's everything else. What happens when you hit an IVR tree instead of a person? What happens when the officer says "I can't find that incident"? What happens when they give you a partial answer — the crash time but not the badge number?
I designed branching logic for all of these. IVR navigation uses DTMF tones. Dead ends trigger a graceful exit and flag the case for human follow-up. Partial data gets captured and tagged as incomplete — the system knows what it got and what it still needs.
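The outcome handling above can be sketched as a small classifier that runs after every call. The field names and the enum are illustrative, not the production schema:

```python
from enum import Enum, auto

class CallOutcome(Enum):
    COMPLETE = auto()   # every requested field was confirmed
    PARTIAL = auto()    # some fields captured, tagged incomplete
    DEAD_END = auto()   # nothing useful; flag for human follow-up

# Hypothetical field names for the data a call is trying to collect.
REQUESTED_FIELDS = {"crash_time", "badge_number", "ncic_code"}

def classify_call(captured: dict) -> CallOutcome:
    """Decide what to do with a finished call based on what it captured."""
    got = {name for name, value in captured.items() if value}
    if got >= REQUESTED_FIELDS:
        return CallOutcome.COMPLETE
    if got:
        # The system records what it got AND what it still needs.
        return CallOutcome.PARTIAL
    return CallOutcome.DEAD_END
```

The useful property is that "partial" is a first-class outcome rather than a failure, so a crash time without a badge number still moves the case forward.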
The Data Provenance Problem
This was the hardest technical challenge and the one I'm most proud of solving.
When a report request goes to CHP, every field has to be accurate. But the data in our system comes from two very different sources: the law firm (which fills in what the client told them) and the CHP officer (who has access to the actual records).
These sources have completely different reliability levels. A client might say the crash happened "around 3pm" — that's an estimate. An officer pulling it from the system says "15:27" — that's authoritative.
I built a tagged data model where every field carries a source and a confidence level. Data from client intake is tagged source: "client" with confidence low or medium. Data confirmed by a CHP officer is tagged source: "chp_officer" with confidence high. The system won't submit a portal request using low-confidence data for critical fields. It knows the difference between "we think this is right" and "an officer confirmed this."
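A minimal sketch of that tagged model, assuming hypothetical critical-field names; the real schema surely carries more metadata, but the two rules shown here are the ones described above:

```python
from dataclasses import dataclass
from enum import Enum

class Confidence(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class TaggedField:
    value: str
    source: str            # "client" or "chp_officer"
    confidence: Confidence

# Illustrative: the fields the portal rejects requests without.
CRITICAL_FIELDS = {"crash_time", "badge_number", "ncic_code"}

def ready_to_submit(case: dict) -> bool:
    """Only submit when every critical field is officer-confirmed."""
    return all(
        name in case and case[name].confidence is Confidence.HIGH
        for name in CRITICAL_FIELDS
    )

def merge(case: dict, name: str, incoming: TaggedField) -> None:
    """Never overwrite higher-confidence data with lower-confidence data."""
    current = case.get(name)
    if current is None or incoming.confidence.value >= current.confidence.value:
        case[name] = incoming
```

The `merge` rule is the part most systems skip: a client's "around 3pm" can never clobber an officer's "15:27".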
This seems obvious in retrospect, but most systems I've seen just overwrite fields without tracking where the data came from. That's how you end up submitting bad report requests and wondering why they keep getting rejected.
The Office Routing Problem
California has 101 CHP area offices, each responsible for different jurisdictions. An incident on the 405 in Orange County goes to the Westminster office. An incident on the 5 near Camp Pendleton goes to Oceanside. Get the office wrong and you waste a call — the officer will just tell you to try a different one.
I built a routing table that maps incident locations to the correct CHP office. It uses a combination of freeway, city, and county data to determine jurisdiction. The table isn't perfect — edge cases near jurisdiction boundaries are genuinely ambiguous — but it gets it right about 95% of the time.
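The lookup itself can be sketched as ordered rules with a county-level fallback. The two routed entries come from the examples above; the fallback offices are placeholders, not the real table:

```python
# Ordered rules: most specific first. Only two illustrative entries here;
# the real table covers all 101 offices.
ROUTES = [
    ({"freeway": "I-405", "county": "Orange"}, "Westminster"),
    ({"freeway": "I-5", "county": "San Diego"}, "Oceanside"),
]

# Hypothetical fallback when no freeway rule matches.
COUNTY_FALLBACK = {"Orange": "Santa Ana", "San Diego": "San Diego"}

def route(incident: dict):
    """Return the CHP office for an incident, or None if we can't tell."""
    for criteria, office in ROUTES:
        if all(incident.get(k) == v for k, v in criteria.items()):
            return office
    return COUNTY_FALLBACK.get(incident.get("county"))
```

Returning `None` for unresolvable locations matters: a boundary-ambiguous case should go to a human, not to a coin flip.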
I also added a critical rule: never call the same office more than once per day for the same case. CHP officers notice, and it doesn't help your relationship with the office. If the first call didn't resolve it, the system queues it for the next day or flags it for a human to handle differently.
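That rule is cheap to enforce with a per-day call log keyed on case and office. A minimal sketch (an in-memory set standing in for whatever store the real system uses):

```python
from datetime import date

# (case_id, office, day) tuples for every outbound call made.
call_log: set = set()

def may_call(case_id: str, office: str, today: date) -> bool:
    """At most one call per office per case per day."""
    return (case_id, office, today) not in call_log

def record_call(case_id: str, office: str, today: date) -> None:
    call_log.add((case_id, office, today))
```

Anything blocked by `may_call` goes back on the queue for the next day or gets flagged for a human to handle differently.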
What Didn't Work
The first version of the system tried to handle ambiguous officer responses automatically. An officer would say something like "well, it could have been unit 42 or unit 48, I'm not sure from what I'm seeing here" — and the agent would try to pick one or ask clarifying questions.
It couldn't. Ambiguity in human speech is contextual in ways that even good LLMs struggle with in real-time voice. The agent would either pick wrong or create an awkward conversational loop that frustrated the officer.
The fix was simple: add human escalation. When the agent detects ambiguity above a certain threshold, it thanks the officer, captures exactly what was said as a transcript snippet, and flags the case for a paralegal to review. The paralegal sees the raw quote and makes the judgment call.
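The handoff logic is simple once an ambiguity score exists. This sketch assumes some upstream scorer (e.g. the LLM rating its own confidence in the parsed answer); the threshold value and names are hypothetical:

```python
from dataclasses import dataclass

AMBIGUITY_THRESHOLD = 0.5  # hypothetical tuning value

@dataclass
class Escalation:
    case_id: str
    transcript_snippet: str  # the officer's exact words, verbatim

def handle_response(case_id: str, quote: str, ambiguity: float,
                    review_queue: list) -> bool:
    """Return True if the agent may accept the answer itself.

    Above the threshold, capture the raw quote and hand off to a
    paralegal instead of guessing or looping on clarifying questions.
    """
    if ambiguity > AMBIGUITY_THRESHOLD:
        review_queue.append(Escalation(case_id, quote))
        return False
    return True
```

Capturing the verbatim snippet is the key design choice: the paralegal judges the officer's actual words, not the agent's interpretation of them.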
This made the whole system work. Not because the AI got smarter, but because I stopped asking it to do something it wasn't good at. The best AI systems I've built all have this property — they know where their competence boundary is.
Results
The system handles the majority of routine CHP office calls without human intervention. Report turnaround time dropped significantly. Paralegals went from spending hours on hold to reviewing flagged edge cases.
But the real win is reliability. The system calls at consistent times, never forgets a follow-up, and maintains a complete audit trail of every interaction. It's not faster than a good paralegal — it's more consistent than any human could be across hundreds of cases.