An AI layer for the voluntary health insurance (VHI) service workflow: the system understands the patient’s current intent, checks program constraints and clinic logic, and prepares the next action inside CRM.
It is easy to underestimate VHI service if you look at it as a chat between a patient and an operator. In reality the chat is only the outer interface. Inside each request sits a decision route: what exactly the patient wants, whether the service is covered, whether the clinic is valid, whether approval is needed, whether a guarantee letter is required, and which action is actually allowed inside CRM.
In VHI service operations a request rarely lives as one isolated question. A patient can spend days or weeks inside one global chat thread: first asking for a consultation, then cancelling a visit, then coming back with a new request. The operator does not just read text. They assemble a decision from a changing operational state.
That is why a single case can stretch to nearly two hours. Not because people are slow at writing messages, but because the workflow is spread across time: the patient clarifies a preferred window, the operator looks for a clinic, the clinic answers with slots, the program has to be checked again, the guarantee letter may need to be prepared, and then the whole chain may restart if the slot is gone or the patient changes plans.
1. A new request must be separated from the old one.
All patient correspondence lives in one global chat across all appeals. The system has to find the first new topic after the appeal start date and avoid mistaking it for a continuation of the old one. That is already temporal reasoning inside an operational workflow, not plain summarization.
2. Time is part of the meaning.
Phrases like “tomorrow after lunch” or “Friday morning works for me” cannot be flattened into a summary line. They have to become concrete booking windows anchored to the timestamp of the original message. Vague phrases like “sometime next week” must not be turned into slots at all.
3. The decision is assembled from several live systems at once.
One request pulls together visits, claims, complaints, warnings, approvals, guarantee letter history, products, program constraints, clinics, telemed-first routing, and the current chat. What an experienced operator does with eyes and habit had to be decomposed into explicit data sources and explicit steps.
4. The failure is not bad text but a wrong business action.
If AI makes a mistake in a support chatbot, that is unpleasant. If AI inside a VHI workflow gets the service, clinic, approval path, or booking mode wrong, the mistake lands in CRM and then hits SLA, operating cost, and patient experience.
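To make the second point concrete, here is a minimal sketch of how a relative phrase could be resolved against the message timestamp. The function name `resolve_window`, the day-part boundaries, and the keyword matching are all assumptions made for illustration; the production system is not described at this level of detail.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional, Tuple

# Hypothetical day-part boundaries; real values would come from business rules.
DAY_PARTS = {
    "morning": (time(8, 0), time(12, 0)),
    "after lunch": (time(14, 0), time(18, 0)),
}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_window(phrase: str, sent_at: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Turn a relative phrase into a concrete window anchored to sent_at.

    Returns None unless the phrase pins down both a day and a part of the day.
    """
    text = phrase.lower()
    day: Optional[date] = None
    if "tomorrow" in text:
        day = sent_at.date() + timedelta(days=1)
    else:
        for index, name in enumerate(WEEKDAYS):
            if name in text:
                # Next occurrence of that weekday strictly after the message date.
                delta = (index - sent_at.weekday() - 1) % 7 + 1
                day = sent_at.date() + timedelta(days=delta)
                break
    part = next((bounds for key, bounds in DAY_PARTS.items() if key in text), None)
    if day is None or part is None:
        return None  # vague phrases never become slots
    return datetime.combine(day, part[0]), datetime.combine(day, part[1])
```

The important design choice is the `None` branch: a phrase that does not resolve to a specific day and day part is returned to the operator flow as unresolved rather than guessed.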
On a slide this looks much simpler than it is.
| On the slide | In production |
|---|---|
| “We need to read the patient chat” | You need to find the first new intent in a global thread and separate it from previous appeals |
| “We need to understand which service the patient wants” | You need to combine service matching with coverage, limits, approvals, telemed-first routing, and clinic-in-program checks |
| “We need to automate booking” | You need a booking engine that works with live slots, clinic history, online or offline branches, and changing availability |
| “We only need to verify the output” | You need separate datasets and evals for each intermediate step, not one final QA pass |
That is why the standard playbook of “take an open-source product, tune the prompt, connect APIs” would not be enough here. As soon as the system starts creating value in one part of the process, it immediately runs into the next layer of operational dependency. In workflows like this AI quickly stops being a smart overlay and becomes a decision layer.
The first layer is not about text but about the meaning of the request. It parses the current slice of the global chat, extracts service, clinic, complaint, preferred time windows, doctor, and product, and pulls in previous context only when it is truly relevant to the same topic.
Claims and notes are processed by separate modules. That distinction matters: this is not one big model that understands everything. It is a set of narrow agents, each with its own quality contract. That makes the system easier to constrain and much easier to debug on real cases.
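As a rough illustration of "the current slice of the global chat", the sketch below cuts a thread at the appeal start date and picks the first patient message as the candidate for the new intent. The `Message` type and both function names are hypothetical. Deciding whether that message really opens a new topic is left to the chat-understanding agent and is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Message:
    sent_at: datetime
    author: str  # "patient" or "operator" (assumed labels)
    text: str


def current_slice(thread: List[Message], appeal_started_at: datetime) -> List[Message]:
    """Keep only messages at or after the appeal start, in chronological order."""
    recent = [m for m in thread if m.sent_at >= appeal_started_at]
    return sorted(recent, key=lambda m: m.sent_at)


def first_patient_message(chat_slice: List[Message]) -> Optional[Message]:
    """Return the earliest patient message in the slice, or None if there is none."""
    for message in chat_slice:
        if message.author == "patient":
            return message
    return None
```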
Once the request is understood, the more expensive part begins. The system has to verify service availability, clinic eligibility under the program, whether approval is required, whether limits apply, whether telemed-first routing is active, and whether a matching visit or guarantee letter already exists.
This is where the boundary between AI and rules sits. AI handles semantic ambiguity. Everything with a high cost of error moves toward explicit checks. In enterprise workflows that is not a compromise. It is the only safe architecture.
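The rule side of that boundary can be sketched as plain deterministic code. Everything below is an assumption made for illustration: the `Program` fields, the blocker names, and the convention that a missing limit entry means no limit is configured. The point is only that these checks are explicit and inspectable, with no model call involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Program:
    # Hypothetical stand-ins for program data held in CRM.
    covered_services: Set[str]
    clinics: Set[str]
    approval_required: Set[str] = field(default_factory=set)
    telemed_first: Set[str] = field(default_factory=set)
    remaining_limit: Dict[str, int] = field(default_factory=dict)


@dataclass
class CheckResult:
    allowed: bool
    needs_approval: bool
    telemed_first: bool
    blockers: List[str]


def check_request(program: Program, service: str, clinic: str) -> CheckResult:
    """Run explicit program checks for an already-matched service and clinic."""
    blockers: List[str] = []
    if service not in program.covered_services:
        blockers.append("service_not_covered")
    if clinic not in program.clinics:
        blockers.append("clinic_not_in_program")
    # A missing entry is treated as "no limit configured" in this sketch.
    if program.remaining_limit.get(service, 1) <= 0:
        blockers.append("limit_exhausted")
    return CheckResult(
        allowed=not blockers,
        needs_approval=service in program.approval_required,
        telemed_first=service in program.telemed_first,
        blockers=blockers,
    )
```

Returning named blockers instead of a bare boolean keeps every refusal explainable to the operator and easy to assert on in tests.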
If the previous steps pass, the system does not stop at a suggestion for the operator. It prepares a real CRM action: pre-fills a visit, assembles clinic advisory options, creates or updates a guarantee letter, and in some scenarios cancels an existing visit.
The phrase “automatically books the patient” hides a separate mini-system. It has to consider the preferred clinic, the last clinic from history, approvals, geography, virtual-clinic exclusions, online availability, and live slots. For part of the service catalog the flow goes through a schedule request, waits for the response, cuts slots down to the patient’s time window, and then returns the result to the operator flow. That is much closer to operational dispatch than to text generation.
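One small piece of that mini-system, cutting returned slots down to the patient's time window, might look like the sketch below. The `Slot` type, the `cut_to_window` name, and the "preferred clinic first, then earliest" ordering are assumptions. The real engine also weighs history, geography, approvals, and exclusions, which are omitted here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Slot:
    clinic_id: str
    starts_at: datetime


def cut_to_window(
    slots: List[Slot],
    window_start: datetime,
    window_end: datetime,
    preferred_clinic: Optional[str] = None,
) -> List[Slot]:
    """Keep slots that start inside [window_start, window_end).

    Slots at the preferred clinic come first; ties are broken by start time.
    """
    fitting = [s for s in slots if window_start <= s.starts_at < window_end]
    return sorted(fitting, key=lambda s: (s.clinic_id != preferred_clinic, s.starts_at))
```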
The fourth layer turned out to be the most underestimated one. In a workflow like this it is not enough to reach acceptable quality once. You need to know exactly where the system failed: in chat understanding, service matching, notes, visit handling, quality checks, or CRM side effects.
That is why the project grew an explicit measurement layer: evals for services, chat, claims, notes, visits, and OKK (quality control); scripts to replay real CRM cases; and separate datasets for intermediate steps instead of only the final answer. A large share of enterprise AI cost here sits not in inference itself, but in the system that proves quality.
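A per-step measurement can be as simple as the sketch below, which scores each intermediate step separately instead of producing one end-to-end number. The `EvalCase` shape and exact-match scoring are assumptions. Real evals for chat or notes would likely need richer comparison than equality.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class EvalCase(NamedTuple):
    step: str        # e.g. "services", "chat", "notes" (assumed labels)
    expected: object
    actual: object


def accuracy_by_step(cases: Iterable[EvalCase]) -> Dict[str, float]:
    """Exact-match accuracy computed separately for every pipeline step."""
    total: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.step] += 1
        correct[case.step] += int(case.expected == case.actual)
    return {step: correct[step] / total[step] for step in total}
```

With a breakdown like this, a regression in service matching shows up under its own key instead of being averaged into a final-answer score.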
This project could not be launched safely as a big-bang replacement. The error cost was too high and the side effects were too real. So the automation had to be rolled out in stages.
First the system learned to gather context and suggest a decision. Then it started pre-filling entities inside CRM. Only after that did the team enable riskier actions such as visit updates, guarantee letter flows, and other scenarios where the error leaves the chat and enters the live operating process.
For the business this is probably the main lesson of the case. Enterprise AI is not deployed by replacing the human in one move. It is deployed as staged autonomy, with an explicit risk boundary at every step.
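Staged autonomy can be expressed as an explicit gate in front of every CRM action. The stage names, action names, and mapping below are hypothetical and only mirror the rollout order described above: suggest first, pre-fill next, riskier actions last.

```python
from enum import IntEnum


class Stage(IntEnum):
    SUGGEST = 1   # gather context and suggest a decision
    PREFILL = 2   # pre-fill entities inside CRM
    ACT = 3       # riskier actions with live side effects


# Hypothetical mapping from action name to the minimum stage that allows it.
REQUIRED_STAGE = {
    "suggest_decision": Stage.SUGGEST,
    "prefill_visit": Stage.PREFILL,
    "update_visit": Stage.ACT,
    "issue_guarantee_letter": Stage.ACT,
}


def is_permitted(action: str, enabled_stage: Stage) -> bool:
    """Allow an action only if the rollout has reached its required stage.

    Unknown actions are denied by default.
    """
    required = REQUIRED_STAGE.get(action)
    return required is not None and required <= enabled_stage
```

Denying unknown actions by default keeps the risk boundary explicit: a new action has to be added to the mapping before the system can perform it.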
In the pilot the operator stopped being the person who manually stitched together chat, program rules, history, clinics, slots, and documents. Their role moved toward reviewing a prepared decision and confirming the action.
Where a case used to stretch close to two hours because of repeated checks, patient-clinic back-and-forth, and document preparation, the process came down to roughly half an hour. But the bigger change is not speed alone. The bigger change is that the system stopped being one more AI window next to CRM and became part of the operating machine itself.
That is what makes this case useful. It shows that enterprise AI becomes truly hard not when you need to choose a model, but when you need to turn professional operator judgment into a partially autonomous, testable, and safe decision machine.
A stack of narrow agents for chat understanding, claims, notes, service and clinic matching, and quality checks. Each one owns a specific uncertainty instead of asking one model to replace the whole workflow.
Embeddings and similarity search for services and clinics on top of CRM reference data. Operator phrasing and canonical service names do not match, so retrieval has to bridge that gap.
Explicit checks for coverage, approvals, telemed-first routing, clinic-in-program, limits, and CRM side effects. AI cannot cross into the zone where an error becomes an operational incident.
Separate eval datasets for chat, services, notes, claims, visits, and OKK (quality control). Quality is measured at every intermediate contract, not only on the final answer.
Structured logs, tracing, saved input API bundles, and control over CRM side effects. Without that layer you cannot investigate failures or scale the automation safely.
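The retrieval idea behind service and clinic matching can be shown with a toy example. The sketch below uses character-trigram counts as a stand-in for learned embeddings, which is an assumption made so the example runs without any model; the catalog entries and function names are also illustrative. Only the shape of the technique, vectorize, compare by cosine similarity, and take the nearest canonical name, carries over.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': character-trigram counts (a stand-in for a real model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(query: str, catalog: List[str]) -> Tuple[str, float]:
    """Return the catalog entry most similar to the query, with its score."""
    query_vec = embed(query)
    scored = [(name, cosine(query_vec, embed(name))) for name in catalog]
    return max(scored, key=lambda pair: pair[1])
```

In practice a similarity score like this would feed the explicit checks rather than replace them: retrieval proposes a canonical service, and the rules decide whether it is allowed.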
We will tell you whether the task fits AI agents and, if it does, outline a concrete plan.
or write directly to ilya@manaraga.ai