Insurance VHI Service

Luchi: a decision system for the VHI service workflow

An AI layer for the VHI service workflow: the system understands the patient’s current intent, checks program constraints and clinic logic, and prepares the next action inside CRM.

VHI service is not a chat problem

It is easy to underestimate VHI service if you look at it as a chat between a patient and an operator. In reality the chat is only the outer interface. Inside each request sits a decision route: what exactly the patient wants, whether the service is covered, whether the clinic is valid, whether approval is needed, whether a guarantee letter is required, and which action is actually allowed inside CRM.

In VHI service operations a request rarely lives as one isolated question. A patient can spend days or weeks inside one global chat thread: first asking for a consultation, then cancelling a visit, then coming back with a new request. The operator does not just read text. They assemble a decision from a changing operational state.

That is why a single case can stretch to nearly two hours. Not because people are slow at writing messages, but because the workflow is spread across time: the patient clarifies a preferred window, the operator looks for a clinic, the clinic answers with slots, the program has to be checked again, the guarantee letter may need to be prepared, and then the whole chain may restart if the slot is gone or the patient changes plans.

Where the project becomes expensive

1. A new request must be separated from the old one.
All patient correspondence lives in one global chat across all appeals. The system has to find the first new topic after the appeal start date and avoid mistaking it for a continuation of the old one. That is already temporal reasoning inside an operational workflow, not plain summarization.

2. Time is part of the meaning.
Phrases like “tomorrow after lunch” or “Friday morning works for me” cannot become a nice summary line. They have to become concrete booking windows tied to the original message timestamp. Vague phrases like “sometime next week” cannot become slots at all.

3. The decision is assembled from several live systems at once.
One request pulls together visits, claims, complaints, warnings, approvals, guarantee letter history, products, program constraints, clinics, telemed-first routing, and the current chat. What an experienced operator does with eyes and habit had to be decomposed into explicit data sources and explicit steps.

4. The failure is not bad text but a wrong business action.
If AI makes a mistake in a support chatbot, that is unpleasant. If AI inside a VHI workflow gets the service, clinic, approval path, or booking mode wrong, the mistake lands in CRM and then hits SLA, operating cost, and patient experience.
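Point 2 is the easiest one to make concrete. The sketch below is a minimal illustration, not the project's parser: the function name `resolve_window` and the hour ranges for “morning” and “after lunch” are assumptions made for the example. It shows the two properties the text asks for: the interval is anchored to the timestamp of the original message, and a vague phrase yields no slot at all.

```python
from datetime import datetime, time, timedelta
from typing import Optional, Tuple

# Illustrative assumptions: these hour ranges are not taken from the project.
DAY_PARTS = {
    "morning": (time(9, 0), time(12, 0)),
    "after lunch": (time(14, 0), time(18, 0)),
}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_window(phrase: str, sent_at: datetime) -> Optional[Tuple[str, str]]:
    """Turn a patient phrase into an ISO-8601 interval anchored to sent_at.

    Returns None unless the phrase names both a day and a part of the day.
    """
    text = phrase.lower()
    day = None
    if "tomorrow" in text:
        day = sent_at.date() + timedelta(days=1)
    else:
        for index, name in enumerate(WEEKDAYS):
            if name in text:
                # Next occurrence of that weekday strictly after the message date.
                delta = (index - sent_at.weekday() - 1) % 7 + 1
                day = sent_at.date() + timedelta(days=delta)
                break
    part = next((hours for label, hours in DAY_PARTS.items() if label in text), None)
    if day is None or part is None:
        return None  # too vague to become a booking window
    start = datetime.combine(day, part[0])
    end = datetime.combine(day, part[1])
    return start.isoformat(), end.isoformat()
```

The same phrase resolves to different intervals depending on when the message was sent, which is why the message timestamp has to travel with the text through the whole pipeline.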

Why a standard AI copilot does not work here

On a slide this looks much simpler than it is.

On the slide: “We need to read the patient chat”
In production: You need to find the first new intent in a global thread and separate it from previous appeals

On the slide: “We need to understand which service the patient wants”
In production: You need to combine service matching with coverage, limits, approvals, telemed-first routing, and clinic-in-program checks

On the slide: “We need to automate booking”
In production: You need a booking engine that works with live slots, clinic history, online or offline branches, and changing availability

On the slide: “We only need to verify the output”
In production: You need separate datasets and evals for each intermediate step, not one final QA pass

That is why the standard playbook of “take an open-source product, tune the prompt, connect APIs” would not be enough here. As soon as the system starts creating value in one part of the process, it immediately runs into the next layer of operational dependency. In workflows like this AI quickly stops being a smart overlay and becomes a decision layer.

How the system had to be assembled

1. Understand what the patient wants now

The first layer is not about text but about the meaning of the request. It parses the current slice of the global chat, extracts service, clinic, complaint, preferred time windows, doctor, and product, and pulls in previous context only when it is truly relevant to the same topic.

Claims and notes are processed by separate modules. That distinction matters: this is not one big model that understands everything. It is a set of narrow agents, each with its own quality contract. That makes the system easier to constrain and much easier to debug on real cases.
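The deterministic part of this layer, cutting the global chat into a current slice and historical context, can be sketched in a few lines. This is a simplified illustration under assumptions: the `Message` type and the `split_chat` function are hypothetical names, and the real system additionally uses a model to decide whether the first message in the slice really opens a new topic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Message:
    sent_at: datetime
    author: str  # "patient" or "operator" (hypothetical labels)
    text: str


def split_chat(
    messages: List[Message], appeal_started_at: datetime
) -> Tuple[List[Message], List[Message]]:
    """Split a global chat into (historical context, current appeal slice).

    Messages sent at or after the appeal start belong to the current slice.
    Both lists are returned in chronological order.
    """
    ordered = sorted(messages, key=lambda m: m.sent_at)
    history = [m for m in ordered if m.sent_at < appeal_started_at]
    current = [m for m in ordered if m.sent_at >= appeal_started_at]
    return history, current
```

Keeping the split outside the model means a narrow agent only ever sees the slice it is responsible for, and historical context is passed in explicitly when it is relevant to the same topic.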

2. Check whether the action is allowed

Once the request is understood, the more expensive part begins. The system has to verify service availability, clinic eligibility under the program, whether approval is required, whether limits apply, whether telemed-first routing is active, and whether a matching visit or guarantee letter already exists.

This is where the boundary between AI and rules sits. AI handles semantic ambiguity. Everything with a high cost of error moves toward explicit checks. In enterprise workflows that is not a compromise. It is the only safe architecture.
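One way to picture that boundary is a plain function that takes the already-normalized request and returns blocking reasons. The sketch below is an assumption-heavy illustration: every field name and reason code is hypothetical, and treating a guarantee letter issued within the last 30 days as a blocker is one possible reading of the check, not a confirmed rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DecisionContext:
    # Hypothetical field names that mirror the checks named in the text.
    service_in_program: bool
    clinic_in_program: bool
    approval_required: bool
    approval_exists: bool
    telemed_first_active: bool
    telemed_visit_done: bool
    last_guarantee_letter_on: Optional[date] = None


def run_checks(ctx: DecisionContext, today: date) -> List[str]:
    """Return blocking reasons; an empty list means the action may proceed."""
    blockers: List[str] = []
    if not ctx.service_in_program:
        blockers.append("service_not_in_program")
    if not ctx.clinic_in_program:
        blockers.append("clinic_not_in_program")
    if ctx.approval_required and not ctx.approval_exists:
        blockers.append("approval_missing")
    if ctx.telemed_first_active and not ctx.telemed_visit_done:
        blockers.append("telemed_first_required")
    if ctx.last_guarantee_letter_on is not None:
        age = today - ctx.last_guarantee_letter_on
        # Assumption: a letter issued within the last 30 days blocks a new one.
        if timedelta(0) <= age <= timedelta(days=30):
            blockers.append("guarantee_letter_issued_recently")
    return blockers
```

Because the function is pure and contains no model call, each rule can be unit-tested on its own, and a failed check produces a reason code that the operator can read.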

3. Prepare the action, not just a suggestion

If the previous steps pass, the system does not stop at a suggestion for the operator. It prepares a real CRM action: pre-fills a visit, assembles clinic advisory options, creates or updates a guarantee letter, and in some scenarios cancels an existing visit.

The phrase “automatically books the patient” hides a separate mini-system. It has to consider the preferred clinic, the last clinic from history, approvals, geography, virtual-clinic exclusions, online availability, and live slots. For part of the service catalog the flow goes through a schedule request, waits for the response, cuts slots down to the patient’s time window, and then returns the result to the operator flow. That is much closer to operational dispatch than to text generation.
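The slot-handling step at the end of that flow can be sketched as follows. This is a minimal illustration, not the project's booking engine: the `Slot` type and `slots_in_window` function are hypothetical, and the sketch assumes a slot counts only if it fits entirely inside the patient window. Grouping by doctor and merging adjacent intervals follow the description in the engineering map later in this case.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

Interval = Tuple[datetime, datetime]


@dataclass(frozen=True)
class Slot:
    doctor: str
    start: datetime
    end: datetime


def slots_in_window(slots: List[Slot], window: Interval) -> Dict[str, List[Interval]]:
    """Keep slots fully inside the window, group them by doctor,
    and merge intervals where one ends exactly when the next starts."""
    win_start, win_end = window
    by_doctor: Dict[str, List[Interval]] = defaultdict(list)
    for slot in sorted(slots, key=lambda s: (s.doctor, s.start)):
        if slot.start < win_start or slot.end > win_end:
            continue  # outside the patient's window
        intervals = by_doctor[slot.doctor]
        if intervals and intervals[-1][1] == slot.start:
            # Adjacent to the previous interval: extend it.
            intervals[-1] = (intervals[-1][0], slot.end)
        else:
            intervals.append((slot.start, slot.end))
    return dict(by_doctor)
```

The operator then sees a few continuous ranges per doctor instead of a long list of individual slots.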

4. Prove that the system stays in bounds after launch

The fourth layer turned out to be the most underestimated one. In a workflow like this it is not enough to reach acceptable quality once. You need to know exactly where the system failed: in chat understanding, service matching, notes, visit handling, quality checks, or CRM side effects.

That is why the project grew an explicit measurement layer: evals for services, chat, claims, notes, visits, and OKK (quality control of operator messages); scripts to replay real CRM cases; separate datasets for intermediate steps instead of only the final answer. A large share of enterprise AI cost here sits not in inference itself, but in the system that proves quality.
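The idea of scoring each intermediate step separately can be shown with a very small harness. It is a sketch under assumptions: the `evaluate` function, the tuple layout of a case, and exact-match scoring are all illustrative choices, and the project's own eval tooling is not described in enough detail to reproduce.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# (step name, input, expected output). Step names follow the text;
# the layout itself is an assumption made for this example.
Case = Tuple[str, str, str]


def evaluate(
    cases: Iterable[Case], modules: Dict[str, Callable[[str], str]]
) -> Dict[str, float]:
    """Return exact-match accuracy per intermediate step,
    rather than one end-to-end score."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for step, given, expected in cases:
        total[step] += 1
        if modules[step](given) == expected:
            passed[step] += 1
    return {step: passed[step] / total[step] for step in total}
```

A per-step report makes a regression traceable: a drop in the `services` score points at service matching, not at the chat parser or the CRM side effects.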

Under the hood: the engineering map
REQUEST
appeal id · patient id · one global chat across all appeals · timestamp of the latest message
in parallel
STATE COLLECTION
lift the live operating state, not only the last message
13 sources in one pass (async fan-out): visits · patient profile · claims and complaints · warnings · full chat · options · program expiration · appeal history · approvals · rating clusters · virtual clinic · telemed-first · guarantee letter (GP) history
patient products are fetched after that as a separate step
Freshness cache (summary store): the cached decision is tied to the latest patient message; stale state cannot be reused safely
REQUEST UNDERSTANDING
find the new intent inside a shared chat and normalize it into working entities
Chat split (appeal vs context): messages are separated into the current appeal slice and historical context; the model looks for the first new topic after the appeal start date
Narrow AI modules (multi-step): separate passes for chat summary · claims · notes · service matching · clinic matching · visit matching
Hybrid retrieval (Weaviate + similarity search): services and clinics go through hybrid search, then the service is normalized again through a dedicated similarity-search service
Time windows (ISO-8601): phrases like “tomorrow after lunch” become explicit intervals tied to the timestamp of the original message
unified decision context →
normalized service · clinic candidates · patient history · time windows
CHECKS (boundary between AI and code)
high-cost failure modes are moved into explicit checks
Approval logic: hardcoded approval checks, existing approvals, and guarantee letters already issued in the last 30 days
Coverage and limits: service-in-program, service limits, and whether the route can go through the virtual clinic
Clinic eligibility: whether the clinic is in program and valid for this specific route
Telemed-first routing: not a suggestion but a blocking condition inside the workflow
Script-service rules: keyword and scenario rules for cases where free interpretation is too risky
CRM ACTIONS (gated by flags)
the same endpoint can run read-only or action-capable
Risk gating (feature flags): `enable_prefill_visit` · `enable_update_visit` · `enable_gp`; action branches are triggered by `CREATE_VISIT`, `CREATE_GUARANTEE_LETTER`, and `CANCEL_VISIT` topics
Visit branch (prefill + update): builds the Visit payload, finds the linked visit by appeal, updates the visit and notice, and returns a CRM prefill response
Booking branch (booking engine): works through preferred clinic, last clinic from history, approvals, and patient geography; for part of the catalog it creates a schedule request, polls for the response, filters slots by the patient window, groups them by doctor, and merges adjacent intervals
Guarantee letter branch (GP flow): finds the relevant visit or approval, selects `template_id`, creates the letter, and updates its contents once the clinic data is resolved
WHAT HAD TO BECOME ITS OWN SUBSYSTEM
git history shows where the real complexity lived: not in nicer text, but in operating risk
Clinic selection and routing: rewritten repeatedly until it became a routing mechanism with alternatives, online booking, telemed-first, and virtual clinic logic
Decision freshness: cache design changed multiple times because a stale decision for a live appeal is more dangerous than recomputing it
Guarantee letters: this branch grew from templates into an independent flow with `template_id`, its own payload, and a post-create update step
Quality control: OKK became a separate API, agent mode, dataset family, and offline eval layer because operator-message quality could not stay manual
13 sources per appeal · one global chat across all patient appeals · narrow AI modules instead of one universal call
Weaviate + similarity search + rule layer + action flags · visit prefill · booking · guarantee letter · visit cancellation
separate eval loops for chat / services / visits / clinic retrieval / OKK
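
Two pieces of the map, the async fan-out over data sources and the freshness cache, lend themselves to a short sketch. It is an illustration under assumptions, not the project's code: `collect_state`, `FreshnessCache`, and the fetcher signature are hypothetical names, and the real source clients are replaced by whatever coroutines the caller passes in.

```python
import asyncio
from datetime import datetime
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical signature: each source is a coroutine taking the appeal id.
Fetcher = Callable[[str], Awaitable[object]]


async def collect_state(appeal_id: str, fetchers: Dict[str, Fetcher]) -> Dict[str, object]:
    """Run every source fetcher concurrently; return results keyed by source name."""
    names = list(fetchers)
    results = await asyncio.gather(*(fetchers[name](appeal_id) for name in names))
    return dict(zip(names, results))


class FreshnessCache:
    """Keep one decision per appeal, valid only for the exact
    latest-message timestamp it was computed from."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[datetime, object]] = {}

    def get(self, appeal_id: str, latest_message_at: datetime) -> Optional[object]:
        entry = self._store.get(appeal_id)
        if entry is None or entry[0] != latest_message_at:
            return None  # nothing cached, or cached for a different message
        return entry[1]

    def put(self, appeal_id: str, latest_message_at: datetime, decision: object) -> None:
        self._store[appeal_id] = (latest_message_at, decision)
```

Tying the cache key to the latest patient message means a new message invalidates the stored decision automatically, with no separate expiry logic to get wrong.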

Why the rollout had to be staged

This project could not be launched safely as a big-bang replacement. The error cost was too high and the side effects were too real. So the automation had to be rolled out in stages.

First the system learned to gather context and suggest a decision. Then it started pre-filling entities inside CRM. Only after that did the team enable riskier actions such as visit updates, guarantee letter flows, and other scenarios where the error leaves the chat and enters the live operating process.

For the business this is probably the main lesson of the case. Enterprise AI is not deployed by replacing the human in one move. It is deployed as staged autonomy, with an explicit risk boundary at every step.
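
Staged autonomy of this kind is typically implemented with feature flags. The sketch below reuses the flag and topic names that appear in the engineering map (`enable_prefill_visit`, `enable_update_visit`, `enable_gp`, `CREATE_VISIT`, `CREATE_GUARANTEE_LETTER`), but the mapping between them, the `ActionFlags` type, and the action labels are assumptions made for illustration. The source does not say which flag gates `CANCEL_VISIT`, so the sketch leaves that topic read-only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionFlags:
    # Everything is off by default: the system starts in read-only mode.
    enable_prefill_visit: bool = False
    enable_update_visit: bool = False
    enable_gp: bool = False


def allowed_actions(topic: str, flags: ActionFlags) -> List[str]:
    """Return the actions the system may take for a topic under the given flags.

    The topic-to-flag mapping here is an assumption, not the project's logic.
    """
    actions = ["suggest"]  # the read-only suggestion is always available
    if topic == "CREATE_VISIT":
        if flags.enable_prefill_visit:
            actions.append("prefill_visit")
        if flags.enable_update_visit:
            actions.append("update_visit")
    elif topic == "CREATE_GUARANTEE_LETTER" and flags.enable_gp:
        actions.append("create_guarantee_letter")
    return actions
```

Each rollout stage then corresponds to flipping one flag, and rolling back a risky action does not require a redeploy of the understanding or checking layers.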

What changed in the process

In the pilot the operator stopped being the person who manually stitched together chat, program rules, history, clinics, slots, and documents. Their role moved toward reviewing a prepared decision and confirming the action.

Where a case used to stretch close to two hours because of repeated checks, patient-clinic back-and-forth, and document preparation, the process came down to roughly half an hour. But the bigger change is not speed alone. The bigger change is that the system stopped being one more AI window next to CRM and became part of the operating machine itself.

That is what makes this case useful. It shows that enterprise AI becomes truly hard not when you need to choose a model, but when you need to turn professional operator judgment into a partially autonomous, testable, and safe decision machine.


What we learned in the pilot

Not a chatbot but a decision system inside CRM
13 parallel data sources per request
One global chat across all patient appeals
Live slots, approvals, and guarantee letters in one flow
Separate eval datasets for each intermediate step
Staged autonomy instead of a big-bang rollout

Platform modules used in this project

Chat & Agents pydantic-ai

A stack of narrow agents for chat understanding, claims, notes, service and clinic matching, and quality checks. Each one owns a specific uncertainty instead of asking one model to replace the whole workflow.

Documents Weaviate

Embeddings and similarity search for services and clinics on top of CRM reference data. Operator phrasing and canonical service names do not match, so retrieval has to bridge that gap.

Guardrails

Explicit checks for coverage, approvals, telemed-first routing, clinic-in-program, limits, and CRM side effects. AI cannot cross into the zone where an error becomes an operational incident.

Evaluation

Separate eval datasets for chat, services, notes, claims, visits, and OKK (quality control of operator messages). Quality is measured at every intermediate contract, not only on the final answer.

Observability

Structured logs, tracing, saved input API bundles, and control over CRM side effects. Without that layer you cannot investigate failures or scale the automation safely.

Tell us which process you want to break down.

We will tell you whether the task fits AI agents and, if it does, outline a concrete plan.

or write directly to ilya@manaraga.ai