We built an assistant that answers railway engineers' and dispatchers' questions about the network infrastructure. The data lives in several disconnected systems, so the assistant picks the right source for each question on its own and never conflates things that look related but mean different things: a phone planned in the directory is not yet equipment installed at the station.
A national railway is being pulled into a single dispatch center: movement, communications, video surveillance, fire and intrusion alarms, data networks — everything that used to live station by station is being drawn into one place. The people who run it have questions about the infrastructure itself every day. A dispatcher asks what is connected to the node at Makiš. A network engineer asks which VLANs are configured at Vrčin station. A service technician asks what equipment is already installed at a site and under what warranty.
The questions sound alike, but the answers live in completely different systems. The link topology is in one export, station configs in another, telephone directories in a third, and equipment deliveries and installations in a separate procurement database. The person who answers these questions is really working as a translator between those systems: they know where to look for each kind of question, and they keep in mind that the same name means different things in different places.
This is not specific to railways. Almost any company that has accumulated a zoo of record-keeping systems over the years is in the same situation: the same object is scattered across several databases under different names, and someone holds that whole map in their head.
The system takes a question in natural language and answers it across all of these sources at once, in the same language it was asked — Serbian, Russian, or English. From the outside it looks like a chat. Inside there is one orchestrator agent and a set of typed tools, one per source.
The core difficulty of this project hides inside a single word: “Makiš”. It is the central node of the network, and it shows up in almost every source. The trouble is that in each one it means a different kind of fact and is written differently.
| Where it lives | How it’s written | What kind of fact |
|---|---|---|
| Network topology | MAKIS-PE2 | a node router; its ports are physically wired to ports on other devices |
| Object registry | Макиш | the dispatch center itself — a row in the catalog of objects and sections |
| Comms directory | Makiš | a planned console or phone number |
| Procurement database | Макиш | an equipment line item with a warranty, a waybill, and a flag for whether it is installed |
Three spellings across Latin and Cyrillic, four different kinds of fact. To a system that dumps everything into one search index, “Makiš” is a single string, and it will just as readily return a device, a building, a phone, and a warehouse line item mixed into one answer. In a dispatch room that means a confidently wrong answer: for example, that a phone exists at the station when the directory only lists it as planned, while whether it was ever installed is recorded in an entirely different system.
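A minimal sketch of how the three spellings could be aligned for lookup without merging the facts behind them. The function names, the `SITE-ROLE` hostname convention, and folding `đ` to `dj` are assumptions made for illustration, not details of the production system.

```python
import unicodedata

# Serbian Cyrillic to Latin, lowercase letters only.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def name_key(raw: str) -> str:
    """Fold a place name to a script- and diacritic-insensitive lookup key."""
    latin = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in raw.lower())
    # đ has no Unicode decomposition, so it is folded by hand (assumed convention).
    latin = latin.replace("đ", "dj")
    decomposed = unicodedata.normalize("NFKD", latin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def hostname_site_key(hostname: str) -> str:
    """Assumption: hostnames look like SITE-ROLE, for example MAKIS-PE2."""
    return name_key(hostname.split("-", 1)[0])


# The key only aligns spellings. Every hit keeps its source label,
# so a router, a building, and a planned phone are never merged.
mentions = [
    ("topology", hostname_site_key("MAKIS-PE2")),
    ("object_registry", name_key("Макиш")),
    ("comms_directory", name_key("Makiš")),
    ("procurement", name_key("Макиш")),
]
```

The shared key answers "is this the same place?", while the source label still answers "what kind of fact is this?".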
So the system is built as a router. For every question the agent first decides which source to look at, and only then answers. Each source gets its own tool: topology, object registry, station configs, telephone directories, the procurement database, and vendor documentation. Six sources instead of one search box.
There are explicit precedence rules and bans on mixing between the sources, because their data overlaps and in places contradicts itself. Links between devices appear both in the topology file and as hints inside the station configs; the system knows that topology is the authoritative source for links, and that the hints from configs are secondary.
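The precedence rule for links can be sketched roughly as follows. This is an illustration under assumptions: the `LinkFact` shape, the port names, and every device name other than `MAKIS-PE2` are hypothetical, and keeping config hints for ports that topology does not cover is one possible policy, not necessarily the one the system uses.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class LinkFact:
    device: str
    port: str
    peer: str
    source: str  # "topology" or "config_hint"


def resolve_links(
    topology: Iterable[LinkFact], config_hints: Iterable[LinkFact]
) -> List[LinkFact]:
    """Topology wins for any port it describes; hints only fill the gaps."""
    by_port: Dict[Tuple[str, str], LinkFact] = {
        (fact.device, fact.port): fact for fact in topology
    }
    for hint in config_hints:
        # setdefault never overwrites an authoritative topology entry.
        by_port.setdefault((hint.device, hint.port), hint)
    return list(by_port.values())


# Hypothetical data: the hint for port-1 contradicts topology and is dropped.
topology = [LinkFact("MAKIS-PE2", "port-1", "VRCIN-SW1", "topology")]
hints = [
    LinkFact("MAKIS-PE2", "port-1", "OTHER-SW9", "config_hint"),
    LinkFact("MAKIS-PE2", "port-2", "VRCIN-SW2", "config_hint"),
]
resolved = resolve_links(topology, hints)
```

Because each fact carries its `source`, an answer built from `resolved` can still say which links are authoritative and which are only hinted at.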
The object registry looks like a map of the network, but the system is told plainly that it is a catalog of objects, not a link graph. The telephone directory looks like a list of installed phones, but it describes a plan: an entry in it does not mean the device is in place and working.
The distinction goes a step further: some of the “phones” in the directory are not phones at all. Alongside ordinary SIP handsets sit dispatch-console buttons and line terminals; they have a number, but you cannot call it. The system flags them separately so it never hands out a button’s number as a phone number.
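One way to keep non-callable entries apart is to type every directory entry and filter on the type. The sketch below assumes a hypothetical `DirectoryEntry` record and made-up numbers; the real directory schema is not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class EntryKind(Enum):
    SIP_PHONE = "SIP phone"
    CONSOLE_BUTTON = "dispatch-console button"
    LINE_TERMINAL = "line terminal"


# Only these kinds may ever be presented as a number someone can call.
CALLABLE_KINDS = {EntryKind.SIP_PHONE}


@dataclass(frozen=True)
class DirectoryEntry:
    number: str
    label: str
    kind: EntryKind


def dialable_numbers(entries: Iterable[DirectoryEntry]) -> List[str]:
    """Return only the numbers that belong to real phones."""
    return [e.number for e in entries if e.kind in CALLABLE_KINDS]


def describe(entry: DirectoryEntry) -> str:
    """Word the entry so that a button is never mistaken for a phone."""
    if entry.kind in CALLABLE_KINDS:
        return f"{entry.label}: planned phone, number {entry.number}"
    return f"{entry.label}: {entry.kind.value}, not a callable number"


# Hypothetical entries for illustration.
entries = [
    DirectoryEntry("101", "Duty desk", EntryKind.SIP_PHONE),
    DirectoryEntry("9001", "Console key 1", EntryKind.CONSOLE_BUTTON),
    DirectoryEntry("9002", "Line terminal A", EntryKind.LINE_TERMINAL),
]
```

The word "planned" in the description is deliberate: the directory describes a plan, so even a real phone is not reported as installed.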
Four of the six sources are small files: a few hundred rows of topology, a few hundred objects in the registry, configs for half a dozen stations, and the telephone directories. Because they are small, they are loaded into memory whole at startup and handed to the model as is. The fifth, the procurement database, is fundamentally more complex: it is a live relational database, and its tool is the only one that does real query work. It contains more code than all the other tools combined, and here is why.
First, the warehouse. Equipment held in a warehouse as a spare and equipment delivered to a site but not yet installed answer different questions for an engineer. Two independent facts sit behind that: whether the item is in a warehouse or on a site, and whether it is installed. The system maps each combination to the question it answers.
| Engineer’s question | What it means |
|---|---|
| what is at the station | installed and running on site |
| what was delivered but not installed | on site, not yet in service |
| what is kept as a spare | reserve in the warehouse |
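The table above can be read as a function of two booleans. The sketch below is illustrative: the names are hypothetical, and treating "installed but located in a warehouse" as inconsistent data is an assumption, since that fourth combination is not covered by the table.

```python
from enum import Enum


class StockState(Enum):
    IN_SERVICE = "installed and running on site"
    DELIVERED_NOT_INSTALLED = "on site, not yet in service"
    SPARE = "reserve in the warehouse"


def classify(on_site: bool, installed: bool) -> StockState:
    """Map the two independent facts to the state an engineer is asking about."""
    if on_site and installed:
        return StockState.IN_SERVICE
    if on_site:
        return StockState.DELIVERED_NOT_INSTALLED
    if installed:
        # Assumption: an item cannot be both installed and in the warehouse.
        raise ValueError("inconsistent record: installed but not on site")
    return StockState.SPARE
```

A question such as "what was delivered but not installed" then becomes a filter on `StockState.DELIVERED_NOT_INSTALLED` rather than a free-text search.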
Next, warranty. Neither “installed” nor “under warranty” is stored as a ready field: they are computed facts. An item counts as installed if at least one of its deliveries is marked as mounted, and the warranty term is taken from the latest of those deliveries. A single item often has several delivery records, so the system collects both facts across all the deliveries of that item.
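A small sketch of how both computed facts could be collected across an item's deliveries. The `Delivery` fields are hypothetical, and "latest" is interpreted here as the most recent delivery date across all of the item's deliveries, which is an assumption about the rule rather than a confirmed detail.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Delivery:
    delivered_on: date
    mounted: bool
    warranty_until: Optional[date]


@dataclass(frozen=True)
class ItemStatus:
    installed: bool
    warranty_until: Optional[date]
    under_warranty: bool


def item_status(deliveries: Sequence[Delivery], today: date) -> ItemStatus:
    """Derive 'installed' and 'under warranty' from all deliveries of one item."""
    if not deliveries:
        return ItemStatus(False, None, False)
    # Installed if at least one delivery is marked as mounted.
    installed = any(d.mounted for d in deliveries)
    # Assumption: the warranty term comes from the most recent delivery.
    latest = max(deliveries, key=lambda d: d.delivered_on)
    until = latest.warranty_until
    return ItemStatus(installed, until, until is not None and until >= today)


# Hypothetical records for a single item.
deliveries = [
    Delivery(date(2022, 3, 1), mounted=True, warranty_until=date(2024, 3, 1)),
    Delivery(date(2023, 6, 1), mounted=False, warranty_until=date(2026, 6, 1)),
]
status = item_status(deliveries, today=date(2025, 1, 1))
```

Passing `today` in explicitly keeps the function deterministic, which makes the warranty rule easy to test.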
And language. An engineer types “switch” — or its Russian form, “свич” — while the catalog descriptions are in Serbian and Russian. The search runs across both language fields and the vendor name at once, and the model first translates the query into the database’s language. If a search by location name finds nothing, the system does not give up: it goes to the object registry for the canonical spelling of the name and runs the query again with it. One question about equipment at a station turns into a chain across two sources, just to identify the station correctly.
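The retry through the object registry can be sketched as a two-step lookup. Everything named here is hypothetical: the field names `desc_sr` and `desc_ru`, the sample row, and the in-memory registry stand in for the real database and catalog. Translating the query term is left to the model and is not shown.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]

# Hypothetical catalog row and registry; the real schema is not shown here.
CATALOG: List[Row] = [
    {"location": "Макиш", "desc_sr": "свич", "desc_ru": "коммутатор",
     "vendor": "ExampleVendor"},
]
REGISTRY: Dict[str, str] = {"makis": "Макиш", "makiš": "Макиш"}


def search(term: str, location: str) -> List[Row]:
    """Match the term against both language fields and the vendor name."""
    needle = term.lower()
    return [
        row for row in CATALOG
        if row["location"] == location
        and any(needle in row[f].lower() for f in ("desc_sr", "desc_ru", "vendor"))
    ]


def canonical_name(name: str) -> Optional[str]:
    """Look up the registry's canonical spelling for a location name."""
    return REGISTRY.get(name.lower())


def find_equipment(
    term: str,
    location: str,
    search_fn: Callable[[str, str], List[Row]] = search,
    canonical_fn: Callable[[str], Optional[str]] = canonical_name,
) -> List[Row]:
    """Search once; on an empty result, retry with the canonical location name."""
    rows = search_fn(term, location)
    if rows:
        return rows
    canonical = canonical_fn(location)
    if canonical is None or canonical == location:
        return []
    return search_fn(term, canonical)


found = find_equipment("коммутатор", "Makiš")
```

The second source is consulted only when the first attempt finds nothing, so the common case stays a single query.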
Domain assumptions are baked in here too. Cisco was never deployed on this network, so the model is explicitly forbidden from suggesting it and is told to assume, by default, the equipment families that are actually installed.
The sixth source is vendor documentation: telephone-system manuals, switch references, station design documents, datasheets. These are PDFs, and it would be tempting to slice them into fixed-size chunks and drop them into one search. We did it differently.
Each document is split into chapters by its own table of contents. First the system tries to take the structure from the PDF bookmarks; if there are none, it detects the table of contents from the text; and if the document turns out to be a scan with no text layer, it runs it through OCR in three languages and looks for the table of contents in the recognized text. One chapter becomes one record. A large document where no structure could be found is not loaded at all: better to leave it out than to hand the model an unreadable wall of text it will grab a random fragment from. Short documents like a two-page datasheet go in whole — there is nothing to slice.
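The fallback order described above can be sketched as a chain of extractors that is tried until one returns a table of contents. The extractors are passed in as plain callables, so no particular PDF or OCR library is implied. The `Chapter` record and the four-page threshold for a "short" document are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

TocEntry = Tuple[str, int]  # (chapter title, 1-based start page)
Extractor = Callable[[], Optional[List[TocEntry]]]


@dataclass(frozen=True)
class Chapter:
    title: str
    first_page: int
    last_page: int


def chapters_from_toc(toc: Sequence[TocEntry], page_count: int) -> List[Chapter]:
    """Turn start pages into page ranges; each chapter ends where the next begins."""
    ordered = sorted(toc, key=lambda entry: entry[1])
    chapters = []
    for i, (title, start) in enumerate(ordered):
        next_start = ordered[i + 1][1] if i + 1 < len(ordered) else page_count + 1
        chapters.append(Chapter(title, start, max(start, next_start - 1)))
    return chapters


def plan_document(
    page_count: int,
    extractors: Sequence[Extractor],
    short_doc_pages: int = 4,  # assumed threshold, not from the source
) -> List[Chapter]:
    """Try bookmarks, then text, then OCR; load short docs whole; skip the rest."""
    for extract in extractors:
        toc = extract()
        if toc:
            return chapters_from_toc(toc, page_count)
    if page_count <= short_doc_pages:
        return [Chapter("whole document", 1, page_count)]
    return []  # large document with no structure: not loaded at all


# Hypothetical extractors: no bookmarks, but the text layer has a table of contents.
calls: List[str] = []

def from_bookmarks() -> Optional[List[TocEntry]]:
    calls.append("bookmarks")
    return None

def from_text() -> Optional[List[TocEntry]]:
    calls.append("text")
    return [("Intro", 1), ("Install", 5), ("Config", 12)]

def from_ocr() -> Optional[List[TocEntry]]:
    calls.append("ocr")
    return None

plan = plan_document(20, [from_bookmarks, from_text, from_ocr])
```

Ordering the extractors from cheapest to most expensive means OCR runs only when both faster methods fail.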
The text in the database is a derivative of the source PDF, and over time it can drift from the original through recognition errors or shifted chapter boundaries. So there is a separate check: the system takes random chapters from the database, re-extracts the same pages from the source, and compares them. This catches exactly the case where the model confidently cites “page 14” while page 14 says something else.
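A drift check along these lines can be sketched with the standard library alone. The chapter record layout, the `reextract` callable, and the 0.9 similarity threshold are assumptions; the actual comparison method is not specified.

```python
import difflib
import random
from typing import Callable, Dict, List, Optional, Sequence


def normalize(text: str) -> str:
    """Collapse whitespace and case so layout noise does not count as drift."""
    return " ".join(text.split()).lower()


def similarity(stored: str, fresh: str) -> float:
    """Ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    return difflib.SequenceMatcher(None, normalize(stored), normalize(fresh)).ratio()


def audit_chapters(
    chapters: Sequence[Dict],
    reextract: Callable[[str, int, int], str],
    sample_size: int = 5,
    threshold: float = 0.9,  # assumed cutoff
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Re-extract a random sample of chapters and return the ids that drifted."""
    rng = rng or random.Random()
    sample = rng.sample(list(chapters), min(sample_size, len(chapters)))
    drifted = []
    for chapter in sample:
        fresh = reextract(chapter["doc"], chapter["first_page"], chapter["last_page"])
        if similarity(chapter["text"], fresh) < threshold:
            drifted.append(chapter["id"])
    return drifted


# Hypothetical stored chapters and a fake re-extraction from the source PDF.
stored = [
    {"id": "a", "doc": "manual.pdf", "first_page": 1, "last_page": 3,
     "text": "Set the VLAN on the trunk port."},
    {"id": "b", "doc": "manual.pdf", "first_page": 14, "last_page": 15,
     "text": "Set the VLAN on the trunk port."},
]
source_pages = {
    ("manual.pdf", 1, 3): "Set the  VLAN on the trunk port.",
    ("manual.pdf", 14, 15): "Warranty terms and return procedure.",
}
drifted = audit_chapters(stored, lambda d, f, l: source_pages[(d, f, l)], sample_size=2)
```

Chapter `b` is flagged because its stored text no longer matches what pages 14 to 15 actually say, which is the "page 14 says something else" case.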
A reliable answer starts with the system answering from a specific source and showing where each fact came from. Beyond that, several boundaries hold it in place, because in this environment a confidently wrong answer costs more than an honest “I don’t know”.
What matters most is what the assistant is even connected to. It reads reference and record-keeping sources: topology, configs, directories, the procurement database. To the systems that actually control movement, switches, and routes it is not connected at all, and it works with their descriptions and exports rather than with live control.
There are smaller but important safeguards too. While paging through a large result set from the procurement database, the model likes to repeat the same query; the system catches the repeat before it reaches the database and returns a message that turns the model back toward answering from what it already has. When an engineer asks about the live state of equipment, the system does not invent telemetry — it suggests the commands they can run to read that state themselves. And when there is no answer, or the question is off-topic, the assistant does not guess: it hands over the real on-call contacts so the person can call someone who will sort it out.
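The repeat-query guard can be sketched as a thin wrapper around whatever function actually runs the query. The class name, the return shape, and the wording of the note are hypothetical.

```python
from typing import Any, Callable, Dict, Sequence, Set, Tuple

REPEAT_NOTE = (
    "This exact query was already run. "
    "Answer from the rows you already have instead of querying again."
)


class QueryGuard:
    """Stops identical queries before they reach the database."""

    def __init__(self, run_query: Callable[[str, Tuple[Any, ...]], list]):
        self._run_query = run_query
        self._seen: Set[Tuple[str, Tuple[Any, ...]]] = set()

    def execute(self, sql: str, params: Sequence[Any] = ()) -> Dict[str, Any]:
        # Collapse whitespace so trivially reformatted repeats are caught too.
        key = (" ".join(sql.split()), tuple(params))
        if key in self._seen:
            return {"rows": None, "note": REPEAT_NOTE}
        self._seen.add(key)
        return {"rows": self._run_query(sql, tuple(params)), "note": None}


# A fake query runner that records every call that reaches the "database".
reached_db = []

def fake_run(sql: str, params: Tuple[Any, ...]) -> list:
    reached_db.append((sql, params))
    return [("row",)]

guard = QueryGuard(fake_run)
first = guard.execute("SELECT * FROM items WHERE location = ?", ["Макиш"])
second = guard.execute("SELECT *  FROM items WHERE location = ?", ["Макиш"])
third = guard.execute("SELECT * FROM items WHERE location = ?", ["Vrčin"])
```

Returning a note instead of raising an error gives the model something it can act on, steering it back to the data it already has.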
To be honest about the limits: the model assigns each retrieved chunk a confidence score, but for now that exists only as an instruction in the prompt and is not enforced in code, so it is too early to rely on it as a hard threshold. The reference files are read into memory at startup, so if something is switched over at a station, the assistant sees it only after the next restart.
Both of these — moving the confidence check into the code and refreshing the references on the fly — are understood and on the list. The system already works as it is.
An engineer or dispatcher asks a question in plain words, in Serbian, Russian, or English, and gets an assembled answer: from the right source, by the correct spelling of the name, with a citation to where it came from. Where confidence is not enough, the system hands the question to a human and does not pass off the planned as the installed — which is why the answer can be trusted. This walk across several systems used to be held in the head of one experienced person, and the process bottlenecked on them. Now a single question does the same thing.
One orchestrator on pydantic-ai: it parses the question, picks the source for it, and answers in the question's language. Domain behavior comes from the system prompt and routing, not from fine-tuning the model.
Vendor documentation split into chapters by its own table of contents, not into fixed-size chunks. Scans are run through OCR in three languages, and large documents with no structure never enter the database.
Read-only access across every tool, deduplication of repeated database queries, caps on context size and chapter count, a ban on inventing live telemetry, and a mandatory fallback to real on-call contacts when there is no answer.
An external Claude Sonnet 4.5 model via OpenRouter; built-in web search over vendor docs is switched on by the :online suffix. There is no project-specific fine-tuning: behavior comes from the instructions and the tools.