AI Assistant for Railway Engineers and Dispatchers

One question, answers in different systems

A national railway is being pulled into a single dispatch center: movement, communications, video surveillance, fire and intrusion alarms, data networks — everything that used to live station by station is being drawn into one place. The people who run it have questions about the infrastructure itself every day. A dispatcher asks what is connected to the node at Makiš. A network engineer asks which VLANs are configured at Vrčin station. A service technician asks what equipment is already installed at a site and under what warranty.

The questions sound alike, but the answers live in completely different systems. The link topology is in one export, station configs in another, telephone directories in a third, and equipment deliveries and installations in a separate procurement database. The person who answers these questions is really working as a translator between those systems: they know where to look for each kind of question, and they keep in mind that the same name means different things in different places.

This is not specific to railways. Almost any company that has accumulated a zoo of record-keeping systems over the years is in the same situation: the same object is scattered across several databases under different names, and someone holds that whole map in their head.

The assistant described here takes a question in natural language and answers it across all of these sources at once, in the language it was asked in: Serbian, Russian, or English. From the outside it looks like a chat. Inside there is one orchestrator agent and a set of typed tools, one per source.

One station lives in several systems

The core difficulty of this project hides inside a single word: “Makiš”. It is the central node of the network, and it shows up in almost every source. The trouble is that in each one it means a different kind of fact and is written differently.

Where it lives	How it’s written	What kind of fact
Network topology	`MAKIS-PE2`	a node router; its ports are physically wired to ports on other devices
Object registry	`Макиш`	the dispatch center itself — a row in the catalog of objects and sections
Comms directory	`Makiš`	a planned console or phone number
Procurement database	`Макиш`	an equipment line item with a warranty, a waybill, and a flag for whether it is installed

Three spellings in Latin and Cyrillic, four different kinds of fact. For a system that dumps everything into one search index, “Makiš” is a single string, and it will just as readily return a device, a building, a phone, and a warehouse line item all mixed into one answer. In a dispatch room that means a confidently wrong answer: that a phone at the station exists, say, when the directory only has it planned — and whether it was ever installed is something an entirely different system knows.
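A minimal sketch of the alternative: fold the spellings to one lookup key, but keep every fact tagged with the kind of source it came from, so a query can ask for one kind only. The names `FactKind`, `Fact`, `station_key`, and `facts_for` are illustrative, and the rule that a topology hostname starts with the station name followed by a hyphen is an assumption drawn from the single example `MAKIS-PE2`.

```python
import unicodedata
from dataclasses import dataclass
from enum import Enum

# Serbian Cyrillic to Latin, lower-case only; digraph letters included.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


class FactKind(Enum):
    DEVICE = "network topology"
    OBJECT = "object registry"
    PLANNED_ENTRY = "comms directory"
    LINE_ITEM = "procurement database"


@dataclass(frozen=True)
class Fact:
    kind: FactKind
    spelling: str


def station_key(name: str) -> str:
    """Fold a Latin or Cyrillic spelling to one ASCII lookup key."""
    latin = "".join(CYR_TO_LAT.get(ch, ch) for ch in name.lower())
    latin = latin.replace("đ", "dj")  # đ has no Unicode decomposition
    decomposed = unicodedata.normalize("NFKD", latin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fact_key(fact: Fact) -> str:
    key = station_key(fact.spelling)
    # Assumption: topology hostnames look like "<STATION>-<ROLE>".
    return key.split("-")[0] if fact.kind is FactKind.DEVICE else key


# The four rows of the table above, one fact per source.
FACTS = [
    Fact(FactKind.DEVICE, "MAKIS-PE2"),
    Fact(FactKind.OBJECT, "Макиш"),
    Fact(FactKind.PLANNED_ENTRY, "Makiš"),
    Fact(FactKind.LINE_ITEM, "Макиш"),
]


def facts_for(name: str, kind: FactKind) -> list[Fact]:
    """Return facts about one station, restricted to a single kind."""
    key = station_key(name)
    return [f for f in FACTS if f.kind is kind and fact_key(f) == key]
```

Calling `facts_for("Makiš", FactKind.PLANNED_ENTRY)` returns only the directory entry; the router, the building, and the warehouse line item stay out of the answer unless their kind is requested.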

A router, not one search index

So the system is built as a router. For every question the agent first decides which source to look at, and only then answers. Each source gets its own tool: topology, object registry, station configs, telephone directories, the procurement database, and vendor documentation. Six sources instead of one search box.

There are explicit precedence rules and bans on mixing between the sources, because their data overlaps and in places contradicts itself. Links between devices appear both in the topology file and as hints inside the station configs; the system knows that topology is the authoritative source for links, and that the hints from configs are secondary.
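The precedence rule for links can be sketched as follows. This is an illustration under assumptions, not the project's code: the `Link` type, the source labels, and the port names in the usage note are hypothetical. When the same pair of endpoints appears in both sources, the topology record wins; a link known only from a config stays in the result but keeps its secondary label.

```python
from dataclasses import dataclass

# Lower number means higher authority.
PRECEDENCE = {"topology": 0, "config_hint": 1}


@dataclass(frozen=True)
class Link:
    a: str       # one endpoint, e.g. "HOST:port"
    b: str       # the other endpoint
    source: str  # "topology" or "config_hint"


def resolve_links(candidates: list[Link]) -> list[Link]:
    """Keep one record per endpoint pair, preferring the topology source."""
    best: dict[frozenset, Link] = {}
    for link in candidates:
        key = frozenset((link.a, link.b))  # direction does not matter
        current = best.get(key)
        if current is None or PRECEDENCE[link.source] < PRECEDENCE[current.source]:
            best[key] = link
    return list(best.values())
```

For example, given a config hint for `MAKIS-PE2:p1` to `X:p1` and a topology row for the same pair written in the opposite direction, `resolve_links` returns only the topology row.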

The object registry looks like a map of the network, but the system is told plainly that it is a catalog of objects, not a link graph. The telephone directory looks like a list of installed phones, but it describes a plan: an entry in it does not mean the device is in place and working.

The distinctions go further: some of the “phones” in the directory are not phones at all. Alongside ordinary SIP handsets sit dispatch-console buttons and line terminals. They have a number, but you cannot call it. The system flags them separately so it never hands out a button’s number as a phone number.
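One way to make that flag hard to ignore is to carry the entry kind in the type itself. The sketch below is hypothetical: `EntryKind`, `DirectoryEntry`, and `describe` are illustrative names, and the wording of the output is an assumption. Only the three kinds of entry come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class EntryKind(Enum):
    SIP_HANDSET = "SIP handset"
    CONSOLE_BUTTON = "console button"
    LINE_TERMINAL = "line terminal"


@dataclass(frozen=True)
class DirectoryEntry:
    number: str
    label: str
    kind: EntryKind

    @property
    def dialable(self) -> bool:
        # Only a SIP handset has a number that can actually be called.
        return self.kind is EntryKind.SIP_HANDSET


def describe(entry: DirectoryEntry) -> str:
    """Render an entry without ever presenting a button as a phone."""
    if entry.dialable:
        # The directory is a plan, so the number is reported as planned.
        return f"{entry.label}: planned phone number {entry.number}"
    return f"{entry.label}: {entry.kind.value} {entry.number} (not a dialable phone)"
```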

The heaviest tool: the procurement database

Four of the six sources are small static files: a few hundred rows of topology, a few hundred objects in the registry, configs for half a dozen stations, the telephone directories. They are small, so at startup they are loaded into memory whole and handed to the model as is. The procurement database is fundamentally more complex: it is a live relational database, and its tool is the only one with real query work in it. By volume of code it is larger than all the other tools combined, and here is why.

First, the warehouse. Equipment sitting in a warehouse as a spare and equipment that has been delivered to a site but not yet installed answer different questions for an engineer. Behind that are two independent facts: whether the item is in a warehouse or on a site, and whether it is installed or not. The system maps each combination of the two to the question it actually answers.

Engineer’s question	What it means
what is at the station	installed and running on site
what was delivered but not installed	on site, not yet in service
what is kept as a spare	reserve in the warehouse
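The table can be read as a function of two booleans. A minimal sketch, with hypothetical names (`StockState`, `classify`); treating "in the warehouse and installed" as an error is an assumption, since the text does not say how that combination is handled.

```python
from enum import Enum


class StockState(Enum):
    INSTALLED = "installed and running on site"
    DELIVERED_NOT_INSTALLED = "on site, not yet in service"
    SPARE = "reserve in the warehouse"


def classify(in_warehouse: bool, installed: bool) -> StockState:
    """Map the two independent facts to the engineer's meaning."""
    if in_warehouse and installed:
        # Assumption: a warehouse spare cannot also be installed.
        raise ValueError("item is flagged both as a spare and as installed")
    if in_warehouse:
        return StockState.SPARE
    if installed:
        return StockState.INSTALLED
    return StockState.DELIVERED_NOT_INSTALLED
```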

Next, warranty. Neither “installed” nor “under warranty” is stored as a ready field: they are computed facts. An item counts as installed if at least one of its deliveries is marked as mounted, and the warranty term is taken from the latest of those deliveries. A single item often has several delivery records, so the system collects both facts across all the deliveries of that item.
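Both computed facts can be sketched over a list of delivery records. The `Delivery` fields are hypothetical, and "latest" is read here as the most recent delivery date among all deliveries of the item; the real schema and the exact tie-breaking may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Delivery:
    delivered_on: date
    mounted: bool
    warranty_until: Optional[date]


def is_installed(deliveries: list[Delivery]) -> bool:
    """Installed if at least one delivery of the item is marked as mounted."""
    return any(d.mounted for d in deliveries)


def warranty_until(deliveries: list[Delivery]) -> Optional[date]:
    """Warranty term taken from the most recent delivery, if there is one."""
    if not deliveries:
        return None
    latest = max(deliveries, key=lambda d: d.delivered_on)
    return latest.warranty_until
```

In a relational database the same two facts would more likely be computed as aggregates grouped by item, which matches the description of them as collected across all deliveries.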

And language. An engineer types “switch” — or its Russian form, “свич” — while the catalog descriptions are in Serbian and Russian. The search runs across both language fields and the vendor name at once, and the model first translates the query into the database’s language. If a search by location name finds nothing, the system does not give up: it goes to the object registry for the canonical spelling of the name and runs the query again with it. One question about equipment at a station turns into a chain across two sources, just to identify the station correctly.
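The retry through the registry can be sketched as a small wrapper. The two callables are stand-ins: `search` represents the procurement query and `canonical_name` the registry lookup. Both names and signatures are assumptions made for illustration.

```python
from typing import Callable, Optional


def search_with_fallback(
    location: str,
    search: Callable[[str], list],
    canonical_name: Callable[[str], Optional[str]],
) -> list:
    """Search by location; on a miss, retry once with the registry spelling."""
    rows = search(location)
    if rows:
        return rows
    canonical = canonical_name(location)
    if canonical is None or canonical == location:
        return []  # nothing better to try, report an honest miss
    return search(canonical)
```

The retry happens at most once, so a name the registry does not know ends as an empty result and not as a loop.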

Domain assumptions are baked in here too. Cisco was never deployed on this network, so the model is explicitly forbidden from suggesting it and is told to assume, by default, the equipment families that are actually installed.

Vendor docs: chapters, not chunks

The sixth source is vendor documentation: telephone-system manuals, switch references, station design documents, datasheets. These are PDFs, and it would be tempting to slice them into fixed-size chunks and drop them into one search index. We did it differently.

Each document is split into chapters by its own table of contents. First the system tries to take the structure from the PDF bookmarks; if there are none, it detects the table of contents from the text; and if the document turns out to be a scan with no text layer, it runs it through OCR in three languages and looks for the table of contents in the recognized text. One chapter becomes one record. A large document where no structure could be found is not loaded at all: better to leave it out than to hand the model an unreadable wall of text it will grab a random fragment from. Short documents like a two-page datasheet go in whole — there is nothing to slice.
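The order of attempts can be sketched as a chain of strategies tried in turn. No real PDF or OCR library is called here: each strategy is a plain callable supplied by the caller, and the names `Chapter`, `split_document`, and the four-page cutoff for "short" documents are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Chapter:
    title: str
    first_page: int
    last_page: int


# A strategy returns the chapters it found, or an empty list.
TocStrategy = Callable[[], list]


def split_document(
    page_count: int,
    strategies: Sequence[TocStrategy],
    whole_doc_max_pages: int = 4,  # hypothetical cutoff for "short"
) -> Optional[list]:
    """Return chapter records, or None when the document should be skipped."""
    if page_count <= whole_doc_max_pages:
        # Short documents go in whole; there is nothing to slice.
        return [Chapter("whole document", 1, page_count)]
    for find_chapters in strategies:  # bookmarks, then text, then OCR
        chapters = find_chapters()
        if chapters:
            return chapters
    return None  # large and unstructured: better left out
```

Passing the strategies in order (bookmarks, text, OCR) means the expensive OCR step only runs when the two cheaper ones have found nothing.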

The text in the database is a derivative of the source PDF, and over time it can drift from the original through recognition errors or shifted chapter boundaries. So there is a separate check: the system takes random chapters from the database, re-extracts the same pages from the source, and compares them. This catches exactly the case where the model confidently cites “page 14” while page 14 says something else.
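A minimal sketch of such a check, using the standard library's `difflib` for the comparison. The 0.9 similarity threshold, the whitespace normalization, and the `reextract` callable (which stands in for reading the same pages from the source PDF) are all assumptions.

```python
import difflib
import random
from typing import Callable, Optional


def _normalize(text: str) -> str:
    # Collapse whitespace so layout differences do not count as drift.
    return " ".join(text.split())


def drift_report(
    stored: dict,
    reextract: Callable[[str], str],
    sample_size: int,
    min_ratio: float = 0.9,  # hypothetical threshold
    seed: Optional[int] = None,
) -> list:
    """Sample chapters at random and list those that no longer match."""
    rng = random.Random(seed)
    ids = rng.sample(sorted(stored), k=min(sample_size, len(stored)))
    drifted = []
    for chapter_id in ids:
        matcher = difflib.SequenceMatcher(
            None,
            _normalize(stored[chapter_id]),
            _normalize(reextract(chapter_id)),
            autojunk=False,  # keep long texts from being heuristically pruned
        )
        ratio = matcher.ratio()
        if ratio < min_ratio:
            drifted.append((chapter_id, ratio))
    return drifted
```

Each returned pair holds a chapter identifier and its similarity ratio, which gives a reviewer a place to start when a citation looks suspicious.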

What makes the answers reliable

A reliable answer starts with the system answering from a specific source and showing where each fact came from. Beyond that, several boundaries hold it in place, because in this environment a confidently wrong answer costs more than an honest “I don’t know”.

What matters most is what the assistant is even connected to. It reads reference and record-keeping sources: topology, configs, directories, the procurement database. To the systems that actually control movement, switches, and routes it is not connected at all, and it works with their descriptions and exports rather than with live control.

There are smaller but important safeguards too. While paging through a large result set from the procurement database, the model likes to repeat the same query; the system catches the repeat before it reaches the database and returns a message that turns the model back toward answering from what it already has. When an engineer asks about the live state of equipment, the system does not invent telemetry — it suggests the commands they can run to read that state themselves. And when there is no answer, or the question is off-topic, the assistant does not guess: it hands over the real on-call contacts so the person can call someone who will sort it out.
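The repeat guard can be sketched as a thin wrapper around whatever executes the query. `QueryGuard`, its return shape, and the wording of the message are hypothetical; the only behaviour taken from the text is that a repeated query never reaches the database and the model is told to answer from what it already has.

```python
from typing import Callable


class QueryGuard:
    """Block a query that was already run during this conversation."""

    def __init__(self, run_query: Callable[[str, tuple], list]):
        self._run = run_query
        self._seen = set()

    def execute(self, sql: str, params: tuple = ()) -> dict:
        # Whitespace is collapsed so reformatted copies count as repeats.
        key = (" ".join(sql.split()), tuple(params))
        if key in self._seen:
            return {
                "repeat": True,
                "message": "This query was already run. "
                           "Answer from the rows you already have.",
            }
        self._seen.add(key)
        return {"repeat": False, "rows": self._run(sql, params)}
```

Keeping one guard per conversation, and not one per process, would let a different user run the same query without being blocked; that scoping is a design assumption here.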

Now, honestly, the limits. The model assigns each retrieved fragment a confidence score, but for now that lives as an instruction in the prompt and is not enforced in the code, so it is too early to rely on it as a hard threshold. The reference files are read into memory at startup: if something is switched over at a station, the assistant will only see it after the next restart.

Both of these — moving the confidence check into the code and refreshing the references on the fly — are understood and on the list. The system already works as it is.

How the system is built — the engineering map

QUESTION

natural language (Serbian · Russian · English) · arrives in the chat, the answer comes back in the language of the question

the orchestrator picks a source

ROUTING · precedence rules and a ban on mixing

decide which source to look at first, then answer

topology — the authoritative source for links

link hints from the station configs are treated as secondary

object registry — a catalog, not a link graph

comms directory — a plan, not proof of installation

installed · under warranty · by waybill — only from the procurement database

six sources · one tool each

STATIC FILES · loaded into memory at startup

small, rarely change, handed to the model whole

Network topology

port-to-port links between devices, management IPs, chassis IDs

Object registry

stations, posts, sections, and regions; object types

Station configs

PE-router configs: VLANs, routing, MPLS

Comms directories · plan

dispatch consoles and workplace phones; SIP handsets, console buttons, and line terminals flagged separately

LIVE SOURCES · queried on the fly

large, changeable, need real queries

Procurement database · PostgreSQL

an equipment line item tied to its install location

“spare” and “not installed” — two independent flags; “installed” and warranty term — aggregates across all deliveries

search across the Serbian and Russian descriptions plus vendor; on a miss, normalize the name through the registry and retry

Vendor documentation · chapters by table of contents

PDFs split by table of contents (bookmarks → text → OCR in three languages); one chapter, one record

a large document with no structure is not loaded; at most five chapters per request

boundaries that keep the answer reliable →

every tool is read-only · dedup of repeated database queries · no inventing live telemetry · no answer → real on-call contacts

ANSWER

the reasoning folded into a collapsible block · the result in the language of the question · with a citation to the source

six sources · one orchestrator on pydantic-ai · an external Claude model via OpenRouter
five static files in memory · one relational procurement database · vendor docs by chapter
read-only access across every tool

What it changes

An engineer or dispatcher asks a question in plain words, in Serbian, Russian, or English, and gets an assembled answer: from the right source, by the correct spelling of the name, with a citation to where it came from. Where confidence is not enough, the system hands the question to a human and does not pass off the planned as the installed — which is why the answer can be trusted. This walk across several systems used to be held in the head of one experienced person, and the process bottlenecked on them. Now a single question does the same thing.