Three infrastructure layers and six engineering tracks. Every project takes the exact subset it needs and nothing more.
A telecom operator, an investment firm, and a transport system all have different workflows and constraints. The engineering problems still overlap: all of them need inference, guardrails, and document work. The tuning is what changes. In finance, filters block investment recommendations. In transport, they block hallucinated incidents. In telecom, they block tone violations and answers the agent should never give.
Below is the full map. Rows are three layers with different degrees of customization. Columns are six engineering tracks. Each cell falls into one of three types. Open source means mature components that do not need rewriting: vLLM for inference, Langfuse for observability, Qdrant for vector search. The Manaraga platform layer contains modules we move between projects and harden on every deployment: agent orchestration, chat, inference scaling, corporate tone. Custom development is the code that remains process-specific: RAG pipelines, domain agents, CRM and ERP integrations.
Every project is assembled from three layers. The bottom layer does not depend on the industry. The middle layer is reused across projects. The top layer is written for the exact business workflow. Security is not a separate layer but a cross-cutting requirement: data masking, attack filtering, decision audit, and access control live inside every component.
Hosting, request routing, vector databases. Mature open source already works here; our job is to tune it for enterprise load.
Observability, guardrails, evaluation, orchestration, and memory. This is where most of our own reusable engineering work lives.
Client document search, domain agents, CRM and ERP connectors, synthetic datasets, fine-tuning. Code written for the business process and handed over to the client.
Every project needs several compute profiles at once: classification, generation, vectorization, each with different latency and cost constraints. One model and one shared pool of capacity do not work in an enterprise environment. Tasks compete for resources, and a single provider outage can freeze the whole system — exactly what happened on the transport project before we split workloads across separate instances with automatic fallback.
We split inference into four GPU instance types: reasoning, fast generation, vectorization, and vision. The router distributes requests, flips to a backup model on failure, and enforces quotas and priorities by project.
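The routing logic above can be sketched as a small dispatcher. This is a minimal illustration, not the production router: the instance names, the quota model (max in-flight requests per project), and the failover rule are all invented for the example.

```python
# Hypothetical sketch of workload routing across GPU instance types.
# Instance names, quotas, and failover policy are illustrative only.
PROFILES = {
    "reasoning": ["reasoning-a", "reasoning-b"],
    "fast_generation": ["fast-a", "fast-b"],
    "vectorization": ["embed-a"],
    "vision": ["vision-a"],
}

class InferenceRouter:
    def __init__(self, profiles, quotas):
        self.profiles = profiles
        self.quotas = dict(quotas)            # max in-flight requests per project
        self.in_flight = {p: 0 for p in quotas}
        self.down = set()                     # instances marked unhealthy

    def route(self, project, profile):
        """Pick a healthy instance for this profile, enforcing project quotas."""
        if self.in_flight[project] >= self.quotas[project]:
            raise RuntimeError(f"quota exceeded for {project}")
        for instance in self.profiles[profile]:
            if instance not in self.down:     # automatic fallback on failure
                self.in_flight[project] += 1
                return instance
        raise RuntimeError(f"no healthy instance for {profile}")

    def mark_down(self, instance):
        self.down.add(instance)

    def release(self, project):
        self.in_flight[project] -= 1
```

The key design point from the transport incident is isolation: each workload type has its own instance pool, so a dead provider takes down one profile's primary, not the whole system.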
Latency and error rate do not explain why the agent responded the way it did or how much one outcome actually cost. On the telecom project it was business metrics, not infrastructure metrics, that showed where the agent already beat the operator and where it had to stay out of the loop.
We collect two layers of metrics. The engineering layer traces every call, every tool chain, and token-level cost. The business layer tracks support funnels, automation share, and cost per outcome.
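A toy version of the two-layer idea: each call trace carries both engineering fields (tokens, latency) and a business field (which outcome it served), so cost per outcome falls out of the same data. The price table and outcome labels here are invented placeholders.

```python
from dataclasses import dataclass

@dataclass
class CallTrace:
    # Engineering layer: per-call latency and token-level cost.
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    # Business layer: which outcome this call contributed to.
    outcome: str = "unresolved"               # e.g. "auto_resolved", "escalated"

# Illustrative prices in $ per 1M tokens (input, output); not real rates.
PRICES = {"fast": (0.2, 0.6)}

def call_cost(trace: CallTrace) -> float:
    p_in, p_out = PRICES[trace.model]
    return (trace.prompt_tokens * p_in + trace.completion_tokens * p_out) / 1e6

def cost_per_outcome(traces, outcome) -> float:
    """Business metric: total spend divided by the number of target outcomes."""
    spend = sum(call_cost(t) for t in traces)
    n = sum(1 for t in traces if t.outcome == outcome)
    return spend / n if n else float("inf")
```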
Prompt injection and data leakage are baseline threats, and standard libraries can catch them. But every industry also has its own prohibitions that no generic library understands. In the investment project, the agent began hinting at answers to qualification tests — something the regulator bans outright.
We place filtering on both sides of every model call: data masking, attack detection, and business-specific rules. In the investment project this grew into a multi-layer compliance system with prompt constraints, refusal scenarios, checker loops, and audit logs for every answer.
In the investment project, compliance became a multi-layer stack: prompt constraints, refusal paths, checker loops, and audit of every answer. Case →
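The two-sided filtering can be sketched as a wrapper around the model call: mask sensitive data, block suspected attacks on the way in, and apply business rules on the way out. The patterns below are invented placeholders; real rules are industry-specific and far richer.

```python
import re

# Illustrative filters only; real rules are tuned per industry.
INPUT_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
OUTPUT_PATTERNS = [re.compile(r"\bI recommend buying\b", re.I)]  # e.g. a finance ban
MASK = re.compile(r"\b\d{16}\b")              # naive card-number masking

def guarded_call(model, prompt: str) -> str:
    """Filter on both sides of the model call: mask data and block attacks
    before the model, apply business-specific rules to the answer after."""
    for pat in INPUT_PATTERNS:
        if pat.search(prompt):
            return "[blocked: suspected prompt injection]"
    answer = model(MASK.sub("[card]", prompt))
    for pat in OUTPUT_PATTERNS:
        if pat.search(answer):
            return "[refused: response violates business rules]"
    return answer
```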
Quality cannot be checked once and forgotten. Models change, data changes, prompts drift, and answers get worse. On the transport project, a binary “confident / not confident” threshold created too many false escalations: the system sent tickets to operators that it was actually capable of answering safely.
We built a three-pass confidence formula with 30+ parameters calibrated on real requests. It decides when the agent can answer and when a human is still required. Alongside it run LLM judges, production benchmark cases, and synthetic datasets so quality regressions are caught before production, not after a complaint.
In the transport project, the key artifact was a three-pass confidence formula with more than thirty production-calibrated parameters. Case →
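The shape of a multi-pass confidence gate can be shown in miniature. The real formula has 30+ production-calibrated parameters; the three passes, feature names, weights, and thresholds below are invented to illustrate the structure, not the calibration.

```python
# Toy three-pass confidence check. Features, weights, and thresholds are
# invented; the production formula has 30+ calibrated parameters.
def confidence(features: dict, weights: dict) -> float:
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

def decide(features: dict) -> str:
    passes = [
        ({"retrieval_score": 0.5, "source_agreement": 0.5}, 0.6),  # pass 1: evidence
        ({"answer_consistency": 1.0}, 0.7),                        # pass 2: self-consistency
        ({"judge_score": 1.0}, 0.8),                               # pass 3: LLM judge
    ]
    for weights, threshold in passes:
        if confidence(features, weights) < threshold:
            return "escalate_to_human"
    return "agent_answers"
```

A graded, multi-pass score like this is what replaces the binary "confident / not confident" threshold: a request can clear the evidence pass yet still be escalated by the judge pass.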
Every company has its own regulations, knowledge base, and normative documentation. Standard RAG can retrieve a vaguely similar paragraph, but enterprise tasks are stricter. On the telecom project, tariff questions required exact numbers from tables while vector search kept returning approximate narrative matches.
Pure vector search tends to lose numbers and tables because they vectorize badly, so we built a dual index: one branch for semantic retrieval and another for exact data such as tariff tables, prices, and technical parameters.
In the telecom project, a dual index separated semantic search from exact-table retrieval with prices and parameters. Case →
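A compressed sketch of the dual-index idea, with the semantic branch stubbed by lexical overlap instead of a real embedding model. The routing rule (check the exact branch first) and all names are illustrative assumptions.

```python
# Sketch of a dual index: an exact branch for tabular facts and a
# semantic branch for narrative text. Embeddings are stubbed by word
# overlap; routing rule and names are illustrative only.
class DualIndex:
    def __init__(self):
        self.exact = {}        # normalized key -> fact row (tariffs, prices)
        self.passages = []     # narrative chunks for semantic retrieval

    def add_fact(self, key: str, row: dict):
        self.exact[key.lower()] = row

    def add_passage(self, text: str):
        self.passages.append(text)

    def search(self, query: str):
        """Route tariff/number queries to the exact branch first, so table
        values come back verbatim instead of a vaguely similar paragraph."""
        q = query.lower()
        for key, row in self.exact.items():
            if key in q:
                return row
        # Fallback: naive lexical overlap stands in for vector similarity.
        def overlap(t):
            return len(set(t.lower().split()) & set(q.split()))
        return max(self.passages, key=overlap, default=None)
```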
An agent in a demo answers questions. An agent in production must remember context across sessions, call tools, follow a scenario, and escalate to a human at the right boundary. In the finance project, the agent ran a strict sales funnel, remembered previous client conversations without mixing products, and stayed inside compliance limits — effectively a finite-state machine with multiple control loops.
We built orchestration, chat, and memory infrastructure so it does not need to be rebuilt on every project. A dedicated component, Content Digital Twin, is responsible for corporate tone: it took more than sixty iterations before the agent sounded like an actual employee rather than a chatbot.
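The "finite-state machine with control loops" framing can be shown with a minimal scenario agent. The states, triggers, and escalation rule below are hypothetical; the production funnel is far richer and its memory is persisted across sessions.

```python
# Minimal finite-state machine for a scripted agent funnel.
# States and triggers are hypothetical, not the production scenario.
TRANSITIONS = {
    ("greeting", "ask_product"):    "qualify",
    ("qualify", "fits_profile"):    "offer",
    ("qualify", "compliance_risk"): "escalate",   # hard compliance boundary
    ("offer",   "objection"):       "qualify",
    ("offer",   "accept"):          "close",
}

class ScenarioAgent:
    def __init__(self):
        self.state = "greeting"
        self.memory = []    # in production, cross-session context is persisted

    def step(self, event: str) -> str:
        self.memory.append((self.state, event))
        nxt = TRANSITIONS.get((self.state, event))
        # Any move the scenario does not define hands over to a human:
        # escalation at the boundary instead of improvisation.
        self.state = nxt if nxt is not None else "escalate"
        return self.state
```

The design choice worth noting: undefined transitions default to escalation, so the agent can only do what the scenario explicitly allows.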
Every project pulls its own subset from the map. Observability and evaluation are required everywhere. Guardrails are tuned to the industry: multi-layer compliance in finance, hallucination filters in transport, corporate tone and escalation boundaries in telecom. Document and agent modules are always assembled around the exact process.
Everything is deployed inside the client perimeter. Every component ships as a standard container.
We will tell you whether the task fits AI agents and, if it does, outline a concrete plan.
Inquiry sent
We will reply within one business day to the email you provided.
or write directly to ilya@manaraga.ai