Public Sector Legal AI

AI assistant for a government lawyer in procurement disputes

The legal team of a government customer litigates against contractors under Russia's public procurement law (44-FZ): recovering penalties and terminating contracts. We built a system that prepares the lawyer's position on a dispute — the applicable rules, case law, risks, and the other side's counterarguments — a draft that the lawyer reviews and takes to court.

The challenge

A large government customer has hundreds of contracts with contractors, and some of them end up in court. Every dispute lands on the legal team.

A typical 44-FZ dispute

The contractor missed the deadlines, failed to deliver, broke the terms. The customer has to recover a penalty, terminate the contract, or fend off a counterclaim.

The position is assembled by hand from four sources: 44-FZ, the Civil Code, case law, and the customer's past cases.

The cost of a mistake is double: money not recovered, and questions from oversight bodies.

We built a system that prepares a draft of that position for the lawyer. The genuinely hard part turned out to be nothing like what you would expect from a task like this.

Take a penalty for late delivery: can it be recovered, and what is the ceiling? Any legal database will return the relevant articles in a second — liability applies, here is the formula, go to court. That is how a first-week intern would answer.

A lawyer who has actually run these disputes asks something else first: did the customer itself breach anything? Did it hand over the site on time, sign off what was required, accept the milestones? If the customer failed to cooperate, the court can cut the penalty down or refuse to award it at all. These claims fall apart more often on counterclaims, on the customer’s own delay, or on a botched termination procedure than on a weak position on the merits.

The law here is public, it sits in open databases, and anyone can find an article. The lawyer’s value is elsewhere: remembering what gets used against you under that same law, and not repeating the mistake that tripped you up last time. The system assembles the position on a dispute — applicable rules, case law, risks, the other side’s counterarguments — and hands it over as a draft memorandum: a finished opinion with a stated position that the lawyer reviews and takes to court.

The most dangerous rule is the one missing from the case

The obvious first move for a product like this is document search: take the question, find the most similar passages from laws and rulings, hand them to the model, and let it write the answer. For a quick reference question that works. For a court dispute it breaks on the very first case.

Back to the customer’s duty to cooperate (Art. 718 of the Civil Code). In a court ruling about a contractor’s delay that article may not be cited at all: cooperation is the contractor’s defense, something it raises in response, and it does not feature in the subject of the claim. Semantic search will not pull it up, because it is not in the text of the case. And the lawyer who forgot about it will hear it in the courtroom from the other side.

That is where the central decision of the whole system came from: when it comes to completeness, do not trust search. A rule being absent from the retrieved documents does not mean it is not needed. More often the opposite is true: the rules most dangerous to the customer are exactly the ones it would never write into its own claim.

An engine that supplies the other side’s arguments

So half of the engineering here lives outside search, in what the system adds to the retrieved material on its own. From the wording of the question it identifies the type of dispute and pulls in the rules the lawyer is obliged to check, even when they are nowhere in the case.

Ask about a penalty for delay, and the system lays out what the contractor will defend with before the lawyer even thinks of it. The links are written by hand, like the memory of someone who has sat through hundreds of hearings and knows which rules travel to court together. It is manual work, and for every new type of dispute the map is written out again.

There is caution in the other direction too: the system just as deliberately drops a rule that does not belong. The Civil Code and the public procurement law run on different regimes, and mixing their sanctions in one argument is a mistake the opponent spots immediately. So on a purely civil question the system does not drag in procurement articles, and when the question is specifically about reducing a penalty it removes the cooperation rules: that is a different axis of the dispute, and the extra material only adds noise.

Search across three corpora and rule injection — the map

THE LAWYER'S QUESTION

“the contractor was 45 days late — we want to recover the penalty, what is the maximum?”

category and facts are derived from the question

CATEGORY AND TAGS

the model assigns the question to one of seven dispute categories; if the model is unavailable, a keyword fallback takes over

fact tags come from the wording of the question: acceptance of work, termination, weather, a direct reference to 44-FZ
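A minimal sketch of this step, under stated assumptions. The category names, keyword lists, and the `model` callable are hypothetical: the page says there are seven categories and a keyword fallback but does not list either, so only two categories are shown. The tag names mirror the four facts named above.

```python
import re
from typing import Callable, Optional

# Hypothetical categories and keywords (the real system has seven categories).
CATEGORY_KEYWORDS = {
    "penalty_for_delay": {"penalty", "late", "delay", "overdue"},
    "termination": {"terminate", "termination"},
}
DEFAULT_CATEGORY = "other"  # assumption: returned when no keyword matches

# Fact tags are read from the wording of the question.
FACT_TAG_PATTERNS = {
    "acceptance": re.compile(r"\baccept(?:ance|ed)?\b", re.I),
    "termination": re.compile(r"\bterminat(?:e|ed|ion)\b", re.I),
    "weather": re.compile(r"\bweather\b", re.I),
    "procurement_44fz": re.compile(r"\b44-FZ\b", re.I),
}


def keyword_category(question: str) -> str:
    """Fallback: pick the category whose keywords overlap the question most."""
    tokens = set(re.findall(r"[\w-]+", question.lower()))
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_CATEGORY


def classify(question: str, model: Optional[Callable[[str], str]] = None) -> str:
    """Ask the model first; on any failure or unknown label use keywords."""
    if model is not None:
        try:
            category = model(question)
            if category in CATEGORY_KEYWORDS or category == DEFAULT_CATEGORY:
                return category
        except Exception:
            pass  # model unavailable: fall through to the keyword fallback
    return keyword_category(question)


def fact_tags(question: str) -> set[str]:
    """Return every fact tag whose pattern occurs in the question."""
    return {tag for tag, pat in FACT_TAG_PATTERNS.items() if pat.search(question)}
```

Validating the model's label against the known category names means a malformed model answer degrades to the keyword path instead of reaching the rule map with an unknown key.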

in parallel

SEARCH ACROSS THREE CORPORA

find what is in the texts

Statutory base · lawyers_knowledge

44-FZ, the Civil Code, government decrees, letters from the Ministry of Finance / FAS / Ministry of Economic Development, Supreme Court clarifications

General case law · lawyers_practice

court rulings, reviews and clarifications from the higher courts

The customer's cases · lawyers_practice_cases

its own rulings on similar disputes, labeled by category

RULE INJECTION

add what is not in the texts but the lawyer must check

Penalty for delay

the customer's duty to cooperate (Art. 718, Civil Code) → the creditor's own delay (Art. 406) → reduction of the penalty (Art. 333) + reduction criteria from Supreme Court practice

By the facts of the case

acceptance → quality of work (Art. 720, 753, 715); termination → procedure (Art. 450 Civil Code, Art. 103 of 44-FZ); weather → force majeure (Art. 401, 416)

A direct question about the procurement

44-FZ articles are mixed in (34, 94, 95, 103); otherwise the procurement-law regime is kept out of the answer

An injected rule is flagged priority + reason

protected from being cut by quota and carrying a note on why it entered the answer
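The injection step can be sketched roughly as follows. The article links are the ones listed in this map; the dictionary layout, the `Rule` type, the tag names, and the way the quota is applied are assumptions made for illustration, not the production code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    article: str
    priority: bool = False  # True: protected from the quota cut
    reason: str = ""        # note on why the rule entered the answer


# Hand-written links, copied from the map above (layout is an assumption).
CATEGORY_LINKS = {
    "penalty_for_delay": [
        ("Art. 718 Civil Code", "the customer's duty to cooperate"),
        ("Art. 406 Civil Code", "the creditor's own delay"),
        ("Art. 333 Civil Code", "reduction of the penalty"),
    ],
}
FACT_LINKS = {
    "acceptance": [(f"Art. {n} Civil Code", "quality of work") for n in (720, 753, 715)],
    "termination": [
        ("Art. 450 Civil Code", "termination procedure"),
        ("Art. 103 of 44-FZ", "termination procedure"),
    ],
    "weather": [(f"Art. {n} Civil Code", "force majeure") for n in (401, 416)],
    # 44-FZ articles are added only on a direct question about the procurement.
    "procurement_44fz": [
        (f"Art. {n} of 44-FZ", "direct question about the procurement")
        for n in (34, 94, 95, 103)
    ],
}


def inject_rules(category: str, tags: set[str], retrieved: list[Rule], quota: int) -> list[Rule]:
    """Put injected rules first, then fill the remaining quota from search."""
    links = list(CATEGORY_LINKS.get(category, []))
    for tag in sorted(tags):  # sorted only to keep the order deterministic
        links += FACT_LINKS.get(tag, [])

    injected: list[Rule] = []
    seen: set[str] = set()
    for article, reason in links:
        if article not in seen:
            seen.add(article)
            injected.append(Rule(article, priority=True, reason=reason))

    # Retrieved rules compete for what is left; injected ones are never cut.
    rest = [r for r in retrieved if r.article not in seen]
    room = max(quota - len(injected), 0)
    return injected + rest[:room]
```

In this sketch the quota only ever trims retrieved material, so a tight quota shortens the search results rather than dropping a rule the lawyer is obliged to check.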

memorandum assembly →

a finished example is further down the page

3 corpora in Qdrant · dispute category by model + tags by keywords · hand-built ontology of rule links
pydantic-ai · Qdrant · Qwen3-Embedding-0.6B · DeepSeek · FastAPI / SSE · PostgreSQL · inside the customer's perimeter

Saying it confidently and wrongly is the worst outcome here

In an ordinary chatbot the worst that happens is an invented fact. In a legal assistant something else is more dangerous: a smooth, well-phrased untruth about the law.

The same cooperation article again. Its text says “the customer is obliged,” and the model cheerfully concludes that under this article the customer has the right to recover from the contractor. It sounds confident, on point, with the right citations. The meaning is the reverse: the article protects the contractor, and a lawyer who believes the model walks into court with an argument against himself.

So a layer of checks sits on top of generation, and each one grew out of a specific mistake the model had already made. The system learned to spot these failures phrase by phrase and cut or soften the sentence they appear in — whether it is the cooperation article turned against the customer, or a penalty reduction that has drifted into a 44-FZ argument.

The most important articles have a safety net: if the model went silent or got it wrong, a wording vetted in advance by a lawyer goes into the answer. There must be no empty space where a key rule should be.

Correctness filters — what gets cut after generation

PHRASE BY PHRASE, AFTER GENERATION

each rule is a specific mistake the model has already made

Cooperation turned against the customer · cut

Art. 718 in the same sentence as “recover,” “unilateral,” or “reduce” → the sentence is removed: the rule protects the contractor, it does not grant rights to the customer

The wrong legal regime · cut

Art. 333 of the Civil Code on reducing a penalty inside a 44-FZ argument → removed: the Civil Code and the procurement law are not mixed in one argument

“Automatically” next to a penalty · soften

→ “in the manner established by law,” so the lawyer does not skip the pre-claim procedure

An empty spot on a key rule · fallback

if cleanup leaves almost nothing of the rule, a wording vetted in advance by a lawyer goes into the answer

rules fire on a match between an article number and the words around it · the fallback never leaves a key rule empty
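A minimal sketch of how such filters could work on English text, assuming regular expressions over single sentences. The patterns, the sentence splitter, and the `VETTED` placeholder are illustrative assumptions; the real vetted wordings are written by a lawyer and are not reproduced here. The fallback is simplified to "no surviving sentence mentions the key article".

```python
import re

# Split after ., ! or ? but not after the abbreviation "Art."
SENTENCES = re.compile(r"(?<!Art\.)(?<=[.!?])\s+")
ART_718 = re.compile(r"\bArt\.\s*718\b")
ART_333 = re.compile(r"\bArt\.\s*333\b")
INVERTING = re.compile(r"\b(?:recover|unilateral|reduc)\w*", re.I)
LAW_44FZ = re.compile(r"\b44-FZ\b", re.I)
AUTOMATICALLY = re.compile(r"\bautomatically\b", re.I)
PENALTY = re.compile(r"\bpenalt(?:y|ies)\b", re.I)

# Placeholder only: stands in for a wording vetted in advance by a lawyer.
VETTED = {"Art. 718": "[lawyer-vetted wording for Art. 718]"}


def filter_answer(text: str) -> str:
    """Cut or soften risky sentences, then fill gaps on key articles."""
    kept = []
    for sentence in SENTENCES.split(text.strip()):
        if not sentence:
            continue
        # Cooperation turned against the customer: remove the sentence.
        if ART_718.search(sentence) and INVERTING.search(sentence):
            continue
        # Civil Code penalty reduction inside a 44-FZ argument: remove.
        if ART_333.search(sentence) and LAW_44FZ.search(sentence):
            continue
        # "Automatically" next to a penalty: soften the wording.
        if AUTOMATICALLY.search(sentence) and PENALTY.search(sentence):
            sentence = AUTOMATICALLY.sub("in the manner established by law", sentence)
        kept.append(sentence)

    # Fallback: a key article must never be left without any text.
    for article, wording in VETTED.items():
        if not any(article in sentence for sentence in kept):
            kept.append(wording)
    return " ".join(kept)
```

Working sentence by sentence keeps a cut local: one inverted sentence is removed without touching a correct statement about the same article elsewhere in the draft.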

The answer is assembled like a lawyer’s reasoning

The answer is built like a memorandum a senior lawyer would write. Each block is assembled separately and under its own constraints, so the recommendations do not dissolve into polite mush. Here is what such a draft looks like on the running penalty example.

Draft memorandum · example

The contractor was 45 days late on delivery. Recover the penalty — and what is the ceiling?

Bottom line

The penalty can be recovered. But expect it to be reduced: the court may lower it under Art. 333 of the Civil Code as disproportionate.

Applicable rules

Art. 330 of the Civil Code — penalty for delay; Art. 333 — its reduction; Art. 718 and 406 — the customer's duty to cooperate and its own delay.

Case law

Supreme Court clarifications on the criteria for reducing a penalty; case law on contractor-delay disputes.

The customer's experience

Its own past cases on similar delays — how they ended and on which argument they fell apart.

Critical risks

Check yourself first: did you hand over the site on time and sign off what was required? If you did not cooperate, the contractor will invoke Art. 718, and the penalty will be cut.

Recommendations

Follow the pre-claim procedure before filing; justify the penalty's proportionality in advance.

Conclusion

The position holds. The main threats are reduction under Art. 333 and the contractor's cooperation argument.

The benchmark here is set by real answers from lawyers. For each of the seven dispute categories there is a model analysis in the same memorandum structure, and the system measures itself against them: did it match on substance, what did it miss, what did it add that did not belong. The same benchmarks drive a regular run so that quality does not drift after the next round of edits.
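As an illustration only, the cheap part of such a comparison could look like the sketch below: a word-level overlap score plus a diff of cited article numbers. The Jaccard measure, the citation pattern, and the function names are assumptions; the judgment on substance described above is made by a judge model, which this sketch does not attempt to reproduce.

```python
import re

# Matches "Art. 330" and short lists such as "Art. 718 and 406" or "Art. 401, 416".
# Limitation: a bare number right after "and" (e.g. "Art. 333 and 45 days")
# would be read as an article too.
ART_GROUP = re.compile(r"Art\.\s*\d+(?:(?:\s*,\s*|\s+and\s+)\d+)*")


def words(text: str) -> set[str]:
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def articles(text: str) -> set[str]:
    """Article numbers cited in the text, as strings."""
    found: set[str] = set()
    for group in ART_GROUP.findall(text):
        found.update(re.findall(r"\d+", group))
    return found


def compare(benchmark: str, answer: str) -> dict:
    """Word-level Jaccard similarity plus missed and added article citations."""
    b_words, a_words = words(benchmark), words(answer)
    union = b_words | a_words
    similarity = len(b_words & a_words) / len(union) if union else 1.0
    b_arts, a_arts = articles(benchmark), articles(answer)
    return {
        "similarity": similarity,
        "missed": sorted(b_arts - a_arts),  # cited by the lawyer, absent in the draft
        "added": sorted(a_arts - b_arts),   # cited by the draft, absent in the benchmark
    }
```

A regular run can then track the similarity score and flag any draft where `missed` is non-empty before the judge model is consulted.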

Inside the customer’s perimeter

The whole system runs inside the customer’s perimeter. Court cases, correspondence, and internal materials never leave it, and model inference runs in the same place, inside the boundary. For a government body this is the condition without which the conversation does not even start.

And all of it was assembled on the customer’s real cases, not on a clean training corpus. The data arrives dirty:

court rulings come as exports from legal databases, with technical headers, cloud archives, and encodings from the nineties;
commercial-court case numbers show up with a Latin “A” one time and a Cyrillic “А” the next, separated by a slash or an underscore;
the category-labeling spreadsheet has typos right in the headers.

All of this has to be normalized to a single form, otherwise the customer’s own case cannot be found by its own number. The work is thankless, but there is no way around it: a citation to a rule or a case has to be exact down to the article and the number, and a wrong citation in a memorandum is worse than a missing one. A lawyer notices an absence right away; a wrong reference he catches only in court.
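The case-number part of that normalization can be sketched as follows. The canonical form chosen here (Cyrillic “А”, hyphen, slash) and the pattern itself are assumptions for illustration; the page only says that both letters and both separators occur in the data.

```python
import re
from typing import Optional

CYRILLIC_A = "\u0410"  # canonical first letter (assumption)

# Latin A or Cyrillic А, court code, hyphen, case number, "/" or "_", year.
# The lookbehind rejects a match glued to a preceding letter or digit but
# allows an underscore, which is common in exported file names.
CASE_NUMBER = re.compile(
    r"(?<![^\W_])[A\u0410]\s*(\d{1,3})\s*-\s*(\d+)\s*[/_]\s*(\d{2,4})(?!\d)",
    re.IGNORECASE,
)


def normalize_case_number(raw: str) -> Optional[str]:
    """Return the first case number found in canonical form, or None."""
    match = CASE_NUMBER.search(raw)
    if match is None:
        return None
    court, number, year = match.groups()
    return f"{CYRILLIC_A}{court}-{number}/{year}"
```

Applying the same function both when indexing a ruling and when parsing the lawyer's query means the two spellings meet at one key, so the customer's own case can be found by its own number.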

Result

The system works as an assistant to the customer’s legal team. The lawyer used to start a dispute from a blank page, assembling the position by hand across several databases. Now he opens a ready draft that already brings together the applicable rules, including the ones the other side will strike with, the customer’s relevant past cases, the risks, and the recommendations. From there he reviews, edits, and decides what to take to court. The final word stays with him.

The system does not know the law better than the lawyer, and it does not need to. It holds what a human holds poorly: it remembers the inconvenient rule that was left out, and it does not confuse who each article actually helps. Those are the two things that lose procurement disputes.

What we learned in the pilot

Three corpora: statutes, case law, and the customer's own cases

The engine adds the rules the other side strikes with

Phrase-level filters against confidently inverted law

The answer takes the form of a memorandum across seven dispute categories

Quality is measured against benchmark answers from lawyers

The whole system runs inside the customer's perimeter; data never leaves


Platform modules used in this project

Documents · Qdrant

Three corpora in Qdrant: statutes (44-FZ, the Civil Code, government decrees, letters from the ministries), general case law (reviews and plenary rulings of the Supreme and Higher Commercial Courts), and the customer's own cases. On top of retrieval, a manual injection of rules that are not in the case text but the lawyer is obliged to check

Chat & Agents · pydantic-ai

pydantic-ai over the legal pipeline. The answer is assembled section by section — bottom line, rules, case law, the customer's experience, risks, recommendations — each with its own constraints. Conversation history lives in threads

Guardrails

Phrase-level filters against inverted interpretation: the cooperation article without false recovery rights for the customer, the Civil Code rule on reducing a penalty kept out of a 44-FZ argument. For key articles, lawyer-vetted wordings in case the model fails

Evaluation

Benchmark answers from lawyers across seven dispute categories. Word-level similarity plus a judge-model assessment: what matched, what was missed, what was added that did not belong. A regular run so quality does not drift after edits

Inference

DeepSeek for section-by-section memorandum assembly, Qwen3-Embedding-0.6B for vectorizing the three corpora. Inference runs inside the customer's perimeter

