FlexAI News
Everyone assumes “bigger model = better results.” LegML set out to prove otherwise.
LegML fine-tuned a 32B-parameter legal LLM that outperformed a frontier model on their domain benchmarks while cutting both training and serving costs. With FlexAI as the workload platform, they trained reliably at scale, deployed efficiently, and demonstrated that smaller, specialized, and sovereign can beat larger, generic, and outsourced.
LegML’s goal was simple: deliver higher accuracy on legal tasks with a model that could run inside a company’s own infrastructure, with no data leaving that environment, no black-box APIs, and no lock-in.
They combined supervised fine-tuning on legal workflows (drafting, compliance, Q&A) with Group Relative Policy Optimization (GRPO) to raise reasoning quality, citation accuracy, and legal consistency.
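For a sense of what that second stage looks like in practice, here is a minimal sketch using Hugging Face’s TRL library and its GRPOTrainer; the checkpoint path, dataset, reward function, and hyperparameters are illustrative assumptions, not LegML’s actual recipe. In GRPO, several completions are sampled per prompt, scored by a reward function (here, a toy citation check), and the model is nudged toward the completions that score above the group average.

```python
# Minimal sketch of a GRPO stage with Hugging Face TRL.
# The checkpoint path, dataset file, reward function, and hyperparameters are
# illustrative placeholders, not LegML's actual recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical JSONL of legal prompts with a "prompt" column (standard TRL format).
train_dataset = load_dataset("json", data_files="legal_prompts.jsonl", split="train")

def citation_reward(completions, **kwargs):
    """Toy reward: favor completions that cite at least one legal article."""
    return [1.0 if ("Article" in c or "Art." in c) else 0.0 for c in completions]

config = GRPOConfig(
    output_dir="hugo-grpo",
    num_generations=8,              # completions sampled per prompt (the "group")
    max_completion_length=512,
    per_device_train_batch_size=8,  # effective batch must divide into groups of num_generations
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="path/to/sft-checkpoint",  # start from the supervised fine-tuned model
    reward_funcs=citation_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```

In LegML’s case, the actual reward signals targeted reasoning quality, citation accuracy, and legal consistency rather than a single heuristic like this one.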
The constraint: do it on a predictable budget and timeline.
Before FlexAI, LegML hit the same walls most teams face when fine-tuning a model. The net effect: unreliable schedules, unpredictable spend, and engineering time spent on infrastructure instead of model quality.
FlexAI removed the infrastructure drag so LegML could focus on the model.
LegML’s 32B model, Hugo, outperformed Mistral’s flagship frontier model across LegML’s benchmark suite, delivering +10% higher factual precision with 50% fewer parameters and 75% lower compute cost.
Training ran for 14 days on FlexAI-provisioned distributed H100 GPUs (via partner capacity), for approximately €22,500 total.
Hugo runs efficiently on a single H200 GPU at $3.15/hour, about $9,072 for six months of continuous operation.
A comparable 70B model would require 2× B200 GPUs at $6.25/hour each, roughly $36,000 over the same period.
In plain terms, Hugo delivers higher accuracy with ~4× lower operating cost than the larger alternative.
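As a quick sanity check on that figure, the ~4× ratio already falls out of the quoted hourly rates; the short calculation below assumes the $6.25/hour B200 price is per GPU, which the quoted period totals imply.

```python
# Serving-cost comparison from the rates and totals quoted above.
# Assumption: the $6.25/hour B200 price is per GPU, so the 70B option bills 2x that rate.
hugo_per_hour = 3.15      # 1x H200 running Hugo (32B)
alt_per_hour = 2 * 6.25   # 2x B200 for a comparable 70B model

print(f"hourly ratio: {alt_per_hour / hugo_per_hour:.1f}x")   # ~4.0x

# The quoted period totals carry the same ratio.
hugo_total, alt_total = 9_072, 36_000
print(f"period-total ratio: {alt_total / hugo_total:.1f}x")   # ~4.0x
```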
Two independent legal experts reviewed outputs and concluded Hugo produced more accurate and complete answers for legal reasoning tasks.
LegML describes the result as “peace of mind”: every workload completed successfully and cost-effectively, so the team could iterate on data and training signals instead of babysitting clusters.
LegML and FlexAI are turning this into a repeatable blueprint: full-parameter fine-tuning on curated sector corpora, with continuous learning pipelines and hybrid human + LLM evaluation.
The approach is already moving from law into finance, insurance, and public administration—domains where precision, governance, and sovereignty are non-negotiable.
FlexAI remains the foundation: a single platform that handles orchestration, scaling, and monitoring so teams can spend their time on data, training signals, and product.
If you’re exploring a domain-specific LLM and want accuracy, control, and predictable economics, let’s talk. We can help you scope the optimal cost/performance setup before you run a single workload, so you can achieve the same results.
Get in touch → hello@flex.ai
P.S. We’ll be at TechCrunch Disrupt next week. Stop by the FlexAI booth for a live walkthrough of the LegML blueprint and our new Inference Sizer.

To celebrate this launch, we’re offering €100 in starter credits for first-time users!
Get Started Now