Joel Niklaus (Hugging Face): SwiLTra-Bench: The Swiss Legal Translation Benchmark

Artificial Justice Speaker Series

Date: 3 June 2026
Time: 14:00
Location: Online event

The talk will be held in English!

About the Speaker
Joel Niklaus is a Machine Learning Engineer at Hugging Face working on synthetic data. Previously, Joel was a Research Scientist at Harvey, specializing in large language model systems for legal applications. Before that, he pretrained LLMs at (Google) X and Thomson Reuters Labs. Joel holds a PhD in NLP from the University of Bern and conducted research at Stanford University, leading projects for the Swiss Federal Supreme Court and AI startups such as Darrow and Libra. His multilingual datasets and benchmarks have shaped the evaluation of LLMs in the legal domain. He regularly contributes to open-source projects including datatrove, lighteval, and Marin. His work has been published at top conferences, honored with an Outstanding Paper Award at ACL, and covered by Anthropic and the Swiss National Radio & Television. His teaching and speaking experience spans universities, high schools, and corporate audiences across NLP and computer science.

About the Topic
In Switzerland, legal translation is uniquely important due to the country's four official languages and its requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, which creates bottlenecks and impairs effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically on laws but underperform on headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open small language models (SLMs) significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.

About the Speaker Series
The Artificial Justice Speaker Series features guests working at the intersection of law, computer science, and the humanities. Neither technical nor legal knowledge is a prerequisite for participation; the Series is aimed at anyone with an interest in critical and interdisciplinary perspectives on “Law and AI.” The event takes place on Zoom and is scheduled to last one hour.