# lawbench
Self-contained MinT experiment for LawBench. It evaluates the official 20-task LawBench benchmark under one fixed local benchmark contract and maintains a local execution baseline built on Qwen/Qwen3-4B-Instruct-2507 with LoRA SFT. This is not a paper-faithful reproduction of Qzhou-Law or DISC-LawLLM: the official scorer and benchmark contract stay fixed, but the maintained runnable line is a smaller local execution baseline.
## At a glance
|  |  |
| --- | --- |
| Algorithm | LoRA SFT |
| Base model | `Qwen/Qwen3-4B-Instruct-2507` |
| Training data | public DISC-Law-SFT train artifact |
| Benchmark | full 20-task LawBench (`data/eval/full.jsonl`, ~10,000 rows) |
| Primary metric | `eval_lawbench_avg` |
| Upstream README | Open in mint-cookbook → |
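The primary metric aggregates the 20 per-task scores into a single number. A minimal sketch of that aggregation, assuming `eval_lawbench_avg` is the unweighted macro-average of per-task scores (the exact aggregation is defined by the official LawBench scorer, so treat this as illustrative):

```python
def lawbench_avg(task_scores: dict[str, float]) -> float:
    """Unweighted macro-average over the 20 per-task scores (assumed aggregation)."""
    if len(task_scores) != 20:
        raise ValueError(f"expected 20 LawBench tasks, got {len(task_scores)}")
    return sum(task_scores.values()) / len(task_scores)

# Hypothetical per-task scores, for illustration only.
scores = {f"task_{i}": 50.0 for i in range(20)}
print(lawbench_avg(scores))  # 50.0
```

Because the average is unweighted, a regression on any single task moves the headline metric by at most 1/20 of that task's score change.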
For setup, runnable commands, and the full eval protocol, see the upstream README. The experiment follows the shared cookbook lifecycle: `uv sync` → `--dry-run` → `--eval-only` → train.
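The lifecycle above can be sketched as a command sequence. Only `uv sync` is given by the source; the entrypoint name (`main.py` here) and flag spellings are assumptions, so defer to the upstream README for the real invocations:

```shell
# Shared cookbook lifecycle (entrypoint name is hypothetical).
uv sync                    # 1. install the locked environment
python main.py --dry-run   # 2. smoke-test config and data loading, no training
python main.py --eval-only # 3. baseline eval on data/eval/full.jsonl
python main.py             # 4. LoRA SFT, then the full 20-task eval
```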