Crown Citadel Performance Lab

Strix Halo local AI benchmarks

Model throughput, memory use, long-context behavior, serving latency, speculative decoding, and task quality measured on the Crown Citadel Ryzen AI Max+ 395 lab system.

Updated NixOS • Linux 7.0.1 RADV STRIX_HALO
Ryzen AI Max+ 395, 128 GiB UMA lab host

Articles

Articles about running AI locally: hardware, settings, model behavior, tooling, and lessons from the lab.

Your Local Model Settings Matter More Than You Think

How tuned llama.cpp settings changed prompt processing and generation speed on the same Strix Halo machine, with paired before/after bars and percent lifts.
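
As a reading aid for the percent lifts, here is a minimal sketch of the arithmetic, assuming a lift is the relative change from the untuned to the tuned throughput. The function name and the sample numbers are hypothetical and are not lab measurements.

```python
def percent_lift(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return (after - before) / before * 100.0


# Hypothetical tokens/second values, not results from the lab.
print(f"{percent_lift(200.0, 250.0):.1f}%")
```

A negative result means the tuned run was slower than the baseline.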


VRAM / Speed Frontier

GPU memory on the x-axis, standalone decode speed on the y-axis, and throughput at the selected context length as color.

Prompt / Prefill vs Decode

Context-side prompt-processing speed on the x-axis and standalone decode speed on the y-axis, grouped by model family.

Context Curves

Only models with at least four measured context points and one 64k+ result are drawn, so the lines represent real scaling behavior rather than one-off tests.
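
The inclusion rule can be sketched as a small filter. This is an illustration only: the data layout (context length in tokens mapped to tokens/second), the names, and the choice of 64,000 tokens as the "64k" threshold are assumptions, not the dashboard's actual code.

```python
LONG_CONTEXT_TOKENS = 64_000  # assumption: "64k" read as at least 64,000 tokens
MIN_POINTS = 4


def is_drawable(points: dict[int, float]) -> bool:
    """Return True if a model has enough measurements to draw a context curve.

    `points` maps context length in tokens to measured throughput.
    """
    has_long_context = any(ctx >= LONG_CONTEXT_TOKENS for ctx in points)
    return len(points) >= MIN_POINTS and has_long_context


# Hypothetical throughput values, not lab results.
example = {4096: 900.0, 8192: 850.0, 16384: 760.0, 65536: 410.0}
print(is_drawable(example))
```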

Benchmark Atlas

Full benchmark matrix. Context columns show prompt-processing or combined prompt+generation throughput, while standalone decode is kept separate in the TG column.

Columns: Profile, Backend, KV / Shape, 4k, 8k, 16k, 32k, 64k, 80k, 100k, 128k, TG, Memory, Measurements

MTP Serving Lab

Speculative serving sweeps: draft settings, acceptance rate, request time, decode speed, and memory deltas.
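
For readers new to speculative decoding, acceptance rate is commonly defined as the share of drafted tokens that the main model accepts. The sketch below assumes that definition; the dashboard's own counters may differ, and the function name and sample counts are hypothetical.

```python
def acceptance_rate(drafted: int, accepted: int) -> float:
    """Fraction of drafted tokens accepted by the target model."""
    if drafted < 0 or accepted < 0 or accepted > drafted:
        raise ValueError("expected 0 <= accepted <= drafted")
    # With no drafted tokens there is nothing to accept; report 0.0.
    return accepted / drafted if drafted else 0.0


# Hypothetical counts, not lab results.
print(f"{acceptance_rate(drafted=200, accepted=150):.0%}")
```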

Columns: Run, Config, Total, Prompt, Decode, Acceptance, Memory

Serving Request Runs

End-to-end local serving measurements: first-token delay, total request time, prompt size, generated output, and sampled memory.

Columns: Run, First token, Total, Prompt, Output, Memory

Hermes Aux Behavior Tests

These tests run Hermes-style support tasks through candidate auxiliary models. They measure whether an aux model can reliably handle agent side work, while also tracking first-token latency, total latency, score, pass rate, and coverage. Dot size shows how many eval rows support the point.

Columns: Candidate, Pass, Score, Total, TTFP, Coverage

Hermes Loadout Behavior

Loadouts test the Hermes main model and auxiliary model together as a routing plan. They measure more than throughput because an agent loadout has to be correct, responsive at p95, and small enough to stay loaded beside the main model without crowding memory.
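
The three criteria can be sketched as a simple gate. Everything here is an assumption for illustration: the nearest-rank p95 method, the function names, and the thresholds are hypothetical and do not describe how the lab actually scores loadouts.

```python
def p95(latencies_s: list[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def loadout_ok(pass_rate: float, latencies_s: list[float],
               main_gib: float, aux_gib: float, *,
               min_pass_rate: float, max_p95_s: float,
               memory_budget_gib: float) -> bool:
    """True if the loadout is correct, responsive at p95, and fits in memory."""
    return (pass_rate >= min_pass_rate
            and p95(latencies_s) <= max_p95_s
            and main_gib + aux_gib <= memory_budget_gib)


# Hypothetical numbers, not lab results.
print(loadout_ok(0.92, [1.2, 1.4, 1.1, 3.0], main_gib=60.0, aux_gib=8.0,
                 min_pass_rate=0.9, max_p95_s=3.5, memory_budget_gib=96.0))
```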

Columns: Loadout, Main, Aux, P95, VRAM

Race Gallery

Each card explains what the race tests before you open it. These are focused head-to-head pages for speculative decoding, long-output behavior, task correctness, run health, and memory pressure.

Coding Quality Lab

Security-focused code review tasks from the Hermes auxiliary model suite. Rows invalidated by harness or launch/config failures are excluded from model scoring and shown as excluded coverage.
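
A minimal sketch of that exclusion, assuming each eval row carries a pass flag and an invalid flag. The `EvalRow` type and its field names are hypothetical and are not taken from the harness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalRow:
    passed: bool
    invalid: bool = False  # hypothetical flag: harness or launch/config failure


def pass_rate_excluding_invalid(rows: list[EvalRow]) -> tuple[Optional[float], int, int]:
    """Return (pass rate over valid rows, scored row count, excluded row count)."""
    scored = [row for row in rows if not row.invalid]
    excluded = len(rows) - len(scored)
    # With no valid rows there is no meaningful rate, so report None.
    rate = sum(row.passed for row in scored) / len(scored) if scored else None
    return rate, len(scored), excluded


# Hypothetical rows, not lab results.
rows = [EvalRow(True), EvalRow(False), EvalRow(True), EvalRow(False, invalid=True)]
print(pass_rate_excluding_invalid(rows))
```

Keeping the excluded count alongside the rate lets a coverage column show how much of the suite was actually scored.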

Code Review Ranking

Ranked by adjusted pass rate, then score, then latency. Coverage shows scored code-review tasks separately from harness/config rows.
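
The ordering can be sketched as a three-part sort key. The sketch assumes higher pass rate and score rank first and lower latency breaks remaining ties; the `Candidate` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    pass_rate: float   # adjusted pass rate, 0.0 to 1.0
    score: float
    latency_s: float   # total latency in seconds


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by pass rate (desc), then score (desc), then latency (asc)."""
    return sorted(candidates,
                  key=lambda c: (-c.pass_rate, -c.score, c.latency_s))


# Hypothetical candidates, not lab results.
demo = [
    Candidate("a", 0.90, 0.80, 4.0),
    Candidate("b", 0.90, 0.85, 6.0),
    Candidate("c", 0.95, 0.70, 9.0),
    Candidate("d", 0.90, 0.85, 5.0),
]
print([c.name for c in rank_candidates(demo)])
```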

Columns: Candidate, Pass, Score, Total, Coverage

Method Notes

How to read the dashboard without knowing the benchmark harness internals.