Crown Citadel Performance Lab

Strix Halo local AI benchmarks

Model throughput, memory use, long-context behavior, serving latency, speculative decoding, and task quality measured on the Crown Citadel Ryzen AI Max+ 395 lab system.

Updated NixOS • Linux 7.0.1 RADV STRIX_HALO

Ryzen AI Max+ 395 128 GiB UMA lab host

Articles

Articles about running AI locally: hardware, settings, model behavior, tooling, and lessons from the lab.

Your Local Model Settings Matter More Than You Think

How tuned llama.cpp settings changed prompt processing and generation speed on the same Strix Halo machine, with paired before/after bars and percent lifts.

Strix Halo llama.cpp Tuning
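The percent lifts on the article's paired bars are presumably the usual relative change from the baseline run. Below is a minimal Python sketch that assumes that definition. `percent_lift` is a hypothetical helper, and the numbers are placeholders, not lab results.

```python
def percent_lift(before: float, after: float) -> float:
    """Relative change from a baseline throughput to a tuned one, in percent."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return (after - before) / before * 100.0


# Placeholder tokens/s values, not measurements from the lab.
baseline_pp, tuned_pp = 400.0, 500.0
print(f"{percent_lift(baseline_pp, tuned_pp):.1f}%")  # prints "25.0%"
```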


VRAM / Speed Frontier

GPU memory use on the x-axis, standalone decode speed on the y-axis, and throughput at the selected context length as color.
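The page does not define "frontier." If it means a Pareto frontier, a profile sits on it when no other profile uses the same or less GPU memory while decoding at least as fast. Below is a minimal Python sketch under that assumption. The profile names and numbers are hypothetical placeholders.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    name: str
    vram_gib: float    # GPU memory (x-axis)
    decode_tps: float  # standalone decode tokens/s (y-axis)


def pareto_frontier(profiles: list[Profile]) -> list[Profile]:
    """Keep profiles that no other profile beats on both memory and decode speed."""
    # Sort by memory ascending; at equal memory, fastest first.
    ordered = sorted(profiles, key=lambda p: (p.vram_gib, -p.decode_tps))
    frontier: list[Profile] = []
    best_decode = float("-inf")
    for p in ordered:
        # A profile survives only if it decodes faster than every
        # profile already seen, all of which use no more memory.
        if p.decode_tps > best_decode:
            frontier.append(p)
            best_decode = p.decode_tps
    return frontier


# Placeholder values, not lab measurements.
profiles = [
    Profile("small-q4", 10.0, 30.0),
    Profile("mid-q8", 20.0, 25.0),
    Profile("moe-q4", 30.0, 50.0),
    Profile("dense-q8", 40.0, 45.0),
]
print([p.name for p in pareto_frontier(profiles)])  # ['small-q4', 'moe-q4']
```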


Prompt / Prefill vs Decode

Prompt-processing (prefill) throughput from the context runs on the x-axis and standalone decode speed on the y-axis, grouped by model family.

Context Curves

Only models with at least four measured context points, including at least one result at 64k or longer, are drawn. This way each line reflects real scaling behavior rather than one-off tests.
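The inclusion rule for drawn curves is simple enough to express in code. Below is a minimal Python sketch. It assumes each curve is keyed by context length in thousands of tokens, matching the 4k through 128k column labels. The model names and throughput values are placeholders.

```python
def is_drawn(curve: dict[int, float]) -> bool:
    """curve maps context length in k tokens (4, 8, ..., 128) to tokens/s."""
    # At least four measured points, and at least one at 64k or longer.
    return len(curve) >= 4 and any(ctx >= 64 for ctx in curve)


# Placeholder curves, not lab measurements.
curves = {
    "model-a": {4: 900.0, 8: 850.0, 16: 780.0, 32: 640.0, 64: 450.0},
    "model-b": {4: 1200.0, 8: 1100.0, 16: 950.0},            # too few points
    "model-c": {4: 700.0, 8: 690.0, 16: 660.0, 32: 600.0},  # nothing at 64k+
}
drawn = [name for name, curve in curves.items() if is_drawn(curve)]
print(drawn)  # ['model-a']
```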

Benchmark Atlas

Full benchmark matrix. Context columns show prompt-processing or combined prompt+generation throughput. Standalone decode is kept separate in the TG (token generation) column.


Profile	Backend	KV / Shape	4k	8k	16k	32k	64k	80k	100k	128k	TG	Memory	Measurements
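Keeping decode separate matters because long prompts dominate any combined measurement. Below is a minimal Python sketch. It assumes combined throughput is all tokens (prompt plus generated) divided by all wall time. The `Run` class and its numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    prompt_tokens: int
    gen_tokens: int
    prompt_s: float  # wall time spent on prompt processing (prefill)
    gen_s: float     # wall time spent generating tokens

    @property
    def pp_tps(self) -> float:
        return self.prompt_tokens / self.prompt_s

    @property
    def tg_tps(self) -> float:
        return self.gen_tokens / self.gen_s

    @property
    def combined_tps(self) -> float:
        # Assumed definition: all tokens over all wall time.
        total_tokens = self.prompt_tokens + self.gen_tokens
        return total_tokens / (self.prompt_s + self.gen_s)


# Placeholder 32k-token prompt run, not a lab measurement.
run = Run(prompt_tokens=32768, gen_tokens=128, prompt_s=64.0, gen_s=4.0)
print(f"pp {run.pp_tps:.1f} t/s, tg {run.tg_tps:.1f} t/s, "
      f"combined {run.combined_tps:.1f} t/s")
# prints "pp 512.0 t/s, tg 32.0 t/s, combined 483.8 t/s"
```

In this example the combined figure stays close to prefill speed even though decode is 16x slower. A separate TG column keeps decode speed from disappearing into the prefill number.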

MTP Serving Lab

Speculative serving sweeps: draft settings, acceptance rate, request time, decode speed, and memory deltas.

Run	Config	Total	Prompt	Decode	Acceptance	Memory
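Acceptance rate is what turns draft settings into decode speed. Below is a minimal Python sketch. It assumes acceptance rate means accepted drafted tokens divided by drafted tokens. For tokens per verification pass, it uses the standard approximation from the speculative-decoding literature, in which each drafted token is accepted independently with probability alpha. The helper names and counts are hypothetical.

```python
def acceptance_rate(accepted: int, drafted: int) -> float:
    """Fraction of drafted tokens the target model accepted."""
    return accepted / drafted if drafted else 0.0


def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target verification pass, assuming each
    drafted token is accepted independently with probability alpha.
    Includes the one token the target model always contributes."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return draft_len + 1.0
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


# Placeholder counts, not lab results.
alpha = acceptance_rate(accepted=800, drafted=1000)
print(f"{expected_tokens_per_step(alpha, draft_len=4):.2f}")  # prints "3.36"
```

Real acceptance varies with draft position and prompt content. Treat the formula as a rough planning aid, not a substitute for the measured Decode column.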

Serving Request Runs

End-to-end local serving measurements: first-token delay, total request time, prompt size, generated output, and sampled memory.

Run	First token	Total	Prompt	Output	Memory
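Below is a minimal Python sketch of timing one streamed request. The clock starts when iteration begins, first-token delay is taken at the first yielded token, and total time is taken at stream end. `fake_stream` is a hypothetical stand-in for a real server's streaming API, which this page does not show.

```python
import time
from collections.abc import Iterable, Iterator


def fake_stream(n_tokens: int, prefill_s: float, per_token_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response from a local server."""
    time.sleep(prefill_s)  # simulated prompt processing before the first token
    for i in range(n_tokens):
        if i:
            time.sleep(per_token_s)  # simulated decode step
        yield f"tok{i}"


def time_request(stream: Iterable[str]) -> dict[str, float]:
    """Measure first-token delay, total time, and output size for one request."""
    start = time.perf_counter()
    first_token_s = float("nan")
    n_out = 0
    for _ in stream:
        if n_out == 0:
            first_token_s = time.perf_counter() - start
        n_out += 1
    total_s = time.perf_counter() - start
    return {"first_token_s": first_token_s, "total_s": total_s, "output_tokens": n_out}


stats = time_request(fake_stream(n_tokens=5, prefill_s=0.05, per_token_s=0.01))
print(stats["output_tokens"])  # prints 5
```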

Hermes Aux Behavior Tests

These tests run Hermes-style support tasks through candidate auxiliary models. They measure whether an aux model can reliably handle an agent's side work. They also track first-token latency, total latency, score, pass rate, and coverage. Dot size shows how many eval rows support each point.

Candidate	Pass	Score	Total	TTFP	Coverage
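Below is a minimal Python sketch of rolling eval rows up into the per-candidate figures this section tracks. The `EvalRow` fields, the use of medians for latency, and the sample rows are assumptions for illustration. Coverage is left out because the page does not define how it is computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class EvalRow:
    candidate: str
    passed: bool
    score: float          # hypothetical per-task score in [0, 1]
    first_token_s: float
    total_s: float


def summarize(rows: list[EvalRow]) -> dict[str, dict[str, float]]:
    """Per-candidate pass rate, mean score, median latencies, and row count."""
    by_candidate: dict[str, list[EvalRow]] = defaultdict(list)
    for r in rows:
        by_candidate[r.candidate].append(r)
    return {
        name: {
            "pass_rate": sum(r.passed for r in rs) / len(rs),
            "score": mean(r.score for r in rs),
            "first_token_s": median(r.first_token_s for r in rs),
            "total_s": median(r.total_s for r in rs),
            "rows": len(rs),  # the count that would drive dot size
        }
        for name, rs in by_candidate.items()
    }


# Placeholder rows, not lab results.
rows = [
    EvalRow("aux-a", True, 1.0, 0.5, 2.0),
    EvalRow("aux-a", False, 0.5, 0.75, 3.0),
    EvalRow("aux-a", True, 0.75, 0.25, 1.0),
]
print(summarize(rows)["aux-a"]["rows"])  # prints 3
```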

Hermes Loadout Behavior

Loadouts test the Hermes main model and auxiliary model together as a routing plan. Throughput alone is not enough to judge a loadout. It also has to be correct and responsive at p95 latency, and its auxiliary model has to be small enough to stay loaded beside the main model without crowding memory.

Loadout	Main	Aux	P95	VRAM
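Below is a minimal Python sketch of the two loadout gates named above: a nearest-rank p95 over request latencies, and a check that the main and auxiliary models fit together in memory. The memory budget and headroom are hypothetical parameters. How much of the 128 GiB UMA pool a given setup actually exposes to the GPU is not stated here.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample with at least
    95% of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in exact integer math
    return ordered[rank - 1]


def fits_beside(main_gib: float, aux_gib: float,
                budget_gib: float, headroom_gib: float) -> bool:
    """True if main + aux memory still leaves the requested headroom."""
    return main_gib + aux_gib + headroom_gib <= budget_gib


latencies = [float(s) for s in range(1, 21)]  # placeholder latencies in seconds
print(p95(latencies))  # prints 19.0
print(fits_beside(main_gib=80.0, aux_gib=8.0, budget_gib=110.0, headroom_gib=16.0))  # prints True
```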

Race Gallery

Each card explains what the race tests before you open it. These are focused head-to-head pages for speculative decoding, long-output behavior, task correctness, run health, and memory pressure.

Coding Quality Lab

Code-review security tasks from the Hermes auxiliary model suite. Rows invalidated by the harness or by launch/config errors are excluded from model scoring and reported as excluded coverage.

Code Review Ranking

Ranked by adjusted pass rate, then score, then latency. Coverage shows scored code-review tasks separately from harness/config rows.

Candidate	Pass	Score	Total	Coverage
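Below is a minimal Python sketch of the ranking rule above. It assumes "adjusted pass rate" means excluded harness/config rows are dropped from the denominator, and it breaks ties on mean total latency. `ReviewRow`, `rank`, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ReviewRow:
    candidate: str
    excluded: bool   # True for harness or launch/config invalid rows
    passed: bool
    score: float
    total_s: float


def rank(rows: list[ReviewRow]) -> list[tuple[str, float, float, float, str]]:
    """Rank candidates by adjusted pass rate, then score, then latency."""
    table = []
    for name in sorted({r.candidate for r in rows}):
        mine = [r for r in rows if r.candidate == name]
        scored = [r for r in mine if not r.excluded]
        n_excluded = len(mine) - len(scored)
        if scored:
            # Assumption: excluded rows count toward coverage, never toward pass rate.
            pass_rate = sum(r.passed for r in scored) / len(scored)
            score = mean(r.score for r in scored)
            latency = mean(r.total_s for r in scored)
        else:
            pass_rate, score, latency = 0.0, 0.0, float("inf")
        coverage = f"{len(scored)} scored, {n_excluded} excluded"
        table.append((name, pass_rate, score, latency, coverage))
    # Higher pass rate and score first; lower latency breaks remaining ties.
    table.sort(key=lambda t: (-t[1], -t[2], t[3]))
    return table


# Placeholder rows (candidate, excluded, passed, score, total_s), not lab results.
rows = [
    ReviewRow("aux-a", False, True, 0.75, 10.0),
    ReviewRow("aux-a", False, False, 0.25, 12.0),
    ReviewRow("aux-a", True, False, 0.0, 0.0),  # harness error, excluded
    ReviewRow("aux-b", False, True, 0.5, 5.0),
    ReviewRow("aux-b", False, False, 0.5, 7.0),
    ReviewRow("aux-c", False, True, 1.0, 20.0),
    ReviewRow("aux-c", False, True, 0.75, 22.0),
]
print([t[0] for t in rank(rows)])  # ['aux-c', 'aux-b', 'aux-a']
```

In this example aux-a and aux-b tie on pass rate and score, so the faster aux-b ranks ahead. The excluded row shows up only in aux-a's coverage string.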

Method Notes

How to read the dashboard without knowing the benchmark harness internals.