Not officially out yet: Google only released the paper, no code.
But yes, you can use it today on your Bosgame M5 128GB with Qwen3.5-35B-A3B!
Try this llama.cpp fork with Metal support: github.com/TheTom/turboqu…
Build it, then run with --cache-type-k turbo3 --cache-type-v turbo3 for a roughly 5x smaller KV cache and solid speed on your setup. Benchmarks confirm it works great on that exact model/hardware.
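
If you want to sanity-check where a figure like ~5x comes from, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the model shape is made up (not Qwen's real config), and the 3.2 bits per value is a guess at what a 3-bit cache type costs once scale overhead is included, not a number from the fork.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value):
    """Approximate KV cache size: keys plus values for every layer and token."""
    values_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = one K and one V
    return values_per_token * n_ctx * bits_per_value / 8


# Hypothetical model shape, NOT taken from any real model card.
shape = dict(n_layers=40, n_kv_heads=4, head_dim=128, n_ctx=32768)

fp16 = kv_cache_bytes(**shape, bits_per_value=16)
# Assumed ~3.2 bits per value for a 3-bit cache type including overhead.
turbo = kv_cache_bytes(**shape, bits_per_value=3.2)

ratio = fp16 / turbo
print(f"f16 cache:    {fp16 / 2**30:.2f} GiB")
print(f"~3-bit cache: {turbo / 2**30:.2f} GiB")
print(f"savings:      {ratio:.1f}x")
```

The ratio only depends on the bits per value, so the model shape changes the absolute sizes but not the savings factor. Swap in your model's real layer and head counts to estimate the actual memory you would free up at a given context length.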
GitHub - TheTom/turboquant_plus