iharness @iharnesscom ·
AI Investigator Talks Nunchaku! A Guide to Using the ComfyUI Plugin i-harness.com/articles/nunch… #comfyuinunchaku #flux #quantization
flux - AI Investigator Talks Nunchaku! A Guide to Using the ComfyUI Plugin - quantization - diffusion

Good work out there, officer! Engineers in the field, I hear today's AI models have gotten seriously large, haven't they? People keep begging me to do something about the strain on memory and compute resources. And that's where nunchaku-tech/ComfyUI-nunchaku, a.k.a. "Nunchaku", comes in!

From i-harness.com
26
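The memory arithmetic behind the plugin's pitch is easy to check. Here is a minimal back-of-the-envelope sketch in Python, assuming a ~12B-parameter FLUX-scale transformer and one fp16 scale per quantization group (illustrative numbers, not Nunchaku's actual storage format):

```python
# Rough memory estimate for a FLUX-scale transformer under weight quantization.
# Parameter count and group size are assumptions for illustration, not the
# exact layout ComfyUI-nunchaku uses.

def weight_memory_gb(n_params: float, bits: int, group_size: int = 64) -> float:
    """Weights at `bits` per value, plus one fp16 scale per quantization group."""
    weight_bytes = n_params * bits / 8
    scale_bytes = (n_params / group_size) * 2  # fp16 scale per group
    return (weight_bytes + scale_bytes) / 1e9

N = 12e9  # ~12B parameters, roughly FLUX.1-dev scale
for bits in (16, 8, 4):
    print(f"{bits:2d}-bit: {weight_memory_gb(N, bits):5.1f} GB")
# 16-bit: ~24.4 GB, 8-bit: ~12.4 GB, 4-bit: ~6.4 GB
```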
Telos @teloslab ·
Replying to @saen_dev
@saen_dev 🚨 Breaking: TurboQuant in MLX achieves 6/6 exact match across 64K context with no accuracy loss, pure compression sorcery. 2.5-bit slashes KV cache 4.9x; 3.5-bit hits 3.8x. Models like Qwen3.5-35B-A3B now run locally on 13GB RAM. 🧠🔥 #Quantization #MLX #KVCacheOptimization
290
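The gap between the raw ratio (16 / 2.5 = 6.4x) and the quoted 4.9x comes from quantization metadata. A quick sketch of the effect, with group size and metadata width as assumptions rather than TurboQuant's published format:

```python
# Why 2.5-bit gives ~4.9x rather than a raw 16/2.5 = 6.4x: per-group scale
# and zero-point metadata eat into the ratio. The group size and metadata
# width below are assumptions for illustration only.

def bits_per_value(q_bits: float, group_size: int = 32, meta_bits: int = 32) -> float:
    """Effective bits per cached value: payload plus amortized group metadata."""
    return q_bits + meta_bits / group_size

for q in (2.5, 3.5):
    eff = bits_per_value(q)
    print(f"{q}-bit payload -> {eff:.2f} effective bits, "
          f"compression vs fp16: {16 / eff:.1f}x")
# 2.5-bit -> 3.50 effective bits (~4.6x); 3.5-bit -> 4.50 (~3.6x),
# in the same ballpark as the 4.9x / 3.8x quoted above.
```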
Awesome Agents @awagents ·
Google's TurboQuant Cuts LLM Memory 6x With Zero Loss

Google Research's TurboQuant compresses LLM key-value cache by 6x and delivers 8x speedup on H100 GPUs with zero accuracy loss - no fine-tuning required. #Google #Quantization
2
53
Matthieu Morel @MorelMatth66161 ·
DynaMo: runtime bit-width switching for MoE. No retraining. arXiv 2503.21135: channel-level adaptation. Works on Qwen3-MoE and Mistral Small 4 without accuracy loss. Practical for local inference today. arxiv.org/abs/2503.21135 #MachineLearning #Quantization #LocalLLM #LLMs
DynaMo: Runtime Switchable Quantization for MoE with Cross-Dataset...

As the Mixture-of-Experts (MoE) architecture increases the number of parameters in large models, there is an even greater need for model quantization. However, existing quantization methods overlook...

From arxiv.org
28
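The core idea, as I read it: store weights once and pick a per-channel bit width at inference time. A toy numpy illustration with a naive range heuristic for the bit choice; this is a sketch of the concept, not DynaMo's actual algorithm:

```python
import numpy as np

# Toy illustration of channel-level, runtime-switchable quantization:
# each output channel gets its own bit width, chosen here by a simple
# dynamic-range heuristic.

def quantize_channel(w: np.ndarray, bits: int):
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(w).max() / qmax, 1e-12)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 64))   # 4 output channels of a toy weight matrix
W[2] *= 10                     # one outlier-heavy channel

budget = {True: 8, False: 4}   # wide-range channels get more bits
for i, row in enumerate(W):
    bits = budget[np.ptp(row) > 8.0]
    q, s = quantize_channel(row, bits)
    err = np.abs(dequantize(q, s) - row).mean()
    print(f"channel {i}: {bits}-bit, mean abs error {err:.4f}")
```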
Matthieu Morel @MorelMatth66161 ·
ParoQuant (ICLR 2026): pairwise Givens rotations suppress PTQ outliers in reasoning LLMs. Reasoning models hit quant artifacts harder than standard LLMs. This targets that. No retraining. #AI #Quantization #LocalLLM #AIEngineering #ModelEvaluation arxiv.org/abs/2511.10645
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning...

Post-training quantization (PTQ) compresses the weights and activations of large language models (LLMs) into low-precision representations to reduce memory footprint and accelerate inference....

From arxiv.org
26
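The mechanism in one picture: a Givens rotation mixes an outlier-heavy channel with a quiet one, shrinking the dynamic range the quantizer must cover, and the inverse rotation folds back at inference since the matrix is orthogonal. A numpy sketch with a fixed 45-degree angle, much simpler than ParoQuant's optimized pairing and angle search:

```python
import numpy as np

# Sketch of the idea behind pairwise-rotation quantization: rotate a pair of
# weight channels so one channel's outliers are shared across both, shrinking
# the max magnitude the quantizer must cover. The fixed 45-degree angle is a
# naive equalizer, not ParoQuant's optimized rotation.

def givens(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(1)
pair = rng.normal(size=(2, 128))
pair[0] *= 8.0                 # channel 0 carries large outliers

R = givens(np.pi / 4)          # 45 degrees: split the energy evenly
rotated = R @ pair

for name, w in [("before", pair), ("after", rotated)]:
    print(f"{name}: max |w| = {np.abs(w).max():.2f}")
# The rotation cuts the worst-case magnitude by roughly a factor of sqrt(2);
# at inference the inverse R.T is folded back, since R is orthogonal.
```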
mssfj @_mssfj ·
I started a new project: lowbit-math-reasoning
Initial setup:
- Qwen3.5-9B quantized with GPTQ (8-bit / 4-bit)
- mssfj/Qwen3.5-9B-GPTQ-INT8 / -INT4
Next: GSM8K/MATH/HLE(MATH) evaluation to measure reasoning collapse under low-bit constraints. #LLM #Quantization #MathReasoning
1
46
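For reference, GPTQ checkpoints on the Hub typically load straight through transformers once a GPTQ backend (e.g. optimum plus gptqmodel) is installed. A hedged sketch using the repo name from the post, which I have not verified exists:

```python
# Hedged sketch: loading a GPTQ-quantized checkpoint and spot-checking a
# GSM8K-style prompt. Assumes the repo named in the post exists and that a
# GPTQ backend (e.g. optimum + gptqmodel) is installed; not verified here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mssfj/Qwen3.5-9B-GPTQ-INT4"  # name taken from the post
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Q: A bakery sells 12 muffins per tray. How many muffins are on 7 trays?\nA:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```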
Akash Motghare @codesacure ·
Huge news in AI efficiency! A researcher just trained 4-bit CNNs from scratch on a CPU, achieving near FP32 accuracy. This could revolutionize deployment for tiny devices. #AI #DeepLearning #Quantization #EdgeAI blog.codesacure.com/4-bit-quantiza…
4-Bit Quantization: A Breakthrough for Efficient AI

In the rapidly evolving landscape of artificial intelligence, the quest for more efficient and less resource-intensive models is paramount. Deep learning models, while incredibly powerful, often...

From blog.codesacure.com
16
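Training low-bit CNNs from scratch usually means quantization-aware training with a straight-through estimator: the forward pass sees 4-bit weights, the backward pass pretends quantization is the identity. A generic PyTorch sketch of that recipe, not necessarily the blog author's exact method:

```python
import torch
import torch.nn as nn

# Minimal QAT sketch: 4-bit "fake" quantized weights via a straight-through
# estimator (STE). Forward uses quantized values; backward passes gradients
# through as if quantization were the identity.

class FakeQuant4(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        qmax = 7                          # signed 4-bit range: [-8, 7]
        scale = w.abs().max() / qmax + 1e-8
        return torch.clamp(torch.round(w / scale), -8, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                   # STE: identity gradient

class QConv(nn.Conv2d):
    def forward(self, x):
        return self._conv_forward(x, FakeQuant4.apply(self.weight), self.bias)

# Tiny CPU training step to show the mechanics end to end.
net = nn.Sequential(QConv(1, 8, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x, y = torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))
loss = nn.functional.cross_entropy(net(x), y)
loss.backward()
opt.step()
print(f"one QAT step done, loss = {loss.item():.3f}")
```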