Key architectural details

Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization.

119B total parameters, with 6B active parameters per token (8B including embedding and output layers).

256k context window, supporting long-form interactions and document analysis.

Configurable reasoning effort: Toggle between fast, low-latency responses and deep, reasoning-intensive outputs.

Native multimodality: Accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
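The MoE bullet above is the whole efficiency story: only 4 of the 128 expert FFNs run for each token, so per-token compute scales with the ~6B active parameters rather than the 119B total. A minimal top-k routing sketch in NumPy (all names and shapes here are illustrative, not Mistral's actual implementation):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=4):
    """Route one token through the top-k of n experts, softmax-weighted.

    x: (d,) token hidden state; gate_w: (n_experts, d) router weights;
    experts: list of callables, each mapping (d,) -> (d,).
    Illustrative sketch only -- real MoE layers batch tokens per expert.
    """
    logits = gate_w @ x                       # (n_experts,) router scores
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # softmax over selected experts only
    # Only k expert FFNs execute per token -- why 119B total parameters
    # can still mean only ~6B active per token.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

# Toy instance: 128 tiny "experts", 4 active per token, as in the post.
rng = np.random.default_rng(0)
d, n = 16, 128
experts = [lambda v, W=rng.standard_normal((d, d)) / d: W @ v for _ in range(n)]
gate_w = rng.standard_normal((n, d))
y = moe_forward(rng.standard_normal(d), gate_w, experts)
print(y.shape)  # (16,)
```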

  • panda_abyss@lemmy.ca · 9 hours ago

    Looks a little underwhelming, with Qwen3.5 and Haiku beating it.

    However, 6B active parameters and training toward short responses could make this a useful alternative to Qwen as a local model. I’ve overall found Mistral models better to discuss with, but the Devstral Small models were kinda janky last I used them (stuff like infinite loops and getting confused by less common programming languages). Qwen models are by far the most verbose out of the box and happily burn a ton of tokens on useless thought – an over-emphasis on reinforcement learning.

    Also weird that they use GPT 4.1 as the judge model. That’s a year-old model, not nearly SOTA, and IIRC it underwhelmed on most metrics. It feels like a poor choice of judge.

    Edit: it’s actually GPT-5 – some of the charts are labelled wrong

    Not mentioned in the blog post, but on HF: they created a small speculative decoding model to go with it – https://huggingface.co/mistralai/Mistral-Small-4-119B-2603-eagle

    That should accelerate inference speeds on some setups.
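    For context, speculative decoding lets a tiny draft model propose several tokens that the big model then verifies, keeping the agreed prefix. A toy sketch of that control flow (real schemes, including EAGLE, verify all drafts in one target forward pass and use a probabilistic accept/reject rule; everything here is simplified and illustrative):

```python
import random

def speculative_step(draft_sample, target_accepts, k=4):
    """One round of (greatly simplified) speculative decoding.

    draft_sample(): cheaply proposes the next token.
    target_accepts(tok): the big model's verdict on a drafted token.
    Toy version: accept the agreeing prefix, stop at the first mismatch.
    """
    drafted = [draft_sample() for _ in range(k)]
    accepted = []
    for tok in drafted:
        if target_accepts(tok):
            accepted.append(tok)  # target agrees: keep the drafted token
        else:
            break                 # first disagreement ends the round
    # On rejection, the target model would supply the next token itself,
    # so correctness never depends on the draft model being good.
    return accepted

random.seed(1)
out = speculative_step(lambda: random.choice("abcd"),
                       lambda t: t in "abc", k=4)
print(out)
```

    The win: when draft and target mostly agree, each expensive target pass yields several tokens instead of one, which is why even a 392 MB draft model can meaningfully speed up a 119B model.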

    • MalReynolds@slrpnk.net · 7 hours ago

      For certain values of small…

      That said, Mistral is strong in world knowledge, and something this big likely is too. With only 6B active parameters it can run from reasonable amounts of system RAM (Q4_K_M is ~72 GB, so it’d likely run fine with 64 GB of system RAM plus 24 GB of VRAM) at reasonable if not spectacular speeds, and speculative decoding could help too (though that EAGLE model is 392 MB, which is scarily tiny).
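      Rough sanity check on that ~72 GB figure, assuming Q4_K_M averages about 4.85 bits per weight (an approximation – the real mix of quant types varies per tensor):

```python
total_params = 119e9       # total parameters, from the post
bits_per_weight = 4.85     # rough Q4_K_M average; assumed, varies per tensor
gb = total_params * bits_per_weight / 8 / 1e9
print(f"{gb:.0f} GB")      # ~72 GB, in line with the quoted file size
```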