Key architectural details

Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization.

119B total parameters, with 6B active parameters per token (8B including embedding and output layers).

256k context window, supporting long-form interactions and document analysis.

Configurable reasoning effort: Toggle between fast, low-latency responses and deep, reasoning-intensive outputs.

Native multimodality: Accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
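The MoE bullet above is the whole efficiency story: only 4 of the 128 expert FFNs run for each token, so per-token compute scales with the ~6B active parameters rather than the 119B total. A minimal top-k routing sketch in NumPy (all names and shapes here are illustrative, not Mistral's actual implementation):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=4):
    """Route one token through the top-k of n experts, softmax-weighted.

    x: (d,) token hidden state; gate_w: (n_experts, d) router weights;
    experts: list of callables, each mapping (d,) -> (d,).
    Illustrative sketch only -- real MoE layers batch tokens per expert.
    """
    logits = gate_w @ x                       # (n_experts,) router scores
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # softmax over selected experts only
    # Only k expert FFNs execute per token -- why 119B total parameters
    # can still mean only ~6B active per token.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

# Toy instance: 128 tiny "experts", 4 active per token, as in the post.
rng = np.random.default_rng(0)
d, n = 16, 128
experts = [lambda v, W=rng.standard_normal((d, d)) / d: W @ v for _ in range(n)]
gate_w = rng.standard_normal((n, d))
y = moe_forward(rng.standard_normal(d), gate_w, experts)
print(y.shape)  # (16,)
```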

  • panda_abyss@lemmy.ca · 9 hours ago

    Looks a little underwhelming, with Qwen3.5 and Haiku beating it.

    However, 6B active parameters and training toward short responses could make this a useful alternative to Qwen as a local model. I’ve overall found Mistral models better to discuss with, but the Devstral Small models were kinda janky last I used them (stuff like infinite loops and getting confused by less common programming languages). Qwen models are by far the most verbose out of the box and happily burn a ton of tokens on useless thought – an over-emphasis on reinforcement learning.

    Also weird that they use GPT 4.1 as the judge model. That’s a year-old model, not nearly SOTA, and IIRC it underwhelmed on most metrics. It feels like a poor choice of judge.

    Edit: it’s actually GPT-5 – some of the charts are labelled wrong

    Not mentioned in the blog post, but on HF: they created a small speculative decoding model to go with it – https://huggingface.co/mistralai/Mistral-Small-4-119B-2603-eagle

    That should accelerate inference speeds on some setups.
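    For context, speculative decoding lets a tiny draft model propose several tokens that the big model then verifies, keeping the agreed prefix. A toy sketch of that control flow (real schemes, including EAGLE, verify all drafts in one target forward pass and use a probabilistic accept/reject rule; everything here is simplified and illustrative):

```python
import random

def speculative_step(draft_sample, target_accepts, k=4):
    """One round of (greatly simplified) speculative decoding.

    draft_sample(): cheaply proposes the next token.
    target_accepts(tok): the big model's verdict on a drafted token.
    Toy version: accept the agreeing prefix, stop at the first mismatch.
    """
    drafted = [draft_sample() for _ in range(k)]
    accepted = []
    for tok in drafted:
        if target_accepts(tok):
            accepted.append(tok)  # target agrees: keep the drafted token
        else:
            break                 # first disagreement ends the round
    # On rejection, the target model would supply the next token itself,
    # so correctness never depends on the draft model being good.
    return accepted

random.seed(1)
out = speculative_step(lambda: random.choice("abcd"),
                       lambda t: t in "abc", k=4)
print(out)
```

    The win: when draft and target mostly agree, each expensive target pass yields several tokens instead of one, which is why even a 392 MB draft model can meaningfully speed up a 119B model.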

    • MalReynolds@slrpnk.net · 7 hours ago

      For certain values of small…

      That said, Mistral is strong in world knowledge, and something this big likely is too. With only 6B active parameters it can run from reasonable amounts of system RAM (Q4_K_M is ~72 GB, so it’d likely run fine with 64 GB of system RAM plus 24 GB of VRAM) at reasonable if not spectacular speeds, and speculative decoding could help too (though that EAGLE model is 392 MB, which is scarily tiny).
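      Rough sanity check on that ~72 GB figure, assuming Q4_K_M averages about 4.85 bits per weight (an approximation – the real mix of quant types varies per tensor):

```python
total_params = 119e9       # total parameters, from the post
bits_per_weight = 4.85     # rough Q4_K_M average; assumed, varies per tensor
gb = total_params * bits_per_weight / 8 / 1e9
print(f"{gb:.0f} GB")      # ~72 GB, in line with the quoted file size
```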