Key architectural details

Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization; see the routing sketch after this list.

119B total parameters, with 6B active parameters per token (8B including embedding and output layers).

256k context window, supporting long-form interactions and document analysis.

Configurable reasoning effort: Toggle between fast, low-latency responses and deep, reasoning-intensive outputs.

Native multimodality: Accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
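A rough way to see why only ~6B of the 119B parameters run per token: each MoE layer scores all 128 experts but executes only the 4 it routes the token to, so most expert weights sit idle on any given forward pass. Below is a minimal sketch of that top-k routing step, assuming a plain softmax router; the expert count and active-per-token figures come from the list above, while the hidden size, router weights, and function names are illustrative placeholders rather than the model's actual implementation.

```python
# Minimal sketch of top-k mixture-of-experts routing (128 experts, 4 active per token).
# Dimensions and the router itself are assumed placeholders, not the real architecture.
import numpy as np

NUM_EXPERTS = 128   # total experts per MoE layer (from the spec above)
TOP_K = 4           # experts activated per token (from the spec above)
D_MODEL = 1024      # hidden size: assumed placeholder value

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) / np.sqrt(D_MODEL)

def route(token_hidden: np.ndarray) -> list[tuple[int, float]]:
    """Pick the top-k experts for one token and return (expert_id, weight) pairs."""
    logits = token_hidden @ router_w                  # one score per expert, shape (NUM_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]                 # indices of the 4 highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                          # softmax over the selected experts only
    return list(zip(top.tolist(), weights.tolist()))

token = rng.standard_normal(D_MODEL)
print(route(token))  # e.g. [(17, 0.31), (42, 0.22), ...] -- only these 4 experts run for this token
```

The design point this illustrates is that total parameter count governs memory footprint while the per-token compute tracks the active parameters, which is why a 119B-parameter model can have inference cost closer to a dense model in the single-digit-billions range.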

  • fubarx@lemmy.world · 6 hours ago

    At this point, these small models should add explicit minimum hardware requirements just so they can stand out. STM32 w xxGB of PSRAM. Android phone w this much RAM, how many TOPS, and minimum OS version. ESP32-S3 or S4? That sort of thing.

    If you just say ‘small,’ you get lost in the noise.

    • ikt@aussie.zone (OP) · edited · 5 hours ago

      tbh that’s the main thing I took away from this: since when did small equal 119B?!

      Does that mean they’ve got large models lined up approaching 1tb?

      • fubarx@lemmy.world · 4 hours ago

        Cloud-based LLMs have been commoditized. Lots of options.

        There’s room for someone to lead the local on-device space. Anything from a high-end workstation (Apple Studio, Nvidia DGX Spark, AMD Strix) to laptop (MBPro, Windows AI) down to embedded (Qualcomm, STM32) and ultra-small (ESP32, ARM/RISC).

        Lots of room there and no clear winners. Mistral, at this point, could focus on the other tiers, make a name, and carve out a lot of mindshare.