Key architectural details

Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization.

119B total parameters, with 6B active parameters per token (8B including embedding and output layers).

256k context window, supporting long-form interactions and document analysis.

Configurable reasoning effort: Toggle between fast, low-latency responses and deep, reasoning-intensive outputs.

Native multimodality: Accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
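The active-parameter figure above follows directly from the MoE routing arithmetic: every token pays for the shared (attention/embedding) weights plus only the experts it is routed to. A minimal sketch — the ~2.4B shared / ~114.6B expert split below is an illustrative assumption chosen to be consistent with the published totals, not a disclosed breakdown:

```python
def active_params(shared, expert_total, n_experts, n_active):
    """Parameters touched per token in a simple MoE:
    all shared weights plus the n_active routed experts."""
    per_expert = expert_total / n_experts
    return shared + per_expert * n_active

# Assumed split: ~2.4B shared weights, ~114.6B spread across 128 experts.
# Routing 4 of 128 experts then gives roughly the 6B active quoted above.
approx_active = active_params(2.4e9, 114.6e9, n_experts=128, n_active=4)
```

This is why a 119B-parameter MoE can decode at speeds closer to a dense ~6B model: per-token compute scales with active parameters, even though all 119B must be resident in memory.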

  • MalReynolds@slrpnk.net · 1 day ago
    For certain values of small…

    That said, Mistral is strong on world knowledge, and a model this big likely is too. With only ~6B active parameters per token, it can fit in reasonable amounts of system RAM (Q4_K_M is ~72 GB, so it'd likely run acceptably split across 64 GB of system RAM and 24 GB of VRAM) at reasonable if not spectacular speeds. Speculative decoding could help too (though that EAGLE draft model is 392 MB, which is scarily tiny).
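    The ~72 GB figure checks out against the quant's average bit-width. A rough back-of-envelope sketch — the ~4.85 bits/weight average for Q4_K_M (a mix of 4- and 6-bit blocks) is an approximation, and real GGUF files add small per-block scale overheads:

    ```python
    def quant_size_gb(n_params, bits_per_weight):
        # Rough on-disk size: parameters x bits / 8, in decimal GB.
        # Ignores metadata and per-block scales, so treat as a lower-ish bound.
        return n_params * bits_per_weight / 8 / 1e9

    # 119B parameters at ~4.85 bits/weight lands near the ~72 GB quoted above,
    # hence the 64 GB RAM + 24 GB VRAM split being plausible.
    size_gb = quant_size_gb(119e9, 4.85)
    ```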