• mcv@lemmy.zip
    1 month ago

    But if that’s how you’re going to run it, why not also train it in that mode?

    • Xylight@lemdro.id
      1 month ago

      That is a thing, and it's called quantization-aware training (QAT). Some open-weight models, like Gemma, do it.

      The problem is that you need to retrain the whole model for that, and if you also want a full-precision version you need to train even more.

      The result is still less precise, so the quality will still be worse than full precision, but it does reduce the effect.
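
      The core trick behind QAT can be sketched in a few lines: during training, the forward pass "fake-quantizes" the weights (rounds them to a low-precision grid, then maps them back to floats), so the model learns weights that survive quantization. This is a minimal NumPy illustration of fake quantization only, with made-up names and no training loop; real QAT also routes gradients around the rounding step (a straight-through estimator):

      ```python
      import numpy as np

      def fake_quantize(w, num_bits=8):
          # Round to a signed integer grid, then map back to floats.
          # "Fake" because the tensor stays float32; only its values snap
          # to the levels an int8 tensor could represent.
          qmax = 2 ** (num_bits - 1) - 1                       # 127 for int8
          scale = max(float(np.abs(w).max()), 1e-8) / qmax     # per-tensor scale
          q = np.clip(np.round(w / scale), -qmax - 1, qmax)    # integer levels
          return q * scale                                     # dequantize

      rng = np.random.default_rng(0)
      w = rng.normal(size=1000).astype(np.float32)
      w_q = fake_quantize(w)
      # The rounding error is bounded by half a quantization step,
      # and w_q uses at most 256 distinct values (the int8 grid).
      ```

      Because the network sees these snapped weights at training time, it adapts to the rounding error instead of being hit by it only at inference, which is why QAT narrows (but doesn't close) the gap to full precision.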