Definition of can dish it but can’t take it

  • P03 Locke@lemmy.dbzer0.com
    link
    fedilink
    English
    arrow-up
    2
    ·
    9 hours ago

    DeepSeek API isn’t free, and to use Qwen you’d have to sign up for Ollama Cloud or something like that

    To use Qwen, all you need is a decent video card and a local LLM server like LM Studio.

    Local deploying is prohibitive

    There’s a shitton of LLM models in various sizes to fit the requirements of your video card. Don’t have the 256GB VRAM requirements for the full quantized 8-bit 235B Qwen3 model? Fine, get the quantized 4-bit 30B model that fits into a 24GB card. Or a Qwen3 8B Base with DeepSeek-R1 post-trained Q 6-bit that fits on a 8GB card.

    There are literally hundreds of variations that people have made to fit whatever size you need… because it’s fucking open-source!

    • lacaio@mander.xyz
      link
      fedilink
      arrow-up
      1
      ·
      7 hours ago

      Training LLMs is very costly, and open-weights aren’t open-source. For example, there are some LLMs in Brazil, but there is a notable case for a brazilian student on the University of Dusseldorf that banded together with two other students of non-brazilian origin to make a brazilian LLM. 4B model. They used Google to train the LLM, I think, because any training on low VRAM won’t work. It took many days and over $3000 dollars. The name is Tucano.

      I know it looks cheap because there are many, but many country initiatives are eager on AI technology. It’s costly.