Zotac accidentally lists RTX 5090, RTX 5080, and RTX 5070 family weeks before launch — accidental listing seemingly confirms the RTX 5090 with 32GB of GDDR7 VRAM

Xatolos@reddthat.com · 2 years ago


BaroqueInMind · 2 years ago

I just want one to self host a 70B LLM model for fuck’s sake. I don’t want to be forced to take out a god damned mortgage/personal loan to buy one.

2 years ago

Damn, they require like 50GB of VRAM, that’s nuts.

teuto@lemmy.teuto.icu · 2 years ago

I picked up a pair of old Tesla P40s. Right now I’m running a Q4 quant of Qwen 2.5 72B that fits in the combined 48GB of VRAM with 12k context. They aren’t as fast as newer consumer cards, but it generates as fast as I can read while costing less than a used 3080.
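For a rough sense of why a setup like this fits, here is a back-of-the-envelope sketch of the memory needed for the weights alone. The figure of about 4.5 bits per weight for a Q4 quant is an assumption (the exact overhead varies by quant format), and the helper name `weight_gb` is made up for illustration. It ignores the KV cache and runtime overhead, which also take VRAM and grow with context length.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9,
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_weight / 8


# Assumed precisions: FP16 is 16 bits; ~4.5 bits is a rough guess for a Q4 quant.
PRECISIONS = [("FP16", 16.0), ("8-bit", 8.0), ("~4.5-bit (Q4, assumed)", 4.5)]

for params in (70, 72):
    for label, bits in PRECISIONS:
        print(f"{params}B @ {label}: ~{weight_gb(params, bits):.1f} GB")
```

Under those assumptions a 72B model at about 4.5 bits per weight comes to roughly 40.5 GB of weights, which leaves a few GB of the 48GB for context, while the same model at FP16 would need about 144 GB.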

BatrickPateman@lemmy.world · 2 years ago

interesting. They are cooled passively, right? What’s your case and cooling setup?

teuto@lemmy.teuto.icu · 2 years ago

I have a Dell PowerEdge 730, which was about $200. Its CPU shrouds perfectly match the GPU intakes, so air just flows through both from the server fans. I’ve seen a few 3D-printable fan mounts for jury-rigging them into a regular tower too.

brucethemoose@lemmy.world · edit-2 2 years ago

Qwen 2.5 32B is where it’s at now. 24GB is affordable, and it fits perfectly.

Otherwise, stay on the lookout for AMD Strix Halo, which can reportedly allocate up to 96GB on its IGP, and you can run faster backends like vllm or exllama.

BaroqueInMind · 2 years ago

What’s up with Qwen that makes it better than anything else?

brucethemoose@lemmy.world · edit-2 2 years ago

It’s just smarter with the same number of parameters. Try Qwen QwQ or Qwen coder 32B, see for yourself… it stacks up well against huge models like the 123B Mistral Large, or even GPT-4.

Why? Alibaba trained it well, presumably with better data than OpenAI or whoever else, though specifics are up for debate. Some suggest that bilingual training on English/Chinese (aka the two largest text corpora in existence) significantly helps the model compared with training on mostly English. Some say the government just gave them better data. There are also suggestions that having so few GPUs compared to American AI companies made the Chinese “thrifty,” and gave them far more incentive to be innovative rather than brute forcing models (which has diminishing returns).

Ragdoll X@lemmy.world · 2 years ago

You might just want to use Kaggle tbh

BaroqueInMind · 2 years ago

Never heard of it.