The theory taking the rich by storm: China funds data center haters

Yuritopiaposadism [none/use name]@hexbear.net · 2 days ago

The theory taking the rich by storm: China funds data center haters

LaughingLion [any, any]@hexbear.net · 2 days ago

Qwen was the gold standard for small local models for a minute, too. Gemma4 is doing some good stuff with optimization.

CodexArcanum@lemmy.dbzer0.com · 2 days ago

I’ve still been using Qwen 2.6 locally to good effect. What are you on now? 2.7 seems good too, but I’m very cagey about funding AI companies and so have had limited access. Same deal with Deepseek4, which seems alright.

LaughingLion [any, any]@hexbear.net · 2 days ago

Recently? Gemma4 locally. Just got around to playing with the 12B Quantization-Aware Training (QAT) model a little today. Surprisingly powerful for only 12 billion parameters. What surprises me most is the efficiency stuff that’s packed into it. I’ve only got 8GB of VRAM, yet I can get 64k of context with faster-than-I-can-read output. Before that I was using the 26B MoE model. Same context but faster, at 20 tps.

We are starting to get to the era where people can run their own models that are more than capable for the average user on pretty standard hardware.

bourgeoisie_burgers [none/use any]@hexbear.net · edited 2 days ago

I run one on a laptop, something like a 3B, and it’s slow. Thinking of aiming at getting something to run a 12B. I think I can copy the setup at work, rebuild it locally, and use it myself. What is your setup for running 12B? Is it VRAM that’s the bottleneck? Can I budget down everything else and just get a good GPU? Also, are there any recommendations for a truly jailbroken AI that I can tweak myself, without built-in censoring or safety guards?

LaughingLion [any, any]@hexbear.net · edited 4 hours ago

For at-home hobbyist setups, VRAM is almost always your main bottleneck, followed by RAM. It’s not even speed so much as capacity. If you really want to budget a system, get a 16GB 5060 first, which is about as good a bang for your buck as you can get for starting with AI at home. Then build around that. DDR clock speed can also help, because anything that doesn’t fit into VRAM has to go to system RAM. 16GB of VRAM can easily fit the new Gemma4 12B QAT model, which sits at 6.5GB at quant4, giving you tons of room for context. Gemma4 26B, which is an MoE model, can also fully fit, with less room for context (or more if you drop some tensors). Both will give you great performance. I’d recommend 32GB of RAM for that setup, unless you plan on shoving a second 5060 in there later for bigger models, in which case maybe go for 64GB of RAM. It all depends on how much money you are working with. Nvidia is still king, as ROCm and Vulkan still aren’t up to par with CUDA for AI applications, unfortunately. For your CPU, AMD versus Intel doesn’t matter.
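As a rough sanity check on the sizing above, here is a minimal back-of-the-envelope sketch in Python. It assumes weight size is roughly parameter count times bits per weight, and it sets aside a guessed 1GB for the runtime and display; the function names and that reserve are illustrative, not taken from any tool. Real quantized files run a bit larger than the raw estimate, which is consistent with the 6.5GB figure quoted for the 12B model at 4-bit.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    params_billions * 1e9 weights * bits / 8 bytes each, divided by 1e9.
    Ignores metadata and any tensors kept at higher precision.
    """
    return params_billions * bits_per_weight / 8


def context_headroom_gb(vram_gb: float, model_file_gb: float,
                        reserved_gb: float = 1.0) -> float:
    """VRAM left for context (KV cache) after loading the model.

    reserved_gb is an assumed allowance for the runtime, display, etc.
    Returns 0.0 when the model does not fit entirely in VRAM.
    """
    return max(vram_gb - model_file_gb - reserved_gb, 0.0)


if __name__ == "__main__":
    # Raw estimate for a 12B model at 4 bits per weight.
    print(f"12B @ 4-bit, weights only: {weight_footprint_gb(12, 4):.1f} GB")

    # Using the 6.5GB file size quoted in the thread.
    for vram in (8, 16):
        room = context_headroom_gb(vram, 6.5)
        print(f"{vram} GB card: about {room:.1f} GB left for context")
```

How much context that headroom actually buys depends on the model's architecture and the KV cache precision, so treat the result as a budgeting guide rather than a token count.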

Additionally, if you are playing with 16GB of VRAM you might as well grab StabilityMatrix and download ComfyUI through it, then link up a CivitAI account and start downloading checkpoints and LoRAs. Then you can play with video generation (LTX 2.3) and image generation (Illustrious, Anima, ZimageTurbo, Flux Klein 9B, Ideogram 4.0, whatever; take your pick, they are all good for their own things). With 16GB you can even create your own LoRAs with AItoolkit. I’m able to create LoRAs with 8GB of VRAM, but it is very slow and has some limitations.