The local model scene is getting really good; I just wish I'd sprung for more RAM and GPU cores when I bought my M1 MacBook.
Even so, I can still run 8-12 GB models that are decently good, and I'm looking forward to the new Qwen3 30B A3B to move my tool use local.
Be sure to grab a DWQ quant, like this:
https://huggingface.co/nightmedia/Qwen3-30B-A3B-Instruct-2507-dwq3-mlx
https://huggingface.co/models?sort=modified&search=DWQ
DWQ (distilled weight quantization) is a quantization method for MLX models that holds up much better at tight quants around 3-4 bpw.
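If you want to try one of those quants, mlx-lm will pull it straight from the Hub. A minimal sketch, assuming mlx-lm is installed (`pip install mlx-lm`); the repo name is just the first link above, and the prompt is a placeholder:

```python
# Minimal sketch: run a DWQ quant locally with mlx-lm.
from mlx_lm import load, generate

# Repo from the first link above; weights download on first run.
model, tokenizer = load("nightmedia/Qwen3-30B-A3B-Instruct-2507-dwq3-mlx")

# Wrap the question in the model's chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize what DWQ quantization is."}],
    add_generation_prompt=True,
    tokenize=False,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```

Tool use should work the same way: Qwen3's chat template accepts a tools= list of function schemas in apply_chat_template, so you can pass in your tool definitions and parse the tool-call blocks the model emits.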
I'm really hoping local AI becomes the next "graphics card" story: specialized chips to bring down power consumption, and OSes built around them with an AI installation interface.
The new Ryzen AI chips and Apple's Neural Engine (or whatever it's called) have great performance per watt and can run strong local models.
Intel also announced they’re going this route.
I know soldered memory isn't popular, but right now the performance and energy benefits are big; you just have to buy the higher-memory configs up front.
I think NVIDIA will keep doing their massive GPU toaster ovens; Project Digits (now DGX Spark) was supposed to be their low-energy competitor and has been underwhelming.