Anything that allows us to run our own software instead of living on “the cloud” is a good thing in my book.
Amen
You can already run LLMs on consumer hardware. They’re slow, but they definitely work. My machine is midrange hardware from about 5 years ago, and I can run a small model at a pretty reasonable speed, or a medium (but “smarter”) one quite slowly, at roughly 2 seconds per token.
A 3B Llama 2 model (or a 7B) can be run on literally anything. Something like this is all it takes (a minimal sketch using llama-cpp-python; the model path is a placeholder for whatever quantized GGUF file you’ve downloaded):
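
```python
# Minimal local inference sketch with llama-cpp-python
# (pip install llama-cpp-python). The model path below is an
# assumption -- point it at any small quantized GGUF model on disk.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```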
I’d assume at some point the bottleneck becomes memory bandwidth rather than compute, since generating each token means streaming essentially all of the model’s weights through memory.
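
Rough back-of-envelope numbers (my assumptions, not benchmarks):

```python
# Token generation is roughly bandwidth-bound: every token requires
# reading (nearly) all weights from memory once.
model_size_gb = 4.0    # assumption: ~7B model at 4-bit quantization
bandwidth_gbs = 50.0   # assumption: typical dual-channel DDR4

tokens_per_sec = bandwidth_gbs / model_size_gb
print(f"~{tokens_per_sec:.0f} tokens/s upper bound")  # ~12 tokens/s
```

So even with infinite compute, a CPU box like that tops out around a dozen tokens per second on a 7B model, which matches the “slow but it works” experience above.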