Are these USB "AI edge accelerators" useful for running models locally?

aanes_appreciator [he/him, comrade/them]@hexbear.net · 1 month ago

Are these USB "AI edge accelerators" useful for running models locally?

towhee@hexbear.net · 1 month ago

Roughly speaking, generating each token requires the computer to read through the entire model in memory, so memory bandwidth is usually the constraint. If you put a 10 GB model on it and the memory bandwidth is 10 GB/s (number made up), it will take about one second per token. If you have multiple compute cores, each perhaps with its own 10 GB/s memory bandwidth limit, and the model is split across them, you can divide that one second by the number of cores to get the time per token.
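A minimal sketch of that back-of-envelope arithmetic, assuming a memory-bound model where every token streams the full weights once and the cores split the model evenly with no other overhead. The function names are illustrative, and the 10 GB and 10 GB/s figures are the made-up numbers from the comment, not measurements.

```python
def seconds_per_token(model_gb: float, bandwidth_gb_s: float, cores: int = 1) -> float:
    """Best-case time per token for a memory-bound model.

    Assumes each token reads the whole model once and that `cores`
    independent cores, each with `bandwidth_gb_s` of memory bandwidth,
    share the model evenly (the idealised case described above).
    """
    if model_gb <= 0 or bandwidth_gb_s <= 0 or cores < 1:
        raise ValueError("size and bandwidth must be positive, cores >= 1")
    total_bandwidth = bandwidth_gb_s * cores  # aggregate GB/s across cores
    return model_gb / total_bandwidth


def tokens_per_second(model_gb: float, bandwidth_gb_s: float, cores: int = 1) -> float:
    """Inverse of seconds_per_token: best-case generation speed."""
    return 1.0 / seconds_per_token(model_gb, bandwidth_gb_s, cores)


if __name__ == "__main__":
    # The made-up numbers from the comment: a 10 GB model at 10 GB/s.
    print(seconds_per_token(10, 10))           # 1.0
    print(seconds_per_token(10, 10, cores=4))  # 0.25
    print(tokens_per_second(10, 10, cores=4))  # 4.0
```

Treat the result as a best case; real throughput is usually lower.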

Idk why you would use a USB stick and not just run it in CPU/RAM on an ordinary computer. Small models are shit anyway though (even against the baseline of large/frontier hosted models being shit).

aanes_appreciator [he/him, comrade/them]@hexbear.net · 1 month ago

Laziness, and the prospect of a cheap hack to avoid having to drag my server out of its confines to sort it. Saw an ad a while ago and have had the thought ever since!

Oh, and the Coral TPUs are at least M.2, but yeah, I can see why USB dongles are just a meme…

towhee@hexbear.net · 1 month ago

At least try running a local model on your regular computer first to see whether you can deal with how shit they are. The quality of a model roughly scales with its size in memory (that’s why the memory chip market is fucked right now). Computation speed only controls how fast it generates tokens.
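To get a feel for what fits on a regular computer, here is a rough sketch of a model's weight size in memory, assuming each parameter is stored in a fixed number of bits (16 for unquantized half precision, around 4 for a heavily quantized file). It ignores the KV cache and runtime overhead; the `headroom_gb` default is an arbitrary assumption rather than a recommended value, and the parameter counts in the example are illustrative.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits each / 8 bits per byte -> billions of bytes = GB
    return params_billions * bits_per_weight / 8


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed headroom fit in RAM.

    headroom_gb is a hypothetical allowance for the OS, KV cache, etc.
    """
    return weights_gb(params_billions, bits_per_weight) + headroom_gb <= ram_gb


if __name__ == "__main__":
    print(weights_gb(8, 16))              # 16.0
    print(weights_gb(8, 4))               # 4.0
    print(fits_in_ram(70, 4, ram_gb=32))  # False: 35 GB of weights alone
    print(fits_in_ram(8, 4, ram_gb=16))   # True
```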