Selfhosted LLM (ChatGPT)

@autopilot@lemmy.world · 11 months ago

@pe1uca@lemmy.pe1uca.dev · 11 months ago

How do you know how much RAM a model needs?

redcalcium · 11 months ago

The model creator usually mentions it in the README. For example, Falcon-7B's says:

You will need at least 16GB of memory to swiftly run inference with Falcon-7B.

Most models also support CPU inference. It's tremendously slow, but it works in a pinch.

@CeeBee@lemmy.world · 11 months ago

Memory use roughly tracks the model's parameter count multiplied by the precision it runs at (e.g. 7B parameters at f16, which is 2 bytes per parameter, come to about 14 GB for the weights alone). Running an optimized 8-bit or even 4-bit version reduces memory usage further, though depending on the framework it may increase execution time.
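As a rough sketch of that rule of thumb: weight memory is parameter count times bits per parameter divided by 8. The function name below is hypothetical, and the estimate covers only the weights; it ignores the KV cache, activations, and framework overhead, so real usage will be higher (which is roughly why a 7B model at f16 is quoted as needing 16GB rather than 14).

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough lower bound on memory (decimal GB) needed just to hold the weights.

    Hypothetical helper: excludes KV cache, activations and runtime overhead.
    """
    bytes_total = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9  # convert bytes to decimal gigabytes


if __name__ == "__main__":
    # Compare a 7B-parameter model at the precisions mentioned above.
    for label, bits in [("f16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"7B @ {label}: ~{estimate_weight_memory_gb(7e9, bits):.1f} GB")
```

Halving the bits per parameter halves the weight memory, which is why 8-bit and 4-bit quantized versions fit on much smaller GPUs or in ordinary system RAM.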

It's entirely dependent on the model, the framework, and the hardware (CPU vs GPU).

Generally, the model's repo should state somewhere what you need.