I think the term you’re looking for is “generative AI”
Nope. LLMs are still what’s used for image generation. They aren’t AI though, so no.
Which part of the image is language?
Dude, it doesn’t know what it’s looking at. It isn’t intelligent. It’s just a prediction algorithm called LLMs. It doesn’t matter if it’s predicting text or pixels. It’s all LLMs.
https://botpenguin.com/blogs/comparing-the-best-llms-for-image-generation
What do you think LLM stands for?
Large Language Image
You can generate images without ever using any text, by uploading and combining images to create new things.
No LLM will be used in that context.
Holy confidently incorrect
LLMs aren’t generating the images. When you’re “using an LLM for image generation”, what’s actually happening is that the LLM talks to an image generation model and then gives you the image.
Ironically, there’s a hint of truth in it though, because for text-to-image generation the model does need to map words into a vector space to understand the prompt, which is also what LLMs do. (And I don’t know enough to say whether the image generation offered through LLMs just has the LLM provide the vectors directly to the image gen model rather than providing a prompt text.)
You could also consider the whole thing as one entity in which case it’s just more generalized generative AI that contains both an LLM and an image gen model.
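To make the "LLM talks to an image gen model" idea concrete, here is a toy Python sketch of the variant where the LLM hands over plain prompt text and the image model does its own word-to-vector mapping. Everything in it is made up for illustration: `toy_llm_write_prompt`, `toy_image_model` and `embed_prompt` are hypothetical names, and the hash-based "embedding" is only a stand-in for a learned text encoder. It does not claim to show how any real product wires these pieces together.

```python
import hashlib


def embed_prompt(prompt: str, dim: int = 8) -> list[list[float]]:
    """Toy stand-in for a text encoder: one deterministic vector per word.

    Real encoders are learned neural networks; hashing is an assumption
    used here only so the example runs without any model weights.
    """
    vectors = []
    for word in prompt.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        # Scale the first `dim` bytes into the range [0, 1].
        vectors.append([b / 255.0 for b in digest[:dim]])
    return vectors


def toy_llm_write_prompt(user_request: str) -> str:
    """Hypothetical LLM step: turn a chat request into an image prompt (text)."""
    return f"detailed illustration of {user_request.strip()}"


def toy_image_model(prompt: str) -> dict:
    """Hypothetical image model: embeds the prompt itself, then 'generates'."""
    vectors = embed_prompt(prompt)
    # A real model would condition pixel generation on these vectors.
    return {"prompt": prompt, "conditioning_vectors": len(vectors)}


# The "LLM" only produces text; the "image model" does the vector mapping.
result = toy_image_model(toy_llm_write_prompt("a penguin on a skateboard"))
print(result["prompt"])
print(result["conditioning_vectors"])
```

In the other variant mentioned above, the LLM would pass vectors straight to the image model instead of a prompt string. Seen as one entity, both versions are just a single function from a request to an image.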