Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

Chris Remington · 7 months ago

@flora_explora@beehaw.org · 7 months ago

Wouldn’t you then have to run the AI locally on a machine (which probably draws a lot of power and memory) or use it via the cloud (which depends on bandwidth just like a video call)? I don’t really see where this technology could actually be useful. Sure, it would work if it were only a minor computation, like what any modern smartphone does when you take a picture or video. But computing an entire face and voice seems much more complicated than that and not really feasible for the usual home device.

@Markaos · 7 months ago

Yeah, it’s not practical right now, but in 10 years? Who knows, we might finally have some built-in AI accelerator capable of running big neural networks on consumer CPUs by then (we do have AI accelerators in a large chunk of current CPUs, but they’re not up to the task yet). The system memory should also go up now that memory-hungry AI is inching closer to mainstream use.

Sure, Internet bandwidth will also increase, meaning this compression will be less important, but on the other hand, it’s not like we stopped improving video codecs after H.264 because it was good enough - there are better codecs now even though we have the resources to handle bigger H.264 videos.

The technology doesn’t have to be useful right now - for example, neural networks capable of learning have been studied since the 1940s, even though there would be no way to run them for many decades, and it would take even longer to run them in a useful capacity. But now that we have the technology to do so, they enjoy rapid progress building on top of that original foundation.

@flora_explora@beehaw.org · 7 months ago

Fair point, I agree.

@barsoap@lemm.ee · edit-2 7 months ago

A model that can only generate frontal-to-profile views of heads would be quite small; I can totally see that kind of thing running on current consumer GPUs in real time. Near real time is already possible with SDXL-based models with some speedup tricks applied, as long as you have a mid-range gaming GPU, and those models are significantly more general. It’s not like the model would need to generate spaghetti and sports cars alongside the head.