Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

@pelespirit@sh.itjust.works · 7 months ago

@simple@lemm.ee · 7 months ago

That lip sync is scary good. It’s still a little off, and the teeth are weirdly stretchy, but nobody would notice it’s a deepfake at first glance.

Seems very similar to Nvidia’s idea of sending only a moving photo for video calls to reduce the bandwidth needed. Very nice.

Aatube · edited 7 months ago

We’d need better optimization and more powerful processing on ye average laputopu for that to happen.