Microsoft demos deepfake AI that's too realistic to release

Microsoft this week demoed VASA-1, a framework for creating videos of people talking from a still image, audio sample, and text script, and claims – rightly – it's too dangerous to be released to the public.

These AI-generated videos, in which people can be convincingly animated to speak scripted words in a cloned voice, are just the sort of thing the US Federal Trade Commission warned about last month, after previously proposing a rule to prevent AI technology from being used for impersonation fraud.

Microsoft's team acknowledge as much in their announcement, which explains the technology is not being released due to ethical considerations. They insist that they're presenting research for generating virtual interactive characters and not for impersonating anyone. As such, there's no product or API planned.

"Our research focuses on generating visual affective skills for virtual AI avatars, aiming for positive applications," the Redmond boffins state. "It is not intended to create content that is used to mislead or deceive.

"However, like other related content generation techniques, it could still potentially be misused for impersonating humans. We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection."

Kevin Surace, Chair of Token, a biometric authentication biz, and frequent speaker on generative AI, said that while there have been prior technology demonstrations of faces animated from a still frame and cloned voice file, Microsoft's demonstration reflects the state of the art.

"The implications for personalizing emails and other business mass communication is fabulous," he opined. "Even animating older pictures as well. To some extent this is just fun and to another it has solid business applications we will all use in the coming months and years."

Nonetheless, Microsoft's researchers suggest that being able to create realistic looking people and put words in their mouths has positive uses.

"Such technology holds the promise of enriching digital communication, increasing accessibility for those with communicative impairments, transforming education, methods with interactive AI tutoring, and providing therapeutic support and social interaction in healthcare," they propose in a research paper that does not contain the words "porn" or "misinformation."

While it's arguable that AI-generated video is not quite the same as a deepfake – the latter traditionally defined by digital manipulation of existing footage, as opposed to a wholly generative method – the distinction becomes immaterial when a convincing fake can be conjured without cut-and-paste grafting.

Asked what he makes of the fact that Microsoft is not releasing this technology to the public for fear of misuse, Surace expressed doubt about the viability of restrictions.

"Microsoft and others have held back for now until they work out the privacy and usage issues," he said. "How will anyone regulate who uses this for the right reasons?"

Surace added that there are already open source models that are similarly sophisticated, pointing to EMO. "One can pull the source code from GitHub and build a service around it that arguably would rival Microsoft's output," he observed. "Because of the open source nature of the space, regulating it will be impossible in any case."
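
To make Surace's point concrete: in outline, a service wrapper around an open-source talking-head model needs little more than an upload endpoint and an inference call. The Python sketch below is purely illustrative – generate_talking_head, the /animate route, and the Flask scaffolding are hypothetical stand-ins invented here, not EMO's or anyone else's actual interface.

```python
# Illustrative sketch only: a thin web service around a hypothetical
# open-source talking-head model. The model call is a stub.
import tempfile

from flask import Flask, request, send_file

app = Flask(__name__)


def generate_talking_head(image_bytes: bytes, audio_bytes: bytes) -> bytes:
    """Hypothetical stand-in: still image + voice clip -> MP4 video bytes."""
    # Whichever model was pulled from GitHub would be invoked here.
    raise NotImplementedError("plug in the model's actual inference call")


@app.post("/animate")
def animate():
    # Caller uploads a portrait photo and an audio clip as multipart form data.
    image = request.files["image"].read()
    audio = request.files["audio"].read()
    video = generate_talking_head(image, audio)
    # Write the generated MP4 bytes to a temp file and stream it back.
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        tmp.write(video)
    return send_file(tmp.name, mimetype="video/mp4")


if __name__ == "__main__":
    app.run(port=8000)
```

Everything interesting happens inside the stub; the service plumbing around it is trivial, which is exactly Surace's point about how easily such output could be rivaled.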

That said, countries around the world are trying to regulate AI-fabricated people. Canada, China, and the UK, among other nations, all have regulations that can be applied to deepfakes, some of which fulfill broader political goals. Britain just this week made it illegal to create a sexually explicit deepfake image without consent. The sharing of such images was already disallowed under the UK's Online Safety Act 2023.

In January, a bipartisan group of US lawmakers introduced the Disrupt Explicit Forged Images and Non-Consensual Edits Act of 2024 (DEFIANCE Act), a bill that creates a way for victims of non-consensual deepfake images to file a civil claim in court.

And on Tuesday, April 16, the US Senate Committee on the Judiciary, Subcommittee on Privacy, Technology, and the Law held a hearing titled "Oversight of AI: Election Deepfakes."

In prepared remarks, Rijul Gupta, CEO of DeepMedia, a deepfake detection biz, said:

The most alarming aspect of deepfakes is their ability to provide bad actors with plausible deniability, allowing them to dismiss genuine content as fake. This erosion of public trust strikes at the very core of our social fabric and the foundations of our democracy. The human brain, wired to believe what it sees and hears, is particularly vulnerable to the deception of deepfakes. As these technologies become increasingly sophisticated, they threaten to undermine the shared sense of reality that underpins our society, creating a climate of uncertainty and skepticism where citizens are left questioning the veracity of every piece of information they encounter.

But think of the marketing applications. 
