Docket #: S21-205

Txt2Vid: Compressing Talking-head Videos to Text

Researchers at Stanford University, UCSB, and MIT have invented a novel video compression pipeline, called Txt2Vid, that substantially reduces data transmission rates by compressing webcam videos ("talking-head videos") to a text transcript. The transcript is transmitted and then decoded at the recipient's end into a realistic reconstruction of the original video, using recent advances in deep-learning-based voice cloning and lip-syncing models.
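The sender/receiver split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Txt2Vid implementation: `TextCodecSketch` and the three model callables are hypothetical names, the sender is assumed to run a speech-to-text step, and the receiver is assumed to already hold a reference voice sample and face image for the speaker (the transcript itself carries no identity information).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interfaces. The source names the model families
# (voice cloning, lip syncing) but no concrete API, so these callables
# are placeholders for whatever models a deployment plugs in.
Transcriber = Callable[[bytes], str]         # speech audio -> transcript
VoiceCloner = Callable[[str, bytes], bytes]  # (text, voice sample) -> speech audio
LipSyncer = Callable[[bytes, bytes], bytes]  # (speech audio, face reference) -> video


@dataclass
class TextCodecSketch:
    transcribe: Transcriber
    clone_voice: VoiceCloner
    lip_sync: LipSyncer

    def encode(self, speech_audio: bytes) -> bytes:
        """Sender: reduce the talking-head stream to a UTF-8 transcript."""
        return self.transcribe(speech_audio).encode("utf-8")

    def decode(self, payload: bytes, voice_sample: bytes, face_ref: bytes) -> bytes:
        """Receiver: synthesize speech from the text, then animate the face."""
        speech = self.clone_voice(payload.decode("utf-8"), voice_sample)
        return self.lip_sync(speech, face_ref)


# Usage with trivial stand-in models, only to show the data flow.
codec = TextCodecSketch(
    transcribe=lambda audio: "hello class",
    clone_voice=lambda text, voice: b"speech:" + text.encode("utf-8"),
    lip_sync=lambda speech, face: b"video:" + speech,
)
payload = codec.encode(b"<raw microphone audio>")
video = codec.decode(payload, b"<voice sample>", b"<face image>")
print(payload)  # b'hello class'
print(video)    # b'video:speech:hello class'
```

In this sketch only `payload`, the transcript bytes, would cross the network; the voice sample and face reference stay on the receiving side, which is where the bitrate saving comes from.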

This generative pipeline achieves a two-to-three order of magnitude reduction in bitrate compared to standard audio-video codecs (encoder-decoders), while maintaining equivalent quality of experience, based on a subjective evaluation by users (n=242) in an online study. The Txt2Vid framework opens up novel applications, such as audio-video communication during poor internet connectivity or in remote terrain with limited bandwidth. Additionally, the transmitted text could be translated into different languages or decoded into different voices and faces, creating a custom end-user experience for teaching and more.

Applications

Video compression at extremely low bitrates for good-quality communication in areas with low internet connectivity
Video conferencing with text as the compression format, enabling real-time language translation and/or hybrid communication (i.e., where either typed or spoken messages can be transmitted and then reconstructed into voice with the same experience for the end user)
Transmission of pedagogical content for remote learning and online instruction, with the ability to reconstruct compressed content into more engaging formats (e.g., generate a math lesson in which a favorite movie character is the teacher)

Advantages

Compression of audio-video communication by two to three orders of magnitude (500-1000x) with similar quality of experience
Functions at bitrates as low as 100 bps
Extremely low bandwidth requirements make communication accessible in areas of poor Internet availability
Flexibility to operate as a video player platform, a video streaming platform, or a real-time communication platform
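As a rough sanity check on these figures, the arithmetic below estimates the bitrate of a raw text transcript and the conventional audio-video bitrate implied by the stated 500-1000x ratios. The speaking rate (150 words per minute), average word length (6 characters including the space), and 8 bits per character are illustrative assumptions, not measurements from the study, and no text compression is applied.

```python
def transcript_bitrate_bps(words_per_minute: float,
                           chars_per_word: float,
                           bits_per_char: float = 8.0) -> float:
    """Bitrate of an uncompressed text transcript, in bits per second."""
    return words_per_minute * chars_per_word * bits_per_char / 60.0


def implied_codec_bitrate_bps(text_bps: float, compression_ratio: float) -> float:
    """Audio-video bitrate that a given compression ratio implies."""
    return text_bps * compression_ratio


# Assumed speaking rate and word length (illustrative only).
text_bps = transcript_bitrate_bps(words_per_minute=150, chars_per_word=6)
print(text_bps)  # 120.0

# Stated figures: 100 bps transcript, 500-1000x compression.
for ratio in (500, 1000):
    kbps = implied_codec_bitrate_bps(100, ratio) / 1000
    print(f"{ratio}x -> {kbps:.0f} kbps")  # 500x -> 50 kbps, 1000x -> 100 kbps
```

Under these assumptions an uncompressed transcript already sits near the 100 bps figure, and the 500-1000x range corresponds to comparing against an audio-video stream of roughly 50-100 kbps.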

Publications

Tandon, Pulkit, et al. "Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text." arXiv preprint arXiv:2106.14014 (2021).

Patents

Published Application: 20220417291

Innovators

Licensing Contact

Imelda Oropeza

Senior Licensing Manager, Physical Sciences

Similar Technologies

Method and System to Model TCP Throughput, Assess Power Control Measures, and Compensate for Fading and Path Loss for Highly Mobile Broadband Systems

S05-186

Distributed Audio Transcoding For Peer-to-Peer Systems

S11-078

Adaptive Playout Scheduling for VoIP and Multimedia over IP

S01-088


Explore similar technologies by keyword:

Physical Science
- Software
  - Compression (Internet)
  - Photo & Video Compression