Abstract: The rise of conversational AI and multimodal streaming applications has led to a significant demand for low-latency Text-to-Speech (TTS) systems. This work presents a multilingual ...
Abstract: Recently, audio-visual speech recognition has attracted increasing attention. However, most existing works have focused only on scenarios with two speakers. In this work, we study the effect of ...