Abstract: The rise of conversational AI and multimodal streaming applications has led to a significant demand for low-latency Text-to-Speech (TTS) systems. This work presents a multilingual ...
Abstract: Recently, audio-visual speech recognition has attracted increasing attention. However, most existing works only focused on scenarios with two speakers. In this work, we study the effect of ...