NVIDIA Forum - Keynote Speech: Technologies for Multimodal Video Understanding and Generation
15 Apr 2026
ConvergeTech Stage @ Hall C
Multimodal video understanding and generation technology aims to integrate multi-source information such as text, vision, and audio to understand video content and, building on that understanding, to create and generate multimodal derivatives. Built on deep learning and cross-modal representation learning, it supports the generation of high-quality video, including AI-generated content. Research in this area is widely applied to video understanding, video content generation (highlights, secondary creation, and derivatives), and 3D media, and provides core technical support for next-generation media and intelligent video creation.


