NVIDIA Forum - Keynote Speech: Technologies for Multimodal Video Understanding and Generation

15 Apr 2026

16:00 - 16:30

ConvergeTech Stage @ Hall C

Multimodal video understanding and generation technology integrates text, visual, and audio information to understand video content and to create multimodal derivative works. Built on deep learning and cross-modal representation learning, it supports the generation of high-quality video, including AI-generated content. The research is widely applied in video understanding, video content generation (highlights, secondary creation, and derivatives), and 3D media, and provides core technical support for next-generation media and intelligent video creation.

Speakers

Shaohui Jiao, Head of 3D Video - Volcengine

View all Beijing InfoComm China 2026 Summit Agenda