Abstract: Video Question Answering (VideoQA) demands complex reasoning about multi-granular information, requiring both fine-grained visual details and global event understanding from videos. While ...