
Two of our papers have been accepted to CVPR 2026, including one Oral presentation! The Oral paper, ARGUS, defends multimodal large language models against multimodal indirect prompt injection attacks. The other paper, SciEducator, introduces a Deming-Cycle multi-agent system for scientific video understanding and education. Details on each paper are given below.


Paper 1 (Oral): ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior

ARGUS addresses multimodal indirect prompt injection (IPI) attacks, where malicious instructions embedded in images, videos, or audio can hijack multimodal large language models. Inspired by activation steering, ARGUS searches for a defense direction within the safety subspace and combines adaptive strength steering, lightweight injection detection, and post-filtering to improve robustness while preserving model utility.

Paper: [To be added]

Code: [To be added]


Paper 2: SciEducator: Scientific Video Understanding and Educating via Deming-Cycle Multi-Agent System

SciEducator is an iterative, self-evolving multi-agent system for scientific video comprehension and education. Rooted in the Deming Cycle (commonly summarized as Plan-Do-Check-Act), it supports scientific video understanding and generates multimodal educational materials, including textual instructions, visual guides, audio narrations, and interactive references.

Paper: [To be added]

Code: [To be added]

