AssistQ

Affordance-centric Question-driven Task Completion for Egocentric Assistant


About AssistQ


We constructed the AssistQ dataset on the premise that an AI assistant should learn from instructional videos and scripts in order to guide the user step by step. The dataset comprises 529 question-answer samples derived from 100 newly filmed first-person videos.

Each question comes with multistep candidate answers. It can be answered by inferring visual details (e.g., the positions of buttons) and textual details (e.g., actions such as press or turn) from the instructional videos.
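To make the structure of a sample concrete, the sketch below shows one possible in-memory representation. It pairs a question with a sequence of steps, and each step holds several candidate answers plus the index of the correct one. This is a hypothetical schema written for illustration. The class names, the field names (`video_id`, `action`, `target`, `correct_index`) and the example contents are assumptions, not the official AssistQ annotation format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CandidateAnswer:
    # Hypothetical field: the textual detail, e.g. an action verb such as "press" or "turn".
    action: str
    # Hypothetical field: the visual detail, e.g. which button or dial the action applies to.
    target: str


@dataclass
class Step:
    # All candidate answers offered for this step of the task.
    candidates: List[CandidateAnswer]
    # Index into `candidates` of the ground-truth answer (assumed encoding).
    correct_index: int


@dataclass
class QASample:
    # Hypothetical identifier linking the sample to its first-person instructional video.
    video_id: str
    question: str
    steps: List[Step] = field(default_factory=list)

    def correct_sequence(self) -> List[str]:
        """Return the ground-truth answer of every step as an 'action target' string."""
        answers = []
        for step in self.steps:
            answer = step.candidates[step.correct_index]
            answers.append(f"{answer.action} {answer.target}")
        return answers


# Usage with made-up contents (not taken from the dataset).
sample = QASample(
    video_id="hypothetical_001",
    question="How do I start a quick wash?",
    steps=[
        Step([CandidateAnswer("press", "power button"), CandidateAnswer("turn", "dial")], 0),
        Step([CandidateAnswer("turn", "dial"), CandidateAnswer("press", "start button")], 1),
    ],
)
print(sample.correct_sequence())
```

Keeping the action and the target as separate fields mirrors the textual/visual split described above. A model would need to get both right at every step to reproduce the correct sequence.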


Demo Video


Resources


Data

Data

Paper

Arxiv