AssistQ
Affordance-centric Question-driven Task Completion for Egocentric Assistant
About AssistQ
We constructed the AssistQ dataset on the premise that an AI assistant should learn from instructional videos and scripts in order to guide the user step by step. The dataset comprises 529 question-answer samples derived from 100 newly filmed first-person videos.
Each question comes with multi-step candidate answers and can be answered by inferring from visual details (e.g., button positions) and textual details (e.g., actions such as press or turn) in the instructional videos.
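To make the "multi-step candidate answers" idea concrete, here is a minimal Python sketch of how one sample could be represented in memory. This is an assumption for illustration only: the class names, field names (`video_id`, `candidates`, `correct_index`), and the example question are all hypothetical and do not reflect the actual AssistQ file format or its contents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One step of an answer: several candidate actions, one of them correct.

    Field names are hypothetical, not taken from the AssistQ release.
    """
    candidates: List[str]
    correct_index: int


@dataclass
class QASample:
    """A question tied to one first-person video, answered step by step."""
    video_id: str
    question: str
    steps: List[Step]

    def answer_sequence(self) -> List[str]:
        # Pick the correct candidate at each step, in order.
        return [s.candidates[s.correct_index] for s in self.steps]


# Invented example data; not an actual sample from the dataset.
sample = QASample(
    video_id="example_video",
    question="How do I start the microwave?",
    steps=[
        Step(candidates=["press the left button", "turn the right knob"],
             correct_index=1),
        Step(candidates=["press the start button", "open the door"],
             correct_index=0),
    ],
)

print(sample.answer_sequence())
```

Under this layout, a model would pick one candidate per step, and the chosen sequence would form the step-by-step guidance given to the user.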
