AssistQ

About AssistQ

We constructed the AssistQ dataset on the premise that an AI assistant should learn from instructional videos and scripts in order to guide the user step by step. The dataset comprises 529 question-answer samples derived from 100 newly filmed first-person videos.

Each question comes with multistep candidate answers and can be answered by inferring from visual details (e.g., button positions) and textual details (e.g., actions such as press/turn) in the instructional videos.
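
To make the idea of "multistep candidate answers" concrete, here is a minimal sketch of how one sample could be represented in Python. The class names, field names, and the example question are all hypothetical and chosen for illustration only; they are not the dataset's actual schema, which should be checked against the released data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One step of an answer (hypothetical layout, not the official schema)."""
    candidates: List[str]  # candidate actions offered for this step
    correct: int           # index of the correct candidate


@dataclass
class Sample:
    """One question-answer sample tied to a first-person video."""
    video_id: str
    question: str
    steps: List[Step]

    def answer(self) -> List[str]:
        # The full answer is the sequence of correct candidates, one per step.
        return [s.candidates[s.correct] for s in self.steps]

    def is_correct(self, picks: List[int]) -> bool:
        # A prediction is correct only if every step's pick matches.
        return len(picks) == len(self.steps) and all(
            p == s.correct for p, s in zip(picks, self.steps)
        )


# Invented example, for illustration only.
sample = Sample(
    video_id="example_video",
    question="How do I start the appliance?",
    steps=[
        Step(candidates=["press button A", "turn knob B"], correct=1),
        Step(candidates=["press button A", "press button C"], correct=0),
    ],
)
```

Under this layout, a model would pick one candidate index per step, and `sample.is_correct([1, 0])` checks the whole sequence at once.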

Resource

Data

Data

Paper

arXiv

Code

GitHub code