Code2Video: A Code-centric Paradigm for Educational Video Generation

Yanzhe Chen^* Kevin Qinghong Lin^* Mike Zheng Shou^✉

Show Lab, National University of Singapore
^* Equal Contribution ^✉ Corresponding Author

Comparison

How do diffusion models perform on these videos?

Each topic below is shown side by side for Veo3, Wan-2.2, and Code2Video:

Hanoi Problem

Euler's Formula -> e^πi = -1

Space-filling Curves

Large Language Model

The Determinant

Pure Fourier Series

Abstract

While recent generative models advance pixel-space video synthesis, they remain limited in producing professional educational videos, which demand disciplinary knowledge, precise visual structures, and coherent transitions; this limits their applicability in educational scenarios. Intuitively, such requirements are better addressed through the manipulation of a renderable environment, which can be explicitly controlled via logical commands (e.g., code). In this work, we propose Code2Video, a code-centric agent framework for generating educational videos via executable Python code. The framework comprises three collaborative agents: (i) Planner, which structures lecture content into temporally coherent flows and prepares corresponding visual assets; (ii) Coder, which converts structured instructions into executable Python code while incorporating scope-guided auto-fix to enhance efficiency; and (iii) Critic, which leverages vision-language models (VLMs) with anchor visual prompts to refine spatial layout and ensure clarity. To support systematic evaluation, we build MMMC, a benchmark of professionally produced, long-form, discipline-specific educational videos. We evaluate MMMC across diverse dimensions, including VLM-as-a-Judge aesthetic scores, code efficiency, and, in particular, TeachQuiz, a novel end-to-end knowledge-transfer metric measured by a VLM's ability to learn from the generated videos. Our results demonstrate the potential of Code2Video as a scalable, interpretable, and controllable approach for educational video generation.
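The abstract describes TeachQuiz only at a high level. As a rough illustration, the sketch below assumes the score is the gain in quiz accuracy of a VLM after it has watched the generated video. The function names and the before/after answer lists are hypothetical and are not taken from the benchmark's actual protocol.

```python
def accuracy(answers, key):
    """Fraction of quiz questions answered correctly."""
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def knowledge_gain(answers_before, answers_after, key):
    """Assumed TeachQuiz-style score: accuracy after watching the video
    minus accuracy before watching it (hypothetical definition)."""
    return accuracy(answers_after, key) - accuracy(answers_before, key)


# Hypothetical multiple-choice quiz with four questions.
key = ["A", "C", "B", "D"]
before = ["A", "B", "B", "A"]   # VLM answers without the video
after = ["A", "C", "B", "A"]    # VLM answers after watching the video
gain = knowledge_gain(before, after, key)
```

A positive gain would suggest that the video transferred knowledge the model did not already have; how the "before" state is obtained is left open here.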

Method

Illustration of Code2Video. Given a user inquiry, Code2Video aims to render an educational video via Manim code writing: (i) the Planner converts a learning topic into a storyboard and retrieves visual assets; (ii) the Coder performs parallel code synthesis with scope-guided refinement to ensure efficiency and temporal consistency; (iii) the Critic uses anchor visual prompts to iteratively adjust spatial layout and clarity, yielding reproducible, pedagogically structured videos.
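The three stages above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: every function name, the 3x3 anchor grid, and the `repair` callback (standing in for an LLM call) are hypothetical, the Critic is reduced to assigning distinct anchor slots instead of inspecting rendered frames with a VLM, and the generated Manim-style source is only syntax-checked, not rendered.

```python
import ast
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    lines: list  # text items to show on screen


def plan(topic):
    """Planner (stub): turn a topic into an ordered storyboard."""
    return [
        Section("Intro", [f"What is {topic}?"]),
        Section("Core idea", [f"Key steps of {topic}", "Worked example"]),
    ]


# Hypothetical anchor grid: named slots mapped to scene coordinates.
ANCHORS = {f"{r}{c}": (x, y)
           for r, y in zip("ABC", (2.5, 0.0, -2.5))
           for c, x in zip("123", (-4.0, 0.0, 4.0))}


def critic_assign(n_items, preferred="B2"):
    """Critic (stub): give each on-screen item a distinct anchor slot."""
    order = [preferred] + [k for k in ANCHORS if k != preferred]
    return order[:n_items]


def write_code(index, section, slots):
    """Coder (stub): emit Manim-style source for one section."""
    body = [f"class Section{index}(Scene):", "    def construct(self):"]
    for text, slot in zip(section.lines, slots):
        x, y = ANCHORS[slot]
        body.append(f"        t = Text({text!r}).move_to([{x}, {y}, 0])")
        body.append("        self.play(Write(t))")
    return "\n".join(body)


def auto_fix(source, repair):
    """Scope-guided fix (sketch): on a syntax error, ask `repair` to patch
    the narrowest scope first (the failing line), then the whole file."""
    for scope in ("line", "file"):
        try:
            ast.parse(source)
            return source
        except SyntaxError as err:
            source = repair(source, err.lineno, scope)
    ast.parse(source)  # raises if the last repair did not help
    return source


def code2video(topic):
    """Run Planner -> Critic -> Coder -> auto-fix for each section."""
    scenes = []
    for i, section in enumerate(plan(topic)):
        slots = critic_assign(len(section.lines))
        source = write_code(i, section, slots)
        # Identity repair: a placeholder where an LLM call would go.
        scenes.append(auto_fix(source, repair=lambda s, line, scope: s))
    return scenes


scenes = code2video("Hanoi Problem")
```

Each entry of `scenes` is the source of one scene class. Generating sections independently is what would allow the Coder step to run in parallel, and confining a repair to the failing line before touching the rest of the file is one plausible reading of "scope-guided" refinement.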

Showcase of Code2Video

The videos below are generated via code.

Hanoi Problem

Neural Network Structure

History and Definition of π

Space-filling Curves

Exponential Growth and Logistic Growth

Euler's Formula -> e^πi = -1

Pure Fourier Series

The Determinant

Large Language Model

NN Learning and Intuitive Backpropagation
