Projects

Controllable Autoregressive Text-to-Motion Generation

Maria Pilligua, Pau Amargant, Miquel Lopez, Nahush Rajesh Kolhe

EPFL CS-503 (in progress)

A real-time text-to-motion model that the user can steer mid-sequence: at any frame, they can specify a full-body pose, a hand or foot position, or a path the character must follow. Built on T2M-GPT with a VQ-VAE motion codebook and an 18-layer causal Transformer, and trained on the BONES dataset (64k motion-capture sequences). Two constraint-injection architectures are explored: a ControlNet-style cross-attention adapter and a prefix-token approach with custom causal masking. The goal is the kind of online controllability that animation and interactive systems need.
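To make the prefix-token idea concrete, here is a minimal sketch of one possible attention mask. It is an illustration, not the project's actual implementation. It assumes the constraint tokens are prepended to the motion tokens, that constraint tokens attend to each other freely, and that motion tokens attend to the whole prefix plus earlier motion tokens. The function name `prefix_causal_mask` and its parameters are hypothetical.

```python
from __future__ import annotations


def prefix_causal_mask(num_prefix: int, num_motion: int) -> list[list[bool]]:
    """Build a boolean attention mask for [prefix tokens | motion tokens].

    mask[q][k] is True when query position q may attend to key position k.
    Assumed layout (hypothetical): the first `num_prefix` positions hold
    constraint tokens, the remaining `num_motion` positions hold motion codes.
    """
    total = num_prefix + num_motion
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_prefix:
                # Every position may read the constraint prefix.
                mask[q][k] = True
            elif q >= num_prefix:
                # Motion tokens see only motion tokens at or before themselves.
                mask[q][k] = k <= q
            # Otherwise: prefix queries never look at motion keys (stays False).
    return mask
```

With two prefix tokens and three motion tokens, the prefix rows allow only the first two columns, while each motion row allows the prefix plus a causal triangle over the motion positions. A mask of this shape could be converted to a tensor and passed to an attention layer that accepts boolean masks; whether True means "keep" or "block" depends on the library, so check its convention first.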