Improving Controllable Text-To-Video Diffusion Models