Crafting an AI-Generated Music Video (Part 1)
Sep 2025
Disclaimer: This is a work in progress. Many scenes, sequences, and details will change as the video evolves.
Building the Video, Scene by Scene
In the last year or so, AI video generation has made some major leaps. With those advancements in mind, what follows is a look at a paid project I am currently working on.
At the time of writing, Sora 2 is just starting to roll out to the general public. Knowing that video capabilities are about to take a big leap forward, I’ve been organizing my prompts and images so I can recreate the video as the technology improves. I don’t have access yet, but I’m looking forward to incorporating Sora 2 into the process once it becomes available.
So far, I’ve primarily been using Midjourney and Veo to create the visuals for the video. The challenge? A music video isn’t just one environment; it’s dozens of settings, moods, and transitions stitched together into a flowing narrative. Conceptualizing and generating those scenes from scratch has been a much bigger lift than I first expected. Some sequences come together in a few hours; others take days of trial, error, and prompt refinement.
Lessons Learned So Far
Color consistency has improved, but it’s still harder than it looks. Midjourney’s style weights, palettes, and image references help anchor a visual tone, but results can still drift. Matching one scene to the next requires persistence and a few creative hacks. I’m hopeful that Sora 2 will make maintaining color consistency a bit easier.
Not all scenes are equal. Action shots with fast movement remain the toughest. AI still struggles to keep anatomy, motion blur, and perspective natural when things get chaotic. By contrast, slower shots (especially in slow motion) are where these tools really shine. When (and if) a model reliably delivers slow-motion footage, the result can feel remarkably cinematic.
Workflow is everything. I’ve found myself storyboarding and sequencing in much greater detail than I would on a live-action shoot. Knowing the intended “look” and “camera movement” upfront saves an enormous amount of time and wasted generations.
What’s Next
This project is still a work in progress, and every new generation teaches me something about what AI video can and can’t yet do. With AI video platforms evolving so quickly, the possibilities are only expanding.
More to come…