Fluidity:
An AI-Driven AV Case Study




The Objective:
This case study outlines the end-to-end engineering of an AI-driven Audio-Visual (AV) pipeline, culminating in a premium, photorealistic 5-second motion graphic. The workflow bridges static key art generation in Midjourney V8 Alpha with temporal motion and sonic synthesis, using the Seedance 2.0 video model and the Eleven Music model within the ElevenLabs platform.
By detailing the multi-iteration prompting strategies required to overcome spatial instability in virtual fluid dynamics and frequency conflation in generative Foley, this breakdown shows how both strict technical precision and conceptual abstraction were needed to achieve commercial-grade cinematic execution.
Phase 1: Static Key Art Generation (Start & End Frames)
- Midjourney V8 Alpha: Engineered high-fidelity, photorealistic base assets for the AV pipeline.
- The Strategy: Used Midjourney parameters such as --style raw (which reduces the model's default stylization in favor of a more literal, photographic result) and --sref (Style Reference) to enforce consistent brand styling and color across the start and end keyframes, establishing the visual identity before the motion phase.
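The consistency strategy can be sketched as a small helper that appends one shared parameter suffix to both keyframe prompts, so the start and end frames cannot drift apart in style settings. This is a hypothetical illustration. It assumes Midjourney's convention of placing `--param value` flags at the end of a prompt. The subject descriptions and the reference-image URL are placeholders, not the prompts used in this project.

```python
# Hypothetical helper: builds Midjourney-style prompt strings that share one
# parameter suffix, so start and end keyframes get identical style settings.
# Subject text and the --sref URL are placeholders, not the project's prompts.


def build_prompt(subject: str, sref_url: str, style: str = "raw") -> str:
    """Append --style and --sref parameters to a subject description."""
    return f"{subject} --style {style} --sref {sref_url}"


STYLE_REF = "https://example.com/brand-reference.jpg"  # placeholder reference image

start_frame = build_prompt("close-up of glossy peach liquid, studio lighting", STYLE_REF)
end_frame = build_prompt("metallic nozzle spraying peach liquid, studio lighting", STYLE_REF)

if __name__ == "__main__":
    print(start_frame)
    print(end_frame)
```

Only the subject text changes between frames. The style flags come from a single source.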
Phase 2: Animated Key Art Generation
Iteration 01
The Approach: Generated the temporal transition between the static start and end keyframes. Used cinematographic prompt language (lens choice and camera movement) to control virtual camera behavior and depth of field while preserving the visual identity established in the keyframes.
Prompt: "Slow, cinematic fluid dynamics. The peach-colored liquid morphs smoothly, glossy and highly reflective, cinematic studio lighting. Shot on 100mm macro lens, slow subtle dolly push forward."

Iteration 02
The Problem: Shifted the objective to introduce the metallic nozzle, which was absent in Iteration 01. While the expanded prompt successfully rendered the material textures, the tight 100mm lens and forward dolly push caused a spatial error. This resulted in the metal organically morphing out of the liquid instead of appearing as a distinct, solid object.
Prompt: "Cinematic macro shot on 100mm lens. Extreme slow-motion. A complex, polished metallic robotic nozzle actively sprays and expels a massive, glossy, peach-colored viscous liquid. High-pressure air blows the thick paint from the nozzle, creating a delicate, atomized paint mist and smaller separated droplets around the metallic hardware. The heavy peach liquid morphs and swirls dynamically away from the metal tip under pressure. Slow, subtle dolly push forward, tracking the interaction between the metallic hardware and the flying liquid. Dramatic, shifting cinematic studio lighting, highly reflective surfaces, photorealistic"

Iteration 03
The Problem: Widened the focal length to 50mm and restructured the prompt to produce more accurate, mechanical motion for the metallic nozzle. The objective was to establish the hardware as a rigid, stable source on a "fixed mechanical hinge." However, without explicit spatial anchoring to an edge of the frame, the model struggled to ground the object, resulting in an unanchored, floating trajectory.
Prompt: "Medium-macro shot on a 50mm lens. A sturdy, polished metallic robotic nozzle sits on a fixed mechanical hinge, slowly panning from left to right. The nozzle acts as the source, spraying a heavy, glossy, peach-colored viscous liquid outward. The heavy fluid swirls dynamically away from the metal under pressure, never engulfing the hardware. Delicate atomized paint mist surrounds the spray. Extreme slow-motion, cinematic studio lighting, photorealistic."

Iteration 04
The Solution: Achieved a successful final generation by resolving the spatial instability of the previous iteration. The prompt was restructured to dictate a single, logical camera movement (a smooth zoom-out reveal) while explicitly instructing the model to keep the solid metallic nozzle "anchored at the bottom of the frame." This spatial grounding eliminated the random floating artifacts, producing a stable, dynamic, and photorealistic fluid shot that bridged the start and end keyframes.
Prompt: "Extreme slow-motion. The camera starts on a tight, close-up shot of swirling, glossy, peach-colored liquid. The camera smoothly zooms out to reveal a solid, rigid metallic nozzle anchored at the bottom of the frame. The heavy, viscous paint is naturally and dynamically spraying upwards and outwards from this fixed metal nozzle. Delicate, high-pressure paint droplets fly into the air. The metallic hardware stays perfectly stable at the bottom, acting as the source of the flow. Cinematic studio lighting, photorealistic."
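One way to reuse this structure is to treat the successful prompt as a set of required slots, with the spatial anchor as a slot that cannot be left empty. The sketch below is hypothetical. The `ShotPrompt` class and its slot names are illustrative inventions that mirror the sentence order of the Iteration 04 prompt, and the example text is abridged from that prompt.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical slot layout mirroring the sentence order of the Iteration 04 prompt."""

    pacing: str  # temporal feel, e.g. extreme slow motion
    camera: str  # one explicit, logical camera move
    anchor: str  # where the rigid object sits in the frame
    action: str  # what the fluid does relative to the anchored object
    look: str    # lighting and rendering style

    def render(self) -> str:
        parts = [self.pacing, self.camera, self.anchor, self.action, self.look]
        # Refuse to build a prompt with an empty slot; an empty anchor slot
        # corresponds to the unanchored Iteration 03 setup.
        if any(not part.strip() for part in parts):
            raise ValueError("every slot, including the spatial anchor, must be filled")
        # Normalize each slot to exactly one sentence ending in a period.
        return " ".join(part.strip().rstrip(".") + "." for part in parts)


shot = ShotPrompt(
    pacing="Extreme slow-motion",
    camera="The camera starts on a tight close-up of swirling, glossy, peach-colored liquid and smoothly zooms out",
    anchor="A solid, rigid metallic nozzle is anchored at the bottom of the frame",
    action="The viscous paint sprays upwards and outwards from this fixed nozzle",
    look="Cinematic studio lighting, photorealistic",
)

if __name__ == "__main__":
    print(shot.render())
```

Making the anchor mandatory encodes the lesson of Iteration 03 in the template itself. It does not guarantee that the model will honor the anchor.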

Phase 3: Audio Generation
Iteration 01
The Problem: Attempted to synthesize a hybrid 5-second soundscape combining fluid dynamics and mechanical textures. However, compressing heavy bass, a metallic hum, and rhythmic splashing into this short window caused the model to conflate the frequencies. This inadvertently created a low-end loop that sounded like a revving motorcycle engine rather than a cinematic liquid simulation.
Prompt: "Dark ambient cinematic synth drone featuring thick fluid splashing, futuristic metallic hum, deep bass cinematic whoosh. Duration: 5 seconds. Exactly timed to end with a sharp cinematic impact."

Iteration 02
The Problem: Pivoted the prompt to focus strictly on high-fidelity wet Foley, successfully eliminating the mechanical engine artifacts. However, emphasizing action verbs like "squirt" and "splatter" caused the model to generate abrupt, unrefined audio. While texturally realistic, the result lacked the continuous, elegant flow required for a high-end commercial sequence.
Prompt: "High-pressure squirt of thick, viscous paint. Heavy, wet, gooey liquid splattering and swirling in slow-motion. High-fidelity wet Foley, glossy fluid dynamics, continuous smooth liquid pouring with sharp droplet splashes at the end. Duration: 5 seconds."

Iteration 03
The Problem: Shifted the strategy to incorporate an elegant musical bed to match the premium visual aesthetic. However, introducing strong musical keywords like "classical piano" and "waltz" caused the model to categorize the prompt entirely as music generation. This keyword prioritization completely suppressed the Foley elements, resulting in a clean musical track that lacked any of the requested liquid textures.
Prompt: "A delicate, rising classical piano and string waltz. Accompanied by the continuous, unbroken sound of thick, pressurized paint flowing smoothly. Heavy, wet, and rich liquid sound design, an even and steady fluid stream. Duration: 5 seconds."

Iteration 04
The Problem: Restructured the prompt hierarchy to explicitly prioritize Foley over musical elements, pushing the classical piano to the background. To achieve a more elegant flow, modifiers like "evenly and smoothly" were introduced. However, commanding the model to generate a perfectly smooth, continuous fluid pour stripped away the necessary textural weight. This resulted in a dense wall of white noise, causing the output to sound indistinguishable from static or a heavy rainstorm rather than viscous paint.
Prompt: "High-fidelity cinematic Foley. The loud, continuous, wet sound of thick, viscous paint pouring evenly and smoothly. Heavy liquid fluid dynamics, steady pressurized stream. In the far background, a very faint, muffled classical piano waltz plays softly underneath the loud liquid pouring sound. Duration: 5 seconds."
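The prompt hierarchy used here can be sketched as a layered builder. Foreground layers are written first, every background layer gets an explicit distance cue, and the duration closes the prompt. This is a hypothetical illustration of the ordering idea only. The function name is invented, and the layer text is abridged from the Iteration 04 prompt.

```python
# Hypothetical sketch of a layered audio prompt: foreground layers are written
# first, and every background layer is prefixed with a distance cue, similar
# to how Iteration 04 pushed the piano behind the Foley.


def layered_audio_prompt(foreground: list[str], background: list[str], duration_s: int) -> str:
    """Join foreground layers, then cue-prefixed background layers, then the duration."""
    parts = list(foreground)
    parts += [f"In the far background, {text}" for text in background]
    parts.append(f"Duration: {duration_s} seconds.")
    return " ".join(parts)


prompt = layered_audio_prompt(
    foreground=[
        "High-fidelity cinematic Foley.",
        "The loud, continuous, wet sound of thick, viscous paint pouring.",
    ],
    background=["a very faint, muffled classical piano waltz plays softly."],
    duration_s=5,
)

if __name__ == "__main__":
    print(prompt)
```

Ordering alone only controls which element dominates. As this iteration showed, the descriptors inside the foreground layer still determine the texture.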

Iteration 05
The Problem: Pivoted the prompt strategy to eliminate the "white noise" rain effect from the previous iteration by reintroducing weight and texture. Keywords such as "gooey," "squishing," and "sloshing" were employed to force a thicker, heavier fluid dynamic. While this successfully restored the liquid texture, the aggressive Foley descriptors caused the model to overcompensate, generating an unappealing, biological squirting sound. The audio was highly textured but completely lacked the continuous, elegant flow required for a premium commercial aesthetic.
Prompt: "Close-up ASMR of thick, heavy, gooey paint pouring slowly. Deep, viscous liquid squishing and sloshing. Heavy, wet fluid dynamics, thick globs of slime falling. Cinematic, high-fidelity wet Foley. Duration: 5 seconds."

Iteration 06
The Solution: Radically simplified the prompt. Abandoned complex acoustic engineering and physical Foley descriptors in favor of a purely conceptual, aesthetic approach. Prompting with a color association ("pink") and an abstract aesthetic cue ("elegant") stopped the model from over-interpreting literal fluid dynamics. This minimalist prompt avoided the mechanical and biological artifacts of all previous iterations and produced the smooth, premium commercial soundscape required. The result suggests that, for generative audio, conceptual prompting can sometimes outperform strict technical direction.
Prompt: "5 seconds sound of a continuos fluid imagine its pink and elegant"

Phase 4: Final Animated Key Art with Audio
