
Fluidity:
An AI-Driven AV Case Study

[Tool logos: Gemini, Midjourney, Seedance, ElevenLabs]

The Objective:

This case study documents the end-to-end build of an AI-driven audio-visual (AV) pipeline that produces a premium, photorealistic 5-second motion graphic. The workflow connects static key art generated in Midjourney V8 Alpha to motion generation in Seedance 2.0 and audio synthesis with ElevenLabs' Eleven Music.

The breakdown details the multi-iteration prompting needed to overcome two problems: spatial instability in the generated fluid motion, and frequency conflation in generative Foley. In both cases, commercial-grade results came from balancing strict technical direction against conceptual abstraction.

Phase 1: Static Key Art Generation (Start & End Frames)

  • Midjourney V8 Alpha: Engineered high-fidelity, photorealistic base assets for the AV pipeline.

  • The Strategy: Used --style raw (which reduces Midjourney's default stylization for a more photographic look) and --sref (Style Reference) to keep brand and color consistent across the start and end keyframes, establishing the visual identity before the motion phase.
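The two parameters named above can be appended to a text description as in the following minimal sketch. The descriptions and the style reference URL are hypothetical placeholders, not the prompts used in this project; only the --style raw and --sref parameters come from the case study.

```python
def build_midjourney_prompt(description: str, style_ref_url: str) -> str:
    """Append the two parameters discussed above to a text description."""
    # --style raw : reduces Midjourney's default stylization
    # --sref <url>: points the model at a style reference image
    return f"{description.strip()} --style raw --sref {style_ref_url}"


# Hypothetical reference image shared by both keyframes (assumption).
STYLE_REF = "https://example.com/brand-style-reference.png"

# Hypothetical descriptions; the original keyframe prompts are not given.
start_frame = build_midjourney_prompt(
    "macro photograph of swirling glossy peach-colored liquid, studio lighting",
    STYLE_REF,
)
end_frame = build_midjourney_prompt(
    "polished metallic nozzle spraying glossy peach-colored liquid, studio lighting",
    STYLE_REF,
)

print(start_frame)
print(end_frame)
```

Sharing one style reference between the start and end frames is what ties the two images to the same palette; only the description changes.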

[Images: start and end keyframes]

Phase 2: Animated Key Art Generation

Iteration 01

The Approach: Generated the transition between the two static keyframes. Used cinematographic prompt terms (lens focal length, dolly movement, studio lighting) to control virtual camera behavior and depth of field while preserving the visual identity established in the keyframes.

Prompt: "Slow, cinematic fluid dynamics. The peach-colored liquid morphs smoothly, glossy and highly reflective, cinematic studio lighting. Shot on 100mm macro lens, slow subtle dolly push forward."

[Figure: Iteration 01 output]

Iteration 02

The Problem: Expanded the prompt to introduce the metallic nozzle, which was absent in Iteration 01. The material textures rendered well, but the tight 100mm lens combined with the forward dolly push produced a spatial error: the metal morphed organically out of the liquid instead of appearing as a distinct, solid object.

Prompt: "Cinematic macro shot on 100mm lens. Extreme slow-motion. A complex, polished metallic robotic nozzle actively sprays and expels a massive, glossy, peach-colored viscous liquid. High-pressure air blows the thick paint from the nozzle, creating a delicate, atomized paint mist and smaller separated droplets around the metallic hardware. The heavy peach liquid morphs and swirls dynamically away from the metal tip under pressure. Slow, subtle dolly push forward, tracking the interaction between the metallic hardware and the flying liquid. Dramatic, shifting cinematic studio lighting, highly reflective surfaces, photorealistic"

[Figure: Iteration 02 output]

Iteration 03

The Problem: Widened the lens to 50mm and restructured the prompt to give the metallic nozzle more accurate, mechanical motion. The goal was to establish the hardware as a rigid, stable source on a "fixed mechanical hinge." Without an explicit anchor to the edge of the frame, however, the model could not ground the object, and the nozzle drifted along a floating trajectory.

Prompt: "Medium-macro shot on a 50mm lens. A sturdy, polished metallic robotic nozzle sits on a fixed mechanical hinge, slowly panning from left to right. The nozzle acts as the source, spraying a heavy, glossy, peach-colored viscous liquid outward. The heavy fluid swirls dynamically away from the metal under pressure, never engulfing the hardware. Delicate atomized paint mist surrounds the spray. Extreme slow-motion, cinematic studio lighting, photorealistic."

[Figure: Iteration 03 output]

Iteration 04

The Solution: Resolved the spatial instability of the previous iteration. The prompt was restructured around a logical camera movement, a smooth zoom-out reveal, and explicitly instructed the model to keep the solid metallic nozzle "anchored at the bottom of the frame." This spatial grounding eliminated the floating artifacts and produced a stable, photorealistic fluid sequence that bridged the start and end keyframes.

Prompt: "Extreme slow-motion. The camera starts on a tight, close-up shot of swirling, glossy, peach-colored liquid. The camera smoothly zooms out to reveal a solid, rigid metallic nozzle anchored at the bottom of the frame. The heavy, viscous paint is naturally and dynamically spraying upwards and outwards from this fixed metal nozzle. Delicate, high-pressure paint droplets fly into the air. The metallic hardware stays perfectly stable at the bottom, acting as the source of the flow. Cinematic studio lighting, photorealistic."

[Figure: Iteration 04 output]
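The difference between Iterations 03 and 04 can be expressed as a simple pre-flight check: does the prompt tie the object to a position in the frame? The sketch below is illustrative only. The phrase list is an assumption drawn from this case study, not a rule of any video model, and the two test sentences are quoted from the prompts above.

```python
# Frame-relative phrases treated as a spatial anchor (hypothetical list).
FRAME_ANCHORS = (
    "bottom of the frame",
    "top of the frame",
    "left of the frame",
    "right of the frame",
    "edge of the frame",
)


def has_frame_anchor(prompt: str) -> bool:
    """Return True if the prompt pins an object to a position in the frame."""
    text = prompt.lower()
    return any(phrase in text for phrase in FRAME_ANCHORS)


# Sentences quoted from the Iteration 03 and Iteration 04 prompts.
ITERATION_03 = (
    "A sturdy, polished metallic robotic nozzle sits on a fixed mechanical "
    "hinge, slowly panning from left to right."
)
ITERATION_04 = (
    "The camera smoothly zooms out to reveal a solid, rigid metallic nozzle "
    "anchored at the bottom of the frame."
)

print(has_frame_anchor(ITERATION_03))  # no frame-relative anchor
print(has_frame_anchor(ITERATION_04))  # anchored to the bottom of the frame
```

Note that "fixed mechanical hinge" describes the object itself, not its place in the frame, which is why the check does not count it as an anchor.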

Phase 3: Audio Generation

Iteration 01

The Problem: Attempted to synthesize a hybrid 5-second soundscape combining fluid and mechanical textures. Compressing heavy bass, a metallic hum, and rhythmic splashing into such a short window caused the model to conflate the frequencies. The result was a low-end loop that sounded like a revving motorcycle engine rather than a cinematic liquid sequence.

Prompt: "Dark ambient cinematic synth drone featuring thick fluid splashing, futuristic metallic hum, deep bass cinematic whoosh. Duration: 5 seconds. Exactly timed to end with a sharp cinematic impact."

[Audio: Iteration 01, 00:05]

Iteration 02

The Problem: Narrowed the prompt to high-fidelity wet Foley, which eliminated the engine artifacts. However, action verbs like "squirt" and "splatter" pushed the model toward abrupt, unrefined audio. The texture was realistic, but the result lacked the continuous, elegant flow required for a high-end commercial sequence.

Prompt: "High-pressure squirt of thick, viscous paint. Heavy, wet, gooey liquid splattering and swirling in slow-motion. High-fidelity wet Foley, glossy fluid dynamics, continuous smooth liquid pouring with sharp droplet splashes at the end. Duration: 5 seconds."

[Audio: Iteration 02, 00:05]

Iteration 03

The Problem: Added an elegant musical bed to match the premium visual aesthetic. However, strong musical keywords like "classical piano" and "waltz" led the model to treat the whole prompt as a music request. The Foley elements were suppressed entirely, leaving a clean musical track with none of the requested liquid sound.

Prompt: "A delicate, rising classical piano and string waltz. Accompanied by the continuous, unbroken sound of thick, pressurized paint flowing smoothly. Heavy, wet, and rich liquid sound design, an even and steady fluid stream. Duration: 5 seconds."

[Audio: Iteration 03, 00:05]

Iteration 04

The Problem: Restructured the prompt hierarchy to prioritize Foley over music, pushing the classical piano into the background, and added modifiers like "evenly and smoothly" for a more elegant flow. However, asking for a perfectly smooth, continuous pour stripped away the textural weight. The output was a dense wall of white noise, closer to static or a heavy rainstorm than to viscous paint.

Prompt: "High-fidelity cinematic Foley. The loud, continuous, wet sound of thick, viscous paint pouring evenly and smoothly. Heavy liquid fluid dynamics, steady pressurized stream. In the far background, a very faint, muffled classical piano waltz plays softly underneath the loud liquid pouring sound. Duration: 5 seconds."

[Audio: Iteration 04, 00:05]

Iteration 05

The Problem: Reintroduced weight and texture to eliminate the "white noise" rain effect of the previous iteration. Keywords such as "gooey," "squishing," and "sloshing" were used to force a thicker, heavier fluid sound. This restored the liquid texture, but the aggressive Foley descriptors caused the model to overcompensate and generate an unappealing, biological squirting sound. The audio was highly textured but lacked the continuous, elegant flow required for a premium commercial aesthetic.

Prompt: "Close-up ASMR of thick, heavy, gooey paint pouring slowly. Deep, viscous liquid squishing and sloshing. Heavy, wet fluid dynamics, thick globs of slime falling. Cinematic, high-fidelity wet Foley. Duration: 5 seconds."

[Audio: Iteration 05, 00:05]

Iteration 06

The Solution: Radically simplified the prompt. Abandoned acoustic engineering terms and physical Foley descriptors in favor of a purely conceptual, aesthetic approach. Prompted with a color association ("pink") and an abstract quality ("elegant"), the model stopped over-interpreting literal fluid dynamics. This minimalist strategy avoided the mechanical and biological artifacts of the previous iterations and produced the smooth, premium commercial soundscape required. It suggests that, for generative audio, conceptual prompting can sometimes outperform strict technical direction.

Prompt: "5 seconds sound of a continuos fluid imagine its pink and elegant"

[Audio: Iteration 06, 00:05]
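The failure modes across the audio iterations can be collected into a rough keyword check, sketched below. The term lists are hand-built from the keywords this case study calls out and are an assumption for illustration; they are not part of any ElevenLabs tooling, and a flag is a cue to review the wording rather than a prediction of the output.

```python
# Keywords the case study associates with each failure mode (illustrative).
RISK_TERMS = {
    "abrupt bursts (Iteration 02)": ("squirt", "splatter"),
    "music takeover (Iteration 03)": ("piano", "waltz"),
    "white-noise wash (Iteration 04)": ("evenly", "smoothly"),
    "biological texture (Iteration 05)": ("gooey", "squishing", "sloshing"),
}


def audit_prompt(prompt: str) -> dict[str, list[str]]:
    """Return each risk category whose terms occur in the prompt."""
    text = prompt.lower()
    hits: dict[str, list[str]] = {}
    for category, terms in RISK_TERMS.items():
        # Substring match, so "splatter" also catches "splattering".
        found = [term for term in terms if term in text]
        if found:
            hits[category] = found
    return hits


# Prompts quoted verbatim from Iterations 05 and 06.
ITERATION_05 = (
    "Close-up ASMR of thick, heavy, gooey paint pouring slowly. Deep, viscous "
    "liquid squishing and sloshing. Heavy, wet fluid dynamics, thick globs of "
    "slime falling. Cinematic, high-fidelity wet Foley. Duration: 5 seconds."
)
ITERATION_06 = "5 seconds sound of a continuos fluid imagine its pink and elegant"

print(audit_prompt(ITERATION_05))
print(audit_prompt(ITERATION_06))
```

The final prompt passes with no flags because it contains none of the literal Foley or music vocabulary that steered the earlier generations.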

Phase 4: Final Animated Key Art with Audio

[Figure: final animated key art with audio]