We just produced our first episode of the Diagnostic Quiz Marketing series. Polished YouTube video. Branded animations. Custom thumbnail. Voiceover recorded and processed. Subtitles generated. Metadata, chapters, and YouTube description, complete.
Total time: under one hour.
Here's what we actually did, because the process itself is a demonstration of the thing we're teaching.
The Breakdown
The video is a 5-minute educational piece on why diagnostic quiz funnels convert 3x better than landing pages. Here's how the hour split:
| Phase | Time | What Happened |
|---|---|---|
| Outline + script | ~25 min | AI generated outline, refined it, full 750-word script written |
| Slide data + thumbnail | ~10 min | AI generated structured slide data; thumbnail designed and rendered |
| Recording | ~6 min | Read the script once. One take. |
| Audio processing | ~5 min | Throat clear removed, levels set |
| Rendering + export | ~8 min | Remotion rendered the full video; subtitles generated from transcript |
| Metadata + packaging | ~5 min | YouTube title, description, chapters, SEO tags - all structured |
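Roughly 60% of the hour (about 35 of 59 minutes) went into thinking and content: outline, script, slide data, and thumbnail. The actual production - recording, rendering, packaging - took under 20 minutes, or about 24 with audio cleanup.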
This is the pattern AI unlocks everywhere: expertise becomes the bottleneck in a good way. You're not blocked by production capacity. You're blocked only by the quality of your thinking.
What We Built
This wasn't a vibe-coded, improvised video. It runs on a production system we designed from scratch:
The content layer
- A brand-defined prompt architecture that enforces proper brand messaging
- Outline → script → slides generated sequentially by AI sub-agents
- Script-to-slide mapping with explicit timing targets per section
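To make the timing targets concrete, here is a minimal sketch of how a script section could be checked against its target. It is an illustration, not our production code: the `Section` shape, the function names, and the 5-second tolerance are assumptions, and the 150 words-per-minute pace is simply what a 750-word script over 5 minutes implies.

```typescript
// Hypothetical shape for one script section and its timing target.
interface Section {
  name: string;
  words: number;
  targetSeconds: number;
}

// Estimate spoken duration from word count at an assumed speaking pace.
function estimateSeconds(words: number, wpm: number = 150): number {
  return Math.round((words / wpm) * 60);
}

// Return the names of sections whose estimated duration misses the
// target by more than the tolerance.
function offTarget(
  sections: Section[],
  toleranceSeconds: number = 5,
  wpm: number = 150
): string[] {
  return sections
    .filter(
      (s) =>
        Math.abs(estimateSeconds(s.words, wpm) - s.targetSeconds) >
        toleranceSeconds
    )
    .map((s) => s.name);
}
```

A check like this can run before recording, so a section that is too long gets trimmed in the script rather than in the edit.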
The rendering layer
- Remotion - a React-based video composition framework that allows for 100% automation
- Custom slide types: `hook`, `stat`, `process`, `insight`, `cta`
- Bullet reveals synced to the voiceover using Whisper word-level timestamps
- Branded thumbnail rendered as a still composition at 1280×720
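The bullet-reveal sync can be sketched roughly like this. It is a simplified illustration under stated assumptions: the `Word` shape stands in for Whisper's word-level output (a word plus start and end times in seconds), while `revealFrames`, the cue-phrase matching, and the 30 fps default are hypothetical names and values chosen for the example.

```typescript
// Assumed shape for one word-level timestamp (times in seconds).
interface Word {
  word: string;
  start: number;
  end: number;
}

// Lowercase and strip spaces/punctuation so " Funnels," matches "funnels".
const normalize = (s: string): string =>
  s.toLowerCase().replace(/[^a-z0-9']/g, "");

// For each bullet, find where its cue phrase is first spoken (searching
// forward from the previous match) and convert that time to a frame number.
// A bullet whose cue is never found gets -1.
function revealFrames(words: Word[], cues: string[], fps: number = 30): number[] {
  const spoken = words.map((w) => normalize(w.word));
  let cursor = 0;
  return cues.map((cue) => {
    const target = cue
      .split(/\s+/)
      .map(normalize)
      .filter((t) => t.length > 0);
    if (target.length === 0) return -1;
    for (let i = cursor; i + target.length <= spoken.length; i++) {
      if (target.every((t, j) => spoken[i + j] === t)) {
        cursor = i + target.length;
        return Math.round(words[i].start * fps);
      }
    }
    return -1;
  });
}
```

Inside a Remotion component, each bullet would then render only once the value from `useCurrentFrame()` reaches its reveal frame.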
The audio layer
- Voiceover recorded in one take with a decent mic (Shure MV7+)
- Whisper (OpenAI) for word-accurate transcription
- Custom SRT generator converting word timestamps to caption blocks
- ffmpeg for audio cleanup and timing cuts
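As a rough sketch of the word-timestamps-to-captions step: the version below groups words into SRT blocks by line length and pause. It is not our actual generator. The `Word` shape, the 42-character limit, and the 0.8-second pause threshold are assumptions picked for the example.

```typescript
// Assumed shape for one word-level timestamp (times in seconds).
interface Word {
  word: string;
  start: number;
  end: number;
}

// Left-pad a number with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Group words into caption blocks. A new block starts when adding the word
// would exceed maxChars, or when the pause before it is longer than maxGap.
function wordsToSrt(words: Word[], maxChars: number = 42, maxGap: number = 0.8): string {
  const blocks: { start: number; end: number; text: string }[] = [];
  for (const w of words) {
    const text = w.word.trim();
    if (text.length === 0) continue;
    const last = blocks.length > 0 ? blocks[blocks.length - 1] : undefined;
    if (
      last &&
      last.text.length + 1 + text.length <= maxChars &&
      w.start - last.end <= maxGap
    ) {
      last.text += " " + text;
      last.end = w.end;
    } else {
      blocks.push({ start: w.start, end: w.end, text });
    }
  }
  return blocks
    .map((b, i) => `${i + 1}\n${srtTime(b.start)} --> ${srtTime(b.end)}\n${b.text}\n`)
    .join("\n");
}
```

Breaking on pauses as well as length keeps captions aligned with natural phrasing instead of cutting mid-thought.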
The metadata layer
- Structured `readme.json` per episode: SEO fields, production status, YouTube description, chapters, thumbnail concept, cards, end screen
- `youtubeDescriptionFull` - a paste-ready formatted description with above-fold CTA, body copy, and chapter timestamps
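One way a paste-ready description could be assembled from structured fields is sketched below. Only `youtubeDescriptionFull` is a name from our setup; the `Chapter` shape and both function names are hypothetical. The check that the first chapter starts at 0:00 reflects YouTube's chapter requirements as we understand them.

```typescript
// Hypothetical chapter shape for illustration.
interface Chapter {
  startSeconds: number;
  title: string;
}

// Format seconds as M:SS, or H:MM:SS past the one-hour mark.
function chapterStamp(total: number): string {
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = Math.floor(total % 60);
  const ss = s < 10 ? `0${s}` : `${s}`;
  if (h === 0) return `${m}:${ss}`;
  const mm = m < 10 ? `0${m}` : `${m}`;
  return `${h}:${mm}:${ss}`;
}

// Assemble the description: above-fold CTA, body copy, then chapter lines.
function buildDescription(cta: string, body: string, chapters: Chapter[]): string {
  if (chapters.length === 0 || chapters[0].startSeconds !== 0) {
    throw new Error("The first chapter must start at 0:00");
  }
  const lines = chapters.map((c) => `${chapterStamp(c.startSeconds)} ${c.title}`);
  return [cta, body, lines.join("\n")].join("\n\n");
}
```

The returned string is what would be stored under `youtubeDescriptionFull`, so publishing is a copy and paste rather than a formatting job.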
The result: every episode we produce lives in a folder with everything YouTube needs - video file, thumbnail, SRT subtitles, optimized description, chapter markers.
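A folder like that is easy to verify before upload. The sketch below is illustrative only: the file-name patterns (an `.mp4` video, a file with "thumbnail" in its name, an `.srt` file, and `readme.json`) are assumptions, since the actual names depend on how the export step is configured.

```typescript
// Assumed file-name patterns for each required artifact.
const REQUIRED: { [name: string]: RegExp } = {
  video: /\.mp4$/i,
  thumbnail: /thumbnail.*\.(png|jpe?g)$/i,
  subtitles: /\.srt$/i,
  metadata: /^readme\.json$/i,
};

// Given the file names in an episode folder, list which artifacts are missing.
function missingArtifacts(files: string[]): string[] {
  return Object.keys(REQUIRED).filter(
    (name) => !files.some((f) => REQUIRED[name].test(f))
  );
}
```

An empty result means the episode is ready to publish; anything else names exactly what still needs exporting.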
Why This Works: AI at the Top of the Funnel
The episode we made is about diagnostic quiz funnels. The meta-lesson is: AI doesn't replace your IP - it amplifies it.
In Diagnostic Quiz Marketing, the quiz is trained on your frameworks, your client language, your methodology. The AI reads every answer a respondent gives and generates a personalized diagnosis that sounds like it came from you. Not "you scored 34 - you're in the Medium bucket." A real written diagnosis, unique to that person.
The mechanism is reciprocity (Cialdini, Influence). You give genuine expert value first. The respondent is wired to give back. Your solution becomes the natural next step - not because you pushed, but because they received something real.
That's exactly what happened with this video:
- The outline captured our thinking on why quiz funnels work
- The script translated that thinking into 750 words a viewer would actually learn from
- The slides turned the script into visuals that reinforce the message
- The production system delivered it in an hour
The expertise was ours. The production leverage was AI's.
We're Now Running an Assembly Line
The infrastructure is built. The brand context is locked. The prompt architecture is validated. Every episode from here follows the same pattern:
- Define topic + keywords
- Run `npm run generate` → outline, script, slides, readme generated in parallel
- Review outline, approve
- Record script (~6 minutes)
- Process audio, sync timing, render
- Export video, thumbnail, SRT, metadata
The creative constraint isn't production anymore. It's knowing what to teach next.