How It's Made | Cat and Daniel Storytelling & Media Lab

Story Architecture (Human)

This is the one layer that's entirely human. The story bible, character arcs, and 75-episode roadmap are written by the author. Every character motivation, plot twist, thematic thread, and emotional beat originates from a human mind. AI doesn't decide what the story is about. It's told what to build.

Script & Prose (AI-Generated)

The narrative text is AI-generated. Each scene is drafted by large language models (Gemini for planning, Claude for prose) working from detailed human directives: who's present, what must happen, the emotional trajectory, continuity requirements. The AI writes the words. The human defines exactly what those words need to accomplish.
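To make the idea of a "detailed human directive" concrete, here is a minimal Python sketch of what one might look like as data. It is an illustration only, not StoryTeller's actual schema: the `SceneDirective` class, its field names, the `render_directive` helper, and the placeholder characters are all assumptions invented for this example. The four content fields simply mirror the items listed above (who's present, what must happen, the emotional trajectory, continuity requirements).

```python
from dataclasses import dataclass, field


@dataclass
class SceneDirective:
    """Hypothetical shape of one human-written scene directive."""
    episode: int
    scene: int
    characters_present: list[str]
    must_happen: list[str]
    emotional_trajectory: str
    continuity_requirements: list[str] = field(default_factory=list)


def render_directive(d: SceneDirective) -> str:
    """Flatten a directive into plain-text instructions for a prose model."""
    lines = [
        f"Episode {d.episode}, Scene {d.scene}",
        "Characters present: " + ", ".join(d.characters_present),
        "Must happen:",
    ]
    lines += [f"- {event}" for event in d.must_happen]
    lines.append(f"Emotional trajectory: {d.emotional_trajectory}")
    if d.continuity_requirements:
        lines.append("Continuity requirements:")
        lines += [f"- {req}" for req in d.continuity_requirements]
    return "\n".join(lines)


# Placeholder content; none of this comes from the real story bible.
example = SceneDirective(
    episode=3,
    scene=2,
    characters_present=["Character A", "Character B"],
    must_happen=["Character A discovers the hidden letter"],
    emotional_trajectory="curiosity turning to dread",
    continuity_requirements=["Character B does not yet know about the letter"],
)

print(render_directive(example))
```

The point of the structure is that the model receives constraints rather than an open-ended request: everything in the rendered text was decided by a person before any prose is generated.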

Scene Imagery (AI-Generated)

Every image you see is AI-generated. Finished scripts are broken into visual beats by StoryArt, which produces detailed image prompts tailored for Flux and zImage models running locally on an RTX 4090 through SwarmUI and ComfyUI. Character consistency, lighting, and composition are all directed through prompt engineering, not manual illustration.

Voice & Audio (AI-Generated)

Narration and character voices are AI-synthesized. Sound effects and ambient music are produced using AI tools. There are no voice actors or session musicians on this project (yet). The entire audio layer is generated, selected, and assembled by one person.

Review & Quality Control (Human)

Every scene, every image, every audio take goes through human review. The author approves, revises, or regenerates. Nothing ships without explicit sign-off. This is the control point where direction meets execution. The difference between "AI-generated" and "AI-assisted" is the quality gate.
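The approve / revise / regenerate gate can be pictured as a small state machine. The sketch below is a hedged illustration in Python, not the project's real review tooling: `Decision`, `Asset`, `apply_review`, and `shippable` are hypothetical names, and the rule that a revised asset needs a fresh approval is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    REVISE = "revise"
    REGENERATE = "regenerate"


@dataclass
class Asset:
    """One reviewable item: a scene, an image, or an audio take."""
    kind: str
    content: str
    approved: bool = False
    attempts: int = 1


def apply_review(asset: Asset, decision: Decision,
                 revised: Optional[str] = None) -> Asset:
    """Record the reviewer's decision on a single asset."""
    if decision is Decision.APPROVE:
        asset.approved = True
    elif decision is Decision.REVISE:
        if revised is None:
            raise ValueError("a revision needs the revised content")
        asset.content = revised
        asset.approved = False  # assumption: edits must be approved again
    else:  # Decision.REGENERATE
        asset.content = ""      # discard the output; a new one is requested
        asset.attempts += 1
        asset.approved = False
    return asset


def shippable(assets: list[Asset]) -> bool:
    """Nothing ships unless every asset carries an explicit sign-off."""
    return bool(assets) and all(a.approved for a in assets)
```

Note that `shippable` returns `False` for an empty list, so an episode with no reviewed assets can never pass the gate by accident.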

Video Assembly (Human)

Final video production happens in DaVinci Resolve. AI-generated images, synthesized voice, and generated audio are composited, timed, and edited by hand into finished YouTube episodes. This is traditional post-production, just with AI-produced raw materials instead of filmed footage.

Continuity Tracking (AI + Human)

A custom system tracks narrative facts, character knowledge, and arc progressions across all 75 episodes. AI extracts facts automatically; humans verify and approve them. When a character learns something in Episode 3, the system ensures they still know it in Episode 47.

The Custom Tools

Two purpose-built applications sit at the center of the production pipeline. Both were built from scratch specifically for this project.

StoryTeller

The story aggregation engine

StoryTeller is a custom Python/FastAPI application that acts as the single source of truth for the entire story. It aggregates every piece needed to produce an episode:

Story Bible & Roadmaps: The full 75-episode plan, broken down to scene-level directives, lives in a PostgreSQL database. Characters, locations, plot arcs, and thematic threads are all structured and queryable.
Scene Generation Pipeline: Scene-by-scene AI generation with detailed directives. Each scene is drafted by AI, reviewed by the author, and committed to the database only after approval.
Living Continuity System: Tracks narrative facts, character knowledge states, and arc progressions across every episode. When a character learns something in Episode 3, the system ensures they still know it in Episode 47. AI extracts facts; humans verify them.
Multi-AI Orchestration: Gemini for planning and long-context continuity, Claude for prose generation, plus specialized providers for different creative tasks. The right model for the right job.
Export & Ingest: Episodes can be exported for human editing, then ingested back with full continuity tracking preserved. The database is always the golden source.
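The continuity idea described in the list above can be sketched in a few lines of Python. This is a simplified in-memory illustration, not StoryTeller's implementation (which, per the description, stores its data in PostgreSQL): the `Fact` and `ContinuityLog` names, their fields, and the rule that only verified facts count as character knowledge are assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Fact:
    """Hypothetical record of something a character learns."""
    character: str
    statement: str
    learned_in: int          # episode number where the fact is learned
    verified: bool = False   # set only after a human approves it


class ContinuityLog:
    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def propose(self, character: str, statement: str, episode: int) -> Fact:
        """Stand-in for the AI extraction step: the fact starts unverified."""
        fact = Fact(character, statement, episode)
        self._facts.append(fact)
        return fact

    def verify(self, fact: Fact) -> Fact:
        """Stand-in for the human approval step."""
        index = self._facts.index(fact)
        verified = replace(fact, verified=True)
        self._facts[index] = verified
        return verified

    def knows(self, character: str, episode: int) -> list[str]:
        """Verified facts the character has learned by the given episode."""
        return [
            f.statement
            for f in self._facts
            if f.verified and f.character == character and f.learned_in <= episode
        ]


log = ContinuityLog()
secret = log.propose("Character A", "the letter was forged", episode=3)
log.verify(secret)
print(log.knows("Character A", episode=47))
```

Because `knows` filters on `learned_in <= episode`, a fact learned in Episode 3 is still part of the character's knowledge when a scene in Episode 47 is drafted, and it is absent for any scene set before Episode 3.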

StoryArt

From script to screen

StoryArt is a companion TypeScript/Vite application that bridges the gap between written narrative and visual production. It takes a finished episode script and turns it into everything needed for image generation:

Beat Analysis: Each scene is broken into visual beats: discrete moments that need to be illustrated. The analysis identifies the setting, characters present, lighting, mood, and camera framing for every beat.
Prompt Generation: For each beat, StoryArt generates detailed image prompts specifically tailored for SwarmUI and ComfyUI, our local image generation interfaces. Prompts include character consistency tokens, style directives, and compositional guidance.
Per-Scene Processing: Scripts are split by scene markers and each scene is analyzed independently with full context, ensuring visual continuity across the episode without duplication.
Session Handoff: Beat analysis results are written to Redis, where StoryTeller can read them for integration into the broader production pipeline.
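The steps above (split by scene marker, describe each beat, build a prompt, hand the result off) can be sketched as follows. StoryArt itself is a TypeScript application; Python is used here only for illustration. Everything specific in the sketch is an assumption: the `=== SCENE n ===` marker format, the beat field names, the prompt wording, and the payload layout are invented for the example, and the beats are written by hand where the real tool would produce them through analysis. The final function only builds the JSON string that could be stored in Redis; it does not talk to Redis.

```python
import json
import re

# Hypothetical scene marker; the real script format is not documented here.
SCENE_MARKER = re.compile(r"^=== SCENE (\d+) ===$", re.MULTILINE)


def split_scenes(script: str) -> dict[int, str]:
    """Split a script into {scene_number: scene_text} using the markers."""
    # With one capture group, split() returns
    # [preamble, number, body, number, body, ...].
    parts = SCENE_MARKER.split(script)
    return {int(num): body.strip() for num, body in zip(parts[1::2], parts[2::2])}


def build_prompt(beat: dict) -> str:
    """Turn one beat description into a single image prompt string."""
    characters = ", ".join(beat["characters"])
    return (
        f"{beat['framing']} of {characters}, {beat['setting']}, "
        f"{beat['lighting']} lighting, {beat['mood']} mood"
    )


def handoff_payload(episode: int, scene: int, beats: list[dict]) -> str:
    """Serialize the beats plus their prompts as JSON for the handoff."""
    enriched = [{**beat, "prompt": build_prompt(beat)} for beat in beats]
    return json.dumps(
        {"episode": episode, "scene": scene, "beats": enriched}, sort_keys=True
    )


script = "=== SCENE 1 ===\nA walks in.\n=== SCENE 2 ===\nB leaves.\n"
scenes = split_scenes(script)

# A hand-written beat standing in for the output of beat analysis.
beat = {
    "framing": "wide shot",
    "characters": ["Character A"],
    "setting": "rooftop at dusk",
    "lighting": "warm",
    "mood": "tense",
}
print(handoff_payload(episode=1, scene=1, beats=[beat]))
```

Keeping the beat as structured data and deriving the prompt from it means the same analysis can be re-rendered if the prompt style changes, without re-analyzing the script.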

The Rest of the Stack

SwarmUI

Local image generation through SwarmUI and ComfyUI, running Flux1-dev and zImage models. Receives StoryArt's tailored prompts and generates scene imagery with character-consistent results.

DaVinci Resolve

Professional video editing. AI-generated imagery, voice synthesis, and sound design are assembled into finished YouTube episodes.

Voice Synthesis

AI-generated narration and character voices, integrated into the video production pipeline.

Fact Verification

Hybrid human-AI system extracts and verifies narrative facts to prevent continuity errors across 75 episodes.