Recipe: captioned_clip

Status: Live

Transcript-driven clip recipe for podcast/interview formats with full-frame or split-screen layout.

Bridge Note

Recipes are the recommended path. Use https://api.reelforger.com/v1/recipes/render as the canonical endpoint.

When to use it

  • You have spoken-video clips and word-level transcript timings for them
  • You need caption-forward social clips quickly

Required inputs

  • primary_video_url
  • transcript_words
  • layout_variant
  • style_preset

Optional overrides

  • correct_text (punctuated reference for caption alignment)
  • secondary_video_url (required for split_screen)
  • cta_text
  • cta_style / cta_layout
  • captions_mode (default phrase_karaoke; also supports phrase and word_only)
  • captions_style / captions_layout

Request structure

  • Send requests to https://api.reelforger.com/v1/recipes/render.
  • Use the canonical field recipe_id to select the recipe.
  • Keep style_preset at the root and recipe-specific values inside variables.
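
As a rough sketch of the structure above, the following Python builds a request with recipe_id and style_preset at the root and everything else under variables. The helper name build_render_request is hypothetical, and the POST method and JSON content type are assumptions based on the payload shown on this page. Authentication is not described here, so no credentials are included.

```python
import json
import urllib.request

RENDER_URL = "https://api.reelforger.com/v1/recipes/render"


def build_render_request(recipe_id, style_preset, variables):
    """Build (but do not send) a render request laid out as described above."""
    body = {
        "recipe_id": recipe_id,        # canonical recipe field
        "style_preset": style_preset,  # stays at the root
        "variables": dict(variables),  # recipe-specific values only
    }
    return urllib.request.Request(
        RENDER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",                                 # assumed HTTP method
    )


request = build_render_request(
    "captioned_clip",
    "karaoke_yellow",
    {
        "primary_video_url": "https://example.com/clip.mp4",  # placeholder URL
        "layout_variant": "full_frame",
        "transcript_words": [{"text": "Not", "start": 880, "end": 1000, "speaker": "A"}],
    },
)
# Sending is left out on purpose: urllib.request.urlopen(request) would submit it
# once you have added whatever credentials your account requires.
```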

Variable behavior guide

  • primary_video_url: main speaking clip.
  • layout_variant: full_frame or split_screen mode.
  • transcript_words: timestamped words that drive caption timing.
  • correct_text: cleaned transcript to improve punctuation alignment.
  • captions_mode: phrase_karaoke, phrase, or word_only behavior.
  • captions_layout / captions_style: readability and placement tuning.

Payload and output preview

Payload example

{
  "recipe_id": "captioned_clip",
  "style_preset": "karaoke_yellow",
  "variables": {
    "primary_video_url": "https://pub-2ad5592bc4ca44abb609acfc0b7c5ceb.r2.dev/reel-forge-website-assets/talking%20head%20runner.mp4",
    "layout_variant": "full_frame",
    "transcript_words": [
      { "text": "Not", "start": 880, "end": 1000, "speaker": "A" },
      { "text": "going", "start": 1000, "end": 1160, "speaker": "A" },
      { "text": "to", "start": 1160, "end": 1320, "speaker": "A" },
      { "text": "pretend", "start": 1320, "end": 1640, "speaker": "A" },
      { "text": "I", "start": 1640, "end": 1760, "speaker": "A" },
      { "text": "want", "start": 1760, "end": 1880, "speaker": "A" },
      { "text": "to", "start": 1880, "end": 2040, "speaker": "A" },
      { "text": "do", "start": 2040, "end": 2200, "speaker": "A" },
      { "text": "this.", "start": 2200, "end": 2480, "speaker": "A" },
      { "text": "I", "start": 3120, "end": 3440, "speaker": "A" },
      { "text": "don't.", "start": 3440, "end": 3840, "speaker": "A" },
      { "text": "But", "start": 5040, "end": 5320, "speaker": "A" },
      { "text": "I", "start": 5320, "end": 5480, "speaker": "A" },
      { "text": "also", "start": 5480, "end": 5720, "speaker": "A" },
      { "text": "know", "start": 5720, "end": 5960, "speaker": "A" },
      { "text": "I'll", "start": 5960, "end": 6160, "speaker": "A" },
      { "text": "feel", "start": 6160, "end": 6320, "speaker": "A" },
      { "text": "better", "start": 6320, "end": 6560, "speaker": "A" },
      { "text": "after", "start": 6560, "end": 6840, "speaker": "A" },
      { "text": "and", "start": 6840, "end": 7160, "speaker": "A" },
      { "text": "hate", "start": 7160, "end": 7480, "speaker": "A" },
      { "text": "myself", "start": 7480, "end": 7800, "speaker": "A" },
      { "text": "if", "start": 7800, "end": 8000, "speaker": "A" },
      { "text": "I", "start": 8000, "end": 8240, "speaker": "A" },
      { "text": "don't.", "start": 8240, "end": 8640, "speaker": "A" },
      { "text": "So—", "start": 10080, "end": 10480, "speaker": "A" },
      { "text": "off", "start": 12720, "end": 13080, "speaker": "A" },
      { "text": "we", "start": 13080, "end": 13360, "speaker": "A" },
      { "text": "go.", "start": 13360, "end": 13680, "speaker": "A" }
    ],
    "correct_text": "Not going to pretend I want to do this. I don't. But I also know I'll feel better after and hate myself if I don't. So - off we go.",
    "captions_mode": "phrase",
    "captions_layout": { "y": "72%" }
  }
}

Output preview

Talking-head clip with Karaoke Yellow captions, using correct_text for punctuation alignment.

Constraints

  • secondary_video_url is required when layout_variant=split_screen
  • Clip duration is derived from transcript_words
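
These constraints can be checked client-side before a render is requested. The sketch below is illustrative only: check_constraints and last_word_end are hypothetical helpers, not part of the API. The page does not state the exact rule the renderer uses to derive duration, so the last word's end time is shown as one plausible reading, in the transcript's own units (the example payload's values look like milliseconds, which is an assumption).

```python
def check_constraints(variables):
    """Return a list of constraint problems for a captioned_clip variables dict."""
    problems = []
    needs_secondary = variables.get("layout_variant") == "split_screen"
    if needs_secondary and not variables.get("secondary_video_url"):
        problems.append(
            "secondary_video_url is required when layout_variant=split_screen"
        )
    if not variables.get("transcript_words"):
        problems.append("transcript_words is empty, so no duration can be derived")
    return problems


def last_word_end(words):
    """Latest end timestamp in the transcript (assumed proxy for clip duration)."""
    return max(word["end"] for word in words)
```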

Common mistakes and errors

  • Choosing split_screen without secondary_video_url
  • Supplying unsorted transcript tokens
  • Omitting correct_text when raw transcript has poor punctuation
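
Unsorted tokens are easy to catch and repair before sending. A minimal sketch, assuming each token carries numeric start and end keys as in the payload example; the function names are hypothetical.

```python
def is_sorted_by_start(words):
    """True when every token starts no earlier than the one before it."""
    return all(a["start"] <= b["start"] for a, b in zip(words, words[1:]))


def sort_words(words):
    """Return a new list ordered by start time, then end time; input is untouched."""
    return sorted(words, key=lambda word: (word["start"], word["end"]))
```

Running is_sorted_by_start first lets an automation log a warning when the speech-to-text output arrives out of order, rather than silently reordering it.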

Make/Zapier mapping tips

  • Map incoming speaker clip URL to primary_video_url and optional B-roll URL to secondary_video_url.
  • If your no-code tool has branching, only include secondary_video_url when layout_variant is split_screen.
  • Feed transcript_words directly from your speech-to-text (STT) output object without renaming the start/end keys.
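
For teams that script the same mapping outside a no-code tool, the branching described above might look like the following. This is a sketch under assumptions: map_variables and its parameter names are hypothetical, and only the variable names come from this page.

```python
def map_variables(speaker_clip_url, transcript_words,
                  layout_variant="full_frame", broll_url=None):
    """Map upstream fields to captioned_clip variables."""
    variables = {
        "primary_video_url": speaker_clip_url,
        "layout_variant": layout_variant,
        # Passed through as-is so the start/end keys keep their original names.
        "transcript_words": transcript_words,
    }
    # Include the B-roll URL only on the split_screen branch.
    if layout_variant == "split_screen" and broll_url:
        variables["secondary_video_url"] = broll_url
    return variables
```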