Payload and output preview
Payload example
{
  "version": "v1",
  "output": {
    "width": 1080,
    "height": 1920,
    "fps": 30
  },
  "assets": [
    { "id": "talking-head", "type": "video", "url": "https://pub-2ad5592bc4ca44abb609acfc0b7c5ceb.r2.dev/reel-forge-website-assets/talking%20head%20runner.mp4" }
  ],
  "composition": {
    "timeline": [
      {
        "id": "video-layer",
        "type": "video",
        "asset_id": "talking-head",
        "time": { "start_seconds": 0, "duration_seconds": 13.8 }
      }
    ],
    "captions": {
      "provider": "assemblyai",
      "preset": "karaoke_yellow",
      "mode": "phrase_karaoke",
      "max_chars_per_segment": 20,
      "correct_text": "Not going to pretend I want to do this. I don't. But I also know I'll feel better after and hate myself if I don't. So - off we go.",
      "layout": { "x": "8%", "y": "77%", "width": "84%", "height": "18%" },
      "style": { "font_size": 64, "highlight_color": "#FFEA00" },
      "words": [
        { "text": "Not", "start": 880, "end": 1000, "speaker": "A" },
        { "text": "going", "start": 1000, "end": 1160, "speaker": "A" },
        { "text": "to", "start": 1160, "end": 1320, "speaker": "A" },
        { "text": "pretend", "start": 1320, "end": 1640, "speaker": "A" },
        { "text": "I", "start": 1640, "end": 1760, "speaker": "A" },
        { "text": "want", "start": 1760, "end": 1880, "speaker": "A" },
        { "text": "to", "start": 1880, "end": 2040, "speaker": "A" },
        { "text": "do", "start": 2040, "end": 2200, "speaker": "A" },
        { "text": "this.", "start": 2200, "end": 2480, "speaker": "A" },
        { "text": "I", "start": 3120, "end": 3440, "speaker": "A" },
        { "text": "don't.", "start": 3440, "end": 3840, "speaker": "A" },
        { "text": "But", "start": 5040, "end": 5320, "speaker": "A" },
        { "text": "I", "start": 5320, "end": 5480, "speaker": "A" },
        { "text": "also", "start": 5480, "end": 5720, "speaker": "A" },
        { "text": "know", "start": 5720, "end": 5960, "speaker": "A" },
        { "text": "I'll", "start": 5960, "end": 6160, "speaker": "A" },
        { "text": "feel", "start": 6160, "end": 6320, "speaker": "A" },
        { "text": "better", "start": 6320, "end": 6560, "speaker": "A" },
        { "text": "after", "start": 6560, "end": 6840, "speaker": "A" },
        { "text": "and", "start": 6840, "end": 7160, "speaker": "A" },
        { "text": "hate", "start": 7160, "end": 7480, "speaker": "A" },
        { "text": "myself", "start": 7480, "end": 7800, "speaker": "A" },
        { "text": "if", "start": 7800, "end": 8000, "speaker": "A" },
        { "text": "I", "start": 8000, "end": 8240, "speaker": "A" },
        { "text": "don't.", "start": 8240, "end": 8640, "speaker": "A" },
        { "text": "So—", "start": 10080, "end": 10480, "speaker": "A" },
        { "text": "off", "start": 12720, "end": 13080, "speaker": "A" },
        { "text": "we", "start": 13080, "end": 13360, "speaker": "A" },
        { "text": "go.", "start": 13360, "end": 13680, "speaker": "A" }
      ]
    }
  }
}
Output preview
Featured caption example using karaoke_yellow + phrase_karaoke on a talking-head clip.
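The layout values above are percentages. Assuming they are measured against the output frame (x and width against the 1080 px output width, y and height against the 1920 px output height), which this page does not state explicitly, they resolve to a pixel box as in this minimal Python sketch. resolve_layout is a hypothetical helper for illustration, not part of the API.

```python
# Hypothetical helper (not part of the API). Assumption: layout percentages
# are relative to the output frame; x/width use the output width and
# y/height use the output height.

def resolve_layout(layout: dict, width: int, height: int) -> dict:
    """Convert {"x": "8%", ...} into a pixel box rounded to whole pixels."""

    def pct(value: str) -> float:
        if not value.endswith("%"):
            raise ValueError(f"expected a percentage, got {value!r}")
        return float(value[:-1]) / 100.0

    return {
        "x": round(pct(layout["x"]) * width),
        "y": round(pct(layout["y"]) * height),
        "width": round(pct(layout["width"]) * width),
        "height": round(pct(layout["height"]) * height),
    }


layout = {"x": "8%", "y": "77%", "width": "84%", "height": "18%"}
box = resolve_layout(layout, width=1080, height=1920)
print(box)  # {'x': 86, 'y': 1478, 'width': 907, 'height': 346}
```

Under that assumption the caption box runs from y = 1478 px to 1824 px, i.e. it ends at 95% of the frame height and leaves a 96 px bottom margin.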
Caption Examples
Use composition.captions to render captions programmatically: the preset sets the visual style, and the mode sets how words are grouped and highlighted.
1) Featured request example
This featured request shows the complete shape of a high-readability talking-head caption render.
Source asset
- Talking-head video: talking head runner
Words sample
The words array is the timing source; start and end are in milliseconds.
Pass the word objects through as the provider returned them (here, AssemblyAI) whenever possible.
[
  { "text": "Not", "start": 880, "end": 1000, "speaker": "A" },
  { "text": "going", "start": 1000, "end": 1160, "speaker": "A" },
  { "text": "to", "start": 1160, "end": 1320, "speaker": "A" },
  { "text": "pretend", "start": 1320, "end": 1640, "speaker": "A" },
  { "text": "I", "start": 1640, "end": 1760, "speaker": "A" },
  { "text": "want", "start": 1760, "end": 1880, "speaker": "A" },
  { "text": "to", "start": 1880, "end": 2040, "speaker": "A" },
  { "text": "do", "start": 2040, "end": 2200, "speaker": "A" }
]
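Because word timings are in milliseconds while the timeline uses seconds (duration_seconds), it can help to sanity-check the words before submitting a render. This Python sketch uses a hypothetical check_word_timings helper, not part of the API, that flags words whose end is not after their start, that overlap the previous word, or that run past the clip.

```python
# Hypothetical pre-flight check (not part of the API): word "start"/"end"
# values are milliseconds, while the timeline duration is in seconds.

def check_word_timings(words: list, clip_duration_seconds: float) -> list:
    """Return a list of problems; an empty list means the timings look sane."""
    problems = []
    clip_end_ms = clip_duration_seconds * 1000  # seconds -> milliseconds
    prev_end = 0
    for i, w in enumerate(words):
        label = f"word {i} ({w['text']!r})"
        if w["end"] <= w["start"]:
            problems.append(f"{label}: end is not after start")
        if w["start"] < prev_end:
            problems.append(f"{label}: overlaps the previous word")
        if w["end"] > clip_end_ms:
            problems.append(f"{label}: ends after the clip")
        prev_end = w["end"]
    return problems


# The eight-word sample above (speaker fields omitted for brevity).
sample_words = [
    {"text": "Not", "start": 880, "end": 1000},
    {"text": "going", "start": 1000, "end": 1160},
    {"text": "to", "start": 1160, "end": 1320},
    {"text": "pretend", "start": 1320, "end": 1640},
    {"text": "I", "start": 1640, "end": 1760},
    {"text": "want", "start": 1760, "end": 1880},
    {"text": "to", "start": 1880, "end": 2040},
    {"text": "do", "start": 2040, "end": 2200},
]

print(check_word_timings(sample_words, clip_duration_seconds=13.8))  # prints []
```

Gaps between words (silences) are allowed by this check; only inverted, overlapping, or out-of-range timings are reported.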
Featured payload (Karaoke Yellow; words array truncated to the first three entries)
{
  "version": "v1",
  "output": { "width": 1080, "height": 1920, "fps": 30 },
  "assets": [
    { "id": "talking-head", "type": "video", "url": "https://pub-2ad5592bc4ca44abb609acfc0b7c5ceb.r2.dev/reel-forge-website-assets/talking%20head%20runner.mp4" }
  ],
  "composition": {
    "timeline": [
      {
        "id": "video-layer",
        "type": "video",
        "asset_id": "talking-head",
        "time": { "start_seconds": 0, "duration_seconds": 13.8 }
      }
    ],
    "captions": {
      "provider": "assemblyai",
      "preset": "karaoke_yellow",
      "mode": "phrase_karaoke",
      "max_chars_per_segment": 20,
      "correct_text": "Not going to pretend I want to do this. I don't. But I also know I'll feel better after and hate myself if I don't. So - off we go.",
      "layout": { "x": "8%", "y": "77%", "width": "84%", "height": "18%" },
      "style": { "font_size": 64, "highlight_color": "#FFEA00" },
      "words": [
        { "text": "Not", "start": 880, "end": 1000 },
        { "text": "going", "start": 1000, "end": 1160 },
        { "text": "to", "start": 1160, "end": 1320 }
      ]
    }
  }
}
Why this request works
- Uses a single base video layer (no split-screen, no extra visual clutter)
- Applies correct_text to improve punctuation and casing alignment
- Uses phrase_karaoke for natural phrase grouping with active-word progression
- Positions captions in a safe lower-third region for social readability
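The request caps segments with max_chars_per_segment: 20 and uses phrase_karaoke for active-word progression. The renderer's actual grouping rules are not documented on this page, so the following Python code is only a hypothetical sketch of one plausible approach: pack words greedily into segments of at most 20 characters (counting joining spaces), close a segment after sentence-ending punctuation, and find the active word for a given timestamp. group_phrases and active_word are illustrative names, not API functions.

```python
# Hypothetical sketch of phrase grouping for a phrase_karaoke-style mode.
# Assumptions (not documented here): words are packed greedily into segments
# of at most max_chars characters (joined by single spaces), and a segment
# also closes after a word ending in ".", "!" or "?". A single word longer
# than max_chars still gets its own segment.

def group_phrases(words: list, max_chars: int) -> list:
    segments = []
    current = []
    length = 0  # characters in the current segment, including spaces
    for w in words:
        added = len(w["text"]) + (1 if current else 0)  # +1 for the joining space
        if current and length + added > max_chars:
            segments.append(current)  # segment is full: start a new one
            current, length = [], 0
            added = len(w["text"])
        current.append(w)
        length += added
        if w["text"].endswith((".", "!", "?")):
            segments.append(current)  # sentence end: close the segment
            current, length = [], 0
    if current:
        segments.append(current)
    return segments


def active_word(segments: list, t_ms: int):
    """Return (segment_index, word_index) of the word spoken at t_ms, or None."""
    for si, seg in enumerate(segments):
        for wi, w in enumerate(seg):
            if w["start"] <= t_ms < w["end"]:
                return si, wi
    return None  # silence between words


# First two sentences of the featured clip (speaker fields omitted).
words = [
    {"text": "Not", "start": 880, "end": 1000},
    {"text": "going", "start": 1000, "end": 1160},
    {"text": "to", "start": 1160, "end": 1320},
    {"text": "pretend", "start": 1320, "end": 1640},
    {"text": "I", "start": 1640, "end": 1760},
    {"text": "want", "start": 1760, "end": 1880},
    {"text": "to", "start": 1880, "end": 2040},
    {"text": "do", "start": 2040, "end": 2200},
    {"text": "this.", "start": 2200, "end": 2480},
    {"text": "I", "start": 3120, "end": 3440},
    {"text": "don't.", "start": 3440, "end": 3840},
]

segments = group_phrases(words, max_chars=20)
for seg in segments:
    print(" ".join(w["text"] for w in seg))
# Not going to pretend
# I want to do this.
# I don't.

print(active_word(segments, 1700))  # (1, 0): "I" in the second segment
```

A real renderer may also split on long pauses or balance line lengths; this sketch only shows how a character cap and an active-word lookup fit together.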
2) Caption presets comparison
Use the same source clip and timing input, then switch captions.preset to compare output style.
Quick preset picks
- Fast default: tiktok_classic
- Most social punch: bold_outline
- Best active-word emphasis: karaoke_yellow
- Editorial/luxury styles: typewriter, luxury_serif
- Stylized/experimental: neon_glow, handwriting
- High-contrast bubble look: soft_pill
| Preset | captions.preset value | Best for |
|---|---|---|
| TikTok Classic | tiktok_classic | clean, readable social default |
| Bold Outline | bold_outline | high-impact, thick-stroke emphasis |
| Karaoke Yellow | karaoke_yellow | active-word karaoke progression |
| Neon Glow | neon_glow | stylized glow look for energetic edits |
| Pill Captions | soft_pill | dark rounded bubble with strong contrast |
| Typewriter | typewriter | editorial monospaced-text presentation |
| Handwriting | handwriting | informal creator-voice style |
| Luxury Serif | luxury_serif | premium editorial aesthetic |
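To compare presets as described above (same clip and timing input, only captions.preset switched), one option is to generate one payload per preset from a single base. This Python sketch uses a hypothetical preset_variants helper and deep-copies the base so the variants never share nested objects. It keeps every other field unchanged, including the style override tuned for karaoke_yellow; how style overrides interact with each preset is not stated on this page. How the bodies are submitted is also not covered here, so the sketch stops at serialization.

```python
import copy
import json

# Preset values from the quick picks above.
PRESETS = [
    "tiktok_classic", "bold_outline", "karaoke_yellow", "neon_glow",
    "soft_pill", "typewriter", "handwriting", "luxury_serif",
]

# Featured payload (words truncated, as in the featured example).
base_payload = {
    "version": "v1",
    "output": {"width": 1080, "height": 1920, "fps": 30},
    "assets": [
        {
            "id": "talking-head",
            "type": "video",
            "url": "https://pub-2ad5592bc4ca44abb609acfc0b7c5ceb.r2.dev/reel-forge-website-assets/talking%20head%20runner.mp4",
        }
    ],
    "composition": {
        "timeline": [
            {
                "id": "video-layer",
                "type": "video",
                "asset_id": "talking-head",
                "time": {"start_seconds": 0, "duration_seconds": 13.8},
            }
        ],
        "captions": {
            "provider": "assemblyai",
            "preset": "karaoke_yellow",
            "mode": "phrase_karaoke",
            "max_chars_per_segment": 20,
            "layout": {"x": "8%", "y": "77%", "width": "84%", "height": "18%"},
            "style": {"font_size": 64, "highlight_color": "#FFEA00"},
            "words": [
                {"text": "Not", "start": 880, "end": 1000},
                {"text": "going", "start": 1000, "end": 1160},
                {"text": "to", "start": 1160, "end": 1320},
            ],
        },
    },
}


def preset_variants(base: dict, presets: list) -> dict:
    """Return one deep-copied payload per preset; only captions.preset differs."""
    variants = {}
    for preset in presets:
        payload = copy.deepcopy(base)  # never mutate the shared base payload
        payload["composition"]["captions"]["preset"] = preset
        variants[preset] = payload
    return variants


variants = preset_variants(base_payload, PRESETS)
# Serialized JSON request bodies, one per preset.
bodies = {name: json.dumps(p) for name, p in variants.items()}
print(sorted(bodies))
```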