Timeline Request Structure

Timeline requests use a full render manifest shape.

Need the short mental model first? See the timing inference model in Timeline Overview.

Endpoints

Render: POST https://api.reelforger.com/v1/videos/render
Validate: POST https://api.reelforger.com/v1/videos/validate

Top-level structure

{
  "version": "v1",
  "output": { "width": 1080, "height": 1920, "fps": 30 },
  "assets": [],
  "composition": {}
}

Authoring invariants

These rules are enforced by the contract and are worth treating as non-negotiable when generating manifests:

output.fps is fixed at 30
output.width and output.height must be even integers
composition.duration_seconds cannot exceed 300
assets[].id values must be unique
composition.timeline[].id values must be unique
composition.text_overlays[].id values must be unique when present
every composition.timeline[].asset_id must reference an existing entry in assets[]
caption words[].start and words[].end are in milliseconds, not seconds
metadata must be a flat key/value object with at most 10 keys; keys and string values are capped at 500 characters
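
The structural rules above can be checked locally before a manifest is sent. The following is a minimal sketch, not ReelForger's own validator: check_invariants is a hypothetical helper name, and it covers only the rules that can be verified from the manifest alone (it cannot tell whether caption word timestamps are really in milliseconds).

```python
def check_invariants(manifest: dict) -> list[str]:
    """Return human-readable violations; an empty list means none were found."""
    problems: list[str] = []

    output = manifest.get("output", {})
    if output.get("fps") != 30:
        problems.append("output.fps must be 30")
    for dim in ("width", "height"):
        value = output.get(dim)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value % 2:
            problems.append(f"output.{dim} must be an even integer")

    composition = manifest.get("composition", {})
    duration = composition.get("duration_seconds")
    if duration is not None and duration > 300:
        problems.append("composition.duration_seconds cannot exceed 300")

    asset_ids = [asset.get("id") for asset in manifest.get("assets", [])]
    timeline = composition.get("timeline", [])
    overlays = composition.get("text_overlays", [])
    for label, ids in (
        ("assets[].id", asset_ids),
        ("composition.timeline[].id", [layer.get("id") for layer in timeline]),
        ("composition.text_overlays[].id", [item.get("id") for item in overlays]),
    ):
        if len(ids) != len(set(ids)):
            problems.append(f"{label} values must be unique")

    known_assets = set(asset_ids)
    for layer in timeline:
        if layer.get("asset_id") not in known_assets:
            problems.append(f"layer {layer.get('id')!r} references an unknown asset_id")

    metadata = manifest.get("metadata", {})
    if len(metadata) > 10:
        problems.append("metadata may have at most 10 keys")
    for key, value in metadata.items():
        if isinstance(value, (dict, list)):
            problems.append(f"metadata[{key!r}] must not be nested")
        if len(str(key)) > 500 or (isinstance(value, str) and len(value) > 500):
            problems.append(f"metadata[{key!r}] exceeds 500 characters")

    return problems
```

An empty result only means these local checks passed; the validate endpoint remains the authority on whether a manifest is accepted.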

Required fields

version
output.width, output.height, output.fps
composition object

composition.timeline must contain at least one visual or audio layer, and each layer must have a corresponding entry in assets[].

Optional but common fields

idempotency_key (safe retry dedupe)
composition.auto_stitch (stitch untimed video by layer order; untimed audio in mixed timelines defaults to start_seconds: 0 unless explicitly timed)
composition.text_overlays
composition.captions
webhook_url, webhook_headers, webhook_secret
metadata

Assets and layer linkage

Each timeline layer references an asset via asset_id.
Every asset_id must exist in assets[].
A layer's type should match the type of the asset it references (for example, a video layer should point to a video asset).
Asset type must be one of video, audio, or image.

Time rules

image layers require time.start_seconds.
image.time.duration_seconds can be omitted when composition duration is inferable:
- from explicit composition.duration_seconds,
- from max timed end across timeline/text overlays,
- or at render-time when composition.auto_stitch is enabled and media durations are probed.
video/audio layers require time unless composition.auto_stitch is true.
trim.start_seconds is optional for audio/video.
Explicit layer timing always wins when provided.
When composition.auto_stitch: true, untimed video layers are sequenced in composition.timeline order.
In mixed timelines, untimed audio layers default to start_seconds: 0 and are aligned to the stitched video duration unless explicit time is provided.
composition.text_overlays also contribute to inferred composition duration for image-layer timing.
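
The first two inference sources for image-layer duration (explicit composition.duration_seconds, then the maximum timed end across timeline layers and text overlays) can be sketched as below. This is an illustration of the stated rules, not the renderer's implementation. The helper names are hypothetical, and it assumes text overlays carry the same time.start_seconds / time.duration_seconds shape as timeline layers. The third source, render-time probing under auto_stitch, cannot be reproduced offline and is represented by a None result.

```python
from typing import Optional


def timed_end(item: dict) -> Optional[float]:
    """End time of a layer or overlay, or None if it is not fully timed."""
    time = item.get("time") or {}
    start = time.get("start_seconds")
    length = time.get("duration_seconds")
    if start is None or length is None:
        return None
    return start + length


def infer_duration(composition: dict) -> Optional[float]:
    """Composition duration from explicit value, else from the max timed end."""
    explicit = composition.get("duration_seconds")
    if explicit is not None:
        return explicit
    items = list(composition.get("timeline", [])) + list(composition.get("text_overlays", []))
    ends = [end for end in map(timed_end, items) if end is not None]
    # None means nothing is inferable offline (for example, only probed media).
    return max(ends) if ends else None
```

With a video layer timed from 0 to 12 seconds and a text overlay timed from 10 to 15 seconds, an image layer that omits time.duration_seconds would be resolved against an inferred duration of 15 seconds.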

Shared enum values and defaults

Layout

Field	Allowed values	Default / note
`layout.fit`	`cover`, `contain`	Defaults to `cover`
`layout.x`, `layout.y`	typically percent or pixel strings	If expressed as percentages, keep them within the percent guardrails below
`layout.width`, `layout.height`	typically percent or pixel strings	Defaults are full-frame when layout is omitted

Percent guardrails when using % values:

x, y: between -100% and 100%
width, height: between 0% and 100%
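
A small pre-flight check for these guardrails might look like the sketch below. percent_problems is a hypothetical helper; it assumes the bounds are inclusive and that percent values are written as strings such as "50%" or "-12.5%". Non-percent values (for example pixel strings) are skipped rather than judged.

```python
import re

# Matches strings like "50%", "-12.5%"; anything else is treated as non-percent.
_PERCENT = re.compile(r"^(-?\d+(?:\.\d+)?)%$")

# Inclusive bounds are an assumption; the docs only say "between".
_LIMITS = {
    "x": (-100.0, 100.0),
    "y": (-100.0, 100.0),
    "width": (0.0, 100.0),
    "height": (0.0, 100.0),
}


def percent_problems(layout: dict) -> list[str]:
    """Return one message per percent value that falls outside its guardrail."""
    problems = []
    for field, (low, high) in _LIMITS.items():
        value = layout.get(field)
        if not isinstance(value, str):
            continue
        match = _PERCENT.match(value.strip())
        if match is None:
            continue  # not a percent string, so no guardrail applies here
        number = float(match.group(1))
        if not low <= number <= high:
            problems.append(f"layout.{field}={value} is outside {low:g}% to {high:g}%")
    return problems
```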

Layer visuals

Field	Allowed values	Default / note
`background_mode`	`blurred`, `transparent`, `solid`	Defaults to `blurred`
`motion`	`zoom_in`, `zoom_out`, `pan_left`, `pan_right`, `none`	Defaults to `none`

Text and captions

Field	Allowed values	Default / note
`text_align`	`left`, `center`, `right`, `justify`	Applies inside the text overlay bounding box
`captions.mode`	`word_only`, `phrase`, `phrase_karaoke`	Defaults to `phrase_karaoke`
`captions.provider`	`assemblyai`	Current provider enum

Caption preset values:

tiktok_classic
bold_outline
karaoke_yellow
neon_glow
soft_pill
typewriter
handwriting
luxury_serif

Captions and alignment

If composition.captions.words is provided, ReelForger uses those words/timestamps directly.
If composition.captions.words is omitted, ReelForger may transcribe automatically.
If ReelForger cannot determine a single speech source, you must provide composition.captions.transcription_source_asset_id.
Add composition.captions.correct_text when you need improved punctuation/casing alignment.
correct_text remains valid even when words are auto-transcribed by ReelForger.
Current behavior: auto-transcription covers the full duration of the source selected for transcription.
If correct_text aligns poorly to the detected words, the render can fail with caption_alignment_failed.
Keep caption placement in safe lower-third regions for social readability.

Caption configuration quick matrix

Field	Required	Notes
`captions.provider`	Yes	Currently `assemblyai`
`captions.preset`	Yes	Choose from the supported preset enum above
`captions.mode`	Yes	`phrase_karaoke` is the default / most social-friendly
`captions.words`	Optional	Supply when you already have timed words in milliseconds
`captions.transcription_source_asset_id`	Conditionally required	Required when words are omitted and speech source is ambiguous
`captions.correct_text`	Optional	Helps punctuation/casing, but can fail alignment if the text is too different
`captions.max_chars_per_segment`	Optional	Chunking control for phrase-based captioning
`captions.time_overrides`	Optional	Apply style/layout overrides to specific time windows
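
Because captions.words expects milliseconds, timings held in seconds need converting before they go into the manifest. The sketch below shows one way to do that. to_caption_words is a hypothetical helper, and the "text" key is an assumed field name; only start and end are confirmed by this page, so check the schema for the actual word field.

```python
def to_caption_words(segments: list[tuple[str, float, float]]) -> list[dict]:
    """Convert (word, start_seconds, end_seconds) tuples to millisecond entries."""
    words = []
    for word, start_s, end_s in segments:
        if end_s < start_s:
            raise ValueError(f"{word!r} ends before it starts")
        words.append(
            {
                "text": word,  # assumed field name, not confirmed by these docs
                "start": round(start_s * 1000),  # seconds -> whole milliseconds
                "end": round(end_s * 1000),
            }
        )
    return words
```

Rounding to whole milliseconds avoids sending float artifacts such as 419.99999999999994 in the request body.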

Timeline caption example (omitted words + explicit source)

{
  "version": "v1",
  "output": { "width": 1080, "height": 1920, "fps": 30 },
  "assets": [
    { "id": "talking-head", "type": "video", "url": "https://example.com/talking-head.mp4" }
  ],
  "composition": {
    "timeline": [
      {
        "id": "layer-1",
        "type": "video",
        "asset_id": "talking-head",
        "time": { "start_seconds": 0, "duration_seconds": 12 }
      }
    ],
    "captions": {
      "provider": "assemblyai",
      "preset": "karaoke_yellow",
      "mode": "phrase_karaoke",
      "transcription_source_asset_id": "talking-head"
    }
  }
}

Validate first

Before rendering in production, send the exact same body to POST https://api.reelforger.com/v1/videos/validate. Validation warnings highlight common readability, layout, and timing risks before credits are spent.

Less common but schema-visible fields

motion on video and image layers is supported and can be used for Ken Burns style movement over the layer duration.
style.transform is supported on media layers when you need explicit CSS-like transform control.
video.media_settings.muted is available when the video should contribute no audible output.
video.media_settings.crossfade_seconds is a more advanced video-only setting. Use it cautiously, validate first, and prefer simpler timing/layout patterns unless you specifically need overlapping transitions.
transitions is currently schema-visible but is not a recommended primary authoring surface. Unless you have a validated, known-good pattern, prefer explicit timing and layer composition instead.