Timeline Request Structure

Timeline requests use a full render manifest shape.

Need the short mental model first? See the timing inference model in Timeline Overview.

Endpoints

  • Render: POST https://api.reelforger.com/v1/videos/render
  • Validate: POST https://api.reelforger.com/v1/videos/validate

Top-level structure

{
  "version": "v1",
  "output": { "width": 1080, "height": 1920, "fps": 30 },
  "assets": [],
  "composition": {}
}

Authoring invariants

These rules are enforced by the contract and are worth treating as non-negotiable when generating manifests:

  • output.fps is fixed at 30
  • output.width and output.height must be even integers
  • composition.duration_seconds cannot exceed 300
  • assets[].id values must be unique
  • composition.timeline[].id values must be unique
  • composition.text_overlays[].id values must be unique when present
  • every composition.timeline[].asset_id must reference an existing entry in assets[]
  • caption words[].start and words[].end are in milliseconds, not seconds
  • metadata must be a flat key/value object with at most 10 keys; keys and string values are capped at 500 characters
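The invariants above can be checked locally before a manifest is sent. The following is a minimal sketch, not ReelForger's own validator: the function name check_invariants and the wording of its messages are hypothetical, and it does not cover the caption words[] rule.

```python
from typing import Any


def check_invariants(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable violations of the authoring invariants (illustrative only)."""
    problems: list[str] = []
    output = manifest.get("output", {})
    composition = manifest.get("composition", {})
    timeline = composition.get("timeline", [])

    if output.get("fps") != 30:
        problems.append("output.fps must be 30")
    for key in ("width", "height"):
        value = output.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value % 2 != 0:
            problems.append(f"output.{key} must be an even integer")

    duration = composition.get("duration_seconds")
    if duration is not None and duration > 300:
        problems.append("composition.duration_seconds cannot exceed 300")

    def collect_ids(items: list[dict[str, Any]], label: str) -> set:
        """Record duplicate ids under the given label and return the set of ids seen."""
        seen: set = set()
        for item in items:
            item_id = item.get("id")
            if item_id in seen:
                problems.append(f"duplicate {label} id: {item_id}")
            seen.add(item_id)
        return seen

    asset_ids = collect_ids(manifest.get("assets", []), "assets[]")
    collect_ids(timeline, "composition.timeline[]")
    collect_ids(composition.get("text_overlays", []), "composition.text_overlays[]")

    for layer in timeline:
        if layer.get("asset_id") not in asset_ids:
            problems.append(
                f"layer {layer.get('id')} references unknown asset_id {layer.get('asset_id')}"
            )

    metadata = manifest.get("metadata")
    if metadata is not None:
        if not isinstance(metadata, dict):
            problems.append("metadata must be an object")
        else:
            if len(metadata) > 10:
                problems.append("metadata has more than 10 keys")
            for key, value in metadata.items():
                if isinstance(value, (dict, list)):
                    problems.append(f"metadata.{key} must be a flat value")
                if len(key) > 500 or (isinstance(value, str) and len(value) > 500):
                    problems.append(f"metadata.{key} exceeds 500 characters")
    return problems
```

An empty list means none of these checks failed. It does not replace a call to the validate endpoint, which remains the authority on what the contract accepts.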

Required fields

  • version
  • output.width, output.height, output.fps
  • composition object

composition.timeline must contain at least one visual or audio layer, and each layer's asset must be declared in assets[].

Optional but common fields

  • idempotency_key (safe retry dedupe)
  • composition.auto_stitch (stitch untimed video by layer order; untimed audio in mixed timelines defaults to start_seconds: 0 unless explicitly timed)
  • composition.text_overlays
  • composition.captions
  • webhook_url, webhook_headers, webhook_secret
  • metadata

Assets and layer linkage

  • Each timeline layer references an asset via asset_id.
  • Every asset_id must exist in assets[].
  • Layer type and asset type should agree with each other and with the intended usage; for example, a video layer should reference a video asset.
  • Asset type must be one of video, audio, or image.

Time rules

  • image layers require time.start_seconds.
  • image.time.duration_seconds can be omitted when composition duration is inferable:
    • from explicit composition.duration_seconds,
    • from max timed end across timeline/text overlays,
    • or at render-time when composition.auto_stitch is enabled and media durations are probed.
  • video/audio layers require time unless composition.auto_stitch is true.
  • trim.start_seconds is optional for audio/video.
  • Explicit layer timing always wins when provided.
  • When composition.auto_stitch: true, untimed video layers are sequenced in composition.timeline order.
  • In mixed timelines, untimed audio layers default to start_seconds: 0 and are aligned to the stitched video duration unless explicit time is provided.
  • composition.text_overlays also contribute to inferred composition duration for image-layer timing.
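To make the auto_stitch rules concrete, here is a rough sketch of how untimed layers could be resolved. It rests on several assumptions that the rules above do not spell out: stitched video starts at 0 seconds, "aligned to the stitched video duration" means the audio layer spans the total stitched length, and probed_durations is a hypothetical mapping from asset id to media length in seconds. Image layers and trim are left out.

```python
def resolve_timing(timeline, probed_durations):
    """Sketch of auto_stitch timing for video/audio layers.

    Returns {layer_id: (start_seconds, duration_seconds)}.
    probed_durations is a hypothetical {asset_id: seconds} mapping.
    """
    resolved = {}
    cursor = 0.0  # end of the stitched video sequence so far

    # Pass 1: explicit timing wins; untimed video is placed back to back
    # in composition.timeline order.
    for layer in timeline:
        time = layer.get("time")
        if time is not None:
            resolved[layer["id"]] = (time["start_seconds"], time.get("duration_seconds"))
        elif layer["type"] == "video":
            duration = probed_durations[layer["asset_id"]]
            resolved[layer["id"]] = (cursor, duration)
            cursor += duration

    # Pass 2: untimed audio starts at 0 and spans the stitched video length.
    for layer in timeline:
        if layer.get("time") is None and layer["type"] == "audio":
            resolved[layer["id"]] = (0.0, cursor)

    return resolved
```

With two untimed clips of 4 and 6 seconds plus an untimed music track, this sketch places the clips at 0 and 4 seconds and gives the music a 10-second window starting at 0.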

Shared enum values and defaults

Layout

Field                        Allowed values                      Default / note
layout.fit                   cover, contain                      Defaults to cover
layout.x, layout.y           typically percent or pixel strings  If expressed as percentages, keep them in sane bounds
layout.width, layout.height  typically percent or pixel strings  Defaults are full-frame when layout is omitted

Percent guardrails when using % values:

  • x, y: between -100% and 100%
  • width, height: between 0% and 100%
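The guardrails can be expressed as a small check. This is an illustrative sketch only: the helper name percent_in_bounds is hypothetical, the bounds are assumed to be inclusive, and non-percent values such as pixel strings are passed through unchecked.

```python
def percent_in_bounds(field: str, value: str) -> bool:
    """Return True if value is not a percentage or sits inside the guardrail for field."""
    if not value.endswith("%"):
        return True  # pixel strings and other units are out of scope here
    number = float(value[:-1])
    if field in ("x", "y"):
        return -100 <= number <= 100
    if field in ("width", "height"):
        return 0 <= number <= 100
    raise ValueError(f"unknown layout field: {field}")
```

For example, percent_in_bounds("x", "-50%") passes, while percent_in_bounds("width", "120%") does not.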

Layer visuals

Field            Allowed values                                Default / note
background_mode  blurred, transparent, solid                   Defaults to blurred
motion           zoom_in, zoom_out, pan_left, pan_right, none  Defaults to none

Text and captions

Field              Allowed values                     Default / note
text_align         left, center, right, justify       Applies inside the text overlay bounding box
captions.mode      word_only, phrase, phrase_karaoke  Defaults to phrase_karaoke
captions.provider  assemblyai                         Current provider enum

Caption preset values:

  • tiktok_classic
  • bold_outline
  • karaoke_yellow
  • neon_glow
  • soft_pill
  • typewriter
  • handwriting
  • luxury_serif

Captions and alignment

  • If composition.captions.words is provided, ReelForger uses those words/timestamps directly.
  • If composition.captions.words is omitted, ReelForger may transcribe automatically.
  • If ReelForger cannot determine a single speech source, you must provide composition.captions.transcription_source_asset_id.
  • Add composition.captions.correct_text when you need improved punctuation/casing alignment.
  • correct_text remains valid even when words are auto-transcribed by ReelForger.
  • Current behavior: auto-transcription covers the full duration of the source selected for transcription.
  • If correct_text aligns poorly to the detected words, the render can fail with caption_alignment_failed.
  • Keep caption placement in safe lower-third regions for social readability.
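When you supply composition.captions.words yourself, the most common slip is passing seconds where milliseconds are expected. The sketch below converts second-based timings into millisecond entries. The start and end keys come from the invariants earlier on this page; the "text" key and the helper name to_caption_words are assumptions made for illustration, so confirm the exact word shape against the schema.

```python
def to_caption_words(timed_words):
    """Convert (text, start_seconds, end_seconds) tuples into word entries in milliseconds.

    The "text" key is an assumed field name; only start/end are confirmed here.
    """
    words = []
    for text, start_s, end_s in timed_words:
        if end_s < start_s:
            raise ValueError(f"word {text!r} ends before it starts")
        words.append({
            "text": text,
            "start": round(start_s * 1000),  # seconds -> milliseconds
            "end": round(end_s * 1000),
        })
    return words
```

For instance, a word spoken from 0.5 to 1.25 seconds becomes start 500 and end 1250.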

Caption configuration quick matrix

Field                                   Required                Notes
captions.provider                       Yes                     Currently assemblyai
captions.preset                         Yes                     Choose from the supported preset enum above
captions.mode                           Yes                     phrase_karaoke is the default / most social-friendly
captions.words                          Optional                Supply when you already have timed words in milliseconds
captions.transcription_source_asset_id  Conditionally required  Required when words are omitted and speech source is ambiguous
captions.correct_text                   Optional                Helps punctuation/casing, but can fail alignment if the text is too different
captions.max_chars_per_segment          Optional                Chunking control for phrase-based captioning
captions.time_overrides                 Optional                Apply style/layout overrides to specific time windows

Timeline caption example (omitted words + explicit source)

{
  "version": "v1",
  "output": { "width": 1080, "height": 1920, "fps": 30 },
  "assets": [
    { "id": "talking-head", "type": "video", "url": "https://example.com/talking-head.mp4" }
  ],
  "composition": {
    "timeline": [
      {
        "id": "layer-1",
        "type": "video",
        "asset_id": "talking-head",
        "time": { "start_seconds": 0, "duration_seconds": 12 }
      }
    ],
    "captions": {
      "provider": "assemblyai",
      "preset": "karaoke_yellow",
      "mode": "phrase_karaoke",
      "transcription_source_asset_id": "talking-head"
    }
  }
}

Validate first

Send the exact same body to POST https://api.reelforger.com/v1/videos/validate before rendering in production. Validation warnings flag common readability, layout, and timing risks before credits are spent.
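One way to keep the validate and render bodies identical is to build both requests from the same manifest object. The sketch below uses only the Python standard library and constructs the requests without sending them. The helper name build_request is hypothetical, and authentication is not covered on this page, so any auth headers are passed in by the caller as an assumption.

```python
import json
import urllib.request

BASE_URL = "https://api.reelforger.com/v1/videos"


def build_request(action, manifest, extra_headers=None):
    """Build a POST request for the validate or render endpoint from one manifest."""
    if action not in ("validate", "render"):
        raise ValueError(f"unsupported action: {action}")
    headers = {"Content-Type": "application/json"}
    # Auth headers are not described in this section; the caller supplies them.
    headers.update(extra_headers or {})
    body = json.dumps(manifest).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{action}", data=body, headers=headers, method="POST"
    )
```

Sending is left to the caller, for example with urllib.request.urlopen(build_request("validate", manifest, auth_headers)), followed by the render request only once the validation response looks clean.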

Less common but schema-visible fields

  • motion on video and image layers is supported and can be used for Ken Burns style movement over the layer duration.
  • style.transform is supported on media layers when you need explicit CSS-like transform control.
  • video.media_settings.muted is available when the video should contribute no audible output.
  • video.media_settings.crossfade_seconds is a more advanced video-only setting. Use it cautiously, validate first, and prefer simpler timing/layout patterns unless you specifically need overlapping transitions.
  • transitions is currently schema-visible but not a recommended primary authoring surface in these docs. Unless you have a validated known-good pattern, prefer explicit timing and layer composition instead.