Building a Serverless Media Pipeline with AWS Step Functions

Processing media at scale requires orchestration that can handle variable workloads, recover from failures, and optimize costs. Echosaw's media analysis pipeline is built on AWS Step Functions with a serverless architecture that balances scalability, fault tolerance, and cost efficiency. This post explains how we orchestrate the probe → reduce → report workflow using ECS Fargate tasks.

The State Machine Architecture

Our state machine orchestrates media processing through a series of stages, each implemented as an ECS Fargate task. It manages the entire lifecycle from upload to delivery, with built-in retry logic, error handling, and parallel execution where possible.

The core pipeline consists of three main phases:

Phase 1: Probe. The probe task validates media, extracts metadata using FFmpeg, enforces tier-based duration limits, and extracts location data from QuickTime tags. This stage ensures that only valid, supported media proceeds to expensive AI processing. The probe task runs with 1 vCPU and 2GB memory, with a 5-minute timeout.

Phase 2: Reduce. The reducer task aggregates results from parallel AI jobs (transcription, visual recognition, content moderation). It segments transcripts, applies confidence thresholds, and builds a compact "spine" representation that feeds into report generation. This is the most compute-intensive stage, running with 2 vCPU and 4GB memory with a 10-minute timeout.

Phase 3: Report. The report task takes the reduced data and generates the final intelligence report using Bedrock/Claude. It produces summaries, extracts insights, and formats the structured output. This stage runs with 1 vCPU and 2GB memory with a 15-minute timeout.
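
To make the three phases concrete, here is a minimal sketch of how they could be expressed as Amazon States Language (ASL) Task states, generated from Python. It is an illustration under stated assumptions, not our actual definition: the cluster ARN and the task definition names (media-probe and so on) are hypothetical, the network configuration that Fargate tasks need is omitted, and the parallel AI jobs that sit between Probe and Reduce are left out. The sizing and timeout values are the ones listed above.

```python
import json

# Sizing and timeouts as stated in the post.
STAGES = [
    {"name": "Probe", "vcpu": 1, "memory_gb": 2, "timeout_minutes": 5},
    {"name": "Reduce", "vcpu": 2, "memory_gb": 4, "timeout_minutes": 10},
    {"name": "Report", "vcpu": 1, "memory_gb": 2, "timeout_minutes": 15},
]


def fargate_size(stage):
    """Convert vCPU/GB into the string values a Fargate task definition uses.

    Fargate expresses CPU in units (1024 per vCPU) and memory in MiB.
    """
    return {
        "cpu": str(stage["vcpu"] * 1024),
        "memory": str(stage["memory_gb"] * 1024),
    }


def build_stage_states(stages, cluster_arn):
    """Return ASL Task states that run each stage as a Fargate task, in order."""
    states = {}
    for index, stage in enumerate(stages):
        state = {
            "Type": "Task",
            # ".sync" makes Step Functions wait for the ECS task to finish.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": cluster_arn,
                # Hypothetical task definition name, for illustration only.
                "TaskDefinition": f"media-{stage['name'].lower()}",
            },
            "TimeoutSeconds": stage["timeout_minutes"] * 60,
        }
        if index + 1 < len(stages):
            state["Next"] = stages[index + 1]["name"]
        else:
            state["End"] = True
        states[stage["name"]] = state
    return states


if __name__ == "__main__":
    # Hypothetical cluster ARN.
    cluster = "arn:aws:ecs:us-east-1:123456789012:cluster/media"
    print(json.dumps(build_stage_states(STAGES, cluster), indent=2))
```

In this sketch the CPU and memory settings live in each task definition rather than in the state machine, which keeps the workflow definition focused on ordering and timeouts.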

Parallel AI Job Execution

Between probe and reduce, the state machine launches parallel jobs using AWS managed services and custom ECS tasks:

Amazon Transcribe for speech-to-text transcription
Amazon Rekognition for visual label detection and content moderation
Custom ECS tasks for thumbnail generation and audio moderation

These jobs run concurrently rather than sequentially, reducing overall processing time. The state machine waits for all parallel branches to complete before proceeding to the reduce stage.
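
The fan-out described above maps naturally onto an ASL Parallel state, which only moves on once every branch has finished. The following is a rough sketch rather than our production definition. The branch names, the split into five branches, and the choice of AWS SDK service integrations are all assumptions made for illustration, and task parameters are omitted. The Transcribe and Rekognition calls shown only start asynchronous jobs, so a real branch would also need to wait or poll for completion.

```python
import json


def single_task_branch(name, resource):
    """Build a one-state branch that runs a single Task."""
    return {
        "StartAt": name,
        "States": {name: {"Type": "Task", "Resource": resource, "End": True}},
    }


def build_parallel_ai_state(next_state="Reduce"):
    """Return a Parallel state that fans out to the AI jobs, then continues."""
    branches = [
        # AWS SDK integrations (assumed); each call only starts an async job.
        single_task_branch(
            "Transcribe",
            "arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob",
        ),
        single_task_branch(
            "DetectLabels",
            "arn:aws:states:::aws-sdk:rekognition:startLabelDetection",
        ),
        single_task_branch(
            "ModerateVideo",
            "arn:aws:states:::aws-sdk:rekognition:startContentModeration",
        ),
        # Custom ECS tasks; ".sync" waits for the container to exit.
        single_task_branch("Thumbnails", "arn:aws:states:::ecs:runTask.sync"),
        single_task_branch("ModerateAudio", "arn:aws:states:::ecs:runTask.sync"),
    ]
    return {
        "Type": "Parallel",
        "Branches": branches,
        # A Parallel state outputs one array entry per branch, in branch order.
        "ResultPath": "$.aiResults",
        "Next": next_state,
    }


if __name__ == "__main__":
    print(json.dumps(build_parallel_ai_state(), indent=2))
```

Because the Parallel state collects branch outputs into a single array, the reduce stage can read all results from one place in the execution input.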

Fault Tolerance and Retry Logic

Each task in the pipeline has its own retry and error-handling configuration:

ECS tasks retry on service exceptions with exponential backoff (5s, 10s, 20s intervals, up to 3 attempts)
AWS service integrations (S3, DynamoDB) retry with similar backoff strategies
Catch-all rules route unhandled failures to a centralized error handler, which notifies users and logs details for ops investigation
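
In ASL terms, the retry and catch behavior listed above corresponds to the Retry and Catch fields on a Task state. Here is a minimal sketch, assuming an initial interval of 5 seconds, a backoff rate of 2, and 3 retry attempts, which reproduces the 5s, 10s, 20s sequence. The error name States.TaskFailed is a stand-in for whichever service exceptions are actually matched, and the HandleError state name is hypothetical.

```python
def retry_intervals(interval_seconds, backoff_rate, max_attempts):
    """Return the wait before each retry under exponential backoff."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


# Retry rule: wait 5s, then 10s, then 20s before giving up.
ECS_RETRY = {
    "ErrorEquals": ["States.TaskFailed"],  # stand-in for the real error names
    "IntervalSeconds": 5,
    "BackoffRate": 2.0,
    "MaxAttempts": 3,
}

# Catch rule: anything still failing goes to the central handler.
CATCH_ALL = {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.error",  # keep the original input, attach error details
    "Next": "HandleError",    # hypothetical state name
}


def with_fault_tolerance(task_state):
    """Return a copy of a Task state with the retry and catch rules attached."""
    guarded = dict(task_state)
    guarded["Retry"] = [dict(ECS_RETRY)]
    guarded["Catch"] = [dict(CATCH_ALL)]
    return guarded


if __name__ == "__main__":
    print(retry_intervals(5, 2.0, 3))  # [5.0, 10.0, 20.0]
```

Setting ResultPath on the catch rule preserves the original job input alongside the error, which gives the error handler enough context to notify the right user.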

The state machine itself has a 4-hour timeout, ensuring that stuck workflows don't run indefinitely. All task timeouts are set well below this limit to allow for retry attempts.
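
A quick worked calculation shows how much headroom this leaves. The sketch below assumes a worst case in which every ECS stage runs to its full timeout on the initial attempt and on each of 3 retries, with the 5s, 10s, and 20s backoff waits in between. Whether timeouts are actually retried depends on the retry rules, so treat this as an upper bound for the three ECS stages only. The parallel AI jobs are not included.

```python
STATE_MACHINE_TIMEOUT_SECONDS = 4 * 60 * 60  # 4-hour execution timeout

# Per-stage timeouts from the post, in seconds.
STAGE_TIMEOUTS_SECONDS = {"probe": 5 * 60, "reduce": 10 * 60, "report": 15 * 60}

# Backoff waits before each retry.
RETRY_WAITS_SECONDS = [5, 10, 20]


def worst_case_stage_seconds(timeout_seconds, retry_waits):
    """Initial attempt plus one attempt per retry, all timing out, plus the waits."""
    attempts = 1 + len(retry_waits)
    return attempts * timeout_seconds + sum(retry_waits)


def worst_case_pipeline_seconds(stage_timeouts, retry_waits):
    """Sum the worst case across sequential stages."""
    return sum(
        worst_case_stage_seconds(timeout, retry_waits)
        for timeout in stage_timeouts.values()
    )


if __name__ == "__main__":
    worst = worst_case_pipeline_seconds(STAGE_TIMEOUTS_SECONDS, RETRY_WAITS_SECONDS)
    print(worst)                                  # 7305
    print(STATE_MACHINE_TIMEOUT_SECONDS - worst)  # 7095
```

Under these assumptions the three ECS stages use at most 7,305 seconds (about 2 hours), leaving roughly 7,095 seconds of the 4-hour budget for the parallel AI jobs.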

Cost Optimization

The serverless architecture enables several cost optimizations:

Pay-per-use pricing: We only pay for ECS tasks while they are actually running, not for idle infrastructure
Right-sized resources: Each task is allocated only the CPU and memory it needs (probe: 1 vCPU/2GB, reducer: 2 vCPU/4GB, report: 1 vCPU/2GB)
Parallel execution: Running AI jobs concurrently reduces wall-clock time, which shortens each execution and can reduce overall compute cost
Automatic scaling: Fargate provisions capacity for each task on demand, so no manual capacity planning is required

We've also implemented Lambda alternatives for short-form media (≤30 minutes) to further optimize costs for smaller jobs. The state machine routes short media to Lambda functions when feature flags are enabled, taking advantage of Lambda's lower cost for short-duration workloads.
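
The routing decision can be sketched as an ASL Choice state plus an equivalent Python predicate. This is a minimal illustration, not our exact definition. The input paths ($.flags.lambdaEnabled and $.probe.durationSeconds) and the target state names are hypothetical. Only the 30-minute threshold and the feature-flag condition come from the description above.

```python
SHORT_FORM_LIMIT_SECONDS = 30 * 60  # short-form media: 30 minutes or less


def choose_runtime(duration_seconds, lambda_enabled):
    """Pick the runtime for a job; mirrors the Choice state below."""
    if lambda_enabled and duration_seconds <= SHORT_FORM_LIMIT_SECONDS:
        return "lambda"
    return "fargate"


def build_routing_choice():
    """Return a Choice state that sends short media to Lambda when enabled."""
    return {
        "Type": "Choice",
        "Choices": [
            {
                "And": [
                    # Hypothetical input paths, for illustration only.
                    {"Variable": "$.flags.lambdaEnabled", "BooleanEquals": True},
                    {
                        "Variable": "$.probe.durationSeconds",
                        "NumericLessThanEquals": SHORT_FORM_LIMIT_SECONDS,
                    },
                ],
                "Next": "RunOnLambda",  # hypothetical state name
            }
        ],
        # Everything else stays on the Fargate path.
        "Default": "RunOnFargate",  # hypothetical state name
    }


if __name__ == "__main__":
    print(choose_runtime(12 * 60, lambda_enabled=True))   # lambda
    print(choose_runtime(45 * 60, lambda_enabled=True))   # fargate
    print(choose_runtime(12 * 60, lambda_enabled=False))  # fargate
```

Making Fargate the default branch means that a missing or disabled flag falls back to the existing path rather than failing the execution.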

Observability and Monitoring

Every task emits structured logs to CloudWatch, including stage timing, success/failure status, and diagnostic information. The state machine itself logs execution history, making it easy to trace exactly what happened for any given media analysis job. This observability is critical for debugging issues and optimizing performance.
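
As an illustration of what such a structured log line could look like, here is a small sketch of a stage timer that writes one JSON record per stage. The field names (stage, job_id, status, duration_ms) are assumptions for this example, not our actual schema. It also assumes the container's standard output is shipped to CloudWatch, for instance through the awslogs log driver.

```python
import json
import sys
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(stage, job_id, stream=sys.stdout):
    """Time a pipeline stage and emit one JSON log line when it ends."""
    started = time.monotonic()
    record = {"stage": stage, "job_id": job_id, "status": "success"}
    try:
        # The caller can attach extra diagnostic fields to the record.
        yield record
    except Exception as exc:
        record["status"] = "failure"
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        record["duration_ms"] = round((time.monotonic() - started) * 1000)
        stream.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    with stage_timer("probe", "job-123") as log:
        log["container_format"] = "mov"  # example diagnostic field
```

Emitting one JSON object per line keeps the records easy to filter by stage or job ID when querying logs.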