Building a Serverless Media Pipeline with AWS Step Functions
Echosaw's media analysis pipeline is built on AWS Step Functions with ECS Fargate tasks. Learn how we orchestrate the probe, reduce, and report workflow at scale.
Processing media at scale requires orchestration that can handle variable workloads, recover from failures, and optimize costs. Echosaw's media analysis pipeline is built on AWS Step Functions with a serverless architecture that balances scalability, fault tolerance, and cost efficiency. This post explains how we orchestrate the probe → reduce → report workflow using ECS Fargate tasks.
The State Machine Architecture
Our state machine orchestrates media processing through a series of stages, each implemented as an ECS Fargate task. It manages the entire lifecycle from upload to delivery, with built-in retry logic, error handling, and parallel execution where possible.
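To make the shape of such a state machine concrete, here is a minimal sketch that chains the three stages into an Amazon States Language definition. It is not our production definition: the cluster ARN, the task definition naming scheme, and the helper names are hypothetical, the network configuration a Fargate task needs is omitted, and the parallel AI jobs that run between probe and reduce are left out for brevity. The timeouts are the per-phase values quoted in this post.

```python
# Stage names and the timeouts quoted in this post (in seconds).
STAGES = [
    ("Probe", 5 * 60),
    ("Reduce", 10 * 60),
    ("Report", 15 * 60),
]


def build_definition(cluster_arn: str, task_def_prefix: str) -> dict:
    """Chain the stages into an Amazon States Language definition.

    cluster_arn and task_def_prefix are hypothetical inputs used only
    for illustration.
    """
    states = {}
    for index, (name, timeout_seconds) in enumerate(STAGES):
        state = {
            "Type": "Task",
            # The ".sync" suffix makes Step Functions wait for the ECS task to stop.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": cluster_arn,
                "TaskDefinition": f"{task_def_prefix}-{name.lower()}",
            },
            "TimeoutSeconds": timeout_seconds,
        }
        if index + 1 < len(STAGES):
            state["Next"] = STAGES[index + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": STAGES[0][0], "States": states}


def execution_order(definition: dict) -> list:
    """Follow Next pointers from StartAt and return the visited state names."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order
```

Serializing the returned dictionary with `json.dumps` yields a document that could be passed to Step Functions as a state machine definition once real ARNs and networking details are filled in.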
The core pipeline consists of three main phases:
Phase 1: Probe. The probe task validates media, extracts metadata using FFmpeg, enforces tier-based duration limits, and extracts location data from QuickTime tags. This stage ensures that only valid, supported media proceeds to expensive AI processing. The probe task runs with 1 vCPU and 2GB memory, with a 5-minute timeout.
Phase 2: Reduce. The reducer task aggregates results from parallel AI jobs (transcription, visual recognition, content moderation). It segments transcripts, applies confidence thresholds, and builds a compact "spine" representation that feeds into report generation. This is the most compute-intensive stage, running with 2 vCPU and 4GB memory with a 10-minute timeout.
Phase 3: Report. The report task takes the reduced data and generates the final intelligence report using Bedrock/Claude. It produces summaries, extracts insights, and formats the structured output. This stage runs with 1 vCPU and 2GB memory with a 15-minute timeout.
Parallel AI Job Execution
Between probe and reduce, the state machine launches parallel AI jobs using AWS managed services:
- Amazon Transcribe for speech-to-text transcription
- Amazon Rekognition for visual label detection and content moderation
- Custom ECS tasks for thumbnail generation and audio moderation
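A fan-out like this maps naturally onto a Step Functions Parallel state, which runs every branch concurrently and moves on only when all of them have finished. The sketch below is one way to express it, under several assumptions: the branch names, the input fields (`$.jobId`, `$.bucket`, `$.key`), and the task definition names are hypothetical, and the choice of AWS SDK service integrations for Transcribe and Rekognition is illustrative rather than a description of our exact wiring. Those API calls only start asynchronous jobs, so a real pipeline would also need to wait for their completion, which is omitted here.

```python
def sdk_branch(name: str, service: str, action: str, parameters: dict) -> dict:
    """One branch that calls an AWS API through an AWS SDK service integration."""
    return {
        "StartAt": name,
        "States": {
            name: {
                "Type": "Task",
                "Resource": f"arn:aws:states:::aws-sdk:{service}:{action}",
                "Parameters": parameters,
                "End": True,
            }
        },
    }


def ecs_branch(name: str, task_definition: str) -> dict:
    """One branch that runs a custom ECS Fargate task and waits for it."""
    return {
        "StartAt": name,
        "States": {
            name: {
                "Type": "Task",
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "TaskDefinition": task_definition,  # hypothetical name
                },
                "End": True,
            }
        },
    }


def parallel_ai_jobs(next_state: str) -> dict:
    """Build a Parallel state that fans out to the AI jobs listed above."""
    # Hypothetical input shape: {"jobId": ..., "bucket": ..., "key": ...}
    video = {"Video": {"S3Object": {"Bucket.$": "$.bucket", "Name.$": "$.key"}}}
    transcribe_params = {
        "TranscriptionJobName.$": "$.jobId",
        "IdentifyLanguage": True,
        "Media": {
            "MediaFileUri.$": "States.Format('s3://{}/{}', $.bucket, $.key)"
        },
    }
    return {
        "Type": "Parallel",
        "Branches": [
            sdk_branch("Transcribe", "transcribe", "startTranscriptionJob",
                       transcribe_params),
            sdk_branch("DetectLabels", "rekognition", "startLabelDetection", video),
            sdk_branch("ModerateVideo", "rekognition", "startContentModeration",
                       video),
            ecs_branch("GenerateThumbnails", "media-thumbnails"),
            ecs_branch("ModerateAudio", "media-audio-moderation"),
        ],
        # Keep the original input and attach the branch outputs next to it.
        "ResultPath": "$.aiJobs",
        "Next": next_state,
    }
```

Because each branch is an independent sub-state-machine, a slow transcription job does not block thumbnail generation; the reducer simply starts once the slowest branch is done.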
Fault Tolerance and Retry Logic
Each task in the pipeline includes comprehensive retry configuration:
- ECS tasks retry on service exceptions with exponential backoff (5s, 10s, 20s intervals, up to 3 attempts)
- AWS service integrations (S3, DynamoDB) retry with similar backoff strategies
- Catch-all clauses route any remaining failure to a centralized error handler that notifies users and logs details for ops investigation
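In Amazon States Language, this kind of policy is expressed with `Retry` and `Catch` fields on a state. The following sketch shows a retrier that reproduces the 5s, 10s, 20s schedule and a catch-all that routes to an error-handling state. The error name matched by the retrier, the `HandleError` state name, and the `$.error` result path are placeholders chosen for illustration, not our actual configuration.

```python
import copy

# Retrier matching the schedule above: first wait 5s, doubling each time,
# for at most 3 retries. "States.TaskFailed" is a placeholder; the exact
# error names to match are an assumption.
RETRY_ON_SERVICE_ERRORS = {
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 5,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}

# Catch-all: any error left after retries goes to a hypothetical
# "HandleError" state, with the error details stored under $.error.
CATCH_ALL = {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.error",
    "Next": "HandleError",
}


def retry_delays(retrier: dict) -> list:
    """Return the wait in seconds before each retry attempt."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


def with_error_handling(state: dict) -> dict:
    """Return a copy of a Task state with the retry and catch policy attached."""
    handled = copy.deepcopy(state)
    handled["Retry"] = [copy.deepcopy(RETRY_ON_SERVICE_ERRORS)]
    handled["Catch"] = [copy.deepcopy(CATCH_ALL)]
    return handled
```

Calling `retry_delays(RETRY_ON_SERVICE_ERRORS)` gives the three waits of 5, 10, and 20 seconds. Only after the last retry fails does the `Catch` clause take over.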
Cost Optimization
Serverless architecture enables significant cost optimizations:
- Pay-per-use pricing: We only pay for ECS tasks while they are actually running, not for idle infrastructure
- Right-sized resources: Each task is allocated only the CPU and memory it needs (probe: 1 vCPU/2GB, reducer: 2 vCPU/4GB, report: 1 vCPU/2GB)
- Parallel execution: Running AI jobs concurrently shortens wall-clock time per job, which helps keep overall compute cost down
- Automatic scaling: ECS Fargate scales automatically with the workload, so no manual capacity planning is required
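The sizing figures above translate directly into Fargate task-definition values, where 1 vCPU is 1024 CPU units and memory is given in MiB. Since Fargate bills by vCPU-seconds and GB-seconds, the same figures also give an upper bound on billable usage per job if every stage ran right up to its timeout. The sketch below works through that arithmetic; the class and function names are hypothetical, and no prices are included because this post does not quote any.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSize:
    """Resources and timeout for one pipeline stage, as quoted in this post."""
    vcpu: int
    memory_gb: int
    timeout_minutes: int

    @property
    def cpu_units(self) -> int:
        # Fargate task definitions express CPU in units of 1/1024 vCPU.
        return self.vcpu * 1024

    @property
    def memory_mib(self) -> int:
        return self.memory_gb * 1024

    @property
    def max_vcpu_seconds(self) -> int:
        # Upper bound: the task runs for its entire timeout.
        return self.vcpu * self.timeout_minutes * 60

    @property
    def max_gb_seconds(self) -> int:
        return self.memory_gb * self.timeout_minutes * 60


SIZES = {
    "probe": TaskSize(vcpu=1, memory_gb=2, timeout_minutes=5),
    "reducer": TaskSize(vcpu=2, memory_gb=4, timeout_minutes=10),
    "report": TaskSize(vcpu=1, memory_gb=2, timeout_minutes=15),
}


def worst_case_usage(sizes: dict) -> tuple:
    """Return (vCPU-seconds, GB-seconds) if every stage hits its timeout."""
    vcpu_seconds = sum(size.max_vcpu_seconds for size in sizes.values())
    gb_seconds = sum(size.max_gb_seconds for size in sizes.values())
    return vcpu_seconds, gb_seconds
```

With the three stages above, the worst case is 300 + 1200 + 900 = 2400 vCPU-seconds and 600 + 2400 + 1800 = 4800 GB-seconds per job. Typical runs finish well before their timeouts, so real usage would be lower than this bound.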
Observability and Monitoring
Every task emits structured logs to CloudWatch, including stage timing, success/failure status, and diagnostic information. The state machine itself logs execution history, making it easy to trace exactly what happened for any given media analysis job. This observability is critical for debugging issues and optimizing performance.
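As an illustration of what one such structured log record might look like, here is a minimal sketch of a helper that times a stage and writes a single JSON line when it finishes. The field names (`job_id`, `stage`, `status`, `duration_ms`, `error`) and the helper itself are hypothetical, and it assumes the container's standard output is shipped to CloudWatch Logs by the task's log driver.

```python
import json
import sys
import time
from contextlib import contextmanager


@contextmanager
def stage_log(job_id: str, stage: str, stream=sys.stdout):
    """Time a pipeline stage and emit one JSON log line when it ends.

    The yielded dict can be extended with extra diagnostic fields
    before the stage finishes.
    """
    started = time.monotonic()
    record = {"job_id": job_id, "stage": stage, "status": "succeeded"}
    try:
        yield record
    except Exception as exc:
        record["status"] = "failed"
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise  # let the orchestrator see the failure as well
    finally:
        record["duration_ms"] = round((time.monotonic() - started) * 1000)
        stream.write(json.dumps(record) + "\n")
```

A task would wrap its main work in `with stage_log(job_id, "probe") as record:` and add fields such as the detected media duration to `record`. One JSON object per line keeps the logs easy to filter by job or stage.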
The probe → reduce → report architecture, orchestrated by Step Functions and implemented with ECS Fargate, provides a robust foundation for media processing at scale. It handles variable workloads gracefully, recovers automatically from transient failures, and optimizes costs through serverless efficiency. This architecture has proven reliable in production, processing millions of minutes of media with consistent performance and predictable costs.