Building a Serverless Media Pipeline with AWS Step Functions

Echosaw's media analysis pipeline is built on AWS Step Functions with ECS Fargate tasks. Learn how we orchestrate the probe, reduce, and report workflow at scale.

Echosaw Team · May 28, 2026 · 9 min read

Processing media at scale requires orchestration that can handle variable workloads, recover from failures, and optimize costs. Echosaw's media analysis pipeline is built on AWS Step Functions with a serverless architecture that balances scalability, fault tolerance, and cost efficiency. This post explains how we orchestrate the probe → reduce → report workflow using ECS Fargate tasks.

The State Machine Architecture

Our state machine orchestrates media processing through a series of stages, with the core stages implemented as ECS Fargate tasks. It manages the entire lifecycle from upload to delivery, with built-in retry logic, error handling, and parallel execution where possible.

The core pipeline consists of three main phases:

Phase 1: Probe. The probe task validates media, extracts metadata using FFmpeg, enforces tier-based duration limits, and extracts location data from QuickTime tags. This stage ensures that only valid, supported media proceeds to expensive AI processing. The probe task runs with 1 vCPU and 2 GB of memory, with a 5-minute timeout.

Phase 2: Reduce. The reducer task aggregates results from parallel AI jobs (transcription, visual recognition, content moderation). It segments transcripts, applies confidence thresholds, and builds a compact "spine" representation that feeds into report generation. This is the most compute-intensive stage, running with 2 vCPU and 4 GB of memory, with a 10-minute timeout.

Phase 3: Report. The report task takes the reduced data and generates the final intelligence report using Bedrock/Claude. It produces summaries, extracts insights, and formats the structured output. This stage runs with 1 vCPU and 2 GB of memory, with a 15-minute timeout.
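To make the three phases concrete, here is a minimal sketch of how they might be written as Amazon States Language Task states, built as Python dictionaries. It is an illustration rather than our actual definition: the cluster name, task definition names, and the `AnalyzeInParallel` state name are hypothetical, and the network configuration a Fargate task needs is omitted. The sizes and timeouts are the figures quoted above.

```python
import json

# Sizes and timeouts are the figures quoted above. Fargate task definitions
# take CPU in units (1024 = 1 vCPU) and memory in MiB, both as strings.
STAGES = {
    "Probe":  {"cpu": "1024", "memory": "2048", "timeout_seconds": 5 * 60},
    "Reduce": {"cpu": "2048", "memory": "4096", "timeout_seconds": 10 * 60},
    "Report": {"cpu": "1024", "memory": "2048", "timeout_seconds": 15 * 60},
}


def ecs_task_state(stage, next_state=None):
    """Build a Task state that runs one Fargate task and waits for it."""
    state = {
        "Type": "Task",
        # The ".sync" suffix makes Step Functions wait until the ECS task stops.
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "TimeoutSeconds": STAGES[stage]["timeout_seconds"],
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": "media-pipeline",                 # hypothetical name
            "TaskDefinition": f"media-{stage.lower()}",  # hypothetical name
        },
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def task_definition_sizing(stage):
    """CPU and memory fields as they would appear in an ECS task definition."""
    return {
        "cpu": STAGES[stage]["cpu"],
        "memory": STAGES[stage]["memory"],
        "requiresCompatibilities": ["FARGATE"],
    }


states = {
    # "AnalyzeInParallel" is a hypothetical name for the parallel AI stage.
    "Probe": ecs_task_state("Probe", next_state="AnalyzeInParallel"),
    "Reduce": ecs_task_state("Reduce", next_state="Report"),
    "Report": ecs_task_state("Report"),
}
print(json.dumps(states["Probe"], indent=2))
```

Keeping the sizing in one table and generating each state from it means a resource or timeout change is a one-line edit.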

Parallel AI Job Execution

Between probe and reduce, the state machine launches parallel AI jobs using AWS managed services:

  • Amazon Transcribe for speech-to-text transcription
  • Amazon Rekognition for visual label detection and content moderation
  • Custom ECS tasks for thumbnail generation and audio moderation
These jobs run concurrently rather than sequentially, reducing overall processing time. The state machine waits for all parallel branches to complete before proceeding to the reduce stage.
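The fan-out and join described above map naturally onto a Step Functions Parallel state. The sketch below shows only the shape: the branch names and the `$.aiResults` path are hypothetical, and each branch is a Pass placeholder. A real branch would start the Transcribe, Rekognition, or ECS job and wait for it to finish.

```python
import json

# Hypothetical labels for the jobs listed above.
AI_JOBS = [
    "Transcription",
    "VisualLabels",
    "ContentModeration",
    "Thumbnails",
    "AudioModeration",
]


def placeholder_branch(job):
    """One Parallel branch; a Pass state stands in for the real job states."""
    return {
        "StartAt": job,
        "States": {job: {"Type": "Pass", "Result": {"job": job}, "End": True}},
    }


def parallel_ai_jobs(next_state="Reduce"):
    """Parallel state that runs every branch and joins before next_state."""
    return {
        "Type": "Parallel",
        "Branches": [placeholder_branch(job) for job in AI_JOBS],
        # A Parallel state outputs a list of branch outputs, in branch order.
        "ResultPath": "$.aiResults",  # hypothetical path
        "Next": next_state,
    }


state = parallel_ai_jobs()
print(json.dumps(state, indent=2))
```

A Parallel state does not move on until every branch has finished, and by default an unhandled error in one branch fails the whole state, so the reducer only ever sees a complete set of results.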

Fault Tolerance and Retry Logic

Each task in the pipeline includes comprehensive retry configuration:

  • ECS tasks retry on service exceptions with exponential backoff (5s, 10s, 20s intervals, up to 3 attempts)
  • AWS service integrations (S3, DynamoDB) retry with similar backoff strategies
  • Catch-all error handlers route failures to a centralized error handler that notifies users and logs details for ops investigation
The state machine itself has a 4-hour timeout, ensuring that stuck workflows don't run indefinitely. All task timeouts are set well below this limit to allow for retry attempts.
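As an illustration, the retry schedule and timeout budget can be expressed with the standard Retry and Catch fields of Amazon States Language. This is a sketch under assumptions: `States.TaskFailed` stands in for the specific service exceptions (the exact error names are not listed here), and `HandleError` and `$.error` are hypothetical names. The worst-case arithmetic covers only the three ECS stages and ignores the parallel AI jobs.

```python
def retry_delays(interval_seconds, backoff_rate, max_attempts):
    """Delay before each retry: interval * backoff_rate ** retry_index."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]


RETRY = [{
    "ErrorEquals": ["States.TaskFailed"],  # assumption: stand-in error name
    "IntervalSeconds": 5,
    "BackoffRate": 2.0,
    "MaxAttempts": 3,  # number of retries after the first attempt
}]

CATCH = [{
    "ErrorEquals": ["States.ALL"],  # catch-all for anything not retried away
    "ResultPath": "$.error",        # hypothetical path
    "Next": "HandleError",          # hypothetical centralized handler state
}]

STATE_MACHINE_TIMEOUT_SECONDS = 4 * 60 * 60  # the 4-hour limit


def with_fault_tolerance(task_state):
    """Return a copy of a Task state with the shared Retry and Catch rules."""
    return {**task_state, "Retry": RETRY, "Catch": CATCH}


delays = retry_delays(5, 2.0, 3)
print(delays)  # [5.0, 10.0, 20.0]

# Worst case for the three ECS stages: the first attempt plus three retries,
# each running for nearly its full timeout before failing.
stage_timeouts = [5 * 60, 10 * 60, 15 * 60]
attempts = 1 + RETRY[0]["MaxAttempts"]
worst_case = sum(t * attempts + sum(delays) for t in stage_timeouts)
print(worst_case, worst_case < STATE_MACHINE_TIMEOUT_SECONDS)  # 7305.0 True
```

Under these assumptions the ECS stages need at most about two hours even if every attempt fails slowly, which leaves the rest of the 4-hour budget for the parallel AI jobs.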

Cost Optimization

Serverless architecture enables significant cost optimizations:

  • Pay-per-use pricing — We only pay for ECS tasks when they're actually running, not for idle infrastructure
  • Right-sized resources — Each task uses precisely the CPU and memory it needs (probe: 1vCPU/2GB, reducer: 2vCPU/4GB, report: 1vCPU/2GB)
  • Parallel execution — Running AI jobs concurrently reduces wall-clock time, which reduces overall compute cost
  • Automatic scaling — ECS Fargate scales automatically based on workload, no manual capacity planning required
We've also implemented Lambda alternatives for short-form media (≤30 minutes) to further optimize costs for smaller jobs. The state machine routes short media to Lambda functions when feature flags are enabled, taking advantage of Lambda's lower cost for short-duration workloads.
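The routing rule can be sketched as a Choice state plus an equivalent Python function. The input paths (`$.flags.lambdaShortForm`, `$.probe.durationSeconds`) and target state names are hypothetical; only the 30-minute threshold and the feature-flag condition come from the description above.

```python
SHORT_FORM_LIMIT_SECONDS = 30 * 60  # the 30-minute short-form threshold


def choose_runtime(duration_seconds, lambda_enabled):
    """Pick Lambda only when the flag is on and the media is short-form."""
    if lambda_enabled and duration_seconds <= SHORT_FORM_LIMIT_SECONDS:
        return "lambda"
    return "fargate"


# The same rule as a Choice state. Paths and state names are hypothetical.
ROUTE_BY_DURATION = {
    "Type": "Choice",
    "Choices": [{
        "And": [
            {"Variable": "$.flags.lambdaShortForm", "BooleanEquals": True},
            {
                "Variable": "$.probe.durationSeconds",
                "NumericLessThanEquals": SHORT_FORM_LIMIT_SECONDS,
            },
        ],
        "Next": "RunOnLambda",
    }],
    # Anything long-form, or any job with the flag off, stays on Fargate.
    "Default": "RunOnFargate",
}

print(choose_runtime(12 * 60, lambda_enabled=True))   # lambda
print(choose_runtime(45 * 60, lambda_enabled=True))   # fargate
print(choose_runtime(12 * 60, lambda_enabled=False))  # fargate
```

Making Fargate the default branch means a missing or disabled flag falls back to the path that handles media of any length.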

Observability and Monitoring

Every task emits structured logs to CloudWatch, including stage timing, success/failure status, and diagnostic information. The state machine itself logs execution history, making it easy to trace exactly what happened for any given media analysis job. This observability is critical for debugging issues and optimizing performance.
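A minimal sketch of what per-stage structured logging could look like inside a task, assuming one JSON object per line. The field names (`stage`, `job_id`, `status`, `duration_ms`) are illustrative, not our actual schema. On ECS, lines written to stdout reach CloudWatch Logs when the task uses the `awslogs` log driver; the example writes to an in-memory buffer so it can be run anywhere.

```python
import io
import json
import sys
import time
from contextlib import contextmanager


@contextmanager
def stage_log(stage, job_id, stream=None):
    """Emit one JSON log line with status and timing when the stage ends."""
    out = stream if stream is not None else sys.stdout
    record = {"stage": stage, "job_id": job_id, "status": "success"}
    start = time.monotonic()
    try:
        yield record  # callers may attach extra diagnostic fields
    except Exception as exc:
        record["status"] = "failure"
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise  # the failure still propagates to the caller
    finally:
        record["duration_ms"] = round((time.monotonic() - start) * 1000)
        out.write(json.dumps(record) + "\n")


buffer = io.StringIO()

with stage_log("probe", "job-123", stream=buffer) as log:
    log["container_format"] = "mov"  # example diagnostic field

try:
    with stage_log("reduce", "job-123", stream=buffer):
        raise ValueError("no transcript segments")
except ValueError:
    pass

lines = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(lines[0]["status"], lines[1]["status"])  # success failure
```

Because each line is a self-contained JSON object, queries can filter on fields such as stage or status instead of parsing free text.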

The probe → reduce → report architecture, orchestrated by Step Functions and implemented with ECS Fargate, provides a robust foundation for media processing at scale. It handles variable workloads gracefully, recovers automatically from transient failures, and optimizes costs through serverless efficiency. This architecture has proven reliable in production, processing millions of minutes of media with consistent performance and predictable costs.
