YouTube / Video Streaming
VERY COMMON: Video streaming is asked at every FAANG company. It is how YouTube delivers 1 billion hours of video per day to 2 billion monthly users with sub-2-second playback start times. You will design a DAG-based transcoding pipeline that parallelizes FFmpeg workers across 8 resolutions, a CDN layer with a 95%+ cache hit ratio serving 46K views per second, and adaptive bitrate streaming that reduces buffering by 90%.
- Design a DAG-based transcoding pipeline with parallel chunk encoding
- Architect CDN delivery for 46K views/sec with 95%+ cache hit ratio
- Implement adaptive bitrate streaming with HLS/DASH protocols
Visual Solutions
Step-by-step animated walkthroughs with capacity estimation, API design, database schema, and failure modes built in.
Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for Video Streaming. Everything you need at a glance.
Anti-Patterns
Common design mistakes candidates make. Wrong approaches vs correct approaches for each trap.
Failure Modes
What breaks in production, how to detect it, and how to fix it. Detection metrics, mitigations, and severity ratings.
Start simple. Build to staff-level.
“Video streaming for 800M DAU at 46K views/sec. Upload: resumable chunked upload to S3, then DAG-based transcoding splits into 2-second chunks fanning out to parallel FFmpeg workers across 8 resolutions. A 10-min 1080p video transcodes in under 5 minutes. Viewing: HLS adaptive bitrate from CDN edge POPs with 95%+ cache hit ratio. Metadata in MySQL sharded by video_id. Thumbnails in Bigtable. View counts via Kafka, batch-updated every 30 seconds. Storage grows at 25 GB/sec.”
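The headline numbers in this pitch can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming 5 views per DAU per day (an assumption, not stated in the pitch):

```python
# Back-of-envelope checks on the capacity figures quoted above.
DAU = 800_000_000
VIEWS_PER_USER_PER_DAY = 5          # assumption for illustration
SECONDS_PER_DAY = 86_400

# 800M DAU x 5 views / 86,400 s -> about 46K views/sec, matching the pitch
views_per_sec = DAU * VIEWS_PER_USER_PER_DAY / SECONDS_PER_DAY

# 25 GB/sec of new video -> petabytes of storage growth per day
ingest_gb_per_sec = 25              # figure from the design summary
ingest_pb_per_day = ingest_gb_per_sec * SECONDS_PER_DAY / 1_000_000
# 25 * 86,400 / 1,000,000 = 2.16 PB of new video per day
```

Working the DAU figure backward like this is a quick way to show an interviewer where the 46K views/sec number comes from.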
Resumable Upload
EASY: We chose chunked uploads (not single-POST) because large files need fault tolerance. Split into 5-10 MB chunks, track the last successful byte offset on the server. On interruption, resume from where we left off. Maximum wasted transfer per drop: 10 MB instead of the full file.
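The resume protocol can be sketched with an in-memory model. Class and function names here are illustrative, not a real upload API; the server commits chunks in order and reports its committed offset so a client can resume after a drop:

```python
import io

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, within the 5-10 MB range above

class UploadSession:
    """Server side: track the last successfully persisted byte offset."""
    def __init__(self):
        self.buf = io.BytesIO()
        self.offset = 0  # last committed byte

    def put_chunk(self, offset, data):
        if offset != self.offset:   # reject stale/out-of-order chunks
            return self.offset      # tell the client where to resume
        self.buf.write(data)
        self.offset += len(data)
        return self.offset

def upload(session, blob, start=0):
    """Client side: push chunks from `start`. After a crash, call again
    with start=session.offset; at most one chunk is re-sent."""
    pos = start
    while pos < len(blob):
        pos = session.put_chunk(pos, blob[pos:pos + CHUNK_SIZE])
    return pos
```

In production the offset lives in a session store and chunks land in S3 multipart parts, but the resume contract is the same.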
Core Feature Design
Thumbnail Generation
EASY: We chose Bigtable (not S3) for thumbnails because we need sub-10ms random reads at scale. Extract 5 candidate frames at 20/40/50/60/80% of video duration using FFmpeg. Store as 5 KB JPEGs. 1 billion videos × 5 thumbnails × 5 KB = 25 TB total.
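Frame extraction at fixed percentages of the duration maps directly onto FFmpeg invocations. A sketch that only builds the command lines (`-ss`, `-frames:v`, and `-q:v` are standard FFmpeg options; the scale filter and quality value are assumptions):

```python
def thumbnail_commands(video_path, duration_sec, out_prefix="thumb"):
    """Build one ffmpeg command per candidate frame at 20/40/50/60/80%
    of the video's duration. Placing -ss before -i seeks by keyframe,
    which is fast enough for thumbnail candidates."""
    cmds = []
    for i, pct in enumerate((0.2, 0.4, 0.5, 0.6, 0.8)):
        t = duration_sec * pct
        cmds.append([
            "ffmpeg", "-ss", f"{t:.2f}", "-i", video_path,
            "-frames:v", "1",            # grab a single frame
            "-vf", "scale=320:-1",       # small JPEG (~5 KB target)
            "-q:v", "5", f"{out_prefix}_{i}.jpg",
        ])
    return cmds
```

Each command is independent, so the five extractions can run in parallel per video.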
Core Feature Design
Adaptive Bitrate Streaming
STANDARD: We chose ABR with HLS/DASH (not fixed-quality delivery) so the player adapts to each viewer's bandwidth. The player measures bandwidth every few seconds and picks the highest rendition that fits. Quality switches happen at segment boundaries (every 2-10 seconds) without rebuffering.
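The rendition-selection logic is a simple ladder walk. The bitrate ladder and the 0.8 safety margin below are illustrative assumptions, not values from the text:

```python
# Renditions as (name, bitrate_kbps), sorted high -> low; illustrative ladder.
LADDER = [("1080p", 5000), ("720p", 2800), ("480p", 1400),
          ("360p", 800), ("240p", 400)]

def pick_rendition(measured_kbps, safety=0.8):
    """Pick the highest rendition whose bitrate fits within a safety
    margin of measured bandwidth; fall back to the lowest rung."""
    budget = measured_kbps * safety
    for name, kbps in LADDER:
        if kbps <= budget:
            return name
    return LADDER[-1][0]   # never stall: serve the lowest rung
```

Real players also smooth the bandwidth estimate and watch buffer occupancy, but the core decision is this greedy fit at each segment boundary.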
Core Feature Design
CDN Edge Caching
STANDARD: We chose a tiered CDN strategy (not uniform caching) because video popularity follows a power law. Cache segments at 100+ edge POPs. Achieve a 95%+ cache hit ratio by combining origin pull for long-tail content with push pre-warming for viral videos. Reduces origin bandwidth from 115 TB/sec to under 6 TB/sec.
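The origin-bandwidth reduction follows directly from the hit ratio: only misses travel back to origin. A one-line check of the figures above:

```python
def origin_bandwidth_tb_s(edge_egress_tb_s, hit_ratio):
    """Only the cache-miss fraction of edge egress reaches the origin."""
    return edge_egress_tb_s * (1 - hit_ratio)

# Figures from the text: 115 TB/sec at the edge, 95% hit ratio.
origin = origin_bandwidth_tb_s(115, 0.95)   # 5.75 TB/sec, "under 6 TB/sec"
```

This is also why every point of hit ratio matters: at this scale, each 1% of misses is over 1 TB/sec of origin egress.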
High Level Design
Video Chunking and GOP Alignment
STANDARD: We chose GOP-aligned segments (not arbitrary time splits) because each segment must start with an I-frame to be independently decodable. Split video into 2-10 second segments aligned to Group of Pictures boundaries. This enables both ABR quality switching and parallel transcoding.
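In practice, segments get aligned by forcing a fixed GOP length at encode time. A sketch that builds such an FFmpeg command (`-g`, `-keyint_min`, and `-sc_threshold` are real x264-oriented options; 30 fps and 2-second segments are assumptions):

```python
def segment_command(src, fps=30, seg_sec=2):
    """Force an I-frame exactly every seg_sec so each HLS segment starts
    on a GOP boundary and is independently decodable."""
    gop = fps * seg_sec   # frames per GOP
    return [
        "ffmpeg", "-i", src,
        "-g", str(gop), "-keyint_min", str(gop),   # fixed GOP length
        "-sc_threshold", "0",    # disable scene-cut keyframes (x264)
        "-f", "hls", "-hls_time", str(seg_sec), "out.m3u8",
    ]
```

Without `-sc_threshold 0`, scene-change keyframes would break the fixed cadence and renditions would no longer switch cleanly at segment boundaries.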
Core Feature Design
Video Metadata Store
STANDARD: We chose MySQL sharded by video_id (not user_id) because the hot path is metadata lookup by video_id. Store 10 KB of metadata per video. Cache hot videos in Redis with a 1-hour TTL. At 46K reads/sec, a 95% cache hit ratio keeps MySQL at 2,300 QPS.
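The read path is the classic cache-aside pattern. A sketch with dicts standing in for Redis and MySQL (class name and hit/miss counters are illustrative):

```python
import time

TTL_SEC = 3600  # the 1-hour TTL from the design

class MetadataCache:
    """Cache-aside: read the cache first, fall back to the database on a
    miss, then populate the cache with a TTL."""
    def __init__(self, db):
        self.db = db            # video_id -> metadata row (~10 KB)
        self.cache = {}         # video_id -> (expires_at, row)
        self.hits = self.misses = 0

    def get(self, video_id, now=None):
        now = time.time() if now is None else now
        entry = self.cache.get(video_id)
        if entry and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        row = self.db[video_id]                  # the ~5% that reach MySQL
        self.cache[video_id] = (now + TTL_SEC, row)
        return row
```

The TTL bounds staleness to one hour; at 95% hits, only 46K × 0.05 = 2,300 reads/sec reach the sharded MySQL tier.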
Database Design
Video Transcoding Pipeline
TRICKY: We chose chunk-based parallel encoding (not single-file sequential) because a 4-hour single-machine job becomes a 5-minute distributed job. A DAG-based pipeline splits video into GOP-aligned chunks, fans them out to parallel FFmpeg workers via a message queue, and encodes each chunk into 8 resolutions.
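The fan-out/fan-in shape of the pipeline can be sketched with a thread pool standing in for the message queue and worker fleet (the `encode` stub replaces a real FFmpeg invocation):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

RESOLUTIONS = ["2160p", "1440p", "1080p", "720p",
               "480p", "360p", "240p", "144p"]

def encode(task):
    """Stand-in for one FFmpeg worker: encode one GOP-aligned chunk
    into one resolution and return the output segment name."""
    chunk_id, res = task
    return (chunk_id, res, f"chunk{chunk_id}_{res}.ts")

def transcode(num_chunks, workers=8):
    # Fan-out: one task per (chunk, resolution) pair.
    tasks = list(product(range(num_chunks), RESOLUTIONS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(encode, tasks))
    # Fan-in: regroup encoded segments per resolution, in chunk order.
    out = {res: [] for res in RESOLUTIONS}
    for chunk_id, res, name in sorted(results):
        out[res].append(name)
    return out
```

The DAG here is trivially two levels (split, then encode, then stitch); real pipelines add nodes for audio, thumbnails, and manifest generation, but the chunk-level parallelism is what turns hours into minutes.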
Core Feature Design
Video Deduplication
TRICKY: We chose perceptual hashing (not file-hash dedup) because file hashes miss re-encoded copies. Extract keyframes, compute a visual fingerprint, and compare against a fingerprint database. Perceptual hashing catches videos with different codecs, resolutions, or minor edits that are visually identical.
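A minimal perceptual fingerprint is an average hash: downscale a keyframe to an 8x8 grayscale grid, set one bit per pixel above the frame's mean, and compare fingerprints by Hamming distance. A sketch (the 8x8 size and distance threshold are conventional choices, not from the text):

```python
def ahash(pixels):
    """Average hash over an 8x8 grayscale keyframe (8 rows of 8 ints):
    each bit is 1 if the pixel is brighter than the frame's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two 64-bit fingerprints."""
    return bin(a ^ b).count("1")

def looks_same(a, b, threshold=5):
    """Re-encoding shifts pixel values slightly, but the bit pattern
    survives; a small Hamming distance flags a visual duplicate."""
    return hamming(a, b) <= threshold
```

Because the hash keys on relative brightness, a re-encoded or slightly brightened copy produces a near-identical fingerprint while an exact file hash would differ completely.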
Fault Tolerance