Video Streaming Anti-Patterns
Common design mistakes candidates make. Learn what goes wrong and how to avoid each trap in your interview.
Transcoding Entire Video as One File
We chose chunk-based parallel encoding (not monolithic single-file transcoding) because encoding a full video as one job wastes hours on one worker and forces a complete restart if the job fails at any point. Trade-off: chunked encoding requires GOP-aligned splits and a merge step, but the parallelism and partial-failure recovery make it essential at scale.
Why: Candidates think of transcoding like converting a file format: input one file, output one file. They do not realize that a 60-minute 4K video takes 4+ hours to encode on a single machine. Without chunking, there is no parallelism and no partial recovery. The mental model is 'transcode = one function call,' not 'transcode = distributed pipeline.'
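The distributed-pipeline mental model can be sketched as follows. This is a minimal illustration, not a real encoder: `encode_chunk` stands in for an ffmpeg-style subprocess, and the chunk names are hypothetical. The point is the shape of the work: independent, restartable per-chunk jobs followed by an ordered merge.

```python
# Sketch of chunk-based parallel transcoding. Chunks are assumed to be
# split on GOP boundaries so each one can be encoded independently.
from concurrent.futures import ThreadPoolExecutor

def encode_chunk(chunk):
    # Placeholder for a real encoder invocation (e.g. an ffmpeg subprocess).
    # Tagging the chunk here just marks a completed, retryable work unit.
    return f"encoded:{chunk}"

def transcode(video_id, gop_aligned_chunks):
    # Encode all chunks in parallel; a failed chunk is retried alone
    # instead of restarting the whole multi-hour job.
    with ThreadPoolExecutor(max_workers=8) as pool:
        encoded = list(pool.map(encode_chunk, gop_aligned_chunks))
    # Merge step: reassemble encoded chunks in their original order.
    return {"video_id": video_id, "segments": encoded}

result = transcode("v123", ["chunk0", "chunk1", "chunk2"])
```

`pool.map` preserves input order, so the merge step is a simple ordered concatenation even though chunks finish out of order.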
Serving Video from Origin Servers
We chose CDN edge delivery (not origin-direct serving) because a single 1080p stream consumes 5 Mbps. At 46K concurrent streams, that is 230 Gbps of egress, which no origin cluster can handle. Trade-off: CDN adds per-GB egress cost, but the alternative is an origin that cannot physically serve the traffic.
Why: Candidates forget that video is not like API responses. They say 'we will add more servers' without realizing that capacity is only half the problem: latency is the other half. A viewer in Tokyo hitting an origin in Virginia adds 150ms per segment fetch, causing rebuffering regardless of origin capacity.
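The egress figure above is worth being able to derive on the spot; the back-of-envelope math is just the section's own numbers multiplied out:

```python
# Back-of-envelope egress math for origin-direct serving.
streams = 46_000        # concurrent 1080p viewers
mbps_per_stream = 5     # typical 1080p bitrate
egress_gbps = streams * mbps_per_stream / 1000  # 230.0 Gbps sustained
```

230 Gbps of sustained egress is the load an origin-direct design would have to absorb before latency is even considered.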
Single-Resolution Upload Only
We chose multi-rendition transcoding with ABR (not serving the original upload resolution to all viewers) because a 4K upload at 20 Mbps is unwatchable on a 2 Mbps mobile connection. Trade-off: transcoding into 8 renditions costs 8x the compute and storage, but it makes the video playable for every viewer on every device.
Why: Candidates store the uploaded file as-is and serve it directly. They skip the transcoding pipeline entirely or transcode to one lower resolution. They do not think about the viewer on a train with 1 Mbps bandwidth trying to watch a 4K upload at 20 Mbps. The gap between upload quality and viewer bandwidth is the entire reason ABR exists.
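The client side of ABR can be sketched as a ladder lookup. The bitrate ladder and headroom factor below are illustrative assumptions, not a standard: the client picks the highest rendition that fits inside its measured bandwidth, with headroom to absorb throughput dips.

```python
# Hypothetical 8-rendition bitrate ladder, in kbps, highest first.
LADDER_KBPS = [20000, 8000, 5000, 2500, 1200, 800, 400, 200]

def pick_rendition(measured_kbps, headroom=0.8):
    # Leave 20% headroom so a throughput dip does not stall the buffer.
    budget = measured_kbps * headroom
    for bitrate in LADDER_KBPS:        # scan high to low
        if bitrate <= budget:
            return bitrate
    return LADDER_KBPS[-1]             # lowest rung: always playable

pick_rendition(2000)   # 2 Mbps mobile link -> 1200 kbps rendition
```

The 2 Mbps viewer from the section above gets the 1200 kbps rendition instead of a stalled 20 Mbps original, which is the entire point of the 8x transcoding cost.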
Synchronous Transcoding Blocking Upload Response
We chose async transcoding with status polling (not synchronous blocking) because a 10-minute video takes 3-5 minutes to transcode. Holding an HTTP connection open for 5 minutes causes load balancer timeouts (60s default), user confusion, and, at 100 uploads/sec with a 5-minute hold, 30K open connections. Trade-off: async requires a status endpoint and client-side polling logic, but it is the only viable approach for long-running media processing.
Why: Candidates design the upload endpoint as: receive file, transcode, return success. This works for a 30-second clip on a fast machine. But a 60-minute video takes minutes to transcode even with parallel chunking. The HTTP connection times out, the user sees a spinning wheel, and if their browser tab closes, they think the upload failed.
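The async pattern can be sketched in a few lines. The in-memory `jobs` dict stands in for a real queue and job store, and the function names are hypothetical; what matters is that the upload handler returns a job id in milliseconds and the client polls a separate status endpoint.

```python
# Sketch of async transcoding with status polling.
import uuid

jobs = {}  # job_id -> status; stand-in for a durable job store

def handle_upload(file_ref):
    job_id = str(uuid.uuid4())
    jobs[job_id] = "queued"        # enqueue for background workers
    return {"job_id": job_id, "status": "queued"}  # respond immediately

def worker_process(job_id):
    jobs[job_id] = "transcoding"
    # ... chunked parallel encode happens here ...
    jobs[job_id] = "ready"

def get_status(job_id):
    # The status endpoint the client polls instead of holding a connection.
    return jobs.get(job_id, "unknown")

resp = handle_upload("video.mp4")
worker_process(resp["job_id"])
```

The client can close the tab and come back: the job id, not the HTTP connection, is the handle on the work.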
Storing Thumbnails on Regular Filesystem
We chose Bigtable (not a POSIX filesystem or S3) for thumbnails because we need sub-10ms random reads for 5 billion small files. Filesystems hit inode limits at 100-500 million files. S3 optimizes for throughput, not latency. Trade-off: Bigtable costs more per GB than S3, but the latency requirement is non-negotiable for thumbnail serving on video listing pages.
Why: Thumbnails are images, and the instinct is to save them as files in a directory. This works for 10,000 videos. At 1 billion videos with 5 thumbnails each, that is 5 billion files. Most filesystems limit inodes to 100-500 million. Even with directory sharding, random reads across billions of files are slow because the filesystem's metadata cache cannot hold all inode entries.
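In a wide-column store the equivalent design question is the row key. A common pattern (sketched below with a hypothetical scheme) is to prefix the key with a short hash so 5 billion rows spread evenly across tablets instead of hotspotting on sequential video ids:

```python
# Sketch of a thumbnail row-key scheme for a wide-column store.
import hashlib

def thumbnail_row_key(video_id, thumb_index):
    # 4-hex-char hash prefix distributes writes/reads across tablets;
    # the rest of the key keeps a video's thumbnails adjacent for scans.
    prefix = hashlib.md5(video_id.encode()).hexdigest()[:4]
    return f"{prefix}#{video_id}#{thumb_index}"

thumbnail_row_key("v123", 0)   # e.g. "a1b2#v123#0"
```

The key both spreads load (hash prefix) and preserves locality (all of one video's thumbnails share a prefix), which is what a flat directory of 5 billion files cannot do.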
No Resumable Upload for Large Files
We chose a resumable chunked protocol (not standard multipart/form-data) for uploads because any network interruption during a multi-gigabyte upload forces a restart from byte zero. A 5 GB upload on a 50 Mbps connection takes 13 minutes. Without resume, 3 failures and the user gives up. Trade-off: resumable uploads require per-session state tracking on the server, but the maximum wasted transfer drops from the full file size to 10 MB.
Why: Standard HTTP file upload (multipart/form-data) works for small files. Candidates use the same approach for 5 GB videos. Mobile uploads on spotty connections drop frequently. Without resume, the user uploads 4.9 GB, the connection drops, and they start over.
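The resume mechanism can be sketched as server-side offset tracking. The protocol below is a simplified illustration (real systems use something like tus or S3 multipart): the server records the committed byte offset, and after a drop the client asks for that offset and resumes, so at most one 10 MB chunk is ever re-sent.

```python
# Sketch of resumable chunked upload via a committed-offset session.
CHUNK = 10 * 1024 * 1024  # 10 MB chunks

class UploadSession:
    def __init__(self, total_size):
        self.total_size = total_size
        self.offset = 0  # bytes durably received so far

    def put_chunk(self, offset, size):
        if offset != self.offset:      # out-of-order or duplicate chunk
            return self.offset         # tell the client where to resume
        self.offset += size
        return self.offset

session = UploadSession(total_size=5 * 1024**3)  # the 5 GB upload
session.put_chunk(0, CHUNK)                      # first chunk lands
# ...connection drops; client queries session.offset and resumes there...
session.put_chunk(session.offset, CHUNK)
```

After the reconnect the session sits at exactly 2 chunks received: no bytes were re-uploaded, versus restarting a 4.9 GB transfer from byte zero.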
Same CDN Strategy for All Videos
We chose a popularity-tiered CDN strategy (not uniform caching for all videos) because the top 1% of videos get 80% of views (power-law distribution). Pushing all 1 billion videos to every edge POP is impossible (the storage cost exceeds the CDN budget). Relying only on origin pull for viral videos triggers thundering herd cache misses. Trade-off: tiered caching adds complexity in the form of popularity tracking and promotion logic, but it matches CDN resources to actual demand.
Why: Candidates say 'use CDN' as a one-size-fits-all answer. But the top 1% of videos get 80% of views, so pushing the entire catalog to every edge wastes cache space on the long tail. Conversely, relying only on origin pull for viral videos means the first million viewers all trigger cache misses, overwhelming the origin.
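Tier assignment can be sketched as a threshold function over a recent-views signal. The thresholds and tier names below are illustrative assumptions; the real promotion logic would hydrate from a popularity pipeline.

```python
# Sketch of popularity-tiered CDN placement.
def cdn_tier(views_last_hour):
    if views_last_hour >= 100_000:
        return "push-to-all-edges"   # the viral top slice: pre-warm everywhere
    if views_last_hour >= 1_000:
        return "regional-cache"      # warm: pull-through at regional POPs
    return "origin-pull"             # long tail: no edge storage spent

cdn_tier(500_000)   # viral video -> "push-to-all-edges"
```

Pre-warming the viral tier is what prevents the first million viewers from all missing the cache, while the long tail never consumes edge storage.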
Real-Time View Counting on Every Request
We chose Kafka async ingestion with batch MySQL updates (not synchronous per-view database increments) because at 46K views/sec, InnoDB row-level locking on a hot video row serializes all writes. We use Redis INCR for the real-time display and batch the MySQL authoritative count every 30 seconds. Trade-off: the MySQL count lags by up to 30 seconds, but viewers see a real-time number from Redis and the database avoids write hotspot meltdown.
Why: The simplest implementation is UPDATE videos SET view_count = view_count + 1 WHERE video_id = ?. This works at 100 views/sec. At 46K views/sec with hot videos receiving thousands of concurrent increments, every view waits in a lock queue, and p99 write latency spikes from 5ms to 500ms+.
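The batched pipeline can be sketched with an in-memory counter standing in for Redis INCR and a dict standing in for the MySQL row. The shape is what matters: per-view work is an O(1) increment with no lock, and the periodic flush issues one batched update per video rather than one per view.

```python
# Sketch of async view counting: buffer increments, flush in batches.
from collections import Counter

pending = Counter()          # stand-in for Redis (real code: INCR video:<id>)
db_counts = {"v1": 1_000}    # stand-in for the authoritative MySQL count

def record_view(video_id):
    pending[video_id] += 1   # O(1), no row lock, serves real-time display

def flush_to_db():
    # Runs every ~30s: one UPDATE per hot video, not 46K/sec row locks.
    for video_id, delta in pending.items():
        db_counts[video_id] = db_counts.get(video_id, 0) + delta
    pending.clear()

for _ in range(5_000):       # a burst of views on a hot video
    record_view("v1")
flush_to_db()
```

A hot video absorbing 5,000 views between flushes costs the database exactly one write, which is why the authoritative count can lag 30 seconds without the lock-queue meltdown.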