Video Streaming Anti-Patterns

Common design mistakes candidates make. Learn what goes wrong and how to avoid each trap in your interview.

Transcoding Entire Video as One File

Very Common

We chose chunk-based parallel encoding (not monolithic single-file transcoding) because encoding a full video as one job wastes hours on one worker and forces a complete restart if the job fails at any point. Trade-off: chunked encoding requires GOP-aligned splits and a merge step, but the parallelism and partial-failure recovery make it essential at scale.

Why: Candidates think of transcoding like converting a file format: input one file, output one file. They do not realize that a 60-minute 4K video takes 4+ hours to encode on a single machine. Without chunking, there is no parallelism and no partial recovery. The mental model is 'transcode = one function call,' not 'transcode = distributed pipeline.'

WRONG: Feed the entire uploaded video to a single FFmpeg worker. A 60-minute 4K video takes 4+ hours on one machine. If the worker crashes at 95% completion, the entire 4-hour job restarts from zero. Meanwhile, 360 other workers sit idle waiting for tasks. The upload-to-playable latency for a 60-minute video exceeds 5 hours.
RIGHT: We split the video into GOP-aligned chunks (10-second segments, ~360 chunks for a 60-minute video). We push each chunk as an independent task to a message queue (SQS or Kafka). 360 FFmpeg workers encode chunks in parallel. If one worker crashes, only that 10-second chunk retries. Total encoding time drops from 4 hours to under 5 minutes. We merge encoded chunks with a lightweight concatenation step.
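The chunk fan-out above can be sketched in a few lines. This is a minimal Python sketch, not production code: `plan_chunks` is a hypothetical helper, and both the queue push (SQS/Kafka) and the GOP-aligned splitting itself (e.g. via FFmpeg's segment muxer) are elided.

```python
import math

SEGMENT_SECONDS = 10   # chunk length; real splits must land on GOP boundaries

def plan_chunks(video_id: str, duration_s: int) -> list:
    """Fan a video out into independent encode tasks, one per chunk."""
    n = math.ceil(duration_s / SEGMENT_SECONDS)
    return [
        {
            "video_id": video_id,
            "chunk_index": i,
            "start_s": i * SEGMENT_SECONDS,
            "end_s": min((i + 1) * SEGMENT_SECONDS, duration_s),
        }
        for i in range(n)
    ]

# A 60-minute video becomes 360 tasks; each task would be pushed to the
# message queue, and a failed chunk retries alone.
tasks = plan_chunks("vid123", 3600)
```

Because each task carries its own start/end offsets, any idle worker can pick up any chunk, which is what turns a 4-hour serial encode into minutes of parallel work.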

Serving Video from Origin Servers

Very Common

We chose CDN edge delivery (not origin-direct serving) because a single 1080p stream consumes 5 Mbps. At 46K concurrent streams, that is 46,000 × 5 Mbps = 230 Gbps of egress, which no origin cluster can handle. Trade-off: CDN adds per-GB egress cost, but the alternative is an origin that cannot physically serve the traffic.

Why: Candidates forget that video is not like API responses. They mention 'we will add more servers' without realizing the fundamental problem: latency. A viewer in Tokyo hitting an origin in Virginia adds 150ms per segment fetch, causing rebuffering regardless of origin capacity.

WRONG: All video segments served from the origin data center. 46K concurrent viewers each pulling 5 Mbps generates 230 Gbps of origin egress. Viewers 5,000 miles from the origin experience 150ms+ latency per segment, causing constant rebuffering. The origin's network uplink saturates and drops connections.
RIGHT: We deploy a CDN with edge Points of Presence (POPs) in 100+ cities. Popular videos are cached at the edge with a 95%+ hit ratio. For long-tail content, we use origin pull: the edge fetches from origin on the first request and caches for subsequent viewers. We pre-warm edges for viral videos using push-based replication. Total origin bandwidth drops from 230 Gbps to under 12 Gbps (5% miss rate).
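The origin-pull behavior can be illustrated with a toy cache. This is a sketch under simplifying assumptions (no TTLs, no eviction, single edge); `EdgeCache` and its `fetch_origin` callable are illustrative names, not a real CDN API.

```python
class EdgeCache:
    """Origin-pull sketch: the first request for a segment goes to origin,
    every later request is served from the edge copy."""
    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin   # callable: segment key -> bytes
        self.store = {}
        self.origin_fetches = 0

    def get(self, key: str) -> bytes:
        if key not in self.store:          # cache miss: pull from origin once
            self.origin_fetches += 1
            self.store[key] = self.fetch_origin(key)
        return self.store[key]

edge = EdgeCache(lambda key: b"segment-bytes:" + key.encode())
for _ in range(1000):                      # 1,000 viewers request the same segment
    data = edge.get("vid123/seg_0001.ts")
```

One origin fetch serves all 1,000 viewers; scaled across POPs, this is what drives the 95%+ hit ratio the design depends on.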

Single-Resolution Upload Only

Very Common

We chose multi-rendition transcoding with ABR (not serving the original upload resolution to all viewers) because a 4K upload at 20 Mbps is unwatchable on a 2 Mbps mobile connection. Trade-off: transcoding into 8 renditions costs 8x the compute and storage, but it makes the video playable for every viewer on every device.

Why: Candidates store the uploaded file as-is and serve it directly. They skip the transcoding pipeline entirely or transcode to one lower resolution. They do not think about the viewer on a train with 1 Mbps bandwidth trying to watch a 4K upload at 20 Mbps. The gap between upload quality and viewer bandwidth is the entire reason ABR exists.

WRONG: Store the original 4K upload (20 Mbps) and serve it to everyone. A viewer on a 2 Mbps connection gets constant rebuffering: each 10-second segment is 20 Mbps × 10 s / 8 = 25 MB, taking 100 seconds to download. The video stutters every 10 seconds. Mobile users on metered data burn through 150 MB/minute at 4K instead of 22 MB/minute at 480p.
RIGHT: We transcode every upload into a bitrate ladder of 8-10 renditions: 240p (300 Kbps) through 4K (20 Mbps). We generate an HLS/DASH manifest listing all renditions. The player's ABR algorithm measures bandwidth every few seconds and picks the highest rendition that fits. A 2 Mbps connection gets smooth 720p. A 50 Mbps fiber connection gets crisp 4K.
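The player-side selection logic is small. A minimal sketch, assuming an illustrative bitrate ladder (real ladders are per-title tuned) and a hypothetical 0.8 safety margin against bandwidth measurement noise:

```python
# Illustrative bitrate ladder (Kbps), 240p through 4K.
LADDER_KBPS = [300, 700, 1200, 2500, 4500, 8000, 12000, 20000]

def pick_rendition(measured_kbps: float, safety: float = 0.8) -> int:
    """Pick the highest bitrate that fits within a safety margin of the
    measured bandwidth; fall back to the lowest rung."""
    budget = measured_kbps * safety
    fitting = [b for b in LADDER_KBPS if b <= budget]
    return max(fitting) if fitting else LADDER_KBPS[0]
```

The player re-runs this every few seconds against fresh throughput samples, which is how a train ride degrades gracefully instead of stalling.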

Synchronous Transcoding Blocking Upload Response

Very Common

We chose async transcoding with status polling (not synchronous blocking) because a 10-minute video takes 3-5 minutes to transcode. Holding an HTTP connection open for 5 minutes causes load balancer timeouts (60s default), user confusion, and 30K concurrent connections at 100 uploads/sec. Trade-off: async requires a status endpoint and client-side polling logic, but it is the only viable approach for long-running media processing.

Why: Candidates design the upload endpoint as: receive file, transcode, return success. This works for a 30-second clip on a fast machine. But a 60-minute video takes minutes to transcode even with parallel chunking. The HTTP connection times out, the user sees a spinning wheel, and if their browser tab closes, they think the upload failed.

WRONG: The upload endpoint blocks until transcoding completes. A 10-minute video takes 3-5 minutes to transcode. The HTTP connection hangs for 5 minutes. The load balancer has a 60-second timeout and drops the connection. The user sees a failure message even though the video is still processing. At 100 uploads/sec, each blocking for 5 minutes means 30,000 concurrent connections held open.
RIGHT: We return 202 Accepted immediately after the upload completes. We push a transcoding job to a message queue (SQS or Kafka). The client polls a status endpoint or receives a webhook callback when processing finishes. The video status moves through stages: uploaded, queued, transcoding, ready. The user sees a progress indicator and can close their browser safely.
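The async handoff can be sketched end to end in memory. All names here are illustrative (`handle_upload`, `worker_step`); an in-process `queue.Queue` stands in for SQS/Kafka, and the transcode itself is elided.

```python
import itertools
import queue

jobs = {}                        # video_id -> "queued" | "transcoding" | "ready"
transcode_queue = queue.Queue()  # stands in for SQS/Kafka
_ids = itertools.count(1)

def handle_upload(filename: str) -> dict:
    """Upload endpoint: record the job, enqueue it, return 202 immediately."""
    video_id = f"vid-{next(_ids)}"
    jobs[video_id] = "queued"
    transcode_queue.put(video_id)
    return {"status_code": 202, "video_id": video_id}

def get_status(video_id: str) -> str:
    """Status endpoint the client polls (a webhook callback also works)."""
    return jobs[video_id]

def worker_step() -> None:
    """One worker iteration: pull a job, transcode (elided), mark it ready."""
    video_id = transcode_queue.get()
    jobs[video_id] = "transcoding"
    jobs[video_id] = "ready"

resp = handle_upload("talk.mp4")                     # returns instantly
status_after_upload = get_status(resp["video_id"])   # "queued"
worker_step()                                        # background worker, not the request path
```

The request path never touches FFmpeg, so connection count stays proportional to upload duration, not transcode duration.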

Storing Thumbnails on Regular Filesystem

Common

We chose Bigtable (not a POSIX filesystem or S3) for thumbnails because we need sub-10ms random reads for 5 billion small files. Filesystems hit inode limits at 100-500 million files. S3 optimizes for throughput, not latency. Trade-off: Bigtable costs more per GB than S3, but the latency requirement is non-negotiable for thumbnail serving on video listing pages.

Why: Thumbnails are images, and the instinct is to save them as files in a directory. This works for 10,000 videos. At 1 billion videos with 5 thumbnails each, that is 5 billion files. Most filesystems limit inodes to 100-500 million. Even with directory sharding, random reads across billions of files are slow because the filesystem's metadata cache cannot hold all inode entries.

WRONG: Store thumbnails as individual files in /data/thumbs/video_id/. With 5 billion files, the filesystem hits its inode limit. Directory listings with millions of entries take seconds. Random reads require disk seeks because the VFS cache cannot hold metadata for 5 billion files. Latency per thumbnail: 50-200ms.
RIGHT: We use Bigtable with row key = video_id and column qualifiers = thumb_1 through thumb_5. Bigtable handles billions of rows with sub-10ms random reads. Each thumbnail (5 KB) fits within Bigtable's ideal value size. Total storage: 5B × 5 KB = 25 TB. Alternatively, we can pack multiple thumbnails into a single object in S3 and use byte-range requests.
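The schema is the important part, so here is an in-memory stand-in that mirrors it. `ThumbnailStore` is a hypothetical wrapper; a real deployment would use the Cloud Bigtable client with the same row-key and column-qualifier design.

```python
class ThumbnailStore:
    """In-memory stand-in for the wide-column schema:
    row key = video_id, column qualifiers = thumb_1..thumb_5."""
    def __init__(self):
        self.rows = {}

    def put(self, video_id: str, index: int, data: bytes) -> None:
        self.rows.setdefault(video_id, {})[f"thumb_{index}"] = data

    def get(self, video_id: str, index: int) -> bytes:
        # A single-row point read: in Bigtable this is the sub-10ms path.
        return self.rows[video_id][f"thumb_{index}"]

store = ThumbnailStore()
store.put("vid42", 1, b"\x89PNG-bytes")
```

Keying by video_id means all five thumbnails for a video live in one row, so a listing page fetches them with one point read per video instead of five filesystem seeks.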

No Resumable Upload for Large Files

Common

We chose a resumable chunked protocol (not standard multipart/form-data) for uploads because any network interruption during a multi-gigabyte upload forces a restart from byte zero. A 5 GB upload on a 50 Mbps connection takes 13 minutes. Without resume, 3 failures and the user gives up. Trade-off: resumable uploads require per-session state tracking on the server, but the maximum wasted transfer drops from the full file size to 10 MB.

Why: Standard HTTP file upload (multipart/form-data) works for small files. Candidates use the same approach for 5 GB videos. Mobile uploads on spotty connections drop frequently. Without resume, the user uploads 4.9 GB, the connection drops, and they start over.

WRONG: Accept the entire video in a single HTTP POST. A 5 GB upload on a 50 Mbps connection takes 13 minutes. The connection drops at 90% complete (4.5 GB transferred). The entire upload restarts from byte 0. On mobile networks with intermittent connectivity, the failure rate exceeds 30% for files over 1 GB.
RIGHT: We implement a resumable upload protocol: split the file into 5-10 MB chunks on the client. The server tracks the last successful byte offset. On reconnection, the client queries the offset and resumes from there. Maximum wasted transfer per interruption: 10 MB instead of the full file size. Google, Vimeo, and Tus.io all use this pattern.
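The offset-tracking core of the protocol fits in a short sketch. Names are illustrative, and real implementations (tus, Google's resumable uploads) add session IDs, checksums, and expiry; here the "network drop" is simulated by a partial `append`.

```python
CHUNK_BYTES = 5 * 1024 * 1024   # 5 MB chunks, per the protocol above

class ResumableUpload:
    """Server side: track the committed offset so a reconnecting client
    resumes from there instead of byte 0."""
    def __init__(self, total_bytes: int):
        self.total = total_bytes
        self.received = bytearray()

    @property
    def offset(self) -> int:
        return len(self.received)        # client queries this on reconnect

    def append(self, data: bytes) -> None:
        self.received.extend(data)

    @property
    def complete(self) -> bool:
        return self.offset == self.total

def resume_upload(session: ResumableUpload, payload: bytes) -> None:
    """Client side: stream chunk-sized slices starting at the server's offset."""
    while not session.complete:
        pos = session.offset
        session.append(payload[pos:pos + CHUNK_BYTES])

session = ResumableUpload(total_bytes=12)
session.append(b"hello ")                # ...connection drops after 6 bytes
resume_upload(session, b"hello world!")  # reconnect: resumes at offset 6
```

The client never re-sends committed bytes, which is what caps wasted transfer at one chunk per interruption.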

Same CDN Strategy for All Videos

Common

We chose a popularity-tiered CDN strategy (not uniform caching for all videos) because the top 1% of videos get 80% of views (power-law distribution). Pushing all 1 billion videos to every edge POP is impossible (the storage cost exceeds the CDN budget). Relying only on origin pull for viral videos triggers thundering herd cache misses. Trade-off: tiered caching adds complexity in the form of popularity tracking and promotion logic, but it matches CDN resources to actual demand.

Why: Candidates say 'use CDN' as a one-size-fits-all answer. But the top 1% of videos get 80% of views. Conversely, relying only on origin pull for viral videos means the first million viewers all trigger cache misses, overwhelming the origin.

WRONG: Apply the same origin-pull TTL for every video. A viral video with 10 million views/hour triggers cache misses at every edge POP simultaneously during the first minute. The origin receives a thundering herd: 200+ POPs each requesting the same segment at once. The origin saturates at 50 Gbps and segments start timing out.
RIGHT: We segment the catalog by popularity. Push pre-warm for the top 1% (viral and trending videos): replicate segments to all edge POPs before traffic arrives. Origin pull with long TTL (24h) for the middle 19% (steady traffic). Origin pull with short TTL (1h) for the long tail (80% of videos, rarely watched). We use request coalescing at the edge so multiple simultaneous misses for the same segment result in a single origin fetch.
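The tier-routing decision above can be written as a small policy function. A sketch with illustrative names and the 1% / 19% / 80% thresholds from the text; popularity tracking, promotion between tiers, and request coalescing are elided.

```python
def cdn_tier(popularity_percentile: float) -> dict:
    """Map a video's view-rank percentile (0.0 = most viewed) to a
    caching strategy, following the 1% / 19% / 80% split."""
    if popularity_percentile < 0.01:                     # viral / trending
        return {"strategy": "push_prewarm", "ttl_h": None}
    if popularity_percentile < 0.20:                     # steady traffic
        return {"strategy": "origin_pull", "ttl_h": 24}
    return {"strategy": "origin_pull", "ttl_h": 1}       # long tail
```

The thresholds would be tuned against real view distributions, but the shape stays the same: spend pre-warm bandwidth only where the power law says viewers will be.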

Real-Time View Counting on Every Request

Common

We chose Kafka async ingestion with batch MySQL updates (not synchronous per-view database increments) because at 46K views/sec, InnoDB row-level locking on a hot video row serializes all writes. We use Redis INCR for the real-time display and batch the MySQL authoritative count every 30 seconds. Trade-off: the MySQL count lags by up to 30 seconds, but viewers see a real-time number from Redis and the database avoids write hotspot meltdown.

Why: The simplest implementation is UPDATE videos SET view_count = view_count + 1 WHERE video_id = ?. This works at 100 views/sec. At 46K views/sec with hot videos receiving thousands of concurrent increments, every view waits in a lock queue, and p99 write latency spikes from 5ms to 500ms+.

WRONG: Execute UPDATE view_count + 1 on every video view. A viral video gets 10K views/sec. Each UPDATE acquires a row lock. 10K transactions/sec contend for the same row. Lock wait timeouts cascade. The MySQL instance's write throughput drops for all videos, not just the hot one. Read queries also slow down because pending writes hold locks.
RIGHT: We write each view event to a Kafka topic (fire-and-forget, sub-1ms). A stream consumer aggregates counts per video_id over a 30-second tumbling window and issues a single batch UPDATE per video per window. Real-time display uses a Redis INCR counter (handles 100K+ ops/sec per shard). The database sees one write per unique video viewed every 30 seconds instead of 46K writes/sec.
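The consumer-side aggregation can be sketched with a tumbling-window counter. Assumptions: `ViewAggregator` is a hypothetical class, events arrive as they would from a Kafka consumer loop, and the 30-second window timer plus the actual batch UPDATE are elided.

```python
from collections import Counter

class ViewAggregator:
    """Tumbling-window sketch: buffer view events in memory and flush
    one batched row per video per window."""
    def __init__(self):
        self.window = Counter()

    def on_view(self, video_id: str) -> None:
        self.window[video_id] += 1    # in-memory increment, no row lock

    def flush(self) -> list:
        """At window close (every 30 s), emit one (video_id, delta) pair
        per video -- the input to a single batch UPDATE."""
        batch = list(self.window.items())
        self.window.clear()
        return batch

agg = ViewAggregator()
for _ in range(10_000):
    agg.on_view("viral-vid")          # 10K view events collapse into one row
agg.on_view("quiet-vid")
batch = agg.flush()
```

The hot row sees one increment of +10,000 per window instead of 10,000 contended increments, which is the whole point of the pattern.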