YouTube System Design-1

Hello friends,

Hope you’re doing well. In today’s blog, let’s go through the system design of YouTube.

In this first part, we will estimate the capacity needed for YouTube’s video upload engine.

Storage Estimation (Per Day):

Let’s assume the total number of users is 1B.

Number of users uploading a video per day = 1B / 1000 [1 in 1000 users] = 1M

Avg length of video = 10 mins

Total upload minutes per day = 1M * 10 = 10^7 mins
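The arithmetic so far can be checked with a few lines of Python. This is a minimal sketch: the constant and function names are illustrative, and the inputs are the assumptions stated above (1B users, 1 in 1000 uploading per day, one 10-minute video per uploader).

```python
# Daily upload volume from the stated assumptions (all inputs are assumptions).
TOTAL_USERS = 1_000_000_000   # 1B users
ONE_IN_N_UPLOAD = 1000        # 1 in 1000 users uploads a video per day
AVG_VIDEO_MINUTES = 10        # average video length in minutes


def daily_uploaders(total_users: int, one_in_n: int) -> int:
    """Number of users who upload a video on a given day."""
    return total_users // one_in_n


def daily_upload_minutes(uploaders: int, avg_minutes: int) -> int:
    """Total minutes of video uploaded per day (one video per uploader)."""
    return uploaders * avg_minutes


uploaders = daily_uploaders(TOTAL_USERS, ONE_IN_N_UPLOAD)
minutes = daily_upload_minutes(uploaders, AVG_VIDEO_MINUTES)
print(f"uploaders/day = {uploaders:,}, minutes/day = {minutes:,}")
# uploaders/day = 1,000,000, minutes/day = 10,000,000
```

Integer division keeps the result exact, which makes the sketch easy to compare with the figures in the text.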

Assume 1 hour of raw video averages 2 GB. After YouTube’s compression [a 90% reduction in size], the same duration takes 2 GB / 10 = 200 MB

Average size of 1 minute of video = 200 MB / 60 = ~3.3 MB, rounded to ~3 MB
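Here is a small sketch of the per-minute size calculation. It assumes decimal units (1 GB = 1000 MB), a 2 GB raw hour, and a compression factor of 10 (the "90% reduction" above). The helper name is hypothetical.

```python
RAW_GB_PER_HOUR = 2      # assumed raw size of 1 hour of video
COMPRESSION_FACTOR = 10  # a 90% size reduction keeps 1/10 of the bytes
MB_PER_GB = 1000         # decimal units, as in the text


def compressed_mb_per_minute(raw_gb_per_hour: float, compression_factor: float) -> float:
    """Size of one minute of compressed video, in MB."""
    compressed_mb_per_hour = raw_gb_per_hour * MB_PER_GB / compression_factor
    return compressed_mb_per_hour / 60


mb_per_min = compressed_mb_per_minute(RAW_GB_PER_HOUR, COMPRESSION_FACTOR)
print(f"{mb_per_min:.2f} MB per minute")  # 3.33 MB per minute
```

The text rounds 3.33 MB down to ~3 MB, which keeps the later numbers simple at the cost of a roughly 10% undercount.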

Total storage per day = 10^7 mins * 3 MB/min = 3 * 10^7 MB = 30 TB
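The daily raw storage figure is a single multiplication. Here it is as a sketch, assuming decimal units (1 TB = 10^6 MB) and the rounded 3 MB/min figure. The function name is illustrative.

```python
UPLOAD_MINUTES_PER_DAY = 10**7  # from the upload estimate above
MB_PER_MINUTE = 3               # rounded per-minute size from the text
MB_PER_TB = 1_000_000           # decimal units: 1 TB = 10^6 MB


def daily_storage_tb(minutes_per_day: int, mb_per_minute: float) -> float:
    """Raw (single-copy, single-resolution) storage added per day, in TB."""
    return minutes_per_day * mb_per_minute / MB_PER_TB


print(daily_storage_tb(UPLOAD_MINUTES_PER_DAY, MB_PER_MINUTE), "TB per day")  # 30.0 TB per day
```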

Assuming 3 copies for fault tolerance and redundancy (which also helps read performance) = 30 TB * 3 = 90 TB

Assuming we also store multiple resolutions such as 720p, 480p, 360p and 240p, the size roughly doubles:

 = 90 TB * 2 = 180 TB = ~0.2 PB per day
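Replication and extra resolutions are just multipliers on the base figure. The sketch below takes the text's assumptions as inputs: 30 TB/day base, 3 copies, and a factor of 2 for the lower-resolution renditions. The names are illustrative.

```python
BASE_TB_PER_DAY = 30    # single copy, single resolution (from the text)
REPLICAS = 3            # copies kept for fault tolerance and redundancy
RESOLUTION_FACTOR = 2   # extra renditions (720p, 480p, ...) roughly double the size


def provisioned_tb_per_day(base_tb: float, replicas: int, resolution_factor: float) -> float:
    """Storage to provision per day once replication and renditions are included."""
    return base_tb * replicas * resolution_factor


total_tb = provisioned_tb_per_day(BASE_TB_PER_DAY, REPLICAS, RESOLUTION_FACTOR)
print(f"{total_tb} TB/day = {total_tb / 1000} PB/day")  # 180 TB/day = 0.18 PB/day
```

Keeping the multipliers as separate parameters makes it easy to rerun the estimate with, say, a different replica count.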

Cache Requirement for Video Metadata:

If the original thumbnail image is 1 MB, the cached thumbnail can be scaled down to 10% of its length and 10% of its width. That makes it 10 * 10 = 100 times smaller than the original: 1 MB / 100 = 10 KB
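The 100x reduction comes from shrinking both dimensions. This sketch assumes that file size is proportional to pixel count (length * width). That is only a rough approximation for compressed formats such as JPEG. The helper name is hypothetical.

```python
ORIGINAL_KB = 1000         # 1 MB original thumbnail, decimal units
SHRINK_PER_DIMENSION = 10  # keep 10% of the length and 10% of the width


def scaled_thumbnail_kb(original_kb: float, shrink_per_dimension: int) -> float:
    """Approximate size after shrinking both dimensions by the same factor.

    Assumes size is proportional to pixel count (length * width), which is
    only a rough approximation for compressed image formats.
    """
    return original_kb / (shrink_per_dimension * shrink_per_dimension)


print(scaled_thumbnail_kb(ORIGINAL_KB, SHRINK_PER_DIMENSION), "KB")  # 10.0 KB
```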

Let’s assume we want to cache thumbnails for the last 90 days of videos: 10 KB * 90 days * 1M [number of users uploading videos per day, identified in the previous section]

 = 9 * 10^8 KB = ~10^9 KB = ~1 TB RAM
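The cache size is the product of thumbnail size, retention window, and daily video count. This sketch assumes one video per uploading user per day and decimal units (1 TB = 10^9 KB). The names are illustrative.

```python
THUMBNAIL_KB = 10            # per-video thumbnail size from the previous step
RETENTION_DAYS = 90          # cache window
VIDEOS_PER_DAY = 1_000_000   # assumes one upload per uploading user per day
KB_PER_TB = 10**9            # decimal units


def cache_size_kb(thumbnail_kb: int, days: int, videos_per_day: int) -> int:
    """Total KB needed to keep every thumbnail from the last `days` days in RAM."""
    return thumbnail_kb * days * videos_per_day


kb = cache_size_kb(THUMBNAIL_KB, RETENTION_DAYS, VIDEOS_PER_DAY)
print(f"{kb:,} KB = {kb / KB_PER_TB} TB")  # 900,000,000 KB = 0.9 TB
```

The exact product is 0.9 TB. The text rounds this up to ~1 TB, which leaves a little headroom.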

Fitting 1 TB of RAM in a single machine is hard, so with 16 GB machines we need 1000 GB / 16 GB = 62.5, i.e. ~63 nodes

Assuming 3 copies for fault tolerance and 2 copies for redundancy (a 3 * 2 = 6x multiplier): 63 * 6 = 378 = ~400 nodes [each 16 GB]
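The node count rounds up to whole machines before applying the replication multiplier. This sketch assumes all 16 GB on each node is usable for cache, with no OS or process overhead. It also uses the 3 * 2 multiplier from the text. The function name is hypothetical.

```python
import math

CACHE_GB = 1000              # ~1 TB of thumbnail cache, decimal units
RAM_PER_NODE_GB = 16         # assumed usable RAM per cache node
REPLICATION_FACTOR = 3 * 2   # 3 copies for fault tolerance x 2 for redundancy


def cache_node_count(cache_gb: float, ram_per_node_gb: float, replication: int) -> tuple[int, int]:
    """Return (nodes for one copy of the cache, nodes including all copies)."""
    single_copy = math.ceil(cache_gb / ram_per_node_gb)  # 62.5 rounds up to 63
    return single_copy, single_copy * replication


single, total = cache_node_count(CACHE_GB, RAM_PER_NODE_GB, REPLICATION_FACTOR)
print(single, total)  # 63 378
```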

Number of Processors:

10^7 minutes of video are uploaded per day [identified in the first section].

We want to know how many MB must be processed per second, so let’s first convert minutes to hours:

10^7 mins / 60 = (1000 / 60) * 10^4 = ~16.7 * 10^4 hours, rounded down to ~16 * 10^4 hours

Assuming 1 hour of unprocessed video averages 1 GB, 16 * 10^4 GB must be processed per day.

Now let’s convert from per day to per second: (16 * 10^4 GB) / (24 * 60 * 60 sec) = (16 * 10^4) / 86400 = ~1.85 GB/sec = ~1850 MB/sec
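The per-second ingest rate can be sketched as follows. It assumes 1 GB per raw hour, decimal units, and uploads spread evenly over the day, so this is an average and not a peak rate. It also shows how much the rounding of 16.7 * 10^4 hours down to 16 * 10^4 affects the result. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MB_PER_GB = 1000                # decimal units
RAW_GB_PER_HOUR = 1             # assumed size of 1 hour of unprocessed video


def ingest_mb_per_sec(hours_per_day: float, raw_gb_per_hour: float) -> float:
    """Average raw video throughput in MB/sec, spread evenly over a day."""
    gb_per_day = hours_per_day * raw_gb_per_hour
    return gb_per_day * MB_PER_GB / SECONDS_PER_DAY


rounded = ingest_mb_per_sec(16 * 10**4, RAW_GB_PER_HOUR)  # hours as rounded in the text
exact = ingest_mb_per_sec(10**7 / 60, RAW_GB_PER_HOUR)    # 10^7 minutes / 60
print(f"{rounded:.0f} MB/sec (rounded hours), {exact:.0f} MB/sec (exact hours)")
# 1852 MB/sec (rounded hours), 1929 MB/sec (exact hours)
```

Both values are close enough for a back-of-the-envelope estimate. The text carries ~1850 MB/sec forward.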

Accounting for 3 copies and multiple resolutions, this rises to 1850 * 3 * 2 = 11100 MB/sec

Assume that, for each 1 MB, a read takes around 10 ms, processing takes around 20 ms [for locks and index updates], and a write takes another 20 ms.

So it takes 50 ms (milliseconds) to process 1 MB of data, i.e. 50 / 1000 sec = 0.05 sec per MB.

11100 MB/sec * 0.05 sec/MB = 555 sec of work arriving every second, so we need ~555 processors working in parallel to keep up.
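The processor estimate multiplies the arrival rate by the time spent on each MB. This is the same relationship as Little's law (average busy workers = arrival rate * service time). The sketch assumes each processor handles one MB at a time with no contention or idle time. The per-MB costs are the 10/20/20 ms figures above, and the names are hypothetical.

```python
import math

BASE_MB_PER_SEC = 1850   # average ingest rate from the previous step
REPLICAS = 3             # copies written
RESOLUTION_FACTOR = 2    # extra renditions roughly double the work
READ_MS, PROCESS_MS, WRITE_MS = 10, 20, 20  # assumed per-MB costs


def processors_needed(mb_per_sec: float, ms_per_mb: float) -> tuple[float, int]:
    """Seconds of work arriving per second, and processors needed to keep up.

    Assumes each processor handles one MB at a time, serially, with no
    contention or idle time, so work/sec equals the number of busy processors.
    """
    work_seconds_per_second = mb_per_sec * ms_per_mb / 1000
    return work_seconds_per_second, math.ceil(work_seconds_per_second)


load = BASE_MB_PER_SEC * REPLICAS * RESOLUTION_FACTOR  # 11,100 MB/sec
work, cpus = processors_needed(load, READ_MS + PROCESS_MS + WRITE_MS)
print(load, work, cpus)  # 11100 555.0 555
```

Because 555 processors would run at full utilization, any queueing or traffic peaks would need extra headroom on top of this figure.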

Note:

The numbers arrived at for storage, cache/RAM and processors depend on the assumptions above. If the assumptions change, the numbers should be adjusted accordingly.

Ref: https://interviewready.io/
