Back-of-the-Envelope Estimation

Back-of-the-Envelope Estimation

A back-of-the-envelope calculation is a rough calculation, typically jotted down on any available scrap of paper such as an envelope.

The back-of-envelope calculation is a technique used within software engineering to determine how a system should be designed. The method is most famous from big tech companies and is often expected in system design interviews. The thought is that you should first calculate some rough numbers so that they can drive decisions in designing possible solutions.

The back-of-the-envelope calculation is also known as capacity planning. In general, measuring system performance and performing back-of-the-envelope is important in any deployment to satisfy the service level agreement (SLA). The following are some of the popular reasons to perform back-of-the-envelope calculations:

  • decide whether the database should be partitioned

  • evaluate the feasibility of a proposed architectural design

  • identify potential bottlenecks in the system

The following concepts should be well understood to perform a quick back-of-the-envelope calculation:

  • powers of two's

  • availability numbers

  • latency numbers every programmer should know

Powers of Two

The power of two is a handy reference to perform the back of the envelope.

PowerExact ValueApprox ValueBytes
1010241 Thousand1 KB
201,048,5761 Million1 MB
301,073,741,8241 Billion1 GB
401,099,511,627,7761 Trillion1 TB
501,125,899,906,842,6241 Quadrillion1 PB
No. of zeros in
31 Thousand1 KB
61 Million1 MB
91 Billion1 GB
121 Trillion1 TB
151 Quadrillion1 PB

A byte is a unit of digital information that consists of 8 bits. The data size of the most commonly used data types can be helpful for the back-of-the-envelope.

Data TypeSize (byte)
int/float4
long/double8
character2

Byte Conversions

1 B = 8bits
1 KB = 1000B
1 MB = 1000KB
1 GB = 1000MB

Useful Calculations

x Million users y KB = xy GB

ex: 1M users upload a document of 100KB per day = 100 GB per day.

approx: Million (6 zeros) + KB (3 zeros) = 9 zeros => Billion = GB

x Million users y MB = xy TB

ex: (200M users) (a short video of 2MB per day) = 400TB per day.

approx: Million (6 zeros) + MB (6 zeros) = 12 zeros => Trillion = TB

Availability Numbers

High availability is the ability of a service to remain reachable and not lose data even when a failure occurs. High availability is typically measured in terms of a percentage of uptime over a given period. The uptime is the amount of time that a system or service is available and operational.

Availability (percentage)Downtime per year
993.6 days
99.9952 minutes
99.9995 minutes
99.999931 seconds

In addition to measuring uptime, high availability can also be measured by other metrics such as mean time between failures (MTBF), which measures the average time between system failures, and mean time to repair (MTTR), which measures the average time it takes to restore a system after a failure.

A service-level agreement (SLA) is an explicit or implicit contract between a service provider and the users. The SLA documents the set of services the service provider will offer to the user and defines the service standards the provider is obligated to fulfill.

Useful Conversions

TimeTime (seconds)
1 ns10^-9 seconds
1 µs10^-6 seconds
1 ms10^-3 seconds

Latency Numbers Every Programmer Should Know

The interactive latency tool can be used to visualize the latency of components and operations. The numbers shown are supposed to give a rough idea of the latency of different operations.

OperationTimeUnit
L1 cache reference0.5 nsnano sec
Branch mispredict5 nsnano sec
L2 cache reference7 nsnano sec
Mutex lock/unlock25 nsnano sec
Main memory reference100 nsnano sec
Compress 1K bytes with Zippy10 µsmicro sec
Send 1 KB bytes over 1 Gbps network10 µsmicro sec
Read 4 KB randomly from SSD*150 µsmicro sec
Read 1 MB sequentially from memory250 µsmicro sec
Round trip within the same data center500 µsmicro sec
Read 1 MB sequentially from SSD*1 msmili sec
HDD seek10 msmili sec
Read 1 MB sequentially from 1 Gbps10 msmili sec
Read 1 MB sequentially from the HDD30 msmili sec
Send packet CA->Netherlands->CA150 msmili sec

The latency numbers can be summarized as the following:

  • memory is fast and disks are slow

  • network bandwidth can be saved by using a compression algorithm

  • writes are more expensive than reads

  • globally shared data is expensive

  • at most 7 round trips between inter-region data centers per second are possible

  • approximately 2000 round trips per second can be achieved within a data center

Tips

Back-of-the-envelope estimation is all about the process. Solving the problem is more important than obtaining results. Interviewers may test your problem-solving skills. Here are a few tips to follow:

  • Rounding and Approximation. It is difficult to perform complicated math operations during the interview. For example, what is the result of “99987 / 9.1”? There is no need to spend valuable time to solve complicated math problems. Precision is not expected. Use round numbers and approximation to your advantage. The division question can be simplified as follows: “100,000 / 10”.

  • Write down your assumptions. It is a good idea to write down your assumptions to be referenced later.

  • Label your units. When you write down “5”, does it mean 5 KB or 5 MB? You might confuse yourself with this. Write down the units because “5 MB” helps to remove ambiguity.

  • Commonly asked back-of-the-envelope estimations: QPS, peak QPS, storage, cache, number of servers, etc. You can practice these calculations when preparing for an interview. Practice makes perfect.

Sample Question

After having all the basic knowledge under your belt, you are super ready to solve any question related to capacity estimation. So, let's take a classic question and perform capacity estimation on it.

we will calculate:

  1. QPS (query per second)

  2. storage estimation

  3. cache estimation

  4. number of servers needed

Suppose, there is a famous social networking application and you were asked to do capacity estimation then what would you do?

The first step you should do is to make some assumptions because you do not have the exact metrics of any social network app.

Let's assume:

Total users of social networking applications: 1 Billion

DAU (Daily active users): 25% of total users = 250 Million users

No. of posts per user per day = 2

10% of users post image = 250 Million * 10% = 25 Million

we will store media for 5 years

QPS Estimation

DAU = 250 Million

on average every user sends a read query 5 times a day and a write query 2 times a day

total query made by a user = 7 queries per day

total queries made by active users = 250 Million * 7 queries per day

total queries made by active users per sec = (250 Million * 7 query per day) / 24 hr x 60 mins x 60 secs

total QPS = ~18, 000 QPS

Storage Estimation

every user posts twice a day

Text Post

1 post has 250 characters

total size of 1 post = 250 character x 2 bytes = 500 bytes

total size of 2 posts = 2 x 500 bytes = 1000 bytes = 1 kb

total storage required = 250 million x 1 kb = ~250 GB per day

Image Post

10% of users post image

total users = 10% of 250 Million = 25 Million

size of 1 image = 300 kb

total storage required = 25 Million x 300 kb = 7500 GB = ~ 8 TB per day

we need to store posts for 5 years

no. of days in 5 years = 5 years x 365 days = ~2000 days

total storage required for text post = 250 GB x 2000 days = ~500 TB

total storage required for image post = 8 TB x 2000 days = ~16 PB

Cache Estimation

we keep 5 recent posts of the user in the cache

size of 1 post = 500 bytes

size of 5 post = 5 x 500 bytes = 2500 bytes = ~3 KB

total cache required = 250 Million x 3 KB = ~750 GB

let's say 1 machine has 75 GB of ram

total machine required = 750 GB / 75 GB = 10 Machine

No. of Servers Estimation

let's say our server is 99.999% available

we are processing 1 request per 500 ms

it means 2 requests are processed in 1 sec

we are getting ~18, 000 QPS

we have a server that has 50 threads

50 threads can serve 100 requests per sec

total servers required = 18,000 / 100 = 180 servers


Credits / References:

  1. https://www.youtube.com/watch?v=WZjSFNPS9Lo

  2. https://bytebytego.com/courses/system-design-interview/back-of-the-envelope-estimation

  3. https://systemdesign.one/back-of-the-envelope/


Thanks for reading

I hope you like reading it

Follow me: linkedin.com/in/imazizbohra