Back-of-the-Envelope Estimation
A back-of-the-envelope calculation is a rough calculation, typically jotted down on any available scrap of paper such as an envelope.
In software engineering, back-of-the-envelope calculations are used to inform how a system should be designed. The technique is popular at big tech companies and is often expected in system design interviews: you first work out some rough numbers so that they can drive decisions between possible designs.
The back-of-the-envelope calculation is also known as capacity planning. In general, measuring system performance and performing back-of-the-envelope calculations is important in any deployment to satisfy the service level agreement (SLA). Common reasons to perform back-of-the-envelope calculations include:
decide whether the database should be partitioned
evaluate the feasibility of a proposed architectural design
identify potential bottlenecks in the system
The following concepts should be well understood to perform a quick back-of-the-envelope calculation:
powers of two
availability numbers
latency numbers every programmer should know
Powers of Two
Powers of two are a handy reference when performing back-of-the-envelope calculations.
| Power | Exact Value | Approx Value | Bytes |
|-------|-------------|--------------|-------|
| 10 | 1,024 | 1 Thousand | 1 KB |
| 20 | 1,048,576 | 1 Million | 1 MB |
| 30 | 1,073,741,824 | 1 Billion | 1 GB |
| 40 | 1,099,511,627,776 | 1 Trillion | 1 TB |
| 50 | 1,125,899,906,842,624 | 1 Quadrillion | 1 PB |
| No. of Zeros | Approx Value | Bytes |
|--------------|--------------|-------|
| 3 | 1 Thousand | 1 KB |
| 6 | 1 Million | 1 MB |
| 9 | 1 Billion | 1 GB |
| 12 | 1 Trillion | 1 TB |
| 15 | 1 Quadrillion | 1 PB |
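As a quick sanity check, the "2^10 ≈ 10^3" approximation can be verified with a throwaway Python sketch (illustrative only, just arithmetic):

```python
# Compare exact powers of two against their decimal approximations.
for power, unit in [(10, "KB"), (20, "MB"), (30, "GB"), (40, "TB"), (50, "PB")]:
    exact = 2 ** power
    approx = 10 ** (3 * power // 10)  # 2^10 ~ 10^3, so 3 zeros per 10 powers
    error_pct = (exact - approx) / approx * 100
    print(f"2^{power} = {exact:,} ~ 1 {unit} (off by {error_pct:.1f}%)")
```

The error grows from about 2.4% at 2^10 to about 12.6% at 2^50, which is still fine for rough estimates.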
A byte is a unit of digital information that consists of 8 bits. Knowing the size of the most commonly used data types is helpful for back-of-the-envelope calculations.
| Data Type | Size (bytes) |
|-----------|--------------|
| int/float | 4 |
| long/double | 8 |
| character | 2 |
Byte Conversions
1 B = 8 bits
1 KB = 1,000 B
1 MB = 1,000 KB
1 GB = 1,000 MB
Useful Calculations
x Million users × y KB = xy GB
e.g., 1 Million users uploading a 100 KB document per day = 100 GB per day.
approx: Million (6 zeros) + KB (3 zeros) = 9 zeros => Billion = GB
x Million users × y MB = xy TB
e.g., 200 Million users uploading a short 2 MB video per day = 400 TB per day.
approx: Million (6 zeros) + MB (6 zeros) = 12 zeros => Trillion = TB
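The zero-counting trick can be confirmed in Python; the numbers below are the two examples from the text:

```python
# "x Million users × y KB = xy GB" — multiply, then convert using decimal units.
MILLION = 10 ** 6
KB, MB, GB, TB = 10 ** 3, 10 ** 6, 10 ** 9, 10 ** 12

docs_per_day = 1 * MILLION * 100 * KB    # 1M users × 100 KB document each
print(docs_per_day / GB, "GB per day")   # 100.0 GB per day

videos_per_day = 200 * MILLION * 2 * MB  # 200M users × 2 MB video each
print(videos_per_day / TB, "TB per day") # 400.0 TB per day
```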
Availability Numbers
High availability is the ability of a service to remain reachable and not lose data even when a failure occurs. High availability is typically measured in terms of a percentage of uptime over a given period. The uptime is the amount of time that a system or service is available and operational.
| Availability (%) | Downtime per year |
|------------------|-------------------|
| 99 | 3.65 days |
| 99.9 | 8.8 hours |
| 99.99 | 52 minutes |
| 99.999 | 5 minutes |
| 99.9999 | 31 seconds |
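The downtime figures follow directly from the percentage; a small Python helper (illustrative only) makes the conversion explicit:

```python
# Convert an availability percentage into downtime per (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def downtime_per_year_seconds(availability_pct):
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR

for pct in (99, 99.9, 99.99, 99.999, 99.9999):
    secs = downtime_per_year_seconds(pct)
    print(f"{pct}% -> {secs / 60:,.1f} minutes per year")
```

For example, 99.99% yields about 3,154 seconds, i.e. ~52.6 minutes, matching the table.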
In addition to measuring uptime, high availability can also be measured by other metrics such as mean time between failures (MTBF), the average time between system failures, and mean time to repair (MTTR), the average time it takes to restore a system after a failure.
A service-level agreement (SLA) is an explicit or implicit contract between a service provider and its users. The SLA documents the set of services the provider will offer and defines the service standards the provider is obligated to fulfill.
Useful Conversions
| Time | Time (seconds) |
|------|----------------|
| 1 ns | 10^-9 seconds |
| 1 µs | 10^-6 seconds |
| 1 ms | 10^-3 seconds |
Latency Numbers Every Programmer Should Know
The interactive latency tool can be used to visualize the latency of components and operations. The numbers shown are intended to give a rough idea of the relative cost of different operations.
| Operation | Time | Unit |
|-----------|------|------|
| L1 cache reference | 0.5 ns | nanoseconds |
| Branch mispredict | 5 ns | nanoseconds |
| L2 cache reference | 7 ns | nanoseconds |
| Mutex lock/unlock | 25 ns | nanoseconds |
| Main memory reference | 100 ns | nanoseconds |
| Compress 1 KB with Zippy | 10 µs | microseconds |
| Send 1 KB over a 1 Gbps network | 10 µs | microseconds |
| Read 4 KB randomly from SSD | 150 µs | microseconds |
| Read 1 MB sequentially from memory | 250 µs | microseconds |
| Round trip within the same data center | 500 µs | microseconds |
| Read 1 MB sequentially from SSD | 1 ms | milliseconds |
| HDD seek | 10 ms | milliseconds |
| Read 1 MB sequentially over a 1 Gbps network | 10 ms | milliseconds |
| Read 1 MB sequentially from HDD | 30 ms | milliseconds |
| Send packet CA -> Netherlands -> CA | 150 ms | milliseconds |
The latency numbers can be summarized as follows:
memory is fast and disks are slow
network bandwidth can be saved by using compression
writes are more expensive than reads
globally shared data is expensive
at most ~7 sequential round trips per second are possible between inter-region data centers (~150 ms each)
approximately 2,000 round trips per second are possible within a data center (~500 µs each)
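The last two bullets are just the reciprocal of the round-trip latencies in the table above; a one-liner confirms them:

```python
# Sequential round trips per second = 1 / round-trip latency.
def round_trips_per_second(latency_seconds):
    return 1 / latency_seconds

print(round_trips_per_second(500e-6))  # same data center (500 µs): 2000.0
print(round_trips_per_second(150e-3))  # inter-region (150 ms): ~6.7
```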
Tips
Back-of-the-envelope estimation is all about the process. Demonstrating a sound approach matters more than the exact result, and interviewers mainly test your problem-solving skills. Here are a few tips to follow:
Rounding and Approximation. It is difficult to perform complicated math operations during an interview. For example, what is the result of “99987 / 9.1”? There is no need to spend valuable time solving complicated math problems; precision is not expected. Use round numbers and approximation to your advantage. The division can be simplified to “100,000 / 10”.
Write down your assumptions. It is a good idea to write down your assumptions to be referenced later.
Label your units. When you write down “5”, does it mean 5 KB or 5 MB? Always write down the units: “5 MB” removes the ambiguity.
Commonly asked back-of-the-envelope estimations: QPS, peak QPS, storage, cache, number of servers, etc. You can practice these calculations when preparing for an interview. Practice makes perfect.
Sample Question
With all the basic knowledge under your belt, you are ready to tackle any question related to capacity estimation. Let's take a classic question and work through the estimation.
We will calculate:
QPS (query per second)
storage estimation
cache estimation
number of servers needed
Suppose there is a popular social networking application and you were asked to perform a capacity estimation for it. What would you do?
The first step is to make some assumptions, because you do not have the exact metrics of any social network.
Let's assume:
Total users of the social networking application: 1 Billion
DAU (daily active users): 25% of total users = 250 Million users
No. of posts per user per day: 2
10% of users post an image: 250 Million × 10% = 25 Million
We will store media for 5 years.
QPS Estimation
DAU = 250 Million
on average, every user sends 5 read queries and 2 write queries per day
total queries made by a user = 7 queries per day
total queries made by active users = 250 Million × 7 queries per day = 1.75 Billion queries per day
total queries per second = 1.75 Billion / (24 hr × 60 min × 60 s); rounding 86,400 seconds per day up to 100,000
total QPS = 1.75 Billion / 100,000 = ~18,000 QPS
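As a cross-check, dividing by the exact 86,400 seconds per day gives closer to ~20,000 QPS; the ~18,000 figure comes from rounding the day up to 100,000 seconds, which is fine for an estimate:

```python
# QPS estimate: 250M daily active users, 7 queries per user per day.
DAU = 250 * 10 ** 6
QUERIES_PER_USER_PER_DAY = 7
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_day = DAU * QUERIES_PER_USER_PER_DAY  # 1.75 Billion
print(f"exact:   ~{queries_per_day / SECONDS_PER_DAY:,.0f} QPS")
print(f"rounded: ~{queries_per_day / 100_000:,.0f} QPS")
```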
Storage Estimation
every user posts twice a day
Text Post
1 post has 250 characters
total size of 1 post = 250 characters × 2 bytes = 500 bytes
total size of 2 posts = 2 × 500 bytes = 1,000 bytes = ~1 KB
total storage required = 250 Million × 1 KB = ~250 GB per day
Image Post
10% of users post an image
total users posting images = 10% of 250 Million = 25 Million
size of 1 image = 300 KB
total storage required = 25 Million × 300 KB = 7,500 GB = ~8 TB per day
we need to store posts for 5 years
no. of days in 5 years = 5 × 365 days = ~2,000 days
total storage required for text posts = 250 GB × 2,000 days = ~500 TB
total storage required for image posts = 8 TB × 2,000 days = ~16 PB
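The same storage estimate as a sketch. The exact figures come out a bit lower than the rounded ones above (about 456 TB vs ~500 TB, and about 13.7 PB vs ~16 PB), which is expected when every intermediate number is rounded up:

```python
# Five-year storage estimate for text and image posts.
MILLION = 10 ** 6
DAU = 250 * MILLION

text_bytes_per_day = DAU * 1000                  # ~1 KB of text per user per day
image_bytes_per_day = 25 * MILLION * 300 * 1000  # 10% of DAU post a 300 KB image
days = 5 * 365                                   # 1,825 (~2,000 when rounded)

print(text_bytes_per_day * days / 10 ** 12, "TB of text over 5 years")
print(image_bytes_per_day * days / 10 ** 15, "PB of images over 5 years")
```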
Cache Estimation
we keep the 5 most recent posts of each user in the cache
size of 1 post = 500 bytes
size of 5 posts = 5 × 500 bytes = 2,500 bytes = ~3 KB
total cache required = 250 Million × 3 KB = ~750 GB
let's say 1 machine has 75 GB of RAM
total machines required = 750 GB / 75 GB = 10 machines
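In code (the exact figure is 625 GB and ~8 machines; rounding 2,500 bytes up to 3 KB gives the 750 GB / 10 machines above):

```python
# Cache sizing: keep the 5 most recent posts of each active user in memory.
DAU = 250 * 10 ** 6
BYTES_PER_POST = 500
POSTS_CACHED = 5
RAM_PER_MACHINE_GB = 75  # assumed machine size, as in the text

cache_gb = DAU * POSTS_CACHED * BYTES_PER_POST / 10 ** 9
print(f"{cache_gb:.0f} GB of cache across {cache_gb / RAM_PER_MACHINE_GB:.1f} machines")
```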
No. of Servers Estimation
let's say our server is 99.999% available
each thread processes 1 request per 500 ms, i.e. 2 requests per second
we are getting ~18,000 QPS
each server has 50 threads
50 threads can serve 50 × 2 = 100 requests per second
total servers required = 18,000 / 100 = 180 servers
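The server count is throughput-based: requests served per second per server times the number of servers must cover the total QPS. A minimal sketch of the arithmetic:

```python
# Servers needed = total QPS / requests served per second per server.
import math

total_qps = 18_000
threads_per_server = 50
requests_per_thread_per_sec = 2  # one request every 500 ms

rps_per_server = threads_per_server * requests_per_thread_per_sec  # 100
servers = math.ceil(total_qps / rps_per_server)
print(servers, "servers")  # 180 servers
```

In practice you would add headroom for peak traffic and failures rather than provisioning exactly at the average QPS.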
Thanks for reading! I hope you enjoyed it.