Back-of-the-envelope estimation involves making quick, rough calculations to estimate system capacity or performance early in the design process. Though informal, it can offer valuable insights into whether a system can meet its expected demands.


Why Use Back-of-the-Envelope Estimation?


Prerequisites for Effective Estimation

Before performing a back-of-the-envelope estimation, it's essential to gather key information that will impact the accuracy of your calculations.

1. Understand the System Architecture
2. Identify the Critical Metrics
3. Determine the User Behavior
4. Consider the Data Size
5. Estimate the Processing Power
6. Evaluate Network Requirements
7. Plan for Scalability

Power of Two Table: From Byte to Petabyte

Unit | Power of 2 | Bytes | Conversion | Description
--- | --- | --- | --- | ---
Byte (B) | 2^0 | 1 | - | The smallest unit of digital information storage.
Kilobyte (KB) | 2^10 | 1,024 | 1 KB = 2^10 B | 1,024 bytes, often used to measure small text files or low-resolution images.
Megabyte (MB) | 2^20 | 1,048,576 | 1 MB = 2^10 KB | 1,024 kilobytes, commonly used to measure medium-sized files like images or MP3s.
Gigabyte (GB) | 2^30 | 1,073,741,824 | 1 GB = 2^10 MB | 1,024 megabytes, used for larger files such as videos or software applications.
Terabyte (TB) | 2^40 | 1,099,511,627,776 | 1 TB = 2^10 GB | 1,024 gigabytes, often used for data storage in hard drives and databases.
Petabyte (PB) | 2^50 | 1,125,899,906,842,624 | 1 PB = 2^10 TB | 1,024 terabytes, typically used in large-scale data centers and cloud storage.
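
The table follows a single rule: each unit is 2^10 times the previous one. A minimal sketch of that rule, assuming the binary definitions used in the table (the helper names `unit_bytes` and `to_unit` are illustrative, not from any library):

```python
# Units in ascending order; each step up multiplies the size by 2**10.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def unit_bytes(unit: str) -> int:
    """Return the number of bytes in one binary unit (1 KB = 2**10 B, and so on)."""
    return 2 ** (10 * UNITS.index(unit))


def to_unit(num_bytes: float, unit: str) -> float:
    """Convert a raw byte count into the given unit."""
    return num_bytes / unit_bytes(unit)


if __name__ == "__main__":
    for unit in UNITS:
        exponent = 10 * UNITS.index(unit)
        print(f"1 {unit} = 2^{exponent} = {unit_bytes(unit):,} bytes")
```

For quick mental math it is usually enough to remember that 2^10 is roughly 10^3, so 2^30 is roughly one billion bytes.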

Latency Numbers Every Programmer Should Know

Time Unit | Seconds (s) | Microseconds (µs) | Nanoseconds (ns)
--- | --- | --- | ---
1 Nanosecond | 10^-9 s | 10^-3 µs | 1 ns
1 Microsecond | 10^-6 s | 1 µs | 1,000 ns
1 Millisecond | 10^-3 s | 1,000 µs | 1,000,000 ns

Operation | Latency | Unit Scale | Description
--- | --- | --- | ---
L1 Cache Access | ~0.5 ns | 10^-9 seconds | Accessing data from the L1 cache, the fastest and closest storage to the CPU cores.
L2 Cache Access | ~7 ns | 10^-9 seconds | Accessing data from the L2 cache, which is slightly slower but larger than L1 cache.
RAM Access | ~100 ns | 10^-9 seconds | Accessing data from the main memory (RAM), which is slower than the CPU caches.
SSD Random Read | ~150 µs | 10^-6 seconds | Random read access from an SSD, faster than HDD but slower than RAM.
HDD Random Read | ~10 ms | 10^-3 seconds | Random read access from a traditional hard disk drive (HDD), significantly slower than SSD.
Network Round Trip (within data center) | ~500 µs | 10^-6 seconds | Time taken for a round trip within a data center, typically involving multiple switches.
Network Round Trip (between data centers) | ~150 ms | 10^-3 seconds | Time taken for a round trip between geographically distant data centers.
Read 1 MB from SSD | ~1 ms | 10^-3 seconds | Time to read 1 MB of sequential data from an SSD, faster than random reads.
Read 1 MB from HDD | ~20 ms | 10^-3 seconds | Time to read 1 MB of sequential data from an HDD, slower than SSDs.
Send 1 KB over 1 Gbps Network | ~10 µs | 10^-6 seconds | Time to send 1 KB of data over a 1 Gbps network, not including additional network overhead.
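
These figures are most useful as ratios. The sketch below copies a few of the approximate values from the table into a dictionary and compares them; it also recomputes the "send 1 KB over 1 Gbps" entry from first principles. The names `LATENCY_NS`, `relative_to`, and `transfer_time_us` are illustrative, and the transfer calculation assumes pure serialization time with no protocol overhead.

```python
# Approximate latencies from the table, converted to nanoseconds.
LATENCY_NS = {
    "L1 cache access": 0.5,
    "L2 cache access": 7,
    "RAM access": 100,
    "SSD random read": 150_000,
    "Network round trip (within data center)": 500_000,
    "HDD random read": 10_000_000,
    "Network round trip (between data centers)": 150_000_000,
}


def relative_to(base: str, operation: str) -> float:
    """How many times slower `operation` is than `base`."""
    return LATENCY_NS[operation] / LATENCY_NS[base]


def transfer_time_us(num_bytes: int, link_bits_per_second: float) -> float:
    """Serialization time in microseconds, ignoring protocol overhead."""
    return num_bytes * 8 / link_bits_per_second * 1_000_000


if __name__ == "__main__":
    print(relative_to("RAM access", "HDD random read"))
    print(transfer_time_us(1024, 1e9))
```

Under these assumptions, 1 KB over a 1 Gbps link takes about 8.2 µs, which the table rounds to ~10 µs.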

Twitter QPS and Storage Estimation

Assumptions
QPS Estimate
Description | Calculation | Result
--- | --- | ---
Daily Active Users (DAU) | 300 million * 50% | 150 million
Tweets QPS | 150 million * 2 tweets / 24 hours / 3600 seconds | ~3500
Peak QPS | 2 * QPS | ~7000
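
The same arithmetic can be written out as a short script. This is a sketch that uses only the numbers appearing in the table; the variable names are illustrative, and the peak factor of 2 is the table's own assumption rather than a measured value.

```python
SECONDS_PER_DAY = 24 * 3600

# Inputs taken from the calculations in the table.
total_users = 300_000_000
daily_active_ratio = 0.5
tweets_per_user_per_day = 2
peak_factor = 2  # peak traffic assumed to be twice the average

dau = total_users * daily_active_ratio
average_qps = dau * tweets_per_user_per_day / SECONDS_PER_DAY
peak_qps = peak_factor * average_qps

if __name__ == "__main__":
    print(f"DAU:         {dau:,.0f}")
    print(f"Average QPS: {average_qps:,.0f}")
    print(f"Peak QPS:    {peak_qps:,.0f}")
```

The exact average works out to about 3,472 requests per second, which the table rounds to ~3500.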

Media Storage Estimation
Description | Calculation | Result
--- | --- | ---
Average Tweet Size | tweet_id (64 bytes) + text (140 bytes) + media (1 MB) | ~1 MB
Daily Media Storage | 150 million * 2 tweets * 10% * 1 MB | 30 TB per day
5-Year Media Storage | 30 TB * 365 days * 5 years | ~55 PB
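
A sketch of the storage arithmetic follows. Note that the table's result of 30 TB per day implies decimal units (1 TB = 10^6 MB, 1 PB = 10^3 TB), so the script uses the same convention; the variable names are illustrative.

```python
# Inputs taken from the calculations in the table.
dau = 150_000_000
tweets_per_user_per_day = 2
media_fraction = 0.10  # share of tweets that carry media
media_size_mb = 1      # tweet_id and text are negligible next to 1 MB of media
retention_years = 5

MB_PER_TB = 1_000_000  # decimal units, matching the table's rounding
TB_PER_PB = 1_000

daily_media_mb = dau * tweets_per_user_per_day * media_fraction * media_size_mb
daily_media_tb = daily_media_mb / MB_PER_TB
five_year_pb = daily_media_tb * 365 * retention_years / TB_PER_PB

if __name__ == "__main__":
    print(f"Daily media storage:  {daily_media_tb:.1f} TB")
    print(f"5-year media storage: {five_year_pb:.2f} PB")
```

The five-year figure comes to 54.75 PB, which the table rounds to ~55 PB.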

Estimation of Image Results Page Generation

Assumptions:
Serial Processing:

In serial processing, thumbnails are generated one after another. The total time is the sum of the times taken to generate each thumbnail.

Total Time (Serial) = Time per Image * Number of Images

Total Time (Serial) = 100 ms * 30 = 3000 ms = 3 seconds

Parallel Processing:

In parallel processing, all thumbnails are generated simultaneously, which assumes at least one worker per image and negligible coordination overhead. The total time is then roughly the time taken to generate a single thumbnail.

Total Time (Parallel) = Time per Image

Total Time (Parallel) = 100 ms = 0.1 seconds

Estimation Summary:
Processing Method | Total Time
--- | ---
Serial Processing | 3 seconds
Parallel Processing | 0.1 seconds
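
The two estimates can be reproduced with a pair of small functions. This is a rough sketch using the 100 ms per image and 30 images from the calculations above. The `workers` parameter is an added assumption, not part of the original estimate: it models the case where fewer workers than images are available, so the images are processed in waves.

```python
import math


def serial_time_ms(time_per_image_ms: float, num_images: int) -> float:
    """Total time when thumbnails are generated one after another."""
    return time_per_image_ms * num_images


def parallel_time_ms(time_per_image_ms: float, num_images: int, workers: int) -> float:
    """Total time when up to `workers` thumbnails are generated at once.

    Assumes every image takes the same time and coordination overhead is zero.
    """
    waves = math.ceil(num_images / workers)
    return waves * time_per_image_ms


if __name__ == "__main__":
    print(serial_time_ms(100, 30))        # serial estimate
    print(parallel_time_ms(100, 30, 30))  # one worker per image
    print(parallel_time_ms(100, 30, 8))   # only 8 workers available
```

With one worker per image the parallel estimate matches the 0.1 seconds in the summary. With only 8 workers the same model gives four waves of 100 ms each, so the fully parallel figure is best read as a lower bound.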