If you have been asked “How would you deal with scalability?” in an interview, or while designing a data-intensive system, then this post is for you.

Magic scaling sauce

Scalability is an important concern when designing data-intensive systems. There is no single answer to how a system should be scaled. However, there are important metrics that indicate when it is time to scale, whether by scaling up or scaling out.

Scaling up: moving the component to a more powerful machine, e.g. more CPU or RAM.
Scaling out: adding more running instances of the scaled component.

Latency and response time

Latency is often confused with response time. Latency is the time a request waits in the queue before it is served, while response time is measured by the client: it is the interval between dispatching the request and receiving and parsing the response.
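To make the two definitions concrete, here is a minimal sketch assuming we could record timestamps for a single request on both the client and the server against one shared clock (in practice the clocks differ). The `RequestTrace` class and its field names are hypothetical, not from any library. Latency covers only the queue wait, while response time spans everything the client observes.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical timestamps (ms, one shared clock assumed) for a single request."""
    sent_at: float             # client dispatches the request
    enqueued_at: float         # server places the request in its queue
    service_started_at: float  # a worker picks the request up from the queue
    parsed_at: float           # client has received and parsed the response

    @property
    def latency_ms(self) -> float:
        # Only the time the request sat in the queue, waiting to be served.
        return self.service_started_at - self.enqueued_at

    @property
    def response_time_ms(self) -> float:
        # Everything the client sees: from dispatch until the response is parsed.
        return self.parsed_at - self.sent_at


trace = RequestTrace(sent_at=0.0, enqueued_at=5.0, service_started_at=35.0, parsed_at=90.0)
print(trace.latency_ms)        # 30.0
print(trace.response_time_ms)  # 90.0
```

Here the request waited 30ms in the queue, but the client experienced 90ms in total, because response time also includes network transfer, processing, and parsing.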

When a user requests a Facebook feed, many back-end calls are triggered. Some are processed quickly while others take longer. If the calls run in parallel, the response time the user experiences is bounded by the slowest call, so a single slow back-end call delays the whole request.

For example, if the back-end calls take 10ms, 20ms, and 130ms, the response time is at least 130ms, the duration of the slowest call.
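A simplified model of the feed example, assuming the request fans out to all back-end calls in parallel and ignoring network, merging, and parsing overhead. `fan_out_response_time` is a hypothetical helper; the sequential case is included only for contrast.

```python
def fan_out_response_time(call_latencies_ms, parallel=True):
    """Estimate the response time of a request that fans out to back-end calls.

    Simplified model: network, merging, and parsing overhead are ignored.
    """
    if not call_latencies_ms:
        return 0
    if parallel:
        # All calls start together, so the request waits for the slowest one.
        return max(call_latencies_ms)
    # Calls run one after another, so their durations add up.
    return sum(call_latencies_ms)


calls = [10, 20, 130]
print(fan_out_response_time(calls))                  # 130
print(fan_out_response_time(calls, parallel=False))  # 160
```

In the parallel case, speeding up the 10ms and 20ms calls does not help at all; only the slowest call matters.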

Measuring latency


Average

The average is the arithmetic mean of a set of request latencies: the sum of the values divided by their count. This metric is common but misleading, as it doesn't reflect how many requests experienced high latency.
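A small illustration of why the mean can hide slow requests, using two made-up sets of 10 latencies that have exactly the same average. The 500ms threshold for a "slow" request is an arbitrary choice for the example.

```python
import statistics

# Two synthetic sets of 10 request latencies (ms) with the same mean.
steady = [200] * 10
spiky = [110] * 9 + [1010]

for name, latencies in (("steady", steady), ("spiky", spiky)):
    mean = statistics.mean(latencies)
    slow = sum(1 for x in latencies if x > 500)  # requests above an arbitrary 500ms
    print(name, mean, max(latencies), slow)
# steady 200 200 0
# spiky 200 1010 1
```

Both sets report a 200ms average, yet one user in the second set waited over a second.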


Percentiles

Percentiles are calculated by sorting a list of request latencies from fastest to slowest. The median (a.k.a. p50) is the halfway point in that list: half of the requests are faster than it and half are slower. For example, if the median of 300 requests is 200ms, then half of the requests (150) returned in less than 200ms.
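A minimal sketch of the sorting approach, using the nearest-rank definition of a percentile. This is one of several common definitions and is an assumption here, not necessarily what a given monitoring tool uses. `percentile` is a hypothetical helper and the latency values are made up.

```python
import math
from fractions import Fraction


def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)  # fastest to slowest
    # Fraction(str(p)) keeps values like 99.9 exact, avoiding float rounding in the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


latencies = [120, 80, 200, 95, 310, 150, 100]
print(percentile(latencies, 50))  # 120
```

Other tools, such as Python's `statistics.median` or `numpy.percentile` with its default settings, interpolate between neighbouring values, so they can give slightly different results, especially for even-sized lists.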

The median is a good metric because it tells you how long users typically wait. To understand the outliers, look at higher percentiles such as p90, p95, p99, and p99.9.

For example, if the p99.9 of a service is 200ms, then 99.9% of the requests returned within 200ms, and only 1 in 1,000 requests took longer.
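The higher percentiles can be computed the same way. This sketch reuses a nearest-rank `percentile` helper (hypothetical, redefined so the block runs on its own) on a synthetic set of 1,000 requests taking 1ms, 2ms, ..., 1000ms, and checks what share of requests falls at or below each percentile.

```python
import math
from fractions import Fraction


def percentile(latencies_ms, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(latencies_ms)
    # Fraction(str(p)) keeps values like 99.9 exact when computing the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic data: 1,000 requests taking 1ms, 2ms, ..., 1000ms.
latencies = list(range(1, 1001))

for p in (50, 90, 95, 99, 99.9):
    value = percentile(latencies, p)
    share = sum(1 for x in latencies if x <= value) / len(latencies)
    print(f"p{p}: {value}ms ({share:.1%} of requests at or below)")
# p50: 500ms (50.0% of requests at or below)
# p90: 900ms (90.0% of requests at or below)
# p95: 950ms (95.0% of requests at or below)
# p99: 990ms (99.0% of requests at or below)
# p99.9: 999ms (99.9% of requests at or below)
```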

Percentiles are commonly used in a service's SLA (service level agreement). For example, an SLA may require the 99.9th percentile response time to stay under 500ms; if it exceeds that limit, the SLA is not met and customers may demand a refund.
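A sketch of how such an SLA might be checked over a window of measured requests, assuming the nearest-rank percentile and the 99.9th percentile / 500ms limit from the example. `meets_sla` and the sample data are hypothetical. It shows how tightly p99.9 constrains the tail: out of 1,000 requests, one slow request is tolerated but two are not.

```python
import math
from fractions import Fraction


def percentile(latencies_ms, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sla(latencies_ms, p=99.9, limit_ms=500):
    """Hypothetical SLA check: the p-th percentile must not exceed limit_ms."""
    return percentile(latencies_ms, p) <= limit_ms


one_slow = [100] * 999 + [800]      # 1 slow request out of 1,000
two_slow = [100] * 998 + [800] * 2  # 2 slow requests out of 1,000

print(meets_sla(one_slow))  # True: p99.9 is 100ms
print(meets_sla(two_slow))  # False: p99.9 is 800ms
```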