Besides load, we have to understand the performance of the system before trying to scale the application. We use performance metrics. These metrics vary in different systems. In batch processing systems, it is throughput (the # of records the system can process per second); in online systems, it’s the response time (the time between request and response, not latency).
When we take the response time, measuring it is simple: put a timer between the time the request is made and the response is received. Yet, talking about response time demands something else. We cannot only look at one single response time. Because each time we make a request, the response might arrive at different times for various reasons: network delay, database crash, garbage collection pause, package lost in the network, waiting for another system’s response, etc. That’s why we cannot take one request-response pair and define the performance. We need to gather a lot of pairs in a certain timeframe and analyze the response time from there by using percentiles.
When we talk about average performance, we usually talk about the arithmetic mean. The average is rarely useful because it doesn’t say anything about how many users are experiencing delays. That’s why we use percentiles.
Understanding percentiles is crucial because they are the required elements of Service Level Agreements (SLAs)—the contracts that define the expected performance and availability of a service—and Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
To calculate average performance (the arithmetic mean), we get all our response times in a certain period. Let’s say we have 100 requests in a second. We order response times from fastest to slowest. When we draw a line in the 50th response time in the ordered list, that is our median response time. If the 50th item is 100ms, it means 50% of our response times are faster than 100ms, and 50% of them are slower than 100ms. The arithmetic mean is called the 50th percentile—p50. Be careful: percentiles are not percentages. Percentile represents a single entry in the ordered list.
Let’s take the 99th item in our previous list as another example. If it is 200ms, then we can say that the 99th percentile (often referred to as p99) will be 200ms. That means 99% of the response times will be faster than 200ms. Amazon uses the 99.9th percentile (p999) in its internal systems. That means 1 out of 1000 response times will be above a certain threshold that they defined.
Now that we understand how we can describe performance and load, we can define SLAs in our systems and track them properly. But still, how can we approach handling the increasing load and prevent breaking our SLAs?
- Source(s): Designing Data-Intensive Applications by Martin Kleppmann