Wednesday, February 17, 2010

Basic Performance Analysis Concepts for Transaction Processing Systems

A transaction processing system can be thought of as a collection of queuing points where requests get in and start queuing for various services (CPU, disk...).
This document summarizes and applies some of the well known formulas in queuing theory to a transaction processing system.

The following terms are used throughout this document:
  • S = Request service time - the amount of time a request utilizes a service
  • R = Request response time - the time it takes for a request to be served (include queuing time and service time)
  • X = System/Resource throughput - the amount of requests per second processed by the system/resource
  • N = The number of requests in the system (it can have different meaning depending on the context where it is applied, for instance N could be the number of users using a system, the number of active users using the system or the number of threads in the system)
Little's Law
Little's Law adapted to a transaction processing systems is

N = X x R

where
  • N - the average number of requests in the system
  • X - the average system throughput in requests per second
  • R - the average response time for a request

This equation can be applied to all layers of a the system and it is very valuable to a performance analyst.
You can use Little's Law to (among others):

Validate test results
If, at a certain layer, your performance tool reports a throughput X=100 requests per seconds and you only had N=10 threads configured in the thread pool (min=max=10), you know that the average response time should be R=10/100 seconds=0.1 seconds.

Calculate the throughput generated by a set of users with think times
Let's consider a system with 10,000 users, with an average think time of 10 seconds and an average response time of 0.5 seconds.
Applying the law at a the most outer layer of the system, where

N=the number of users=10,000, and
R=average think time+application response time=10.5 seconds

we get a value of X=10,000/10.5 requests/second=~952 requests/second

Calculate the response time when the user population increases for a saturated system
Assuming that the per user overhead caused by stateful data is negligible, a well tuned system can handle any number of users without suffering performance degradations (in the sense that it delivers the same throughput).
However the response time from the user's perspective increases linearly once the system reached saturation (working at maximum throughput). Using Little's Law you can calculate the average response time for an arbitrary user population when knowing the saturation throughput of the system under test and the response time at the edge of the system (service time).

Let's consider a system with a saturation throughput of 1000 requests/second and a service time of 0.1 seconds. If we have a user population of 100,000 with a 50 seconds average think time, their activity results in 100,000/50=2000 requests/second which is twice the maximum throughput our system can handle. If the system is tuned well, when running the 100,000 users against it, the throughput generated by them has to be 1000 requests/second which implies, by applying Little's Law, that the response time + think time has to be 100 seconds. This means that the response time as perceived by the user is 50 seconds (100-think time(50)) out of which 49.9 seconds are spent waiting in the system's queue.

If you apply again Little's Law but this time on the system queue itself you find that 1000x49.9=49,900 requests are queued at any one time. Knowing the wait time and the queue size allows you to tune your system accordingly.

Resource Utilization Formulas

The utilization related formulas seem to be derivatives(by applying the right maths) of Little's Law.

One formula for utilization (U) is

U = S x X

where:
  • S - the resource service time in seconds
  • X - the throughput in requests/second

Calculating resource service time
Resource (CPU, disk...) service times are useful statistics for performance regression testing (assuming the same type of resource between tests) as they are normalized with respect to the the number of resources and their utilization.
Let's consider the example of a set of tests run on the same type of CPU:

Test1 where the throughput is 1000 requests/second and the CPU utilization was on average 50% for each of the 8 CPUs.
Test2 where the throughput is 100 requests/second and the CPU utilization was on average 30% for each of the 4 CPUs.

By calculating the CPU service time for each requests we can compare the performance of the application in the two scenarios.

Applying this law for one CPU we get:

S=U/(X/N)=U x N/X where N is the number of CPUs. So for

Test1: S=0.5 x 8/1000=0.004 seconds
Test2: S=0.3 x 4/100=0.012 seconds

So for Test2 the service demand for CPU increased three times compared to Test1.

Another version of the utilization formula is:

R = S/(1-U)

where:
  • R - the response time for a request
  • S - the service time for a request
  • U - the resource utilization
This last formula I don't really use much but it is very useful in helping people understand the relationship between resource/system utilization and the response time.


The chart above shows the relationship between response time and utilization when the service time is 1 second. As you can see the response time start deteriorating rapidly as the utilization goes over 80%.

Amdahl's Law

This is another relatively intuitive law (see Wikipedia's entry)

I = 1/((1-P) + P/N)

where:
  • I - improvement (how much faster the processing will be)
  • P - the percentage of processing that can be parallelized
  • N - the number of streams
Since a picture is worth a thousand words here is below a chart that shows the improvement when P is 40%.


Amdahl's Law is very helpful when:

You need to parallelize a piece of processing in order to improve performance. Knowing the amount of processing that can be parallelized, Amdahl's Law helps you calculate the maximum improvement that can be made.

MI = 1/(1-P)

where:
  • MI - maximum improvement (how much faster the processing can be)
  • P - the percentage of processing that can be parallelized
For example if you can parallelize 20% of the processing, the maximum improvement is going to be MI=1/(1 - 0.2) = 1.25 times faster

You need to calculate what is the overall improvement to a system once you optimized only parts of it. You know the percentage of the whole processing time that each part takes. Amdahl's Law becomes:

I = 1/(SUM(Pi/Si))

where:
  • I - the improvement
  • Pi - the percentage of processing that was improved by a factor of Si
  • Si - the improvement factor for Pi

4 comments:

  1. Nice post, Daniel. I am in the process of building a capacity planning model based on Queue Networks and Little's Law. One thing I am currently thinking of is how to model pools (thread pool, connection pool, etc.) in a system, since the pool might limit the number of requests that actually make it into a subsystem, and it would be useful to calculate an optimal pool size. One possibility would be to consider Little's Law to the pool, with N being the pool size, and X being the throughput of the system the pool is front-ending, so you'd get how long a request is in the pool. The alternative is to have X as before, and have R as the time the request takes to be completed (so if it is a DB connection pool, then R is what it would take for the DB call to finish.) so you'd then be able to get at what the pool size ought to be.

    Your thoughts?

    -Srinivas

    ReplyDelete
  2. Hi Srinivas,
    When it comes to thread pools, their size is usually dictated by the optimum number of threads - for which the system provides the maximum throughput. For connection pools, the size can be calculated, as you mentioned yourself, based on throughput (X) and the average time a connection from the pool is in use - that would be a good starting point, you need to keep an eye on the time clients wait to get a connection from the pool...
    Daniel

    ReplyDelete
  3. Thanks. I am not entirely certain about how to calculate the amount of wait time for clients to get a connection from the pool - I read that there is a way to do this using Erlang series; would you have ideas around this?

    -Srinivas

    ReplyDelete
  4. Hi Srinivas,
    In general, some of the mathematics of queuing theory is above my head - as a software engineer I'm, in many cases, just happy to use the distilled information.
    However, to answer your question, I believe the average wait time for clients is a function of the utilization of the connection pool. I would use the utilization formula to get the average response time. Substracting from the response time the average service time should give you the average wait time.
    Regards,
    Daniel

    ReplyDelete