Using Pressure Stall Information (PSI) to find a performance bottleneck
PSI stands for Pressure Stall Information. It is an alternative to the Linux load average that gives insight into how busy your system is and why.
I will use it to identify the performance bottleneck in my web application, Moodle. My environment is a single server running Ubuntu 20.04 with PHP, Apache, MySQL and Redis.
I ran a JMeter test that emulates a number of users logging in and posting to a forum. While the test is running, I capture load and pressure information.
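A capture step like this could be sketched in Python roughly as follows. This is only an illustration, not the script used for the measurements here: it assumes a kernel with PSI enabled, which exposes the files /proc/pressure/cpu and /proc/pressure/io, and the function names (parse_psi, sample, main) and the 5-second interval are made up for the example.

```python
import time
from pathlib import Path


def parse_psi(text):
    """Parse the contents of a /proc/pressure/* file into nested dicts.

    Each line looks like:
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = {
            "avg10": float(values["avg10"]),
            "avg60": float(values["avg60"]),
            "avg300": float(values["avg300"]),
            "total": int(values["total"]),  # cumulative stall time in microseconds
        }
    return result


def sample():
    """Take one sample of the 1-minute load and the CPU and I/O pressure."""
    load1 = float(Path("/proc/loadavg").read_text().split()[0])
    cpu = parse_psi(Path("/proc/pressure/cpu").read_text())
    io = parse_psi(Path("/proc/pressure/io").read_text())
    return {
        "load1": load1,
        "cpu_some": cpu["some"]["avg10"],
        "io_some": io["some"]["avg10"],
        "io_full": io["full"]["avg10"],
    }


def main(interval=5):
    """Print one CSV row per interval (hypothetical default: 5 s) until interrupted."""
    print("time,load1,cpu_some,io_some,io_full")
    while True:
        s = sample()
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp},{s['load1']},{s['cpu_some']},{s['io_some']},{s['io_full']}")
        time.sleep(interval)
```

Calling main() while the load test runs and redirecting the output to a file gives a CSV that can be plotted afterwards.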
The results of the first run are:
The load jumps to 30 and CPU pressure reaches 99%. This means that, at the peak, at least some runnable tasks were stalled waiting for a CPU during 99% of the measured time. The bottleneck is clearly on the CPU side. The low pressure numbers on the I/O side confirm that (I’m not even showing them on the graph, as they are close to 0).
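To make the percentage more concrete: besides the averages, each PSI line carries a total field, the cumulative stall time in microseconds. Dividing the growth of that counter by the elapsed wall-clock time gives the same kind of percentage. A small sketch, where stall_percent is a hypothetical helper and the numbers are illustrative rather than measured:

```python
def stall_percent(total_before_us, total_after_us, elapsed_s):
    """Share of wall-clock time spent stalled between two readings of `total`.

    total_before_us / total_after_us: the PSI `total` counter (microseconds)
    elapsed_s: seconds between the two readings
    """
    stalled_us = total_after_us - total_before_us
    return 100.0 * stalled_us / (elapsed_s * 1_000_000)


# Illustrative numbers: the counter grew by 9.9 s worth of stall time
# over a 10 s window, so tasks were waiting during 99% of that window.
print(stall_percent(0, 9_900_000, 10))
```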
Now let’s put the data on a slow filesystem. I’m using nbd + trickle to emulate slow storage.
The result is the same:
But this time the I/O side is under more pressure: the I/O pressure number reaches 70%.
Let’s push it further and slow the storage down even more. This time we see a significant drop in performance (from 2.2 to 0.7 requests per second).
The CPU pressure is lower and the “full” I/O pressure reaches 99%. This time the bottleneck is clearly on the I/O side.
Note that the difference between a CPU bottleneck and an I/O bottleneck is clearly visible with PSI. The standard “load” number alone is not enough to distinguish between the two.
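The comparison can be written down as a tiny rule of thumb. This is only a sketch: the function name likely_bottleneck and the 50% threshold are assumptions chosen for illustration, not a kernel or tool convention, and the inputs are the avg10 values of CPU “some” and I/O “full” pressure.

```python
def likely_bottleneck(cpu_some, io_full, threshold=50.0):
    """Guess the dominant bottleneck from two PSI percentages.

    cpu_some: CPU "some" pressure (percent)
    io_full:  I/O "full" pressure (percent)
    threshold: hypothetical cut-off below which neither is treated as a bottleneck
    """
    if cpu_some < threshold and io_full < threshold:
        return "none"
    return "cpu" if cpu_some >= io_full else "io"


# Illustrative values in the spirit of the runs above, not measured data.
print(likely_bottleneck(cpu_some=99.0, io_full=0.5))
print(likely_bottleneck(cpu_some=40.0, io_full=99.0))
```

The load average cannot feed a rule like this, because it counts tasks waiting for CPU and tasks blocked on I/O together in one number.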
You can see PSI information with tools like “atop”.