Do not trust vmstat IOwait numbers

I’ve been running a benchmark today on my old test box with conventional hard drives (no raid with BBU) and noticed something unusual in the CPU utilization statistics being reported.

The benchmark was run like this:

sysbench --num-threads=64 --max-requests=0 --max-time=600000 --report-interval=10 --test=oltp --db-driver=mysql --oltp-dist-type=special  --oltp-table-size=1000000   --mysql-user=root --mysql-password=password  run

Which means: create 64 threads and hammer the database with queries as quickly as possible. As the test was run on the localhost I would expect the benchmark to completely saturate the system – being either using CPU or being blocked on IO nature of this benchmark so it does not spend a lot on database locks, especially as this system has just 2 cores.

Looking at VMSTAT however I noticed this:

[root@smt1 mysql]# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
13  0      0 880156  40060 2140824    0    0    12   750  895 1045 32  8 54  6  0
 0  0      0 877116  40060 2141312    0    0     0  1783 2185 23112 44 10 41  5  0
15  0      0 872648  40068 2141960    0    0     0  1747 2204 25743 41 11 46  2  0
 0  0      0 868056  40068 2142604    0    0     0  1803 2164 26224 40 11 44  5  0
17  1      0 863216  40068 2143160    0    0     0  1875 1948 23020 36  9 50  5  0
 0  0      0 858384  40168 2143656    0    0     0  1063 1855 21116 32  9 45 14  0
23  0      0 855848  40176 2144232    0    0     0  1755 2036 23181 36 10 48  6  0
49  0      0 851248  40184 2144648    0    0     0  1679 2313 22832 45 10 40  5  0
10  0      0 846292  40192 2145248    0    0     0  1911 1980 23185 36  9 50  4  0
 0  0      0 844260  40196 2145868    0    0     0  1757 2152 26387 39 11 45  5  0
 0  3      0 839296  40196 2146520    0    0     0  1439 2104 25096 38 10 50  1  0

Looking at last few columns we have IO wait typically being in single digit percents where the “idle” CPU floats in 40-50% range. You can also see it visually in the graph from Percona Cloud Tools
CPU Usage during the test

So the system should be completely saturated with the load but instead we have 50% of CPU idle and iowait is typically in single digits?

Running SHOW PROCESSLIST can give us some good idea in this case:

mysql> show processlist;
+-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+
| Id  | User          | Host            | db     | Command | Time | State          | Info                                                               | Rows_sent | Rows_examined | Rows_read |
+-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+
|   1 | root          | localhost:39624 | NULL   | Sleep   |    0 |                | NULL                                                               |         0 |             0 |         0 |
| ...
|  32 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
|  33 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
|  34 | root          | localhost       | sbtest | Execute |    0 | Sorting result | SELECT c from sbtest where id between 365260 and 365359 order by c |         0 |             0 |         0 |
|  35 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
|  36 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
|  37 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
|  38 | root          | localhost       | sbtest | Execute |    0 | Writing to net | DELETE from sbtest where id=496460                                 |         0 |             1 |         1 |
|  39 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
...
|  89 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
|  90 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
|  91 | root          | localhost       | sbtest | Execute |    0 | NULL           | COMMIT                                                             |         0 |             0 |         0 |
| 268 | root          | localhost       | NULL   | Query   |    0 | NULL           | show processlist                                                   |         0 |             0 |         0 |
+-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+
68 rows in set (0.00 sec)

So most of the queries are stalling in COMMIT stage. Checking setting we can see:

mysql> select @@innodb_flush_log_at_trx_commit;
+----------------------------------+
| @@innodb_flush_log_at_trx_commit |
+----------------------------------+
|                                1 |
+----------------------------------+
1 row in set (0.00 sec)

Which means Percona Server is running in the fully ACID mode, which with slow hard drives without BBU means commit is causing syncing data to the disk which is very expensive.

So we figured out what is really happening… yet if we’re really waiting on disk a lot why iowait does not show that?
To answer that lets see how IOwait is really computed.

IOwait is type of Idle time (and in earlier Linux versions it wasincluded in Idle) – when we have the process/thread which would be runable if it would not be waiting on IO we consider this iowait, if it is not runable because of something else (waiting on mutex, network etc) it is considered idle.

Writes to Innodb log file are serialized – so only one process will be doing log write (and sync) while all others waiting to be able to commit will be waiting on some form of synchronization mechanism to be notified when the commit is complete as part of group commit.

What is the take away? Take iowait% numbers from vmstat/iostat with grain of salt. It is possible there is much more to gain than faster IO subsystem (or proper configuration) than iowait numbers show. Consider the iowait%+idle% of the CPU as “cpu available” which could be potentially put to the good use with improving IO subsystem and minimizing contention

To illustrate that lets take a look at the CPU graph again – now as I’ve set innodb_flush_log_at_trx_commit=2

CPU Usage graph after optimization

As we can see from the graph we have reclaimed much more than 6-7% of idle time we had – the user CPU usage increased to 70% and idle+iowait is now 10-15%; the throughput of the system has increased about 60% – much more than “eliminating iowait” would allow.

The post Do not trust vmstat IOwait numbers appeared first on MySQL Performance Blog.

Do not trust vmstat IOwait numbers

Trending Articles

Bath man appears in court charged with attempted murder of a man...

MACLEAN, Allan

Black Angus Grilled Artichokes

Practice Sheet of Right form of verbs for HSC Students

Police blotter for Jan. 12

99 God Status for Whatsapp, Facebook

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Notorious Naushad of Ippa gang nabbed

Child Kidnapping: Amy McNeil was kidnapped on her way to school by 5 adults;...

Sonible Smartlimit v1.1.5-R2R

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Arrow Flash 2 – Sinhala Dubbed – Episode 23 – 20th March 2016

[GET] AI Traffic Goldmine

[E² Plugin] HDF-Radio

Universal Multi-Patch v1.3 By RADIXX11

IWAN – Thanks and Praise ( Throw Back Thursday )

RONALD P SONDERGAARD Arrested by Miami-Dade County Corrections on Mar 03, 2017

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

HSSC Excise & Taxation Inspector Result 2017 Scorecard/ Category Wise Merit List