Tracking down I/O problems on your Linux box
I’m sure everyone has, at one time or another, had trouble figuring out why a machine is running so slowly when nothing appears to be using much RAM or CPU.
The first option is to run ‘top’ and look for the line which contains ‘wa’:
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 96.7%id, 3.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mine says 3.3%wa. This is iowait: the percentage of time the CPU sat idle while waiting for disk I/O (reads as well as writes) to complete. From there you can install the package (available under most distros) called ‘sysstat’.
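Before moving on to sysstat, here is a minimal sketch of pulling the ‘wa’ figure out of that top summary line in a script, for example to log it over time. The function name parse_top_cpu_line is made up for this illustration, and the regular expression is only meant to cope with the older "3.3%wa" layout shown above and the newer "3.3 wa" layout.

```python
import re

def parse_top_cpu_line(line):
    """Return a dict such as {'us': 0.0, ..., 'wa': 3.3} from top's Cpu(s) line.

    Hypothetical helper for illustration; not part of top or sysstat.
    """
    # A number, an optional '%', then a two-letter field name (us, sy, wa, ...).
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s*%?\s*([a-z]{2})\b", line)
    return {name: float(value) for value, name in pairs}

# The sample line is the one quoted in this post.
sample = "Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 96.7%id, 3.3%wa, 0.0%hi, 0.0%si, 0.0%st"
fields = parse_top_cpu_line(sample)
print(fields["wa"])  # prints 3.3
```

To feed it live data you could capture the output of top in batch mode (top -bn1) and pass the Cpu(s) line to the function.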
sysstat – sar, iostat and mpstat – system performance tools for Linux
This contains several tools for tracking down what is writing so much to the disk.
iotop – simple top-like I/O monitor. It is usually packaged separately from sysstat (typically as ‘iotop’); once installed, it shows you in real time which processes are reading from and writing to disk, and how much I/O each one is generating.
iostat – Report Central Processing Unit (CPU) statistics and input/output statistics for devices, partitions and network filesystems (NFS).
sar – Collect, report, or save system activity information.
If there’s plenty of cache/buffers, and sar -W 1 0 (which reports pages swapped in and out per second) shows lots of zeroes (and possibly the occasional blip), then the disk may be getting thrashed, but swap is not the cause.
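The same swap check can be sketched by hand: the kernel exposes cumulative pswpin and pswpout counters in /proc/vmstat, and the difference between two snapshots divided by the interval gives a per-second rate much like the one sar -W prints. The helper names read_swap_counters and swap_rates are assumptions for this example, and the snapshots below are made-up sample text rather than real measurements.

```python
def read_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters

def swap_rates(before, after, seconds):
    """Pages swapped in/out per second between two counter snapshots."""
    return {name: (after[name] - before[name]) / seconds for name in before}

# Illustrative snapshots taken one second apart (invented numbers).
before = read_swap_counters("nr_free_pages 12345\npswpin 100\npswpout 250\n")
after = read_swap_counters("nr_free_pages 12001\npswpin 100\npswpout 250\n")
rates = swap_rates(before, after, seconds=1.0)
print(rates)  # all zeroes here, so no swapping happened in the interval
```

On a real box you would read /proc/vmstat twice with a sleep in between and pass the two file contents to read_swap_counters.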
Running iostat -dx 1 will show you all the disks and partitions and how hard they’re working (look at %util). If %util is consistently at or around 100 for any disk or partition, you can say with confidence that it is getting thrashed.
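If you want to watch for that condition automatically, a rough sketch is to capture the iostat -dx output and pick out devices whose %util is above some threshold. The function name busy_devices and the 90% cut-off are assumptions for this example, and the sample output is invented in the older sysstat column layout; the parser finds the %util column by its header name, so it should not depend on the exact set of columns.

```python
def busy_devices(iostat_text, threshold=90.0):
    """Return {device: %util} for devices at or above threshold.

    When the text holds several reports, the latest value per device wins.
    """
    util_index = None
    latest = {}
    for line in iostat_text.splitlines():
        cols = line.split()
        if not cols:
            continue
        if cols[0].startswith("Device"):
            # Locate %util by name instead of assuming a fixed position.
            util_index = cols.index("%util") if "%util" in cols else None
            continue
        if util_index is None or len(cols) <= util_index:
            continue  # banner lines or anything that is not a device row
        try:
            latest[cols[0]] = float(cols[util_index])
        except ValueError:
            continue
    return {dev: util for dev, util in latest.items() if util >= threshold}

# Invented sample output, for illustration only.
sample = """\
Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00   12.00  1.00  180.00    8.00  1536.00     8.53     9.80  54.10   5.50  99.60
sdb        0.00    0.00  0.00    0.50    0.00     4.00     8.00     0.00   0.40   0.40   0.02
"""
print(busy_devices(sample))  # prints {'sda': 99.6}
```

A single high reading is not conclusive; what matters is %util staying high across many consecutive reports.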
If the disk has high %util, but the actual throughput (rsec/s and wsec/s, or rkB/s and wkB/s in newer sysstat versions) is pretty low, then it’s possible you’ve got a hardware fault or a RAID rebuild going on. A hardware error might show up on a smartctl run (smartctl -a /dev/sda or whatever), looking at things like the reallocated sector count, but SMART isn’t really, well, smart, so don’t trust it too much. A hardware RAID rebuild should show up in your RAID management tool (you are monitoring your hardware RAID setup, aren’t you?). A software RAID rebuild will be shown in /proc/mdstat (cat /proc/mdstat).
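As a rough illustration of the software RAID check, the sketch below scans /proc/mdstat text for the progress line the md driver prints while an array is resyncing or recovering. The function name rebuilding_arrays is hypothetical, and the sample text is invented in the usual mdstat layout; real output varies with kernel version and RAID level, so treat the pattern as an assumption.

```python
import re

def rebuilding_arrays(mdstat_text):
    """Return {array: (operation, percent_done)} for arrays showing a progress line."""
    current = None
    progress = {}
    for line in mdstat_text.splitlines():
        # Array header lines look like "md0 : active raid1 sdb1[1] sda1[0]".
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Progress lines look like "[==>....]  recovery = 12.6% (...) finish=...".
        status = re.search(r"(resync|recovery|reshape|check)\s*=\s*(\d+(?:\.\d+)?)%", line)
        if status and current:
            progress[current] = (status.group(1), float(status.group(2)))
    return progress

# Invented sample, for illustration only.
sample = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      292945152 blocks [2/1] [U_]
      [==>..................]  recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec

md1 : active raid1 sdb2[1] sda2[0]
      1048576 blocks [2/2] [UU]

unused devices: <none>
"""
print(rebuilding_arrays(sample))  # prints {'md0': ('recovery', 12.6)}
```

An empty result only means no progress line was found; it says nothing about the health of the underlying disks.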