2. I/O

2.1. fio

with command line options

fio --minimal              \
    --ramp_time=1          \
    --runtime=3            \
    --prio=0               \
    --prioclass=1          \
    --name=sread           \
    --readwrite=read       \
    --blocksize=1k         \
    --direct=1             \
    --numjobs=2            \
    --thread               \
    --ioscheduler=deadline \
    --filename=/dev/sda

with config file

cat <<EOF > sread.fio
; -- start job file --

[sread]
ramp_time=1 
#loops=2 
runtime=5
prio=0 
prioclass=1 
name=sread 
readwrite=read 
blocksize=4k
direct=1 
numjobs=32
thread 
ioscheduler=deadline 
filename=/dev/sda

; -- end job file --
EOF

Run fio against the config file and grab the total IOPS:

fio --minimal sread.fio | cut -d';' -f8 | awk '{ sum+=$1 } END { print sum }'

See http://www.bluestop.org/fio/HOWTO.txt

2.2. ftrace

script:

#!/usr/bin/ksh

dir=/sys/kernel/debug/tracing

echo function > $dir/current_tracer
echo 4096     > $dir/buffer_size_kb
echo 1        > $dir/options/latency-format
echo $$       > $dir/set_ftrace_pid
echo 1        > $dir/tracing_on

exec dd if=/dev/sda of=/dev/null bs=512 count=1 iflag=direct

command:

ksh ftrace.sh; echo 0 > /sys/kernel/debug/tracing/tracing_on

look at the trace:

less -FX /sys/kernel/debug/tracing/trace

Get some particular kernel function trace:

buf=""

for f in SyS_exec SyS_read submit_bio __elv_add_request     \
         queue_unplugged  dio_await_completion io_schedule  \
         __switch_to finish_task_switch dio_bio_complete    \
         dio_complete touch_atime fsnotify SyS_exit_group
do
  buf="$buf: $f|"
done

re=$(print "$buf" | sed 's/|$//') 

grep -E "$re" /sys/kernel/debug/tracing/trace
      
      dd-2013    2.... 60685us+: SyS_execve <-sysenter_do_call
      dd-2013    3.... 101504us+: SyS_read <-sysenter_do_call
      dd-2013    3.... 101799us+: submit_bio <-__blockdev_direct_IO
      dd-2013    3d... 101926us+: __elv_add_request <-blk_flush_plug_list
      dd-2013    3d... 101945us+: queue_unplugged <-blk_flush_plug_list
      dd-2013    3.... 102216us+: dio_await_completion <-__blockdev_direct_IO
      dd-2013    3.... 102225us+: io_schedule <-dio_await_completion
      dd-2013    3d... 102438us!: __switch_to <-__schedule
      dd-2013    0d... 102958us+: finish_task_switch <-__schedule
      dd-2013    0.... 103012us+: dio_bio_complete <-dio_await_completion
      dd-2013    0.... 103105us+: dio_complete <-__blockdev_direct_IO
      dd-2013    0.... 103147us+: touch_atime <-generic_file_aio_read
      dd-2013    0.... 103170us+: fsnotify <-vfs_read
      dd-2013    0.... 114779us+: SyS_exit_group <-sysenter_do_call

2.3. btrace

Live monitoring

blktrace -d /dev/sda -o - | blkparse -i -

or

btrace /dev/sda

See http://www.cse.unsw.edu.au/~aaronc/iosched/doc/blktrace.html

Trace example:

dd if=/dev/sda of=/dev/null bs=512 count=1 iflag=direct
  8,0    0        1     0.000000000  2579  Q   R 0 + 1 [dd]
  8,0    0        2     0.000044978  2579  G   R 0 + 1 [dd]
  8,0    0        3     0.000055035  2579  P   N [dd]
  8,0    0        4     0.000066489  2579  I   R 0 + 1 [dd]
  8,0    0        5     0.000071797  2579  U   N [dd] 1
  8,0    0        6     0.000082692  2579  D   R 0 + 1 [dd]
  8,0    0        7     0.001489854     0  C   R 0 + 1 [0]

Explaination:

Q - queued
    This notes intent to queue io at the given location.
    No real requests exists yet.
G - get request
    To send any type of request to a block device,
    a struct request container must be allocated first.
P - plug
    When io is queued to a previously empty block device queue,
    Linux will plug the queue in anticipation of future ios being
    added before this data is needed. 
I - inserted
    A request is being sent to the io scheduler for addition to
    the internal queue and later service by the driver.
    The request is fully formed at this time. 
U - unplug
    Some request data already queued in the device, start sending
    requests to the driver. This may happen automatically if a timeout
    period has passed (see next entry) or if a number of requests
    have been added to the queue.
D - issued
    A request that previously resided on the block layer queue or
    in the io scheduler has been sent to the driver.
C - complete
    A previously issued request has been completed. The output will
    detail the sector and size of that request, as well as the
    success or failure of it.

Function trace

Q trace_block_bio_queue    -> generic_make_request_checks    blk-core.c
G trace_block_getrq        -> __get_request                  blk-core.c
P trace_block_plug         -> blk_queue_bio                  blk-core.c
I trace_block_rq_insert    -> __elv_add_request              elevator.c
U trace_block_unplug       -> queue_unplugged                blk-core.c
D trace_block_rq_issue     -> blk_peek_request               blk-core.c
C trace_block_rq_complete  -> blk_update_request             blk-core.c
      dd-2589    1.... 157357us+: SyS_read <-sysenter_do_call
Q     dd-2589    1.... 158025us+: generic_make_request_checks <-generic_make_request
Q     dd-2589    1d... 158082us+: acpi_pm_read <-ktime_get

G     dd-2589    1d... 158129us+: get_request <-blk_queue_bio
G     dd-2589    1d... 158178us+: acpi_pm_read <-ktime_get

P     dd-2589    1.... 158189us+: init_request_from_bio <-blk_queue_bio
P     dd-2589    1d... 158222us+: acpi_pm_read <-ktime_get

I     dd-2589    1d... 158277us+: __elv_add_request <-blk_flush_plug_list
I     dd-2589    1d... 158286us+: acpi_pm_read <-ktime_get

U     dd-2589    1d... 158322us+: queue_unplugged <-blk_flush_plug_list
U     dd-2589    1d... 158332us+: acpi_pm_read <-ktime_get

D     dd-2589    1d... 158363us+: blk_peek_request <-scsi_request_fn
D     dd-2589    1d... 158396us+: acpi_pm_read <-ktime_get

2.4. IO Scheduler

todo