3.4.2. Basic Usage

Oprofile as configured in Yocto is a system-wide profiler (i.e. the version in Yocto doesn't yet make use of the perf_events interface which would allow it to profile specific processes and workloads). It's relies on hardware counter support in the hardware (but can fall back to a timer-based mode), which means that it doesn't take advantage of tracepoints or other event sources for example.

It consists of a kernel module that collects samples and a userspace daemon that writes the sample data to disk.

The 'opcontrol' shell script is used for transparently managing these components and starting and stopping profiles, and the 'opreport' command is used to display the results.

The oprofile daemon should already be running, but before you start profiling, you may need to change some settings and some of these settings may require the daemon not be running. One of these settings is the path the the vmlinux file, which you'll want to set using the --vmlinux option if you want the kernel profiled:

     root@crownbay:~# opcontrol --vmlinux=/boot/vmlinux-`uname -r`
     The profiling daemon is currently active, so changes to the configuration
     will be used the next time you restart oprofile after a --shutdown or --deinit.
            

You can check if vmlinux file: is set using opcontrol --status:

     root@crownbay:~# opcontrol --status
     Daemon paused: pid 1334
     Separate options: library
     vmlinux file: none
     Image filter: none
     Call-graph depth: 6
            

If it's not, you need to shutdown the daemon, add the setting and restart the daemon:

     root@crownbay:~# opcontrol --shutdown
     Killing daemon.

     root@crownbay:~# opcontrol --vmlinux=/boot/vmlinux-`uname -r`
     root@crownbay:~# opcontrol --start-daemon
     Using default event: CPU_CLK_UNHALTED:100000:0:1:1
     Using 2.6+ OProfile kernel interface.
     Reading module info.
     Using log file /var/lib/oprofile/samples/oprofiled.log
     Daemon started.
            

If we get the status again we now see our updated settings:

     root@crownbay:~# opcontrol --status
     Daemon paused: pid 1649
     Separate options: library
     vmlinux file: /boot/vmlinux-3.4.11-yocto-standard
     Image filter: none
     Call-graph depth: 6
            

We're now in a position to run a profile. For that we used 'opcontrol --start':

     root@crownbay:~# opcontrol --start
     Profiler running.
            

In another window, run our wget workload:

     root@crownbay:~# rm linux-2.6.19.2.tar.bz2; wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2; sync
     Connecting to downloads.yoctoproject.org (140.211.169.59:80)
     linux-2.6.19.2.tar.b 100% |*******************************| 41727k  0:00:00 ETA
            

To stop the profile we use 'opcontrol --shudown', which not only stops the profile but shuts down the daemon as well:

     root@crownbay:~# opcontrol --start
     Stopping profiling.
     Killing daemon.
            

Oprofile writes sample data to /var/lib/oprofile/samples, which you can look at if you're interested in seeing how the samples are structured. This is also interesting because it's related to how you dive down to get further details about specific executables in OProfile.

To see the default display output for a profile, simply type 'opreport', which will show the results using the data in /var/lib/oprofile/samples:

     root@crownbay:~# opreport

     WARNING! The OProfile kernel driver reports sample buffer overflows.
     Such overflows can result in incorrect sample attribution, invalid sample
     files and other symptoms.  See the oprofiled.log for details.
     You should adjust your sampling frequency to eliminate (or at least minimize)
     these overflows.
     CPU: Intel Architectural Perfmon, speed 1.3e+06 MHz (estimated)
     Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
     CPU_CLK_UNHALT...|
      samples|      %|
     ------------------
       464365 79.8156 vmlinux-3.4.11-yocto-standard
        65108 11.1908 oprofiled
	     CPU_CLK_UNHALT...|
	       samples|      %|
 	     ------------------
 	         64416 98.9372 oprofiled
 	           692  1.0628 libc-2.16.so
        36959  6.3526 no-vmlinux
         4378  0.7525 busybox
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	          2844 64.9612 libc-2.16.so
	          1337 30.5391 busybox
	           193  4.4084 ld-2.16.so
	             2  0.0457 libnss_compat-2.16.so
	             1  0.0228 libnsl-2.16.so
	             1  0.0228 libnss_files-2.16.so
         4344  0.7467 bash
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	          2657 61.1648 bash
	          1665 38.3287 libc-2.16.so
	            18  0.4144 ld-2.16.so
	             3  0.0691 libtinfo.so.5.9
	             1  0.0230 libdl-2.16.so
         3118  0.5359 nf_conntrack
          686  0.1179 matchbox-terminal
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	           214 31.1953 libglib-2.0.so.0.3200.4
	           114 16.6181 libc-2.16.so
	            79 11.5160 libcairo.so.2.11200.2
	            78 11.3703 libgdk-x11-2.0.so.0.2400.8
	            51  7.4344 libpthread-2.16.so
	            45  6.5598 libgobject-2.0.so.0.3200.4
	            29  4.2274 libvte.so.9.2800.2
	            25  3.6443 libX11.so.6.3.0
	            19  2.7697 libxcb.so.1.1.0
	            17  2.4781 libgtk-x11-2.0.so.0.2400.8
	            12  1.7493 librt-2.16.so
	             3  0.4373 libXrender.so.1.3.0
          671  0.1153 emgd
          411  0.0706 nf_conntrack_ipv4
          391  0.0672 iptable_nat
          378  0.0650 nf_nat
          263  0.0452 Xorg
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	           106 40.3042 Xorg
	            53 20.1521 libc-2.16.so
	            31 11.7871 libpixman-1.so.0.27.2
	            26  9.8859 emgd_drv.so
	            16  6.0837 libemgdsrv_um.so.1.5.15.3226
	            11  4.1825 libEMGD2d.so.1.5.15.3226
	             9  3.4221 libfb.so
	             7  2.6616 libpthread-2.16.so
	             1  0.3802 libudev.so.0.9.3
	             1  0.3802 libdrm.so.2.4.0
	             1  0.3802 libextmod.so
	             1  0.3802 mouse_drv.so
     .
     .
     .
           9  0.0015 connmand
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	             4 44.4444 libglib-2.0.so.0.3200.4
	             2 22.2222 libpthread-2.16.so
	             1 11.1111 connmand
	             1 11.1111 libc-2.16.so
	             1 11.1111 librt-2.16.so
            6  0.0010 oprofile-server
     	 CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	             3 50.0000 libc-2.16.so
	             1 16.6667 oprofile-server
	             1 16.6667 libpthread-2.16.so
	             1 16.6667 libglib-2.0.so.0.3200.4
           5 8.6e-04 gconfd-2
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	             2 40.0000 libdbus-1.so.3.7.2
	             2 40.0000 libglib-2.0.so.0.3200.4
	             1 20.0000 libc-2.16.so
            

The output above shows the breakdown or samples by both number of samples and percentage for each executable. Within an executable, the sample counts are broken down further into executable and shared libraries (DSOs) used by the executable.

To get even more detailed breakdowns by function, we need to have the full paths to the DSOs, which we can get by using -f with opreport:

     root@crownbay:~# opreport -f

     CPU: Intel Architectural Perfmon, speed 1.3e+06 MHz (estimated)
     Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
     CPU_CLK_UNHALT...|
      samples|      %|

       464365 79.8156 /boot/vmlinux-3.4.11-yocto-standard
       65108 11.1908 /usr/bin/oprofiled
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	         64416 98.9372 /usr/bin/oprofiled
	           692  1.0628 /lib/libc-2.16.so
        36959  6.3526 /no-vmlinux
         4378  0.7525 /bin/busybox
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	          2844 64.9612 /lib/libc-2.16.so
	          1337 30.5391 /bin/busybox
	           193  4.4084 /lib/ld-2.16.so
	             2  0.0457 /lib/libnss_compat-2.16.so
	             1  0.0228 /lib/libnsl-2.16.so
	             1  0.0228 /lib/libnss_files-2.16.so
         4344  0.7467 /bin/bash
	     CPU_CLK_UNHALT...|
	       samples|      %|
	     ------------------
	          2657 61.1648 /bin/bash
	          1665 38.3287 /lib/libc-2.16.so
	            18  0.4144 /lib/ld-2.16.so
	             3  0.0691 /lib/libtinfo.so.5.9
	             1  0.0230 /lib/libdl-2.16.so
     .
     .
     .
            

Using the paths shown in the above output and the -l option to opreport, we can see all the functions that have hits in the profile and their sample counts and percentages. Here's a portion of what we get for the kernel:

     root@crownbay:~# opreport -l /boot/vmlinux-3.4.11-yocto-standard

     CPU: Intel Architectural Perfmon, speed 1.3e+06 MHz (estimated)
     Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
     samples  %        symbol name
     233981   50.3873  intel_idle
     15437     3.3243  rb_get_reader_page
     14503     3.1232  ring_buffer_consume
     14092     3.0347  mutex_spin_on_owner
     13024     2.8047  read_hpet
     8039      1.7312  sub_preempt_count
     7096      1.5281  ioread32
     6997      1.5068  add_preempt_count
     3985      0.8582  rb_advance_reader
     3488      0.7511  add_event_entry
     3303      0.7113  get_parent_ip
     3104      0.6684  rb_buffer_peek
     2960      0.6374  op_cpu_buffer_read_entry
     2614      0.5629  sync_buffer
     2545      0.5481  debug_smp_processor_id
     2456      0.5289  ohci_irq
     2397      0.5162  memset
     2349      0.5059  __copy_to_user_ll
     2185      0.4705  ring_buffer_event_length
     1918      0.4130  in_lock_functions
     1850      0.3984  __schedule
     1767      0.3805  __copy_from_user_ll_nozero
     1575      0.3392  rb_event_data_length
     1256      0.2705  memcpy
     1233      0.2655  system_call
     1213      0.2612  menu_select
            

Notice that above we see an entry for the __copy_to_user_ll() function that we've looked at with other profilers as well.

Here's what we get when we do the same thing for the busybox executable:

     CPU: Intel Architectural Perfmon, speed 1.3e+06 MHz (estimated)
     Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
     samples  %        image name               symbol name
     349       8.4198  busybox                  retrieve_file_data
     308       7.4306  libc-2.16.so             _IO_file_xsgetn
     283       6.8275  libc-2.16.so             __read_nocancel
     235       5.6695  libc-2.16.so             syscall
     233       5.6212  libc-2.16.so             clearerr
     215       5.1870  libc-2.16.so             fread
     181       4.3667  libc-2.16.so             __write_nocancel
     158       3.8118  libc-2.16.so             __underflow
     151       3.6429  libc-2.16.so             _dl_addr
     150       3.6188  busybox                  progress_meter
     150       3.6188  libc-2.16.so             __poll_nocancel
     148       3.5706  libc-2.16.so             _IO_file_underflow@@GLIBC_2.1
     137       3.3052  busybox                  safe_poll
     125       3.0157  busybox                  bb_progress_update
     122       2.9433  libc-2.16.so             __x86.get_pc_thunk.bx
     95        2.2919  busybox                  full_write
     81        1.9542  busybox                  safe_write
     77        1.8577  busybox                  xwrite
     72        1.7370  libc-2.16.so             _IO_file_read
     71        1.7129  libc-2.16.so             _IO_sgetn
     67        1.6164  libc-2.16.so             poll
     52        1.2545  libc-2.16.so             _IO_switch_to_get_mode
     45        1.0856  libc-2.16.so             read
     34        0.8203  libc-2.16.so             write
     32        0.7720  busybox                  monotonic_sec
     25        0.6031  libc-2.16.so             vfprintf
     22        0.5308  busybox                  get_mono
     14        0.3378  ld-2.16.so               strcmp
     14        0.3378  libc-2.16.so             __x86.get_pc_thunk.cx
     .
     .
     .
            

Since we recorded the profile with a callchain depth of 6, we should be able to see our __copy_to_user_ll() callchains in the output, and indeed we can if we search around a bit in the 'opreport --callgraph' output:

     root@crownbay:~# opreport --callgraph /boot/vmlinux-3.4.11-yocto-standard

       392       6.9639  vmlinux-3.4.11-yocto-standard sock_aio_read
       736      13.0751  vmlinux-3.4.11-yocto-standard __generic_file_aio_write
       3255     57.8255  vmlinux-3.4.11-yocto-standard inet_recvmsg
     785       0.1690  vmlinux-3.4.11-yocto-standard tcp_recvmsg
       1790     31.7940  vmlinux-3.4.11-yocto-standard local_bh_enable
       1238     21.9893  vmlinux-3.4.11-yocto-standard __kfree_skb
       992      17.6199  vmlinux-3.4.11-yocto-standard lock_sock_nested
       785      13.9432  vmlinux-3.4.11-yocto-standard tcp_recvmsg [self]
       525       9.3250  vmlinux-3.4.11-yocto-standard release_sock
       112       1.9893  vmlinux-3.4.11-yocto-standard tcp_cleanup_rbuf
       72        1.2789  vmlinux-3.4.11-yocto-standard skb_copy_datagram_iovec

     170       0.0366  vmlinux-3.4.11-yocto-standard skb_copy_datagram_iovec
       1491     73.3038  vmlinux-3.4.11-yocto-standard memcpy_toiovec
       327      16.0767  vmlinux-3.4.11-yocto-standard skb_copy_datagram_iovec
       170       8.3579  vmlinux-3.4.11-yocto-standard skb_copy_datagram_iovec [self]
       20        0.9833  vmlinux-3.4.11-yocto-standard copy_to_user

       2588     98.2909  vmlinux-3.4.11-yocto-standard copy_to_user
     2349      0.5059  vmlinux-3.4.11-yocto-standard __copy_to_user_ll
       2349     89.2138  vmlinux-3.4.11-yocto-standard __copy_to_user_ll [self]
       166       6.3046  vmlinux-3.4.11-yocto-standard do_page_fault
            

Remember that by default OProfile sessions are cumulative i.e. if you start and stop a profiling session, then start a new one, the new one will not erase the previous run(s) but will build on it. If you want to restart a profile from scratch, you need to reset:

     root@crownbay:~# opcontrol --reset