Data … as usual

All things about data by Laurent Leturgez

Monthly Archives: March 2014

Linux monitoring of oracle 12c multi-threaded instances

Oracle 12c comes with a new feature: multithreaded server. In summary, main processes like real time scheduled processes (vktm, lms), or main processes like pmon or dbwn continue to run as processes. For other ones (lgwr, mmon, server processes etc.), they run now in a thread.

This feature has been developed to optimize oracle to be run on new processors with many core and many threads per core (for example SPARC T Processors), but the DBA will have to change many methods he use to analyze problems in a multi-threaded server.

If for some problems, you usually analyze the OS side, top, ps, and other tools have to be used in a different way. Let’s see different tools that can be used to analyze processes and thread in linux (Tools mentioned here has been tested with Oracle Enterprise Linux 6).

For all example above, I used an orcl instance which run in a multi-threaded configuration

  • ps

If I run a simple ps under my config, there are only 6 processes:

[oracle@oel64-12c ~]$ ps -ef | grep [o]rcl
oracle    9871     1  0 21:02 ?        00:00:00 ora_pmon_orcl
oracle    9873     1  0 21:02 ?        00:00:00 ora_psp0_orcl
oracle    9878     1  5 21:02 ?        00:01:35 ora_vktm_orcl
oracle    9882     1  0 21:02 ?        00:00:02 ora_u004_orcl
oracle    9888     1  0 21:02 ?        00:00:11 ora_u005_orcl
oracle    9894     1  0 21:02 ?        00:00:00 ora_dbw0_orcl
 If I want to print all threads that run in these processes, I can run this command:
[oracle@oel64-12c ~]$ ps -eLo pid,pcpu,tid,user,comm,cmd | sed -n -e '1p' -e '/orcl/p'
  PID %CPU   TID USER     COMMAND         CMD
 9871  0.0  9871 oracle   ora_pmon_orcl   ora_pmon_orcl
 9873  0.0  9873 oracle   ora_psp0_orcl   ora_psp0_orcl
 9878  5.2  9878 oracle   ora_vktm_orcl   ora_vktm_orcl
 9882  0.0  9882 oracle   ora_scmn_orcl   ora_u004_orcl
 9882  0.0  9883 oracle   oracle          ora_u004_orcl
 9882  0.0  9884 oracle   ora_gen0_orcl   ora_u004_orcl
 9882  0.0  9885 oracle   ora_mman_orcl   ora_u004_orcl
 9882  0.0  9891 oracle   ora_dbrm_orcl   ora_u004_orcl
 9882  0.0  9895 oracle   ora_lgwr_orcl   ora_u004_orcl
 9882  0.0  9896 oracle   ora_ckpt_orcl   ora_u004_orcl
 9882  0.0  9897 oracle   ora_lg00_orcl   ora_u004_orcl
 9882  0.0  9898 oracle   ora_lg01_orcl   ora_u004_orcl
 9882  0.0  9899 oracle   ora_smon_orcl   ora_u004_orcl
 9882  0.0  9901 oracle   ora_lreg_orcl   ora_u004_orcl
 9888  0.0  9888 oracle   ora_scmn_orcl   ora_u005_orcl
 9888  0.0  9889 oracle   oracle          ora_u005_orcl
 9888  0.0  9890 oracle   ora_diag_orcl   ora_u005_orcl
 9888  0.0  9892 oracle   ora_dia0_orcl   ora_u005_orcl
 9888  0.0  9900 oracle   ora_reco_orcl   ora_u005_orcl
 9888  0.0  9902 oracle   ora_mmon_orcl   ora_u005_orcl
 9888  0.0  9903 oracle   ora_mmnl_orcl   ora_u005_orcl
 9888  0.0  9904 oracle   ora_d000_orcl   ora_u005_orcl
 9888  0.0  9905 oracle   ora_s000_orcl   ora_u005_orcl
 9888  0.0  9906 oracle   ora_n000_orcl   ora_u005_orcl
 9888  1.3  9931 oracle   oracle_9931_orc ora_u005_orcl
 9888  0.0  9932 oracle   ora_tmon_orcl   ora_u005_orcl
 9888  0.0  9933 oracle   ora_tt00_orcl   ora_u005_orcl
 9888  0.0  9934 oracle   ora_smco_orcl   ora_u005_orcl
 9888  0.0  9938 oracle   ora_fbda_orcl   ora_u005_orcl
 9888  0.0  9939 oracle   ora_aqpc_orcl   ora_u005_orcl
 9888  0.0  9944 oracle   ora_p000_orcl   ora_u005_orcl
 9888  0.0  9945 oracle   ora_p001_orcl   ora_u005_orcl
 9888  0.0  9946 oracle   ora_p002_orcl   ora_u005_orcl
 9888  0.0  9947 oracle   ora_p003_orcl   ora_u005_orcl
 9888  0.0  9948 oracle   ora_p004_orcl   ora_u005_orcl
 9888  0.0  9949 oracle   ora_p005_orcl   ora_u005_orcl
 9888  0.0  9950 oracle   ora_p006_orcl   ora_u005_orcl
 9888  0.0  9951 oracle   ora_p007_orcl   ora_u005_orcl
 9888  0.0  9952 oracle   ora_cjq0_orcl   ora_u005_orcl
 9888  0.0  9996 oracle   ora_qm02_orcl   ora_u005_orcl
 9888  0.0  9998 oracle   ora_q002_orcl   ora_u005_orcl
 9888  0.0  9999 oracle   ora_q003_orcl   ora_u005_orcl
 9888  0.0 16009 oracle   ora_w000_orcl   ora_u005_orcl
 9894  0.0  9894 oracle   ora_dbw0_orcl   ora_dbw0_orcl
16414  0.0 16414 oracle   sed             sed -n -e 1p -e /orcl/p
 First column is the PID, second column is the CPU percent burn by the thread, third column is the thread Id, next column is the thread owner, the second last column is the oracle thread name and the last is the process name.
With ps, you can have a static view and possibly identify problematic processes
  • top

With top, you can see you processes or threads in a more dynamic fashion. Top option used to see threads is -H, but you have to mention which processes you want to analyze with -p parameter followed by pids. The main drawback of this command is that -p is limited to 20 pids but for a mid size multi-threaded instance, it’s ok.

[oracle@oel64-12c ~]$ top -p $(pgrep -d',' orcl$) -H
top - 21:55:37 up 1 day,  2:03,  6 users,  load average: 1.10, 1.04, 1.13
Tasks:  43 total,   0 running,  43 sleeping,   0 stopped,   0 zombie
Cpu0  : 25.8%us, 46.5%sy,  0.0%ni, 14.5%id, 10.3%wa,  0.0%hi,  3.0%si,  0.0%st
Cpu1  : 16.3%us, 46.8%sy,  0.0%ni, 28.9%id,  7.8%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4055296k total,  3052640k used,  1002656k free,    20360k buffers
Swap:  8388604k total,  1475216k used,  6913388k free,  2302200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9878 oracle    -2   0 1489m  17m  15m S  6.7  0.4   3:03.07 ora_vktm_orcl
 9892 oracle    20   0 3457m 340m 252m S  0.5  8.6   0:01.27 ora_dia0_orcl
 9902 oracle    20   0 3457m 340m 252m S  0.5  8.6   0:02.00 ora_mmon_orcl
 9871 oracle    20   0 1489m  21m  19m S  0.0  0.5   0:00.33 ora_pmon_orcl
 9873 oracle    20   0 1489m  17m  15m S  0.0  0.4   0:01.17 ora_psp0_orcl
 9882 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.07 ora_scmn_orcl
 9883 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.00 oracle
 9884 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.25 ora_gen0_orcl
 9885 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.25 ora_mman_orcl
 9891 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.24 ora_dbrm_orcl
 9895 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.33 ora_lgwr_orcl
 9896 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:01.08 ora_ckpt_orcl
 9897 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.12 ora_lg00_orcl
 9898 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.03 ora_lg01_orcl
 9899 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.07 ora_smon_orcl
 9901 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.13 ora_lreg_orcl
 9888 oracle    20   0 3457m 340m 252m S  0.0  8.6   0:00.40 ora_scmn_orcl
 9889 oracle    20   0 3457m 340m 252m S  0.0  8.6   0:00.01 oracle
  • pidstat

pidstat is a command which appears in OEL6. It runs like a vmstat or mpstat with an interval and a counter, but it gives information of how evolve cpu, io, memory consumption for a specific process.

For example, to see cpu consumption every second for the process with pid 9888

[oracle@oel64-12c ~]$ pidstat -p 9888 -u 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:03:14 PM       PID    %usr %system  %guest    %CPU   CPU  Command
10:03:15 PM      9888    0.00    0.00    0.00    0.00     0  ora_scmn_orcl
10:03:16 PM      9888    1.00    0.00    0.00    1.00     1  ora_scmn_orcl
10:03:17 PM      9888    0.00    0.00    0.00    0.00     1  ora_scmn_orcl
10:03:18 PM      9888    1.00    1.00    0.00    2.00     1  ora_scmn_orcl
Please note that the process name is ora_u005_orcl but it’s printed with the command name which is, in fact, the thread name.

So if you want to see every thread in this process, you need to use -t option:

[oracle@oel64-12c ~]$ pidstat -p 9888 -u -t 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:08:42 PM      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
10:08:43 PM      9888         -    0.00    0.00    0.00    0.00     1  ora_scmn_orcl
10:08:43 PM         -      9888    0.00    0.00    0.00    0.00     1  |__ora_scmn_orcl
10:08:43 PM         -      9889    0.00    0.00    0.00    0.00     1  |__oracle
10:08:43 PM         -      9890    0.00    0.00    0.00    0.00     0  |__ora_diag_orcl
10:08:43 PM         -      9892    0.00    0.00    0.00    0.00     0  |__ora_dia0_orcl
10:08:43 PM         -      9900    0.00    0.00    0.00    0.00     0  |__ora_reco_orcl
10:08:43 PM         -      9902    0.00    0.00    0.00    0.00     1  |__ora_mmon_orcl
10:08:43 PM         -      9903    0.00    0.00    0.00    0.00     0  |__ora_mmnl_orcl
10:08:43 PM         -      9904    0.00    0.00    0.00    0.00     0  |__ora_d000_orcl
10:08:43 PM         -      9905    0.00    0.00    0.00    0.00     0  |__ora_s000_orcl
10:08:43 PM         -      9906    0.00    0.00    0.00    0.00     0  |__ora_n000_orcl
10:08:43 PM         -      9932    0.00    0.00    0.00    0.00     0  |__ora_tmon_orcl
10:08:43 PM         -      9933    0.00    0.00    0.00    0.00     0  |__ora_tt00_orcl
10:08:43 PM         -      9934    0.00    0.00    0.00    0.00     1  |__ora_smco_orcl
10:08:43 PM         -      9938    0.00    0.00    0.00    0.00     1  |__ora_fbda_orcl
10:08:43 PM         -      9939    0.00    0.00    0.00    0.00     1  |__ora_aqpc_orcl
10:08:43 PM         -      9944    0.00    0.00    0.00    0.00     0  |__ora_p000_orcl
10:08:43 PM         -      9945    0.00    0.00    0.00    0.00     0  |__ora_p001_orcl
10:08:43 PM         -      9946    0.00    0.00    0.00    0.00     1  |__ora_p002_orcl
10:08:43 PM         -      9947    0.00    0.00    0.00    0.00     0  |__ora_p003_orcl
10:08:43 PM         -      9948    0.00    0.00    0.00    0.00     0  |__ora_p004_orcl
10:08:43 PM         -      9949    0.00    0.00    0.00    0.00     1  |__ora_p005_orcl
10:08:43 PM         -      9950    0.00    0.00    0.00    0.00     1  |__ora_p006_orcl
10:08:43 PM         -      9951    0.00    0.00    0.00    0.00     1  |__ora_p007_orcl
10:08:43 PM         -      9952    0.00    0.00    0.00    0.00     1  |__ora_cjq0_orcl
10:08:43 PM         -      9996    0.00    0.00    0.00    0.00     0  |__ora_qm02_orcl
10:08:43 PM         -      9998    0.00    0.00    0.00    0.00     1  |__ora_q002_orcl
10:08:43 PM         -      9999    0.00    0.00    0.00    0.00     0  |__ora_q003_orcl
10:08:43 PM         -     16009    0.00    0.00    0.00    0.00     1  |__ora_w000_orcl
10:08:43 PM         -     21462    0.00    1.00    0.00    1.00     1  |__ora_vkrm_orcl
10:08:43 PM         -     22117    0.00    0.00    0.00    0.00     1  |__ora_w001_orcl
10:08:43 PM         -     22128    0.00    0.00    0.00    0.00     0  |__ora_w002_orcl
10:08:43 PM         -     22689    0.00    0.00    0.00    0.00     1  |__ora_w003_orcl
10:08:43 PM         -     22703    0.00    0.00    0.00    0.00     0  |__ora_w004_orcl
10:08:43 PM         -     22713    0.00    0.00    0.00    0.00     0  |__ora_w005_orcl
There are other interesting options to monitor IO (-d), page faults and memory (-r), CPU utilization seen above (-u), switching activities (-w).
For example:
[oracle@oel64-12c ~]$ pidstat -p 9888 -w -t 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:57:54 PM      TGID       TID   cswch/s nvcswch/s  Command
10:57:55 PM      9888         -      1.00      1.00  ora_scmn_orcl
10:57:55 PM         -      9888      1.00      1.00  |__ora_scmn_orcl
10:57:55 PM         -      9889      0.00      0.00  |__oracle
10:57:55 PM         -      9890      1.00      0.00  |__ora_diag_orcl
10:57:55 PM         -      9892      1.00      0.00  |__ora_dia0_orcl
10:57:55 PM         -      9900      1.00      0.00  |__ora_reco_orcl
10:57:55 PM         -      9902      1.00      0.00  |__ora_mmon_orcl
10:57:55 PM         -      9903      1.00      1.00  |__ora_mmnl_orcl
10:57:55 PM         -      9904      1.00      0.00  |__ora_d000_orcl
10:57:55 PM         -      9905      1.00      0.00  |__ora_s000_orcl
10:57:55 PM         -      9906      1.00      0.00  |__ora_n000_orcl
10:57:55 PM         -      9932      1.00      0.00  |__ora_tmon_orcl
10:57:55 PM         -      9933      1.00      0.00  |__ora_tt00_orcl
10:57:55 PM         -      9934      1.00      1.00  |__ora_smco_orcl
10:57:55 PM         -      9938      1.00      0.00  |__ora_fbda_orcl
10:57:55 PM         -      9939      1.00      0.00  |__ora_aqpc_orcl
10:57:55 PM         -      9944      0.00      0.00  |__ora_p000_orcl
10:57:55 PM         -      9945      0.00      0.00  |__ora_p001_orcl
10:57:55 PM         -      9946      0.00      0.00  |__ora_p002_orcl
10:57:55 PM         -      9947      0.00      0.00  |__ora_p003_orcl
10:57:55 PM         -      9948      0.00      0.00  |__ora_p004_orcl
10:57:55 PM         -      9949      0.00      0.00  |__ora_p005_orcl
10:57:55 PM         -      9950      0.00      0.00  |__ora_p006_orcl
10:57:55 PM         -      9951      0.00      0.00  |__ora_p007_orcl
10:57:55 PM         -      9952      1.00      0.00  |__ora_cjq0_orcl
10:57:55 PM         -      9996      0.00      0.00  |__ora_qm02_orcl
10:57:55 PM         -      9998      0.00      0.00  |__ora_q002_orcl
10:57:55 PM         -      9999      1.00      0.00  |__ora_q003_orcl
10:57:55 PM         -     21462     96.00      3.00  |__ora_vkrm_orcl
10:57:55 PM         -     22713      1.00      0.00  |__ora_w005_orcl
10:57:55 PM         -     29708      0.00      0.00  |__ora_q001_orcl
10:57:55 PM         -     29709      0.00      0.00  |__oracle_29709_or
10:57:55 PM         -     26975      1.00      0.00  |__ora_w004_orcl
  • gdb (for debug)

If you want to trace system calls made by threads, you can use linux debugger (gdb). I don’t have a deep knowledge of gdb, but you can attach gdb to a process with the -p option.

[oracle@oel64-12c ~]$ gdb -p 9888
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9888

.../...
After this, you have a command which prints threads information (LWP (for Light Weight Process ???) indicates the Thread Id:
(gdb) info threads
  31 Thread 0x7f5a89ff6700 (LWP 29709)  0x0000003abe00e75d in read () from /lib64/libpthread.so.0  <<< my session is located here and is waiting for a command (read syscall)
  30 Thread 0x7f5a81ff2700 (LWP 29708)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  29 Thread 0x7f5b10beb700 (LWP 9889)  0x0000003abdcdf343 in poll () from /lib64/libc.so.6
  28 Thread 0x7f5b0ea2a700 (LWP 9890)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  27 Thread 0x7f5b07fff700 (LWP 9892)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  26 Thread 0x7f5afbfff700 (LWP 9900)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  25 Thread 0x7f5af3fff700 (LWP 9902)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  24 Thread 0x7f5aebfff700 (LWP 9903)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  23 Thread 0x7f5ae3fff700 (LWP 9904)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  22 Thread 0x7f5adbfff700 (LWP 9905)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  21 Thread 0x7f5ad3fff700 (LWP 9906)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  20 Thread 0x7f5acbfff700 (LWP 9932)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  19 Thread 0x7f5ac3fff700 (LWP 9933)  0x0000003abe00ef3d in nanosleep () from /lib64/libpthread.so.0
  18 Thread 0x7f5ab3fff700 (LWP 9934)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  17 Thread 0x7f5aabfff700 (LWP 9938)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  16 Thread 0x7f5aa3fff700 (LWP 9939)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  15 Thread 0x7f5a99ffe700 (LWP 9944)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  14 Thread 0x7f5a97ffd700 (LWP 9945)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  13 Thread 0x7f5a95ffc700 (LWP 9946)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  12 Thread 0x7f5a93ffb700 (LWP 9947)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  11 Thread 0x7f5a91ffa700 (LWP 9948)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  10 Thread 0x7f5a8fff9700 (LWP 9949)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  9 Thread 0x7f5a8dff8700 (LWP 9950)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  8 Thread 0x7f5a8bff7700 (LWP 9951)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  7 Thread 0x7f5a9bfff700 (LWP 9952)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  6 Thread 0x7f5a83ff3700 (LWP 9996)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  5 Thread 0x7f5a87ff5700 (LWP 9998)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  4 Thread 0x7f5a85ff4700 (LWP 9999)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  3 Thread 0x7f5abbfff700 (LWP 21462)  0x0000003abe00ef3d in nanosleep () from /lib64/libpthread.so.0
  2 Thread 0x7f5a79fee700 (LWP 22713)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
* 1 Thread 0x7f5b10f2a9e0 (LWP 9888)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
 Next, you can select a specific thread with the gdb “thread” command:
(gdb) thread 31
[Switching to thread 31 (Thread 0x7f5a89ff6700 (LWP 29709))]#0  0x0000003abe00e75d in read () from /lib64/libpthread.so.0
(gdb) info threads
* 31 Thread 0x7f5a89ff6700 (LWP 29709)  0x0000003abe00e75d in read () from /lib64/libpthread.so.0
  30 Thread 0x7f5a81ff2700 (LWP 29708)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  29 Thread 0x7f5b10beb700 (LWP 9889)  0x0000003abdcdf343 in poll () from /lib64/libc.so.6
  28 Thread 0x7f5b0ea2a700 (LWP 9890)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  27 Thread 0x7f5b07fff700 (LWP 9892)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  26 Thread 0x7f5afbfff700 (LWP 9900)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
 .../...
Next, you can use breakpoints, watchpoint etc. to debug oracle calls etc.
If you are interested by tracing oracle system calls with gdb, Frits Hoogland have written many articles on this subject: