Data … as usual

All things about data by Laurent Leturgez

Category Archives: Internals

Profiling Java Application with Systemtap

A couple of days ago, I read this tweet from Tanel Poder :

I’m not a JVM internals geek but I was sure there was a way to do the job without restarting the JVM, and I found some cool stuff with Systemtap.

To do this, you have to install two packages on your linux distribution: systemtap and systemtap-runtime-java (and configure correctly your user environment):

[root@spark ~]# yum install systemtap systemtap-runtime-java

Please note that I used a CentOS 7.4 distribution.

Then, and for the demo, I wrote a very small piece of Java that do these steps:

  1. Prints the JVM PID
  2. Wait for a key to be pressed. During this time, you will have to execute the systemtap script I will described later.
  3. Execute a loop ten times, each loop with print a message and wait one second, and this last step is executed in a method name “loop_and_wait”.

Here’s the sample code:


package com.premiseo;

import java.lang.*;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

class Example {
   public static void loop_and_wait(int n) throws InterruptedException{
           System.out.println("Waiting "+n+"ms... Tick");
           Thread.sleep(n);
        }

   public static void main(String[] args) {

      System.out.println("PID = "+java.lang.management.ManagementFactory.getRuntimeMXBean().getName().split("@")[0]);
      System.out.println("Press any key when ready ...");

      try {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String next = in.readLine();
      }
      catch (IOException ioe) {
        ioe.printStackTrace();
      }

      try {
        for (int i=0;i<10;i++) {
           loop_and_wait(1000);
        }
      }
      catch (InterruptedException ie) {
        ie.printStackTrace();
      }
   }
}

Then, compile and execute … very basic I said 😉

[spark@spark java]$ javac -cp $CLASSPATH:. com/premiseo/Example.java
[spark@spark java]$ java -cp $CLASSPATH:. com.premiseo.Example
PID = 9928
Press any key when ready ...

Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick

Now, to answer to Tanel, I used a short systemtap script that will profile the program and specially the loop_and_wait method. I will count the number of times the loop_and_wait method has been called, and I will account the time spent in this method execution.

To do that, I had to write two probes related to:

  • the full name of the class, including the package name: com.premiseo.Example
  • the class name where the method is defined: Example
  • the method name I want to profile: loop_and_wait

The first one will be executed when the program will start to execute the targeted method (java(“com.premiseo.Example”).class(“Example”).method(“loop_and_wait”)), the second one will be executed when the method will return (java(“com.premiseo.Example”).class(“Example”).method(“loop_and_wait”).return)

The related systemtap script is given below:

#!/usr/bin/env stap

global counter,timespent,t

probe begin {
  printf("Press Ctrl+C to stop profiling\n")
  counter=0
  timespent=0
}

probe java("com.premiseo.Example").class("Example").method("loop_and_wait")
{
  counter++
  t=gettimeofday_ms()
}

probe java("com.premiseo.Example").class("Example").method("loop_and_wait").return
{
  timespent+=gettimeofday_ms()-t
}

probe end {
   printf("Number of calls for loop_and_wait method: %ld \n",    counter)
   printf("Time Spent in method loop_and_wait: %ld msecs \n", timespent)
}

Execution of this systemtap script gave the following result (click the image for full size):

Is it dynamic? Yes, no need to restart the running JVM process you want to target. If you want to target a specific JVM process id, you can use the stap’s “-x” option, add the modify your probe definition like this:

probe java("com.premiseo.Example").class("Example").method("loop_and_wait")
{
  if (pid() == target())
    counter++
    t=gettimeofday_ms()
}

There’s a limitation, you cannot use wilcards in the java probe definition (java(“com.premiseo.Example”).class(“Example”).method(“loop*”) … for example). That would have been useful to profile a set of methods in the same class … but not possible currently.

If you want to read more about this kind of stuff, please read the following websites:

And … that’s all for today !! 😉

 

 

Install and configure DTrace on Oracle Linux

Dtrace is one of the best tool to perform dynamic tracing of program execution.

Dtrace has been initially released on Solaris and now it’s ported on Linux.

In this post, I will describe very shortly how to install and configure Dtrace port on an Oracle Linux 6 box with UEK4 Kernel.

First, download dtrace-util and dtrace-util-devel packages. These packages are available at this URL : http://www.oracle.com/technetwork/server-storage/linux/downloads/linux-dtrace-2800968.html. You just have to download the correct releases depending on your UEK kernel version, and your Oracle Linux Distribution.

In my case, I chose “DTrace utilities, Oracle Linux 6 (x86_64)” for UEK4 kernel.

[root@oel6 dtrace]# ls
dtrace-utils-0.5.1-3.el6.x86_64.rpm  dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm

Then, I used yum to install both packages, but before you have to configure (if not already done) your yum repo for UEK4 :

[public_ol6_UEKR4]
name=Latest Unbreakable Enterprise Kernel Release 4 for Oracle Linux $releasever ($basearch)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL6/UEKR4/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=$uekr4

Now, install the packages:

[root@oel6 dtrace]# yum localinstall dtrace-utils-*
Loaded plugins: auto-update-debuginfo, refresh-packagekit, security, ulninfo
Setting up Local Package Process
Examining dtrace-utils-0.5.1-3.el6.x86_64.rpm: dtrace-utils-0.5.1-3.el6.x86_64
Marking dtrace-utils-0.5.1-3.el6.x86_64.rpm to be installed
Examining dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm: dtrace-utils-devel-0.5.1-3.el6.x86_64
Marking dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package dtrace-utils.x86_64 0:0.5.1-3.el6 will be installed
--> Processing Dependency: dtrace-modules-shared-headers for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf.so.1(LIBDTRACE_CTF_1.0)(64bit) for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf.so.1()(64bit) for package: dtrace-utils-0.5.1-3.el6.x86_64
---> Package dtrace-utils-devel.x86_64 0:0.5.1-3.el6 will be installed
--> Processing Dependency: libdtrace-ctf-devel > 0.4.0 for package: dtrace-utils-devel-0.5.1-3.el6.x86_64
--> Running transaction check
---> Package dtrace-modules-shared-headers.x86_64 0:0.5.3-2.el6 will be installed
---> Package libdtrace-ctf.x86_64 0:0.5.0-3.el6 will be installed
---> Package libdtrace-ctf-devel.x86_64 0:0.5.0-3.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==================================================================================================================================================================================================================================================
 Package                                                           Arch                                       Version                                            Repository                                                                  Size
==================================================================================================================================================================================================================================================
Installing:
 dtrace-utils                                                      x86_64                                     0.5.1-3.el6                                        /dtrace-utils-0.5.1-3.el6.x86_64                                           786 k
 dtrace-utils-devel                                                x86_64                                     0.5.1-3.el6                                        /dtrace-utils-devel-0.5.1-3.el6.x86_64                                      76 k
Installing for dependencies:
 dtrace-modules-shared-headers                                     x86_64                                     0.5.3-2.el6                                        public_ol6_UEKR4                                                            30 k
 libdtrace-ctf                                                     x86_64                                     0.5.0-3.el6                                        public_ol6_UEKR4                                                            28 k
 libdtrace-ctf-devel                                               x86_64                                     0.5.0-3.el6                                        public_ol6_UEKR4                                                            15 k

Transaction Summary
==================================================================================================================================================================================================================================================
Install       5 Package(s)

Total size: 935 k
Total download size: 73 k
Installed size: 1.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/3): dtrace-modules-shared-headers-0.5.3-2.el6.x86_64.rpm                                                                                                                                                                |  30 kB     00:00
(2/3): libdtrace-ctf-0.5.0-3.el6.x86_64.rpm                                                                                                                                                                                |  28 kB     00:00
(3/3): libdtrace-ctf-devel-0.5.0-3.el6.x86_64.rpm  
.../...

 

Installing those packages is not sufficient, you have to install a package containing the kernel modules for dtrace, and as the version of this package depends on your kernel, you have to run the yum command below:

[root@oel6 dtrace]# yum install dtrace-modules-`uname -r`
Loaded plugins: auto-update-debuginfo, refresh-packagekit, security, ulninfo
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package dtrace-modules-4.1.12-37.5.1.el6uek.x86_64 0:0.5.2-1.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==================================================================================================================================================================================================================================================
 Package                                                                     Arch                                           Version                                                Repository                                                Size
==================================================================================================================================================================================================================================================
Installing:
 dtrace-modules-4.1.12-37.5.1.el6uek                                         x86_64                                         0.5.2-1.el6                                            public_ol6_UEKR4                                         1.2 M

Transaction Summary
==================================================================================================================================================================================================================================================
Install       1 Package(s)

Total download size: 1.2 M
Installed size: 6.1 M
Is this ok [y/N]: y

Once the packages are installed, you have to load a bunch of modules into the kernel.

You can do that manually by running the command below:

[root@oel6 dtrace]# modprobe -a dtrace profile systrace sdt dt_test

Or, you can configure your Linux box to load those module during startup. In my case, as I run that in a OL6 box, I configured a file in /etc/sysconfig/modules/dtrace.modules and change its permissions.

[root@oel6 dtrace]# cat > /etc/sysconfig/modules/dtrace.modules
#!/bin/sh

if [ ! -c /dev/dtrace/dtrace ] ; then
        exec /sbin/modprobe -a dtrace profile systrace sdt dt_test  >/dev/null 2>&1
fi

[root@oel6 dtrace]# chmod 755 /etc/sysconfig/modules/dtrace.modules

After a reboot, dtrace runs fine with the root user:

[root@oel6 dtrace]# dtrace -l | wc -l
670

 
But not with the oracle user:

[oracle@oel6 ~]$ dtrace -l
dtrace: failed to initialize dtrace: DTrace requires additional privileges

To fix that, we have to set two small tricks:
1) Create a dtrace unix group and assign this group to the oracle user (or any user you want to grant dtrace utilization)

[root@oel6 ~]# id -a oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),48(apache),54322(dba)
[root@oel6 ~]# groupadd dtrace
[root@oel6 ~]# usermod -a -G dtrace oracle
[root@oel6 ~]# id -a oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),48(apache),54322(dba),54325(dtrace)

2) Configure the /dev/dtrace/dtrace Unix device to have the correct group ownership:

[root@oel6 ~]# cat /etc/udev/rules.d/10-dtrace.rules
kernel=="dtrace/dtrace", GROUP="dtrace" MODE="0660"

After a last reboot, that works fine even for my oracle user, and I can trace pmon and every process of my Oracle instances:

[oracle@oel6 ~]$ dtrace -l | wc -l
670

[oracle@oel6 ~]$ ps -ef | grep pmon
oracle    3436     1  0 18:16 ?        00:00:00 ora_pmon_orcl11
oracle    3519  3381  0 18:16 pts/0    00:00:00 grep pmon

[oracle@oel6 ~]$ cat test.d
#!/usr/sbin/dtrace -qs
syscall:::entry
/pid == $1/
{
  @num[probefunc] = count();
}

[oracle@oel6 ~]$ ./test.d 3436


  mmap                                                              1
  munmap                                                            1
  newfstat                                                          1
  getrusage                                                         4
  poll                                                              4
  times                                                             6
  close                                                            19
  open                                                             19
  read                                                             19

That’s all for today 🙂 .
 

How Oracle 12.2 manage Editions.

Recently, I wrote a blog post about suspected new editions: Enteprise Core edition and Standard Core Edition.

This morning, I received a mention from twitter coming from Franck Pachot:

In GV$INSTANCE definition, we can see that edition is encoded in X$KSUXSINST view (in the column KSUXSEDITION). The corresponding code is this one:

  • 2 = PO = Personal Ed.
  • 4 = SE = Standard Ed.
  • 8 = EE = Enterprise Ed.
  • 16 = XE = eXpress Ed.
  • 32 = CS = Standard Core Ed.
  • 64 = CE = Enterprise Core Ed.
  • 128 = HP = Enterprise Ed. High Perf
  • 256 = XP = Enteprise Ed. Extreme Perf

How Oracle use these codes to determine Oracle Edition ?

When, you relink your Oracle kernel, you use ins_rdbms.mk makefile located in $ORACLE_HOME/rdbms/lib and you run a make command with specific targets.

For example, if you wan to link Core Enterprise Edition, you will run this (See my previous blog post):

$ make -f ins_rdbms.mk edition_coreenterprise ioracle
Deploying Oracle Database Core Enterprise Edition
mv -f /u01/app/oracle/product/12.2.0/dbhome_1/lib/libvsn12.a /u01/app/oracle/product/12.2.0/dbhome_1/lib/libvsn12_backup.a.dbl
cp /u01/app/oracle/product/12.2.0/dbhome_1/lib/libvsn12_cee.a.dbl /u01/app/oracle/product/12.2.0/dbhome_1/lib/libvsn12.a
chmod 755 /u01/app/oracle/product/12.2.0/dbhome_1/bin

In fact, it uses a library which is copied as $ORACLE_HOME/lib/libvsn12.a and then this library is linked to Oracle kernel.

There are many libraries in the 12.2 kernel:

$ cd $ORACLE_HOME/lib
$ ls libvsn* -l
-rw-r--r-- 1 oracle oinstall 10774 Nov 28 17:47 libvsn12.a
-rw-r--r-- 1 oracle oinstall 10742 Oct  4 00:46 libvsn12.a.default
-rw-r--r-- 1 oracle oinstall 10774 Nov 28 17:40 libvsn12_backup.a.dbl
-rw-r--r-- 1 oracle oinstall 10790 Oct  4 00:46 libvsn12_cee.a.dbl
-rw-r--r-- 1 oracle oinstall 10790 Oct  4 00:46 libvsn12_cse.a.dbl
-rw-r--r-- 1 oracle oinstall 10774 Oct  4 00:46 libvsn12_hp.a.dbl
-rw-r--r-- 1 oracle oinstall 10766 Oct  4 00:46 libvsn12_std.a.dbl
-rw-r--r-- 1 oracle oinstall 10774 Oct  4 00:46 libvsn12_xp.a.dbl

After a closer look inside those libraries, I found they are containing two object files (.o)

$ ar -t libvsn12_hp.a.dbl
vsnhp.o
vsnfhp.o
$ ar -t libvsn12_cee.a.dbl
vsncee.o
vsnfcee.o
$ ar -t libvsn12_std.a.dbl
vsnstd.o
vsnfstd.o

etc...

After extracting these files from the library, and reading the ELF section, we can see:

  • the vsnXXX.o file contains the banner
  • the vsnfXXX.o file contains other things but I don’t know what yet.
$ ar -x libvsn12_cee.a.dbl
$ objdump -s vsncee.o

vsncee.o:     file format elf64-x86-64

Contents of section .comment:

.../...

Contents of section .rodata:
 0000 4f726163 6c652044 61746162 61736520  Oracle Database
 0010 31326320 456e7465 72707269 73652045  12c Enterprise E
 0020 64697469 6f6e202d 20436f72 65202564  dition - Core %d
 0030 2e25642e 25642e25 642e2564 20257300  .%d.%d.%d.%d %s.
 0040 00000000 40000000 2d203634 62697420  ....@...- 64bit
 0050 50726f64 75637469 6f6e0000 00000000  Production......

$ objdump -s vsnfcee.o

vsnfcee.o:     file format elf64-x86-64

Contents of section .comment:

.../...

Contents of section text.unlikely:
 0000 90909090 90909090 90909090 90909090  ................
 0010 66905548 89e5b840 00000048 89ec5dc3  f.UH...@...H..].
Contents of section .eh_frame:
 0000 14000000 00000000 01000178 100c0708  ...........x....
 0010 90010000 00000000 2c000000 1c000000  ........,.......
 0020 00000000 00000000 10000000 00000000  ................
 0030 04030000 000e1004 03000000 0c061086  ................
 0040 02040900 0000c600                    ........

If we compare these sections (text.unlikely and .eh_frame) between many Edition library files, we can see the following points:

  • Eh_frame section contains exception unwinding and source language information. They shouldn’t be very different:
$ objdump -s -j .eh_frame vsnfcee.o vsnfxp.o vsnfhp.o vsnfstd.o vsnfcse.o

vsnfcee.o:     file format elf64-x86-64

Contents of section .eh_frame:
 0000 14000000 00000000 01000178 100c0708  ...........x....
 0010 90010000 00000000 2c000000 1c000000  ........,.......
 0020 00000000 00000000 10000000 00000000  ................
 0030 04030000 000e1004 03000000 0c061086  ................
 0040 02040900 0000c600                    ........

vsnfxp.o:     file format elf64-x86-64

Contents of section .eh_frame:
 0000 14000000 00000000 01000178 100c0708  ...........x....
 0010 90010000 00000000 2c000000 1c000000  ........,.......
 0020 00000000 00000000 10000000 00000000  ................
 0030 04030000 000e1004 03000000 0c061086  ................
 0040 02040900 0000c600                    ........

vsnfhp.o:     file format elf64-x86-64

Contents of section .eh_frame:
 0000 14000000 00000000 01000178 100c0708  ...........x....
 0010 90010000 00000000 2c000000 1c000000  ........,.......
 0020 00000000 00000000 10000000 00000000  ................
 0030 04030000 000e1004 03000000 0c061086  ................
 0040 02040900 0000c600                    ........

vsnfstd.o:     file format elf64-x86-64

Contents of section .eh_frame:
 0000 14000000 00000000 01000178 100c0708  ...........x....
 0010 90010000 00000000 2c000000 1c000000  ........,.......
 0020 00000000 00000000 10000000 00000000  ................
 0030 04030000 000e1004 03000000 0c061086  ................
 0040 02040900 0000c600                    ........

vsnfcse.o:     file format elf64-x86-64

Contents of section .eh_frame:
 0000 14000000 00000000 01000178 100c0708  ...........x....
 0010 90010000 00000000 2c000000 1c000000  ........,.......
 0020 00000000 00000000 10000000 00000000  ................
 0030 04030000 000e1004 03000000 0c061086  ................
 0040 02040900 0000c600                    ........

Indeed, these sections are identical.

  • Now let’s see the .text.unlikely section:
$ objdump -s -j text.unlikely vsnfcee.o vsnfxp.o vsnfhp.o vsnfstd.o vsnfcse.o

vsnfcee.o:     file format elf64-x86-64

Contents of section text.unlikely:
 0000 90909090 90909090 90909090 90909090  ................
 0010 66905548 89e5b840 00000048 89ec5dc3  f.UH...@...H..].

vsnfxp.o:     file format elf64-x86-64

Contents of section text.unlikely:
 0000 90909090 90909090 90909090 90909090  ................
 0010 66905548 89e5b800 01000048 89ec5dc3  f.UH.......H..].

vsnfhp.o:     file format elf64-x86-64

Contents of section text.unlikely:
 0000 90909090 90909090 90909090 90909090  ................
 0010 66905548 89e5b880 00000048 89ec5dc3  f.UH.......H..].

vsnfstd.o:     file format elf64-x86-64

Contents of section text.unlikely:
 0000 90909090 90909090 90909090 90909090  ................
 0010 66905548 89e5b804 00000048 89ec5dc3  f.UH.......H..].

vsnfcse.o:     file format elf64-x86-64

Contents of section text.unlikely:
 0000 90909090 90909090 90909090 90909090  ................
 0010 66905548 89e5b820 00000048 89ec5dc3  f.UH... ...H..].

At the first look, they seem identical … but they are not, the section highlighted in red is different, and as my platform is linux and so little-endian platform, we have to read the highlighted blocks like this:

  • vsnfstd.o : 0x00 04 = 4
  • vsnfcse.o 0x00 20 = 32
  • vsnfcee.o : 0x00 40 = 64
  • vsnfhp.o : 0x00 80 = 128
  • vsnfxp.o : 0x01 00 = 256

So Oracle have just some libraries that encore the edition, and this code seems to enable some option at the runtime depending on the value included in the library (which is much more secure than having a list of enabled options embedded in a library).

Another thing to mention, in my instance, no trace of Enterprise Edition nor Express and Personal Edition, even if they are coded in GV$INSTANCE view’s code.

UPDATE: It seems that Stefan Koehler (@OracleSK) has the same conclusion but using another method:

Tracing Oracle binaries and tools

Oracle is well known for its instrumentation: wait interface, log files and tracing capabilities.

Usually, lots of them are documented but the way you can trace binaries and tools are usually not. More, reading trace files is not an easy thing to do but some times you can find clues about the problem you face.

In this post, I will talk about how to enable tracing when you use some binaries.

RMAN

RMAN is the Oracle utility used to backup and recover your databases. To enable tracing is RMAN, you have to switch rman is “debug” mode.

This mode can be enabled directly in rman:

 


[oracle@oel7 ~]$ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Feb 24 15:45:13 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL (DBID=1413029491)

RMAN> debug on

RMAN-03036: Debugging set to level=9[MEDIUM], types=ALL

DBGSQL: TARGET> begin dbms_rcvman.setDebugOn; dbms_rcvman.dumpPkgState('Debug On'); end;
DBGSQL: sqlcode = 0
DBGRCVMAN: ENTERING dumpPkgState Debug On

.../...

Or (much better) it can be enabled with the debug option in the rman command, and by specifying a trace file :


[oracle@oel7 ~]$ rman target / debug trace=tracefile.trc

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Feb 24 15:22:10 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

RMAN-06005: connected to target database: ORCL (DBID=1413029491)

RMAN> disconnect

All the debug stuff is then located in your “tracefile.trc” file.

You can enable tracing on specific component in rman :

  • SQL: for tracing SQL queries generated by rman
  • RESTORE: for tracing restore operation only
  • RECOVER: for tracing recovery operation only
  • RESYNC: for tracing synchronization operation only
  • DUPLICATE: for tracing duplicate operation only

And for each or all of these component, you can set a verbosity level from 1 (lowest) to 10 (highest). 9 is the default verbosity level. Sometimes, you can change the verbosity level without any change in your tracefile.

Here are some examples (point that many components can be traced at the same time):


[oracle@oel7 ~]$ rman target / debug recover level=10 trace=tracefile.trc

[oracle@oel7 ~]$ rman target / debug sql level=1 trace=tracefile.trc

RMAN> debug restore,sql level=10

DGMGRL

DGMGRL is the tool used to control dataguard broker and operates some changes in your dataguard configuration: add config, initiate a switchover or a failover etc.

Dgmgrl can be switched to debug mode to get more information about a failed command.


[oracle@dg1 ~]$ dgmgrl -debug sys/oracle@prod_lille
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
[W000 02/24 16:14:15.62] Connecting to database using prod_lille.
[W000 02/24 16:14:15.65] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/24 16:14:15.65] Broker version is '11.2.0.4.0'
Connected.

This option can be very useful specially when you perform a switchover operation, all operations are logged and you can follow every connection, disconnection, tns errors, instance startup etc.

Below is an example of a successful switchover in debug mode:


DGMGRL> switchover to prod_arras;
Performing switchover NOW, please wait...
Operation requires a connection to instance "prod" on database "prod_arras"
Connecting to instance "prod"...
[W000 02/24 16:31:42.31] Connecting to database using (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=dg2.localdomain)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=dg1.localdomain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=prod_arras_DGB)(INSTANCE_NAME=prod))).
[W000 02/24 16:31:42.33] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/24 16:31:42.33] Broker version is '11.2.0.4.0'
Connected.
New primary database "prod_arras" is opening...
Operation requires startup of instance "prod" on database "prod_lille"
Starting instance "prod"...
[W000 02/24 16:31:50.20] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dg1.localdomain)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=prod_lille_DGMGRL)(INSTANCE_NAME=prod)(SERVER=DEDICATED))).
[W000 02/24 16:31:50.21] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0

ORACLE instance started.
[W000 02/24 16:31:51.52] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dg1.localdomain)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=prod_lille_DGMGRL)(INSTANCE_NAME=prod)(SERVER=DEDICATED))).
[W000 02/24 16:31:51.54] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/24 16:31:51.55] Broker version is '11.2.0.4.0'
alter database mount

Database mounted.
[W000 02/24 16:31:59.85] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dg1.localdomain)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=prod_lille_DGMGRL)(INSTANCE_NAME=prod)(SERVER=DEDICATED))).
[W000 02/24 16:31:59.87] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/24 16:31:59.87] Broker version is '11.2.0.4.0'
Switchover succeeded, new primary is "prod_arras"

EXPDP / IMPDP

Datapump export and import operation can be traced too. But for these tools, Oracle published a note on how to enable tracing a datapump operation. The MOS note is “Export/Import DataPump Parameter TRACE – How to Diagnose Oracle Data Pump (Doc ID 286496.1)”.

To trace a datapump process, you can do it at different levels, client process (expdp/impdp), master process (DM), worker process (DW) and you can trace different services. That’s why Oracle used a TRACE parameter. This one in coded with a hexadecimal value that code for a type of services you want to trace, and this service will generate trace in different processes (DM, DW, etc.).

The Trace value depends on which service you want to trace:


-- Summary of Data Pump trace levels:
-- ==================================

  Trace   DM   DW  ORA  Lines
  level  trc  trc  trc     in
  (hex) file file file  trace                                         Purpose
------- ---- ---- ---- ------ -----------------------------------------------
  10300    x    x    x  SHDW: To trace the Shadow process (API) (expdp/impdp)
  20300    x    x    x  KUPV: To trace Fixed table
  40300    x    x    x  'div' To trace Process services
  80300    x            KUPM: To trace Master Control Process (MCP)      (DM)
 100300    x    x       KUPF: To trace File Manager
 200300    x    x    x  KUPC: To trace Queue services
 400300         x       KUPW: To trace Worker process(es)                (DW)
 800300         x       KUPD: To trace Data Package
1000300         x       META: To trace Metadata Package
------- 'Bit AND'
1FF0300    x    x    x  'all' To trace all components          (full tracing)

Here’s an example with full tracing:


[oracle@oel7 ~]$ expdp system/oracle directory=dpdir dumpfile=test.dmp logfile=test.log tables=HR.REGIONS REUSE_DUMPFILES=YES trace=FFF0300

In the trace files generated, you will find lots of very interesting stuff (for example below, the trace file of the expdp process)


*** 2016-02-24 18:25:33.740
*** SESSION ID:(135.7145) 2016-02-24 18:25:33.740
*** CLIENT ID:() 2016-02-24 18:25:33.740
*** SERVICE NAME:(SYS$USERS) 2016-02-24 18:25:33.740
*** MODULE NAME:(ude@oel7.localdomain (TNS V1-V3)) 2016-02-24 18:25:33.740
*** CLIENT DRIVER:() 2016-02-24 18:25:33.740
*** ACTION NAME:() 2016-02-24 18:25:33.740

SHDW: *** GET_STATUS_VERSION call ***
SHDW: Version = 9999
KUPP:18:25:33.741: Input trace/debug flags: 0FFF0300 = 268370688
KUPP:18:25:33.741: Current trace/debug flags: 0FFF0300 = 268370688
SHDW:18:25:33.741: Current user = SYSTEM
SHDW:18:25:33.741: Current schema = SYSTEM
SHDW:18:25:33.741: Current language = AMERICAN_AMERICA.AL32UTF8
SHDW:18:25:33.742: Current session address = 00000001B7D050F8
SHDW:18:25:33.742: *** OPEN call ***
SHDW:18:25:33.742: operation = EXPORT
SHDW:18:25:33.742: job_mode = table
SHDW:18:25:33.742: remote_link =
SHDW:18:25:33.742: job_name =
SHDW:18:25:33.742: version =
SHDW:18:25:33.742: compression = 2
KUPV:18:25:33.743: Open request for job: SYSTEM.SYS_EXPORT_TABLE_01
KUPV:18:25:33.745: Master Table create statement: CREATE TABLE "SYSTEM"."SYS_EXPORT_TABLE_01" (abort_step NUMBER,ancestor_process_order NUMBER,base_object_name VARCHAR2(30),base_object_schema VARCHAR2(30),base_object_type VARCHAR2(30),base_process_order NUMBER,block_size NUMBER,cluster_ok NUMBER,completed_bytes NUMBER,completed_rows NUMBER,completion_time DATE,control_queue VARCHAR2(30),creation_level NUMBER,cumulative_time NUMBER,data_buffer_size NUMBER,data_io NUMBER,dataobj_num NUMBER,db_version VARCHAR2(60),degree NUMBER,domain_process_order NUMBER,dump_allocation NUMBER,dump_fileid NUMBER,dump_length NUMBER,dump_orig_length NUMBER,dump_position NUMBER,duplicate NUMBER,elapsed_time NUMBER,error_count NUMBER,extend_size NUMBER,file_max_size NUMBER,file_name VARCHAR2(4000),file_type NUMBER,flags NUMBER,grantor VARCHAR2(30),granules NUMBER,guid RAW(16),in_progress CHAR(1),instance VARCHAR2(60),instance_id NUMBER,is_default NUMBER,job_mode VARCHAR2(21),job_version VARCHAR2(60),last_file NUMBER,last_update DA
TE,load_method NUMBER,metadata_buffer_size NUMBER,metadata_io NUMBER,name VARCHAR2(30),object_int_oid VARCHAR2(32),object_long_name VARCHAR2(4000),object_name VARCHAR2(200),object_number NUMBER,object_path_seqno NUMBER,object_row NUMBER,object_schema VARCHAR2(30),object_tablespace VARCHAR2(30),object_type VARCHAR2(30),object_type_path VARCHAR2(200),old_value VARCHAR2(4000),operation VARCHAR2(8),option_tag VARCHAR2(30),orig_base_object_schema VARCHAR2(30),original_object_name VARCHAR2(128),original_object_schema VARCHAR2(30),packet_number NUMBER,parallelization NUMBER,parent_process_order NUMBER,partition_name VARCHAR2(30),phase NUMBER,platform VARCHAR2(101),process_name VARCHAR2(30),process_order NUMBER,processing_state CHAR(1),processing_status CHAR(1),property NUMBER,queue_tabnum NUMBER,remote_link VARCHAR2(128),scn NUMBER,seed NUMBER,service_name VARCHAR2(64),size_estimate NUMBER,start_time DATE,state VARCHAR2(12),status_queue VARCHAR2(30),subpartition_name VARCHAR2(30),target_xml_clob CLOB,tde_rewrapped_
key RAW(2000),template_table VARCHAR2(30),timezone VARCHAR2(64),total_bytes NUMBER,trigflag NUMBER,unload_method NUMBER,user_directory VARCHAR2(4000),user_file_name VARCHAR2(4000),user_name VARCHAR2(30),value_n NUMBER,value_t VARCHAR2(4000),version NUMBER,work_item VARCHAR2(21),xml_clob CLOB) SEGMENT CREATION IMMEDIATE INITRANS 10
KUPV:18:25:33.780: (Re)Creating job: SYSTEM.SYS_EXPORT_TABLE_01

.../...

SRVCTL

srvctl is the program that control clusterware resources. You can use it in a RAC configuration, or with Oracle Restart.

If, for example, a ressource doesn’t start in your cluster, or you would like to have more information about the process. You can set a specific environment variable named SRVM_TRACE to the value you want and then run your command. srvctl just look if the variable is defined and then it traces. The debug information is written in the standard output.

See demonstration below:


[oracle@rac1 ~]$ env|grep SRVM
[oracle@rac1 ~]$ srvctl status database -db cdb
Instance cdb1 is running on node rac1
Instance cdb2 is not running on node rac2
[oracle@rac1 ~]$ export SRVM_TRACE=laurent
[oracle@rac1 ~]$ srvctl status database -db cdb
[main] [ 2016-02-24 18:43:08.601 CET ] [OPSCTLDriver.setInternalDebugLevel:325] tracing is true at level 2 to file null
[main] [ 2016-02-24 18:43:08.607 CET ] [OPSCTLDriver.main:162] SRVCTL arguments : args[0]=status args[1]=database args[2]=-db args[3]=cdb
[main] [ 2016-02-24 18:43:08.608 CET ] [SRVMContext.init:114] Performing SRVM Context init. Init Counter=1
[main] [ 2016-02-24 18:43:08.622 CET ] [Library.load:194] library.load
[main] [ 2016-02-24 18:43:08.623 CET ] [sPlatform.isHybrid:66] osName=Linux osArch=amd64 JVM=64 rc=false
[main] [ 2016-02-24 18:43:08.624 CET ] [sPlatform.isHybrid:66] osName=Linux osArch=amd64 JVM=64 rc=false
[main] [ 2016-02-24 18:43:08.624 CET ] [Library.load:271] Loading library /u01/app/oracle/product/12.1.0/dbhome_1/lib/libsrvmhas12.so
[main] [ 2016-02-24 18:43:08.635 CET ] [SRVMContext.init:131] SRVM Context init-ed
[main] [ 2016-02-24 18:43:08.648 CET ] [Version.isPre:610] version to be checked 12.1.0.2.0 major version to check against 10
[main] [ 2016-02-24 18:43:08.648 CET ] [Version.isPre:621] isPre.java: Returning FALSE
[main] [ 2016-02-24 18:43:08.651 CET ] [OCR.loadLibrary:328]
Inside constructor of OCR
[main] [ 2016-02-24 18:43:08.652 CET ] [OCR.loadLibrary:336] Going to load the ocr library

.../...

[oracle@rac1 ~]$ unset SRVM_TRACE
[oracle@rac1 ~]$ srvctl status database -db cdb
Instance cdb1 is running on node rac1
Instance cdb2 is not running on node rac2

OPATCH

Opatch can be traced with an environment variable positioned like srvctl. This variable is OPATCH_DEBUG and it has to be set to TRUE. As for srvctl, information is written to standard output:


[oracle@rac1 ~]$ export OPATCH_DEBUG=TRUE
[oracle@rac1 ~]$ $ORACLE_HOME/OPatch/opatch lsinv
OPatch was not able to set FMW_COMPONENT_HOME by itself.
Machine Info: Linux rac1.localdomain 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
_osArch is amd64
_javaVMSpecVersion is 1.0
_javaVMSpecVendor is Sun Microsystems Inc.
_javaVMSpecName is Java Virtual Machine Specification
_javaVMVendor is Sun Microsystems Inc.
_javaJRESpecVersion is 1.6
_javaJRESpecVendor is Sun Microsystems Inc.
_javaJRESpecName is Java Platform API Specification
_javaSupportedClassVersion is 50.0
OPatch compiled with major version: 50, minor version: 0
_osArch (from OCM API) is
/u01/app/oracle/product/12.1.0/dbhome_1/jdk/bin/java -Xms150m -Xmx256m -XX:MaxPermSize=128M -cp /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/ocm/lib/emocmclnt.jar:/u01/app/oracle/product/12.1.0/dbhome_1/oui/jlib/OraInstaller.jar:/u01/app/oracle/product/12.1.0/dbhome_1/oui/jlib/OraPrereq.jar:/u01/app/oracle/product/12.1.0/dbhome_1/oui/jlib/OraCheckPoint.jar:/u01/app/oracle/product/12.1.0/dbhome_1/oui/jlib/share.jar:/u01/app/oracle/product/12.1.0/dbhome_1/oui/jlib/orai18n-mapping.jar:/u01/app/oracle/product/12.1.0/dbhome_1/oui/jlib/xmlparserv2.jar:/u01/app/oracle/product/12.1.0/dbhome_1/oui/jlib/emCfg.jar:/u01/app/oracle/product/12.1.0/dbhome_1/oui/jlib/ojmisc.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/jlib/opatch.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/jlib/opatchsdk.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/automation.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/apache-commons/commons-cli-1.0.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/jlib/oracle.opatch.classpath.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/jaxb/activation.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/jaxb/jaxb-api.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/jaxb/jaxb-impl.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/jaxb/jsr173_1.0_api.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/OsysModel.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/osysmodel-utils.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/CRSProductDriver.jar:/u01/app/oracle/product/12.1.0/dbhome_1/OPatch/oplan/jlib/oracle.oplan.classpath.jar -DOPatch.ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 -DOPatch.DEBUG=true -DOPatch.RUNNING_DIR=/u01/app/oracle/product/12.1.0/dbhome_1/OPatch -DOPatch.MW_HOME= -DOPatch.WL_HOME= -DOPatch.COMMON_COMPONENTS_HOME= -DOPatch.OUI_LOCATION= -DOPatch.FMW_COMPONENT_HOME= -DOPatch.OPATCH_CLASSPATH= -DOPatch.WEBLOGIC_CLASSPATH= oracle/opatch/OPatch lsinv -invPtrLoc /u01/app/oracle/product/12.1.0/dbhome_1/oraInst.loc
Oracle Interim Patch Installer version 12.1.0.1.3
Copyright (c) 2016, Oracle Corporation. All rights reserved.

OPatchSession::parse() on "lsinv","-invPtrLoc","/u01/app/oracle/product/12.1.0/dbhome_1/oraInst.loc",

Argument is "lsinv"
Add commands for Help
add command "apply"

.../...

Oracle Kernel Tracing

Oracle Kernel tracing can be done by enabling some events.

Event is a number between 10000 and 10999. It’s very similar to an error code … but it’s not a real one.

The most popular events are 10046 (SQL Trace) and 10053 (CBO debug trace) …but there are plenty of other ones and you can find details in oraus.msg file (or you can plan with oerr utility).

Events can be positioned by many ways :

  • With an ALTER SESSION (It’s not recommended to use ALTER SYSTEM when you enable events)
SQL> alter session set events '10046 trace name context forever, level 12';
SQL> alter session set events '10046 trace name context off';
  • With an oradebug command. This method offers the possibility to set an event in another session

For example, if you want to enable 10053 event in the session that have the OS PID 5683 (You can use SETMYPID for debugging your own session, or SETORAPID to target a session with a specific Oracle PID)


SQL> oradebug setospid 5683; 
SQL> oradebug event 10053 trace name context forever, level 1;
SQL> oradebug event 10053 trace name context off;
  • With DBMS_SYSTEM.SET_EV. Like ORADEBUG, you can set an event in another session

 


SQL> -- Enable SQL Trace in session with SID 6278 and Serial 71077
SQL> execute dbms_system.set_ev(si=>6278, se=>71077, ev=>10046, le=>12, nm=>'');
SQL> -- Disable SQL Trace in session with SID 6278 and Serial 71077
SQL> execute dbms_system.set_ev(si=>6278, se=>71077, ev=>10046, le=>0, nm=>'');

SQL> -- Enable CBO debug in session with SID 6278 and Serial 71077
SQL> execute dbms_system.set_ev(si=>6278, se=>71077, ev=>10053, le=>1, nm=>'');
SQL> -- Disable CBO debug in session with SID 6278 and Serial 71077
SQL> execute dbms_system.set_ev(si=>6278, se=>71077, ev=>10046, le=>0, nm=>'');

For all of these commands, the trace file will be available in the folder referenced by user_dump_dest.

Starting with Oracle 11.2, Oracle come with a new way to trace its components. This is named UTS (Unified Tracing System).

UTS offers the possibility to trace some components (like with events) but you can combine it with filters (for example, trace CBO computation when a specific SQL_ID is parsed etc.)

Note: Events are still available and they have’nt been replaced by UTS … UTS completes them.

The component list can be get with “oradebug doc component” command.

I will not write some detailed stuff about UTS because very interesting guys in the community have already written blog posts and articles on this.

You can visit them by following these URLs :

 

Last but not least, tracing everything without knowing what you are doing can be dangerous (Filesystem full filling, database hangs and/or crash), so it’s recommended to use tracing very carefully.

 

Thanks to my friend Pierre Labrousse who helped me for parts of this post.

SIMD Extensions in and out Oracle 12.1.0.2

First of all, I would like to thank Tanel Pöder from Enkitec Accenture for its review of this post and some precious information he gave me.

—-

Recently I posted a link on twitter which explains basics of SIMD Programming (https://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html), and I had a reply which asked me if it was Oracle 12c style, and the answer is … yes and no.

What is a SIMD extension?

A SIMD Extension is a CPU instruction that computes many data in only one instruction (Single Instruction Multiple Data). Imagine, you have 2 arrays of 4 integers, and you want to compute a sum of those 2 arrays. A classical way will be to loop on each value and add them one by one and to get the result in another array. This operation will produce 4 operations.

Now Imagine, your arrays are now located in a vector of 4 integers, those 2 vectors are in fact specific registers and with only one CPU instruction, you will add those 2 vectors by producing only one vector. You reduce CPU instructions by 4 … for the same result.

If it’s not clear, don’t go away … I have written small C sample code to demonstrate this.

A bit of history

SIMD extensions are not quite recent. They have been created in 1970 with vector programming.

In 1996, SIMD extensions have been widely deployed with MMX extensions (which are SIMD extensions), then Alvitec systems with motorola processors and IBM Power systems have developed more powerful instructions. Then Intel reveals its new SSE extensions in 1999 that have been improved by other extension SSE2, SSE3, SSSE3, SSE4 and now AVX, AVX2 and AVX512 extensions.

So Oracle is not using a specific extension but those which are available on your platform, because all CPUs are not offering the same extensions. For example, modern processors have AVX extensions, but most recent extension (AVX-512) are only available in Xeon Phi Knights Landing and Xeon Skylake microarchitectures (broadwell successors).

Data Structures

SIMD extensions are based on data structures or vectors.

A vector is an array data structure (don’t be confused with an array datatype) which have a fixed length and which is, in fact, a succession of scalars of one type.

For example, if you have a vector of 64 bits (8 bytes), you can put in it 2 integers because an integer has a 4 bytes size (in x86-64 arch), 8 chars (1 bytes) but only one double (8 bytes long).

Those data structures are located is CPU registers dedicated for those SIMD instructions.

Let’s take an example, you want to process the sum of two vectors in a processor which uses only MMX instructions (old one 😉 ) have 8 registers (MM0 through MM7). Each register holds 64 bits.

First vector content is 1,2 and second one is 1,2. First vector is copied from memory to MM0 register and the second in MM1, and then the CPU launch the SIMD instruction that will produce in MM0 the sum on MM1 and MM0, and then MM0 is copied in memory as a result.

Now imagine, your vector doesn’t hold 64 bits but 128, 256, 512 or 1024 … you will put in it more data and those data will be computed with only one operation …

It’s one of the key of SIMD evolution, MMX uses 64 bits registers (MM0 to MM7), SSE (1/2/3 and 4) uses 128 bits registers (XMM), AVX (1/2) uses 256 bits registers (YMM), and AVX-512 uses 512 bits registers (ZMM).

For Intel processors, vector datatypes are __m64, __mm128, __mm256, and __mm512 (each vector will contain floating point value aka float), you have the equivalent for double precision values (__mm128d, __mm256d, __mm512d) and for other types : int, short, char (__mm128i, __mm256i, __mm512i).

Note: Note that all those types are automatically aligned on a 8, 16, 32 or 64 bytes boundaries.

Now computing data

Now you know how will be computed your data, you can perform operation on it. You can add, multiply your vectors, perform bit shifting etc.

You have the choice to do “classical” operations, or you can use Intel’s intrinsics which are functions which computes a specific operation (basic mathematics, bit shifting, comparisons etc.). All of Intel’s Intrinsics are available at this URL: https://software.intel.com/sites/landingpage/IntrinsicsGuide/. On this page you can also see performance information of each function on different processors.

 Examples

For all examples above, I used C langage.

Compiling “SIMD aware” programs (with GCC)

If you want to compile SIMD aware program, you have to include “immintrin.h” header file which is available with GCC. This header will test which extension you have, and you have used for you compilation. (Just find this file and open it). Depending on your CPU and compilation, it will include another header file:

  • mmintrin.h for MMX instructions and datatypes:
  • xmmintrin.h for SSE
  • emmintrin.h for SSE2
  • pmmintrin.h for SSE3
  • tmmintrin.h for SSSE3
  • smmintrin.h for SSE4.1 and SSE4.2
  • avxintrin.h for AVX

When you compile your program, some extensions are not included by default. Indeed if your CPU supports AVX extensions, if you don’t give the correct option to the compiler, AVX won’t be used.

Main options are:

  • O3: this option enable vectorization loops optimization.
  • msse4.1: this option enable SSE4.1 extension
  • msse4.2: this option enable SSE4.2 extension
  • mavx: this option enable AVX extension
  • mavx2: this option enable AVX2 extension

Other options are available here: https://gcc.gnu.org/onlinedocs/gcc-4.4.7/gcc/i386-and-x86_002d64-Options.html

To demonstrate this, I used a small program:


#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>

void print_extensions () {
#ifdef __MMX__
printf("MMX ... OK\n");
#else
printf("MMX ... KO\n");
#endif

#ifdef __SSE__
printf("SSE ... OK\n");
#else
printf("SSE ... KO\n");
#endif

#ifdef __SSE2__
printf("SSE2 ... OK\n");
#else
printf("SSE2 ... KO\n");
#endif

#ifdef __SSE3__
printf("SSE3 ... OK\n");
#else
printf("SSE3 ... KO\n");
#endif

#ifdef __SSSE3__
printf("SSSE3 ... OK\n");
#else
printf("SSSE3 ... KO\n");
#endif

#if defined (__SSE4_2__) || defined (__SSE4_1__)
printf("SSE4_1/2 ... OK\n");
#else
printf("SSE4_1/2 ... KO\n");
#endif

#if defined (__AES__) || defined (__PCLMUL__)
printf("AES/PCLMUL ... OK\n");
#else
printf("AES/PCLMUL ... KO\n");
#endif

#ifdef __AVX__
printf("AVX ... OK\n");
#else
printf("AVX ... KO\n");
#endif
}

int main(int argc, char** argv) {
print_extensions();
return 0;
}

If you run it with only O3 optimization, you will get this result:


macbook-laurent:simd $ sysctl -a | egrep 'cpu.*features'
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ENFSTRG RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM
machdep.cpu.extfeatures: SYSCALL 1GBPAGE EM64T LAHF RDTSCP TSCI

macbook-laurent:simd $ cc -O3 -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... KO
AES/PCLMUL ... KO
AVX ... KO

If you run with correct options, your program can use AVX or SSE4 extensions:

macbook-laurent:simd $ cc -O3 -msse4.2 -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... OK
AES/PCLMUL ... KO
AVX ... KO
macbook-laurent:simd $ cc -O3 -mavx -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... OK
AES/PCLMUL ... KO
AVX ... OK

Note that if you enable AVX extension, SSE4 extensions are enabled by default.

Example of SSE2 usage in a basic operation (sum)

The C code above will show you how to perform a sum of two arrays of 16 integers each without using Intel intrinsics:


void func2_sse() {
int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
int b[16] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
__m128i* aptr;
__m128i* bptr;
int i;
int loopcnt=0;
printf("sizeof(__m128i)=%lu\n",sizeof(__m128i));
printf("sizeof(a)=%lu\n",sizeof(a));

// Above, we cast integer arrays to vectors of integers

aptr=(__m128i*)a;
bptr=(__m128i*)b;

// and now we compute the sum
for (i=0;i<sizeof(a)/sizeof(__m128i);i++) {
loopcnt++;
bptr[i]=aptr[i]+bptr[i];
}

int* c=(int*)bptr;

printf("loopcount = %d\nresult= ",loopcnt);
for (i=0;i<16;i++) {
printf("%d ",c[i]);
}
printf("\n");
}

and the result, my sum has been computed in only 4 loops:


SSE
--------------------
sizeof(__m128i)=16
sizeof(a)=64
loopcount = 4
result= 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Same example with AVX extension:


void func2_avx() {
 int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
 int b[16] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
 __m256i* aptr;
 __m256i* bptr;
 int i;
 int loopcnt=0;
 printf("sizeof(__m256i)=%lu\n",sizeof(__m256i));
 printf("sizeof(a)=%lu\n",sizeof(a));
 aptr=(__m256i*)a;
 bptr=(__m256i*)b;

 for (i=0;i<sizeof(a)/sizeof(__m256i);i++) {
 loopcnt++;
 bptr[i]=aptr[i]+bptr[i];
 }

 int* c=(int*)bptr;

 printf("loopcount = %d\nresult= ",loopcnt);
 for (i=0;i<16;i++) {
 printf("%d ",c[i]);
 }
 printf("\n");
}

and the result, my sum has been computed in only 2 loops:


AVX
--------------------
sizeof(__m256i)=32
sizeof(a)=64
loopcount = 2
result= 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

 Now, let’s compare two data sets with SIMD extension

Next code sample concerns a vector where we want to search the value 10. To do that, we use a comparison function and a function to build a 256bits (AVX) vector full of the value we search. The comparison function works with 32bits packets (useful to compare integers) and returns 0xFFFFFFFF if both values are equal, 0x0 otherwise. As it’s an AVX function, our initial vector composed by 16 values is processed in only 2 CPU cycles.

void func2_compare_32bitsPack() {
    int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    __m256i* aptr;
    __m256i b;
    int i;
    int loopcnt=0;
    aptr=(__m256i*)a;
    // b is a vector full off int(32bits) equal to 10 (the value we search)
    b=_mm256_set1_epi32(10);

    for (i=0;i<sizeof(a)/sizeof(__m256i);i++) {
        loopcnt++;
        // comparison intrinsic function: packed by 32 bits(specific for int: if equal set 0xFFFFFFFF, 0x0 otherwise)
        aptr[i]=_mm256_cmpeq_epi32(aptr[i],b);
    }

    // print results
    int* c=(int*)aptr;

    printf("loopcount = %d\nresult= ",loopcnt);
    for (i=0;i<16;i++) {
        printf("0x%x   ",c[i]);
    }
    printf("\n");
}

And the result:


macbook-laurent:simd $ ./simd
Comparison
loopcount = 2
result= 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xffffffff 0x0 0x0 0x0 0x0 0x0 0x0

It becomes easy to identify that the value 10 is located at the index 10 in our initial array.

Ok, and how SIMD extensions are used in Oracle 12c In Memory ?

If you have read my last post on how to activate SSE4 extensions on VirtualBox guests (https://laurent-leturgez.com/2015/04/14/enable-simd-sse4-extension-in-oracle-virtualbox/) , and Tanel Pöder’s post (https://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/), you have noticed that Oracle can run IM with only SSE2 extension (default), but if your CPUs have SSE4, or AVX extensions, Oracle will use some specific libraries that uses SSE4 (libshpksse4212.so) and AVX (libshpkavx12.so).

If we have a look at functions in those libraries, we will see that every function starts with “kdzk”

[oracle@oel64-112 ~]$ readelf -a /u01/app/oracle/product/12.1.0/dbhome_1/lib/libshpksse4212.a | grep FUNC
 6: 0000000000000030 256 FUNC LOCAL DEFAULT 3 kdzk_overload_opc_name
 23: 0000000000000130 80 FUNC LOCAL DEFAULT 3 kdzk_flag_name
 26: 0000000000000180 112 FUNC LOCAL DEFAULT 3 kdzk_enc_name
 31: 00000000000001f0 320 FUNC LOCAL DEFAULT 3 kdzk_datawidth_name
 64: 0000000000002b70 544 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_1bit
 65: 0000000000002d90 544 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_1bit
 66: 0000000000002fb0 544 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_1bit
 67: 00000000000031d0 592 FUNC LOCAL DEFAULT 3 kdzk_le_dict_1bit
 68: 0000000000003420 592 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_1bit
 69: 0000000000003670 544 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_1bit
 70: 0000000000003890 224 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_1bit
 71: 0000000000003970 576 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_1bit
 72: 0000000000003bb0 576 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_1bit
 73: 0000000000003df0 992 FUNC LOCAL DEFAULT 3 kdzk_ge_le_dict_1bit
 74: 00000000000041d0 512 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_1bit_null
 75: 00000000000043d0 192 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_1bit_null
 76: 0000000000004490 512 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_1bit_null
 77: 0000000000004690 512 FUNC LOCAL DEFAULT 3 kdzk_le_dict_1bit_null
 78: 0000000000004890 464 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_1bit_null
 79: 0000000000004a60 512 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_1bit_null
 80: 0000000000004c60 192 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_1bit_null
 81: 0000000000004d20 528 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_1bit_null
 82: 0000000000004f30 192 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_1bit_null
 83: 0000000000004ff0 528 FUNC LOCAL DEFAULT 3 kdzk_ge_le_dict_1bit_null
 84: 0000000000005200 848 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_2bit_selecti
 85: 0000000000005550 960 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_2bit
 89: 0000000000005910 848 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_2bit_selecti
 90: 0000000000005c60 1056 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_2bit
 91: 0000000000006080 848 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_2bit_selecti
 92: 00000000000063d0 1008 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_2bit
 93: 00000000000067c0 848 FUNC LOCAL DEFAULT 3 kdzk_le_dict_2bit_selecti
 94: 0000000000006b10 1024 FUNC LOCAL DEFAULT 3 kdzk_le_dict_2bit
 95: 0000000000006f10 848 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_2bit_selecti
 96: 0000000000007260 1056 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_2bit
 97: 0000000000007680 848 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_2bit_selecti
 98: 00000000000079d0 960 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_2bit
 99: 0000000000007d90 928 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_2bit_sele
 100: 0000000000008130 1328 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_2bit
 101: 0000000000008660 928 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_2bit_sele
 102: 0000000000008a00 1296 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_2bit
 103: 0000000000008f10 928 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_2bit_sele
 104: 00000000000092b0 1328 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_2bit</pre>

kdzk is the Oracle component that manages compression:


SQL> oradebug doc components

.../...

Components in library ADVCMP:
--------------------------
 ADVCMP_MAIN Archive Compression (kdz)
 ADVCMP_COMP Archive Compression: Compression (kdzc, kdzh, kdza)
 ADVCMP_DECOMP Archive Compression: Decompression (kdzd, kdzs)
 ADVCMP_DECOMP_HPK Archive Compression: HPK (kdzk)
 ADVCMP_DECOMP_PCODE Archive Compression: Pcode (kdp)

An interesting thing to see is that, even you use an Oracle Kernel without any SSE4 nor AVX extension active (so your process doesn’t use libshpksse4212.so nor libshpkavx12.so library), you use kdz functions when you query and filter a table which is managed in Memory.

In a session I run the statements above:


SQL> select segment_name,BYTES,BYTES_NOT_POPULATED from v$im_segments

SEGMENT_NAME         BYTES         BYTES_NOT_POPULATED
-------------------- ------------- -------------------
S                         37748736                   0

SQL> select spid from v$process where addr=(select paddr from v$session where sid=sys_context('USERENV','SID'));

SPID
------------------------
3619

SQL> select count(*) from s where amount_sold>1700;

Just before launching the command, I attach my process and run gdb to catch every call to kdz functions:


[oracle@oel64-112 ~]$ pmap -x 3619 | egrep 'sse|avx'

[oracle@oel64-112 ~]$ gdb -pid 3619
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
.../...
Loaded symbols for /u01/app/oracle/product/12.1.0/dbhome_1/lib/libnque12.so
0x000000362ea0e740 in __read_nocancel () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.4.x86_64 libaio-0.3.107-10.el6.x86_64 numactl-2.0.7-8.el6.x86_64
(gdb) rbreak ^kdz

.../...

(gdb) commands
Type commands for breakpoint(s) 1-2165, one per line.
End with a line saying just "end".
>continue
>end

If you study the output, you will see that a lot of functions are called, and in the list, you will find some interesting functions: kdzdcol_get_minval, kdzdcol_get_maxval, kdzk_build_vector etc. Oracle clearly uses vectors to process IM compression units.

In my opinion, it’s normal to use functions related to compression because the kernel manipulates “Compression Units”, and it should integrates SIMD functions in its libraries.

A last curiosity with Oracle 12c (12.1.0.2)

Ok now you had a look to your installation, your machine is “AVX enabled”, and Oracle processes uses the AVX compatible library (libshpkavx212.so), everything is OK and you think you will use all this stuff.

But if you use objdump on this library, and you search for AVX registers, you won’t find anything:


[oracle@oel64-112 ~]$ grep -i ymm objdump_out.1 | wc -l
0

Tanel Pöder gave me the answer !!! Oracle database code is compiled to be compatible with Redhat/Oracle Linux 5, so it must be compatible with kernel 2.6.18. But linux scheduler can work with YMM registers from version 2.6.30 onwards.

You can use new instructions without the kernel knowing about us, but you can’t use registers that are not yet supported by the kernel.

I think next version of Oracle will improve this, maybe in 12.2.

To conclude, there is not Oracle 12c style for SIMD instructions. Oracle has developed functions that uses SIMD instructions, for Intel CPUs, they uses SSE, SSE2, SSE3, SSE4 or AVX depending on the CPU architecture, on IBM AIX these libraries use VMX extension (SIMD instruction on Power) etc.

Sources:

http://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/

http://en.wikipedia.org/wiki/Data_structure_alignment

http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

http://en.wikipedia.org/wiki/SIMD

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

https://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html

https://laurent-leturgez.com/2015/04/14/enable-simd-sse4-extension-in-oracle-virtualbox/