Data … as usual

All things about data by Laurent Leturgez

Category Archives: 12c

Oracle 12c, VMWare Fusion and the perl binary’s segmentation fault

Recently I installed a new virtual machine in VMware Fusion 7 Pro, running Oracle Enterprise Linux 6.6, to host Oracle 12c Grid Infrastructure and database (12.1.0.2). The host runs Mac OS X 10.9.5 (Mavericks). As usual for my test environments, I deployed the oracle-rdbms-server-12cR1-preinstall rpm package to be sure everything was correctly configured on my system.

During the installation process, a first error occurred during the link edition of rman (“error invoking target irman ioracle of makefile /u01/app/oracle/product/12.1.0/grid/rdbms/lib/ins_rdbms.mk”). This problem can be fixed by copying the libjavavm12.a library located in $ORACLE_HOME/javavm/jdk/jdk7/lib to the $ORACLE_HOME/lib directory, and then retrying the link edition by clicking on the “Retry” button.

So, if you don’t use Grid Infrastructure here or if you install only the database kernel … you won’t have any problem. You may encounter the problems detailed below if you create a multitenant (or single tenant) database (because it needs to run the catcon.pl script … but we will see why later).

In my case, I was deploying Grid Infrastructure, and at the end of the installation process you have to run the root.sh script to configure the CRS stack. At this step, the root.sh script fails with an error in the roothas.pl script and a cute “Segmentation fault (core dumped)” error:


Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
/u01/app/oracle/product/12.1.0/grid/crs/config/rootconfig.sh: line 131: 4562 Segmentation fault (core dumped) $ROOTSCRIPT $ROOTSCRIPT_ARGS
The command '/u01/app/oracle/product/12.1.0/grid/perl/bin/perl -I/u01/app/oracle/product/12.1.0/grid/perl/lib -I/u01/app/oracle/product/12.1.0/grid/crs/install /u01/app/oracle/product/12.1.0/grid/crs/install/roothas.pl ' execution failed

After analyzing the error, I found the cause of my problem: the perl binary delivered with Oracle. Even a simple “perl -v” fails with the same error.


[oracle@oel6 bin]$ pwd
/u01/app/oracle/product/12.1.0/grid/perl/bin
[oracle@oel6 bin]$ ./perl -v
Segmentation fault (core dumped)

Going deeper, gdb shows that the error is located in the PerlIO functions, and specifically in the PerlIO_default_layers function:


[oracle@oel6 bin]$ gdb perl
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /u01/app/oracle/product/12.1.0/grid/perl/bin/perl...(no debugging symbols found)...done.
(gdb) r
Starting program: /u01/app/oracle/product/12.1.0/grid/perl/bin/perl
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000531399 in PerlIO_default_layers ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.7.x86_64 libgcc-4.4.7-11.el6.x86_64 nss-softokn-freebl-3.14.3-22.el6_6.x86_64
(gdb) bt
#0 0x0000000000531399 in PerlIO_default_layers ()
#1 0x000000000053070e in PerlIO_resolve_layers ()
#2 0x000000000053050f in PerlIO_openn ()
#3 0x0000000000530206 in PerlIO_fdopen ()
#4 0x0000000000530198 in PerlIO_stdstreams ()
#5 0x0000000000530150 in Perl_PerlIO_stdin ()
#6 0x0000000000435f09 in S_parse_body ()
#7 0x00000000004343bb in perl_parse ()
#8 0x000000000041cf13 in main ()

IMHO, it’s a problem specific to VMware Fusion, because there’s no problem on a VM hosted on VirtualBox. Moreover:

  • If I copy the working perl binary from the VirtualBox VM to the VMware VM … execution fails on VMware
  • If I copy the failing perl binary from the VMware VM to the VirtualBox VM … it works fine

At the time of writing, I have asked a friend of mine whether he can reproduce the problem on a VM hosted on an ESX server; I will update this post as soon as I get his results. I ran a lot of different tests (downgrading VMware Fusion to Fusion 6 Pro, installing OEL 6.4, OEL 7.0, Red Hat 6.5 and 7, creating my vmdk’s on another disk, changing the SCSI driver etc.) … always the same problem.

I read on many blogs that people who faced the same problem symlinked the failing perl to the system perl binary, which runs fine. Doing this can cause other issues because the INC directories (perl’s module search path) are not the same: you can encounter errors due to version incompatibilities between the modules in the $OH/perl/lib directory and the perl binary. Indeed, Oracle 12cR1 uses perl v5.14.1, while the system perl binary is 5.10 on OEL/RHEL 6 and 5.16 on OEL/RHEL 7.

I worked around the problem by rebuilding the perl binary from sources, but you need to be careful with the $OH/perl/lib directory, because it contains all the perl modules needed by Oracle. The steps below compile the perl binary and replace it in $ORACLE_HOME:


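# Target Oracle home (here, the Grid Infrastructure home)
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/grid

# Download and unpack the perl 5.14.1 sources in a clean working directory
cd ~
rm -rf perl
mkdir perl
cd perl/
curl -O http://www.cpan.org/src/5.0/perl-5.14.1.tar.gz
tar -xvzf perl-5.14.1.tar.gz

# Keep the original perl tree: its lib/ directory holds the modules Oracle needs
cd $ORACLE_HOME
mv perl/ perl.OLD
mkdir perl

# Build perl and install it into $ORACLE_HOME/perl
cd /home/oracle/perl/perl-5.14.1
./Configure -des -Dprefix=$ORACLE_HOME/perl -Doptimize=-O3 -Dusethreads -Duseithreads -Duserelocatableinc ; make clean ; make ; make install

# Replace the freshly installed lib/ and man/ with the ones delivered by Oracle
cd $ORACLE_HOME/perl
rm -rf lib/ man/
cp -r ../perl.OLD/lib/ .
cp -r ../perl.OLD/man/ .

# Restore the extra scripts delivered with Oracle's perl
cp ../perl.OLD/bin/dbilogstrip bin/
cp ../perl.OLD/bin/dbiprof bin/
cp ../perl.OLD/bin/dbiproxy bin/
cp ../perl.OLD/bin/ora_explain bin/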

Once these steps are done and you have a working perl binary, you can safely run your root.sh script, create multitenant databases, and do everything else that needs $ORACLE_HOME/perl/bin/perl.

UPDATE: I have just downloaded the latest release of Oracle VirtualBox (5.0) on my MacBook and the same problem occurs. It can be resolved with the method described in this post.

UPDATE2: I would like to thank Deiby Gomez for his tip. In my original post I forgot to keep some perl scripts that could be useful 😉 : dbilogstrip, dbiprof, dbiproxy and ora_explain. The script above has been modified accordingly.

UPDATE3: Danny Bryant from Enkitec has worked closely with the Oracle VM and VirtualBox team, and they discovered the origin of the bug. It should be fixed in a future release. In the meantime, if you use VirtualBox 5.0, Danny has published a workaround that you can read at this link: http://dbaontap.com/2016/01/13/vbox5/

 

SIMD Extensions in and out Oracle 12.1.0.2

First of all, I would like to thank Tanel Pöder from Enkitec Accenture for his review of this post and for the precious information he gave me.

—-

Recently I posted a link on Twitter that explains the basics of SIMD programming (https://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html), and I got a reply asking me whether it was “Oracle 12c style”. The answer is … yes and no.

What is a SIMD extension?

A SIMD extension is a set of CPU instructions that process several data elements with a single instruction (Single Instruction, Multiple Data). Imagine you have 2 arrays of 4 integers, and you want to compute the sum of those 2 arrays. The classical way is to loop over the values, add them one by one, and store the results in another array. This takes 4 additions.

Now imagine each of your arrays is stored in a vector of 4 integers. Those 2 vectors are in fact held in specific registers, and with only one CPU instruction you add those 2 vectors and produce a single result vector. You divide the number of add instructions by 4 … for the same result.

If it’s not clear, don’t go away … I have written small C sample code to demonstrate this.
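As a first taste, here is a minimal sketch of both approaches in C. It is only an illustration under assumptions: the v4si type relies on the GCC/Clang vector_size extension rather than on Intel’s own types, and the names add_scalar and add_vector are made up for this example.

```c
#include <string.h>

/* A 128-bit vector of four 32-bit ints, declared with the GCC/Clang
   vector extension (assumption: other compilers need intrinsics). */
typedef int v4si __attribute__((vector_size(16)));

/* Classical way: one addition per element, so 4 additions. */
void add_scalar(const int *a, const int *b, int *out) {
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
}

/* SIMD way: the four lanes are added by a single vector operation. */
void add_vector(const int *a, const int *b, int *out) {
    v4si va, vb, vc;
    memcpy(&va, a, sizeof va);   /* load 4 ints into a vector */
    memcpy(&vb, b, sizeof vb);
    vc = va + vb;                /* one vector addition */
    memcpy(out, &vc, sizeof vc); /* store the 4 results */
}
```

Called on the same inputs, both functions should fill out with the same four sums; the difference is that add_vector expresses the work as one vector addition, which the compiler can map to a single SIMD instruction when the target supports it.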

A bit of history

SIMD extensions are not recent. They date back to the 1970s and vector processing.

In 1996, SIMD became widely available with Intel’s MMX extensions, then AltiVec on Motorola PowerPC processors and IBM Power systems brought more powerful instructions. Intel then introduced its SSE extensions in 1999, which have since been improved by further extensions: SSE2, SSE3, SSSE3, SSE4 and now AVX, AVX2 and AVX-512.

So Oracle does not use one specific extension but those which are available on your platform, because not all CPUs offer the same extensions. For example, modern processors have AVX extensions, but the most recent extension (AVX-512) is only available in the Xeon Phi Knights Landing and Xeon Skylake microarchitectures (Broadwell’s successors).

Data Structures

SIMD extensions are based on data structures called vectors.

A vector is an array data structure (not to be confused with an array datatype) which has a fixed length and which is, in fact, a sequence of scalars of one type.

For example, a vector of 64 bits (8 bytes) can hold 2 integers, because an integer is 4 bytes long (on x86-64), or 8 chars (1 byte each), but only one double (8 bytes long).
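The arithmetic above can be sketched as a tiny C helper. The name lanes_per_vector is hypothetical and only serves this illustration; it simply divides the vector size by the element size.

```c
#include <stddef.h>

/* Number of scalars of elem_bytes bytes that fit in a vector of
   vector_bits bits (hypothetical helper, for illustration only). */
size_t lanes_per_vector(size_t vector_bits, size_t elem_bytes) {
    return (vector_bits / 8) / elem_bytes;
}
```

With a 64-bit vector, lanes_per_vector(64, 4) corresponds to the 2 integers mentioned above, lanes_per_vector(64, 1) to the 8 chars and lanes_per_vector(64, 8) to the single double.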

Those data structures are held in CPU registers dedicated to SIMD instructions.

Let’s take an example: you want to compute the sum of two vectors on a processor which only offers MMX instructions (an old one 😉 ). Such a processor has 8 MMX registers (MM0 through MM7), and each register holds 64 bits.

The first vector contains 1,2 and the second one contains 1,2. The first vector is copied from memory to the MM0 register and the second one to MM1, then the CPU executes the SIMD instruction that puts the sum of MM0 and MM1 into MM0, and finally MM0 is copied back to memory as the result.

Now imagine your vector doesn’t hold 64 bits but 128, 256, 512 or 1024 … you can put more data in it, and all of it is still processed with only one operation …

This is one of the keys to the evolution of SIMD: MMX uses 64-bit registers (MM0 to MM7), SSE (1/2/3 and 4) uses 128-bit registers (XMM), AVX (1/2) uses 256-bit registers (YMM), and AVX-512 uses 512-bit registers (ZMM).
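To make the effect of register width concrete, here is a small sketch that counts how many vector operations are needed to process a given number of elements. The enum names and the vector_ops_needed function are assumptions introduced for this example; only the widths come from the text above.

```c
#include <stddef.h>

/* Register widths in bits, as listed in the text. */
enum { MMX_BITS = 64, SSE_BITS = 128, AVX_BITS = 256, AVX512_BITS = 512 };

/* Vector operations needed to process n elements of elem_bits bits each
   with registers of reg_bits bits (rounded up for a partial last vector). */
size_t vector_ops_needed(size_t n, size_t elem_bits, size_t reg_bits) {
    size_t lanes = reg_bits / elem_bits;
    return (n + lanes - 1) / lanes;
}
```

For 16 integers of 32 bits, this gives 8 operations with MMX, 4 with SSE, 2 with AVX and 1 with AVX-512.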

For Intel processors, the vector datatypes are __m64 (MMX), __m128, __m256 and __m512 (each vector holds single precision floating point values, aka float). You have the equivalent for double precision values (__m128d, __m256d, __m512d) and for integer types such as int, short and char (__m128i, __m256i, __m512i).

Note: all those types are automatically aligned on 8, 16, 32 or 64 byte boundaries, depending on their size.
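Alignment matters as soon as you point a vector type at ordinary memory. The sketch below shows one way to request and check alignment in standard C11; is_aligned and get_aligned_data are hypothetical names, and _Alignas is used here instead of any Intel-specific mechanism.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when ptr sits on a multiple of boundary bytes, 0 otherwise. */
int is_aligned(const void *ptr, size_t boundary) {
    return ((uintptr_t)ptr % boundary) == 0;
}

/* A plain int array only guarantees the alignment of int (usually 4 bytes);
   _Alignas (C11) requests the 32 bytes that a 256-bit vector type expects. */
static _Alignas(32) int aligned_data[16];

const int *get_aligned_data(void) {
    return aligned_data;
}
```

An array declared this way can be accessed through a __m256i pointer, whereas a plain int array may or may not happen to sit on a 32-byte boundary.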

Now computing data

Now that you know how your data is laid out, you can perform operations on it: you can add or multiply your vectors, perform bit shifting etc.

You can either use “classical” operators, or use Intel’s intrinsics, which are functions that each compute a specific operation (basic mathematics, bit shifting, comparisons etc.). All of Intel’s intrinsics are listed at this URL: https://software.intel.com/sites/landingpage/IntrinsicsGuide/. On this page you can also see performance information for each function on different processors.

 Examples

For all the examples below, I used the C language.

Compiling “SIMD aware” programs (with GCC)

If you want to compile a SIMD aware program, you have to include the “immintrin.h” header file, which is available with GCC. This header checks which extensions are enabled for your compilation (just find this file and open it). Depending on your CPU and compilation options, it includes other header files:

  • mmintrin.h for MMX instructions and datatypes
  • xmmintrin.h for SSE
  • emmintrin.h for SSE2
  • pmmintrin.h for SSE3
  • tmmintrin.h for SSSE3
  • smmintrin.h for SSE4.1 and SSE4.2
  • avxintrin.h for AVX

When you compile your program, some extensions are not enabled by default. Even if your CPU supports AVX extensions, AVX won’t be used unless you give the correct option to the compiler.

The main options are:

  • -O3: this option enables loop vectorization optimizations
  • -msse4.1: this option enables the SSE4.1 extension
  • -msse4.2: this option enables the SSE4.2 extension
  • -mavx: this option enables the AVX extension
  • -mavx2: this option enables the AVX2 extension

Other options are available here: https://gcc.gnu.org/onlinedocs/gcc-4.4.7/gcc/i386-and-x86_002d64-Options.html

To demonstrate this, I used a small program:


#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>

void print_extensions () {
#ifdef __MMX__
printf("MMX ... OK\n");
#else
printf("MMX ... KO\n");
#endif

#ifdef __SSE__
printf("SSE ... OK\n");
#else
printf("SSE ... KO\n");
#endif

#ifdef __SSE2__
printf("SSE2 ... OK\n");
#else
printf("SSE2 ... KO\n");
#endif

#ifdef __SSE3__
printf("SSE3 ... OK\n");
#else
printf("SSE3 ... KO\n");
#endif

#ifdef __SSSE3__
printf("SSSE3 ... OK\n");
#else
printf("SSSE3 ... KO\n");
#endif

#if defined (__SSE4_2__) || defined (__SSE4_1__)
printf("SSE4_1/2 ... OK\n");
#else
printf("SSE4_1/2 ... KO\n");
#endif

#if defined (__AES__) || defined (__PCLMUL__)
printf("AES/PCLMUL ... OK\n");
#else
printf("AES/PCLMUL ... KO\n");
#endif

#ifdef __AVX__
printf("AVX ... OK\n");
#else
printf("AVX ... KO\n");
#endif
}

int main(int argc, char** argv) {
print_extensions();
return 0;
}

If you compile it with only the -O3 optimization, you will get this result:


macbook-laurent:simd $ sysctl -a | egrep 'cpu.*features'
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ENFSTRG RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM
machdep.cpu.extfeatures: SYSCALL 1GBPAGE EM64T LAHF RDTSCP TSCI

macbook-laurent:simd $ cc -O3 -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... KO
AES/PCLMUL ... KO
AVX ... KO

If you compile with the correct options, your program can use the SSE4 or AVX extensions:

macbook-laurent:simd $ cc -O3 -msse4.2 -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... OK
AES/PCLMUL ... KO
AVX ... KO
macbook-laurent:simd $ cc -O3 -mavx -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... OK
AES/PCLMUL ... KO
AVX ... OK

Note that if you enable the AVX extension, the SSE4 extensions are enabled as well.
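Keep in mind that these macros describe what the compiler was allowed to generate, not what the CPU running the binary supports. As a hedged sketch, the two views can be compared as follows. The function names are made up for this example, and the run-time check relies on the __builtin_cpu_supports builtin, which I assume is available because it is a GCC/Clang feature for x86 targets; elsewhere the sketch simply reports 0.

```c
/* Compile-time view: 1 if the compiler was allowed to emit AVX code. */
int avx_enabled_at_compile_time(void) {
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* Run-time view: 1 if the CPU executing the program supports AVX.
   __builtin_cpu_supports is a GCC/Clang builtin for x86 targets;
   on any other compiler or architecture this sketch just returns 0. */
int avx_available_at_run_time(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return 0;
#endif
}
```

A binary compiled without -mavx reports 0 for the first function even on an AVX capable CPU, which is exactly the “AVX ... KO” case shown in the output above.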

Example of SSE2 usage in a basic operation (sum)

The C code below shows how to compute the sum of two arrays of 16 integers each without using Intel intrinsics:


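void func2_sse() {
    // 16-byte alignment is needed to access the arrays through __m128i pointers
    int a[16] __attribute__((aligned(16))) = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    int b[16] __attribute__((aligned(16))) = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
    __m128i* aptr;
    __m128i* bptr;
    int i;
    int loopcnt=0;
    printf("sizeof(__m128i)=%zu\n",sizeof(__m128i));
    printf("sizeof(a)=%zu\n",sizeof(a));

    // Below, we cast integer arrays to vectors of integers
    aptr=(__m128i*)a;
    bptr=(__m128i*)b;

    // and now we compute the sum, one 128-bit vector per iteration
    // (GCC adds __m128i values as two 64-bit lanes: the result matches a
    // per-int sum here only because no addition carries out of the low 32 bits)
    for (i=0;i<sizeof(a)/sizeof(__m128i);i++) {
        loopcnt++;
        bptr[i]=aptr[i]+bptr[i];
    }

    int* c=(int*)bptr;

    printf("loopcount = %d\nresult= ",loopcnt);
    for (i=0;i<16;i++) {
        printf("%d ",c[i]);
    }
    printf("\n");
}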

and the result: my sum has been computed in only 4 loop iterations:


SSE
--------------------
sizeof(__m128i)=16
sizeof(a)=64
loopcount = 4
result= 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Same example with AVX extension:


void func2_avx() {
    // 32-byte alignment is needed to access the arrays through __m256i pointers
    int a[16] __attribute__((aligned(32))) = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    int b[16] __attribute__((aligned(32))) = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
    __m256i* aptr;
    __m256i* bptr;
    int i;
    int loopcnt=0;
    printf("sizeof(__m256i)=%zu\n",sizeof(__m256i));
    printf("sizeof(a)=%zu\n",sizeof(a));
    aptr=(__m256i*)a;
    bptr=(__m256i*)b;

    // one 256-bit vector (8 ints) per iteration
    for (i=0;i<sizeof(a)/sizeof(__m256i);i++) {
        loopcnt++;
        bptr[i]=aptr[i]+bptr[i];
    }

    int* c=(int*)bptr;

    printf("loopcount = %d\nresult= ",loopcnt);
    for (i=0;i<16;i++) {
        printf("%d ",c[i]);
    }
    printf("\n");
}

and the result: my sum has been computed in only 2 loop iterations:


AVX
--------------------
sizeof(__m256i)=32
sizeof(a)=64
loopcount = 2
result= 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
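The two functions above rely on the compiler’s vector operators and on pointer casts. For comparison, here is a hedged sketch of the same sum written with explicit SSE2 intrinsics. The function name add_int32 is an assumption made for this example; it uses unaligned loads and stores so that it works on any int array, and it falls back to a plain loop when SSE2 is not available at compile time.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Adds n 32-bit ints: out[i] = a[i] + b[i].
   Returns the number of loop iterations that were needed. */
int add_int32(const int *a, const int *b, int *out, size_t n) {
    int loops = 0;
    size_t i = 0;
#if defined(__SSE2__)
    /* 4 ints per iteration; unaligned loads/stores accept any int array. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_add_epi32 adds four independent 32-bit lanes. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
        loops++;
    }
#endif
    /* Scalar tail, and full fallback when SSE2 is not available. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
        loops++;
    }
    return loops;
}
```

Unlike the + operator on __m128i, _mm_add_epi32 states explicitly that the lanes are 32 bits wide, so the result stays correct even when an individual addition overflows.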

 Now, let’s compare two data sets with SIMD extension

The next code sample searches a vector for the value 10. To do that, we use a comparison function and a function that builds a 256-bit vector filled with the value we are searching for. The comparison function works on 32-bit packets (useful to compare integers) and returns 0xFFFFFFFF if both values are equal, 0x0 otherwise. As the comparison works on 256-bit registers (note that _mm256_cmpeq_epi32 is an AVX2 intrinsic, so the code has to be compiled with -mavx2), our initial vector of 16 values is processed in only 2 loop iterations.

void func2_compare_32bitsPack() {
    int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    __m256i* aptr;
    __m256i b;
    int i;
    int loopcnt=0;
    aptr=(__m256i*)a;
    // b is a vector full of int (32 bits) equal to 10 (the value we search)
    b=_mm256_set1_epi32(10);

    for (i=0;i<sizeof(a)/sizeof(__m256i);i++) {
        loopcnt++;
        // comparison intrinsic function: packed by 32 bits(specific for int: if equal set 0xFFFFFFFF, 0x0 otherwise)
        aptr[i]=_mm256_cmpeq_epi32(aptr[i],b);
    }

    // print results
    int* c=(int*)aptr;

    printf("loopcount = %d\nresult= ",loopcnt);
    for (i=0;i<16;i++) {
        printf("0x%x   ",c[i]);
    }
    printf("\n");
}

And the result:


macbook-laurent:simd $ ./simd
Comparison
loopcount = 2
result= 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xffffffff 0x0 0x0 0x0 0x0 0x0 0x0

It becomes easy to see that the value 10 is located in the 10th position (index 9) of our initial array.
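Reading the position by eye works for 16 values, but a program needs to turn the comparison mask into an index. The sketch below shows one possible way to do it with SSE2 (128-bit registers rather than the 256-bit ones used above). The function name find_first_equal is hypothetical, and a scalar loop handles the tail as well as builds without SSE2.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns the 0-based index of the first element equal to needle,
   or -1 if it is absent (hypothetical helper, not from the post). */
long find_first_equal(const int *a, size_t n, int needle) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i vneedle = _mm_set1_epi32(needle);
    for (; i + 4 <= n; i += 4) {
        __m128i v  = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i eq = _mm_cmpeq_epi32(v, vneedle);  /* 0xFFFFFFFF where equal */
        /* One bit per 32-bit lane: bit k is set when lane k matched. */
        int mask = _mm_movemask_ps(_mm_castsi128_ps(eq));
        if (mask != 0) {
            size_t lane = 0;
            while (((mask >> lane) & 1) == 0) {
                lane++;
            }
            return (long)(i + lane);
        }
    }
#endif
    /* Scalar tail, and full fallback when SSE2 is not available. */
    for (; i < n; i++) {
        if (a[i] == needle) {
            return (long)i;
        }
    }
    return -1;
}
```

On the array used above, searching for 10 returns index 9, which matches the position of 0xffffffff in the printed result.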

OK, and how are SIMD extensions used in Oracle 12c In Memory?

If you have read my last post on how to activate SSE4 extensions on VirtualBox guests (https://laurent-leturgez.com/2015/04/14/enable-simd-sse4-extension-in-oracle-virtualbox/), and Tanel Pöder’s post (https://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/), you have noticed that Oracle can run IM with only the SSE2 extension (the default), but if your CPUs have SSE4 or AVX extensions, Oracle will use specific libraries that use SSE4 (libshpksse4212.so) and AVX (libshpkavx12.so).

If we have a look at the functions in those libraries, we see that every function name starts with “kdzk”:

[oracle@oel64-112 ~]$ readelf -a /u01/app/oracle/product/12.1.0/dbhome_1/lib/libshpksse4212.a | grep FUNC
 6: 0000000000000030 256 FUNC LOCAL DEFAULT 3 kdzk_overload_opc_name
 23: 0000000000000130 80 FUNC LOCAL DEFAULT 3 kdzk_flag_name
 26: 0000000000000180 112 FUNC LOCAL DEFAULT 3 kdzk_enc_name
 31: 00000000000001f0 320 FUNC LOCAL DEFAULT 3 kdzk_datawidth_name
 64: 0000000000002b70 544 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_1bit
 65: 0000000000002d90 544 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_1bit
 66: 0000000000002fb0 544 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_1bit
 67: 00000000000031d0 592 FUNC LOCAL DEFAULT 3 kdzk_le_dict_1bit
 68: 0000000000003420 592 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_1bit
 69: 0000000000003670 544 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_1bit
 70: 0000000000003890 224 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_1bit
 71: 0000000000003970 576 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_1bit
 72: 0000000000003bb0 576 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_1bit
 73: 0000000000003df0 992 FUNC LOCAL DEFAULT 3 kdzk_ge_le_dict_1bit
 74: 00000000000041d0 512 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_1bit_null
 75: 00000000000043d0 192 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_1bit_null
 76: 0000000000004490 512 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_1bit_null
 77: 0000000000004690 512 FUNC LOCAL DEFAULT 3 kdzk_le_dict_1bit_null
 78: 0000000000004890 464 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_1bit_null
 79: 0000000000004a60 512 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_1bit_null
 80: 0000000000004c60 192 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_1bit_null
 81: 0000000000004d20 528 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_1bit_null
 82: 0000000000004f30 192 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_1bit_null
 83: 0000000000004ff0 528 FUNC LOCAL DEFAULT 3 kdzk_ge_le_dict_1bit_null
 84: 0000000000005200 848 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_2bit_selecti
 85: 0000000000005550 960 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_2bit
 89: 0000000000005910 848 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_2bit_selecti
 90: 0000000000005c60 1056 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_2bit
 91: 0000000000006080 848 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_2bit_selecti
 92: 00000000000063d0 1008 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_2bit
 93: 00000000000067c0 848 FUNC LOCAL DEFAULT 3 kdzk_le_dict_2bit_selecti
 94: 0000000000006b10 1024 FUNC LOCAL DEFAULT 3 kdzk_le_dict_2bit
 95: 0000000000006f10 848 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_2bit_selecti
 96: 0000000000007260 1056 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_2bit
 97: 0000000000007680 848 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_2bit_selecti
 98: 00000000000079d0 960 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_2bit
 99: 0000000000007d90 928 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_2bit_sele
 100: 0000000000008130 1328 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_2bit
 101: 0000000000008660 928 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_2bit_sele
 102: 0000000000008a00 1296 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_2bit
 103: 0000000000008f10 928 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_2bit_sele
 104: 00000000000092b0 1328 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_2bit

kdzk is the Oracle component that manages compression:


SQL> oradebug doc components

.../...

Components in library ADVCMP:
--------------------------
 ADVCMP_MAIN Archive Compression (kdz)
 ADVCMP_COMP Archive Compression: Compression (kdzc, kdzh, kdza)
 ADVCMP_DECOMP Archive Compression: Decompression (kdzd, kdzs)
 ADVCMP_DECOMP_HPK Archive Compression: HPK (kdzk)
 ADVCMP_DECOMP_PCODE Archive Compression: Pcode (kdp)

An interesting thing to see is that, even if you use an Oracle kernel without any SSE4 or AVX extension active (so your process uses neither the libshpksse4212.so nor the libshpkavx12.so library), kdz functions are still called when you query and filter a table which is managed In Memory.

In a session, I run the statements below:


SQL> select segment_name,BYTES,BYTES_NOT_POPULATED from v$im_segments

SEGMENT_NAME         BYTES         BYTES_NOT_POPULATED
-------------------- ------------- -------------------
S                         37748736                   0

SQL> select spid from v$process where addr=(select paddr from v$session where sid=sys_context('USERENV','SID'));

SPID
------------------------
3619

SQL> select count(*) from s where amount_sold>1700;

Just before running the last statement, I check the libraries mapped by my server process, then attach gdb to it to catch every call to kdz functions:


[oracle@oel64-112 ~]$ pmap -x 3619 | egrep 'sse|avx'

[oracle@oel64-112 ~]$ gdb -pid 3619
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
.../...
Loaded symbols for /u01/app/oracle/product/12.1.0/dbhome_1/lib/libnque12.so
0x000000362ea0e740 in __read_nocancel () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.4.x86_64 libaio-0.3.107-10.el6.x86_64 numactl-2.0.7-8.el6.x86_64
(gdb) rbreak ^kdz

.../...

(gdb) commands
Type commands for breakpoint(s) 1-2165, one per line.
End with a line saying just "end".
>continue
>end

If you study the output, you will see that a lot of functions are called, and in the list you will find some interesting ones: kdzdcol_get_minval, kdzdcol_get_maxval, kdzk_build_vector etc. Oracle clearly uses vectors to process IM compression units.

In my opinion, it’s normal to see functions related to compression, because the kernel manipulates “Compression Units”, and it makes sense for it to integrate SIMD functions in these libraries.

A last curiosity with Oracle 12c (12.1.0.2)

OK, now you have had a look at your installation: your machine is “AVX enabled”, Oracle processes use the AVX compatible library (libshpkavx212.so), everything is OK and you think you will use all this stuff.

But if you run objdump on this library and search for AVX registers, you won’t find anything:


[oracle@oel64-112 ~]$ grep -i ymm objdump_out.1 | wc -l
0

Tanel Pöder gave me the answer! Oracle database code is compiled to be compatible with Red Hat/Oracle Linux 5, so it must be compatible with kernel 2.6.18. But the Linux scheduler can only work with YMM registers from version 2.6.30 onwards.

You can use new instructions without the kernel knowing about them, but you can’t use registers that are not yet supported by the kernel.

I think a future version of Oracle will improve this, maybe 12.2.

To conclude, there is no “Oracle 12c style” for SIMD instructions. Oracle has developed functions that use SIMD instructions: on Intel CPUs, they use SSE, SSE2, SSE3, SSE4 or AVX depending on the CPU architecture, on IBM AIX these libraries use the VMX extension (SIMD instructions on Power) etc.

Sources:

http://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/

http://en.wikipedia.org/wiki/Data_structure_alignment

http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

http://en.wikipedia.org/wiki/SIMD

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

https://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html

https://laurent-leturgez.com/2015/04/14/enable-simd-sse4-extension-in-oracle-virtualbox/

Enable SIMD SSE4 extension in Oracle VirtualBox

If, like me, you are using VirtualBox for your Oracle labs, maybe you have seen that SSE extensions are activated in your VMs, but neither SSE4 (1 and 2) nor AVX extensions are. So even though you have a modern CPU in your laptop, you cannot use these extensions in your VM (especially with Oracle 12c and the In Memory option):

[oracle@oel64-112 ~]$ grep flags /proc/cpuinfo | uniq
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni ssse3 lahf_lm

Only SSE, SSE2, SSE3 (the pni flag) and SSSE3 are active, and this can be seen in detail when you use Oracle:


SQL> select display_name,value from v$mystat ms, v$statname n where ms.statistic#=n.statistic#
 2 and display_name in ('IM scan CUs columns accessed',
 3 'IM scan segments minmax eligible',
 4 'IM scan CUs pruned');

DISPLAY_NAME                                                          VALUE
---------------------------------------------------------------- ----------
IM scan CUs columns accessed                                              0
IM scan CUs pruned                                                        0
IM scan segments minmax eligible                                          0

SQL> select count(*) from s where amount_sold>1700;

COUNT(*)
----------
 2095

SQL> select display_name,value from v$mystat ms, v$statname n where ms.statistic#=n.statistic#
 2 and display_name in ('IM scan CUs columns accessed',
 3 'IM scan segments minmax eligible',
 4 'IM scan CUs pruned');

DISPLAY_NAME                                                          VALUE
---------------------------------------------------------------- ----------
IM scan CUs columns accessed                                              4
IM scan CUs pruned                                                        5
IM scan segments minmax eligible                                          9

SQL> select spid from v$process where addr=(select paddr from v$session where sid=sys_context('USERENV','SID'));

SPID
------------------------
23693

 

[oracle@oel64-112 ~]$ pmap -x 23693 | awk {'print $6;'} | grep lib | uniq
libpthread-2.12.so
libaio.so.1.0.1
libc-2.12.so
libm-2.12.so
libnuma.so.1
libnsl-2.12.so
librt-2.12.so
libnque12.so
libnss_files-2.12.so
libdl-2.12.so
libons.so
libocrutl12.so
libocrb12.so
libocr12.so
libskgxn2.so
libhasgen12.so
libdbcfg12.so
libclsra12.so
libipc1.so
libmql1.so
libskjcx12.so
libskgxp12.so
libcell12.so
libodmd12.so

Our server process uses neither the libshpksse4212.so nor the libshpkavx12.so library (nevertheless, SIMD extensions are used because we can see IM CU pruning). More details about this here: http://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/

This is because Oracle VirtualBox doesn’t officially support the SSE4_1, SSE4_2 and AVX extensions.

But if you read the VirtualBox User Manual, you can see that, starting with VirtualBox 4.3.8, SSE4 extensions can be activated on your VM guests (this is experimental).

To do this, you have to execute these commands with the VirtualBox CLI:


$ VBoxManage setextradata "OEL6.4 Oracle DB (192.168.99.8)" VBoxInternal/CPUM/SSE4.1 1

$ VBoxManage setextradata "OEL6.4 Oracle DB (192.168.99.8)" VBoxInternal/CPUM/SSE4.2 1

$ VBoxManage getextradata "OEL6.4 Oracle DB (192.168.99.8)" enumerate
Key: GUI/LastCloseAction, Value: PowerOffRestoringSnapshot
Key: GUI/LastGuestSizeHint, Value: 720,400
Key: GUI/LastNormalWindowPosition, Value: 10,31,720,442
Key: GUI/MiniToolBarAlignment, Value: bottom
Key: GUI/SaveMountedAtRuntime, Value: yes
Key: GUI/ShowMiniToolBar, Value: yes
Key: VBoxInternal/CPUM/SSE4.1, Value: 1
Key: VBoxInternal/CPUM/SSE4.2, Value: 1

Note: If you want to get the list of your VMs, you can use this command: VBoxManage list vms

Now start your VM, and let’s check:


[oracle@oel64-112 ~]$ grep flags /proc/cpuinfo | uniq
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni ssse3 sse4_1 sse4_2 lahf_lm
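If you prefer to check such a flags line programmatically rather than by eye, a small helper is enough. The sketch below is only an illustration: has_cpu_flag is a hypothetical name, and it takes the text that follows “flags :” as a plain string. It matches whole words only, so that looking for sse3 does not wrongly match ssse3.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if flag appears as a whole whitespace-separated word in
   flags_line (the text after "flags :" in /proc/cpuinfo), 0 otherwise.
   Hypothetical helper, for illustration only. */
int has_cpu_flag(const char *flags_line, const char *flag) {
    size_t len = strlen(flag);
    const char *p = flags_line;
    if (len == 0) {
        return 0;
    }
    while ((p = strstr(p, flag)) != NULL) {
        int starts_word = (p == flags_line) || (p[-1] == ' ') || (p[-1] == '\t');
        int ends_word = (p[len] == '\0') || (p[len] == ' ')
                     || (p[len] == '\t') || (p[len] == '\n');
        if (starts_word && ends_word) {
            return 1;
        }
        p += 1;  /* keep searching after a partial match */
    }
    return 0;
}
```

On the line above, has_cpu_flag reports sse4_1 and sse4_2 as present, while avx is still absent.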

and now on the oracle server process side:


$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Apr 14 11:13:40 2015

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select spid from v$process where addr=(select paddr from v$session where sid=sys_context('USERENV','SID'));

SPID
------------------------
3363


$ pmap -x 3363 | awk {'print $6;'} | grep lib | uniq
libpthread-2.12.so
libaio.so.1.0.1
libc-2.12.so
libm-2.12.so
libnuma.so.1
libnsl-2.12.so
librt-2.12.so
>>>>>  libshpksse4212.so <<<<<
libnque12.so
libnss_files-2.12.so
libdl-2.12.so
libons.so
libocrutl12.so
libocrb12.so
libocr12.so
libskgxn2.so
libhasgen12.so
libdbcfg12.so
libclsra12.so
libipc1.so
libmql1.so
libskjcx12.so
libskgxp12.so
libcell12.so
libodmd12.so

OK, our database server is now using SSE4 SIMD extensions, but what about AVX?

AVX extensions are not supported on VirtualBox at the moment, and support has not been announced (even in experimental mode) for VirtualBox 5.

So for the moment, if you want to use AVX extensions in your guest VMs, you need to use VMware Fusion or Parallels (for Mac OS users), but they are not free tools. (I haven’t looked for a hypervisor on Windows or Linux that supports AVX extensions; if you know one … let me know.)

 

Oracle Database 12c CDB$VIEW function

In the Oracle 12c multitenant architecture, there are some new views that complement the USER_, ALL_ and DBA_ views: the CDB_ views.

For example, CDB_TABLES, CDB_OBJECTS, CDB_VIEWS etc. Those views give information for all containers. For example, CDB_TABLES shows table information for all containers (CDB$ROOT, PDB$SEED and all pluggable databases). Those views have a new column, CON_ID, that references the container id, for example:

[SYS@CDB$ROOT | SID:CDB]> select con_id,count(*) from cdb_tables group by con_id;

    CON_ID   COUNT(*)
---------- ----------
         2       2316
         3       2316
         1       2324
If you take a closer look at the view definition, you will find that every CDB_ view uses a specific function named CDB$VIEW:
[SYS@CDB$ROOT | SID:CDB]> select dbms_metadata.get_ddl('VIEW','CDB_TABLES') from dual;

DBMS_METADATA.GET_DDL('VIEW','CDB_TABLES')
--------------------------------------------------------------------------------
  CREATE OR REPLACE FORCE NONEDITIONABLE VIEW "SYS"."CDB_TABLES"  CONTAINER_DATA
 ("OWNER", "TABLE_NAME", "TABLESPACE_NAME", "CLUSTER_NAME", "IOT_NAME", "STATUS"
, "PCT_FREE", "PCT_USED", "INI_TRANS", "MAX_TRANS", "INITIAL_EXTENT", "NEXT_EXTE
NT", "MIN_EXTENTS", "MAX_EXTENTS", "PCT_INCREASE", "FREELISTS", "FREELIST_GROUPS
", "LOGGING", "BACKED_UP", "NUM_ROWS", "BLOCKS", "EMPTY_BLOCKS", "AVG_SPACE", "C
HAIN_CNT", "AVG_ROW_LEN", "AVG_SPACE_FREELIST_BLOCKS", "NUM_FREELIST_BLOCKS", "D
EGREE", "INSTANCES", "CACHE", "TABLE_LOCK", "SAMPLE_SIZE", "LAST_ANALYZED", "PAR
TITIONED", "IOT_TYPE", "TEMPORARY", "SECONDARY", "NESTED", "BUFFER_POOL", "FLASH
_CACHE", "CELL_FLASH_CACHE", "ROW_MOVEMENT", "GLOBAL_STATS", "USER_STATS", "DURA
TION", "SKIP_CORRUPT", "MONITORING", "CLUSTER_OWNER", "DEPENDENCIES", "COMPRESSI
ON", "COMPRESS_FOR", "DROPPED", "READ_ONLY", "SEGMENT_CREATED", "RESULT_CACHE",
"CLUSTERING", "ACTIVITY_TRACKING", "DML_TIMESTAMP", "HAS_IDENTITY", "CONTAINER_D
ATA", "CON_ID") AS
  SELECT "OWNER","TABLE_NAME","TABLESPACE_NAME","CLUSTER_NAME","IOT_NAME","STATU
S","PCT_FREE","PCT_USED","INI_TRANS","MAX_TRANS","INITIAL_EXTENT","NEXT_EXTENT",
"MIN_EXTENTS","MAX_EXTENTS","PCT_INCREASE","FREELISTS","FREELIST_GROUPS","LOGGIN
G","BACKED_UP","NUM_ROWS","BLOCKS","EMPTY_BLOCKS","AVG_SPACE","CHAIN_CNT","AVG_R
OW_LEN","AVG_SPACE_FREELIST_BLOCKS","NUM_FREELIST_BLOCKS","DEGREE","INSTANCES","
CACHE","TABLE_LOCK","SAMPLE_SIZE","LAST_ANALYZED","PARTITIONED","IOT_TYPE","TEMP
ORARY","SECONDARY","NESTED","BUFFER_POOL","FLASH_CACHE","CELL_FLASH_CACHE","ROW_
MOVEMENT","GLOBAL_STATS","USER_STATS","DURATION","SKIP_CORRUPT","MONITORING","CL
USTER_OWNER","DEPENDENCIES","COMPRESSION","COMPRESS_FOR","DROPPED","READ_ONLY","
SEGMENT_CREATED","RESULT_CACHE","CLUSTERING","ACTIVITY_TRACKING","DML_TIMESTAMP"
,"HAS_IDENTITY","CONTAINER_DATA","CON_ID" FROM CDB$VIEW("SYS"."DBA_TABLES")
After a long search, I didn’t find any definition of this function, so I decided to investigate its behaviour.
First of all, this function seems to “transform” the view or table passed as a parameter by adding a CON_ID column. We will see later that it’s not a CBO transformation as we know it, but a “low-level” transformation performed before SQL parsing.
The CDB$VIEW function can be used with static views (DBA_TABLES, DBA_OBJECTS etc.), dictionary tables (OBJ$, USER$, FILE$ etc.), dynamic performance views and X$ fixed tables, but for these last two the result is slightly different from the others.
[SYS@CDB$ROOT | SID:CDB]> select count(*) from cdb$view("DBA_TABLES");

  COUNT(*)
----------
      6956

[SYS@CDB$ROOT | SID:CDB]> select count(*) from cdb$view("OBJ$");

  COUNT(*)
----------
    272326
In a multitenant database, each container has its own dictionary: for example, CDB$ROOT has its own OBJ$ table, PDB$SEED has its own, and every pluggable database has its own OBJ$ table. So, the goal of this function is to give a global view across all containers.
But, as the OBJ$ table has the same name in every container, and as there’s no way to access a specific dictionary table of a pluggable database from the root container, the CDB$VIEW function is used to aggregate the data returned by a given view or table in each container:
[SYS@CDB$ROOT | SID:CDB]> show con_id;

CON_ID
------------------------------
1
[SYS@CDB$ROOT | SID:CDB]> select count(*) from obj$;

  COUNT(*)
----------
     90847

[SYS@CDB$ROOT | SID:CDB]>  select con_id,count(*) from cdb$view("OBJ$") group by con_id;

    CON_ID   COUNT(*)
---------- ----------
         1      90847
         2      90716
         3      90763
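The behaviour observed above can be pictured as running the same query in every container and tagging each row with the container it came from. The Python sketch below is only a conceptual model of that idea, not Oracle’s implementation. The cdb_view function and the in-memory containers dictionary are hypothetical, and only the row counts come from the output above.

```python
from collections import Counter


def cdb_view(containers):
    """Yield every row of every container, tagged with its CON_ID."""
    for con_id, rows in containers.items():
        for row in rows:
            yield (row, con_id)


# Row counts of OBJ$ per container, taken from the query output above.
OBJ_COUNTS = {1: 90847, 2: 90716, 3: 90763}

# Hypothetical stand-in for each container's own OBJ$ table.
containers = {con_id: range(count) for con_id, count in OBJ_COUNTS.items()}

# Equivalent of: select con_id, count(*) from cdb$view("OBJ$") group by con_id
per_container = Counter(con_id for _row, con_id in cdb_view(containers))
total = sum(per_container.values())

print(sorted(per_container.items()), total)
```

The total computed by the model is the sum of the three per-container counts, which is the 272326 rows returned earlier by the count on cdb$view("OBJ$").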
Now, let’s take a closer look at the generated plan:
[SYS@CDB$ROOT | SID:CDB]> select count(*) from obj$;

  COUNT(*)
----------
     90847

Execution Plan
----------------------------------------------------------
Plan hash value: 3951003077

------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |    91   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |        |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| I_OBJ1 | 90910 |    91   (0)| 00:00:01 |
------------------------------------------------------------------------
[SYS@CDB$ROOT | SID:CDB]>  select count(*) from cdb$view("OBJ$");

  COUNT(*)
----------
    272326

Execution Plan
----------------------------------------------------------
Plan hash value: 2345629731

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name             | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                  |     1 |     2 (100)| 00:00:01 |       |       |        |      |            |
|   1 |  SORT AGGREGATE           |                  |     1 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR          |                  |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000         |     1 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |                  |     1 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX PARTITION LIST ALL|                  | 90910 |     2 (100)| 00:00:01 |     1 |   254 |  Q1,00 | PCWC |            |
|   6 |       FIXED TABLE FULL    | X$CDBVW$c2fad3da | 90910 |     2 (100)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------
There are two interesting things to note:
  • CDB$VIEW transforms the statement into a query on an X$CDBVW$c2fad3da fixed table
  • This fixed table is read in parallel

About the transformation, I noticed several things.

First, if we query another table or view, another X$CDBVW$ is generated:

[SYS@CDB$ROOT | SID:CDB]> select count(*) from cdb$view("TAB$");

  COUNT(*)
----------
      7098

Execution Plan
----------------------------------------------------------
Plan hash value: 111784239

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name             | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                  |     1 |     0   (0)| 00:00:01 |       |       |        |      |            |
|   1 |  SORT AGGREGATE           |                  |     1 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR          |                  |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000         |     1 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |                  |     1 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX PARTITION LIST ALL|                  |  2372 |     0   (0)| 00:00:01 |     1 |   254 |  Q1,00 | PCWC |            |
|   6 |       FIXED TABLE FULL    | X$CDBVW$00303d4d |  2372 |     0   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------
So, it seems that a hash is appended to the X$CDBVW$ name. It’s interesting to see that the fixed table X$CDBVW$ exists in the instance but contains no rows:
[SYS@CDB$ROOT | SID:CDB]> desc X$CDBVW$
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ADDR                                               RAW(8)
 INDX                                               NUMBER
 INST_ID                                            NUMBER
 CON_ID                                             NUMBER

[SYS@CDB$ROOT | SID:CDB]> select * from X$CDBVW$;

no rows selected
The second thing to note is that this transformation is not a CBO transformation: if you take a closer look at a 10053 trace file, the query has already been rewritten before parsing:
Stmt: ******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "SYS"."X$CDBVW$00303d4d" "TAB$"
Objects referenced in the statement
  X$CDBVW$00303d4d[TAB$] 4, type = 1
Objects in the hash table
  Hash table Object 4, type = 1, ownerid = 795352840147398986:
    Dynamic Sampling Directives at location 1:
       dirid = 11259849835960924452, state = 5, flags = 1, loc = 1 {E(4)[30]}
Return code in qosdInitDirCtx: ENBLD
Concerning the parallel execution, I noticed that partition pruning is performed on the CON_ID column:
[SYS@CDB$ROOT | SID:CDB]> select count(*) from cdb$view("TAB$") where con_id=1;

  COUNT(*)
----------
      2372

Execution Plan
----------------------------------------------------------
Plan hash value: 2579428923

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |     1 |    13 |     0   (0)| 00:00:01 |       |       |        |      |            |
|   1 |  SORT AGGREGATE              |                  |     1 |    13 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR             |                  |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)       | :TQ10000         |     1 |    13 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE           |                  |     1 |    13 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX PARTITION LIST SINGLE|                  |    24 |   312 |     0   (0)| 00:00:01 |     1 |     1 |  Q1,00 | PCWC |            |
|   6 |       FIXED TABLE FULL       | X$CDBVW$00303d4d |    24 |   312 |     0   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------------------------
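In the plan above, Pstart and Pstop are both 1, whereas the unfiltered plans show 1 and 254. The Python sketch below models that pruning on CON_ID. It is a conceptual illustration under my own assumptions, not Oracle’s code. The build_partitions and scan helpers and the sample rows are hypothetical, and the partition count of 254 simply mirrors the Pstop value of the unfiltered plans.

```python
MAX_PARTITIONS = 254  # Pstop value shown in the unfiltered plans


def build_partitions(rows_by_container):
    """Create one (possibly empty) partition per possible CON_ID."""
    partitions = {con_id: [] for con_id in range(1, MAX_PARTITIONS + 1)}
    for con_id, rows in rows_by_container.items():
        partitions[con_id].extend(rows)
    return partitions


def scan(partitions, con_id=None):
    """Return (rows, partitions_visited); a CON_ID predicate prunes the scan."""
    wanted = [con_id] if con_id is not None else sorted(partitions)
    rows = [row for part in wanted for row in partitions[part]]
    return rows, len(wanted)


# Hypothetical rows for three containers.
data = build_partitions({1: ["a", "b"], 2: ["c"], 3: ["d", "e", "f"]})

print(scan(data, con_id=1))  # only partition 1 is visited
print(scan(data)[1])         # every partition is visited
```

With a predicate on con_id, only one partition is read, which corresponds to the PX PARTITION LIST SINGLE step. Without it, all partitions are visited, as in PX PARTITION LIST ALL.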
To sum up, the CDB$VIEW function creates a fixed table on the fly and loads it with data taken from all containers; the CON_ID column of this fixed table identifies the container the data was read from. This fixed table is partitioned by CON_ID and is read in parallel by default (partition pruning is performed on the CON_ID column).
There are two hidden parameters used to change this behaviour:
  • _partition_cdb_view_enabled: setting this parameter to FALSE seems to disable the partitioning of the X$CDBVW$ fixed table
[SYS@CDB$ROOT | SID:CDB]> alter session set "_partition_cdb_view_enabled"=FALSE;

Session altered.

[SYS@CDB$ROOT | SID:CDB]> select count(*) from cdb$view("TAB$");

  COUNT(*)
----------
      7098

Execution Plan
----------------------------------------------------------
Plan hash value: 4209588238

------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |     1 |     0   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |                  |     1 |            |          |
|   2 |   FIXED TABLE FULL| X$CDBVW$00303d4d |  2372 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------
  • _px_cdb_view_enabled: setting this parameter to FALSE seems to disable the parallel scan of the X$CDBVW$ fixed table
[SYS@CDB$ROOT | SID:CDB]> alter session set "_px_cdb_view_enabled"=FALSE;

Session altered.

[SYS@CDB$ROOT | SID:CDB]> select count(*) from cdb$view("TAB$");

  COUNT(*)
----------
      7098

Execution Plan
----------------------------------------------------------
Plan hash value: 3630495286

------------------------------------------------------------------------------------------------
| Id  | Operation           | Name             | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                  |     1 |     0   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE     |                  |     1 |            |          |       |       |
|   2 |   PARTITION LIST ALL|                  |  2372 |     0   (0)| 00:00:01 |     1 |   254 |
|   3 |    FIXED TABLE FULL | X$CDBVW$00303d4d |  2372 |     0   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------

On the CBO side, I tried to take a closer look at how statistics are evaluated for this particular view.

It seems that the CBO uses the cardinality of the dictionary table in the CDB$ROOT container and ignores the statistics of the same dictionary table in the other containers:

[SYS@CDB$ROOT | SID:CDB]> select con_id,table_name,num_rows from cdb_tables where table_name='TAB$';

    CON_ID TABLE_NAME        NUM_ROWS
---------- --------------- ----------
         1 TAB$                  2372
         2 TAB$                  1462
         3 TAB$                  2363

[SYS@CDB$ROOT | SID:CDB]> alter session set events '10053 trace name context forever, level 1';

Session altered.

[SYS@CDB$ROOT | SID:CDB]> select /* PARSE_ME */ count(*) from cdb$view("TAB$");

  COUNT(*)
----------
      7098
But if we take a closer look at the generated 10053 trace file, the cardinalities seem to be a little bit different from reality:
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: X$CDBVW$00303d4d  Alias: TAB$
  #Rows: 2372  #Blks:  1487  AvgRowLen:  134.00  ChainCnt:  0.00
  FixedTabRowCost:  1000.00
=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for X$CDBVW$00303d4d
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for X$CDBVW$00303d4d[TAB$]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Table: X$CDBVW$00303d4d  Alias: TAB$
    Card: Original: 2372.000000  Rounded: 2372  Computed: 2372.00  Non Adjusted: 2372.00
  Access Path: TableScan
    Cost:  0.06  Resp: 0.06  Degree: 0
      Cost_io: 0.00  Cost_cpu: 2372000
      Resp_io: 0.00  Resp_cpu: 2372000
  Best:: AccessPath: TableScan
         Cost: 0.06  Degree: 1  Resp: 0.06  Card: 2372.00  Bytes: 0

***************************************
 .../...
============
Plan Table
============
-----------------------------------------------------+-----------------------------------+-------------------------+---------------+
| Id  | Operation                  | Name            | Rows  | Bytes | Cost  | Time      |  TQ  |IN-OUT|PQ Distrib | Pstart| Pstop |
-----------------------------------------------------+-----------------------------------+-------------------------+---------------+
| 0   | SELECT STATEMENT           |                 |       |       |     1 |           |      |      |           |       |       |
| 1   |  SORT AGGREGATE            |                 |     1 |       |       |           |      |      |           |       |       |
| 2   |   PX COORDINATOR           |                 |       |       |       |           |      |      |           |       |       |
| 3   |    PX SEND QC (RANDOM)     | :TQ10000        |     1 |       |       |           |:Q1000| P->S |QC (RANDOM)|       |       |
| 4   |     SORT AGGREGATE         |                 |     1 |       |       |           |:Q1000| PCWP |           |       |       |
| 5   |      PX PARTITION LIST ALL |                 |  2372 |       |     0 |           |:Q1000| PCWC |           | 1     | 254   |
| 6   |       FIXED TABLE FULL     | X$CDBVW$00303d4d|  2372 |       |     0 |           |:Q1000| PCWP |           |       |       |
-----------------------------------------------------+-----------------------------------+-------------------------+---------------+
Now imagine you have consolidated a multitenant database with 200 PDBs, each PDB hosting an ERP with tons of tables: the difference between the estimated and the real row count will be far from small.
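To put numbers on this, the small Python sketch below compares the optimizer estimate with the real row count. The first call uses figures from this post (2372 estimated rows in the 10053 trace, 7098 rows actually counted). The second call is purely hypothetical: it assumes 200 PDBs that each hold as many TAB$ rows as the root container. The function names are mine.

```python
def underestimation_factor(estimated_rows, actual_rows):
    """How many times larger the real row count is than the estimate."""
    return actual_rows / estimated_rows


def projected_factor(root_rows, pdb_count, rows_per_pdb):
    """Same ratio when the estimate only reflects the root container."""
    actual_rows = root_rows + pdb_count * rows_per_pdb
    return underestimation_factor(root_rows, actual_rows)


# Figures from this post: estimate of 2372 rows, real count of 7098 rows.
print(round(underestimation_factor(2372, 7098), 2))

# Hypothetical consolidation: 200 PDBs, each with as many rows as the root.
print(projected_factor(2372, 200, 2372))
```

With only three containers the real count is about three times the estimate. In the hypothetical case the estimate would be off by a factor of about 200.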
To conclude this part: if you perform the same analysis on a query against the CDB_TABLES view, the query is rewritten to use CDB$VIEW(“DBA_TABLES”). As DBA_TABLES is a view and is not analyzed, Oracle seems to set the cardinality to 10000 rows. I ran tests on many views and the cardinality is always the same: 10000 rows and 0 blocks.
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: X$CDBVW$73c6ce97  Alias: DBA_TABLES  (NOT ANALYZED)
  #Rows: 10000  #Blks:  0  AvgRowLen:  464.00  ChainCnt:  0.00
  FixedTabRowCost:  1000.00
=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$F5BB74E1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for X$CDBVW$73c6ce97
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for X$CDBVW$73c6ce97[DBA_TABLES]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Table: X$CDBVW$73c6ce97  Alias: DBA_TABLES
    Card: Original: 10000.000000  Rounded: 10000  Computed: 10000.00  Non Adjusted: 10000.00
  Access Path: TableScan
    Cost:  0.25  Resp: 0.25  Degree: 0
      Cost_io: 0.00  Cost_cpu: 10000000
      Resp_io: 0.00  Resp_cpu: 10000000
  Best:: AccessPath: TableScan
         Cost: 0.25  Degree: 1  Resp: 0.25  Card: 10000.00  Bytes: 0

***************************************
So even if the statistics on CDB$VIEW(“DBA_TABLES”) are always set to 10000 rows, it seems safer to query the CDB_ views directly than to use the CDB$VIEW function on a dictionary table, even though I didn’t find any more accurate method.
The last thing I investigated is how the Oracle kernel transforms the query before parsing.
As we saw previously, the CBO receives an unparsed query which is already rewritten with the X$CDBVW$ view, so the transformation is probably performed at a low level.
In Oracle 12c there’s a new object file in the libserver12.a library (you can get this file in the 12.1 PSU2 files directory). This object file is named kpdbcv.o and it contains many functions named kpdbcv* (I think this stands for Kernel PDB CDB View).
[oracle@oel63 libserver12.a]$ pwd
/home/oracle/patch/17552800/files/lib/libserver12.a
[oracle@oel63 libserver12.a]$ readelf -a kpdbcv.o | grep FUNC
     7: 0000000000000270   448 FUNC    LOCAL  DEFAULT    4 kpdbcvRwtToInlineView
    10: 00000000000008f0  1216 FUNC    LOCAL  DEFAULT    4 kpdbcvLoadFragDescr
    13: 0000000000001c60    96 FUNC    LOCAL  DEFAULT    4 kpdbcvLoadBaseInfo
    14: 00000000000003b0    80 FUNC    LOCAL  DEFAULT    3 kpdbcvBuildColNames
    15: 0000000000009d50  1680 FUNC    LOCAL  DEFAULT    3 kpdbcvGetSysCtxCurrUser
    16: 0000000000001940  7616 FUNC    LOCAL  DEFAULT    3 kpdbcvFetchCbkCon
    17: 0000000000003700    96 FUNC    LOCAL  DEFAULT    3 kpdbcvFetchCbkSys
    18: 0000000000004ff0  1680 FUNC    LOCAL  DEFAULT    3 kpdbcvInsBaseViewStats
    19: 0000000000005680  1744 FUNC    LOCAL  DEFAULT    3 kpdbcvGetBaseViewRowCnt
    20: 0000000000006bc0  1792 FUNC    LOCAL  DEFAULT    3 kpdbcvGetBaseViewStats
    21: 00000000000072c0  1136 FUNC    LOCAL  DEFAULT    3 kpdbcvGetFxdViewStats
    22: 0000000000007730   544 FUNC    LOCAL  DEFAULT    3 kpdbcvFxdViewStatsCbk
    23: 0000000000007b30    80 FUNC    LOCAL  DEFAULT    3 kpdbcvGetConIdCnt
    24: 0000000000007b80   448 FUNC    LOCAL  DEFAULT    3 kpdbcvConIdMorph
    25: 0000000000007d40    80 FUNC    LOCAL  DEFAULT    3 kpdbcvGatherBind
    26: 0000000000007d90  2304 FUNC    LOCAL  DEFAULT    3 kpdbcvPrintPredText
    27: 0000000000008690   304 FUNC    LOCAL  DEFAULT    3 kpdbcvVisitPreds
    28: 00000000000087c0   384 FUNC    LOCAL  DEFAULT    3 kpdbcvVisitOpns
    29: 0000000000008940   384 FUNC    LOCAL  DEFAULT    3 kpdbcvReplaceBind
    30: 0000000000008ac0  2944 FUNC    LOCAL  DEFAULT    3 kpdbcvIsValidWhere
    31: 0000000000009640  1648 FUNC    LOCAL  DEFAULT    3 kpdbcvPredicateToText
    37: 0000000000000010    32 FUNC    GLOBAL DEFAULT    3 kpdbcdbvwcbk
    38: 0000000000000030    16 FUNC    GLOBAL DEFAULT    3 kpdbcvComDataCbk
    40: 0000000000000010   608 FUNC    GLOBAL DEFAULT    4 kpdbcvXformFrodef
    52: 0000000000000430  1216 FUNC    GLOBAL DEFAULT    4 kpdbcvLoadPartDescr
    73: 0000000000000db0   320 FUNC    GLOBAL DEFAULT    4 kpdbcvIsPartitioned
    88: 0000000000000050   784 FUNC    GLOBAL DEFAULT    3 kpdbcvIsParallelizable
    93: 0000000000000360    64 FUNC    GLOBAL DEFAULT    3 kpdbcvGetDOP
    99: 0000000000000f00  3232 FUNC    GLOBAL DEFAULT    4 kpdbcvLoad
   135: 0000000000000400  5440 FUNC    GLOBAL DEFAULT    3 kpdbcvFetchStart
   201: 0000000000003760  4560 FUNC    GLOBAL DEFAULT    3 kpdbcvFetch
   216: 0000000000004930   304 FUNC    GLOBAL DEFAULT    3 kpdbcvCleanup
   217: 0000000000001bb0   160 FUNC    GLOBAL DEFAULT    4 kpdbcvIsCDBViewOwner
   219: 0000000000004a70  1408 FUNC    GLOBAL DEFAULT    3 kpdbcvDelALLBaseViewStats
   221: 0000000000005d50  3696 FUNC    GLOBAL DEFAULT    3 kpdbcvInsALLBaseViewStats
   222: 0000000000007950    48 FUNC    GLOBAL DEFAULT    3 kpdbcvCDBViewStatsCbk
   223: 0000000000007980    48 FUNC    GLOBAL DEFAULT    3 kpdbcvComDataStatsCbk
   224: 00000000000079b0   384 FUNC    GLOBAL DEFAULT    3 kpdbcvGetParamValue
   241: 0000000000009cb0   160 FUNC    GLOBAL DEFAULT    3 kpdbcvAllocate
Depending on the context (CDB view partitioning enabled or not, parallel scan enabled or not), you will get a different call stack, but with gdb I found that the common point is the first function called, which is kpdbcvXformFrodef.
If you are curious and want to view the kpdbcv* function call stack during a select on a CDB_ view, you can use gdb:
(gdb) rbreak ^kpdbcv
(gdb) commands
 Type commands for breakpoint(s) 1-38, one per line.
 End with a line saying just "end".
 >continue
 >end
(gdb) c
Continuing.

Breakpoint 1, 0x0000000002578620 in kpdbcvXformFrodef ()

Breakpoint 6, 0x0000000002579510 in kpdbcvLoad ()

Breakpoint 5, 0x00000000025793c0 in kpdbcvIsPartitioned ()

Breakpoint 5, 0x00000000025793c0 in kpdbcvIsPartitioned ()

Breakpoint 3, 0x0000000002578a40 in kpdbcvLoadPartDescr ()

Breakpoint 4, 0x0000000002578f00 in kpdbcvLoadFragDescr ()

Breakpoint 4, 0x0000000002578f00 in kpdbcvLoadFragDescr ()

.../...
Breakpoint 4, 0x0000000002578f00 in kpdbcvLoadFragDescr ()

Breakpoint 4, 0x0000000002578f00 in kpdbcvLoadFragDescr ()

Breakpoint 4, 0x0000000002578f00 in kpdbcvLoadFragDescr ()

Breakpoint 7, 0x000000000257a1c0 in kpdbcvIsCDBViewOwner ()

Breakpoint 25, 0x00000000083cd580 in kpdbcvCDBViewStatsCbk ()

Breakpoint 24, 0x00000000083cd360 in kpdbcvFxdViewStatsCbk ()

Breakpoint 23, 0x00000000083ccef0 in kpdbcvGetFxdViewStats ()

Breakpoint 22, 0x00000000083cc7f0 in kpdbcvGetBaseViewStats ()

Breakpoint 22, 0x00000000083cc7f0 in kpdbcvGetBaseViewStats ()

Breakpoint 10, 0x00000000083c5c80 in kpdbcvIsParallelizable ()

Breakpoint 5, 0x00000000025793c0 in kpdbcvIsPartitioned ()

Breakpoint 11, 0x00000000083c5f90 in kpdbcvGetDOP ()

Breakpoint 37, 0x00000000083cf8e0 in kpdbcvAllocate ()

Breakpoint 36, 0x00000000083cf270 in kpdbcvPredicateToText ()
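A gdb session like this one quickly becomes long, so it can help to summarize it. The Python sketch below extracts the function name from each “Breakpoint N, 0x… in function ()” line and counts the hits. The helper name breakpoint_hits is hypothetical, and the sample is just the first few lines of the transcript above.

```python
import re
from collections import Counter

# Matches lines such as: Breakpoint 1, 0x0000000002578620 in kpdbcvXformFrodef ()
HIT = re.compile(r"^Breakpoint \d+, 0x[0-9a-fA-F]+ in (\w+) \(\)")


def breakpoint_hits(gdb_output):
    """Return the function names in the order their breakpoints were hit."""
    hits = []
    for line in gdb_output.splitlines():
        match = HIT.match(line.strip())
        if match:
            hits.append(match.group(1))
    return hits


# First lines of the gdb transcript shown above.
SAMPLE = """\
Breakpoint 1, 0x0000000002578620 in kpdbcvXformFrodef ()

Breakpoint 6, 0x0000000002579510 in kpdbcvLoad ()

Breakpoint 5, 0x00000000025793c0 in kpdbcvIsPartitioned ()

Breakpoint 5, 0x00000000025793c0 in kpdbcvIsPartitioned ()

Breakpoint 3, 0x0000000002578a40 in kpdbcvLoadPartDescr ()

Breakpoint 4, 0x0000000002578f00 in kpdbcvLoadFragDescr ()

Breakpoint 4, 0x0000000002578f00 in kpdbcvLoadFragDescr ()
"""

hits = breakpoint_hits(SAMPLE)
counts = Counter(hits)
first_seen = list(dict.fromkeys(hits))  # unique names, first-hit order

print(first_seen)
print(counts.most_common())
```

The first_seen list gives the order in which the functions are entered for the first time, which makes it easy to compare sessions run with different parameter settings.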
To sum up this post, we have to keep in mind that queries on CDB_ views use an internal function, CDB$VIEW, that transforms the SQL text before parsing. As a result, the statement reads a new fixed table whose name is X$CDBVW$ followed by a hash that might identify the source table or view. The statement runs in parallel mode by default because the X$CDBVW$ fixed table is built as partitioned (by default).
On the CBO side, the CDB_ view transformation produces an object whose default cardinality seems to always be set to 10000 rows. The result is quite different if you run CDB$VIEW on dictionary tables (OBJ$, TAB$ etc.), and it can cause trouble if you have a lot of PDBs open in your container.
All these transformations seem to be (partially) hard-coded in the kpdbcv.o object file included in the libserver12.a library.

Linux monitoring of oracle 12c multi-threaded instances

Oracle 12c comes with a new feature: multithreaded server. In summary, some background processes, such as the real-time scheduled ones (vktm, lms) or key ones like pmon and dbwn, continue to run as operating system processes. The others (lgwr, mmon, server processes etc.) now run as threads.

This feature has been developed to optimize Oracle for new processors with many cores and many threads per core (for example SPARC T processors), but DBAs will have to change many of the methods they use to analyze problems on a multi-threaded server.

If you usually analyze some problems from the OS side, tools like top and ps have to be used in a different way. Let’s look at different tools that can be used to analyze processes and threads on Linux (the tools mentioned here have been tested with Oracle Enterprise Linux 6).

For all the examples below, I used an orcl instance running in a multi-threaded configuration.

  • ps

If I run a simple ps under my config, there are only 6 processes:

[oracle@oel64-12c ~]$ ps -ef | grep [o]rcl
oracle    9871     1  0 21:02 ?        00:00:00 ora_pmon_orcl
oracle    9873     1  0 21:02 ?        00:00:00 ora_psp0_orcl
oracle    9878     1  5 21:02 ?        00:01:35 ora_vktm_orcl
oracle    9882     1  0 21:02 ?        00:00:02 ora_u004_orcl
oracle    9888     1  0 21:02 ?        00:00:11 ora_u005_orcl
oracle    9894     1  0 21:02 ?        00:00:00 ora_dbw0_orcl
If I want to print all the threads that run in these processes, I can run this command:
[oracle@oel64-12c ~]$ ps -eLo pid,pcpu,tid,user,comm,cmd | sed -n -e '1p' -e '/orcl/p'
  PID %CPU   TID USER     COMMAND         CMD
 9871  0.0  9871 oracle   ora_pmon_orcl   ora_pmon_orcl
 9873  0.0  9873 oracle   ora_psp0_orcl   ora_psp0_orcl
 9878  5.2  9878 oracle   ora_vktm_orcl   ora_vktm_orcl
 9882  0.0  9882 oracle   ora_scmn_orcl   ora_u004_orcl
 9882  0.0  9883 oracle   oracle          ora_u004_orcl
 9882  0.0  9884 oracle   ora_gen0_orcl   ora_u004_orcl
 9882  0.0  9885 oracle   ora_mman_orcl   ora_u004_orcl
 9882  0.0  9891 oracle   ora_dbrm_orcl   ora_u004_orcl
 9882  0.0  9895 oracle   ora_lgwr_orcl   ora_u004_orcl
 9882  0.0  9896 oracle   ora_ckpt_orcl   ora_u004_orcl
 9882  0.0  9897 oracle   ora_lg00_orcl   ora_u004_orcl
 9882  0.0  9898 oracle   ora_lg01_orcl   ora_u004_orcl
 9882  0.0  9899 oracle   ora_smon_orcl   ora_u004_orcl
 9882  0.0  9901 oracle   ora_lreg_orcl   ora_u004_orcl
 9888  0.0  9888 oracle   ora_scmn_orcl   ora_u005_orcl
 9888  0.0  9889 oracle   oracle          ora_u005_orcl
 9888  0.0  9890 oracle   ora_diag_orcl   ora_u005_orcl
 9888  0.0  9892 oracle   ora_dia0_orcl   ora_u005_orcl
 9888  0.0  9900 oracle   ora_reco_orcl   ora_u005_orcl
 9888  0.0  9902 oracle   ora_mmon_orcl   ora_u005_orcl
 9888  0.0  9903 oracle   ora_mmnl_orcl   ora_u005_orcl
 9888  0.0  9904 oracle   ora_d000_orcl   ora_u005_orcl
 9888  0.0  9905 oracle   ora_s000_orcl   ora_u005_orcl
 9888  0.0  9906 oracle   ora_n000_orcl   ora_u005_orcl
 9888  1.3  9931 oracle   oracle_9931_orc ora_u005_orcl
 9888  0.0  9932 oracle   ora_tmon_orcl   ora_u005_orcl
 9888  0.0  9933 oracle   ora_tt00_orcl   ora_u005_orcl
 9888  0.0  9934 oracle   ora_smco_orcl   ora_u005_orcl
 9888  0.0  9938 oracle   ora_fbda_orcl   ora_u005_orcl
 9888  0.0  9939 oracle   ora_aqpc_orcl   ora_u005_orcl
 9888  0.0  9944 oracle   ora_p000_orcl   ora_u005_orcl
 9888  0.0  9945 oracle   ora_p001_orcl   ora_u005_orcl
 9888  0.0  9946 oracle   ora_p002_orcl   ora_u005_orcl
 9888  0.0  9947 oracle   ora_p003_orcl   ora_u005_orcl
 9888  0.0  9948 oracle   ora_p004_orcl   ora_u005_orcl
 9888  0.0  9949 oracle   ora_p005_orcl   ora_u005_orcl
 9888  0.0  9950 oracle   ora_p006_orcl   ora_u005_orcl
 9888  0.0  9951 oracle   ora_p007_orcl   ora_u005_orcl
 9888  0.0  9952 oracle   ora_cjq0_orcl   ora_u005_orcl
 9888  0.0  9996 oracle   ora_qm02_orcl   ora_u005_orcl
 9888  0.0  9998 oracle   ora_q002_orcl   ora_u005_orcl
 9888  0.0  9999 oracle   ora_q003_orcl   ora_u005_orcl
 9888  0.0 16009 oracle   ora_w000_orcl   ora_u005_orcl
 9894  0.0  9894 oracle   ora_dbw0_orcl   ora_dbw0_orcl
16414  0.0 16414 oracle   sed             sed -n -e 1p -e /orcl/p
The first column is the PID, the second column is the percentage of CPU consumed by the thread, the third column is the thread id, the next column is the thread owner, the second-to-last column is the Oracle thread name and the last one is the process name.
With ps, you get a static view and can possibly identify problematic processes.
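When the thread list gets long, grouping the ps output by process makes it easier to read. The Python sketch below parses output in the pid,pcpu,tid,user,comm,cmd format used above and groups the threads by process. The function names are hypothetical, and the sample is a short excerpt of the listing above.

```python
from collections import defaultdict


def threads_by_process(ps_output):
    """Group (tid, thread_name, pcpu) tuples by (pid, process_name)."""
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split(None, 5)  # the last field (cmd) may contain spaces
        if len(fields) < 6 or fields[0] == "PID":
            continue  # skip blank lines and the header
        pid, pcpu, tid, _user, comm, cmd = fields
        groups[(int(pid), cmd.strip())].append((int(tid), comm, float(pcpu)))
    return dict(groups)


def busiest_thread(ps_output):
    """Return the (tid, thread_name, pcpu) tuple with the highest %CPU."""
    threads = [t for ts in threads_by_process(ps_output).values() for t in ts]
    return max(threads, key=lambda t: t[2])


# Excerpt of the ps -eLo output shown above.
SAMPLE = """\
  PID %CPU   TID USER     COMMAND         CMD
 9878  5.2  9878 oracle   ora_vktm_orcl   ora_vktm_orcl
 9882  0.0  9882 oracle   ora_scmn_orcl   ora_u004_orcl
 9882  0.0  9895 oracle   ora_lgwr_orcl   ora_u004_orcl
 9888  0.0  9888 oracle   ora_scmn_orcl   ora_u005_orcl
 9888  1.3  9931 oracle   oracle_9931_orc ora_u005_orcl
 9888  0.0  9902 oracle   ora_mmon_orcl   ora_u005_orcl
"""

for (pid, name), threads in threads_by_process(SAMPLE).items():
    print(pid, name, len(threads), "thread(s)")
print(busiest_thread(SAMPLE))
```

On a live system, the same functions could be fed with the output of the ps command above, captured for example with subprocess.run.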
  • top

With top, you can see your processes or threads in a more dynamic fashion. The top option used to show threads is -H, but you have to specify which processes you want to analyze with the -p parameter followed by the pids. The main drawback of this command is that -p is limited to 20 pids, but for a mid-size multi-threaded instance, that’s enough.

[oracle@oel64-12c ~]$ top -p $(pgrep -d',' orcl$) -H
top - 21:55:37 up 1 day,  2:03,  6 users,  load average: 1.10, 1.04, 1.13
Tasks:  43 total,   0 running,  43 sleeping,   0 stopped,   0 zombie
Cpu0  : 25.8%us, 46.5%sy,  0.0%ni, 14.5%id, 10.3%wa,  0.0%hi,  3.0%si,  0.0%st
Cpu1  : 16.3%us, 46.8%sy,  0.0%ni, 28.9%id,  7.8%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4055296k total,  3052640k used,  1002656k free,    20360k buffers
Swap:  8388604k total,  1475216k used,  6913388k free,  2302200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9878 oracle    -2   0 1489m  17m  15m S  6.7  0.4   3:03.07 ora_vktm_orcl
 9892 oracle    20   0 3457m 340m 252m S  0.5  8.6   0:01.27 ora_dia0_orcl
 9902 oracle    20   0 3457m 340m 252m S  0.5  8.6   0:02.00 ora_mmon_orcl
 9871 oracle    20   0 1489m  21m  19m S  0.0  0.5   0:00.33 ora_pmon_orcl
 9873 oracle    20   0 1489m  17m  15m S  0.0  0.4   0:01.17 ora_psp0_orcl
 9882 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.07 ora_scmn_orcl
 9883 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.00 oracle
 9884 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.25 ora_gen0_orcl
 9885 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.25 ora_mman_orcl
 9891 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.24 ora_dbrm_orcl
 9895 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.33 ora_lgwr_orcl
 9896 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:01.08 ora_ckpt_orcl
 9897 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.12 ora_lg00_orcl
 9898 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.03 ora_lg01_orcl
 9899 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.07 ora_smon_orcl
 9901 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.13 ora_lreg_orcl
 9888 oracle    20   0 3457m 340m 252m S  0.0  8.6   0:00.40 ora_scmn_orcl
 9889 oracle    20   0 3457m 340m 252m S  0.0  8.6   0:00.01 oracle
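If an instance has more than 20 processes, the PID list has to be split into batches before it is passed to -p. A minimal sketch of that idea, where chunk_pids is a hypothetical helper name and the batch size of 20 simply mirrors the top limit mentioned above:

```shell
# Split a list of PIDs (one per line on stdin) into comma-separated groups
# of at most 20, so that each group fits within the limit of top -p.
# chunk_pids is a hypothetical name used for illustration only.
chunk_pids() {
  # xargs without a command runs echo, printing up to 20 PIDs per line;
  # tr then turns the separating spaces into commas.
  xargs -n 20 | tr ' ' ','
}
```

Each output line can then be given to a separate top invocation, for example in batch mode: pgrep orcl$ | chunk_pids | while read -r group; do top -b -n 1 -H -p "$group"; done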
  • pidstat

pidstat is a command available in OEL6 (it is part of the sysstat package). Like vmstat or mpstat, it runs with an interval and a count, but it reports how CPU, I/O, and memory consumption evolve for a specific process.

For example, to see the CPU consumption every second for the process with PID 9888:

[oracle@oel64-12c ~]$ pidstat -p 9888 -u 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:03:14 PM       PID    %usr %system  %guest    %CPU   CPU  Command
10:03:15 PM      9888    0.00    0.00    0.00    0.00     0  ora_scmn_orcl
10:03:16 PM      9888    1.00    0.00    0.00    1.00     1  ora_scmn_orcl
10:03:17 PM      9888    0.00    0.00    0.00    0.00     1  ora_scmn_orcl
10:03:18 PM      9888    1.00    1.00    0.00    2.00     1  ora_scmn_orcl
Please note that the process name is ora_u005_orcl, but pidstat prints the command name, which is in fact the thread name (here ora_scmn_orcl).

So if you want to see every thread in this process, you need to use the -t option:

[oracle@oel64-12c ~]$ pidstat -p 9888 -u -t 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:08:42 PM      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
10:08:43 PM      9888         -    0.00    0.00    0.00    0.00     1  ora_scmn_orcl
10:08:43 PM         -      9888    0.00    0.00    0.00    0.00     1  |__ora_scmn_orcl
10:08:43 PM         -      9889    0.00    0.00    0.00    0.00     1  |__oracle
10:08:43 PM         -      9890    0.00    0.00    0.00    0.00     0  |__ora_diag_orcl
10:08:43 PM         -      9892    0.00    0.00    0.00    0.00     0  |__ora_dia0_orcl
10:08:43 PM         -      9900    0.00    0.00    0.00    0.00     0  |__ora_reco_orcl
10:08:43 PM         -      9902    0.00    0.00    0.00    0.00     1  |__ora_mmon_orcl
10:08:43 PM         -      9903    0.00    0.00    0.00    0.00     0  |__ora_mmnl_orcl
10:08:43 PM         -      9904    0.00    0.00    0.00    0.00     0  |__ora_d000_orcl
10:08:43 PM         -      9905    0.00    0.00    0.00    0.00     0  |__ora_s000_orcl
10:08:43 PM         -      9906    0.00    0.00    0.00    0.00     0  |__ora_n000_orcl
10:08:43 PM         -      9932    0.00    0.00    0.00    0.00     0  |__ora_tmon_orcl
10:08:43 PM         -      9933    0.00    0.00    0.00    0.00     0  |__ora_tt00_orcl
10:08:43 PM         -      9934    0.00    0.00    0.00    0.00     1  |__ora_smco_orcl
10:08:43 PM         -      9938    0.00    0.00    0.00    0.00     1  |__ora_fbda_orcl
10:08:43 PM         -      9939    0.00    0.00    0.00    0.00     1  |__ora_aqpc_orcl
10:08:43 PM         -      9944    0.00    0.00    0.00    0.00     0  |__ora_p000_orcl
10:08:43 PM         -      9945    0.00    0.00    0.00    0.00     0  |__ora_p001_orcl
10:08:43 PM         -      9946    0.00    0.00    0.00    0.00     1  |__ora_p002_orcl
10:08:43 PM         -      9947    0.00    0.00    0.00    0.00     0  |__ora_p003_orcl
10:08:43 PM         -      9948    0.00    0.00    0.00    0.00     0  |__ora_p004_orcl
10:08:43 PM         -      9949    0.00    0.00    0.00    0.00     1  |__ora_p005_orcl
10:08:43 PM         -      9950    0.00    0.00    0.00    0.00     1  |__ora_p006_orcl
10:08:43 PM         -      9951    0.00    0.00    0.00    0.00     1  |__ora_p007_orcl
10:08:43 PM         -      9952    0.00    0.00    0.00    0.00     1  |__ora_cjq0_orcl
10:08:43 PM         -      9996    0.00    0.00    0.00    0.00     0  |__ora_qm02_orcl
10:08:43 PM         -      9998    0.00    0.00    0.00    0.00     1  |__ora_q002_orcl
10:08:43 PM         -      9999    0.00    0.00    0.00    0.00     0  |__ora_q003_orcl
10:08:43 PM         -     16009    0.00    0.00    0.00    0.00     1  |__ora_w000_orcl
10:08:43 PM         -     21462    0.00    1.00    0.00    1.00     1  |__ora_vkrm_orcl
10:08:43 PM         -     22117    0.00    0.00    0.00    0.00     1  |__ora_w001_orcl
10:08:43 PM         -     22128    0.00    0.00    0.00    0.00     0  |__ora_w002_orcl
10:08:43 PM         -     22689    0.00    0.00    0.00    0.00     1  |__ora_w003_orcl
10:08:43 PM         -     22703    0.00    0.00    0.00    0.00     0  |__ora_w004_orcl
10:08:43 PM         -     22713    0.00    0.00    0.00    0.00     0  |__ora_w005_orcl
There are other interesting options to monitor I/O (-d), page faults and memory (-r), CPU utilization as seen above (-u), and context switching activity (-w).
For example:
[oracle@oel64-12c ~]$ pidstat -p 9888 -w -t 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:57:54 PM      TGID       TID   cswch/s nvcswch/s  Command
10:57:55 PM      9888         -      1.00      1.00  ora_scmn_orcl
10:57:55 PM         -      9888      1.00      1.00  |__ora_scmn_orcl
10:57:55 PM         -      9889      0.00      0.00  |__oracle
10:57:55 PM         -      9890      1.00      0.00  |__ora_diag_orcl
10:57:55 PM         -      9892      1.00      0.00  |__ora_dia0_orcl
10:57:55 PM         -      9900      1.00      0.00  |__ora_reco_orcl
10:57:55 PM         -      9902      1.00      0.00  |__ora_mmon_orcl
10:57:55 PM         -      9903      1.00      1.00  |__ora_mmnl_orcl
10:57:55 PM         -      9904      1.00      0.00  |__ora_d000_orcl
10:57:55 PM         -      9905      1.00      0.00  |__ora_s000_orcl
10:57:55 PM         -      9906      1.00      0.00  |__ora_n000_orcl
10:57:55 PM         -      9932      1.00      0.00  |__ora_tmon_orcl
10:57:55 PM         -      9933      1.00      0.00  |__ora_tt00_orcl
10:57:55 PM         -      9934      1.00      1.00  |__ora_smco_orcl
10:57:55 PM         -      9938      1.00      0.00  |__ora_fbda_orcl
10:57:55 PM         -      9939      1.00      0.00  |__ora_aqpc_orcl
10:57:55 PM         -      9944      0.00      0.00  |__ora_p000_orcl
10:57:55 PM         -      9945      0.00      0.00  |__ora_p001_orcl
10:57:55 PM         -      9946      0.00      0.00  |__ora_p002_orcl
10:57:55 PM         -      9947      0.00      0.00  |__ora_p003_orcl
10:57:55 PM         -      9948      0.00      0.00  |__ora_p004_orcl
10:57:55 PM         -      9949      0.00      0.00  |__ora_p005_orcl
10:57:55 PM         -      9950      0.00      0.00  |__ora_p006_orcl
10:57:55 PM         -      9951      0.00      0.00  |__ora_p007_orcl
10:57:55 PM         -      9952      1.00      0.00  |__ora_cjq0_orcl
10:57:55 PM         -      9996      0.00      0.00  |__ora_qm02_orcl
10:57:55 PM         -      9998      0.00      0.00  |__ora_q002_orcl
10:57:55 PM         -      9999      1.00      0.00  |__ora_q003_orcl
10:57:55 PM         -     21462     96.00      3.00  |__ora_vkrm_orcl
10:57:55 PM         -     22713      1.00      0.00  |__ora_w005_orcl
10:57:55 PM         -     29708      0.00      0.00  |__ora_q001_orcl
10:57:55 PM         -     29709      0.00      0.00  |__oracle_29709_or
10:57:55 PM         -     26975      1.00      0.00  |__ora_w004_orcl
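With this many threads, it is easier to spot the busiest ones (ora_vkrm_orcl in the sample above) by ranking the rows. The following is a sketch under assumptions: top_switchers is a hypothetical helper name, and it expects the pidstat -w -t layout shown above, where thread rows carry a command prefixed with |__.

```shell
# Rank threads by voluntary context switches per second (cswch/s).
# Input (stdin): output of "pidstat -w -t"; thread rows are recognized by
# the "|__" prefix of the Command column. Fields are addressed from the end
# of the line so the filter works with 12-hour ("10:57:55 PM") and 24-hour
# timestamps alike.
# Output: "cswch/s TID thread_name", highest first; $1 limits the row count.
# top_switchers is a hypothetical name used for illustration only.
top_switchers() {
  awk '
    $1 != "Average:" && $NF ~ /^\|__/ {
      name = $NF
      sub(/^\|__/, "", name)          # strip the tree prefix
      print $(NF-2), $(NF-3), name    # cswch/s, TID, thread name
    }
  ' | sort -rn | head -n "${1:-5}"
}
```

A possible use is pidstat -p 9888 -w -t 1 1 | top_switchers 3, which takes a single one-second sample and keeps the three threads that switch the most.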
  • gdb (for debug)

If you want to trace system calls made by threads, you can use the GNU debugger (gdb). I don’t have a deep knowledge of gdb, but you can attach it to a process with the -p option.

[oracle@oel64-12c ~]$ gdb -p 9888
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9888

.../...
After this, the “info threads” command prints information about the threads; LWP (Light Weight Process) indicates the thread ID:
(gdb) info threads
  31 Thread 0x7f5a89ff6700 (LWP 29709)  0x0000003abe00e75d in read () from /lib64/libpthread.so.0  <<< my session is located here and is waiting for a command (read syscall)
  30 Thread 0x7f5a81ff2700 (LWP 29708)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  29 Thread 0x7f5b10beb700 (LWP 9889)  0x0000003abdcdf343 in poll () from /lib64/libc.so.6
  28 Thread 0x7f5b0ea2a700 (LWP 9890)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  27 Thread 0x7f5b07fff700 (LWP 9892)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  26 Thread 0x7f5afbfff700 (LWP 9900)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  25 Thread 0x7f5af3fff700 (LWP 9902)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  24 Thread 0x7f5aebfff700 (LWP 9903)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  23 Thread 0x7f5ae3fff700 (LWP 9904)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  22 Thread 0x7f5adbfff700 (LWP 9905)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  21 Thread 0x7f5ad3fff700 (LWP 9906)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  20 Thread 0x7f5acbfff700 (LWP 9932)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  19 Thread 0x7f5ac3fff700 (LWP 9933)  0x0000003abe00ef3d in nanosleep () from /lib64/libpthread.so.0
  18 Thread 0x7f5ab3fff700 (LWP 9934)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  17 Thread 0x7f5aabfff700 (LWP 9938)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  16 Thread 0x7f5aa3fff700 (LWP 9939)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  15 Thread 0x7f5a99ffe700 (LWP 9944)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  14 Thread 0x7f5a97ffd700 (LWP 9945)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  13 Thread 0x7f5a95ffc700 (LWP 9946)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  12 Thread 0x7f5a93ffb700 (LWP 9947)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  11 Thread 0x7f5a91ffa700 (LWP 9948)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  10 Thread 0x7f5a8fff9700 (LWP 9949)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  9 Thread 0x7f5a8dff8700 (LWP 9950)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  8 Thread 0x7f5a8bff7700 (LWP 9951)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  7 Thread 0x7f5a9bfff700 (LWP 9952)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  6 Thread 0x7f5a83ff3700 (LWP 9996)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  5 Thread 0x7f5a87ff5700 (LWP 9998)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  4 Thread 0x7f5a85ff4700 (LWP 9999)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  3 Thread 0x7f5abbfff700 (LWP 21462)  0x0000003abe00ef3d in nanosleep () from /lib64/libpthread.so.0
  2 Thread 0x7f5a79fee700 (LWP 22713)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
* 1 Thread 0x7f5b10f2a9e0 (LWP 9888)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
Next, you can select a specific thread with the gdb “thread” command:
(gdb) thread 31
[Switching to thread 31 (Thread 0x7f5a89ff6700 (LWP 29709))]#0  0x0000003abe00e75d in read () from /lib64/libpthread.so.0
(gdb) info threads
* 31 Thread 0x7f5a89ff6700 (LWP 29709)  0x0000003abe00e75d in read () from /lib64/libpthread.so.0
  30 Thread 0x7f5a81ff2700 (LWP 29708)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  29 Thread 0x7f5b10beb700 (LWP 9889)  0x0000003abdcdf343 in poll () from /lib64/libc.so.6
  28 Thread 0x7f5b0ea2a700 (LWP 9890)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  27 Thread 0x7f5b07fff700 (LWP 9892)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  26 Thread 0x7f5afbfff700 (LWP 9900)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
 .../...
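Since most threads sit in the same few calls, the “info threads” listing can be condensed into a count per function. This is a minimal sketch: threads_by_call is a hypothetical helper name, and it assumes each thread row contains the “(LWP n)” marker followed by “in function ()”, as in the gdb 7.2 output above.

```shell
# Count threads per function they are currently stopped in.
# Input (stdin): output of the gdb "info threads" command, with rows such as
#   "28 Thread 0x... (LWP 9890)  0x... in semtimedop () from /lib64/libc.so.6"
# Output: "count function", most frequent first.
# threads_by_call is a hypothetical name used for illustration only.
threads_by_call() {
  awk '
    /\(LWP [0-9]+\)/ {
      # the function name is the field that follows the word "in"
      for (i = 1; i < NF; i++)
        if ($i == "in") { calls[$(i+1)]++; break }
    }
    END { for (c in calls) print calls[c], c }
  ' | sort -k1,1nr -k2
}
```

The listing can be produced non-interactively with gdb -p 9888 -batch -ex "info threads" 2>/dev/null | threads_by_call. Keep in mind that the process is stopped while gdb is attached, so this is better kept to test instances.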
Next, you can use breakpoints, watchpoints, etc. to debug Oracle calls.
If you are interested in tracing Oracle system calls with gdb, Frits Hoogland has written many articles on this subject: