Oracle … as usual

Oracle by Laurent Leturgez


Install a Standalone Spark Environment on Oracle Linux 7

Spark is one of the most trendy projects in the Apache Foundation.

Until now, I have usually used it directly on Hadoop clusters, but each time I wanted to play with Spark without needing a complete Hadoop cluster, or just to test some basic pieces of code, it was hard to do, especially on my laptop! Running a 3-node CDH cluster on your laptop requires a lot of CPU and memory!

So in this post, I will describe how to set up a small Linux virtual machine and install the latest Spark version in standalone mode.

First of all, you need a fully operational Linux box. I chose an Oracle Linux 7.4 one with the 3.8.13-118 UEK kernel.

[spark@spark ~]$ sudo uname -r
3.8.13-118.19.4.el7uek.x86_64

Once the box is installed and configured, you need to install Java. In my case, I installed a JDK 8 SE:

[spark@spark ~]$ sudo yum localinstall /home/spark/jdk-8u121-linux-x64.rpm -y
[spark@spark ~]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Then, create the directories required for the Spark installation and download the sources (if you need another version of Spark, you will find it at this URL: https://spark.apache.org/downloads.html):

[spark@spark ~]$ sudo mkdir /usr/local/share/spark
[spark@spark ~]$ sudo chown spark:spark /usr/local/share/spark
[spark@spark ~]$ curl -O https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0.tgz
[spark@spark ~]$ tar -xvzf spark-2.2.0.tgz -C /usr/local/share/spark/
[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/

If you are behind a proxy server, you have to create a settings.xml file in the $HOME/.m2 directory (you will probably have to create the directory as well). You have to do this even if you have set the http_proxy variable in your environment, because Maven, which is used during the build process, relies on this file.

Below, you’ll see what my settings.xml file looks like:

[spark@spark ~]$ cat ~/.m2/settings.xml
<settings>
 <proxies>
 <proxy>
 <id>example-proxy</id>
 <active>true</active>
 <protocol>http</protocol>
 <host>10.239.9.20</host>
 <port>80</port>
 </proxy>
 </proxies>
</settings>

Then, you are ready to configure the Maven environment and launch the build process:

[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/
[spark@spark spark-2.2.0]$ export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

At the end of the process, a summary report is printed.

[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

.../...

[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0.jar with /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-shaded.jar
[INFO] Dependency-reduced POM written at: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/dependency-reduced-pom.xml
[INFO]
[INFO] --- maven-source-plugin:3.0.1:jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-sources.jar
[INFO]
[INFO] --- maven-source-plugin:3.0.1:test-jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-test-sources.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [01:04 min]
[INFO] Spark Project Tags ................................. SUCCESS [ 26.598 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 6.316 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.129 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.836 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 9.039 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 21.286 s]
[INFO] Spark Project Core ................................. SUCCESS [02:24 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 20.021 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 13.117 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 33.581 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:22 min]
[INFO] Spark Project SQL .................................. SUCCESS [02:56 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:08 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 3.084 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 51.106 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 4.365 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 2.109 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 8.062 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 9.350 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 2.087 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 12.043 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 12.758 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 19.236 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 5.637 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 9.345 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 3.909 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:54 min
[INFO] Finished at: 2017-09-14T12:22:31+02:00
[INFO] Final Memory: 86M/896M
[INFO] ------------------------------------------------------------------------

At this step, if you run some scripts, you will get an error because, even if you have installed Spark in standalone mode, you still need the Hadoop libraries.

That is easy to fix: we just have to download Hadoop and configure our environment as follows (download the Hadoop version you need; I chose 2.8.1, which is the latest stable version of Hadoop 2; I didn't test with Hadoop 3 as it is still in beta):

[spark@spark ~]$ cd /usr/local/share/
[spark@spark share]$ sudo mkdir hadoop
[spark@spark share]$ sudo chown spark:spark hadoop/
[spark@spark share]$ cd hadoop/
[spark@spark hadoop]$ curl -O http://apache.mirrors.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ tar -xzf hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ cat >> ~/.bashrc
export HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native:${LD_LIBRARY_PATH}
export SPARK_HOME=/usr/local/share/spark/spark-2.2.0
export PATH=${SPARK_HOME}/bin:${PATH}
[spark@spark hadoop]$ . ~/.bashrc
[spark@spark hadoop]$ env | egrep 'HADOOP|PATH|SPARK'
SPARK_HOME=/usr/local/share/spark/spark-2.2.0
HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
LD_LIBRARY_PATH=/usr/local/share/hadoop/hadoop-2.8.1/lib/native:/usr/local/share/hadoop/hadoop-2.8.1/lib/native:
PATH=/usr/local/share/spark/spark-2.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/spark/.local/bin:/home/spark/bin

Now, we can run the SparkPi example:

[spark@spark ~]$ run-example SparkPi 500
Pi is roughly 3.141360702827214

Note: If you want to remove all those crappy INFO messages in the output, run the command below to configure log4j properties:

[spark@spark hadoop]$ cd $SPARK_HOME/conf
[spark@spark conf]$ sed 's/log4j\.rootCategory=INFO, console/log4j\.rootCategory=WARN, console/g' log4j.properties.template > log4j.properties

 

That's done … you are now ready to run your own code on Spark. Below is a small sample written in Scala that creates a DataFrame from an Oracle JDBC data source and runs a groupBy function on it.
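
The content of jdbc_sample.scala is not reproduced in the spark-shell session below, so here is a minimal sketch of what it could look like (the JDBC URL, credentials and table name are placeholders to adapt to your environment; judging by the PROD_* columns, the DataFrame was built on a table similar to the SH.PRODUCTS sample table):

// jdbc_sample.scala -- minimal sketch, not the original file
import java.util.Properties

val connProps = new Properties()
connProps.put("user", "sh")            // placeholder credentials
connProps.put("password", "sh")

// 'spark' is the SparkSession provided by spark-shell
val df = spark.read.jdbc("jdbc:oracle:thin:@//oracle-host:1521/orcl",   // placeholder connect string
                         "PRODUCTS",                                    // placeholder table name
                         connProps)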

[spark@spark ~]$ spark-shell --driver-class-path ojdbc7.jar --jars ojdbc7.jar
Spark context Web UI available at http://192.168.99.14:4040
Spark context available as 'sc' (master = local[*], app id = local-1505397247969).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :load jdbc_sample.scala
Loading jdbc_sample.scala...
import java.util.Properties
connProps: java.util.Properties = {}
res0: Object = null
res1: Object = null
df: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(6,0), PROD_NAME: string ... 20 more fields]

scala> df.printSchema
root
 |-- PROD_ID: decimal(6,0) (nullable = false)
 |-- PROD_NAME: string (nullable = false)
 |-- PROD_DESC: string (nullable = false)
 |-- PROD_SUBCATEGORY: string (nullable = false)
 |-- PROD_SUBCATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_SUBCATEGORY_DESC: string (nullable = false)
 |-- PROD_CATEGORY: string (nullable = false)
 |-- PROD_CATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_CATEGORY_DESC: string (nullable = false)
 |-- PROD_WEIGHT_CLASS: decimal(3,0) (nullable = false)
 |-- PROD_UNIT_OF_MEASURE: string (nullable = true)
 |-- PROD_PACK_SIZE: string (nullable = false)
 |-- SUPPLIER_ID: decimal(6,0) (nullable = false)
 |-- PROD_STATUS: string (nullable = false)
 |-- PROD_LIST_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_MIN_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_TOTAL: string (nullable = false)
 |-- PROD_TOTAL_ID: decimal(38,10) (nullable = false)
 |-- PROD_SRC_ID: decimal(38,10) (nullable = true)
 |-- PROD_EFF_FROM: timestamp (nullable = true)
 |-- PROD_EFF_TO: timestamp (nullable = true)
 |-- PROD_VALID: string (nullable = true)

scala> df.groupBy("PROD_CATEGORY").count.show
+--------------------+-----+
|       PROD_CATEGORY|count|
+--------------------+-----+
|      Software/Other|   26|
|               Photo|   10|
|         Electronics|   13|
|Peripherals and A...|   21|
|            Hardware|    2|
+--------------------+-----+

And … that’s it … have fun with Spark 😉

 


Database Load heatmap with AWR and Python

Recently, one of my customers asked me to design a tool to quickly visualize his database load. He had already done that for another RDBMS with a heatmap.

To do that, I split the work into two distinct parts: data extraction and data visualization.

Data Extraction from AWR

As he has licensed his databases with the Diagnostics Pack, I decided to use the AWR repository, and as I'm a bit lazy, I found a query from Marcin Przepiorowski (@pioro) and modified it to pivot the data.


col "00-01_ " for 90.99
col "01-02_ " for 90.99
col "02-03_ " for 90.99
col "03-04_ " for 90.99
col "04-05_ " for 90.99
col "05-06_ " for 90.99
col "06-07_ " for 90.99
col "07-08_ " for 90.99
col "08-09_ " for 90.99
col "09-10_ " for 90.99
col "10-11_ " for 90.99
col "11-12_ " for 90.99
col "12-13_ " for 90.99
col "13-14_ " for 90.99
col "14-15_ " for 90.99
col "15-16_ " for 90.99
col "16-17_ " for 90.99
col "17-18_ " for 90.99
col "18-19_ " for 90.99
col "19-20_ " for 90.99
col "20-21_ " for 90.99
col "21-22_ " for 90.99
col "22-23_ " for 90.99
col "23-24_ " for 90.99

WITH t AS
  (SELECT TO_CHAR(mtime,'YYYY/MM/DD') mtime,
    TO_CHAR(mtime,'HH24') d,
    LOAD AS value
  FROM
    (SELECT to_date(mtime,'YYYY-MM-DD HH24') mtime,
      ROUND(SUM(c1),2) AAS_WAIT,
      ROUND(SUM(c2),2) AAS_CPU,
      ROUND(SUM(cnt),2) AAS,
      ROUND(SUM(load),2) LOAD
    FROM
      (SELECT TO_CHAR(sample_time,'YYYY-MM-DD HH24') mtime,
        DECODE(session_state,'WAITING',COUNT(*),0)/360 c1,
        DECODE(session_state,'ON CPU',COUNT( *),0) /360 c2,
        COUNT(                               *)/360 cnt,
        COUNT(                               *)/360/cpu.core_nb load
      FROM dba_hist_active_sess_history,
        (SELECT value AS core_nb FROM v$osstat WHERE stat_name='NUM_CPU_CORES'
        ) cpu
      WHERE sample_time > sysdate - 30
      GROUP BY TO_CHAR(sample_time,'YYYY-MM-DD HH24'),
        session_state,
        cpu.core_nb
      )
    GROUP BY mtime
    )
  )
SELECT mtime,
  NVL("00-01_ ",0) "00-01_ ",
  NVL("01-02_ ",0) "01-02_ ",
  NVL("02-03_ ",0) "02-03_ ",
  NVL("03-04_ ",0) "03-04_ ",
  NVL("04-05_ ",0) "04-05_ ",
  NVL("05-06_ ",0) "05-06_ ",
  NVL("06-07_ ",0) "06-07_ ",
  NVL("07-08_ ",0) "07-08_ ",
  NVL("08-09_ ",0) "08-09_ ",
  NVL("09-10_ ",0) "09-10_ ",
  NVL("10-11_ ",0) "10-11_ ",
  NVL("11-12_ ",0) "11-12_ ",
  NVL("12-13_ ",0) "12-13_ ",
  NVL("13-14_ ",0) "13-14_ ",
  NVL("14-15_ ",0) "14-15_ ",
  NVL("15-16_ ",0) "15-16_ ",
  NVL("16-17_ ",0) "16-17_ ",
  NVL("17-18_ ",0) "17-18_ ",
  NVL("18-19_ ",0) "18-19_ ",
  NVL("19-20_ ",0) "19-20_ ",
  NVL("20-21_ ",0) "20-21_ ",
  NVL("21-22_ ",0) "21-22_ ",
  NVL("22-23_ ",0) "22-23_ ",
  NVL("23-24_ ",0) "23-24_ "
FROM t pivot( SUM(value) AS " " FOR d IN ('00' AS "00-01",'01' AS "01-02",'02' AS "02-03",'03' AS "03-04",'04' AS "04-05",'05' AS "05-06",'06' AS "06-07",'07' AS "07-08",
                                          '08' AS "08-09",'09' AS "09-10",'10' AS "10-11", '11' AS "11-12",'12' AS "12-13",'13' AS "13-14",'14' AS "14-15",'15' AS "15-16",
                                          '16' AS "16-17",'17' AS "17-18",'18' AS "18-19",'19' AS "19-20",'20' AS "20-21",'21' AS "21-22", '22' AS "22-23",'23' AS "23-24") 
            )
ORDER BY mtime

Note: this query can be used to extract the database load, the average active sessions (AAS), the CPU part of AAS, or the wait part of AAS. You just have to change which column is aliased as value in the WITH block (for example, use AAS_CPU AS value instead of LOAD AS value).

Another point: I didn't look at the execution plan, so I didn't tune the statement … if you have some tricks to tune it, feel free to comment 😉

This query gives this kind of result:

MTIME      00-01_  01-02_  02-03_  03-04_  04-05_  05-06_  06-07_  07-08_  08-09_  09-10_  10-11_  11-12_  12-13_  13-14_  14-15_  15-16_  16-17_  17-18_  18-19_  19-20_  20-21_  21-22_  22-23_  23-24_
---------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -------
2016/11/23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.03    0.03    0.00    0.04    0.02    0.00    0.00    0.04    0.01    0.01    0.01    0.04    0.01
2016/11/24    0.02    0.03    0.01    0.03    0.05    0.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/06    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.01    0.02    0.02    0.00    0.00    0.00    0.00
2016/12/07    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.00    0.00    0.04    0.03    0.01    0.02    0.01    0.02    0.01    0.01    0.06    0.03
2016/12/08    0.03    0.05    0.02    0.02    0.03    0.03    0.02    0.02    0.03    0.02    0.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.03    0.02    0.05    0.02
2016/12/13    0.03    0.01    0.01    0.01    0.02    0.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/14    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.02    0.02    0.02    0.02    0.18    0.03    0.02    0.01    0.01    0.00    0.00    0.00    0.00    0.00

When the result is greater than 1, your database is loaded, and if it is greater than or equal to 2, it is really overloaded. In both cases, you have to analyze why.

The problem with this kind of result is that it is only numbers, and you have to scan the complete output to detect when a problem occurred. That is where the data visualization part comes in 😉

Data Visualization with Python

In the Oracle database area, someone who has done a lot for the visualization of performance data is Kyle Hailey (@dbavigil), and visualizing your data is very important to quickly identify bottlenecks. So if you are interested in data visualization, visit his blog: you will find a lot of resources to extract AWR data and visualize it (and other stuff).

So I decided to use Python to visualize the database load data extracted previously. To do that, I used Python 3.4.5 with a few packages installed:

  • cx_Oracle to connect to the Oracle database and execute queries
  • numpy for arrays manipulation
  • plotly to graph the heatmap

The Python script is available at this URL: https://app.box.com/s/irjzi64lbne6xhw3t57aixqiy95toxht. You then have to modify it to enter your connection string (line 33 of the script).

My customer asked that the script generate an HTML file to visualize the heatmap in a browser (one of the reasons I used plotly).

The script generates the file in /var/tmp (you can change that at the end of the script) and then opens a browser locally (so don't forget to export your X11 display if you run it on a Linux server).
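
Since only the link to the script is given above, here is a minimal sketch of the approach it takes, assuming the pivoted query shown earlier is stored in a variable named query; the connection string and output path are placeholders, not the original script:

# minimal sketch: run the pivot query with cx_Oracle and render a plotly heatmap to an HTML file
import cx_Oracle
import plotly.offline as py
import plotly.graph_objs as go

conn = cx_Oracle.connect("system/manager@db-host:1521/orcl")   # placeholder connection string
cur = conn.cursor()
cur.execute(query)                          # 'query' holds the pivot statement shown above
rows = cur.fetchall()

days = [r[0] for r in rows]                 # MTIME column -> Y axis (one line per day)
values = [list(r[1:]) for r in rows]        # the 24 hourly load values -> Z matrix
hours = ["%02d-%02d" % (h, h + 1) for h in range(24)]    # X axis labels (hour ranges)

fig = go.Figure(data=[go.Heatmap(z=values, x=hours, y=days, colorscale="Reds")],
                layout=go.Layout(title="Database load heatmap"))
py.plot(fig, filename="/var/tmp/db_load_heatmap.html")    # writes the HTML file and opens the browser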

Below you can see the heatmap (the X axis is the hour range, the Y axis represents the analyzed days, with the most recent day on the first line of the heatmap).

heatmap_s

This heatmap will help my customer quickly visualize his database load; I hope it will help you too ;).

Of course, you can use the tool of your choice to visualize data (Tableau, Excel, Kibana etc.)

 

 

Oracle 12c Application Continuity and its resource usage

During a recent Oracle Meetup in Paris, I met Martin Bach and Bertrand Drouvot. Martin was there to present on Application Continuity, and we had an interesting discussion about this technology. (Bertrand had recently published an interesting post about Application Continuity in a Dataguard configuration … here).

In this post, I will quickly explain how it works, based on a code sample I wrote a few months ago. I will also explain Application Continuity resource allocation: where the resources are consumed, and how big the overhead is.

12c Application Continuity … What is it and how does it work?

Application Continuity is a technology that improves application availability. Before Oracle 12c, Transparent Application Failover (TAF) or Fast Connection Failover (FCF) could help you improve application availability, but developers usually had to write TAF- or FCF-aware code, and usually that wasn't done, for many reasons (test and UAT platforms that weren't RAC platforms, specifications not written with this in mind, etc.).

In 12c, Oracle released Application Continuity. This technology does the same thing more simply and has been designed to handle both planned and unplanned outages.

Application Continuity works with an Oracle 12c database and with applications that use:

  • Oracle Weblogic Server
  • Oracle UCP (Unified Connection Pool)
  • Third-party Java application servers that use the PooledConnection interface
  • The JDBC thin driver

The lab …

To test application continuity, I wrote a simple test case based on :

  • a 12c (12.1.0.2) RAC cluster with a container database (CDB), a pluggable database (PDB), and a database schema (LAURENT) with 2 tables and a “spy” trigger.
  • a small Java Code sample.

On the database side, I created a TEST table and filled it with 500000 rows, and another one (SPY_TABLE) which tracks the ids of the TEST table rows that have been modified:


SQL> create table test as select rownum id, dbms_random.string('U',20) v from dual connect by level <= 500000;

Table created.

SQL> create table spy_table(d date, inst_name varchar2(16), id number);

Table created.

Row update tracking is done by a trigger that fires on update on the TEST table:


SQL> create or replace trigger spy_test
 2  before update on test for each row
 3  declare
 4    pragma autonomous_transaction;
 5  begin
 6    insert into spy_table values(sysdate,sys_context('USERENV', 'INSTANCE_NAME'), :old.id);
 7  commit;
 8  end;
 9 /

Trigger created.

To run the test, we have to create a RAC service for our database with these features:

  • FAILOVER_TYPE has to be set on TRANSACTION to enable Application Continuity
  • COMMIT_OUTCOME has to be set to TRUE (even if it is a parameter related to Transaction Guard, you have to set it)
  • REPLAY_INITIATION_TIMEOUT, FAILOVER_RETRIES, FAILOVER_DELAY should be reviewed to configure the delays for the replay (see this link for more information)
[oracle@rac1 JAVA]$ srvctl add service -d cdb -s app -preferred cdb1 -available cdb2 -pdb pdb -j SHORT -B SERVICE_TIME -z 30 -w 10 -commit_outcome TRUE -e TRANSACTION -replay_init_time 1800 -retention 86400 -notification TRUE
[oracle@rac1 JAVA]$ srvctl start service -s app -d cdb

[oracle@rac1 JAVA]$ srvctl status service -d cdb
Service app is running on instance(s) cdb1

[oracle@rac1 JAVA]$ /u01/app/12.1.0/grid/bin/crsctl stat res ora.cdb.app.svc -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cdb.app.svc
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------

Note: I deliberately configured the service with a preferred node to be sure my tests always run on the same instance (this matters for the resource measurements). But I recommend configuring it with all instances as “preferred”.

The Java source code is available at this link.

Procedures and functions are:

  • pressAnyKeyToContinue: a private method that pauses the program. This pause is useful to attach JConsole or VisualVM and track JVM memory and CPU.
  • getStatValue: this function returns the value of an Oracle statistic for the current session.
  • doTx: this is the most important function in this program. It performs the transaction, and it contains the most important thing when coding Java with Application Continuity: the callback. The callback is delimited by two calls: beginRequest and endRequest. In the callback, you put all the operations you want to protect against failures. If a planned or unplanned outage occurs during the callback, the transaction will be rolled back and then replayed. Please note that, in the callback area, you have to:
    • call the prepareStatement function to assign the SQL statement, otherwise a SQLRecoverableException will be thrown during the connection failover.
    • disable autoCommit for the connection.
    • put a commit statement to say where the transaction ends.
    private void doTx(Connection c, int numValue) throws SQLException {
        String updsql = "UPDATE test " +
                        "SET v=UPPER(v) " +
                        "WHERE id=?";
        PreparedStatement pstmt = null;

        /* Beginning of the callback
         * AutoCommit has to be disabled in the callback part.
         */
        ((oracle.jdbc.replay.ReplayableConnection)c).beginRequest();
        pstmt=c.prepareStatement(updsql);
        c.setAutoCommit(false);
        for (int i=0;i<numValue;i++) {
            pstmt.setInt(1,i);
            pstmt.executeUpdate();
        }
        c.commit();
        // End of the Callback.
        ((oracle.jdbc.replay.ReplayableConnection)c).endRequest();
        pstmt.close();
    }

  • In the main function, the most important thing is to use the new OracleDataSourceImpl class instead of OracleDataSource (see the short sketch below).
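
As the full source is only linked above, here is a minimal sketch of that datasource setup (the URL and credentials are placeholders based on the connect string used later in this post; this is an illustration, not the original SimpleAC code):

// minimal sketch: obtaining a replay-capable connection for Application Continuity
oracle.jdbc.replay.OracleDataSourceImpl ods = new oracle.jdbc.replay.OracleDataSourceImpl();
ods.setURL("jdbc:oracle:thin:@//rac-scan:1521/app");   // placeholder URL pointing to the "app" service
ods.setUser("laurent");                                // placeholder credentials
ods.setPassword("laurent");
java.sql.Connection c = ods.getConnection();           // this connection is then passed to doTx()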

Then, I compile the code and run it with the number of updates I want to execute in my transaction.

First run

During the run, I executed a “kill -9” against the PMON process of the instance my Java program was connected to:

[oracle@rac1 JAVA]$ ./launchJava.sh SimpleAC 50000
Instance Name = cdb1
Press any key to continue...
Performing transactions

<<<< KILLED THE PMON PROCESS HERE >>>>

Instance Name = cdb2
CPU used by this session = 38643
session pga memory max = 4140696

We can see that, even with the failure of the instance that was running the transaction (of 50000 updates), the failover occurs without any error message.

The spy table shows the following results:

SQL> select inst_name,count(*) from spy_table group by inst_name;

INST_NAME	   COUNT(*)
---------------- ----------
cdb2		      50000
cdb1		       1270

This proves that the transaction has been rolled-back and completely replayed after the connection failover.

Mutable values

When a transaction becomes replayable, a problem arises with what we call mutable values. If you are interested in the definition, you can read this Wikipedia article: http://en.wikipedia.org/wiki/Immutable_object. But I prefer to explain it in a different way.

Imagine we run a test close to the previous one, but instead of updating data, we insert rows into a table which has a timestamp column filled with the value of the systimestamp Oracle function at insert time. Now an outage occurs, the connection is failed over, and the transaction is rolled back and replayed on the other node. If we don't tell Oracle to keep the original value, the replayed rows get the timestamp recorded on the second node and not the original one, and this could cause problems with your data integrity.

This problem occurs with sysdate, systimestamp, sequence values, and another function that depends on your process id (among other things): SYS_GUID.

If you want these values to be kept (treated as immutable) during a replay, you need the KEEP grants:


SQL> show user
USER is "SYS"
SQL> grant keep date time to laurent;

Grant succeeded.

SQL> grant keep sysguid to laurent;

Grant succeeded.

SQL> connect laurent/laurent@rac-scan:1521/app
Connected.

SQL> grant keep sequence on seq1 to laurent;

Grant succeeded.

No pain, no gain … what about resource consumption?

Application Continuity is a good thing because, with few code modifications, your Java app becomes completely fault tolerant. As the documentation mentions, there is an overhead on CPU and memory … so let's check this!

To test this, I used the same environment as previously, but I didn't kill any process; I simply ran the program with and without the callback defined.

Then I measured different things:

  • CPU used by the database session (statistic: CPU used by this session)
  • Memory consumed by the database session (statistic: session pga memory max)
  • CPU used by the java program
  • Memory consumed by the java program (Java heap memory usage)

The CPU and memory consumed by the Java program were measured with good old JConsole.

To measure the Java heap memory usage, I ran the program with a heap size of 512MB, as it was the right value to avoid garbage collection during execution.

Here are the results:

  • With callback defined

[oracle@rac1 JAVA]$ egrep 'beginRequest|endRequest' SimpleAC.java
((oracle.jdbc.replay.ReplayableConnection)c).beginRequest();
((oracle.jdbc.replay.ReplayableConnection)c).endRequest();

[oracle@rac1 JAVA]$ ./compile.sh

[oracle@rac1 JAVA]$ ./launchJava.sh -Xms512M -Xmx512M SimpleAC 50000
Instance Name = cdb1
Press any key to continue...
Performing transactions
Instance Name = cdb1
CPU used by this session = 34859
session pga memory max = 3961144

load1_avecCallback

Note that the heap memory usage high-water mark is 110MB and the average CPU is 0.4%.

  • With no callback defined
[oracle@rac1 JAVA]$ egrep 'beginRequest|endRequest' SimpleAC.java
//((oracle.jdbc.replay.ReplayableConnection)c).beginRequest();
//((oracle.jdbc.replay.ReplayableConnection)c).endRequest();

[oracle@rac1 JAVA]$ ./compile.sh

[oracle@rac1 JAVA]$ ./launchJava.sh -Xms512M -Xmx512M SimpleAC 50000
Instance Name = cdb1
Press any key to continue...
Performing transactions
Instance Name = cdb1
CPU used by this session = 35138
session pga memory max = 3747576

load1_sansCallback

The heap usage high-water mark is 91MB and the average CPU is 0.2%.

To sum up, we can say that there is a memory overhead in the JDBC layer (for my example, which was based on a transaction of 50000 updates). The CPU overhead announced by Oracle seems trivial. On the database side, no overhead was found on CPU or memory.

If you are looking for more information about Application Continuity, you can read this paper by Oracle: http://www.oracle.com/technetwork/database/database-cloud/private/application-continuity-wp-12c-1966213.pdf

SIMD Extensions in and out of Oracle 12.1.0.2

First of all, I would like to thank Tanel Pöder from Enkitec Accenture for his review of this post and the valuable information he gave me.

—-

Recently I posted a link on Twitter which explains the basics of SIMD programming (https://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html), and I got a reply asking me if this was Oracle 12c style, and the answer is … yes and no.

What is a SIMD extension?

A SIMD extension is a set of CPU instructions that process multiple data items in a single instruction (Single Instruction, Multiple Data). Imagine you have 2 arrays of 4 integers, and you want to compute the sum of those 2 arrays. The classical way is to loop over each value, add them one by one, and store the result in another array. This requires 4 add operations.

Now imagine your arrays are each loaded into a vector of 4 integers, those 2 vectors are in fact specific registers, and with only one CPU instruction you add those 2 vectors, producing a single result vector. You divide the number of CPU instructions by 4 … for the same result.

If it's not clear, don't go away … I have written a small C code sample to demonstrate this.

A bit of history

SIMD extensions are not that recent: they first appeared in the 1970s with vector processing.

In 1996, SIMD extensions were widely deployed with the MMX extensions, then AltiVec on Motorola processors and IBM Power systems brought more powerful instructions. Intel then released its SSE extensions in 1999, which were improved by further extensions: SSE2, SSE3, SSSE3, SSE4 and now the AVX, AVX2 and AVX-512 extensions.

So Oracle does not use one specific extension but those available on your platform, because not all CPUs offer the same extensions. For example, modern processors have AVX extensions, but the most recent one (AVX-512) is only available in the Xeon Phi Knights Landing and Xeon Skylake microarchitectures (Broadwell successors).

Data Structures

SIMD extensions are based on data structures or vectors.

A vector is an array data structure (not to be confused with an array datatype) which has a fixed length and which is, in fact, a succession of scalars of a single type.

For example, in a 64-bit (8-byte) vector you can put 2 integers (an integer is 4 bytes on the x86-64 architecture), or 8 chars (1 byte each), but only one double (8 bytes).

Those data structures are located in CPU registers dedicated to those SIMD instructions.

Let's take an example: you want to compute the sum of two vectors on a processor that only has MMX instructions (an old one 😉 ). MMX offers 8 registers (MM0 through MM7), and each register holds 64 bits.

The first vector content is (1,2) and the second one is (1,2). The first vector is copied from memory into the MM0 register and the second into MM1, then the CPU executes the SIMD instruction that stores in MM0 the sum of MM1 and MM0, and finally MM0 is copied back to memory as the result.

Now imagine your vector doesn't hold 64 bits but 128, 256, 512 or 1024 … you can put more data into it, and all of that data is computed in only one operation …

That is one of the keys of SIMD evolution: MMX uses 64-bit registers (MM0 to MM7), SSE (1/2/3 and 4) uses 128-bit registers (XMM), AVX (1/2) uses 256-bit registers (YMM), and AVX-512 uses 512-bit registers (ZMM).

For Intel processors, the vector datatypes are __m64, __m128, __m256, and __m512 (each vector contains single-precision floating point values, aka float); you have equivalents for double-precision values (__m128d, __m256d, __m512d) and for integer types such as int, short and char (__m128i, __m256i, __m512i).

Note that all those types are automatically aligned on 8, 16, 32 or 64-byte boundaries.

Now computing data

Now that you know how your data will be computed, you can perform operations on it: you can add or multiply your vectors, perform bit shifting, etc.

You have the choice of using “classical” operators, or you can use Intel's intrinsics, which are functions that compute a specific operation (basic mathematics, bit shifting, comparisons, etc.). All of Intel's intrinsics are described at this URL: https://software.intel.com/sites/landingpage/IntrinsicsGuide/. On this page you can also see performance information for each function on different processors.

 Examples

For all the examples below, I used the C language.

Compiling “SIMD aware” programs (with GCC)

If you want to compile a SIMD-aware program, you have to include the “immintrin.h” header file which ships with GCC. This header tests which extensions are available and enabled for your compilation (just find this file and open it). Depending on your CPU and compilation options, it will include the appropriate header file:

  • mmintrin.h for MMX instructions and datatypes:
  • xmmintrin.h for SSE
  • emmintrin.h for SSE2
  • pmmintrin.h for SSE3
  • tmmintrin.h for SSSE3
  • smmintrin.h for SSE4.1 and SSE4.2
  • avxintrin.h for AVX

When you compile your program, some extensions are not enabled by default. Even if your CPU supports the AVX extensions, if you don't give the correct option to the compiler, AVX won't be used.

Main options are:

  • -O3: this option enables loop vectorization optimizations.
  • -msse4.1: this option enables the SSE4.1 extension
  • -msse4.2: this option enables the SSE4.2 extension
  • -mavx: this option enables the AVX extension
  • -mavx2: this option enables the AVX2 extension

Other options are available here: https://gcc.gnu.org/onlinedocs/gcc-4.4.7/gcc/i386-and-x86_002d64-Options.html

To demonstrate this, I used a small program:


#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>

void print_extensions () {
#ifdef __MMX__
printf("MMX ... OK\n");
#else
printf("MMX ... KO\n");
#endif

#ifdef __SSE__
printf("SSE ... OK\n");
#else
printf("SSE ... KO\n");
#endif

#ifdef __SSE2__
printf("SSE2 ... OK\n");
#else
printf("SSE2 ... KO\n");
#endif

#ifdef __SSE3__
printf("SSE3 ... OK\n");
#else
printf("SSE3 ... KO\n");
#endif

#ifdef __SSSE3__
printf("SSSE3 ... OK\n");
#else
printf("SSSE3 ... KO\n");
#endif

#if defined (__SSE4_2__) || defined (__SSE4_1__)
printf("SSE4_1/2 ... OK\n");
#else
printf("SSE4_1/2 ... KO\n");
#endif

#if defined (__AES__) || defined (__PCLMUL__)
printf("AES/PCLMUL ... OK\n");
#else
printf("AES/PCLMUL ... KO\n");
#endif

#ifdef __AVX__
printf("AVX ... OK\n");
#else
printf("AVX ... KO\n");
#endif
}

int main(int argc, char** argv) {
print_extensions();
return 0;
}

If you compile it with only the -O3 optimization, you will get this result:


macbook-laurent:simd $ sysctl -a | egrep 'cpu.*features'
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ENFSTRG RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM
machdep.cpu.extfeatures: SYSCALL 1GBPAGE EM64T LAHF RDTSCP TSCI

macbook-laurent:simd $ cc -O3 -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... KO
AES/PCLMUL ... KO
AVX ... KO

If you compile with the correct options, your program can use the SSE4 or AVX extensions:

macbook-laurent:simd $ cc -O3 -msse4.2 -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... OK
AES/PCLMUL ... KO
AVX ... KO
macbook-laurent:simd $ cc -O3 -mavx -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... OK
AES/PCLMUL ... KO
AVX ... OK

Note that if you enable the AVX extension, the SSE4 extensions are enabled as well.

Example of SSE2 usage in a basic operation (sum)

The C code below shows how to perform the sum of two arrays of 16 integers each, without using Intel intrinsics:


void func2_sse() {
int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
int b[16] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
__m128i* aptr;
__m128i* bptr;
int i;
int loopcnt=0;
printf("sizeof(__m128i)=%lu\n",sizeof(__m128i));
printf("sizeof(a)=%lu\n",sizeof(a));

// Here, we cast the integer arrays to pointers to integer vectors

aptr=(__m128i*)a;
bptr=(__m128i*)b;

// and now we compute the sum
for (i=0;i<sizeof(a)/sizeof(__m128i);i++) {
loopcnt++;
bptr[i]=aptr[i]+bptr[i];
}

int* c=(int*)bptr;

printf("loopcount = %d\nresult= ",loopcnt);
for (i=0;i<16;i++) {
printf("%d ",c[i]);
}
printf("\n");
}

And the result: my sum has been computed in only 4 loop iterations:


SSE
--------------------
sizeof(__m128i)=16
sizeof(a)=64
loopcount = 4
result= 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Same example with AVX extension:


void func2_avx() {
 int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
 int b[16] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
 __m256i* aptr;
 __m256i* bptr;
 int i;
 int loopcnt=0;
 printf("sizeof(__m256i)=%lu\n",sizeof(__m256i));
 printf("sizeof(a)=%lu\n",sizeof(a));
 aptr=(__m256i*)a;
 bptr=(__m256i*)b;

 for (i=0;i<sizeof(a)/sizeof(__m256i);i++) {
 loopcnt++;
 bptr[i]=aptr[i]+bptr[i];
 }

 int* c=(int*)bptr;

 printf("loopcount = %d\nresult= ",loopcnt);
 for (i=0;i<16;i++) {
 printf("%d ",c[i]);
 }
 printf("\n");
}

And the result: my sum has been computed in only 2 loop iterations:


AVX
--------------------
sizeof(__m256i)=32
sizeof(a)=64
loopcount = 2
result= 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Now, let's compare two data sets with SIMD extensions

The next code sample concerns an array in which we want to search for the value 10. To do that, we use a comparison function and a function that builds a 256-bit (AVX) vector filled with the value we are searching for. The comparison function works on 32-bit packed values (useful to compare integers) and returns 0xFFFFFFFF if both values are equal, 0x0 otherwise. As it is an AVX function, our initial array of 16 values is processed in only 2 loop iterations.

void func2_compare_32bitsPack() {
    int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    __m256i* aptr;
    __m256i b;
    int i;
    int loopcnt=0;
    aptr=(__m256i*)a;
    // b is a vector full off int(32bits) equal to 10 (the value we search)
    b=_mm256_set1_epi32(10);

    for (i=0;i<sizeof(a)/sizeof(__m256i);i++) {
        loopcnt++;
        // comparison intrinsic function: packed by 32 bits(specific for int: if equal set 0xFFFFFFFF, 0x0 otherwise)
        aptr[i]=_mm256_cmpeq_epi32(aptr[i],b);
    }

    // print results
    int* c=(int*)aptr;

    printf("loopcount = %d\nresult= ",loopcnt);
    for (i=0;i<16;i++) {
        printf("0x%x   ",c[i]);
    }
    printf("\n");
}

And the result:


macbook-laurent:simd $ ./simd
Comparison
loopcount = 2
result= 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xffffffff 0x0 0x0 0x0 0x0 0x0 0x0

It becomes easy to identify that the value 10 is located at index 9 (the 10th position) in our initial array.

OK, and how are SIMD extensions used in Oracle 12c In-Memory?

If you have read my last post on how to activate SSE4 extensions on VirtualBox guests (https://laurent-leturgez.com/2015/04/14/enable-simd-sse4-extension-in-oracle-virtualbox/), and Tanel Pöder's post (https://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/), you have noticed that Oracle can run In-Memory with only the SSE2 extension (the default), but if your CPUs have SSE4 or AVX extensions, Oracle uses specific libraries built with SSE4 (libshpksse4212.so) or AVX (libshpkavx12.so).

If we have a look at the functions in those libraries, we can see that every function name starts with “kdzk”:

[oracle@oel64-112 ~]$ readelf -a /u01/app/oracle/product/12.1.0/dbhome_1/lib/libshpksse4212.a | grep FUNC
 6: 0000000000000030 256 FUNC LOCAL DEFAULT 3 kdzk_overload_opc_name
 23: 0000000000000130 80 FUNC LOCAL DEFAULT 3 kdzk_flag_name
 26: 0000000000000180 112 FUNC LOCAL DEFAULT 3 kdzk_enc_name
 31: 00000000000001f0 320 FUNC LOCAL DEFAULT 3 kdzk_datawidth_name
 64: 0000000000002b70 544 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_1bit
 65: 0000000000002d90 544 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_1bit
 66: 0000000000002fb0 544 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_1bit
 67: 00000000000031d0 592 FUNC LOCAL DEFAULT 3 kdzk_le_dict_1bit
 68: 0000000000003420 592 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_1bit
 69: 0000000000003670 544 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_1bit
 70: 0000000000003890 224 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_1bit
 71: 0000000000003970 576 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_1bit
 72: 0000000000003bb0 576 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_1bit
 73: 0000000000003df0 992 FUNC LOCAL DEFAULT 3 kdzk_ge_le_dict_1bit
 74: 00000000000041d0 512 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_1bit_null
 75: 00000000000043d0 192 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_1bit_null
 76: 0000000000004490 512 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_1bit_null
 77: 0000000000004690 512 FUNC LOCAL DEFAULT 3 kdzk_le_dict_1bit_null
 78: 0000000000004890 464 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_1bit_null
 79: 0000000000004a60 512 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_1bit_null
 80: 0000000000004c60 192 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_1bit_null
 81: 0000000000004d20 528 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_1bit_null
 82: 0000000000004f30 192 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_1bit_null
 83: 0000000000004ff0 528 FUNC LOCAL DEFAULT 3 kdzk_ge_le_dict_1bit_null
 84: 0000000000005200 848 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_2bit_selecti
 85: 0000000000005550 960 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_2bit
 89: 0000000000005910 848 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_2bit_selecti
 90: 0000000000005c60 1056 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_2bit
 91: 0000000000006080 848 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_2bit_selecti
 92: 00000000000063d0 1008 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_2bit
 93: 00000000000067c0 848 FUNC LOCAL DEFAULT 3 kdzk_le_dict_2bit_selecti
 94: 0000000000006b10 1024 FUNC LOCAL DEFAULT 3 kdzk_le_dict_2bit
 95: 0000000000006f10 848 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_2bit_selecti
 96: 0000000000007260 1056 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_2bit
 97: 0000000000007680 848 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_2bit_selecti
 98: 00000000000079d0 960 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_2bit
 99: 0000000000007d90 928 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_2bit_sele
 100: 0000000000008130 1328 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_2bit
 101: 0000000000008660 928 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_2bit_sele
 102: 0000000000008a00 1296 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_2bit
 103: 0000000000008f10 928 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_2bit_sele
 104: 00000000000092b0 1328 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_2bit

kdzk is the Oracle component that manages compression:


SQL> oradebug doc components

.../...

Components in library ADVCMP:
--------------------------
 ADVCMP_MAIN Archive Compression (kdz)
 ADVCMP_COMP Archive Compression: Compression (kdzc, kdzh, kdza)
 ADVCMP_DECOMP Archive Compression: Decompression (kdzd, kdzs)
 ADVCMP_DECOMP_HPK Archive Compression: HPK (kdzk)
 ADVCMP_DECOMP_PCODE Archive Compression: Pcode (kdp)

An interesting thing to see is that, even if you use an Oracle kernel without any SSE4 or AVX extension active (so your process uses neither the libshpksse4212.so nor the libshpkavx12.so library), you still use kdz functions when you query and filter a table that is populated In-Memory.

In a session, I ran the following statements:


SQL> select segment_name,BYTES,BYTES_NOT_POPULATED from v$im_segments

SEGMENT_NAME         BYTES         BYTES_NOT_POPULATED
-------------------- ------------- -------------------
S                         37748736                   0

SQL> select spid from v$process where addr=(select paddr from v$session where sid=sys_context('USERENV','SID'));

SPID
------------------------
3619

SQL> select count(*) from s where amount_sold>1700;

Just before launching the last query, I attached gdb to my server process to catch every call to the kdz functions:


[oracle@oel64-112 ~]$ pmap -x 3619 | egrep 'sse|avx'

[oracle@oel64-112 ~]$ gdb -pid 3619
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
.../...
Loaded symbols for /u01/app/oracle/product/12.1.0/dbhome_1/lib/libnque12.so
0x000000362ea0e740 in __read_nocancel () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.4.x86_64 libaio-0.3.107-10.el6.x86_64 numactl-2.0.7-8.el6.x86_64
(gdb) rbreak ^kdz

.../...

(gdb) commands
Type commands for breakpoint(s) 1-2165, one per line.
End with a line saying just "end".
>continue
>end

If you study the output, you will see that a lot of functions are called, and in the list, you will find some interesting functions: kdzdcol_get_minval, kdzdcol_get_maxval, kdzk_build_vector etc. Oracle clearly uses vectors to process IM compression units.

In my opinion, it is normal to see functions related to compression because the kernel manipulates “Compression Units”, and it makes sense that the SIMD functions are integrated into those libraries.

A last curiosity with Oracle 12c (12.1.0.2)

OK, now you have had a look at your installation, your machine is “AVX enabled”, and the Oracle processes use the AVX-compatible library (libshpkavx212.so); everything is OK and you think you will use all this stuff.

But if you use objdump on this library, and you search for AVX registers, you won’t find anything:


[oracle@oel64-112 ~]$ grep -i ymm objdump_out.1 | wc -l
0

Tanel Pöder gave me the answer!!! The Oracle database code is compiled to be compatible with RedHat/Oracle Linux 5, so it must be compatible with kernel 2.6.18. But the Linux scheduler can only handle YMM registers from version 2.6.30 onwards.

You can use new instructions without the kernel knowing about them, but you can't use registers that are not yet supported by the kernel.

I think the next version of Oracle will improve this, maybe in 12.2.

To conclude, there is no “Oracle 12c style” for SIMD instructions. Oracle has developed functions that use SIMD instructions: for Intel CPUs they use SSE, SSE2, SSE3, SSE4 or AVX depending on the CPU architecture; on IBM AIX these libraries use the VMX extension (the SIMD instructions on Power), etc.

Sources:

http://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/

http://en.wikipedia.org/wiki/Data_structure_alignment

http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

http://en.wikipedia.org/wiki/SIMD

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

https://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html

https://laurent-leturgez.com/2015/04/14/enable-simd-sse4-extension-in-oracle-virtualbox/

Tools to aggregate Oracle CRS logfiles in one file, filter it, sort it, and display it.

If you have already faced RAC problems, you have had to analyze log files, starting from the entry points, which are the alert.log files.

But as Oracle Clusterware logs its events in many logfiles (usually one per process), and there is a logfile directory on each server, it can be difficult to detect the causes, and especially the root cause, of a specific problem.

Recently I created a little Perl script to help with this. The script is named “crs_log_aggregator.pl” and is available for download at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.

The script works like this: as input, it takes one or more directories to analyze recursively. Those directories contain all the CRS log files of all the cluster nodes you want to analyze.

For each directory, it parses every log file (except the alert.log) and every history log file (file extension: l??). Parsing extracts the timestamp mentioned on each line; if a line has no timestamp (because a message spans several lines, for example), the script reuses the preceding timestamp, and if there is no preceding timestamp at all (in a file header, for example), it flags the timestamp as 'N/A'. Every analyzed line is completed with its origin, represented by the relative file path.

As a result, an analyzed line is represented in the output file by three parts:

  • the timestamp of the message
  • the origin file of the message
  • the message

As a separator, I used a “<SEP>” tag.

So at this step, each line is recorded in a single unsorted file. If a filter expression has been given to the program as input, which is highly recommended, the filter is applied during the parsing process.

Next, the program sorts the lines by timestamp and produces a single output file, filtered or not, which can then be analyzed.
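
To make the logic concrete, here is a minimal sketch of this parse/tag/sort flow; it is a simplified illustration, not the actual crs_log_aggregator.pl (options, error handling and the alert.log detection are reduced to the bare minimum):

#!/usr/bin/perl
# usage sketch: ./aggregate_sketch.pl '<regex filter>' dir1 [dir2 ...]   (use '.' to keep every line)
use strict;
use warnings;
use File::Find;

my ($filter, @dirs) = @ARGV;
my @lines;

find(sub {
    # keep *.log and *.l?? files, skip alert*.log
    return unless -f $_ && /\.(log|l\d\d)$/ && $_ !~ /^alert/;
    my $origin = $File::Find::name;
    open my $fh, '<', $_ or return;
    my $last_ts = 'N/A';
    while (my $line = <$fh>) {
        chomp $line;
        # reuse the preceding timestamp when the current line does not start with one
        $last_ts = $1 if $line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)/;
        next unless $line =~ /$filter/;
        push @lines, "$last_ts<SEP>$origin<SEP>$line";
    }
    close $fh;
}, @dirs);

# every line starts with its timestamp, so a plain string sort orders the whole set
print "$_\n" for sort @lines;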

Well, let’s see how it works:

For example, let's consider a two-node RAC (the hostnames are rac1 and rac2).

As the alert.log files are not analyzed by the tool, you need to look at them first to see when the problem occurred.

In my case, I want to analyze 2 days: 2013-05-30 and 2013-05-31.

So the first step is to copy all the logfiles to a filesystem, or onto your desktop.

I recommend putting all the directories related to a single hostname into one directory. Here, I have two directories coming from both cluster nodes ($GRID_HOME/log):

[oracle@rac1 aggregator]$ tree -d .
.
|-- rac1
|   |-- acfslog
|   |-- acfsrepl
|   |-- acfsreplroot
|   |-- acfssec
|   |-- admin
|   |-- agent
|   |   |-- crsd
|   |   |   |-- oraagent_oracle
|   |   |   |-- orarootagent_root
|   |   |   `-- scriptagent_oracle
|   |   `-- ohasd
|   |       |-- oraagent_oracle
|   |       |-- oracssdagent_root
|   |       |-- oracssdmonitor_root
|   |       `-- orarootagent_root
|   |-- client
|   |-- crflogd
|   |-- crfmond
|   |-- crsd
|   |-- cssd
|   |-- ctssd
|   |-- cvu
|   |   |-- cvulog
|   |   `-- cvutrc
|   |-- diskmon
|   |-- evmd
|   |-- gipcd
|   |-- gnsd
|   |-- gpnpd
|   |-- mdnsd
|   |-- ohasd
|   |-- racg
|   |   |-- racgeut
|   |   |-- racgevtf
|   |   `-- racgmain
|   `-- srvm
`-- rac2
    |-- acfslog
    |-- acfsrepl
    |-- acfsreplroot
    |-- acfssec
    |-- admin
    |-- agent
    |   |-- crsd
    |   |   |-- oraagent_oracle
    |   |   |-- orarootagent_root
    |   |   `-- scriptagent_oracle
    |   `-- ohasd
    |       |-- oraagent_oracle
    |       |-- oracssdagent_root
    |       |-- oracssdmonitor_root
    |       `-- orarootagent_root
    |-- client
    |-- crflogd
    |-- crfmond
    |-- crsd
    |-- cssd
    |-- ctssd
    |-- cvu
    |   |-- cvulog
    |   `-- cvutrc
    |-- diskmon
    |-- evmd
    |-- gipcd
    |-- gnsd
    |-- gpnpd
    |-- mdnsd
    |-- ohasd
    |-- racg
    |   |-- racgeut
    |   |-- racgevtf
    |   `-- racgmain
    `-- srvm

74 directories
[oracle@rac1 aggregator]$ ls -l
total 20
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:39 rac1
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:00 rac2

If you want some help about this tool, you can use the -h option:

[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -h

RAC Logs consolidator
---------------------------------------------

Usage

./crs_log_aggregator.pl -c dir1[,dir2,dir3,...,dirN] -o outputfile [-f filter]

[MANDATORY]: -c dir1,dir2,...,dirN
                 comma separated values of source directories to analyze
[MANDATORY]: -o outputfile
                 unique sorted and filtered output file
[OPTIONAL] : -f filter
                 regex filter to apply
[OPTIONAL] : -h
                 print this help

Examples:
$ ./crs_log_aggregator.pl -c data/ -o outfile
         will sort without any filter all log files (*.log, *.l??) recursively found in the data directory. outfile is produced in the current directory as a result.
$ ./crs_log_aggregator.pl -c data/rac1/cssd,data/rac1/crsd -o outfile
         will sort without any filter only the content of data/rac1/cssd,data/rac1/crsd  directories.
$ ./crs_log_aggregator.pl -c data/ -o outfile_filtered -f '2012-10-26|2012-10-27'
         will sort with a filter (regex) that will filter 2 days 2012-10-26 and 2012-10-27.

        more information on perl regex: http://perldoc.perl.org/perlre.html

Next, I launch the program with the right options:

[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -c ./rac1,./rac2 -o output_file -f '2013-05-30|2013-05-31'

---------Alert.log : ./rac1/alertrac1.log excluded from analyze

---------Alert.log : ./rac2/alertrac2.log excluded from analyze

[oracle@rac1 aggregator]$ ls -lh output_file
-rw-r--r-- 1 oracle oinstall 243M Sep  8 15:16 output_file

Please note that if an alert.log file is found, it is excluded from the analysis and mentioned in the standard output.

I highly recommend using a filter to reduce the amount of data to sort (sorting can burn a lot of CPU and memory). The -f option is used for this and takes a regexp-style filter (it is used directly in the Perl script, so if you are not familiar with Perl regexps, have a look at this URL: http://perldoc.perl.org/perlre.html).

As a result, you get this kind of message in the output file:

2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmRPCBroadcast: rpc(0x2a1002a), status(0), sendcount(0), filtered by specific properties:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmFreeRPCIndex: freeing rpc 673
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmDeleteGrock: (0x7f3f38637990) grock(EVMDMAIN2) deleted
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ COMMCRS][1105488192]clsc_connect: (0x7f23b40531f0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_LOGD))
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]clsdmc_send error code: -7
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]Mond is being shutdown by default
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Daemon Select Thread exiting
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Skipping Agent Initiated a check action
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]isRunning is reset to false here
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss1: NTP default config file found
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss8: Return [0] and NTP status [2].
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 3 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 1 event.
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] Returning NETDATA: 1 interfaces
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] # 0 Interface 'eth2',ip='10.10.10.12',mac='08-00-27-92-f1-98',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ora.crf][1089526080] {0:0:613} [stop] (:CLSN00108:) clsn_agent::stop }
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [    AGFW][1089526080] {0:0:613} Command: stop for resource: ora.crf 1 1 completed with status: SUCCESS

Now, you can analyze the output file directly, but I have also developed another app (in Java) to display it. This app is available at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.

To launch it, you need Java 1.7.

This app presents the content of the output file in a nicer interface. Messages coming from the same file are colored with the same color:

[oracle@rac1 aggregator]$ java -jar Aggr_visualisator.jar

 Aggr_visualisator
Update:
  • in v0.2, I implemented a filter function. This filter is active on the “Message” column.
  • I implemented a colorization function for the “Message” column.
    • if the term “error” is found, the cell will be colored in red
    • If the term “warn” is found, the cell will be colored in orange
    • and if the term "fail" is found, the cell will be colored in yellow
v0.2 is available at this URL : https://app.box.com/s/xi3h7bpssrqtkp55q43c
Two snapshots of this version are shown below 😉
  • Messages colorization:
Aggr_visualisator_0.2_1
  • Filtering messages (based on a regexp)
Aggr_visualisator_0.2_2
If you have any suggestions to improve these tools, send me an email or leave a comment; I will look into them.
I hope this will help you 😉