Oracle … as usual

by Laurent Leturgez

Install a Standalone Spark Environment on Oracle Linux 7

Spark is one of the trendiest projects in the Apache Foundation.

Until now, I have usually used it directly on Hadoop clusters, but whenever I had to play with Spark without needing a complete Hadoop cluster, or to test some basic pieces of code, it was hard to do, especially on my laptop! Running a 3-node CDH cluster on your laptop requires a lot of CPU and memory!

So in this post, I describe how to set up a small Linux virtual machine and install the latest Spark version in standalone mode.

First of all, you need a fully operational Linux box. I chose Oracle Enterprise Linux 7.4 with the 3.8.13-118 UEK kernel.

[spark@spark ~]$ sudo uname -r
3.8.13-118.19.4.el7uek.x86_64

Once the box is installed and configured, you need to install Java. In my case, I installed JDK 8 SE:

[spark@spark ~]$ sudo yum localinstall /home/spark/jdk-8u121-linux-x64.rpm -y
[spark@spark ~]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Then create the required directories for the Spark installation and download the sources (if you need another version of Spark, you will find it at this URL: https://spark.apache.org/downloads.html):

[spark@spark ~]$ sudo mkdir /usr/local/share/spark
[spark@spark ~]$ sudo chown spark:spark /usr/local/share/spark
[spark@spark ~]$ curl -O https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0.tgz
[spark@spark ~]$ tar -xvzf spark-2.2.0.tgz -C /usr/local/share/spark/
[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/

If you are behind a proxy server, you have to create a settings.xml file in the $HOME/.m2 directory (you will probably have to create the directory too). This is required even if you have set the http_proxy variable in your environment, because Maven, which is used during the build process, relies on its own proxy settings.

Below, you’ll see what my settings.xml file looks like:

[spark@spark ~]$ cat ~/.m2/settings.xml
<settings>
 <proxies>
 <proxy>
 <id>example-proxy</id>
 <active>true</active>
 <protocol>http</protocol>
 <host>10.239.9.20</host>
 <port>80</port>
 </proxy>
 </proxies>
</settings>

Then you are ready to configure the Maven environment and launch the build:

[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/
[spark@spark spark-2.2.0]$ export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

At the end of the process, a summary report is printed.

[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

.../...

[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0.jar with /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-shaded.jar
[INFO] Dependency-reduced POM written at: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/dependency-reduced-pom.xml
[INFO]
[INFO] --- maven-source-plugin:3.0.1:jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-sources.jar
[INFO]
[INFO] --- maven-source-plugin:3.0.1:test-jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-test-sources.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [01:04 min]
[INFO] Spark Project Tags ................................. SUCCESS [ 26.598 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 6.316 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.129 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.836 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 9.039 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 21.286 s]
[INFO] Spark Project Core ................................. SUCCESS [02:24 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 20.021 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 13.117 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 33.581 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:22 min]
[INFO] Spark Project SQL .................................. SUCCESS [02:56 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:08 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 3.084 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 51.106 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 4.365 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 2.109 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 8.062 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 9.350 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 2.087 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 12.043 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 12.758 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 19.236 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 5.637 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 9.345 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 3.909 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:54 min
[INFO] Finished at: 2017-09-14T12:22:31+02:00
[INFO] Final Memory: 86M/896M
[INFO] ------------------------------------------------------------------------

At this point, if you run some scripts, you will get an error, because even though Spark is installed in standalone mode, it still needs the Hadoop libraries.

That is easy to fix: we just have to download Hadoop and configure our environment as shown below (download the Hadoop version you need; I chose 2.8.1, the latest stable release of Hadoop 2 at the time of writing, and did not test with Hadoop 3 as it was still in beta):

[spark@spark ~]$ cd /usr/local/share/
[spark@spark share]$ sudo mkdir hadoop
[spark@spark share]$ sudo chown spark:spark hadoop/
[spark@spark share]$ cd hadoop/
[spark@spark hadoop]$ curl -O http://apache.mirrors.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ tar -xzf hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ cat >> ~/.bashrc
export HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native:${LD_LIBRARY_PATH}
export SPARK_HOME=/usr/local/share/spark/spark-2.2.0
export PATH=${SPARK_HOME}/bin:${PATH}
[spark@spark hadoop]$ . ~/.bashrc
[spark@spark hadoop]$ env | egrep 'HADOOP|PATH|SPARK'
SPARK_HOME=/usr/local/share/spark/spark-2.2.0
HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
LD_LIBRARY_PATH=/usr/local/share/hadoop/hadoop-2.8.1/lib/native:/usr/local/share/hadoop/hadoop-2.8.1/lib/native:
PATH=/usr/local/share/spark/spark-2.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/spark/.local/bin:/home/spark/bin

Now, we can run the SparkPi example:

[spark@spark ~]$ run-example SparkPi 500
Pi is roughly 3.141360702827214

Note: if you want to remove all those crappy INFO messages from the output, run the command below to configure the log4j properties:

[spark@spark hadoop]$ cd $SPARK_HOME/conf
[spark@spark conf]$ sed 's/log4j\.rootCategory=INFO, console/log4j\.rootCategory=WARN, console/g' log4j.properties.template > log4j.properties

That's done: you are now ready to run your own code on Spark. Below, I wrote a sample in Scala that creates a DataFrame from an Oracle JDBC data source and runs a groupBy function on it.
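
The jdbc_sample.scala file loaded in the session below is not reproduced here, so this is a minimal sketch of what it might contain, consistent with the REPL output (the JDBC URL, host, and SH schema credentials are placeholders, not the author's actual values):

// jdbc_sample.scala (hypothetical reconstruction)
import java.util.Properties

// connection properties for the Oracle JDBC driver (placeholder credentials)
val connProps = new Properties()
connProps.put("user", "sh")
connProps.put("password", "sh")

// create a DataFrame over the SH.PRODUCTS table through the ojdbc7 thin driver
// (host, port and service name are placeholders)
val df = spark.read.jdbc("jdbc:oracle:thin:@//192.168.99.1:1521/orcl", "SH.PRODUCTS", connProps)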

[spark@spark ~]$ spark-shell --driver-class-path ojdbc7.jar --jars ojdbc7.jar
Spark context Web UI available at http://192.168.99.14:4040
Spark context available as 'sc' (master = local[*], app id = local-1505397247969).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :load jdbc_sample.scala
Loading jdbc_sample.scala...
import java.util.Properties
connProps: java.util.Properties = {}
res0: Object = null
res1: Object = null
df: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(6,0), PROD_NAME: string ... 20 more fields]

scala> df.printSchema
root
 |-- PROD_ID: decimal(6,0) (nullable = false)
 |-- PROD_NAME: string (nullable = false)
 |-- PROD_DESC: string (nullable = false)
 |-- PROD_SUBCATEGORY: string (nullable = false)
 |-- PROD_SUBCATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_SUBCATEGORY_DESC: string (nullable = false)
 |-- PROD_CATEGORY: string (nullable = false)
 |-- PROD_CATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_CATEGORY_DESC: string (nullable = false)
 |-- PROD_WEIGHT_CLASS: decimal(3,0) (nullable = false)
 |-- PROD_UNIT_OF_MEASURE: string (nullable = true)
 |-- PROD_PACK_SIZE: string (nullable = false)
 |-- SUPPLIER_ID: decimal(6,0) (nullable = false)
 |-- PROD_STATUS: string (nullable = false)
 |-- PROD_LIST_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_MIN_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_TOTAL: string (nullable = false)
 |-- PROD_TOTAL_ID: decimal(38,10) (nullable = false)
 |-- PROD_SRC_ID: decimal(38,10) (nullable = true)
 |-- PROD_EFF_FROM: timestamp (nullable = true)
 |-- PROD_EFF_TO: timestamp (nullable = true)
 |-- PROD_VALID: string (nullable = true)

scala> df.groupBy("PROD_CATEGORY").count.show
+--------------------+-----+
|       PROD_CATEGORY|count|
+--------------------+-----+
|      Software/Other|   26|
|               Photo|   10|
|         Electronics|   13|
|Peripherals and A...|   21|
|            Hardware|    2|
+--------------------+-----+

And … that’s it … have fun with Spark 😉


How user/role password security is managed in Oracle vs. Postgres

First of all, I would like to say that I am not comparing a licensed product (Oracle) with a freely distributed product (Postgres). This article describes how password security is managed in both products; it is not meant to declare Oracle (or Postgres) the better product.

OK, that said, I analyzed how password security is managed in both, based on 4 topics: password hashes, the salt used, password length, and case sensitivity.

During all these tests, I used an Oracle 12.2 database (EE) and the latest Postgres server version, 9.6.2, which I built from source.

HASHES

Both Oracle and Postgres use hash algorithms to store passwords.

Oracle

In Oracle 12.2, depending on the value of SQLNET.ALLOWED_LOGON_VERSION_SERVER in sqlnet.ora, the password can be stored using different hash algorithms.

To sum up:

  • Before Oracle 11gR1: an algorithm based on DES is used to generate the password hash. This algorithm is weak because the key length is too short and because the password is not case sensitive. The hash is stored in USER$.PASSWORD in the dictionary.
  • From Oracle 11gR1: Oracle uses the SHA-1 algorithm. Nowadays SHA-1 has been broken and collisions are possible. The SHA-1 password hash is stored in USER$.SPARE4 (it is the string that starts with S:).
  • Starting with Oracle 12c: Oracle uses SHA-2 (512) combined with the PBKDF2 function (see https://en.wikipedia.org/wiki/PBKDF2 and the sketch after this list); at the time I wrote this blog post, these algorithms had not been broken yet. The SHA-2 (512) password hash is stored in USER$.SPARE4 (it is the string that starts with T:).
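
To illustrate the PBKDF2 primitive itself, here is a generic Scala sketch using the JDK crypto API. Oracle's exact parameters for the 12c T: verifier (salt layout, iteration count) are not public, so this does not reproduce a real verifier:

import javax.crypto.SecretKeyFactory
import javax.crypto.spec.PBEKeySpec

// derive a 512-bit key from a password with PBKDF2-HMAC-SHA512
// (the salt and iteration count are arbitrary example values, not Oracle's real ones)
val spec = new PBEKeySpec("passwd".toCharArray, "0123456789".getBytes, 4096, 512)
val key  = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512").generateSecret(spec)
println(key.getEncoded.map("%02x".format(_)).mkString)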

Please note that in Oracle 12.1 an HTTP digest verifier was also generated, but it has been removed in Oracle 12.2. By default, Oracle 12c stores both the SHA-1 and SHA-2 (512) hashes:

SQL> create user laurent identified by laurent;

User created.

SQL> select name,spare4 from user$ where name='LAURENT';

NAME                 SPARE4
-------------------- ------------------------------------------------------------
LAURENT              S:5F0BAF84F6E102D0B1B7AEA78A5C1234E39F131EEC9C5C5E6B07CF3482
                     AF;T:B340F958041950CD2977301A15A2781D5EA3F71D51E25E9432EBC78
                     D369F13D9D568AC230422FDD2E28F1360D90F360E0A52B76B6F32522EF2B
                     3C1625489A5FBC93D710F50C4F0DDE7BFB8EB4CD4C729

Postgres

In Postgres, password hashes are generated with the MD5 algorithm. The hashed password is stored in the pg_authid table (in pg_catalog); the first 3 characters are set to 'md5', followed by the hash:

postgres=# create role test with encrypted password 'test' login;

postgres=# select rolname,rolpassword from pg_authid where rolname='test';
 rolname |             rolpassword
---------+-------------------------------------
 test    | md505a671c66aefea124cc08b76ea6d30bb

Please note that, even if it is not the default, passwords can be stored unencrypted in the catalog:

postgres=# alter role test with unencrypted password 'passwd';
ALTER ROLE
postgres=# select rolname,rolsuper,rolpassword from pg_authid where rolname='test';
 rolname | rolsuper | rolpassword
---------+----------+-------------
 test    | f        | passwd


Match score

Oracle wins the hashes game because Oracle 12 is the only one to offer a strong hash scheme (SHA-2 (512) + PBKDF2) that has not been broken yet (SHA-1 and MD5 are both weak algorithms).

SALT

As a reminder, a salt is random data used as an additional input to the function that hashes a password. As a result, the stored hash will be different each time the password is set, even if the same password is reused.

Oracle

  • Before Oracle 11g, Oracle does not use a real salt: the algorithm (you can find it easily on the internet) uses DES encryption with the fixed initial key 0x0123456789ABCDEF.
  • From Oracle 11gR1, the password is concatenated with a (supposedly) random 10-byte string and hashed with the SHA-1 algorithm.
  • From Oracle 12c onwards, a (supposedly) random salt is used too.

As a result, if we set the same password as the previous one, the password hash still changes:

SQL> alter user laurent identified by passwd;

User altered.

SQL> select name,spare4 from user$ where name='LAURENT';

NAME                           SPARE4
------------------------------ ------------------------------------------------------------------------------------------------------------------------
LAURENT                        S:3F2A7144E07CE3631F88D00C9851703D107A4230B3C8A820BAC49372BFE6;T:47A8D24FFC2465100B8C5097278DDDF012EFD24303680DB8C7151ED
                               54D3DE0E2BD649A3788FDB50316B843903D686034BEA6E5FC51AF914F2C30547DCF89BF507CDEB5D9B14BA70B8572C35251DCF861


SQL> alter user laurent identified by passwd;

User altered.

SQL> select name,spare4 from user$ where name='LAURENT';

NAME                           SPARE4
------------------------------ ------------------------------------------------------------------------------------------------------------------------
LAURENT                        S:6073B325C321F6CAFD97297ECC5CD7C423F3F381A1E0111400AC8BFBA87F;T:003F805AA6CA9E86393D84FB09E0B66BEBDEC3CBBA0DCC4EE59E83F
                               85E980BF86AB262D05BC4C424392383EE6FA4A0B88A5E0C7D1A76A6A4EBE1900CF5B37C5F20ED1D42F0D2FA912DD222E72E716391

Postgres

If you have a look at the Postgres C code, the password is hashed in the src/backend/commands/user.c file, inside a function called CreateRole or AlterRole depending on what you are doing with your role.

Let's take the AlterRole function as an example. It calls the pg_md5_encrypt function, whose code is available in the src/backend/libpq/md5.c file.

By looking at the pg_md5_encrypt call and how it is designed, it is easy to conclude that the role name is used as the salt.

So Postgres does add a string to the password before hashing it, but that string is not random. As a consequence, if you repeat the operation I did previously for Oracle, you can see below that the password hash remains the same, which is, in my opinion, weak. (A small sketch reproducing the scheme follows the listings below.)

postgres=# alter role test with encrypted password 'passwd';
ALTER ROLE
postgres=# select rolname,rolsuper,rolpassword from pg_authid where rolname='test';
 rolname | rolsuper |             rolpassword
---------+----------+-------------------------------------
 test    | f        | md50104560cc42670b87acd04a5bba33344
(1 row)

postgres=# alter role test with encrypted password 'passwd';
ALTER ROLE
postgres=# select rolname,rolsuper,rolpassword from pg_authid where rolname='test';
 rolname | rolsuper |             rolpassword
---------+----------+-------------------------------------
 test    | f        | md50104560cc42670b87acd04a5bba33344
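
To make this concrete, here is a small Scala sketch (assuming the md5(password || rolname) construction described above) that reproduces the two verifiers shown in these listings:

import java.security.MessageDigest

// Postgres-style verifier: the literal 'md5' followed by md5(password || rolname)
def pgMd5(password: String, rolname: String): String = {
  val d = MessageDigest.getInstance("MD5")
  "md5" + d.digest((password + rolname).getBytes("UTF-8")).map("%02x".format(_)).mkString
}

pgMd5("test", "test")     // md505a671c66aefea124cc08b76ea6d30bb, as in the first listing
pgMd5("passwd", "test")   // md50104560cc42670b87acd04a5bba33344, as shown just above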


Match score

Oracle wins the salt match because it uses what is really defined as a salt … random data.

PASSWORD LENGTH

As we all know, the longer a password is, the more difficult it is to guess (especially through a brute-force attack).

Oracle

Oracle has a hard limit on password length, which is set to 30 characters.

Above 30 characters, an error is thrown:

SQL> alter user laurent identified by abcdefghijklmnopqrstuvwxyz0123;

User altered.

SQL> alter user laurent identified by abcdefghijklmnopqrstuvwxyz01234;
alter user laurent identified by abcdefghijklmnopqrstuvwxyz01234
                                 *
ERROR at line 1:
ORA-00972: identifier is too long

Postgres

For Postgres, I was able to set a password of unlimited length (the ALTER ROLE statement completes correctly), but I could only log in to the database with a password of up to 995 characters, and only by using the PGPASSWORD environment variable (see below). Above 995 characters, the message "LOG:  invalid message length" is logged in the server logfile (I did not find any reference to this limit in the code, but it deserves further analysis):

Please note that the psql binary uses a C function named "simple_prompt" to read the password entered on standard input, and this call is made with a limit of 100 characters. So if you use psql to log in, you cannot directly type a password longer than 100 characters (see src/bin/psql/startup.c and src/port/sprompt.c).

postgres=# alter role test with encrypted password 'abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123';
ALTER ROLE
postgres=# \q


laurent@pg:~/pgsrc/postgresql-9.6.2$ export PGPASSWORD=abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123abcdefghijklmnopqrstuvwxyz0123

laurent@pg:~/pgsrc/postgresql-9.6.2$ psql -d laurent -U test
psql (9.6.2)
Type "help" for help.

laurent=>

Match score

On password length, Postgres wins the game. Indeed, with PG you can set a very long password, which can turn cracking it into a very long task (but be careful with the client tools you use). Above 100 characters, though, logging in to Postgres becomes painful depending on the client you use.

With Oracle or with Postgres, the best thing to do is to use long passwords that contain all kinds of characters (upper and lower case, digits, special characters).
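
As a purely illustrative aside (a hypothetical helper, not from the original post), such a password can be generated in Scala like this:

import java.security.SecureRandom

// alphabet mixing upper/lower case letters, digits and special characters
val alphabet = (('a' to 'z') ++ ('A' to 'Z') ++ ('0' to '9')).mkString + "!#$%&*+-=?@_"
val rnd = new SecureRandom()

def randomPassword(len: Int): String =
  Iterator.continually(alphabet(rnd.nextInt(alphabet.length))).take(len).mkString

randomPassword(30)   // 30 characters: the Oracle hard limit seen above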

CASE SENSITIVITY

In this field, (modern) Oracle and Postgres both use case-sensitive passwords.

But with Oracle:

  • 10g,
  • 11g with the sec_case_sensitive_logon parameter set to FALSE,
  • 12c with SQLNET.ALLOWED_LOGON_VERSION_SERVER set to 8, 9, 10 or 11,

passwords become case insensitive, which is not very secure.


Oracle ACE: First year in the program

Yesterday, I celebrated my first year in the Oracle ACE program. I was nominated by Mohamed Houri and supported by very well known and brilliant people in the Oracle community (Kellyn Pot'Vin-Gorman, Tanel Poder, Deiby Gomez, Franck Pachot and Ludovico Caldara), and during this last year I had to participate actively in the community.

If you are reading this post and you are interested in Oracle technology, I will try to explain why joining (or participating in) the community is a good thing for everyone: you, Oracle, and all Oracle professionals.

The ACE Program

The Oracle ACE program recognizes people who work with Oracle products (and not only with their badass Cloud platform!), who talk about their experience and knowledge, and who regularly help others across forums, blogs, etc.

As Tim Hall mentioned in a post I read a couple of months ago, you don't have to be a fellow expert in a technology to be recognized in this program. And even if Oracle wants to use this program to promote their badass Cloud platform, you don't have to talk about it in everything you write.

In the program, there are 3 levels: ACE Associate, ACE, and ACE Director. These levels are explained on the ACE program website (http://www.oracle.com/technetwork/community/oracle-ace/become-an-ace/index.html#acelevels).

Recently, I validated my award, and I will probably remain in the program for the next year.

Why join (or participate in) the Oracle community?

First of all, you can participate in the Oracle community without being an ACE.

You can participate in the community in many ways:

  • you can blog about your experiences and knowledge
  • you can test or review specific products in the Oracle portfolio
  • you can write articles and submit them to OTN or Oracle-related publications (Oracle Scene from the UKOUG, for example)
  • you can tweet about the technology
  • you can help other users in the Oracle forums
  • you can organize meetups or public meetings
  • you can create and/or participate in one or many Oracle user groups
  • you can present technical sessions at international Oracle conferences or meetups
  • etc.

In my case, I like to attend Oracle conferences because I like to present sessions there, but the most interesting thing is to share time, technical opinions (and beers) with speakers, bloggers, fellow experts, etc. (I am not a big fan of those marketing events where sales and marketing people are the kings because they sell or promote Oracle products, and you suck because you are just a simple technical guy.)

During these technical conferences, you usually attend sessions (sometimes up to 6 per day), and they are always interesting because you chose them beforehand! You learn a lot about projects done in other countries, and so on. To sum up: you learn a lot, surrounded by the most brilliant people in the Oracle world!

I also like to organize meetups in my area to present some of my work and to invite international speakers who want to present theirs. We talk about our experiences, drink some great Belgian and French beers and share some pizzas … like at home, the only difference being that instead of watching a football game, we are talking about Oracle products!

Another interesting thing, a little more under the hood, is interacting with people through modern social tools (Twitter, blogs, and sometimes email). Recently, I worked with Pieter Van Puymbroeck (@vanpupi). First he asked to borrow some Java source code I wrote a couple of years ago. No problem with that, but after some code enhancements (yes, my code was a bit crappy! 🙂), he had trouble reproducing what I did. It was my great pleasure to work on the code with him and to review his platform (and mine too) to finally discover where the problem was. What a great experience to share knowledge with other people!


Now I am starting my second year in the program, and I hope it will be as rich in knowledge sharing as the first one. I also hope I will continue to learn about technology with other people.

Get the min and max value of your In Memory Storage Indexes

Yesterday, I read a blog post from Maria Colgan (https://sqlmaria.com/2017/02/21/oracle-storage-index/) describing why you will not always see the benefits of In-Memory storage indexes.

I won't rewrite Maria's post, but to sum up: Oracle does not sort the data when building IMCUs, and as a result the min and max values of the storage indexes are not very selective. I explained this in my SIMD-related presentation (available here: https://www.slideshare.net/lolo115/ukoug15-simd-outside-and-inside-oracle-12c-12102, see slides 22 and 23).

To go further, we can find a bunch of views, oops, a bunch of undocumented views, related to IM segments, compression units and SMUs (see the documentation bug in MOS: Bug 19361690 "SEVERAL V$ VIEWS FOR INMEMORY ARE NOT DOCUMENTED"; the bug does not seem to be fixed in 12.2).

If you have a look at these views, they contain a lot of very interesting things, especially the min and max values for every column in every IMCU.

In the example below, I used the same kind of example that Maria described.

First, I created two tables loaded into the IM store. Both are based on the SH.SALES table; the first one is not sorted, the second is ordered by AMOUNT_SOLD:

SQL> create table s inmemory no memcompress priority critical as select * from sh.sales;

SQL> create table s2 inmemory no memcompress priority critical as select * from sh.sales order by amount_sold;

SQL> @IM_seg
Enter value for owner: LAURENT
Enter value for segment_name:

OWNER       SEGMENT_NAME    PARTITION_NAM TABLESPACE_NAME INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED POPULATE_ INMEMORY INMEMORY_COMPRESS
----------- --------------- ------------- --------------- ------------- ---------- ------------------- --------- -------- -----------------
LAURENT     S                             USERS                28966912   37748736                   0 COMPLETED CRITICAL NO MEMCOMPRESS
LAURENT     S2                            USERS                28966912   37748736                   0 COMPLETED CRITICAL NO MEMCOMPRESS
                                                          -------------
sum                                                            57933824

Then I used a script of mine based on one of these undocumented views, V$IM_COL_CU, which gives the min and max values for each column in each IMCU.

SQL> @IM_IMCU_stats
Enter value for tab_owner: LAURENT
Enter value for tab_name: S
Enter value for column_name: AMOUNT_SOLD

IMCU_ADDR                  OBJD ONAME                COLUMN_NAME                    DATA_TYPE  DICTIONARY_ENTRIES MIN_VAL         MAX_VAL
-------------------- ---------- -------------------- ------------------------------ ---------- ------------------ --------------- ---------------
00000000610FFF70          94467 S                    AMOUNT_SOLD                    NUMBER                      0 7.14            1520.39
00000000613FFF70          94467 S                    AMOUNT_SOLD                    NUMBER                      0 7.22            1533.38
00000000616FFF70          94467 S                    AMOUNT_SOLD                    NUMBER                      0 6.83            1566.01
00000000619FFF70          94467 S                    AMOUNT_SOLD                    NUMBER                      0 6.4             1598.63
0000000061CFFF70          94467 S                    AMOUNT_SOLD                    NUMBER                      0 6.54            1738.43
0000000076000000          94467 S                    AMOUNT_SOLD                    NUMBER                      0 6.54            1738.43
0000000076300000          94467 S                    AMOUNT_SOLD                    NUMBER                      0 6.54            1782.72
0000000076600000          94467 S                    AMOUNT_SOLD                    NUMBER                      0 7.13            1782.72
0000000076900000          94467 S                    AMOUNT_SOLD                    NUMBER                      0 7.13            1753.2

9 rows selected.

-- -------------------------------------

SQL> @IM_IMCU_stats
Enter value for tab_owner: LAURENT
Enter value for tab_name: S2
Enter value for column_name: AMOUNT_SOLD

IMCU_ADDR                  OBJD ONAME                COLUMN_NAME                    DATA_TYPE  DICTIONARY_ENTRIES MIN_VAL         MAX_VAL
-------------------- ---------- -------------------- ------------------------------ ---------- ------------------ --------------- ---------------
0000000071FFFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 481             1782.72
00000000721FFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 69.08           481
00000000724FFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 51.43           69.08
00000000727FFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 46.31           51.43
0000000072AFFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 32.35           46.31
0000000072DFFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 24.24           32.35
00000000730FFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 17.79           24.24
00000000733FFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 10.79           17.79
00000000736FFFE8          94469 S2                   AMOUNT_SOLD                    NUMBER                      0 6.4             10.79

9 rows selected.

We can clearly see that table S2 has been ordered by AMOUNT_SOLD. And if I count the rows with an AMOUNT_SOLD value equal to 20, a query against S (unordered) reads all 9 IMCUs, while a query against S2 prunes 8 of the 9 IMCUs:

SQL> @IM_sesstat

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
CPU used by this session                                                  3
IM scan CUs no memcompress                                                0
IM scan CUs memcompress for dml                                           0
IM scan CUs memcompress for query low                                     0
IM scan CUs memcompress for query high                                    0
IM scan CUs memcompress for capacity low                                  0
IM scan CUs memcompress for capacity high                                 0
IM scan CUs columns accessed                                              0
IM scan CUs columns decompressed                                          0
IM scan CUs columns theoretical max                                       0
IM scan rows                                                              0
IM scan rows valid                                                        0
IM scan rows optimized                                                    0
IM scan rows projected                                                    0
IM scan CUs split pieces                                                  0
IM scan CUs pruned                                                        0

16 rows selected.

SQL> select /*+ INMEMORY */ count(*) from s where amount_sold=20;

  COUNT(*)
----------
       140

1 row selected.

SQL> @IM_sesstat

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
CPU used by this session                                                 20
IM scan CUs no memcompress                                                9
IM scan CUs memcompress for dml                                           0
IM scan CUs memcompress for query low                                     0
IM scan CUs memcompress for query high                                    0
IM scan CUs memcompress for capacity low                                  0
IM scan CUs memcompress for capacity high                                 0
IM scan CUs columns accessed                                              9
IM scan CUs columns decompressed                                          0
IM scan CUs columns theoretical max                                      63
IM scan rows                                                         918843
IM scan rows valid                                                   918843
IM scan rows optimized                                                    0
IM scan rows projected                                                  140
IM scan CUs split pieces                                                  9
IM scan CUs pruned                                                        0

16 rows selected.

-- ---------------------------------------------

SQL> @IM_sesstat

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
CPU used by this session                                                  1
IM scan CUs no memcompress                                                0
IM scan CUs memcompress for dml                                           0
IM scan CUs memcompress for query low                                     0
IM scan CUs memcompress for query high                                    0
IM scan CUs memcompress for capacity low                                  0
IM scan CUs memcompress for capacity high                                 0
IM scan CUs columns accessed                                              0
IM scan CUs columns decompressed                                          0
IM scan CUs columns theoretical max                                       0
IM scan rows                                                              0
IM scan rows valid                                                        0
IM scan rows optimized                                                    0
IM scan rows projected                                                    0
IM scan CUs split pieces                                                  0
IM scan CUs pruned                                                        0

16 rows selected.

SQL> select /*+ INMEMORY */ count(*) from s2 where amount_sold=20;

  COUNT(*)
----------
       140

SQL> @IM_sesstat

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
CPU used by this session                                                  2
IM scan CUs no memcompress                                                9
IM scan CUs memcompress for dml                                           0
IM scan CUs memcompress for query low                                     0
IM scan CUs memcompress for query high                                    0
IM scan CUs memcompress for capacity low                                  0
IM scan CUs memcompress for capacity high                                 0
IM scan CUs columns accessed                                              1
IM scan CUs columns decompressed                                          0
IM scan CUs columns theoretical max                                      63
IM scan rows                                                         918843
IM scan rows valid                                                   103923
IM scan rows optimized                                               814920
IM scan rows projected                                                  140
IM scan CUs split pieces                                                  9
IM scan CUs pruned                                                        8

16 rows selected.


All the scripts I used in the post, including the one used to see storage indexes, are available here:

That's all for today!! 🙂


Install and configure DTrace on Oracle Linux

DTrace is one of the best tools for dynamic tracing of program execution.

DTrace was initially released on Solaris and has now been ported to Linux.

In this post, I will briefly describe how to install and configure the DTrace port on an Oracle Linux 6 box with a UEK4 kernel.

First, download the dtrace-utils and dtrace-utils-devel packages. They are available at this URL: http://www.oracle.com/technetwork/server-storage/linux/downloads/linux-dtrace-2800968.html. Just download the correct releases for your UEK kernel version and your Oracle Linux distribution.

In my case, I chose “DTrace utilities, Oracle Linux 6 (x86_64)” for the UEK4 kernel.

[root@oel6 dtrace]# ls
dtrace-utils-0.5.1-3.el6.x86_64.rpm  dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm

Then I used yum to install both packages, but first you have to configure (if not already done) your yum repository for UEK4:

[public_ol6_UEKR4]
name=Latest Unbreakable Enterprise Kernel Release 4 for Oracle Linux $releasever ($basearch)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL6/UEKR4/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=$uekr4

Now, install the packages:

[root@oel6 dtrace]# yum localinstall dtrace-utils-*
Loaded plugins: auto-update-debuginfo, refresh-packagekit, security, ulninfo
Setting up Local Package Process
Examining dtrace-utils-0.5.1-3.el6.x86_64.rpm: dtrace-utils-0.5.1-3.el6.x86_64
Marking dtrace-utils-0.5.1-3.el6.x86_64.rpm to be installed
Examining dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm: dtrace-utils-devel-0.5.1-3.el6.x86_64
Marking dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package dtrace-utils.x86_64 0:0.5.1-3.el6 will be installed
--> Processing Dependency: dtrace-modules-shared-headers for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf.so.1(LIBDTRACE_CTF_1.0)(64bit) for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf.so.1()(64bit) for package: dtrace-utils-0.5.1-3.el6.x86_64
---> Package dtrace-utils-devel.x86_64 0:0.5.1-3.el6 will be installed
--> Processing Dependency: libdtrace-ctf-devel > 0.4.0 for package: dtrace-utils-devel-0.5.1-3.el6.x86_64
--> Running transaction check
---> Package dtrace-modules-shared-headers.x86_64 0:0.5.3-2.el6 will be installed
---> Package libdtrace-ctf.x86_64 0:0.5.0-3.el6 will be installed
---> Package libdtrace-ctf-devel.x86_64 0:0.5.0-3.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==================================================================================================================================================================================================================================================
 Package                                                           Arch                                       Version                                            Repository                                                                  Size
==================================================================================================================================================================================================================================================
Installing:
 dtrace-utils                                                      x86_64                                     0.5.1-3.el6                                        /dtrace-utils-0.5.1-3.el6.x86_64                                           786 k
 dtrace-utils-devel                                                x86_64                                     0.5.1-3.el6                                        /dtrace-utils-devel-0.5.1-3.el6.x86_64                                      76 k
Installing for dependencies:
 dtrace-modules-shared-headers                                     x86_64                                     0.5.3-2.el6                                        public_ol6_UEKR4                                                            30 k
 libdtrace-ctf                                                     x86_64                                     0.5.0-3.el6                                        public_ol6_UEKR4                                                            28 k
 libdtrace-ctf-devel                                               x86_64                                     0.5.0-3.el6                                        public_ol6_UEKR4                                                            15 k

Transaction Summary
==================================================================================================================================================================================================================================================
Install       5 Package(s)

Total size: 935 k
Total download size: 73 k
Installed size: 1.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/3): dtrace-modules-shared-headers-0.5.3-2.el6.x86_64.rpm                                                                                                                                                                |  30 kB     00:00
(2/3): libdtrace-ctf-0.5.0-3.el6.x86_64.rpm                                                                                                                                                                                |  28 kB     00:00
(3/3): libdtrace-ctf-devel-0.5.0-3.el6.x86_64.rpm  
.../...


Installing those packages is not sufficient: you also have to install the package containing the DTrace kernel modules, and since its version depends on your kernel, run the yum command below:

[root@oel6 dtrace]# yum install dtrace-modules-`uname -r`
Loaded plugins: auto-update-debuginfo, refresh-packagekit, security, ulninfo
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package dtrace-modules-4.1.12-37.5.1.el6uek.x86_64 0:0.5.2-1.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==================================================================================================================================================================================================================================================
 Package                                                                     Arch                                           Version                                                Repository                                                Size
==================================================================================================================================================================================================================================================
Installing:
 dtrace-modules-4.1.12-37.5.1.el6uek                                         x86_64                                         0.5.2-1.el6                                            public_ol6_UEKR4                                         1.2 M

Transaction Summary
==================================================================================================================================================================================================================================================
Install       1 Package(s)

Total download size: 1.2 M
Installed size: 6.1 M
Is this ok [y/N]: y

Once the packages are installed, you have to load a bunch of modules into the kernel.

You can do that manually by running the command below:

[root@oel6 dtrace]# modprobe -a dtrace profile systrace sdt dt_test

Or you can configure your Linux box to load those modules during startup. In my case, as I run an OL6 box, I created a file at /etc/sysconfig/modules/dtrace.modules and changed its permissions:

[root@oel6 dtrace]# cat > /etc/sysconfig/modules/dtrace.modules
#!/bin/sh

if [ ! -c /dev/dtrace/dtrace ] ; then
        exec /sbin/modprobe -a dtrace profile systrace sdt dt_test  >/dev/null 2>&1
fi

[root@oel6 dtrace]# chmod 755 /etc/sysconfig/modules/dtrace.modules

After a reboot, dtrace runs fine with the root user:

[root@oel6 dtrace]# dtrace -l | wc -l
670

But not with the oracle user:

[oracle@oel6 ~]$ dtrace -l
dtrace: failed to initialize dtrace: DTrace requires additional privileges

To fix that, we need two small tricks:
1) Create a dtrace Unix group and add the oracle user to it (or any user you want to allow to use DTrace):

[root@oel6 ~]# id -a oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),48(apache),54322(dba)
[root@oel6 ~]# groupadd dtrace
[root@oel6 ~]# usermod -a -G dtrace oracle
[root@oel6 ~]# id -a oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),48(apache),54322(dba),54325(dtrace)

2) Configure the /dev/dtrace/dtrace Unix device to have the correct group ownership:

[root@oel6 ~]# cat /etc/udev/rules.d/10-dtrace.rules
KERNEL=="dtrace/dtrace", GROUP="dtrace", MODE="0660"

After a final reboot, it works fine even for my oracle user, and I can trace pmon and every other process of my Oracle instances:

[oracle@oel6 ~]$ dtrace -l | wc -l
670

[oracle@oel6 ~]$ ps -ef | grep pmon
oracle    3436     1  0 18:16 ?        00:00:00 ora_pmon_orcl11
oracle    3519  3381  0 18:16 pts/0    00:00:00 grep pmon

[oracle@oel6 ~]$ cat test.d
#!/usr/sbin/dtrace -qs
/* count system calls, aggregated by syscall name, for the pid passed as the first argument */
syscall:::entry
/pid == $1/
{
  @num[probefunc] = count();
}

[oracle@oel6 ~]$ ./test.d 3436


  mmap                                                              1
  munmap                                                            1
  newfstat                                                          1
  getrusage                                                         4
  poll                                                              4
  times                                                             6
  close                                                            19
  open                                                             19
  read                                                             19

That's all for today. 🙂