Oracle … as usual

Oracle by Laurent Leturgez


Install a Standalone Spark Environment on Oracle Linux 7

Spark is one of the trendiest projects in the Apache Foundation.

Until now, I have usually used it directly on Hadoop clusters, but each time I wanted to play with Spark without a complete Hadoop cluster, or to test some basic pieces of code, it was hard to do, especially on my laptop: running a 3-node CDH cluster there requires a lot of CPU and memory!

So in this post, I will describe how you can set up a small Linux virtual machine and install the latest Spark version in standalone mode.

First of all, you need a fully operational Linux box. I chose an Oracle Enterprise Linux 7.4 one with the 3.8.13-118 UEK kernel.

[spark@spark ~]$ sudo uname -r
3.8.13-118.19.4.el7uek.x86_64

Once the box is installed and configured, you need to install Java. In my case, I installed a JDK 8 SE:

[spark@spark ~]$ sudo yum localinstall /home/spark/jdk-8u121-linux-x64.rpm -y
[spark@spark ~]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Then, create the required directories for the Spark installation and download the sources (if you need another version of Spark, you will find it at this URL: https://spark.apache.org/downloads.html):

[spark@spark ~]$ sudo mkdir /usr/local/share/spark
[spark@spark ~]$ sudo chown spark:spark /usr/local/share/spark
[spark@spark ~]$ curl -O https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0.tgz
[spark@spark ~]$ tar -xvzf spark-2.2.0.tgz -C /usr/local/share/spark/
[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/

If you are behind a proxy server, you have to create a settings.xml file in the $HOME/.m2 directory (you will probably have to create the directory too). You have to do this even if you have set the http_proxy variable in your environment, because Maven, which is used during the build process, does not read that variable.

Below, you’ll see what my settings.xml file looks like:

[spark@spark ~]$ cat ~/.m2/settings.xml
<settings>
 <proxies>
 <proxy>
 <id>example-proxy</id>
 <active>true</active>
 <protocol>http</protocol>
 <host>10.239.9.20</host>
 <port>80</port>
 </proxy>
 </proxies>
</settings>

Then, you are ready to configure the Maven environment and launch the build process:

[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/
[spark@spark spark-2.2.0]$ export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

At the end of the process, a summary report is printed.

[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

.../...

[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0.jar with /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-shaded.jar
[INFO] Dependency-reduced POM written at: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/dependency-reduced-pom.xml
[INFO]
[INFO] --- maven-source-plugin:3.0.1:jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-sources.jar
[INFO]
[INFO] --- maven-source-plugin:3.0.1:test-jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-test-sources.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [01:04 min]
[INFO] Spark Project Tags ................................. SUCCESS [ 26.598 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 6.316 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.129 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.836 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 9.039 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 21.286 s]
[INFO] Spark Project Core ................................. SUCCESS [02:24 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 20.021 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 13.117 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 33.581 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:22 min]
[INFO] Spark Project SQL .................................. SUCCESS [02:56 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:08 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 3.084 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 51.106 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 4.365 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 2.109 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 8.062 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 9.350 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 2.087 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 12.043 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 12.758 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 19.236 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 5.637 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 9.345 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 3.909 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:54 min
[INFO] Finished at: 2017-09-14T12:22:31+02:00
[INFO] Final Memory: 86M/896M
[INFO] ------------------------------------------------------------------------

At this step, if you run some scripts, you will get an error because, even though you have installed Spark in standalone mode, you still need the Hadoop libraries.

It’s an easy thing to fix: we just have to download Hadoop and configure our environment as shown below (download the Hadoop version you need; I chose 2.8.1, the latest stable release of the Hadoop 2 line; I didn’t test with Hadoop 3 as it’s still in beta):

[spark@spark ~]$ cd /usr/local/share/
[spark@spark share]$ sudo mkdir hadoop
[spark@spark share]$ sudo chown spark:spark hadoop/
[spark@spark share]$ cd hadoop/
[spark@spark hadoop]$ curl -O http://apache.mirrors.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ tar -xzf hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ cat >> ~/.bashrc
export HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native:${LD_LIBRARY_PATH}
export SPARK_HOME=/usr/local/share/spark/spark-2.2.0
export PATH=${SPARK_HOME}/bin:${PATH}
[spark@spark hadoop]$ . ~/.bashrc
[spark@spark hadoop]$ env | egrep 'HADOOP|PATH|SPARK'
SPARK_HOME=/usr/local/share/spark/spark-2.2.0
HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
LD_LIBRARY_PATH=/usr/local/share/hadoop/hadoop-2.8.1/lib/native:/usr/local/share/hadoop/hadoop-2.8.1/lib/native:
PATH=/usr/local/share/spark/spark-2.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/spark/.local/bin:/home/spark/bin

Now, we can run the SparkPi example:

[spark@spark ~]$ run-example SparkPi 500
Pi is roughly 3.141360702827214

Note: If you want to remove all those crappy INFO messages in the output, run the command below to configure log4j properties:

[spark@spark hadoop]$ cd $SPARK_HOME/conf
[spark@spark conf]$ sed 's/log4j\.rootCategory=INFO, console/log4j\.rootCategory=WARN, console/g' log4j.properties.template > log4j.properties
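
By the way, run-example is just a convenience wrapper around spark-submit; to submit your own application jar, you would do something like this (the class and jar names here are placeholders):

[spark@spark ~]$ spark-submit --class com.example.MyApp --master local[*] myapp.jar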

 

That’s done; now you’re ready to run your own code on Spark. Below is a sample written in Scala that creates a dataframe from an Oracle JDBC datasource and runs a groupBy function on it.
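
The content of jdbc_sample.scala is not reproduced in the session that follows, so here is a sketch of what it could look like, reconstructed from the session output (the host, service name and credentials below are placeholders; the table is SH.PRODUCTS from the Oracle sample schemas):

[spark@spark ~]$ cat > jdbc_sample.scala <<'EOF'
import java.util.Properties

// JDBC connection properties (user and password are placeholders)
val connProps = new Properties()
connProps.put("user", "sh")
connProps.put("password", "sh")

// Build a dataframe on top of the SH.PRODUCTS table through the Oracle JDBC driver
// (spark is the SparkSession predefined by spark-shell)
val df = spark.read.jdbc("jdbc:oracle:thin:@//db-host:1521/orcl", "SH.PRODUCTS", connProps)
EOF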

[spark@spark ~]$ spark-shell --driver-class-path ojdbc7.jar --jars ojdbc7.jar
Spark context Web UI available at http://192.168.99.14:4040
Spark context available as 'sc' (master = local[*], app id = local-1505397247969).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :load jdbc_sample.scala
Loading jdbc_sample.scala...
import java.util.Properties
connProps: java.util.Properties = {}
res0: Object = null
res1: Object = null
df: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(6,0), PROD_NAME: string ... 20 more fields]

scala> df.printSchema
root
 |-- PROD_ID: decimal(6,0) (nullable = false)
 |-- PROD_NAME: string (nullable = false)
 |-- PROD_DESC: string (nullable = false)
 |-- PROD_SUBCATEGORY: string (nullable = false)
 |-- PROD_SUBCATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_SUBCATEGORY_DESC: string (nullable = false)
 |-- PROD_CATEGORY: string (nullable = false)
 |-- PROD_CATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_CATEGORY_DESC: string (nullable = false)
 |-- PROD_WEIGHT_CLASS: decimal(3,0) (nullable = false)
 |-- PROD_UNIT_OF_MEASURE: string (nullable = true)
 |-- PROD_PACK_SIZE: string (nullable = false)
 |-- SUPPLIER_ID: decimal(6,0) (nullable = false)
 |-- PROD_STATUS: string (nullable = false)
 |-- PROD_LIST_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_MIN_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_TOTAL: string (nullable = false)
 |-- PROD_TOTAL_ID: decimal(38,10) (nullable = false)
 |-- PROD_SRC_ID: decimal(38,10) (nullable = true)
 |-- PROD_EFF_FROM: timestamp (nullable = true)
 |-- PROD_EFF_TO: timestamp (nullable = true)
 |-- PROD_VALID: string (nullable = true)

scala> df.groupBy("PROD_CATEGORY").count.show
+--------------------+-----+
|       PROD_CATEGORY|count|
+--------------------+-----+
|      Software/Other|   26|
|               Photo|   10|
|         Electronics|   13|
|Peripherals and A...|   21|
|            Hardware|    2|
+--------------------+-----+

And … that’s it … have fun with Spark 😉

 

Install and configure DTrace on Oracle Linux

DTrace is one of the best tools for dynamic tracing of program execution.

DTrace was initially released on Solaris and has now been ported to Linux.

In this post, I will briefly describe how to install and configure the DTrace port on an Oracle Linux 6 box with a UEK4 kernel.

First, download the dtrace-utils and dtrace-utils-devel packages. These packages are available at this URL: http://www.oracle.com/technetwork/server-storage/linux/downloads/linux-dtrace-2800968.html. You just have to download the correct releases for your UEK kernel version and your Oracle Linux distribution.

In my case, I chose “DTrace utilities, Oracle Linux 6 (x86_64)” for the UEK4 kernel.

[root@oel6 dtrace]# ls
dtrace-utils-0.5.1-3.el6.x86_64.rpm  dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm

Then, I used yum to install both packages, but first you have to configure (if not already done) your yum repository for UEK4:

[public_ol6_UEKR4]
name=Latest Unbreakable Enterprise Kernel Release 4 for Oracle Linux $releasever ($basearch)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL6/UEKR4/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=$uekr4
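
Note that enabled is set from the $uekr4 yum variable; if the repository turns out to be disabled on your box, you can enable it explicitly (yum-config-manager ships with the yum-utils package):

[root@oel6 dtrace]# yum-config-manager --enable public_ol6_UEKR4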

Now, install the packages:

[root@oel6 dtrace]# yum localinstall dtrace-utils-*
Loaded plugins: auto-update-debuginfo, refresh-packagekit, security, ulninfo
Setting up Local Package Process
Examining dtrace-utils-0.5.1-3.el6.x86_64.rpm: dtrace-utils-0.5.1-3.el6.x86_64
Marking dtrace-utils-0.5.1-3.el6.x86_64.rpm to be installed
Examining dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm: dtrace-utils-devel-0.5.1-3.el6.x86_64
Marking dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package dtrace-utils.x86_64 0:0.5.1-3.el6 will be installed
--> Processing Dependency: dtrace-modules-shared-headers for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf.so.1(LIBDTRACE_CTF_1.0)(64bit) for package: dtrace-utils-0.5.1-3.el6.x86_64
--> Processing Dependency: libdtrace-ctf.so.1()(64bit) for package: dtrace-utils-0.5.1-3.el6.x86_64
---> Package dtrace-utils-devel.x86_64 0:0.5.1-3.el6 will be installed
--> Processing Dependency: libdtrace-ctf-devel > 0.4.0 for package: dtrace-utils-devel-0.5.1-3.el6.x86_64
--> Running transaction check
---> Package dtrace-modules-shared-headers.x86_64 0:0.5.3-2.el6 will be installed
---> Package libdtrace-ctf.x86_64 0:0.5.0-3.el6 will be installed
---> Package libdtrace-ctf-devel.x86_64 0:0.5.0-3.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==================================================================================================================================================================================================================================================
 Package                                                           Arch                                       Version                                            Repository                                                                  Size
==================================================================================================================================================================================================================================================
Installing:
 dtrace-utils                                                      x86_64                                     0.5.1-3.el6                                        /dtrace-utils-0.5.1-3.el6.x86_64                                           786 k
 dtrace-utils-devel                                                x86_64                                     0.5.1-3.el6                                        /dtrace-utils-devel-0.5.1-3.el6.x86_64                                      76 k
Installing for dependencies:
 dtrace-modules-shared-headers                                     x86_64                                     0.5.3-2.el6                                        public_ol6_UEKR4                                                            30 k
 libdtrace-ctf                                                     x86_64                                     0.5.0-3.el6                                        public_ol6_UEKR4                                                            28 k
 libdtrace-ctf-devel                                               x86_64                                     0.5.0-3.el6                                        public_ol6_UEKR4                                                            15 k

Transaction Summary
==================================================================================================================================================================================================================================================
Install       5 Package(s)

Total size: 935 k
Total download size: 73 k
Installed size: 1.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/3): dtrace-modules-shared-headers-0.5.3-2.el6.x86_64.rpm                                                                                                                                                                |  30 kB     00:00
(2/3): libdtrace-ctf-0.5.0-3.el6.x86_64.rpm                                                                                                                                                                                |  28 kB     00:00
(3/3): libdtrace-ctf-devel-0.5.0-3.el6.x86_64.rpm  
.../...

 

Installing those packages is not sufficient: you also have to install the package containing the DTrace kernel modules, and as its version depends on your kernel, you have to run the yum command below:

[root@oel6 dtrace]# yum install dtrace-modules-`uname -r`
Loaded plugins: auto-update-debuginfo, refresh-packagekit, security, ulninfo
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package dtrace-modules-4.1.12-37.5.1.el6uek.x86_64 0:0.5.2-1.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==================================================================================================================================================================================================================================================
 Package                                                                     Arch                                           Version                                                Repository                                                Size
==================================================================================================================================================================================================================================================
Installing:
 dtrace-modules-4.1.12-37.5.1.el6uek                                         x86_64                                         0.5.2-1.el6                                            public_ol6_UEKR4                                         1.2 M

Transaction Summary
==================================================================================================================================================================================================================================================
Install       1 Package(s)

Total download size: 1.2 M
Installed size: 6.1 M
Is this ok [y/N]: y

Once the packages are installed, you have to load a bunch of modules into the kernel.

You can do that manually by running the command below:

[root@oel6 dtrace]# modprobe -a dtrace profile systrace sdt dt_test

Or, you can configure your Linux box to load those modules during startup. In my case, as I run an OL6 box, I created the file /etc/sysconfig/modules/dtrace.modules and changed its permissions.

[root@oel6 dtrace]# cat > /etc/sysconfig/modules/dtrace.modules
#!/bin/sh

if [ ! -c /dev/dtrace/dtrace ] ; then
        exec /sbin/modprobe -a dtrace profile systrace sdt dt_test  >/dev/null 2>&1
fi

[root@oel6 dtrace]# chmod 755 /etc/sysconfig/modules/dtrace.modules

After a reboot, dtrace runs fine with the root user:

[root@oel6 dtrace]# dtrace -l | wc -l
670

 
But not with the oracle user:

[oracle@oel6 ~]$ dtrace -l
dtrace: failed to initialize dtrace: DTrace requires additional privileges

To fix that, we have to apply two small tricks:
1) Create a dtrace Unix group and add the oracle user to it (or any user you want to allow to use DTrace):

[root@oel6 ~]# id -a oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),48(apache),54322(dba)
[root@oel6 ~]# groupadd dtrace
[root@oel6 ~]# usermod -a -G dtrace oracle
[root@oel6 ~]# id -a oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),48(apache),54322(dba),54325(dtrace)

2) Configure the /dev/dtrace/dtrace Unix device to have the correct group ownership:

[root@oel6 ~]# cat /etc/udev/rules.d/10-dtrace.rules
KERNEL=="dtrace/dtrace", GROUP="dtrace", MODE="0660"
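
If you don’t want to reboot just for this, reloading the udev rules should also do the trick (I haven’t tested this path, so consider it a hint):

[root@oel6 ~]# udevadm control --reload-rules
[root@oel6 ~]# udevadm trigger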

After one last reboot, it works fine even for my oracle user, and I can trace pmon and every process of my Oracle instances:

[oracle@oel6 ~]$ dtrace -l | wc -l
670

[oracle@oel6 ~]$ ps -ef | grep pmon
oracle    3436     1  0 18:16 ?        00:00:00 ora_pmon_orcl11
oracle    3519  3381  0 18:16 pts/0    00:00:00 grep pmon

[oracle@oel6 ~]$ cat test.d
#!/usr/sbin/dtrace -qs
syscall:::entry
/pid == $1/
{
  @num[probefunc] = count();
}

[oracle@oel6 ~]$ ./test.d 3436


  mmap                                                              1
  munmap                                                            1
  newfstat                                                          1
  getrusage                                                         4
  poll                                                              4
  times                                                             6
  close                                                            19
  open                                                             19
  read                                                             19

That’s all for today 🙂 .
 

Linux monitoring of oracle 12c multi-threaded instances

Oracle 12c comes with a new feature: the multi-threaded server. In summary, real-time scheduled processes (vktm, lms) and a few main processes like pmon or dbwn continue to run as OS processes, while the other ones (lgwr, mmon, server processes, etc.) now run as threads.

This feature was developed to optimize Oracle for modern processors with many cores and many threads per core (for example SPARC T processors), but the DBA will have to change many of the methods used to analyze problems on a multi-threaded server.

If you usually analyze some problems on the OS side, top, ps, and other tools have to be used in a different way. Let’s look at the different tools that can be used to analyze processes and threads on Linux (the tools mentioned here have been tested on Oracle Enterprise Linux 6).

For all the examples below, I used an orcl instance running in a multi-threaded configuration.
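
For reference, this mode is controlled by the threaded_execution initialization parameter; switching an instance over looks like this (it’s a static parameter, so a restart is needed, and keep in mind that OS authentication is not available once the instance runs multi-threaded, so connect with a password):

SQL> alter system set threaded_execution=true scope=spfile;
SQL> shutdown immediate
SQL> startup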

  • ps

If I run a simple ps with my configuration, there are only 6 processes:

[oracle@oel64-12c ~]$ ps -ef | grep [o]rcl
oracle    9871     1  0 21:02 ?        00:00:00 ora_pmon_orcl
oracle    9873     1  0 21:02 ?        00:00:00 ora_psp0_orcl
oracle    9878     1  5 21:02 ?        00:01:35 ora_vktm_orcl
oracle    9882     1  0 21:02 ?        00:00:02 ora_u004_orcl
oracle    9888     1  0 21:02 ?        00:00:11 ora_u005_orcl
oracle    9894     1  0 21:02 ?        00:00:00 ora_dbw0_orcl
 If I want to print all threads that run in these processes, I can run this command:
[oracle@oel64-12c ~]$ ps -eLo pid,pcpu,tid,user,comm,cmd | sed -n -e '1p' -e '/orcl/p'
  PID %CPU   TID USER     COMMAND         CMD
 9871  0.0  9871 oracle   ora_pmon_orcl   ora_pmon_orcl
 9873  0.0  9873 oracle   ora_psp0_orcl   ora_psp0_orcl
 9878  5.2  9878 oracle   ora_vktm_orcl   ora_vktm_orcl
 9882  0.0  9882 oracle   ora_scmn_orcl   ora_u004_orcl
 9882  0.0  9883 oracle   oracle          ora_u004_orcl
 9882  0.0  9884 oracle   ora_gen0_orcl   ora_u004_orcl
 9882  0.0  9885 oracle   ora_mman_orcl   ora_u004_orcl
 9882  0.0  9891 oracle   ora_dbrm_orcl   ora_u004_orcl
 9882  0.0  9895 oracle   ora_lgwr_orcl   ora_u004_orcl
 9882  0.0  9896 oracle   ora_ckpt_orcl   ora_u004_orcl
 9882  0.0  9897 oracle   ora_lg00_orcl   ora_u004_orcl
 9882  0.0  9898 oracle   ora_lg01_orcl   ora_u004_orcl
 9882  0.0  9899 oracle   ora_smon_orcl   ora_u004_orcl
 9882  0.0  9901 oracle   ora_lreg_orcl   ora_u004_orcl
 9888  0.0  9888 oracle   ora_scmn_orcl   ora_u005_orcl
 9888  0.0  9889 oracle   oracle          ora_u005_orcl
 9888  0.0  9890 oracle   ora_diag_orcl   ora_u005_orcl
 9888  0.0  9892 oracle   ora_dia0_orcl   ora_u005_orcl
 9888  0.0  9900 oracle   ora_reco_orcl   ora_u005_orcl
 9888  0.0  9902 oracle   ora_mmon_orcl   ora_u005_orcl
 9888  0.0  9903 oracle   ora_mmnl_orcl   ora_u005_orcl
 9888  0.0  9904 oracle   ora_d000_orcl   ora_u005_orcl
 9888  0.0  9905 oracle   ora_s000_orcl   ora_u005_orcl
 9888  0.0  9906 oracle   ora_n000_orcl   ora_u005_orcl
 9888  1.3  9931 oracle   oracle_9931_orc ora_u005_orcl
 9888  0.0  9932 oracle   ora_tmon_orcl   ora_u005_orcl
 9888  0.0  9933 oracle   ora_tt00_orcl   ora_u005_orcl
 9888  0.0  9934 oracle   ora_smco_orcl   ora_u005_orcl
 9888  0.0  9938 oracle   ora_fbda_orcl   ora_u005_orcl
 9888  0.0  9939 oracle   ora_aqpc_orcl   ora_u005_orcl
 9888  0.0  9944 oracle   ora_p000_orcl   ora_u005_orcl
 9888  0.0  9945 oracle   ora_p001_orcl   ora_u005_orcl
 9888  0.0  9946 oracle   ora_p002_orcl   ora_u005_orcl
 9888  0.0  9947 oracle   ora_p003_orcl   ora_u005_orcl
 9888  0.0  9948 oracle   ora_p004_orcl   ora_u005_orcl
 9888  0.0  9949 oracle   ora_p005_orcl   ora_u005_orcl
 9888  0.0  9950 oracle   ora_p006_orcl   ora_u005_orcl
 9888  0.0  9951 oracle   ora_p007_orcl   ora_u005_orcl
 9888  0.0  9952 oracle   ora_cjq0_orcl   ora_u005_orcl
 9888  0.0  9996 oracle   ora_qm02_orcl   ora_u005_orcl
 9888  0.0  9998 oracle   ora_q002_orcl   ora_u005_orcl
 9888  0.0  9999 oracle   ora_q003_orcl   ora_u005_orcl
 9888  0.0 16009 oracle   ora_w000_orcl   ora_u005_orcl
 9894  0.0  9894 oracle   ora_dbw0_orcl   ora_dbw0_orcl
16414  0.0 16414 oracle   sed             sed -n -e 1p -e /orcl/p
The first column is the PID, the second is the CPU percentage burned by the thread, the third is the thread id, the next one is the thread owner, the second-to-last is the Oracle thread name, and the last is the process name.
With ps, you get a static view and can possibly identify problematic threads, for example by sorting them by CPU as shown below.
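
For example, to list the top CPU-consuming threads system-wide, you can reuse the same thread options and let ps sort the output:

[oracle@oel64-12c ~]$ ps -eLo pid,tid,pcpu,comm --sort=-pcpu | head
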
  • top

With top, you can see your processes or threads in a more dynamic fashion. The top option used to display threads is -H, but you have to specify which processes you want to analyze with the -p parameter followed by their pids. The main drawback of this command is that -p is limited to 20 pids, but for a mid-size multi-threaded instance, that’s OK.

[oracle@oel64-12c ~]$ top -p $(pgrep -d',' orcl$) -H
top - 21:55:37 up 1 day,  2:03,  6 users,  load average: 1.10, 1.04, 1.13
Tasks:  43 total,   0 running,  43 sleeping,   0 stopped,   0 zombie
Cpu0  : 25.8%us, 46.5%sy,  0.0%ni, 14.5%id, 10.3%wa,  0.0%hi,  3.0%si,  0.0%st
Cpu1  : 16.3%us, 46.8%sy,  0.0%ni, 28.9%id,  7.8%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4055296k total,  3052640k used,  1002656k free,    20360k buffers
Swap:  8388604k total,  1475216k used,  6913388k free,  2302200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9878 oracle    -2   0 1489m  17m  15m S  6.7  0.4   3:03.07 ora_vktm_orcl
 9892 oracle    20   0 3457m 340m 252m S  0.5  8.6   0:01.27 ora_dia0_orcl
 9902 oracle    20   0 3457m 340m 252m S  0.5  8.6   0:02.00 ora_mmon_orcl
 9871 oracle    20   0 1489m  21m  19m S  0.0  0.5   0:00.33 ora_pmon_orcl
 9873 oracle    20   0 1489m  17m  15m S  0.0  0.4   0:01.17 ora_psp0_orcl
 9882 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.07 ora_scmn_orcl
 9883 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.00 oracle
 9884 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.25 ora_gen0_orcl
 9885 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.25 ora_mman_orcl
 9891 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.24 ora_dbrm_orcl
 9895 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.33 ora_lgwr_orcl
 9896 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:01.08 ora_ckpt_orcl
 9897 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.12 ora_lg00_orcl
 9898 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.03 ora_lg01_orcl
 9899 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.07 ora_smon_orcl
 9901 oracle    20   0 2594m 1.2g 1.2g S  0.0 31.2   0:00.13 ora_lreg_orcl
 9888 oracle    20   0 3457m 340m 252m S  0.0  8.6   0:00.40 ora_scmn_orcl
 9889 oracle    20   0 3457m 340m 252m S  0.0  8.6   0:00.01 oracle
  • pidstat

pidstat is a command that appeared in OEL6. It runs like vmstat or mpstat, with an interval and a count, but it reports how the CPU, I/O, and memory consumption of a specific process evolves.

For example, to see the CPU consumption every second for the process with pid 9888:

[oracle@oel64-12c ~]$ pidstat -p 9888 -u 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:03:14 PM       PID    %usr %system  %guest    %CPU   CPU  Command
10:03:15 PM      9888    0.00    0.00    0.00    0.00     0  ora_scmn_orcl
10:03:16 PM      9888    1.00    0.00    0.00    1.00     1  ora_scmn_orcl
10:03:17 PM      9888    0.00    0.00    0.00    0.00     1  ora_scmn_orcl
10:03:18 PM      9888    1.00    1.00    0.00    2.00     1  ora_scmn_orcl
Please note that the process name is ora_u005_orcl, but pidstat fills the Command column with the thread name, which here is ora_scmn_orcl, the name of the process’s main thread.

So if you want to see every thread in this process, you need to use the -t option:

[oracle@oel64-12c ~]$ pidstat -p 9888 -u -t 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:08:42 PM      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
10:08:43 PM      9888         -    0.00    0.00    0.00    0.00     1  ora_scmn_orcl
10:08:43 PM         -      9888    0.00    0.00    0.00    0.00     1  |__ora_scmn_orcl
10:08:43 PM         -      9889    0.00    0.00    0.00    0.00     1  |__oracle
10:08:43 PM         -      9890    0.00    0.00    0.00    0.00     0  |__ora_diag_orcl
10:08:43 PM         -      9892    0.00    0.00    0.00    0.00     0  |__ora_dia0_orcl
10:08:43 PM         -      9900    0.00    0.00    0.00    0.00     0  |__ora_reco_orcl
10:08:43 PM         -      9902    0.00    0.00    0.00    0.00     1  |__ora_mmon_orcl
10:08:43 PM         -      9903    0.00    0.00    0.00    0.00     0  |__ora_mmnl_orcl
10:08:43 PM         -      9904    0.00    0.00    0.00    0.00     0  |__ora_d000_orcl
10:08:43 PM         -      9905    0.00    0.00    0.00    0.00     0  |__ora_s000_orcl
10:08:43 PM         -      9906    0.00    0.00    0.00    0.00     0  |__ora_n000_orcl
10:08:43 PM         -      9932    0.00    0.00    0.00    0.00     0  |__ora_tmon_orcl
10:08:43 PM         -      9933    0.00    0.00    0.00    0.00     0  |__ora_tt00_orcl
10:08:43 PM         -      9934    0.00    0.00    0.00    0.00     1  |__ora_smco_orcl
10:08:43 PM         -      9938    0.00    0.00    0.00    0.00     1  |__ora_fbda_orcl
10:08:43 PM         -      9939    0.00    0.00    0.00    0.00     1  |__ora_aqpc_orcl
10:08:43 PM         -      9944    0.00    0.00    0.00    0.00     0  |__ora_p000_orcl
10:08:43 PM         -      9945    0.00    0.00    0.00    0.00     0  |__ora_p001_orcl
10:08:43 PM         -      9946    0.00    0.00    0.00    0.00     1  |__ora_p002_orcl
10:08:43 PM         -      9947    0.00    0.00    0.00    0.00     0  |__ora_p003_orcl
10:08:43 PM         -      9948    0.00    0.00    0.00    0.00     0  |__ora_p004_orcl
10:08:43 PM         -      9949    0.00    0.00    0.00    0.00     1  |__ora_p005_orcl
10:08:43 PM         -      9950    0.00    0.00    0.00    0.00     1  |__ora_p006_orcl
10:08:43 PM         -      9951    0.00    0.00    0.00    0.00     1  |__ora_p007_orcl
10:08:43 PM         -      9952    0.00    0.00    0.00    0.00     1  |__ora_cjq0_orcl
10:08:43 PM         -      9996    0.00    0.00    0.00    0.00     0  |__ora_qm02_orcl
10:08:43 PM         -      9998    0.00    0.00    0.00    0.00     1  |__ora_q002_orcl
10:08:43 PM         -      9999    0.00    0.00    0.00    0.00     0  |__ora_q003_orcl
10:08:43 PM         -     16009    0.00    0.00    0.00    0.00     1  |__ora_w000_orcl
10:08:43 PM         -     21462    0.00    1.00    0.00    1.00     1  |__ora_vkrm_orcl
10:08:43 PM         -     22117    0.00    0.00    0.00    0.00     1  |__ora_w001_orcl
10:08:43 PM         -     22128    0.00    0.00    0.00    0.00     0  |__ora_w002_orcl
10:08:43 PM         -     22689    0.00    0.00    0.00    0.00     1  |__ora_w003_orcl
10:08:43 PM         -     22703    0.00    0.00    0.00    0.00     0  |__ora_w004_orcl
10:08:43 PM         -     22713    0.00    0.00    0.00    0.00     0  |__ora_w005_orcl
There are other interesting options to monitor I/O (-d), page faults and memory (-r), CPU utilization as seen above (-u), and context-switching activity (-w).
For example, with -w:
[oracle@oel64-12c ~]$ pidstat -p 9888 -w -t 1
Linux 2.6.39-400.17.1.el6uek.x86_64 (oel64-12c.localdomain)     01/14/2014      _x86_64_        (2 CPU)

10:57:54 PM      TGID       TID   cswch/s nvcswch/s  Command
10:57:55 PM      9888         -      1.00      1.00  ora_scmn_orcl
10:57:55 PM         -      9888      1.00      1.00  |__ora_scmn_orcl
10:57:55 PM         -      9889      0.00      0.00  |__oracle
10:57:55 PM         -      9890      1.00      0.00  |__ora_diag_orcl
10:57:55 PM         -      9892      1.00      0.00  |__ora_dia0_orcl
10:57:55 PM         -      9900      1.00      0.00  |__ora_reco_orcl
10:57:55 PM         -      9902      1.00      0.00  |__ora_mmon_orcl
10:57:55 PM         -      9903      1.00      1.00  |__ora_mmnl_orcl
10:57:55 PM         -      9904      1.00      0.00  |__ora_d000_orcl
10:57:55 PM         -      9905      1.00      0.00  |__ora_s000_orcl
10:57:55 PM         -      9906      1.00      0.00  |__ora_n000_orcl
10:57:55 PM         -      9932      1.00      0.00  |__ora_tmon_orcl
10:57:55 PM         -      9933      1.00      0.00  |__ora_tt00_orcl
10:57:55 PM         -      9934      1.00      1.00  |__ora_smco_orcl
10:57:55 PM         -      9938      1.00      0.00  |__ora_fbda_orcl
10:57:55 PM         -      9939      1.00      0.00  |__ora_aqpc_orcl
10:57:55 PM         -      9944      0.00      0.00  |__ora_p000_orcl
10:57:55 PM         -      9945      0.00      0.00  |__ora_p001_orcl
10:57:55 PM         -      9946      0.00      0.00  |__ora_p002_orcl
10:57:55 PM         -      9947      0.00      0.00  |__ora_p003_orcl
10:57:55 PM         -      9948      0.00      0.00  |__ora_p004_orcl
10:57:55 PM         -      9949      0.00      0.00  |__ora_p005_orcl
10:57:55 PM         -      9950      0.00      0.00  |__ora_p006_orcl
10:57:55 PM         -      9951      0.00      0.00  |__ora_p007_orcl
10:57:55 PM         -      9952      1.00      0.00  |__ora_cjq0_orcl
10:57:55 PM         -      9996      0.00      0.00  |__ora_qm02_orcl
10:57:55 PM         -      9998      0.00      0.00  |__ora_q002_orcl
10:57:55 PM         -      9999      1.00      0.00  |__ora_q003_orcl
10:57:55 PM         -     21462     96.00      3.00  |__ora_vkrm_orcl
10:57:55 PM         -     22713      1.00      0.00  |__ora_w005_orcl
10:57:55 PM         -     29708      0.00      0.00  |__ora_q001_orcl
10:57:55 PM         -     29709      0.00      0.00  |__oracle_29709_or
10:57:55 PM         -     26975      1.00      0.00  |__ora_w004_orcl
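
The -d report follows the same per-thread layout and shows kilobytes read and written per second; I’ll just give the command here:

[oracle@oel64-12c ~]$ pidstat -p 9888 -d -t 1
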
  • gdb (for debugging)

If you want to trace system calls made by threads, you can use the GNU debugger (gdb). I don’t have deep knowledge of gdb, but you can attach it to a process with the -p option.

[oracle@oel64-12c ~]$ gdb -p 9888
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9888

.../...
After this, the “info threads” command prints thread information; the LWP (Light Weight Process) number indicates the thread id:
(gdb) info threads
  31 Thread 0x7f5a89ff6700 (LWP 29709)  0x0000003abe00e75d in read () from /lib64/libpthread.so.0  <<< my session is located here and is waiting for a command (read syscall)
  30 Thread 0x7f5a81ff2700 (LWP 29708)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  29 Thread 0x7f5b10beb700 (LWP 9889)  0x0000003abdcdf343 in poll () from /lib64/libc.so.6
  28 Thread 0x7f5b0ea2a700 (LWP 9890)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  27 Thread 0x7f5b07fff700 (LWP 9892)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  26 Thread 0x7f5afbfff700 (LWP 9900)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  25 Thread 0x7f5af3fff700 (LWP 9902)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  24 Thread 0x7f5aebfff700 (LWP 9903)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  23 Thread 0x7f5ae3fff700 (LWP 9904)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  22 Thread 0x7f5adbfff700 (LWP 9905)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  21 Thread 0x7f5ad3fff700 (LWP 9906)  0x0000003abdce9163 in epoll_wait () from /lib64/libc.so.6
  20 Thread 0x7f5acbfff700 (LWP 9932)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  19 Thread 0x7f5ac3fff700 (LWP 9933)  0x0000003abe00ef3d in nanosleep () from /lib64/libpthread.so.0
  18 Thread 0x7f5ab3fff700 (LWP 9934)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  17 Thread 0x7f5aabfff700 (LWP 9938)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  16 Thread 0x7f5aa3fff700 (LWP 9939)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  15 Thread 0x7f5a99ffe700 (LWP 9944)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  14 Thread 0x7f5a97ffd700 (LWP 9945)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  13 Thread 0x7f5a95ffc700 (LWP 9946)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  12 Thread 0x7f5a93ffb700 (LWP 9947)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  11 Thread 0x7f5a91ffa700 (LWP 9948)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  10 Thread 0x7f5a8fff9700 (LWP 9949)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  9 Thread 0x7f5a8dff8700 (LWP 9950)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  8 Thread 0x7f5a8bff7700 (LWP 9951)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  7 Thread 0x7f5a9bfff700 (LWP 9952)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  6 Thread 0x7f5a83ff3700 (LWP 9996)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  5 Thread 0x7f5a87ff5700 (LWP 9998)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  4 Thread 0x7f5a85ff4700 (LWP 9999)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  3 Thread 0x7f5abbfff700 (LWP 21462)  0x0000003abe00ef3d in nanosleep () from /lib64/libpthread.so.0
  2 Thread 0x7f5a79fee700 (LWP 22713)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
* 1 Thread 0x7f5b10f2a9e0 (LWP 9888)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
 Next, you can select a specific thread with the gdb “thread” command:
(gdb) thread 31
[Switching to thread 31 (Thread 0x7f5a89ff6700 (LWP 29709))]#0  0x0000003abe00e75d in read () from /lib64/libpthread.so.0
(gdb) info threads
* 31 Thread 0x7f5a89ff6700 (LWP 29709)  0x0000003abe00e75d in read () from /lib64/libpthread.so.0
  30 Thread 0x7f5a81ff2700 (LWP 29708)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  29 Thread 0x7f5b10beb700 (LWP 9889)  0x0000003abdcdf343 in poll () from /lib64/libc.so.6
  28 Thread 0x7f5b0ea2a700 (LWP 9890)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  27 Thread 0x7f5b07fff700 (LWP 9892)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
  26 Thread 0x7f5afbfff700 (LWP 9900)  0x0000003abdceb22a in semtimedop () from /lib64/libc.so.6
 .../...
Next, you can use breakpoints, watchpoints, etc. to debug Oracle calls.
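For example (a hypothetical session, just to give the idea; be careful, playing with breakpoints on a running instance can freeze or crash it), you could stop the mmon thread (thread 25 above) the next time it enters semtimedop:

(gdb) break semtimedop thread 25
(gdb) continue
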
If you are interested in tracing Oracle system calls with gdb, Frits Hoogland has written many articles on this subject.

Install a simple hadoop cluster for testing

Today, I won’t write an Oracle-related post, but a post about Hadoop. If you don’t have a full rack of servers in your cellar and you want to write a map/reduce algorithm and test it, you will need a simple cluster.

In my case, I installed Hadoop on a 3-node cluster (bigdata1, bigdata2 and bigdata3). Each node is an Oracle Enterprise Linux 6 box with:

  • 2 Gb RAM
  • a 30Gb system disk
  • 2 disks of 10Gb each, mounted on /mnt/hdfs/1 and /mnt/hdfs/2 for storing HDFS data.

My master node will be bigdata1, with two slave nodes: bigdata2 and bigdata3.

Each box has java installed:

[oracle@bigdata1 ~]$ java -version
java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

First, you need to download the latest stable Hadoop tarball: http://hadoop.apache.org/releases.html#Download. In my case, I downloaded Hadoop 1.1.2.

After this, you need to untar the file on each server:

[oracle@bigdata3 ~]$ pwd
/home/oracle
[oracle@bigdata3 ~]$ tar -xvzf hadoop-1.1.2.tar.gz

On the master node, edit conf/core-site.xml to configure the namenode:

[oracle@bigdata1 hadoop-1.1.2]$ cat conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.default.name</name>
    <value>hdfs://bigdata1:8020</value>
    </property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hdfs/1/hadoop/dfs,/mnt/hdfs/2/hadoop/dfs</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.checkpoint.dir</name>
  <value>/mnt/hdfs/1/hadoop/dfs/namesecondary,/mnt/hdfs/2/hadoop/dfs/namesecondary</value>
  <description>Determines where on the local filesystem the DFS secondary
               name node should store the temporary images to merge.
               If this is a comma-delimited list of directories then the image is
               replicated in all of the directories for redundancy.
  </description>
</property>
</configuration>

An important parameter is hadoop.tmp.dir. Indeed, as we want to set up a simple cluster for testing, we would like to keep this configuration as simple as possible. If you have a look at the documentation pages for the config files (core, hdfs and mapreduce), you will see that most of the parameters (dfs.data.dir, mapred.system.dir, etc.) are derived from hadoop.tmp.dir. This keeps our config very simple.

More information about this file here: http://hadoop.apache.org/docs/r1.1.2/core-default.html
Next, you have to configure the conf/hdfs-site.xml file (HDFS config). This is a basic config file with the location of the name table on the local filesystem and the block replication factor:
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>dfs.name.dir</name>
  <value>/mnt/hdfs/1/hadoop/dfs/name,/mnt/hdfs/2/hadoop/dfs/name</value>
</property>

<property>
  <name>dfs.replication</name>
      <value>2</value>
      <description>Default block replication.
                   The actual number of replications can be specified when the file is created.
                   The default is used if replication is not specified in create time.
      </description>
</property>
</configuration>
Other parameters are documented here: http://hadoop.apache.org/docs/r1.1.2/hdfs-default.html
Finally, we need to configure the mapreduce config file (located in HADOOP_HOME/conf/mapred-site.xml).
In my case, I decided to configure only one parameter: the location of the JobTracker (usually on the namenode).
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
    <name>mapred.job.tracker</name>
    <value>bigdata1:9001</value>
</property>
</configuration>
  
You will find more information about mapreduce configuration parameters here: http://hadoop.apache.org/docs/r1.1.2/mapred-default.html
The next step is to specify which server will act as the namenode and which servers will act as datanodes.
Before that, we have to configure name resolution correctly: each node must resolve all the node names in the cluster:
[oracle@bigdata1 hadoop-1.1.2]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.99.40 bigdata1.localdomain bigdata1
192.168.99.41 bigdata2.localdomain bigdata2
192.168.99.42 bigdata3.localdomain bigdata3
The next step is to edit the masters and slaves files:
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/masters
bigdata1
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/slaves
bigdata1
bigdata2
bigdata3

Ok, now our system is configured as we want … simple !

The next step is to configure the Hadoop environment. This can be done in the HADOOP_HOME/conf/hadoop-env.sh file.
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/hadoop-env.sh | sed -e '/^#/d'

export JAVA_HOME=/usr/java/jdk1.7.0_11
export HADOOP_HEAPSIZE=1024
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export HADOOP_LOG_DIR=/home/oracle/hadoop-1.1.2/log

Each parameter is documented in the file.

Don’t forget to copy each config file to each cluster node.
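
For example with rsync (this assumes the same directory layout on every node and passwordless ssh between the nodes, which the start-all.sh script relies on anyway):

[oracle@bigdata1 hadoop-1.1.2]$ for h in bigdata2 bigdata3; do rsync -av conf/ ${h}:/home/oracle/hadoop-1.1.2/conf/; done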

The next step is to create the directories for the name table location:

[oracle@bigdata1 hadoop-1.1.2]$ mkdir -p /mnt/hdfs/1/hadoop/dfs/name
[oracle@bigdata1 hadoop-1.1.2]$ mkdir -p /mnt/hdfs/2/hadoop/dfs/name

Now, we can format the namenode:

[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop namenode -format
13/07/02 09:01:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = bigdata1.localdomain/192.168.99.40
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
Re-format filesystem in /mnt/hdfs/1/hadoop/dfs/name ? (Y or N) Y
Re-format filesystem in /mnt/hdfs/2/hadoop/dfs/name ? (Y or N) Y
13/07/02 09:01:29 INFO util.GSet: VM type       = 64-bit
13/07/02 09:01:29 INFO util.GSet: 2% max memory = 19.7975 MB
13/07/02 09:01:29 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/07/02 09:01:29 INFO util.GSet: recommended=2097152, actual=2097152
13/07/02 09:01:29 INFO namenode.FSNamesystem: fsOwner=oracle
13/07/02 09:01:29 INFO namenode.FSNamesystem: supergroup=supergroup
13/07/02 09:01:29 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/07/02 09:01:29 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/07/02 09:01:29 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/07/02 09:01:29 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/07/02 09:01:30 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/07/02 09:01:30 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/mnt/hdfs/1/hadoop/dfs/name/current/edits
13/07/02 09:01:30 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/mnt/hdfs/1/hadoop/dfs/name/current/edits
13/07/02 09:01:30 INFO common.Storage: Storage directory /mnt/hdfs/1/hadoop/dfs/name has been successfully formatted.
13/07/02 09:01:30 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/07/02 09:01:30 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/mnt/hdfs/2/hadoop/dfs/name/current/edits
13/07/02 09:01:30 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/mnt/hdfs/2/hadoop/dfs/name/current/edits
13/07/02 09:01:30 INFO common.Storage: Storage directory /mnt/hdfs/2/hadoop/dfs/name has been successfully formatted.
13/07/02 09:01:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bigdata1.localdomain/192.168.99.40
************************************************************/
And now … the most important step: we will start our cluster. This operation is launched from the namenode:
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/start-all.sh
starting namenode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-namenode-bigdata1.localdomain.out
bigdata2: starting datanode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-datanode-bigdata2.localdomain.out
bigdata3: starting datanode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-datanode-bigdata3.localdomain.out
bigdata1: starting datanode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-datanode-bigdata1.localdomain.out
bigdata1: starting secondarynamenode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-secondarynamenode-bigdata1.localdomain.out
starting jobtracker, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-jobtracker-bigdata1.localdomain.out
bigdata2: starting tasktracker, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-tasktracker-bigdata2.localdomain.out
bigdata3: starting tasktracker, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-tasktracker-bigdata3.localdomain.out
bigdata1: starting tasktracker, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-tasktracker-bigdata1.localdomain.out
Ok, everything seems to be fine (you can have a look at the log files to check process health).
You can use the jps command to check that every process is now launched:
  • On the name node (in my case, it acts as name node and data node, that’s why we find a TaskTracker and a DataNode process)
[oracle@bigdata1 hadoop-1.1.2]$ jps -m
11699 JobTracker
11592 SecondaryNameNode
11832 TaskTracker
11964 Jps -m
11467 DataNode
11345 NameNode
  • On the data nodes
[oracle@bigdata2 log]$ jps -m
6675 DataNode
6788 TaskTracker
6866 Jps -m

[oracle@bigdata3 conf]$ jps -m
6856 Jps -m
6770 TaskTracker
6657 DataNode
We can now push big files to HDFS and launch map/reduce jobs on them.
First, we need to create directories:
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -mkdir input
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -mkdir output
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - oracle supergroup 0 2013-07-02 09:35 /user/oracle/input
drwxr-xr-x - oracle supergroup 0 2013-07-02 09:36 /user/oracle/output
Next, we will copy a sample file that contains information about artists and albums (see a sample below):
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -copyFromLocal /tmp/unique_tracks.txt /user/oracle/input
[oracle@bigdata1 hadoop-1.1.2]$ head -50 /tmp/unique_tracks.txt
TRMMMHY12903CB53F1<SEP>SOPMIYT12A6D4F851E<SEP>Joseph Locke<SEP>Goodbye
TRMMMML128F4280EE9<SEP>SOJCFMH12A8C13B0C2<SEP>The Sun Harbor's Chorus-Documentary Recordings<SEP>Mama_ mama can't you see ?
TRMMMNS128F93548E1<SEP>SOYGNWH12AB018191E<SEP>3 Gars Su'l Sofa<SEP>L'antarctique
TRMMMXJ12903CBF111<SEP>SOLJTLX12AB01890ED<SEP>Jorge Negrete<SEP>El hijo del pueblo
TRMMMCJ128F930BFF8<SEP>SOQQESG12A58A7AA28<SEP>Danny Diablo<SEP>Cold Beer feat. Prince Metropolitan
TRMMMBW128F4260CAE<SEP>SOMPVQB12A8C1379BB<SEP>Tiger Lou<SEP>Pilots
TRMMMXI128F4285A3F<SEP>SOGPCJI12A8C13CCA0<SEP>Waldemar Bastos<SEP>N Gana
TRMMMKI128F931D80D<SEP>SOSDCFG12AB0184647<SEP>Lena Philipsson<SEP>006
TRMMMUT128F42646E8<SEP>SOBARPM12A8C133DFF<SEP>Shawn Colvin<SEP>(Looking For) The Heart Of Saturday
TRMMMQY128F92F0EA3<SEP>SOKOVRQ12A8C142811<SEP>Dying Fetus<SEP>Ethos of Coercion
.../...
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -du
Found 2 items
84046293    hdfs://bigdata1:8020/user/oracle/input
0           hdfs://bigdata1:8020/user/oracle/output
Finally, we can create our MapReduce program. In my case, I used a simple counter of artists (the third field):
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ArtistCounter {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      String[] res = value.toString().split("<SEP>");
      word.set(res[2].toLowerCase());
      context.write(word, one);
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "Laurent's Singer counter");

    job.setJarByClass(ArtistCounter.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Next step, compile …
[oracle@bigdata1 java]$ echo $CLASSPATH
/home/oracle/hadoop-1.1.2/libexec/../conf:/usr/java/jdk1.7.0_11/lib/tools.jar:/home/oracle/hadoop-1.1.2/libexec/..:/home/oracle/hadoop-1.1.2/libexec/../hadoop-core-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/asm-3.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/aspectjrt-1.6.11.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/aspectjtools-1.6.11.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-beanutils-1.7.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-cli-1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-codec-1.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-collections-3.2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-configuration-1.6.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-daemon-1.0.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-digester-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-el-1.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-httpclient-3.0.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-io-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-lang-2.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-logging-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-logging-api-1.0.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-math-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-net-3.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/core-3.1.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-capacity-scheduler-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-fairscheduler-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-thriftfs-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hsqldb-1.8.0.10.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jasper-compiler-5.5.12.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jasper-runtime-5.5.12.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jdeb-0.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-core-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-json-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-server-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jets3t-0.6.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jetty-6.1.26.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jetty-util-6.1.26.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsch-0.1.42.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/junit-4.5.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/kfs-0.2.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/log4j-1.2.15.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/mockito-all-1.8.5.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/oro-2.0.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/servlet-api-2.5-20081211.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/slf4j-api-1.4.3.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/xmlenc-0.52.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/home/oracle/hadoop-1.1.2/hadoop-ant-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-client-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-core-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-examples-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-minicluster-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-test-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-tools-1.1.2.jar

[oracle@bigdata1 java]$ javac -cp $CLASSPATH:. ArtistCounter.java
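Note that the full hadoop CLASSPATH is mostly needed at run time; for compiling this small example, referencing hadoop-core alone should be enough. A lighter alternative (a sketch, assuming the same install path as above):

[oracle@bigdata1 java]$ javac -cp /home/oracle/hadoop-1.1.2/hadoop-core-1.1.2.jar:. ArtistCounter.java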
Next … packaging
[oracle@bigdata1 java]$ jar -cvf ArtistCounter.jar ArtistCounter*.class
added manifest
adding: ArtistCounter.class(in = 1489) (out= 813)(deflated 45%)
adding: ArtistCounter$IntSumReducer.class(in = 1751) (out= 742)(deflated 57%)
adding: ArtistCounter$TokenizerMapper.class(in = 1721) (out= 718)(deflated 58%)
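As a side note, if you declare an entry point in the manifest while packaging (the e flag of jar, available since JDK 6), you could omit the class name on the hadoop jar command line later:

[oracle@bigdata1 java]$ jar -cvfe ArtistCounter.jar ArtistCounter ArtistCounter*.class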
And now, we can run this small test program on our hadoop cluster:
[oracle@bigdata1 java]$ /home/oracle/hadoop-1.1.2/bin/hadoop jar ArtistCounter.jar ArtistCounter /user/oracle/input /user/oracle/output/laurent
13/07/02 09:56:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/02 09:56:08 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 09:56:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/02 09:56:08 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/02 09:56:08 INFO mapred.JobClient: Running job: job_201307020925_0001
13/07/02 09:56:09 INFO mapred.JobClient:  map 0% reduce 0%
13/07/02 09:56:20 INFO mapred.JobClient:  map 50% reduce 0%
13/07/02 09:56:22 INFO mapred.JobClient:  map 100% reduce 0%
13/07/02 09:56:28 INFO mapred.JobClient:  map 100% reduce 33%
13/07/02 09:56:31 INFO mapred.JobClient:  map 100% reduce 100%
13/07/02 09:56:32 INFO mapred.JobClient: Job complete: job_201307020925_0001
13/07/02 09:56:32 INFO mapred.JobClient: Counters: 29
13/07/02 09:56:32 INFO mapred.JobClient:   Job Counters
13/07/02 09:56:32 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/02 09:56:32 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18647
13/07/02 09:56:32 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/02 09:56:32 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/02 09:56:32 INFO mapred.JobClient:     Launched map tasks=2
13/07/02 09:56:32 INFO mapred.JobClient:     Data-local map tasks=2
13/07/02 09:56:32 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10862
13/07/02 09:56:32 INFO mapred.JobClient:   File Output Format Counters
13/07/02 09:56:32 INFO mapred.JobClient:     Bytes Written=1651492
13/07/02 09:56:32 INFO mapred.JobClient:   FileSystemCounters
13/07/02 09:56:32 INFO mapred.JobClient:     FILE_BYTES_READ=6240405
13/07/02 09:56:32 INFO mapred.JobClient:     HDFS_BYTES_READ=84050631
13/07/02 09:56:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=9124002
13/07/02 09:56:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1651492
13/07/02 09:56:32 INFO mapred.JobClient:   File Input Format Counters
13/07/02 09:56:32 INFO mapred.JobClient:     Bytes Read=84050389
13/07/02 09:56:32 INFO mapred.JobClient:   Map-Reduce Framework
13/07/02 09:56:32 INFO mapred.JobClient:     Map output materialized bytes=2724501
13/07/02 09:56:32 INFO mapred.JobClient:     Map input records=1000000
13/07/02 09:56:32 INFO mapred.JobClient:     Reduce shuffle bytes=2724501
13/07/02 09:56:32 INFO mapred.JobClient:     Spilled Records=370725
13/07/02 09:56:32 INFO mapred.JobClient:     Map output bytes=18534685
13/07/02 09:56:32 INFO mapred.JobClient:     Total committed heap usage (bytes)=270082048
13/07/02 09:56:32 INFO mapred.JobClient:     CPU time spent (ms)=10870
13/07/02 09:56:32 INFO mapred.JobClient:     Combine input records=1150423
13/07/02 09:56:32 INFO mapred.JobClient:     SPLIT_RAW_BYTES=242
13/07/02 09:56:32 INFO mapred.JobClient:     Reduce input records=110151
13/07/02 09:56:32 INFO mapred.JobClient:     Reduce input groups=72663
13/07/02 09:56:32 INFO mapred.JobClient:     Combine output records=260574
13/07/02 09:56:32 INFO mapred.JobClient:     Physical memory (bytes) snapshot=453685248
13/07/02 09:56:32 INFO mapred.JobClient:     Reduce output records=72663
13/07/02 09:56:32 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3173515264
13/07/02 09:56:32 INFO mapred.JobClient:     Map output records=1000000
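A word about the first WARN line in this output: JobClient suggests implementing the Tool interface, so that ToolRunner parses the generic hadoop options (-D, -files, -libjars …) for you. Below is a minimal sketch of what such a driver could look like; the ArtistCounterDriver name is mine, and I assume the TokenizerMapper and IntSumReducer inner classes are public static, as in the classic WordCount example:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ArtistCounterDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains the options parsed by ToolRunner
        Job job = new Job(getConf(), "artist counter");
        job.setJarByClass(ArtistCounterDriver.class);
        job.setMapperClass(ArtistCounter.TokenizerMapper.class);
        job.setCombinerClass(ArtistCounter.IntSumReducer.class);
        job.setReducerClass(ArtistCounter.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before calling run()
        System.exit(ToolRunner.run(new ArtistCounterDriver(), args));
    }
}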
Last step, we can look at the content of the result file and count how many albums the mapreduce job found for a given artist:
[oracle@bigdata1 java]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -ls /user/oracle/output/laurent
Found 3 items
-rw-r--r--   3 oracle supergroup          0 2013-07-02 09:56 /user/oracle/output/laurent/_SUCCESS
drwxr-xr-x   - oracle supergroup          0 2013-07-02 09:56 /user/oracle/output/laurent/_logs
-rw-r--r--   3 oracle supergroup    1651492 2013-07-02 09:56 /user/oracle/output/laurent/part-r-00000

[oracle@bigdata1 java]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -cat /user/oracle/output/laurent/part-r-00000 | grep 'pink floyd'
pink floyd      117
pink floyd tribute      10
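By the way, as the reducer writes one line per distinct artist, the number of lines in part-r-00000 should match the "Reduce output records" counter seen above (72663):

[oracle@bigdata1 java]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -cat /user/oracle/output/laurent/part-r-00000 | wc -l
72663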

BBED in Oracle 11g

If you like playing with Oracle internals, you probably use BBED (the Block Browser and EDitor) frequently. In Oracle 11g, BBED is no longer shipped, but if you search the ins_rdbms.mk makefile (located in the $ORACLE_HOME/rdbms/lib directory), you will see that the build entry is still there.
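A quick way to check (on my 11.2 home, the $(BBED) build target still shows up):

[oracle@linux1 ~]$ grep -i bbed $ORACLE_HOME/rdbms/lib/ins_rdbms.mk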

If you want to use it with an Oracle 11g database, you will have to get some files from an old Oracle 10g installation and build the binary in your Oracle 11g home.

The operation is detailed in the different steps below:

1- If you have installed Oracle 10g and 11g on the same server, define your two Oracle Homes (otherwise, use scp to copy the files across the network)

export ORA10G_HOME=/u01/app/oracle/product/10.2.0/db_1
export ORA11G_HOME=/u01/app/oracle/product/11.2.0/dbhome_1

2- Copy the object files shipped with Oracle 10g but missing from Oracle 11g

cp $ORA10G_HOME/rdbms/lib/ssbbded.o $ORA11G_HOME/rdbms/lib
cp $ORA10G_HOME/rdbms/lib/sbbdpt.o $ORA11G_HOME/rdbms/lib

3- Copy the message files shipped with Oracle 10g but missing from Oracle 11g

cp $ORA10G_HOME/rdbms/mesg/bbed* $ORA11G_HOME/rdbms/mesg

4- Build the bbed executable (the BBED make variable sets the output path; the same path is then given to make as the target to build)

make -f $ORA11G_HOME/rdbms/lib/ins_rdbms.mk BBED=$ORA11G_HOME/bin/bbed $ORA11G_HOME/bin/bbed

5- Have fun …

[oracle@linux1 ~]$ . oraenv
ORACLE_SID = [oracle] ? db112
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1 is /u01/app/oracle
[oracle@linux1 ~]$ $ORACLE_HOME/bin/bbed
Password:

BBED: Release 2.0.0.0.0 - Limited Production on Thu Jun 9 10:10:49 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED>

The password is: blockedit
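To go a bit further, BBED is usually driven with a parameter file pointing to a list of datafiles. Here is a minimal sketch; the parfile keywords (blocksize, mode, listfile) are standard, but the file numbers, paths and sizes come from my sandbox database and must obviously be adapted (mode=browse is read-only, use mode=edit to modify blocks):

[oracle@linux1 ~]$ cat bbed.par
blocksize=8192
mode=browse
listfile=/home/oracle/dbfiles.txt

[oracle@linux1 ~]$ cat /home/oracle/dbfiles.txt
1 /u01/app/oracle/oradata/db112/system01.dbf 734003200
4 /u01/app/oracle/oradata/db112/users01.dbf 5242880

[oracle@linux1 ~]$ $ORACLE_HOME/bin/bbed parfile=bbed.par
Password:

BBED> set dba 4,130
BBED> dump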

Don’t forget that the 10g object files must match the kernel architecture (32-bit or 64-bit) of your 11g home …