Data … as usual

All things about data by Laurent Leturgez

Category Archives: Linux

Install a simple hadoop cluster for testing

Today, I won’t write an oracle related post, but a post about hadoop. If you don’t have a full rack of servers in your cellar, and you want to write map/reduce algorithm and test it, you will need a simple cluster.

In my case, I installed hadoop on a 3 nodes cluster (bigdata1, bigdata2 and bigdata3). Each node is an Oracle Enterprise Linux 6 box with :

  • 2 Gb RAM
  • a 30Gb system disk
  • 2 disks mounted on /mnt/hdfs/1 and /mnt/hdfs/2 for storing HDFS datas. Each disk is 10Gb.

My master node will be bigdata1, with 2 slaves nodes: bigdata2 and bigdata3.

Each box has java installed:

[oracle@bigdata1 ~]$ java -version
java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

First, you need to download last stable hadoop tarball: http://hadoop.apache.org/releases.html#Download. In my case, I downloaded Hadoop 1.1.2.

After this, you need to untar the file on each servers:

[oracle@bigdata3 ~]$ pwd
/home/oracle
[oracle@bigdata3 ~]$ tar -xvzf hadoop-1.1.2.tar.gz

On the master node, edit conf/core-site.xml to configure the namenode:

[oracle@bigdata1 hadoop-1.1.2]$ cat conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.default.name</name>
    <value>hdfs://bigdata1:8020</value>
    </property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hdfs/1/hadoop/dfs,/mnt/hdfs/2/hadoop/dfs</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.checkpoint.dir</name>
  <value>/mnt/hdfs/1/hadoop/dfs/namesecondary,/mnt/hdfs/2/hadoop/dfs/namesecondary</value>
  <description>Determines where on the local filesystem the DFS secondary
               name node should store the temporary images to merge.
               If this is a comma-delimited list of directories then the image is
               replicated in all of the directories for redundancy.
  </description>
</property>
</configuration>

An important parameter is hadoop.tmp.dir. Indeed, as we want to setup a simple cluster for testing, we would like to keep this configuration as simple as possible. If you have a look to the doc pages related to config file (core, hdfs and mapreduce), you will see that most of the parameter (dfs.data.dir, mapred.system.dir etc. are derived from hadoop.tmp.dir parameter). This will keep our config very simple.

More information about this file here: http://hadoop.apache.org/docs/r1.1.2/core-default.html
Next, you have to configure the conf/hdfs-site.xml file (hdfs config). This is a basic config file with location of the name table on the local fs, and the number block replication ways:
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>dfs.name.dir</name>
  <value>/mnt/hdfs/1/hadoop/dfs/name,/mnt/hdfs/2/hadoop/dfs/name</value>
</property>

<property>
  <name>dfs.replication</name>
      <value>2</value>
      <description>Default block replication.
                   The actual number of replications can be specified when the file is created.
                   The default is used if replication is not specified in create time.
      </description>
</property>
</configuration>
Other parameters are documented here: http://hadoop.apache.org/docs/r1.1.2/hdfs-default.html
Finally, we need to configure the mapreduce config file (located in HADOOP_HOME/conf/mapred-site.xml).
In my case, I decided to configure only one parameter which is the location of the JobTracker (usually located on the namenode).
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
    <name>mapred.job.tracker</name>
    <value>bigdata1:9001</value>
</property>
</configuration>
  
You will find more information about mapreduce configuration parameters here: http://hadoop.apache.org/docs/r1.1.2/mapred-default.html
The next step is to mention which server will act as a namenode, and which servers will act as datanode.
To do this, we have to configure name resolution correctly, each node must resolve all the node names in the cluster:
[oracle@bigdata1 hadoop-1.1.2]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.99.40 bigdata1.localdomain bigdata1
192.168.99.41 bigdata2.localdomain bigdata2
192.168.99.42 bigdata3.localdomain bigdata3
The next step, is to edit masters and slaves files:
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/masters
bigdata1
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/slaves
bigdata1
bigdata2
bigdata3

Ok, now our system is configured as we want … simple !

Next step is to configure the hadoop environment. This can be done in the HADOOP_HOME/conf/hadoop-env.sh file.
[oracle@bigdata1 hadoop-1.1.2]$ cat conf/hadoop-env.sh | sed -e ‘/^#/d’

export JAVA_HOME=/usr/java/jdk1.7.0_11
export HADOOP_HEAPSIZE=1024
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export HADOOP_LOG_DIR=/home/oracle/hadoop-1.1.2/log

Each parameter is documented in the file.

Don’t forget to copy each config file on each cluster node.

Next step is to create directory for the name table location:

[oracle@bigdata1 hadoop-1.1.2]$ mkdir -p /mnt/hdfs/1/hadoop/dfs/name
[oracle@bigdata1 hadoop-1.1.2]$ mkdir -p /mnt/hdfs/2/hadoop/dfs/name

Now, we can format the namenode:

[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop namenode -format
13/07/02 09:01:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = bigdata1.localdomain/192.168.99.40
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
Re-format filesystem in /mnt/hdfs/1/hadoop/dfs/name ? (Y or N) Y
Re-format filesystem in /mnt/hdfs/2/hadoop/dfs/name ? (Y or N) Y
13/07/02 09:01:29 INFO util.GSet: VM type       = 64-bit
13/07/02 09:01:29 INFO util.GSet: 2% max memory = 19.7975 MB
13/07/02 09:01:29 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/07/02 09:01:29 INFO util.GSet: recommended=2097152, actual=2097152
13/07/02 09:01:29 INFO namenode.FSNamesystem: fsOwner=oracle
13/07/02 09:01:29 INFO namenode.FSNamesystem: supergroup=supergroup
13/07/02 09:01:29 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/07/02 09:01:29 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/07/02 09:01:29 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/07/02 09:01:29 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/07/02 09:01:30 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/07/02 09:01:30 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/mnt/hdfs/1/hadoop/dfs/name/current/edits
13/07/02 09:01:30 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/mnt/hdfs/1/hadoop/dfs/name/current/edits
13/07/02 09:01:30 INFO common.Storage: Storage directory /mnt/hdfs/1/hadoop/dfs/name has been successfully formatted.
13/07/02 09:01:30 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/07/02 09:01:30 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/mnt/hdfs/2/hadoop/dfs/name/current/edits
13/07/02 09:01:30 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/mnt/hdfs/2/hadoop/dfs/name/current/edits
13/07/02 09:01:30 INFO common.Storage: Storage directory /mnt/hdfs/2/hadoop/dfs/name has been successfully formatted.
13/07/02 09:01:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bigdata1.localdomain/192.168.99.40
************************************************************/
 And now … the most important step, we will start our cluster. This operation is launched from the namenode:
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/start-all.sh
starting namenode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-namenode-bigdata1.localdomain.out
bigdata2: starting datanode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-datanode-bigdata2.localdomain.out
bigdata3: starting datanode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-datanode-bigdata3.localdomain.out
bigdata1: starting datanode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-datanode-bigdata1.localdomain.out
bigdata1: starting secondarynamenode, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-secondarynamenode-bigdata1.localdomain.out
starting jobtracker, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-jobtracker-bigdata1.localdomain.out
bigdata2: starting tasktracker, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-tasktracker-bigdata2.localdomain.out
bigdata3: starting tasktracker, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-tasktracker-bigdata3.localdomain.out
bigdata1: starting tasktracker, logging to /home/oracle/hadoop-1.1.2/log/hadoop-oracle-tasktracker-bigdata1.localdomain.out
Ok everything seems to be fine (you can have a look to the log files to check processes health).
You can use to jpm command to check that every process is now launched:
  • On the name node (in my case, it acts as name node and data node, that’s why we find a TaskTracker and a DataNode process)
[oracle@bigdata1 hadoop-1.1.2]$ jps -m
11699 JobTracker
11592 SecondaryNameNode
11832 TaskTracker
11964 Jps -m
11467 DataNode
11345 NameNode
  • On the data nodes
[oracle@bigdata2 log]$ jps -m
6675 DataNode
6788 TaskTracker
6866 Jps -m

[oracle@bigdata3 conf]$ jps -m
6856 Jps -m
6770 TaskTracker
6657 DataNode
We can now, push big files on hdfs to launch map/reduce on it.
First, we need to create directories:
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -mkdir input
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -mkdir output
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - oracle supergroup 0 2013-07-02 09:35 /user/oracle/input
drwxr-xr-x - oracle supergroup 0 2013-07-02 09:36 /user/oracle/output
Next, we will copy a sample file that contains information about artists and albums (see a sample above)
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -copyFromLocal /tmp/unique_tracks.txt /user/oracle/input
[oracle@bigdata1 hadoop-1.1.2]$ head -50 /tmp/unique_tracks.txt
TRMMMHY12903CB53F1<SEP>SOPMIYT12A6D4F851E<SEP>Joseph Locke<SEP>Goodbye
TRMMMML128F4280EE9<SEP>SOJCFMH12A8C13B0C2<SEP>The Sun Harbor's Chorus-Documentary Recordings<SEP>Mama_ mama can't you see ?
TRMMMNS128F93548E1<SEP>SOYGNWH12AB018191E<SEP>3 Gars Su'l Sofa<SEP>L'antarctique
TRMMMXJ12903CBF111<SEP>SOLJTLX12AB01890ED<SEP>Jorge Negrete<SEP>El hijo del pueblo
TRMMMCJ128F930BFF8<SEP>SOQQESG12A58A7AA28<SEP>Danny Diablo<SEP>Cold Beer feat. Prince Metropolitan
TRMMMBW128F4260CAE<SEP>SOMPVQB12A8C1379BB<SEP>Tiger Lou<SEP>Pilots
TRMMMXI128F4285A3F<SEP>SOGPCJI12A8C13CCA0<SEP>Waldemar Bastos<SEP>N Gana
TRMMMKI128F931D80D<SEP>SOSDCFG12AB0184647<SEP>Lena Philipsson<SEP>006
TRMMMUT128F42646E8<SEP>SOBARPM12A8C133DFF<SEP>Shawn Colvin<SEP>(Looking For) The Heart Of Saturday
TRMMMQY128F92F0EA3<SEP>SOKOVRQ12A8C142811<SEP>Dying Fetus<SEP>Ethos of Coercion
.../...
[oracle@bigdata1 hadoop-1.1.2]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -du
Found 2 items
84046293    hdfs://bigdata1:8020/user/oracle/input
0           hdfs://bigdata1:8020/user/oracle/output
Finally, we can create our MapReduce program. In my case, I used a simple counter of artists (Third field):
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ArtistCounter {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      String[] res = value.toString().split("<SEP>");
        word.set(res[2].toLowerCase());
        context.write(word, one);
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "Laurent's Singer counter");

    job.setJarByClass(ArtistCounter.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Next step, compile …
[oracle@bigdata1 java]$ echo $CLASSPATH
/home/oracle/hadoop-1.1.2/libexec/../conf:/usr/java/jdk1.7.0_11/lib/tools.jar:/home/oracle/hadoop-1.1.2/libexec/..:/home/oracle/hadoop-1.1.2/libexec/../hadoop-core-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/asm-3.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/aspectjrt-1.6.11.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/aspectjtools-1.6.11.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-beanutils-1.7.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-cli-1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-codec-1.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-collections-3.2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-configuration-1.6.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-daemon-1.0.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-digester-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-el-1.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-httpclient-3.0.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-io-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-lang-2.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-logging-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-logging-api-1.0.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-math-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-net-3.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/core-3.1.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-capacity-scheduler-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-fairscheduler-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-thriftfs-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hsqldb-1.8.0.10.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jasper-compiler-5.5.12.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jasper-runtime-5.5.12.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jdeb-0.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-core-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-json-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-server-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jets3t-0.6.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jetty-6.1.26.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jetty-util-6.1.26.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsch-0.1.42.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/junit-4.5.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/kfs-0.2.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/log4j-1.2.15.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/mockito-all-1.8.5.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/oro-2.0.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/servlet-api-2.5-20081211.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/slf4j-api-1.4.3.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/xmlenc-0.52.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/home/oracle/hadoop-1.1.2/hadoop-ant-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-client-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-core-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-examples-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-minicluster-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-test-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-tools-1.1.2.jar

[oracle@bigdata1 java]$ javac -cp $CLASSPATH:. ArtistCounter.java
Next … packaging
[oracle@bigdata1 java]$ jar -cvf ArtistCounter.jar ArtistCounter*.class
added manifest
adding: ArtistCounter.class(in = 1489) (out= 813)(deflated 45%)
adding: ArtistCounter$IntSumReducer.class(in = 1751) (out= 742)(deflated 57%)
adding: ArtistCounter$TokenizerMapper.class(in = 1721) (out= 718)(deflated 58%)
And now, we can run this small test program in our hadoop cluster:
[oracle@bigdata1 java]$ echo $CLASSPATH
/home/oracle/hadoop-1.1.2/libexec/../conf:/usr/java/jdk1.7.0_11/lib/tools.jar:/home/oracle/hadoop-1.1.2/libexec/..:/home/oracle/hadoop-1.1.2/libexec/../hadoop-core-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/asm-3.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/aspectjrt-1.6.11.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/aspectjtools-1.6.11.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-beanutils-1.7.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-cli-1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-codec-1.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-collections-3.2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-configuration-1.6.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-daemon-1.0.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-digester-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-el-1.0.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-httpclient-3.0.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-io-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-lang-2.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-logging-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-logging-api-1.0.4.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-math-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/commons-net-3.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/core-3.1.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-capacity-scheduler-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-fairscheduler-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hadoop-thriftfs-1.1.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/hsqldb-1.8.0.10.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jasper-compiler-5.5.12.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jasper-runtime-5.5.12.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jdeb-0.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-core-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-json-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jersey-server-1.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jets3t-0.6.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jetty-6.1.26.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jetty-util-6.1.26.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsch-0.1.42.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/junit-4.5.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/kfs-0.2.2.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/log4j-1.2.15.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/mockito-all-1.8.5.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/oro-2.0.8.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/servlet-api-2.5-20081211.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/slf4j-api-1.4.3.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/xmlenc-0.52.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/oracle/hadoop-1.1.2/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/home/oracle/hadoop-1.1.2/hadoop-ant-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-client-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-core-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-examples-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-minicluster-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-test-1.1.2.jar:/home/oracle/hadoop-1.1.2/hadoop-tools-1.1.2.jar

[oracle@bigdata1 java]$ /home/oracle/hadoop-1.1.2/bin/hadoop jar ArtistCounter.jar ArtistCounter /user/oracle/input /user/oracle/output/laurent
13/07/02 09:56:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/02 09:56:08 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 09:56:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/02 09:56:08 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/02 09:56:08 INFO mapred.JobClient: Running job: job_201307020925_0001
13/07/02 09:56:09 INFO mapred.JobClient:  map 0% reduce 0%
13/07/02 09:56:20 INFO mapred.JobClient:  map 50% reduce 0%
13/07/02 09:56:22 INFO mapred.JobClient:  map 100% reduce 0%
13/07/02 09:56:28 INFO mapred.JobClient:  map 100% reduce 33%
13/07/02 09:56:31 INFO mapred.JobClient:  map 100% reduce 100%
13/07/02 09:56:32 INFO mapred.JobClient: Job complete: job_201307020925_0001
13/07/02 09:56:32 INFO mapred.JobClient: Counters: 29
13/07/02 09:56:32 INFO mapred.JobClient:   Job Counters
13/07/02 09:56:32 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/02 09:56:32 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18647
13/07/02 09:56:32 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/02 09:56:32 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/02 09:56:32 INFO mapred.JobClient:     Launched map tasks=2
13/07/02 09:56:32 INFO mapred.JobClient:     Data-local map tasks=2
13/07/02 09:56:32 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10862
13/07/02 09:56:32 INFO mapred.JobClient:   File Output Format Counters
13/07/02 09:56:32 INFO mapred.JobClient:     Bytes Written=1651492
13/07/02 09:56:32 INFO mapred.JobClient:   FileSystemCounters
13/07/02 09:56:32 INFO mapred.JobClient:     FILE_BYTES_READ=6240405
13/07/02 09:56:32 INFO mapred.JobClient:     HDFS_BYTES_READ=84050631
13/07/02 09:56:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=9124002
13/07/02 09:56:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1651492
13/07/02 09:56:32 INFO mapred.JobClient:   File Input Format Counters
13/07/02 09:56:32 INFO mapred.JobClient:     Bytes Read=84050389
13/07/02 09:56:32 INFO mapred.JobClient:   Map-Reduce Framework
13/07/02 09:56:32 INFO mapred.JobClient:     Map output materialized bytes=2724501
13/07/02 09:56:32 INFO mapred.JobClient:     Map input records=1000000
13/07/02 09:56:32 INFO mapred.JobClient:     Reduce shuffle bytes=2724501
13/07/02 09:56:32 INFO mapred.JobClient:     Spilled Records=370725
13/07/02 09:56:32 INFO mapred.JobClient:     Map output bytes=18534685
13/07/02 09:56:32 INFO mapred.JobClient:     Total committed heap usage (bytes)=270082048
13/07/02 09:56:32 INFO mapred.JobClient:     CPU time spent (ms)=10870
13/07/02 09:56:32 INFO mapred.JobClient:     Combine input records=1150423
13/07/02 09:56:32 INFO mapred.JobClient:     SPLIT_RAW_BYTES=242
13/07/02 09:56:32 INFO mapred.JobClient:     Reduce input records=110151
13/07/02 09:56:32 INFO mapred.JobClient:     Reduce input groups=72663
13/07/02 09:56:32 INFO mapred.JobClient:     Combine output records=260574
13/07/02 09:56:32 INFO mapred.JobClient:     Physical memory (bytes) snapshot=453685248
13/07/02 09:56:32 INFO mapred.JobClient:     Reduce output records=72663
13/07/02 09:56:32 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3173515264
13/07/02 09:56:32 INFO mapred.JobClient:     Map output records=1000000
Last step, we can see the content of the result file, and counting how many album did the mapreducer find in the file:
[oracle@bigdata1 java]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -ls /user/oracle/output/laurent
Found 3 items
-rw-r--r--   3 oracle supergroup          0 2013-07-02 09:56 /user/oracle/output/laurent/_SUCCESS
drwxr-xr-x   - oracle supergroup          0 2013-07-02 09:56 /user/oracle/output/laurent/_logs
-rw-r--r--   3 oracle supergroup    1651492 2013-07-02 09:56 /user/oracle/output/laurent/part-r-00000

[oracle@bigdata1 java]$ /home/oracle/hadoop-1.1.2/bin/hadoop dfs -cat /user/oracle/output/laurent/part-r-00000 | grep 'pink floyd'
pink floyd      117
pink floyd tribute      10
Advertisement

BBED in Oracle 11g

If you like playing with oracle internals, maybe you frequently use BBED (Block Editor). In Oracle 11g, BBED becomes unavailable but if you search in the ins_rdbms.mk makefile (located in $ORACLE_HOME/rdbms/lib directory), you will see that the entry is still there.

If you want to use it with an Oracle 11g database, you will have to get some files from an old Oracle 10g installation and build the binary in your Oracle 11g home.

The operation is detailed in the differents points above:

1- If you have installed Oracle 10g and 11g on the same server, define your two Oracle Home (otherwise you can use scp to copy files across the network)

export ORA10G_HOME=/u01/app/oracle/product/10.2.0/db_1
export ORA11G_HOME=/u01/app/oracle/product/11.2.0/dbhome_1

2- Copy the librairies shipped with Oracle 10g but unavailable in the Oracle 11g

cp $ORA10G_HOME/rdbms/lib/ssbbded.o $ORA11G_HOME/rdbms/lib
cp $ORA10G_HOME/rdbms/lib/sbbdpt.o $ORA11G_HOME/rdbms/lib

3- Copy the message files shipped with Oracle 10g but unavailable in Oracle 11g

cp $ORA10G_HOME/rdbms/mesg/bbed* $ORA11G_HOME/rdbms/mesg

4- Build the bbed executable

make -f $ORA11G_HOME/rdbms/lib/ins_rdbms.mk BBED=$ORA11G_HOME/bin/bbed $ORA11G_HOME/bin/bbed

5- Have fun …

[oracle@linux1 ~]$ . oraenv
ORACLE_SID = [oracle] ? db112
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1 is /u01/app/oracle
[oracle@linux1 ~]$ $ORACLE_HOME/bin/bbed
Password:

BBED: Release 2.0.0.0.0 - Limited Production on Thu Jun 9 10:10:49 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED>

Password is : blockedit

Don’t forget to use the library in the same kernel architecture …

Oracle 11g md_restore and compatible.rdbms

I have recently made a restore test of my diskgroup metadata with the new version of oracle server : 11.2.0.1.

Initially, I have created a TEST_DG diskgroup with the new asmca interface.
As I had a 10g database on my laptop, I have set the compatibility.rdbms parameter on 10.2.0.0.
After a rapid md_backup command to save my diskgroup metadatas, I have tried to restore it with the md_restore command.

What a surprise when I saw this error message:

ASMCMD [+] > md_restore md_backup.sav -G TEST_DG
Current Diskgroup metadata being restored: TEST_DG
ASMCMD-09352: CREATE DISKGROUP failed
ORA-15018: diskgroup cannot be created
ORA-15283: ASM operation requires compatible.rdbms of 11.1.0.7.0 or higher (DBD ERROR: OCIStmtExecute)

After a short research, I realize that the CREATE DISKGROUP command is generated like this:

create diskgroup TEST_DG EXTERNAL redundancy disk '/dev/oracleasm/disks/ASM10'name TEST_DG_0001 size 100M disk '/dev/oracleasm/disks/ASM05' name
TEST_DG_0000 size 100M attribute 'compatible.asm' = '11.2.0.0.0' , 'compatible.rdbms' = '10.2.0.0.0' , 'au_size' = '1048576', 'sector_size' = '512
', 'cell.smart_scan_capable' = 'FALSE';

And this command uses properties which have been introduced in Oracle 11g release (for example: sector_size). Next command works fine:

create diskgroup TEST_DG EXTERNAL redundancy disk '/dev/oracleasm/disks/ASM10' name TEST_DG_0001 size 100M disk '/dev/oracleasm/disks/ASM05' name
TEST_DG_0000 size 100M attribute 'compatible.asm' = '11.2.0.0.0' , 'compatible.rdbms' = '10.2.0.0.0' ;

So, if like me, you have some Oracle 10g databases that run on an ASM 11g release, you have to restore your diskgroup metadata in an SQL file (by using -S option of the md_restore command) and adapt the DDLs for your needs.

Not really easy ! 😉

Load ACFS on CentOS 5

In my last post, I wrote that acfs was not compatible on CentOS.
So I decided to search a solution … and I found it.

acfsload, the program who loads the acfs driver makes an OS version check which fails.
This program uses a perl module osds_acfslib.pm which is located on the $ORACLE_HOME/lib directory of the Grid Infrastructure.

In this perl module, you can find at lines 280/281 the perl code that makes this version check. This check is only an “rpm -qa | grep release” !

So if you are on a CentOS 5, you just have to complete this line by adding a centos check :

[root@linux1 lib]# cd /u01/app/oracle/product/11.2.0/grid/lib
[root@linux1 lib]# cp -p osds_acfslib.pm osds_acfslib.pm.ORIG
.../...
[root@linux1 lib]# diff osds_acfslib.pm osds_acfslib.pm.ORIG
281,281
< ($release =~ /redhat-release-5/) || ($release =~ /centos-release-5/))
---
> ($release =~ /redhat-release-5/))

When you have done this add-on, you must finish what the installer have failed ie. copy the acfs kernel modules at the right place and regenerate
kernel module dependencies.

NB: Be careful of your kernel version, in this example, I was using a 2.6.18-92.1.22.el5 kernel.

[root@linux1 lib]# mkdir /lib/modules/2.6.18-92.1.22.el5/extra/usm
[root@linux1 lib]# cp /u01/app/oracle/product/11.2.0/grid/install/usm/EL5 /i386/2.6.18-8/2.6.18-8.el5-i686/bin/*.ko /lib/modules/2.6.18-92.1.22.el5/extra/usm/
[root@linux1 lib]# chmod 744 /lib/modules/2.6.18-92.1.22.el5/extra/usm/*
[root@linux1 lib]# depmod

Now you can execute the acfsload and load the acfs modules without warning:

[root@linux1 lib]# /u01/app/oracle/product/11.2.0/grid/bin/acfsload start -s

You can check everything is loaded with the acfsdriverstate command:

[root@linux1 lib]# cd /u01/app/oracle/product/11.2.0/grid/bin
[root@linux1 bin]# ./acfsdriverstate -orahome /u01/app/oracle/product/11.2.0/grid loaded
ACFS-9203: TRUE

Note, on the next reboot, you have to load the acfs modules, and mount the acfs filesystem.
You can do this by writing you own init shell script based on the chkconfig format, and load it on the desired runlevel.

Acfs is not compatible on CentOS

[root@linux1 bin]# ./acfsload start -s
ADVM/ACFS is not supported on centos-release-5-2.el5.centos

Yuk 😦

I’m searching for a solution