Data … as usual

All things about data by Laurent Leturgez

Category Archives: perl

Oracle 12c, VMWare Fusion and the perl binary’s segmentation fault

Recently I have installed a new Virtual Machine in VM Ware Fusion 7 Pro under Oracle Enterprise Linux 6.6 for running Oracle 12c Grid Infrastructure and database (12.1.0.2). Mac OS X is a 10.9.5 (Mavericks). As usual for my tests environment, I deploy the oracle-rdbms-server-12cR1-preinstall rpm package to be sure everything will be correctly configured on my system. During the installation process, there was a first error during link edition of rman (“error invoking target irman ioracle of makefile /u01/app/oracle/product/12.1.0/grid/rdbms/lib/ins_rdbms.mk”). This problem can be fixed by copying the libjavavm12.a library located in $ORACLE_HOME/javavm/jdk/jdk7/lib to $ORACLE_HOME/lib directory, and then retry the link edition by clicking on “Retry” button. So, if you don’t use Grid Infrastructure here or if you install only the database kernel … you won’t have any problem. Maybe, you will encountered the problems detailed above if you create a mutitenant (or single tenant) database (because it needs to run catcon.pl script … but we will see why later). In my case, I was deploying Grid Infrastructure and at the end of the installation process, you have to run root.sh script to configure the CRS stack. At this step, the root.sh script fails with an error on  roothas.pl script and a cute “Segmentation fault (core dumped)” error:


Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
/u01/app/oracle/product/12.1.0/grid/crs/config/rootconfig.sh: line 131: 4562 Segmentation fault (core dumped) $ROOTSCRIPT $ROOTSCRIPT_ARGS
The command '/u01/app/oracle/product/12.1.0/grid/perl/bin/perl -I/u01/app/oracle/product/12.1.0/grid/perl/lib -I/u01/app/oracle/product/12.1.0/grid/crs/install /u01/app/oracle/product/12.1.0/grid/crs/install/roothas.pl ' execution failed

After analyzing the error, I get the cause of my problem which is located on the perl binary delivered with Oracle. If I run a simple “perl -v”, I got the error.


[oracle@oel6 bin]$ pwd
/u01/app/oracle/product/12.1.0/grid/perl/bin
[oracle@oel6 bin]$ ./perl -v
Segmentation fault (core dumped)

If we go deeper, gdb shows us an error located on PerlIO functions and specifically on PerlIO_default_layers function:


[oracle@oel6 bin]$ gdb perl
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /u01/app/oracle/product/12.1.0/grid/perl/bin/perl...(no debugging symbols found)...done.
(gdb) r
Starting program: /u01/app/oracle/product/12.1.0/grid/perl/bin/perl
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000531399 in PerlIO_default_layers ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.7.x86_64 libgcc-4.4.7-11.el6.x86_64 nss-softokn-freebl-3.14.3-22.el6_6.x86_64
(gdb) bt
#0 0x0000000000531399 in PerlIO_default_layers ()
#1 0x000000000053070e in PerlIO_resolve_layers ()
#2 0x000000000053050f in PerlIO_openn ()
#3 0x0000000000530206 in PerlIO_fdopen ()
#4 0x0000000000530198 in PerlIO_stdstreams ()
#5 0x0000000000530150 in Perl_PerlIO_stdin ()
#6 0x0000000000435f09 in S_parse_body ()
#7 0x00000000004343bb in perl_parse ()
#8 0x000000000041cf13 in main ()

IMHO, it’s a specific problem on WMWare fusion because on a VM hosted on Virtualbox there’s no problem, and more :

  • If I copy the working perl binary from the virtualbox VM to the VMWare VM … execution failed on VMWare
  • If I copy the failing perl binary from the VMWare VM to the virtualbox VM … it works fine

At the time I write this post, I have asked to a friend of mine if he can reproduced the problem on a VM hosted on an ESX server, I will update this post as soon as I will get his results. I made a lot of different tests (Downgrading VMWare fusion to Fusion 6 Pro, Installing OEL 6.4, OEL 7.0, Redhat 6.5 and 7, creating my vmdk’s on another disk, changing scsi driver etc … always the same problem). I read on many blogs, people have faced the same problem and have symlinked the failing perl to the system perl binary which runs fine. Doing this can produce different issues because of the INC directory which is not the same, you can encountered some errors due to version compatibility between modules in $OH/perl/lib directory and the perl binary. Indeed Oracle 12cR1 uses a perl v5.14.1 and system perl binary is 5.10 on OEL/RHEL 6, and 5.16 on OEL/RHEL7. I workarounded the problem by recreating the perl binary from sources, but you need to be careful of the $OH/perl/lib directory, because it contains all the perl modules needed by Oracle. See above, the different steps to compile perl binary and replace it in $ORACLE_HOME:


export ORACLE_HOME=/u01/app/oracle/product/12.1.0/grid
cd ~
rm -rf perl
mkdir perl
cd perl/
curl -O http://www.cpan.org/src/5.0/perl-5.14.1.tar.gz
tar -xvzf perl-5.14.1.tar.gz

cd $ORACLE_HOME
mv perl/ perl.OLD
mkdir perl
cd /home/oracle/perl/perl-5.14.1
./Configure -des -Dprefix=$ORACLE_HOME/perl -Doptimize=-O3 -Dusethreads -Duseithreads -Duserelocatableinc ; make clean ; make ; make install
cd $ORACLE_HOME/perl
rm -rf lib/ man/
cp -r ../perl.OLD/lib/ .
cp -r ../perl.OLD/man/ .
cp ../perl.OLD/bin/dbilogstrip bin/
cp ../perl.OLD/bin/dbiprof bin/
cp ../perl.OLD/bin/dbiproxy bin/
cp ../perl.OLD/bin/ora_explain bin/

Once these steps are done and you have a running perl binary, you can safely run your root.sh script, create multitenant databases, and all the stuff that needs $ORACLE_HOME/bin/perl.

UPDATE: I have just downloaded the last release of Oracle Virtualbox (5.0) on my Macbook and the same problems occurs. They can be resolved by using the same method described in this post.

UPDATE2 : I would like to thank Deiby Gomez for his trick. Indeed, in my original post I forgot to keep some perl binaries that could be useful 😉 : dbilogstrip, dbiprof, dbiproxy and ora_explain. The script above has been modified.

UPDATE3:  Danny Bryant from Enkitec has worked closely with Oracle VM and virtual box team and they discovered the origin of the bug. This one should be fixed in a next release. But, if you use Virtualbox 5.0, Danny published a workaround, you can read it following this link: http://dbaontap.com/2016/01/13/vbox5/

 

Advertisement

Tools to aggregate Oracle CRS logfiles in one file, filter it, sort it, and display it.

If you already faced RAC problems, you had to analyze log files from the entry points which are the alert.log files.

But as Oracle clusterware logs its events in many logfiles (usually one per process), and you have a logfile directory on each server, it can be difficult to detect the causes, and specially the root cause of a specific problem.

Recently I created a little perl script that will help me for this purpose. This script is named “crs_log_aggregator.pl” and is available for download at this URL : https://app.box.com/s/xi3h7bpssrqtkp55q43c.

This script works like this. As input, it takes one or n directories to analyze recursively. Those directories are containing all crs log files of all cluster nodes you want to analyze.

For each directory, it parses each log files (except the alert.log) and each history log files (with file extension: l??). Parsing will get the timestamp mentioned in the read line, if there’s no timestamp (because message is written in many lines for example), the perl script will use the preceding timestamp. (If there were no timestamp for example, in the file header, it will flag the timestamp as ‘N/A’). Every analyzed line in the log file, is completed by its origin represented by the relative file path.

As a result, an analyzed line, is represented in the output file by three parts:

  • the timestamp of the message
  • the origin file of the message
  • the message

As a separator, I used a “<SEP>” tag.

So at this step, each line is recorded in a single unsorted file. If a filter expression has been given to the program as input, which is highly recommended, the filter is applied during the parsing process.

Next, the program will sort the output file by timestamp and will produce a single output file which is now filtered (or not) and this file can be analyzed.

Well, let’s see how it works:

For example, let’s consider a two nodes RAC (hostnames are rac1 and rac2).

As alert.log files are not analyzed, you need to analyze them before to see when the problem occurs.

In my case, I want to analyze 2 days: 2013-05-30 and 2013-05-31.

So first step is to copy is a filesystem, or on your desktop, all the logfiles.

I recommend to put in a single directory all the directories related to a single hostname. Here, I have two directories coming from both cluster nodes ($GRID_HOME/log) :

[oracle@rac1 aggregator]$ tree -d .
.
|-- rac1
|   |-- acfslog
|   |-- acfsrepl
|   |-- acfsreplroot
|   |-- acfssec
|   |-- admin
|   |-- agent
|   |   |-- crsd
|   |   |   |-- oraagent_oracle
|   |   |   |-- orarootagent_root
|   |   |   `-- scriptagent_oracle
|   |   `-- ohasd
|   |       |-- oraagent_oracle
|   |       |-- oracssdagent_root
|   |       |-- oracssdmonitor_root
|   |       `-- orarootagent_root
|   |-- client
|   |-- crflogd
|   |-- crfmond
|   |-- crsd
|   |-- cssd
|   |-- ctssd
|   |-- cvu
|   |   |-- cvulog
|   |   `-- cvutrc
|   |-- diskmon
|   |-- evmd
|   |-- gipcd
|   |-- gnsd
|   |-- gpnpd
|   |-- mdnsd
|   |-- ohasd
|   |-- racg
|   |   |-- racgeut
|   |   |-- racgevtf
|   |   `-- racgmain
|   `-- srvm
`-- rac2
    |-- acfslog
    |-- acfsrepl
    |-- acfsreplroot
    |-- acfssec
    |-- admin
    |-- agent
    |   |-- crsd
    |   |   |-- oraagent_oracle
    |   |   |-- orarootagent_root
    |   |   `-- scriptagent_oracle
    |   `-- ohasd
    |       |-- oraagent_oracle
    |       |-- oracssdagent_root
    |       |-- oracssdmonitor_root
    |       `-- orarootagent_root
    |-- client
    |-- crflogd
    |-- crfmond
    |-- crsd
    |-- cssd
    |-- ctssd
    |-- cvu
    |   |-- cvulog
    |   `-- cvutrc
    |-- diskmon
    |-- evmd
    |-- gipcd
    |-- gnsd
    |-- gpnpd
    |-- mdnsd
    |-- ohasd
    |-- racg
    |   |-- racgeut
    |   |-- racgevtf
    |   `-- racgmain
    `-- srvm

74 directories
[oracle@rac1 aggregator]$ ls -l
total 20
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:39 rac1
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:00 rac2
If you want some help about this tool, you can use the -h option:
[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -h

RAC Logs consolidator
---------------------------------------------

Usage

./crs_log_aggregator.pl -c dir1[,dir2,dir3,...,dirN] -o outputfile [-f filter]

[MANDATORY]: -c dir1,dir2,...,dirN
                 comma separated values of source directories to analyze
[MANDATORY]: -o outputfile
                 unique sorted and filtered output file
[OPTIONAL] : -f filter
                 regex filter to apply
[OPTIONAL] : -h
                 print this help

Examples:
$ ./crs_log_aggregator.pl -c data/ -o outfile
         will sort without any filter all log files (*.log, *.l??) recursively found in the data directory. outfile is produced in the current directory as a result.
$ ./crs_log_aggregator.pl -c data/rac1/cssd,data/rac1/crsd -o outfile
         will sort without any filter only the content of data/rac1/cssd,data/rac1/crsd  directories.
$ ./crs_log_aggregator.pl -c data/ -o outfile_filtered -f '2012-10-26|2012-10-27'
         will sort with a filter (regex) that will filter 2 days 2012-10-26 and 2012-10-27.

        more information on perl regex: http://perldoc.perl.org/perlre.html
Next, I will launch the program with the good options:
[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -c ./rac1,./rac2 -o output_file -f '2013-05-30|2013-05-31'

---------Alert.log : ./rac1/alertrac1.log excluded from analyze

---------Alert.log : ./rac2/alertrac2.log excluded from analyze

[oracle@rac1 aggregator]$ ls -lh output_file
-rw-r--r-- 1 oracle oinstall 243M Sep  8 15:16 output_file
Please note that, if an alert.log file is found, it will be excluded and mentioned in the standard output.
I highly recommend you to use a filter to reduce the amount of data to sort (It can burn lot of CPU and memory). The -f option is use to do this, and takes a regexp style filter (it ‘s used directly in the perl script, so if you are not aware with regexp used in perl script, you can have a look at this URL: http://perldoc.perl.org/perlre.html).
As a result, you have this kind of messages in the output file:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmRPCBroadcast: rpc(0x2a1002a), status(0), sendcount(0), filtered by specific properties:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmFreeRPCIndex: freeing rpc 673
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmDeleteGrock: (0x7f3f38637990) grock(EVMDMAIN2) deleted
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ COMMCRS][1105488192]clsc_connect: (0x7f23b40531f0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_LOGD))
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]clsdmc_send error code: -7
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]Mond is being shutdown by default
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Daemon Select Thread exiting
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Skipping Agent Initiated a check action
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]isRunning is reset to false here
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss1: NTP default config file found
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss8: Return [0] and NTP status [2].
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 3 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 1 event.
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] Returning NETDATA: 1 interfaces
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] # 0 Interface 'eth2',ip='10.10.10.12',mac='08-00-27-92-f1-98',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ora.crf][1089526080] {0:0:613} [stop] (:CLSN00108:) clsn_agent::stop }
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [    AGFW][1089526080] {0:0:613} Command: stop for resource: ora.crf 1 1 completed with status: SUCCESS

Now, you can analyze the output file directly, but I have developed another app (in Java). This app is available at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.
To launch it, you need to use java 1.7.
This app will show in a more beautiful interface the result of the output file. Messages that are coming from the same file are colored with the same color:
[oracle@rac1 aggregator]$ java -jar Aggr_visualisator.jar
 Aggr_visualisator
Update:
  • in v0.2, I implemented a filter function. This filter is active on the “Message” column.
  • I implemented a colorization function for the “Message” column.
    • if the term “error” is found, the cell will be colored in red
    • If the term “warn” is found, the cell will be colored in orange
    • and if the term “fail” is found, the cell wil be colored in yellow
v0.2 is available at this URL : https://app.box.com/s/xi3h7bpssrqtkp55q43c
two snapshots of this version below 😉
  • Messages colorization:
Aggr_visualisator_0.2_1
  • Filtering messages (based on a regexp)
Aggr_visualisator_0.2_2
If you have any suggestion to improve these tools, send me an email or a comment. I will study them.
I hope this will help you 😉