If you have ever faced RAC problems, you know that the entry points for analysis are the alert.log files.
But since Oracle Clusterware logs its events in many log files (usually one per process), and each server has its own log directory, it can be difficult to identify the causes, and especially the root cause, of a specific problem.
Recently I wrote a small Perl script to help with this. It is named “crs_log_aggregator.pl” and is available for download at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.
The script works as follows. As input, it takes one or more directories to analyze recursively; these directories contain all the CRS log files of the cluster nodes you want to analyze.
For each directory, it parses every log file (except the alert.log) and every rotated log file (file extension: l??). Parsing extracts the timestamp mentioned on each line; if a line has no timestamp (for example because a message spans several lines), the script reuses the preceding timestamp. If no timestamp precedes it at all (in a file header, for example), the timestamp is flagged as ‘N/A’. Every analyzed line is also tagged with its origin, the relative path of its source file.
As a result, each analyzed line is represented in the output file by three parts:
- the timestamp of the message
- the origin file of the message
- the message
As a separator, I used a “<SEP>” tag.
At this stage, each record is written to a single unsorted file. If a filter expression has been passed to the program (which is highly recommended), it is applied during the parsing phase.
The program then sorts the records by timestamp and produces a single, optionally filtered output file that is ready for analysis.
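To make the approach concrete, here is a minimal sketch of that parse/filter/sort logic. This is not the actual crs_log_aggregator.pl, just an illustration under the assumptions described above (timestamp format, <SEP> separator, alert.log exclusion):

#!/usr/bin/perl
# Minimal sketch of the aggregation logic; the real script handles more cases.
use strict;
use warnings;
use File::Find;

my ($dir, $filter) = @ARGV;    # source directory and optional regex filter
die "usage: $0 dir [filter]\n" unless defined $dir;
my @records;

find(sub {
    return unless -f $_ && $_ =~ /\.(log|l\d\d)$/;  # *.log and rotated *.l??
    return if $_ =~ /^alert/;                       # alert*.log are excluded
    my $path = $File::Find::name;                   # relative origin path
    open my $fh, '<', $_ or return;
    my $ts = 'N/A';    # lines before the first timestamp are flagged N/A
    while (my $line = <$fh>) {
        chomp $line;
        # Continuation lines without a timestamp reuse the preceding one.
        $ts = $1 if $line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)/;
        next if defined $filter && $line !~ /$filter/;  # filter while parsing
        push @records, "$ts<SEP>$path<SEP>$line";
    }
    close $fh;
}, $dir);

# Sorting the records lexically orders them by their timestamp prefix.
print "$_\n" for sort @records;

Note that sorting everything in memory, as above, is only viable for reasonably small volumes, which is one more reason to filter during parsing.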
Well, let’s see how it works:
For example, let’s consider a two-node RAC (hostnames are rac1 and rac2).
Since alert.log files are not analyzed, review them first to determine when the problem occurred.
In my case, I want to analyze two days: 2013-05-30 and 2013-05-31.
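For example, a quick grep on each node (assuming the usual $GRID_HOME/log/<hostname>/alert<hostname>.log location) is enough to confirm that the incident falls within this window:

[oracle@rac1 ~]$ grep '^2013-05-3[01]' $GRID_HOME/log/rac1/alertrac1.log | head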
So the first step is to copy all the log files to a filesystem or to your desktop.
I recommend putting all the directories related to a single hostname into a single directory. Here, I have two directories coming from both cluster nodes ($GRID_HOME/log); the copy itself is shown in the scp example after the listing below:
[oracle@rac1 aggregator]$ tree -d .
.
|-- rac1
| |-- acfslog
| |-- acfsrepl
| |-- acfsreplroot
| |-- acfssec
| |-- admin
| |-- agent
| | |-- crsd
| | | |-- oraagent_oracle
| | | |-- orarootagent_root
| | | `-- scriptagent_oracle
| | `-- ohasd
| | |-- oraagent_oracle
| | |-- oracssdagent_root
| | |-- oracssdmonitor_root
| | `-- orarootagent_root
| |-- client
| |-- crflogd
| |-- crfmond
| |-- crsd
| |-- cssd
| |-- ctssd
| |-- cvu
| | |-- cvulog
| | `-- cvutrc
| |-- diskmon
| |-- evmd
| |-- gipcd
| |-- gnsd
| |-- gpnpd
| |-- mdnsd
| |-- ohasd
| |-- racg
| | |-- racgeut
| | |-- racgevtf
| | `-- racgmain
| `-- srvm
`-- rac2
|-- acfslog
|-- acfsrepl
|-- acfsreplroot
|-- acfssec
|-- admin
|-- agent
| |-- crsd
| | |-- oraagent_oracle
| | |-- orarootagent_root
| | `-- scriptagent_oracle
| `-- ohasd
| |-- oraagent_oracle
| |-- oracssdagent_root
| |-- oracssdmonitor_root
| `-- orarootagent_root
|-- client
|-- crflogd
|-- crfmond
|-- crsd
|-- cssd
|-- ctssd
|-- cvu
| |-- cvulog
| `-- cvutrc
|-- diskmon
|-- evmd
|-- gipcd
|-- gnsd
|-- gpnpd
|-- mdnsd
|-- ohasd
|-- racg
| |-- racgeut
| |-- racgevtf
| `-- racgmain
`-- srvm
74 directories
[oracle@rac1 aggregator]$ ls -l
total 20
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:39 rac1
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:00 rac2
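As mentioned above, the copy can be done with a simple scp from each node (assuming ssh access to both nodes and that $GRID_HOME resolves to the Grid Infrastructure home in your local environment; otherwise substitute the actual path):

[oracle@rac1 aggregator]$ scp -r rac1:$GRID_HOME/log/rac1 .
[oracle@rac1 aggregator]$ scp -r rac2:$GRID_HOME/log/rac2 .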
If you want help with this tool, you can use the -h option:
[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -h
RAC Logs consolidator
---------------------------------------------
Usage
./crs_log_aggregator.pl -c dir1[,dir2,dir3,...,dirN] -o outputfile [-f filter]
[MANDATORY]: -c dir1,dir2,...,dirN
comma separated values of source directories to analyze
[MANDATORY]: -o outputfile
unique sorted and filtered output file
[OPTIONAL] : -f filter
regex filter to apply
[OPTIONAL] : -h
print this help
Examples:
$ ./crs_log_aggregator.pl -c data/ -o outfile
will sort without any filter all log files (*.log, *.l??) recursively found in the data directory. outfile is produced in the current directory as a result.
$ ./crs_log_aggregator.pl -c data/rac1/cssd,data/rac1/crsd -o outfile
will sort without any filter only the content of data/rac1/cssd,data/rac1/crsd directories.
$ ./crs_log_aggregator.pl -c data/ -o outfile_filtered -f '2012-10-26|2012-10-27'
will sort with a filter (regex) that will filter 2 days 2012-10-26 and 2012-10-27.
more information on perl regex: http://perldoc.perl.org/perlre.html
Next, I launch the program with the appropriate options:
[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -c ./rac1,./rac2 -o output_file -f '2013-05-30|2013-05-31'
---------Alert.log : ./rac1/alertrac1.log excluded from analyze
---------Alert.log : ./rac2/alertrac2.log excluded from analyze
[oracle@rac1 aggregator]$ ls -lh output_file
-rw-r--r-- 1 oracle oinstall 243M Sep 8 15:16 output_file
Please note that if an alert.log file is found, it is excluded and reported on standard output.
I highly recommend using a filter to reduce the amount of data to sort (the sort can burn a lot of CPU and memory). The -f option is used for this and takes a regex-style filter, which is applied directly in the Perl script; if you are not familiar with Perl regular expressions, have a look at http://perldoc.perl.org/perlre.html.
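If two full days still produce too much data, the same mechanism lets you narrow the window further. For example (a hypothetical run reusing the dates above), filtering on a single hour of 2013-05-31:

[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -c ./rac1,./rac2 -o output_file_small -f '2013-05-31 09:'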
As a result, the output file contains messages like these:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [ CSSD][1111411008]clssgmRPCBroadcast: rpc(0x2a1002a), status(0), sendcount(0), filtered by specific properties:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [ CSSD][1111411008]clssgmFreeRPCIndex: freeing rpc 673
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [ CSSD][1111411008]clssgmDeleteGrock: (0x7f3f38637990) grock(EVMDMAIN2) deleted
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ COMMCRS][1105488192]clsc_connect: (0x7f23b40531f0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_LOGD))
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]clsdmc_send error code: -7
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]Mond is being shutdown by default
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Daemon Select Thread exiting
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Skipping Agent Initiated a check action
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]isRunning is reset to false here
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [ CTSS][1096657216]sclsctss_gvss1: NTP default config file found
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [ CTSS][1096657216]sclsctss_gvss8: Return [0] and NTP status [2].
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [ CTSS][1096657216]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [ CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [ CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [ CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 3 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [ CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 1 event.
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] Returning NETDATA: 1 interfaces
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] # 0 Interface 'eth2',ip='10.10.10.12',mac='08-00-27-92-f1-98',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ora.crf][1089526080] {0:0:613} [stop] (:CLSN00108:) clsn_agent::stop }
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ AGFW][1089526080] {0:0:613} Command: stop for resource: ora.crf 1 1 completed with status: SUCCESS
Now, you can analyze the output file directly, but I have also developed a small Java app for this. It is available at the same URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.
To launch it, you need Java 1.7.
This app presents the content of the output file in a friendlier interface; messages coming from the same file share the same color:
[oracle@rac1 aggregator]$ java -jar Aggr_visualisator.jar
Update:
- In v0.2, I implemented a filter function; the filter applies to the “Message” column.
- I also implemented colorization for the “Message” column:
- if the term “error” is found, the cell is colored red
- if the term “warn” is found, the cell is colored orange
- if the term “fail” is found, the cell is colored yellow
Two snapshots of this version are shown below 😉
- Filtering messages (based on a regexp)
If you have any suggestions to improve these tools, send me an email or leave a comment and I will study them.
I hope this will help you 😉
This is truly inspirational…
Very useful tool !!
This tool is great. The only issue, in my experience, is poor memory management by some tools. I use MobaXTerm (with a Cygwin base) and I often hit the following error when there is too much data:
Out of memory during request for XXXX bytes, total sbrk()
Based on a PerlMonks article, I had changed the Cygwin heap size by running:
regtool -i set /HKLM/Software/Cygnus\ Solutions/Cygwin/heap_chunk_in_mb 8192
But that was not enough. I also had to install the “rebase” package, which provides the “peflags” executable, and run:
peflags --cygwin-heap /bin/perl
peflags --cygwin-heap=8192 /bin/perl
This works around the initial limit.
Be careful though: in some cases it ends with a segmentation fault (I have not had time to investigate why yet, but I suspect it is again a data-volume problem).
I also ran into these problems with Cygwin, but not with the Perl that ships with Linux.
Thanks Yoann for the info… 🙂
Hi,
Does it also work for RAC 12.1?
Thanks
Ben
I didn’t test it, but it should work if you use the CRS log files that are now located in the ADR.
Feel free to test and give your feedback.