Data … as usual

All things about data by Laurent Leturgez

Tools to aggregate Oracle CRS logfiles in one file, filter it, sort it, and display it.

If you already faced RAC problems, you had to analyze log files from the entry points which are the alert.log files.

But as Oracle clusterware logs its events in many logfiles (usually one per process), and you have a logfile directory on each server, it can be difficult to detect the causes, and specially the root cause of a specific problem.

Recently I created a little perl script that will help me for this purpose. This script is named “crs_log_aggregator.pl” and is available for download at this URL : https://app.box.com/s/xi3h7bpssrqtkp55q43c.

This script works like this. As input, it takes one or n directories to analyze recursively. Those directories are containing all crs log files of all cluster nodes you want to analyze.

For each directory, it parses each log files (except the alert.log) and each history log files (with file extension: l??). Parsing will get the timestamp mentioned in the read line, if there’s no timestamp (because message is written in many lines for example), the perl script will use the preceding timestamp. (If there were no timestamp for example, in the file header, it will flag the timestamp as ‘N/A’). Every analyzed line in the log file, is completed by its origin represented by the relative file path.

As a result, an analyzed line, is represented in the output file by three parts:

  • the timestamp of the message
  • the origin file of the message
  • the message

As a separator, I used a “<SEP>” tag.

So at this step, each line is recorded in a single unsorted file. If a filter expression has been given to the program as input, which is highly recommended, the filter is applied during the parsing process.

Next, the program will sort the output file by timestamp and will produce a single output file which is now filtered (or not) and this file can be analyzed.

Well, let’s see how it works:

For example, let’s consider a two nodes RAC (hostnames are rac1 and rac2).

As alert.log files are not analyzed, you need to analyze them before to see when the problem occurs.

In my case, I want to analyze 2 days: 2013-05-30 and 2013-05-31.

So first step is to copy is a filesystem, or on your desktop, all the logfiles.

I recommend to put in a single directory all the directories related to a single hostname. Here, I have two directories coming from both cluster nodes ($GRID_HOME/log) :

[oracle@rac1 aggregator]$ tree -d .
.
|-- rac1
|   |-- acfslog
|   |-- acfsrepl
|   |-- acfsreplroot
|   |-- acfssec
|   |-- admin
|   |-- agent
|   |   |-- crsd
|   |   |   |-- oraagent_oracle
|   |   |   |-- orarootagent_root
|   |   |   `-- scriptagent_oracle
|   |   `-- ohasd
|   |       |-- oraagent_oracle
|   |       |-- oracssdagent_root
|   |       |-- oracssdmonitor_root
|   |       `-- orarootagent_root
|   |-- client
|   |-- crflogd
|   |-- crfmond
|   |-- crsd
|   |-- cssd
|   |-- ctssd
|   |-- cvu
|   |   |-- cvulog
|   |   `-- cvutrc
|   |-- diskmon
|   |-- evmd
|   |-- gipcd
|   |-- gnsd
|   |-- gpnpd
|   |-- mdnsd
|   |-- ohasd
|   |-- racg
|   |   |-- racgeut
|   |   |-- racgevtf
|   |   `-- racgmain
|   `-- srvm
`-- rac2
    |-- acfslog
    |-- acfsrepl
    |-- acfsreplroot
    |-- acfssec
    |-- admin
    |-- agent
    |   |-- crsd
    |   |   |-- oraagent_oracle
    |   |   |-- orarootagent_root
    |   |   `-- scriptagent_oracle
    |   `-- ohasd
    |       |-- oraagent_oracle
    |       |-- oracssdagent_root
    |       |-- oracssdmonitor_root
    |       `-- orarootagent_root
    |-- client
    |-- crflogd
    |-- crfmond
    |-- crsd
    |-- cssd
    |-- ctssd
    |-- cvu
    |   |-- cvulog
    |   `-- cvutrc
    |-- diskmon
    |-- evmd
    |-- gipcd
    |-- gnsd
    |-- gpnpd
    |-- mdnsd
    |-- ohasd
    |-- racg
    |   |-- racgeut
    |   |-- racgevtf
    |   `-- racgmain
    `-- srvm

74 directories
[oracle@rac1 aggregator]$ ls -l
total 20
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:39 rac1
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:00 rac2
If you want some help about this tool, you can use the -h option:
[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -h

RAC Logs consolidator
---------------------------------------------

Usage

./crs_log_aggregator.pl -c dir1[,dir2,dir3,...,dirN] -o outputfile [-f filter]

[MANDATORY]: -c dir1,dir2,...,dirN
                 comma separated values of source directories to analyze
[MANDATORY]: -o outputfile
                 unique sorted and filtered output file
[OPTIONAL] : -f filter
                 regex filter to apply
[OPTIONAL] : -h
                 print this help

Examples:
$ ./crs_log_aggregator.pl -c data/ -o outfile
         will sort without any filter all log files (*.log, *.l??) recursively found in the data directory. outfile is produced in the current directory as a result.
$ ./crs_log_aggregator.pl -c data/rac1/cssd,data/rac1/crsd -o outfile
         will sort without any filter only the content of data/rac1/cssd,data/rac1/crsd  directories.
$ ./crs_log_aggregator.pl -c data/ -o outfile_filtered -f '2012-10-26|2012-10-27'
         will sort with a filter (regex) that will filter 2 days 2012-10-26 and 2012-10-27.

        more information on perl regex: http://perldoc.perl.org/perlre.html
Next, I will launch the program with the good options:
[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -c ./rac1,./rac2 -o output_file -f '2013-05-30|2013-05-31'

---------Alert.log : ./rac1/alertrac1.log excluded from analyze

---------Alert.log : ./rac2/alertrac2.log excluded from analyze

[oracle@rac1 aggregator]$ ls -lh output_file
-rw-r--r-- 1 oracle oinstall 243M Sep  8 15:16 output_file
Please note that, if an alert.log file is found, it will be excluded and mentioned in the standard output.
I highly recommend you to use a filter to reduce the amount of data to sort (It can burn lot of CPU and memory). The -f option is use to do this, and takes a regexp style filter (it ‘s used directly in the perl script, so if you are not aware with regexp used in perl script, you can have a look at this URL: http://perldoc.perl.org/perlre.html).
As a result, you have this kind of messages in the output file:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmRPCBroadcast: rpc(0x2a1002a), status(0), sendcount(0), filtered by specific properties:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmFreeRPCIndex: freeing rpc 673
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmDeleteGrock: (0x7f3f38637990) grock(EVMDMAIN2) deleted
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ COMMCRS][1105488192]clsc_connect: (0x7f23b40531f0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_LOGD))
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]clsdmc_send error code: -7
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]Mond is being shutdown by default
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Daemon Select Thread exiting
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Skipping Agent Initiated a check action
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]isRunning is reset to false here
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss1: NTP default config file found
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss8: Return [0] and NTP status [2].
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 3 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 1 event.
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] Returning NETDATA: 1 interfaces
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] # 0 Interface 'eth2',ip='10.10.10.12',mac='08-00-27-92-f1-98',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ora.crf][1089526080] {0:0:613} [stop] (:CLSN00108:) clsn_agent::stop }
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [    AGFW][1089526080] {0:0:613} Command: stop for resource: ora.crf 1 1 completed with status: SUCCESS

Now, you can analyze the output file directly, but I have developed another app (in Java). This app is available at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.
To launch it, you need to use java 1.7.
This app will show in a more beautiful interface the result of the output file. Messages that are coming from the same file are colored with the same color:
[oracle@rac1 aggregator]$ java -jar Aggr_visualisator.jar
 Aggr_visualisator
Update:
  • in v0.2, I implemented a filter function. This filter is active on the “Message” column.
  • I implemented a colorization function for the “Message” column.
    • if the term “error” is found, the cell will be colored in red
    • If the term “warn” is found, the cell will be colored in orange
    • and if the term “fail” is found, the cell wil be colored in yellow
v0.2 is available at this URL : https://app.box.com/s/xi3h7bpssrqtkp55q43c
two snapshots of this version below 😉
  • Messages colorization:
Aggr_visualisator_0.2_1
  • Filtering messages (based on a regexp)
Aggr_visualisator_0.2_2
If you have any suggestion to improve these tools, send me an email or a comment. I will study them.
I hope this will help you 😉
Advertisement

9 responses to “Tools to aggregate Oracle CRS logfiles in one file, filter it, sort it, and display it.

  1. suvajit maity September 18, 2013 at 4:04 AM

    this is truly inspirational…

  2. Yoann Mainguy December 20, 2016 at 10:43 AM

    Cet outil est super. Le seul souci, par expérience, est une mauvaise gestion de la mémoire par certains outils. j’utilise MobaXTerm (avec une base Cygwin) et je rencontre souvent l’erreur suivante quand il y a trop de données:
    Out of memory during request for XXXX bytes, total sbrk()

    Basé sur un article de PerlMonks, j’avais modifié la taille du tas Cygwin en exécutant:
    regtool -i set /HKLM/Software/Cygnus\ Solutions/Cygwin/heap_chunk_in_m 8192

    Mais ce n’est pas suffisant. j’ai du aussi ajouter le paquet « rebase » qui fournit l’exécutable « peflags » et lancer:
    peflags –cygwin-heap /bin/perl
    peflags –cygwin-heap=8192 /bin/perl

    Cela permet de contourner la limite initiale.

    Attention toutefois, dans certains cas, cela finit par une erreur de segmentation (je n’ai pu chercher pourquoi pour l’instant, mais je suppose qu’il s’agit à nouveau d’un problème de volumétrie).

  3. Laurent December 20, 2016 at 10:47 AM

    Merci Yoann pour l’info … 🙂

  4. Ben April 12, 2017 at 1:37 PM

    Hi,
    does it also work for RAC 12.1?

    Thanks
    Ben

Leave a Reply to Ben Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: