Data … as usual

All things about data by Laurent Leturgez

Category Archives: 11g

Oracle 11g, frequency histograms, and character columns …

Recently I worked on a performance issue on two Oracle 11g instances (11.2.0.3 and 11.2.0.4) and hit a bug concerning histograms and character columns. Oracle 11g is now about 7 years old, but it’s still used in lots of systems … that’s why I decided to write this blog post.

Well, in Oracle 11g, you have two kinds of histograms:

  • Frequency histograms: they are calculated when the number of distinct values (NDV) in a column is lower than the number of buckets in the histogram. Frequency histograms are the most accurate kind of histogram, because they record the exact number of occurrences of each value.
  • Height balanced histograms: they are calculated when the NDV is greater than the number of buckets in the histogram. Height balanced histograms are less accurate, but they still give the optimizer precious information about data distribution.

Histograms help the cost-based optimizer (CBO) make the right decision when it has to choose an access path or a join method. (You can find very interesting papers on this topic by Jonathan Lewis or Mohamed Houri.)

The problem I met was related to frequency histograms on character columns (CHAR, VARCHAR2 and NVARCHAR2).

In the dictionary, there are two views that give interesting information about how data is distributed:

  • DBA_TAB_COL_STATISTICS: this view gives information about column statistics (density, NDV, sample_size, etc.) and whether there’s a histogram on the column.
  • DBA_HISTOGRAMS: this view gives information about column histograms and how the data values are distributed.

Let’s take an example:

LAURENT @ orcl > create table foo (id number);

Table created.

LAURENT @ orcl > begin
 2 for i in 1..10
 3 loop
 4 insert into foo select i from dual connect by level <=i;
 5 end loop;
 6 end;
 7 /

PL/SQL procedure successfully completed.

LAURENT @ orcl > commit;

Commit complete.

LAURENT @ orcl > select id,count(*) from foo group by id order by 1;

	ID   COUNT(*)
---------- ----------
	 1	    1
	 2	    2
	 3	    3
	 4	    4
	 5	    5
	 6	    6
	 7	    7
	 8	    8
	 9	    9
	10	   10

10 rows selected.

Now, we gather statistics and ask for more buckets than the NDV, so we get a frequency histogram (that’s why I chose 11 buckets):

LAURENT @ orcl > exec dbms_stats.gather_table_stats(ownname=>USER, tabname=>'FOO', method_opt=>'FOR COLUMNS ID SIZE 11', estimate_percent=>100);

PL/SQL procedure successfully completed.

LAURENT @ orcl > select table_name,column_name,num_distinct,density,num_nulls,sample_size,histogram
 2 from user_tab_col_statistics
 3 where table_name='FOO';

TABLE_NAME COLUMN_NAM NUM_DISTINCT    DENSITY  NUM_NULLS SAMPLE_SIZE HISTOGRAM
---------- ---------- ------------ ---------- ---------- ----------- ---------------
FOO        ID                   10 .009090909          0          55 FREQUENCY

Note the density: 0.009090909 = 1/110 = 1/(2 × num_rows), the value typically computed when a frequency histogram is in place. Now let’s have a look at the DBA_HISTOGRAMS view (or USER or ALL depending on the context):


LAURENT @ orcl > select table_name,column_name,
  2  endpoint_number,endpoint_value,
  3  endpoint_actual_value
  4  from user_histograms
  5  where table_name='FOO';

TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_A
---------- ---------- --------------- -------------- ----------
FOO        ID                       1              1
FOO        ID                       3              2
FOO        ID                       6              3
FOO        ID                      10              4
FOO        ID                      15              5
FOO        ID                      21              6
FOO        ID                      28              7
FOO        ID                      36              8
FOO        ID                      45              9
FOO        ID                      55             10

ENDPOINT_VALUE represents the column value and, because it’s a frequency histogram, ENDPOINT_NUMBER is a cumulative row count. For example, if we’d like to know how many rows hold the value 7, we take the ENDPOINT_NUMBER for this value (28) and subtract the ENDPOINT_NUMBER of the preceding endpoint value, 6 (that is, 21) … so there are 28-21=7 rows with the value 7.
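
Since ENDPOINT_NUMBER is cumulative, you can compute every value’s frequency in a single query, for example with the LAG analytic function (a quick sketch):

select endpoint_value,
       endpoint_number - lag(endpoint_number,1,0) over (order by endpoint_number) frequency
from user_histograms
where table_name='FOO' and column_name='ID'
order by endpoint_value;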

Now that you’ve understood the basics, let’s take another example with a character column. The test table T used below has a NUMBER column ID and a VARCHAR2 column V (its creation is not shown here; the values inserted are 32 characters long):


LAURENT @ orcl > begin
  2  for i in 1..10000
  3  loop
  4  case mod(i,10)
  5    when 0 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'01'));
  6    when 1 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'02'));
  7    when 2 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'03'));
  8    when 3 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'04'));
  9    when 4 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'05'));
 10    when 5 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'06'));
 11    when 6 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'07'));
 12    when 7 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'08'));
 13    when 8 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'09'));
 14    when 9 then insert into t values(mod(i,10), concat(lpad('X',30,'X'),'10'));
 15  end case;
 16  end loop;
 17  commit;
 18  end;
 19  /

PL/SQL procedure successfully completed.

LAURENT @ orcl > insert into t select 10,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' from dual connect by level <=100000; 

100000 rows created. 

LAURENT @ orcl > commit;

Commit complete.

LAURENT @ orcl > exec dbms_stats.gather_table_stats(ownname=>USER, tabname=>'T', method_opt=>'for all columns size 12', estimate_percent=>100);

PL/SQL procedure successfully completed.

LAURENT @ orcl > select table_name,column_name,density,histogram,sample_size from user_tab_col_statistics where table_name='T';

TABLE_NAME COLUMN_NAM    DENSITY HISTOGRAM       SAMPLE_SIZE
---------- ---------- ---------- --------------- -----------
T          ID         4.5455E-06 FREQUENCY            110000
T          V          4.5455E-06 FREQUENCY            110000

And if we have a look at the data distribution of column V, there’s no problem …


LAURENT @ orcl > select table_name,column_name,
  2  endpoint_number,endpoint_value,
  3  endpoint_actual_value
  4  from user_histograms
  5  where table_name='T' and column_name='V'
  6  order by ENDPOINT_ACTUAL_VALUE;

TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
---------- ---------- --------------- -------------- --------------------------------
T          V                     1000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX01
T          V                     2000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX02
T          V                     3000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX03
T          V                     4000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX04
T          V                     5000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX05
T          V                     6000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX06
T          V                     7000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX07
T          V                     8000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX08
T          V                     9000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX09
T          V                    10000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX10
T          V                   110000     4.5871E+35 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

11 rows selected.

The problem comes when the size of the character column exceeds 32 bytes:


LAURENT @ orcl > drop table t purge;

Table dropped.

LAURENT @ orcl > create table t (id number, v varchar2(64));

Table created.

LAURENT @ orcl > begin
for i in 1..10000
loop
case mod(i,10)
  when 0 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'01'));
  when 1 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'02'));
  when 2 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'03'));
  when 3 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'04'));
  when 4 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'05'));
  when 5 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'06'));
  when 6 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'07'));
  when 7 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'08'));
  when 8 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'09'));
  when 9 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'10'));
end case;
end loop;
commit;
end;
/

PL/SQL procedure successfully completed.

LAURENT @ orcl > insert into t select 10,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' from dual connect by level <=100000;

100000 rows created.

LAURENT @ orcl > commit;

Commit complete.

LAURENT @ orcl > exec dbms_stats.gather_table_stats(ownname=>USER, tabname=>'T', method_opt=>'for all columns size 12', estimate_percent=>100);

PL/SQL procedure successfully completed.

LAURENT @ orcl > select table_name,column_name,density,histogram,sample_size from user_tab_col_statistics where table_name='T';

TABLE_NAME COLUMN_NAM    DENSITY HISTOGRAM       SAMPLE_SIZE
---------- ---------- ---------- --------------- -----------
T          ID         4.5455E-06 FREQUENCY            110000
T          V          4.5455E-06 FREQUENCY            110000

But … if we have a look at the DBA_HISTOGRAMS view …


LAURENT @ orcl > select table_name,column_name,
  2  endpoint_number,endpoint_value,
  3  endpoint_actual_value
  4  from user_histograms
  5  where table_name='T'
  6  order by COLUMN_NAME,ENDPOINT_ACTUAL_VALUE;

TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
---------- ---------- --------------- -------------- --------------------------------
T          ID                    1000              0
T          ID                    2000              1
T          ID                    3000              2
T          ID                    4000              3
T          ID                    5000              4
T          ID                    6000              5
T          ID                    7000              6
T          ID                    8000              7
T          ID                    9000              8
T          ID                   10000              9
T          ID                  110000             10
T          V                   110000     4.5871E+35

LAURENT @ orcl > select count(*) from t where v=lpad('X',32,'X')||'05';

  COUNT(*)
----------
      1000

LAURENT @ orcl > select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  aass8rg64h38b, child number 0
-------------------------------------
select count(*) from t where v=lpad('X',32,'X')||'05'

Plan hash value: 2966233522

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      2 |        |      2 |00:00:00.01 |    1416 |
|   1 |  SORT AGGREGATE    |      |      2 |      1 |      2 |00:00:00.01 |    1416 |
|*  2 |   TABLE ACCESS FULL| T    |      2 |   110K |   2000 |00:00:00.01 |    1416 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("V"='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX05')

19 rows selected.

LAURENT @ orcl > select count(*) from t where v='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';

  COUNT(*)
----------
    100000

LAURENT @ orcl > select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  adnfzsc6buqa3, child number 0
-------------------------------------
select count(*) from t where v='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

Plan hash value: 2966233522

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     708 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     708 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   110K |   100K |00:00:00.01 |     708 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("V"='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')

19 rows selected.

So, with this kind of difference between E-Rows and A-Rows, you can easily face performance problems, especially if the column is indexed. We have a frequency histogram, and yet it doesn’t help the optimizer determine the cardinality of the character column …

This behaviour has been reported as Bug 18377553: POOR CARDINALITY ESTIMATE WITH HISTOGRAMS AND VALUES > 32 BYTES.

In my case, the problem occurred with NVARCHAR2 columns (the national character set was AL16UTF16), so performance problems appeared as soon as a column value contained more than 16 characters.

Fortunately, Oracle has published a one-off patch for this bug (available for 11.2.0.3 and 11.2.0.4 on Linux x86-64). As a workaround, you can also remove the histogram on this kind of column (with method_opt: FOR COLUMNS … SIZE 1).
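
For example, to remove the histogram on column V of the test table T used above, a regather along these lines should do it (hypothetical; adapt the owner, table and column names):

exec dbms_stats.gather_table_stats(ownname=>USER, tabname=>'T', method_opt=>'FOR COLUMNS V SIZE 1', estimate_percent=>100);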

In 12.1, the problem occurs only when the size of the character column value exceeds 64 bytes (32 characters under the AL16UTF16 character set):


SQL> select * from v$version;

BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production              0
PL/SQL Release 12.1.0.2.0 - Production                                                    0
CORE	12.1.0.2.0	Production                                                        0
TNS for Linux: Version 12.1.0.2.0 - Production                                            0
NLSRTL Version 12.1.0.2.0 - Production                                                    0

SQL> create table t (id number, v nvarchar2(64));

Table created.

SQL> begin
for i in 1..10000
loop
case mod(i,10)
  when 0 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'01'));
  when 1 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'02'));
  when 2 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'03'));
  when 3 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'04'));
  when 4 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'05'));
  when 5 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'06'));
  when 6 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'07'));
  when 7 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'08'));
  when 8 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'09'));
  when 9 then insert into t values(mod(i,10), concat(lpad('X',32,'X'),'10'));
end case;
end loop;
commit;
end;
/

PL/SQL procedure successfully completed.

SQL> insert into t select 10,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' from dual connect by level <=100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>USER, tabname=>'T', method_opt=>'for all columns size 12', estimate_percent=>100);

PL/SQL procedure successfully completed.

SQL> select table_name,column_name,density,histogram,sample_size from user_tab_col_statistics where table_name='T';

TABLE_NAME COLUMN_NAM    DENSITY HISTOGRAM       SAMPLE_SIZE
---------- ---------- ---------- --------------- -----------
T          ID         4.5455E-06 FREQUENCY            110000
T          V          4.5455E-06 FREQUENCY            110000

SQL> select table_name,column_name,
  2  endpoint_number,endpoint_value,
  3  endpoint_actual_value
  4  from user_histograms
  5  where table_name='T'
  6  order by COLUMN_NAME,ENDPOINT_NUMBER;

TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
---------- ---------- --------------- -------------- ----------------------------------
T          ID                    1000              0 0
T          ID                    2000              1 1
T          ID                    3000              2 2
T          ID                    4000              3 3
T          ID                    5000              4 4
T          ID                    6000              5 5
T          ID                    7000              6 6
T          ID                    8000              7 7
T          ID                    9000              8 8
T          ID                   10000              9 9
T          ID                  110000             10 10
T          V                   110000     1.7849E+33  X X X X X X X X X X X X X X X X X
                                                      X X X X X X X X X X X X X X X

12 rows selected.

The problem with 12.1 databases is that the one-off patch has not been released yet; the bug will be fixed in 12.2.

How Oracle 11.2 clusterware starts and finds the voting files and the ASM SPfile

In a recent discussion on the LinkedIn RACSIG page, Jim Williams explained how the ASM SPfile and the voting files are discovered during the OHASD startup phase.

With one of my colleagues, Pierre Labrousse, I have built a little video that explains this.

You can see the video here:

Tools to aggregate Oracle CRS logfiles in one file, filter it, sort it, and display it.

If you have already faced RAC problems, you have had to analyze log files, starting from the usual entry points: the alert.log files.

But as Oracle clusterware logs its events in many logfiles (usually one per process), and each server has its own logfile directory, it can be difficult to detect the causes, and especially the root cause, of a specific problem.

Recently I created a little perl script to help with this. The script is named “crs_log_aggregator.pl” and is available for download at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.

The script works like this: as input, it takes one or more directories to analyze recursively. Those directories contain all the CRS log files of all the cluster nodes you want to analyze.

For each directory, it parses every log file (except the alert.log) and every history log file (file extension: l??). Parsing extracts the timestamp mentioned in the line being read; if a line has no timestamp (because a message is written across several lines, for example), the script reuses the preceding timestamp, and if there is no preceding timestamp at all (in a file header, for example), it flags the timestamp as ‘N/A’. Every analyzed line is completed with its origin, represented by the relative file path.

As a result, an analyzed line is represented in the output file by three parts:

  • the timestamp of the message
  • the origin file of the message
  • the message

As a separator, I used a “<SEP>” tag.

At this step, each line is recorded in a single unsorted file. If a filter expression has been given to the program as input, which is highly recommended, the filter is applied during the parsing process.

Next, the program sorts the records by timestamp and produces a single output file, filtered or not, which can then be analyzed.
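
For illustration, here is a stripped-down sketch of the parsing logic (not the real crs_log_aggregator.pl, which also handles recursion over several directories and the final sort step):

#!/usr/bin/perl
# Minimal sketch: tag each line with a timestamp (reusing the previous
# one for continuation lines) and its originating file, then filter.
use strict; use warnings;

my ($file, $filter) = @ARGV;
open(my $fh, '<', $file) or die "cannot open $file: $!";
my $last_ts = 'N/A';
while (my $line = <$fh>) {
    chomp $line;
    my $ts = ($line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)/) ? $1 : $last_ts;
    $last_ts = $ts;
    next if defined($filter) && $line !~ /$filter/;
    print "$ts<SEP>$file<SEP>$line\n";
}
close($fh);

Because the timestamp leads each output line, a plain lexical sort of the concatenated output of all files gives the chronological merge.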

Well, let’s see how it works:

For example, let’s consider a two nodes RAC (hostnames are rac1 and rac2).

As alert.log files are not analyzed, you need to look at them first to see when the problem occurred.

In my case, I want to analyze 2 days: 2013-05-30 and 2013-05-31.

So the first step is to copy all the logfiles to a filesystem, or to your desktop.

I recommend putting all the directories related to a single hostname into one directory. Here, I have two directories coming from both cluster nodes ($GRID_HOME/log):

[oracle@rac1 aggregator]$ tree -d .
.
|-- rac1
|   |-- acfslog
|   |-- acfsrepl
|   |-- acfsreplroot
|   |-- acfssec
|   |-- admin
|   |-- agent
|   |   |-- crsd
|   |   |   |-- oraagent_oracle
|   |   |   |-- orarootagent_root
|   |   |   `-- scriptagent_oracle
|   |   `-- ohasd
|   |       |-- oraagent_oracle
|   |       |-- oracssdagent_root
|   |       |-- oracssdmonitor_root
|   |       `-- orarootagent_root
|   |-- client
|   |-- crflogd
|   |-- crfmond
|   |-- crsd
|   |-- cssd
|   |-- ctssd
|   |-- cvu
|   |   |-- cvulog
|   |   `-- cvutrc
|   |-- diskmon
|   |-- evmd
|   |-- gipcd
|   |-- gnsd
|   |-- gpnpd
|   |-- mdnsd
|   |-- ohasd
|   |-- racg
|   |   |-- racgeut
|   |   |-- racgevtf
|   |   `-- racgmain
|   `-- srvm
`-- rac2
    |-- acfslog
    |-- acfsrepl
    |-- acfsreplroot
    |-- acfssec
    |-- admin
    |-- agent
    |   |-- crsd
    |   |   |-- oraagent_oracle
    |   |   |-- orarootagent_root
    |   |   `-- scriptagent_oracle
    |   `-- ohasd
    |       |-- oraagent_oracle
    |       |-- oracssdagent_root
    |       |-- oracssdmonitor_root
    |       `-- orarootagent_root
    |-- client
    |-- crflogd
    |-- crfmond
    |-- crsd
    |-- cssd
    |-- ctssd
    |-- cvu
    |   |-- cvulog
    |   `-- cvutrc
    |-- diskmon
    |-- evmd
    |-- gipcd
    |-- gnsd
    |-- gpnpd
    |-- mdnsd
    |-- ohasd
    |-- racg
    |   |-- racgeut
    |   |-- racgevtf
    |   `-- racgmain
    `-- srvm

74 directories
[oracle@rac1 aggregator]$ ls -l
total 20
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:39 rac1
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:00 rac2

If you want some help with this tool, you can use the -h option:

[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -h

RAC Logs consolidator
---------------------------------------------

Usage

./crs_log_aggregator.pl -c dir1[,dir2,dir3,...,dirN] -o outputfile [-f filter]

[MANDATORY]: -c dir1,dir2,...,dirN
                 comma separated values of source directories to analyze
[MANDATORY]: -o outputfile
                 unique sorted and filtered output file
[OPTIONAL] : -f filter
                 regex filter to apply
[OPTIONAL] : -h
                 print this help

Examples:
$ ./crs_log_aggregator.pl -c data/ -o outfile
         will sort without any filter all log files (*.log, *.l??) recursively found in the data directory. outfile is produced in the current directory as a result.
$ ./crs_log_aggregator.pl -c data/rac1/cssd,data/rac1/crsd -o outfile
         will sort without any filter only the content of data/rac1/cssd,data/rac1/crsd  directories.
$ ./crs_log_aggregator.pl -c data/ -o outfile_filtered -f '2012-10-26|2012-10-27'
         will sort with a filter (regex) that will filter 2 days 2012-10-26 and 2012-10-27.

        more information on perl regex: http://perldoc.perl.org/perlre.html

Next, I launch the program with the right options:

[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -c ./rac1,./rac2 -o output_file -f '2013-05-30|2013-05-31'

---------Alert.log : ./rac1/alertrac1.log excluded from analyze

---------Alert.log : ./rac2/alertrac2.log excluded from analyze

[oracle@rac1 aggregator]$ ls -lh output_file
-rw-r--r-- 1 oracle oinstall 243M Sep  8 15:16 output_file

Please note that if an alert.log file is found, it is excluded from the analysis and mentioned in the standard output.

I highly recommend using a filter to reduce the amount of data to sort (sorting can burn a lot of CPU and memory). The -f option is used for this and takes a regexp-style filter (it is used directly in the perl script, so if you are not familiar with perl regexps, you can have a look at this URL: http://perldoc.perl.org/perlre.html).

As a result, you get this kind of message in the output file:

2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmRPCBroadcast: rpc(0x2a1002a), status(0), sendcount(0), filtered by specific properties:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmFreeRPCIndex: freeing rpc 673
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmDeleteGrock: (0x7f3f38637990) grock(EVMDMAIN2) deleted
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ COMMCRS][1105488192]clsc_connect: (0x7f23b40531f0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_LOGD))
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]clsdmc_send error code: -7
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]Mond is being shutdown by default
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Daemon Select Thread exiting
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Skipping Agent Initiated a check action
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]isRunning is reset to false here
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss1: NTP default config file found
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss8: Return [0] and NTP status [2].
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 3 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 1 event.
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] Returning NETDATA: 1 interfaces
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] # 0 Interface 'eth2',ip='10.10.10.12',mac='08-00-27-92-f1-98',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ora.crf][1089526080] {0:0:613} [stop] (:CLSN00108:) clsn_agent::stop }
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [    AGFW][1089526080] {0:0:613} Command: stop for resource: ora.crf 1 1 completed with status: SUCCESS

Now, you can analyze the output file directly, but I have also developed another app (in Java) to display it. This app is available at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.

To launch it, you need Java 1.7.

This app displays the content of the output file in a nicer interface. Messages coming from the same file are colored with the same color:
[oracle@rac1 aggregator]$ java -jar Aggr_visualisator.jar
 Aggr_visualisator
Update:
  • in v0.2, I implemented a filter function. This filter is active on the “Message” column.
  • I implemented a colorization function for the “Message” column.
    • if the term “error” is found, the cell will be colored in red
    • If the term “warn” is found, the cell will be colored in orange
    • and if the term “fail” is found, the cell will be colored in yellow
v0.2 is available at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c
Two snapshots of this version are shown below 😉
  • Messages colorization:
Aggr_visualisator_0.2_1
  • Filtering messages (based on a regexp)
Aggr_visualisator_0.2_2
If you have any suggestions to improve these tools, send me an email or leave a comment; I will study them.
I hope this will help you 😉

Trace Oracle Internal triggers

When you create a replication between two tables, a snapshot log for your materialized views, etc., Oracle creates and manages internal triggers that execute SQL statements. For example, when you insert a row into a table with a snapshot log, SQL statements are executed against the related snapshot log table (the MLOG$ table).

But if you trace the session with the 10046 event, none of these statements appear in the trace file.
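
(The 10046_on.sql and 10046_off.sql scripts used below are not reproduced here; I assume they simply wrap something like the following:)

-- assumed content of 10046_on.sql
alter session set events '10046 trace name context forever, level 12';

-- assumed content of 10046_off.sql
alter session set events '10046 trace name context off';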

SQL> create materialized view log on OBJ$CLONE;

Materialized view log created.

SQL> select * from mlog$_OBJ$CLONE;

no rows selected

SQL> @10046_on.sql

Session altered.

SQL> delete from OBJ$CLONE where obj#=3000;

1 row deleted.

SQL> insert into OBJ$CLONE select * from sys.obj$ where obj#=3000;

1 row created.

SQL> commit;

Commit complete.

SQL> @10046_off.sql

The tracefile doesn’t mention any DML operation on the table MLOG$_OBJ$CLONE (see the tkprof output below):

SQL ID: 46m19q6258mvs Plan Hash: 4136978372

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0)
FROM
 (SELECT /*+ NO_PARALLEL("MLOG$_OBJ$CLONE") FULL("MLOG$_OBJ$CLONE")
  NO_PARALLEL_INDEX("MLOG$_OBJ$CLONE") */ 1 AS C1, 1 AS C2 FROM
  "LAURENT"."MLOG$_OBJ$CLONE" "MLOG$_OBJ$CLONE") SAMPLESUB

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.05          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         14          0           2
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      0.00       0.05          0         14          0           2

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS   (recursive depth: 2)
Number of plan statistics captured: 2

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=7 pr=0 pw=0 time=204 us)
         4          4          5   TABLE ACCESS FULL MLOG$_OBJ$CLONE (cr=7 pr=0 pw=0 time=105 us cost=2 size=0 card=82)

********************************************************************************

SQL ID: 2ya3sb8sagzxw Plan Hash: 3957508444

delete from OBJ$CLONE
where
 obj#=3000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.07          0          8          0           0
Execute      1      0.00       0.00          0          3          8           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.01       0.07          0         11          8           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 90
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  DELETE  OBJ$CLONE (cr=3 pr=0 pw=0 time=372 us)
         1          1          1   INDEX UNIQUE SCAN PK_OBJ$CLONE (cr=2 pr=0 pw=0 time=34 us cost=1 size=13 card=1)(object id 127619)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        3.68          3.68
********************************************************************************

SQL ID: 4n7u7f6u7kd2y Plan Hash: 1218588913

insert into OBJ$CLONE select * from sys.obj$ where obj#=3000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          8          4           0
Execute      1      0.00       0.00          0          5          8           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0         13         12           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 90
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  (cr=5 pr=0 pw=0 time=518 us)
         1          1          1   TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=0 pw=0 time=75 us cost=3 size=86 card=1)
         1          1          1    INDEX RANGE SCAN I_OBJ1 (cr=2 pr=0 pw=0 time=46 us cost=2 size=0 card=1)(object id 36)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        2.44          2.44
********************************************************************************

SQL ID: 23wm3kz7rps5y Plan Hash: 0

commit

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          1           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0          0          1           0

Misses in library cache during parse: 0
Parsing user id: 90

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        4.10          4.10
********************************************************************************

But there is some data in this table:

SQL> select * from mlog$_OBJ$CLONE;

      OBJ# SNAPTIME$ D O CHANGE_VECTOR$$                     XID$$
---------- --------- - - ------------------------------ ----------
      3000 01-JAN-00 D O 000000                         5.6296E+15
      3000 01-JAN-00 I N FEFFFF                         5.6296E+15

And an internal trigger has been created to do this:

SQL> select * from user_internal_triggers;

TABLE_NAME                     INTERNAL_TRIGGER_TY
------------------------------ -------------------
OBJ$CLONE                      MVIEW LOG

By looking at the USER_INTERNAL_TRIGGERS view code, we can see that this view reports internal triggers of type 'DEFERRED RPC QUEUE', 'MVIEW LOG', 'UPDATABLE MVIEW LOG' and 'CONTEXT', based on the TRIGFLAG column of the TAB$ table:

select dbms_metadata.get_ddl('VIEW','USER_INTERNAL_TRIGGERS','SYS') ddl from dual

DDL
------------------------------------------------------------------------------------------------------------------------

  CREATE OR REPLACE FORCE VIEW "SYS"."USER_INTERNAL_TRIGGERS" ("TABLE_NAME", "INTERNAL_TRIGGER_TYPE") AS
  select o.name, 'DEFERRED RPC QUEUE'
from sys.tab$ t, sys.obj$ o
where o.owner# = userenv('SCHEMAID')
      and t.obj# = o.obj#
      and bitand(t.trigflag,1) = 1
union
select o.name, 'MVIEW LOG'
from sys.tab$ t, sys.obj$ o
where o.owner# = userenv('SCHEMAID')
      and t.obj# = o.obj#
      and bitand(t.trigflag,2) = 2
union
select o.name, 'UPDATABLE MVIEW LOG'
from sys.tab$ t, sys.obj$ o
where o.owner# = userenv('SCHEMAID')
       and t.obj# = o.obj#
       and bitand(t.trigflag,4) = 4
union
select o.name, 'CONTEXT'
from sys.tab$ t, sys.obj$ o
where o.owner# = userenv('SCHEMAID')
      and t.obj# = o.obj#
      and bitand(t.trigflag,8) = 8
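
For example, you can decode the raw flag for a given table yourself (run as SYS; purely illustrative):

select o.name,
       t.trigflag,
       decode(bitand(t.trigflag,2), 2, 'YES', 'NO') mview_log
from sys.tab$ t, sys.obj$ o
where t.obj# = o.obj#
and o.name = 'OBJ$CLONE';
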
If you want to trace what is really done by the internal triggers, there is a bunch of undocumented events that trace specific triggers (note there are events for RepCat or synchronous replication for which there is no trigflag definition in the USER_INTERNAL_TRIGGERS view code):

10302, 00000, "trace create or drop internal trigger"
// *Document: NO
// *Cause:
// *Action:

10303, 00000, "trace loading of library cache for internal triggers"
// *Document: NO
// *Cause:
// *Action:

10304, 00000, "trace replication trigger"
// *Document: NO
// *Cause:
// *Action:

10305, 00000, "trace updatable materialized view trigger"
// *Document: NO
// *Cause:
// *Action:

10306, 00000, "trace materialized view log trigger"
// *Document: NO
// *Cause:
// *Action:

10307, 00000, "trace RepCat execution"
// *Document: NO
// *Cause:
// *Action:

10308, 00000, "replication testing event"
// *Document: NO
// *Cause:
// *Action:

10309, 00000, "Trigger Debug event"
// *Document: NO
// *Cause:
// *Action:
// *Comment: This event replaces the earlier event number 10250
//           which had multiple definitions

10310, 00000, "trace synchronous change table trigger"
// *Document: NO
// *Cause:
// *Action:

I have chosen the 10306 event to trace internal triggers on materialized view logs (but you can use the others depending on what you want to trace):

SQL> create materialized view log on OBJ$CLONE;

Materialized view log created.

SQL> select * from user_internal_triggers;

TABLE_NAME                     INTERNAL_TRIGGER_TY
------------------------------ -------------------
OBJ$CLONE                      MVIEW LOG

SQL> alter session set events '10306 trace name context forever, level 10';

Session altered.

SQL> delete from OBJ$CLONE where obj#=3000;

1 row deleted.

SQL> insert into OBJ$CLONE select * from sys.obj$ where obj#=3000;

1 row created.

SQL> commit;

Commit complete.

The related trace file now shows the SQL statements executed against the MLOG$_OBJ$CLONE table:

kntgslm()+
  ccl_kntsl = 1
  kntzcid =
        1
  lcc_kntsl = 0
  kntzlcs =
  kntzlcm =
  cpk_kntsl = 1
  pkc_kntsl =
        1
  flg_kntsl = 270434
  len_kntsl = 220
  sql_kntsl = INSERT /*+ IDX(%d) */ INTO "LAURENT"."MLOG$_OBJ$CLONE" (dmltype$$,old_new$$,snaptime$$,change_vector$$,xid$$,"OBJ#") VALUES (:d,:o,to_date('4000-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:x%s,:1)
  len2_kntsl = 0
  sql2_kntsl =
kntgslm()-
kntkca()
  ccl_kntsl = 1
  kntzcid =
        1
  lcc_kntsl = 0
  kntzlcs =
  kntzlcm =
  cpk_kntsl = 1
  pkc_kntsl =
        1
  flg_kntsl = 270434
  len_kntsl = 220
  sql_kntsl = INSERT /*+ IDX(%d) */ INTO "LAURENT"."MLOG$_OBJ$CLONE" (dmltype$$,old_new$$,snaptime$$,change_vector$$,xid$$,"OBJ#") VALUES (:d,:o,to_date('4000-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:x%s,:1)
  len2_kntsl = 0
  sql2_kntsl =
kntxslt()
  ccl_kntsl = 1
  kntzcid =
        1
  lcc_kntsl = 0
  kntzlcs =
  kntzlcm =
  cpk_kntsl = 1
  pkc_kntsl =
        1
  flg_kntsl = 270434
  len_kntsl = 220
  sql_kntsl = INSERT /*+ IDX(%d) */ INTO "LAURENT"."MLOG$_OBJ$CLONE" (dmltype$$,old_new$$,snaptime$$,change_vector$$,xid$$,"OBJ#") VALUES (:d,:o,to_date('4000-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:x%s,:1)
  len2_kntsl = 0
  sql2_kntsl =

*** 2013-01-16 11:53:21.065
kntkca()
  ccl_kntsl = 1
  kntzcid =
        1
  lcc_kntsl = 0
  kntzlcs =
  kntzlcm =
  cpk_kntsl = 1
  pkc_kntsl =
        1
  flg_kntsl = 270434
  len_kntsl = 220
  sql_kntsl = INSERT /*+ IDX(%d) */ INTO "LAURENT"."MLOG$_OBJ$CLONE" (dmltype$$,old_new$$,snaptime$$,change_vector$$,xid$$,"OBJ#") VALUES (:d,:o,to_date('4000-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:x%s,:1)
  len2_kntsl = 0
  sql2_kntsl =
kntxslt()
  ccl_kntsl = 1
  kntzcid =
        1
  lcc_kntsl = 0
  kntzlcs =
  kntzlcm =
  cpk_kntsl = 1
  pkc_kntsl =
        1
  flg_kntsl = 270434
  len_kntsl = 220
  sql_kntsl = INSERT /*+ IDX(%d) */ INTO "LAURENT"."MLOG$_OBJ$CLONE" (dmltype$$,old_new$$,snaptime$$,change_vector$$,xid$$,"OBJ#") VALUES (:d,:o,to_date('4000-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:x%s,:1)
  len2_kntsl = 0
  sql2_kntsl =

You can notice there are 5 occurrences of the statement in sql_kntsl for 2 modified rows. Each statement seems to be controlled by two internal functions: kntkca() and kntxslt(). One more function, kntgslm(), is used to start the sequence.

If I trace only one DML statement on my OBJ$CLONE table, the result is one call to kntgslm() followed by one call each to kntkca() and kntxslt().

So the sequence seems to start with a call to kntgslm(), followed by one call to kntkca() and kntxslt() per modified row in the source table.

Finally, there’s one last problem: the SQL text mentioned in sql_kntsl seems incomplete because of the %d and %s placeholders. But thanks to these events, we can at least see the SQL statements executed by the internal trigger, statements we couldn’t find in the SQL trace of the original operation.

How ASM disk header block repair works

In this post, I will explain some of my recent work on the ASM disk header block.

First, I will create a TEST tablespace in my orcl database. The TEST datafile will be managed by ASM.

[oracle@oel ~]$ . oraenv
ORACLE_SID = [+ASM] ? orcl
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@oel ~]$ sqlplus / as sysdba
SQL> create tablespace test datafile '+data' size 5M autoextend on maxsize unlimited;
Tablespace created.

SQL>  select instance_name from v$instance;
INSTANCE_NAME
----------------
orcl

SQL> select file_id,tablespace_name,file_name,status,online_status from dba_data_files
  2  /

   FILE_ID TABLESPACE_NAME                FILE_NAME                                          STATUS    ONLINE_
---------- ------------------------------ -------------------------------------------------- --------- -------
         4 USERS                          /u02/oradata/orcl/users01.dbf                      AVAILABLE ONLINE
         3 UNDOTBS1                       /u02/oradata/orcl/undotbs01.dbf                    AVAILABLE ONLINE
         2 SYSAUX                         /u02/oradata/orcl/sysaux01.dbf                     AVAILABLE ONLINE
         1 SYSTEM                         /u02/oradata/orcl/system01.dbf                     AVAILABLE SYSTEM
         5 EXAMPLE                        /u02/oradata/orcl/example01.dbf                    AVAILABLE ONLINE
         6 TEST                           +DATA/orcl/datafile/test.258.798578613             AVAILABLE ONLINE

6 rows selected.

My ASM instance manages two diskgroups: DATA (external redundancy) and RL (normal redundancy).

DATA is a two-disk diskgroup managed by ASMLib.

SQL> select group_number,name,type,state from v$asm_diskgroup
  2  /

GROUP_NUMBER NAME TYPE   STATE
------------ ---- ------ -----------
           1 DATA EXTERN MOUNTED
           2 RL   NORMAL MOUNTED
SQL> select path from v$asm_disk where group_number=1;

PATH
---------------------------------
ORCL:ASM1
ORCL:ASM2

And now, we erase the header block of each disk. The header block is the first block of an ASM disk device, and its default size is 4096 bytes (this size is exposed by the undocumented parameter _asm_blksize).
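
If you want to check this value on your own ASM instance, the undocumented x$ksppi/x$ksppcv fixed tables can be queried as SYSASM (purely illustrative, as these structures are unsupported):

select i.ksppinm name, v.ksppstvl value
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
and i.ksppinm = '_asm_blksize';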

[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM1 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 2.2e-05 seconds, 186 MB/s
[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM2 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000336 seconds, 12.2 MB/s

Even with my block headers destroyed, I can still write to my diskgroup and allocate extents in it.

SQL> alter database datafile 6 resize 10M;

Database altered.

SQL> create table t tablespace TEST as select * from dba_source;

Table created.

SQL> alter system checkpoint;

System altered.

SQL> select file#,checkpoint_time from v$datafile_header where file#=6;

     FILE# CHECKPOINT_TIME
---------- -------------------
         6 05/11/2012 19:27:43

Well, let’s try to restart the rdbms instance:

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1068937216 bytes
Fixed Size                  2235208 bytes
Variable Size             377488568 bytes
Database Buffers          683671552 bytes
Redo Buffers                5541888 bytes
Database mounted.
Database opened.
SQL> select count(*) from t;

  COUNT(*)
----------
    702070

Damned! I can still restart it and read all the file extents.

There’s something strange in this demo: my ASM disk headers are invalid, and I can still read files. Let’s verify the header state with kfed (only one ASM disk is shown below):

[oracle@oel ~]$ kfed read /dev/oracleasm/disks/ASM1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
2B469E002400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[oracle@oel ~]$ sudo /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/kfed read /dev/sdb1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
2B413559C400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: file not found; arguments: [kfbtTraverseBlock] [Invalid OSM block type] [] [0]

So kfed has confirmed that my header blocks are invalid. Now, I will dismount and remount the diskgroup to see whether the corruption has any effect on the mount operation.

[oracle@oel ~]$ sqlplus / as sysasm

SQL> alter diskgroup DATA dismount;

Diskgroup altered.

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

So the ASM disk header is needed to mount the diskgroup, but it has no effect on extent allocation and the like while the diskgroup stays mounted.

Now I will repair the ASM disks with the kfed repair operation to restore my header blocks:

[oracle@oel ~]$ kfed repair /dev/oracleasm/disks/ASM1
[oracle@oel ~]$ kfed repair /dev/oracleasm/disks/ASM2

[oracle@oel ~]$ sqlplus / as sysasm

SQL> alter diskgroup DATA mount;

Diskgroup altered.

So KFED is able to restore the header (with the repair operation).

[oracle@oel ~]$ . oraenv
ORACLE_SID = [+ASM] ? orcl
The Oracle base remains unchanged with value /u01/app/oracle

[oracle@oel ~]$ sqlplus / as sysdba

Connected to an idle instance.

SQL> startup
...
Database opened.
SQL> select count(*) from T;

  COUNT(*)
----------
    702070

And the data in the T table is still there … no problem!

Well, let’s summarize … I erased the header block of both ASM disks in my diskgroup without any impact on file extent allocation; the only impact was on the mount capability. Then I restored the header block of each disk with a kfed repair operation … but without any backup of the disk or of the disk header. So I wonder … where did kfed find a backup of my header block?

First, I will try a very simple method: I will look at all the metadata blocks in my ASM disk device, hoping to find another header block. For this purpose, I will use a basic shell script that analyzes each block. If the script finds a block of type KFBTYP_DISKHEAD, it keeps the block position and prints all the positions at the end of the run:

The basic script:

#!/bin/bash
# Scan an ASM disk device block by block with kfed, stopping at the
# first invalid block, and record every block of type KFBTYP_DISKHEAD.
export ORAENV_ASK=NO
export ORACLE_SID=+ASM
. oraenv

i=0
blk_typ=''

while [ "$blk_typ" != "KFBTYP_INVALID" ];
do
  # read block #i and extract its kfbh.type field
  blk_typ=`kfed read $1 blkn=$i | grep kfbh.type | awk '{print $5;}' | sed 's/^ *//g' | sed 's/ *$//g' `
  if [ "$blk_typ" = "KFBTYP_DISKHEAD" ]; then
    t[$i]=$i
  fi
  let i=$i+1
done
echo "list of header block with KFBTYP_DISKHEAD type"
echo ${t[@]}

Execution results:

[oracle@oel ~]$ ./asm_surgery.sh /dev/oracleasm/disks/ASM1
The Oracle base remains unchanged with value /u01/app/oracle
list of header block with KFBTYP_DISKHEAD type
0 510
[oracle@oel ~]$ ./asm_surgery.sh /dev/oracleasm/disks/ASM2
The Oracle base remains unchanged with value /u01/app/oracle
list of header block with KFBTYP_DISKHEAD type
0 510

OK, so there is a copy of the header block at block #510 of my disk. Indeed, as Bane Radulović mentioned in his blog, a backup copy of the ASM disk header is stored in the second-to-last block of allocation unit 1. With a 1MB AU and a 4096-byte block size, AU0 and AU1 together span (1048576 / 4096) * 2 = 512 blocks; since block numbers start at 0, the last block of AU1 is #511 and the backup sits in the second-to-last one, block #510.
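
You can read this backup copy directly with kfed, just as the script above does; block #510 should report KFBTYP_DISKHEAD:

[oracle@oel ~]$ kfed read /dev/oracleasm/disks/ASM1 blkn=510 | grep kfbh.type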

To double check this, I will erase the header block of my first ASM disk, and use kfed repair and strace to see what really happens:

[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM1 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 2.3e-05 seconds, 178 MB/s

[oracle@oel ~]$ strace kfed repair /dev/oracleasm/disks/ASM1
.../...

stat("/dev/oracleasm/disks/ASM1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0
access("/dev/oracleasm/disks/ASM1", F_OK) = 0
statfs("/dev/oracleasm/disks/ASM1", {f_type=0x958459f6, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
open("/dev/oracleasm/disks/ASM1", O_RDWR) = 7
lseek(7, 2088960, SEEK_SET)             = 2088960
read(7, "\1\202\1\1\376\200<\206\371\7"..., 4096) = 4096
lseek(7, 0, SEEK_SET)                   = 0
read(7, ""..., 4096) = 4096
lseek(7, 0, SEEK_SET)                   = 0
write(7, "\1\202\1\1\200\302\206\371\7"..., 4096) = 4096
close(7)                                = 0

This is very interesting.

As a first operation, it opens the device and reads the block at offset 2088960 bytes. But 2088960 / 4096 = 510, so it reads block #510 of the disk. Next, it seeks back to position 0 (the header block) and writes the content of block #510 into it.

Well, now I know there’s a copy of each header block in a secret position (block #510)!!! And kfed uses this block to repair the main header block.
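
Judging from the strace output, the same copy could in principle be done with dd (an untested sketch, dangerous on a real system; kfed repair remains the supported way):

dd if=/dev/oracleasm/disks/ASM1 of=/dev/oracleasm/disks/ASM1 bs=4096 skip=510 seek=0 count=1 conv=notrunc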

During my tests, I noticed that ASM disks sometimes lost their ASMLib label (I don’t know why). As a consequence, the repaired disks are not recreated by oracleasm scandisks (or after a reboot).

[root@oel ~]# ls /dev/oracleasm/disks/*
/dev/oracleasm/disks/ASM1  /dev/oracleasm/disks/ASM2  /dev/oracleasm/disks/ASM3  /dev/oracleasm/disks/ASM4
[root@oel ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "ASM1"
Scanning system for ASM disks...
[root@oel ~]# ls /dev/oracleasm/disks/*
/dev/oracleasm/disks/ASM2  /dev/oracleasm/disks/ASM3  /dev/oracleasm/disks/ASM4

Provided you know which device corresponds to which ASMLib disk, you can relabel the disk with the oracleasm renamedisk command (be very careful with this command):

[root@oel ~]# oracleasm renamedisk -f /dev/sdb1 ASM1
Writing disk header: done
Instantiating disk "ASM1": done

[root@oel ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...

[root@oel ~]# ls /dev/oracleasm/disks/*
/dev/oracleasm/disks/ASM1  /dev/oracleasm/disks/ASM2  /dev/oracleasm/disks/ASM3  /dev/oracleasm/disks/ASM4