Sunday, December 09, 2007

Hiring DBAs based on certification

There was a question in the OTN database forum recently which said:

" Can any one advise me? I have joined a company as new senior dba.

i am not understanding what shall be done at beginning?

Can anybody advice me how to check all the database and what to do in the beginning ?

I have been reading all the stuff,documents fr. a week?"

The same poster also said:

" I have all th knowledge of oracle,I am ocp certified.

But fearing at the beginning.

how to proceede as this is my first job as a dba."

This is what can happen when a company hires someone for a position like "senior DBA" based solely on some papers called certifications.

How he can claim to have "all the knowledge of oracle" just by getting an OCP certification is another story.

Thursday, December 06, 2007

Consolidating oraInventories

I have always had trouble dealing with multiple oraInventories and multiple oraInst.loc files on a server.

If there are lots of people installing products on a server, or if you are handed an environment that is already in such a mess, tidying things up is a lot of trouble (if the people who did the installations used a central inventory instead of each one using their own, you are lucky).

You need to know which oraInst.loc points to which oraInventory and which products and versions are in that inventory. You need to put the right oraInst.loc in place to install a patchset or a patch. Or you need a way to make the central inventory aware of all the Oracle homes that exist on the server but are not registered in it.
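Before touching anything, it helps to list every oraInst.loc on the server and see where each one points. A minimal shell sketch, assuming the usual default locations (/etc/oraInst.loc on Linux, /var/opt/oracle/oraInst.loc on Solaris) plus whatever might be sitting under /u01:

# show every oraInst.loc we can find and the inventory it points to
for f in /etc/oraInst.loc /var/opt/oracle/oraInst.loc \
         $(find /u01 -name oraInst.loc 2>/dev/null); do
  [ -f "$f" ] && echo "$f -> $(grep '^inventory_loc' "$f")"
done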

In such situations Oracle Universal Installer 10.2 comes to the rescue. I have just come across Metalink note 438133.1, which explains how to consolidate multiple Oracle homes into one central inventory. The installer gets a new option in 10.2, -attachHome. Using a 10.2 installer and this option you can get all your homes registered in the central inventory.

./runInstaller -silent -attachHome ORACLE_HOME=<home you want to add> ORACLE_HOME_NAME=<unique home name>

The steps, which are clearly explained in the note, are basically: using a 10.2 Universal Installer, set oraInst.loc to point to the inventory you want to consolidate everything into, set $ORACLE_HOME to the home you want to include, and run the installer with the -attachHome option.
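As an illustration, registering an existing 9.2 home could look like this (all paths and the home name below are made up for the example):

export ORACLE_HOME=/u01/app/oracle/product/9.2.0
cd /u01/app/oracle/product/10.2.0/oui/bin
./runInstaller -silent -attachHome ORACLE_HOME=$ORACLE_HOME ORACLE_HOME_NAME=OraHome920

After that, running opatch lsinventory -all from a home that uses the central inventory should list the attached home as well.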

This works for both 9i and 10G homes.

Optimizer blog

As I have learnt from Greg Rahn's blog Structured Data, the people behind the optimizer have started blogging. Their first technical post is about bind peeking in 11G. I think we will see lots of examples and explanations about the optimizer there. Their blog is here.

Wednesday, December 05, 2007

Hard Disk Reference

A few days ago Ranko Mosic posted a reference on the oracle-l list to a site about hard disks. I have gone through half of it so far, and I can say the referenced pages are a great guide to understanding the basics of the hard disk: its parts, inner workings, etc. Trying to tune I/O without knowing the hard disk basics can easily lead a DBA down the wrong path; this site is a must-read in my opinion.

Friday, November 30, 2007

Rotate your logs

If you are using Linux and not rotating your alert logs, listener logs, or any other logs, or are rotating them with your own scripts, check out logrotate. I did not know this existed. From Dizwell.
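As an example, a minimal logrotate sketch for an alert log could look like the following (the path, schedule and retention are assumptions to adapt; it would go in a file under /etc/logrotate.d):

/u01/app/oracle/admin/PROD/bdump/alert_PROD.log {
    weekly
    rotate 4
    compress
    # copy then truncate in place, so the instance keeps writing to the same file
    copytruncate
    missingok
    notifempty
}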

Wednesday, November 28, 2007

Bind peeking change in 10g

A new note called "10g Upgrade Companion" has been published on Metalink recently. The note covers the upgrade from 9i to 10g and lists recommended patches, behavior changes and best practices. While going through it I saw something I was not aware of before. In the "behavior changes" section it says: "Bind peeking has been extended to binds buried inside expressions.".

I have talked about bind variable peeking and a behavior change of that feature in 11g before.

In 9i, bind variables inside expressions were not peeked at; 10g changes this, and bind variables in expressions are now peeked at as well.

To see the change in action I have done a simple test. Let's start with the behavior in 9i.


SQL> create table t (col1 number,col2 varchar2(30));

Table created.

SQL> insert into t select 1,object_name from all_objects where rownum<=10000;

10000 rows created.

SQL> update t set col1=0 where rownum=1;

1 row updated.

SQL> create index tind on t(col1);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'T',cascade=>true,method_opt=>'for columns col1 size 10');

PL/SQL procedure successfully completed.

Now I have a table with 10,000 rows and a skewed data distribution: one row has col1=0 and all the other rows have col1=1.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

After enabling the trace I execute the following two blocks.


declare
b1 varchar2(1);
begin
b1:=0;
for t in (select * from t where col1=to_number(b1)) loop
null;
end loop;
end;
/

PL/SQL procedure successfully completed.

declare
b1 varchar2(1);
begin
b1:=1;
for t in (select * from t where col1=to_number(b1)) loop
null;
end loop;
end;
/

PL/SQL procedure successfully completed.


What I expect in 9i is this: since the bind variable b1 is used inside an expression it will not be peeked at, so the optimizer will assume that b1=0 returns 5000 rows (10000 rows in the table divided by 2 distinct column values) and will choose a full table scan.

When we run tkprof on the resulting trace file with the "aggregate=no" option, what we see is:

SELECT *
FROM
T WHERE COL1=TO_NUMBER(:B1 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.01 0 48 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.01 0 48 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS FULL T

SELECT *
FROM
T WHERE COL1=TO_NUMBER(:B1 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 10000 0.35 0.26 0 10003 0 9999
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10002 0.35 0.26 0 10003 0 9999

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
9999 TABLE ACCESS FULL T

It chose a full table scan for b1=0 and used that plan for both queries.

If we run the same test in 10g what we see is:


SELECT *
FROM
T WHERE COL1=TO_NUMBER(:B1 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 3 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 3 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 43 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS BY INDEX ROWID T (cr=3 pr=0 pw=0 time=101 us)
1 INDEX RANGE SCAN TIND (cr=2 pr=0 pw=0 time=43 us)(object id 44327)



SELECT *
FROM
T WHERE COL1=TO_NUMBER(:B1 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 100 0.01 0.01 0 256 0 9999
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 102 0.01 0.01 0 256 0 9999

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 43 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
9999 TABLE ACCESS BY INDEX ROWID T (cr=256 pr=0 pw=0 time=269978 us)
9999 INDEX RANGE SCAN TIND (cr=120 pr=0 pw=0 time=139980 us)(object id 44327)

It peeked at the bind variable inside the expression, found out that b1=0 returns one row and decided to use the index. After that, since the plan was now fixed, it used the same plan for b1=1 as well.

This is one of many changes in 10g that can alter execution plans after the upgrade from 9i. Be aware of it if you have bind variables inside expressions in your sql statements. And who does not, actually?
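If you want a rough idea of how exposed you are before the upgrade, you can pattern-match the shared pool. A crude sketch that only catches TO_NUMBER around a bind (and only in the first part of the statement text), so treat it as a starting point:

select hash_value, sql_text
from v$sql
where upper(sql_text) like '%TO_NUMBER(:%';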

Wednesday, October 31, 2007

fast=true for adding columns in 11G

Before 11G, adding a new column with a default value to a table with millions of rows was a very time consuming and annoying task, because Oracle had to update all the rows with the default value when adding the column.

11G introduces a fast=true feature for adding columns with default values. In 11G, when you add a not null column with a default value, the operation completes in the blink of an eye (if there are no locks on the table, of course). It no longer has to update all the rows; it just keeps the default value as metadata in the dictionary. So adding a new column becomes just a few dictionary operations.

How does this work? And how does it return the data if it does not store it in the table's blocks?

To answer these questions let's see it in action.


YAS@11G>create table t_default (col1 number);

Table created.

YAS@11G>insert into t_default select 1 from dual;

1 row created.

YAS@11G>commit;

Commit complete.

Now we have a table with one column which has one row in it. Let's add a not null column with a default value.

YAS@11G>alter table t_default add(col_default number default 0 not null);

Table altered.

Now let's have a look at what is in the table's blocks for this one row.

YAS@11G>select dbms_rowid.rowid_block_number(rowid) from t_default where col1=1;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------
172

YAS@11G>alter system dump datafile 4 block 172;

System altered.

The trace file shows the first column but not the one we added afterwards: the blocks were not updated with the new column's default value.

block_row_dump:
tab 0, row 0, @0x1f92
tl: 6 fb: --H-FL-- lb: 0x1 cc: 1
col 0: [ 2] c1 02
end_of_block_dump

Now let's insert a new row, again relying on the default value.

YAS@11G>insert into t_default values(2,DEFAULT);

1 row created.

YAS@11G>commit;

Commit complete.

YAS@11G>select dbms_rowid.rowid_block_number(rowid) from t_default where col1=2;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------
176

YAS@11G>alter system dump datafile 4 block 176;

System altered.

Now the trace file shows the second column as well. So Oracle does not update the existing rows when adding the column, but subsequent inserts store the default value in the block when no value is supplied. This means the dictionary metadata covers only the rows that existed when the column was added; after that, updates and inserts modify the block for that column as usual.

block_row_dump:
tab 0, row 0, @0x1f90
tl: 8 fb: --H-FL-- lb: 0x1 cc: 2
col 0: [ 2] c1 03
col 1: [ 1] 80
end_of_block_dump

So, what happens when we query the table after adding a not null column with a default value?

Now that we know the rows existing at the time of the add column operation are not updated, how does Oracle return the value for that column?

Let's do a simple query and see the trace file.

YAS@11G>exec dbms_stats.gather_table_stats(user,'T_DEFAULT');

PL/SQL procedure successfully completed.

YAS@11G>alter session set events '10046 trace name context forever, level 12';

Session altered.

YAS@11G>select * from t_default where col1=1;

COL1 COL_DEFAULT
---------- -----------
1 0

YAS@11G>select * from t_default where col1=2;

COL1 COL_DEFAULT
---------- -----------
2 0

YAS@11G>exit

I queried the row inserted before the add column operation (whose new column is not stored in the table but kept in the dictionary as metadata) and then the row inserted after the add column operation (whose new column is stored in the table block).

The part of the raw trace related to the first row shows that just after my query there are some queries against the data dictionary, like the one on ecol$ just below my query.

PARSING IN CURSOR #3 len=36 dep=0 uid=60 oct=3 lid=60 tim=1193743350604471 hv=3796946483 ad='2b3dbc20'

sqlid='dbjjphgj51mjm'
select * from t_default where col1=1
END OF STMT
PARSE #3:c=7998,e=8401,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,tim=1193743350604449
BINDS #3:
EXEC #3:c=0,e=73,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1193743350604794
WAIT #3: nam='SQL*Net message to client' ela= 6 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1193743350604856
FETCH #3:c=0,e=107,p=0,cr=3,cu=0,mis=0,r=1,dep=0,og=1,tim=1193743350605014
WAIT #3: nam='SQL*Net message from client' ela= 1304 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1193743350606363
FETCH #3:c=1000,e=992,p=0,cr=4,cu=0,mis=0,r=0,dep=0,og=1,tim=1193743350607750
STAT #3 id=1 cnt=1 pid=0 pos=1 obj=66986 op='TABLE ACCESS FULL T_DEFAULT (cr=7 pr=0 pw=0 time=0 us cost=3 size=5 card=1)'
WAIT #3: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1193743350607916

*** 2007-10-30 13:22:34.800
WAIT #3: nam='SQL*Net message from client' ela= 4192534 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1193743354800488
=====================
PARSING IN CURSOR #3 len=97 dep=1 uid=0 oct=3 lid=0 tim=1193743354801333 hv=2759248297 ad='2b3983f0' sqlid='aa35g82k7dkd9'
select binaryDefVal, length(binaryDefVal) from ecol$ where tabobj# = :1 and colnum = :2
END OF STMT
PARSE #3:c=0,e=15,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1193743354801320
BINDS #3:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
kxsbbbfp=002fefb0 bln=22 avl=04 flg=05
value=66986
Bind#1
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
kxsbbbfp=002fefc8 bln=22 avl=02 flg=01
value=2


But the part for the second row does not show those dictionary queries.


PARSING IN CURSOR #4 len=36 dep=0 uid=60 oct=3 lid=60 tim=1193743354802685 hv=704792472 ad='32569228'

sqlid='7dq6h0np04jws'
select * from t_default where col1=2
END OF STMT
PARSE #4:c=1999,e=1898,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,tim=1193743354802668
BINDS #4:
EXEC #4:c=0,e=71,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1193743354802925
WAIT #4: nam='SQL*Net message to client' ela= 6 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1193743354802987
FETCH #4:c=0,e=119,p=0,cr=7,cu=0,mis=0,r=1,dep=0,og=1,tim=1193743354803159
WAIT #4: nam='SQL*Net message from client' ela= 561 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1193743354803722
FETCH #4:c=0,e=24,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1193743354803817
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=66986 op='TABLE ACCESS FULL T_DEFAULT (cr=7 pr=0 pw=0 time=0 us cost=3 size=5 card=1)'
WAIT #4: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1193743354803971

*** 2007-10-30 13:22:35.872
WAIT #4: nam='SQL*Net message from client' ela= 1068525 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1193743355872534
XCTEND rlbk=0, rd_only=1
=====================
PARSING IN CURSOR #3 len=234 dep=1 uid=0 oct=6 lid=0 tim=1193743355873381 hv=1907731640 ad='2b3d6300'

sqlid='209fr01svbb5s'
update sys.aud$ set action#=:2, returncode=:3, logoff$time=cast(SYS_EXTRACT_UTC(systimestamp) as date), logoff$pread=:4,

logoff$lread=:5, logoff$lwrite=:6, logoff$dead=:7, sessioncpu=:8 where sessionid=:1 and entryid=1 and action#=100
END OF STMT

So, if we query rows whose added column is maintained in the dictionary, we get the value not from the table but from the dictionary. This is explained in the documentation as:

"However, subsequent queries that specify the new column are rewritten so that the default value is returned in the result set."

What is this ecol$ table that the database queries to get the default column value?

ecol$ is a new dictionary table in 11G. It is created in $ORACLE_HOME/rdbms/admin/dcore.bsq, which is called from $ORACLE_HOME/rdbms/admin/sql.bsq, the script that drives the creation of the basic dictionary tables needed for database operation.

In dcore.bsq it is explained as:

"This table is an extension to col$ and is used (for now) to store the default value with which a column was added"

How do these additional dictionary queries affect my queries?

To see whether these additional dictionary queries affect the performance of the original query, let's create two tables with two columns each: one where the not null column with a default value is added after some rows are inserted, and a copy of it where both columns are in place when the table is created. Then we query both tables and look at their traces.


YAS@11G>r
1* drop table t_default

Table dropped.

YAS@11G>create table t_default (col1 number);

Table created.

YAS@11G>insert into t_default select level from dual connect by level<=10000;

10000 rows created.

YAS@11G>commit;


Commit complete.

YAS@11G>alter table t_default add(col_default number default 0 not null);

Table altered.

YAS@11G>exec dbms_stats.gather_table_stats(user,'T_DEFAULT');

PL/SQL procedure successfully completed.

YAS@11G>alter session set events '10046 trace name context forever, level 12';

Session altered.

YAS@11G>begin
2 for t in (select * from t_default) loop
3 null;
4 end loop;
5 end;
6 /

PL/SQL procedure successfully completed.

YAS@11G>exit

I selected all the rows after adding the column. Here is the trace file output; I am including only the part with the original sql and the summary.

SELECT *
FROM
T_DEFAULT


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 101 0.01 0.01 0 119 0 10000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 103 0.02 0.02 0 119 0 10000

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 8 0.00 0.00 0 0 0 0
Execute 8 0.00 0.00 0 3 3 1
Fetch 109 0.01 0.01 0 133 0 10003
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 125 0.02 0.02 0 136 3 10004

Misses in library cache during parse: 1

2 user SQL statements in session.
7 internal SQL statements in session.
9 SQL statements in session.


I have 119 reads for the table and 139-119=20 reads from the dictionary tables (139 being the recursive total of 136 query gets plus 3 current gets). There are 7 dictionary sqls, each executed once.

Now let's try the same with a table containing the same data but with both columns in place at creation time.

YAS@11G>create table t_normal (col1 number,col_default number);

Table created.

YAS@11G>insert into t_normal select level,0 from dual connect by level<=10000;

10000 rows created.

YAS@11G>exec dbms_stats.gather_table_stats(user,'T_NORMAL');


PL/SQL procedure successfully completed.

YAS@11G>commit;

Commit complete.

YAS@11G>alter session set events '10046 trace name context forever, level 12';

Session altered.

YAS@11G>begin
2 for t in (select * from t_normal) loop
3 null;
4 end loop;
5 end;
6
7 /

PL/SQL procedure successfully completed.

YAS@11G>exit

SELECT *
FROM
T_NORMAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 101 0.00 0.01 0 119 0 10000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 103 0.01 0.01 0 119 0 10000

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 3 3 1
Fetch 101 0.00 0.01 0 119 0 10000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 105 0.01 0.01 0 122 3 10001

Misses in library cache during parse: 1

2 user SQL statements in session.
1 internal SQL statements in session.
3 SQL statements in session.

I again have 119 reads for the table, but now there is only one dictionary sql (an update on aud$, which was also there in the former case). I now have 125-119=6 dictionary reads (122 query gets plus 3 current gets) instead of 20.

As you can see, there is little overhead in querying the dictionary for the default values of columns. No matter how many rows you have, those dictionary sqls run only once per query. But the gain from this feature is huge: availability is dramatically increased, meaning you can now sleep more instead of waiting for an add column operation to finish.

There are some restrictions on this fast=true switch, which are explained here:

"However, the optimized behavior is subject to the following restrictions:
*The table cannot have any LOB columns. It cannot be index-organized, temporary, or part of a cluster. It also cannot be a queue table, an object table, or the container table of a materialized view.
*The column being added cannot be encrypted, and cannot be an object column, nested table column, or a LOB column."

Thursday, September 20, 2007

ORA-04043 in mount mode

There was a question at the OTN Database-General forum today about a problem when trying to describe the view dba_tablespaces.

The poster was getting an ORA-04043 error.


SQL> desc dba_tablespaces
ERROR:
ORA-04043: object dba_tablespaces does not exist

My first thought was that the instance might be in mount mode. I tried it on a database in the mount stage and got the error.

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 404723928 bytes
Fixed Size 735448 bytes
Variable Size 234881024 bytes
Database Buffers 167772160 bytes
Redo Buffers 1335296 bytes
Database mounted.
SQL> desc dba_tablespaces
ERROR:
ORA-04043: object dba_tablespaces does not exist

I was expecting to be able to see the view after opening the database, but...

SQL> alter database open;
Database altered.

SQL> desc dba_tablespaces
ERROR:
ORA-04043: object dba_tablespaces does not exist

I could not query it either.

SQL> select * from dba_tablespaces;
select * from dba_tablespaces
*
ERROR at line 1:
ORA-00942: table or view does not exist

A quick search in Metalink returned note 296235.1, which refers to bug 2365821 and says that if you describe any dba_* view in mount mode you cannot describe the same view even after opening the database. The only solution is to restart the database.

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 404723928 bytes
Fixed Size 735448 bytes
Variable Size 234881024 bytes
Database Buffers 167772160 bytes
Redo Buffers 1335296 bytes
Database mounted.
SQL> alter database open;

Database altered.

SQL> desc dba_tablespaces
Name Null? Type
----------------------------------------- -------- ----------------------------
TABLESPACE_NAME NOT NULL VARCHAR2(30)
BLOCK_SIZE NOT NULL NUMBER
INITIAL_EXTENT NUMBER
NEXT_EXTENT NUMBER
MIN_EXTENTS NOT NULL NUMBER
MAX_EXTENTS NUMBER
PCT_INCREASE NUMBER
MIN_EXTLEN NUMBER
STATUS VARCHAR2(9)
CONTENTS VARCHAR2(9)
LOGGING VARCHAR2(9)
FORCE_LOGGING VARCHAR2(3)
EXTENT_MANAGEMENT VARCHAR2(10)
ALLOCATION_TYPE VARCHAR2(9)
PLUGGED_IN VARCHAR2(3)
SEGMENT_SPACE_MANAGEMENT VARCHAR2(6)
DEF_TAB_COMPRESSION VARCHAR2(8)

I was not aware of this behaviour until now. I tested this on 9.2, but the bug does not seem to be fixed in any version.

Wednesday, August 29, 2007

Unique indexes on materialized views

One of the uses of materialized views is replication. Mviews can be used to replicate a table to another database so that users do not need to reach across several databases through database links. This can improve the performance of queries that frequently access that table by removing the latency of the database link.

Today the refresh job of one of the mviews we use for this purpose started failing with an "ORA-00001 unique constraint violated" error. It was an mview with a unique index on it. After a simple search in Metalink I found the reason for this behavior.

I will provide a simple test case to reproduce the problem.


SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE 9.2.0.8.0 Production
TNS for Solaris: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production

SQL> create table master as select rownum r,'ROW'||rownum line from all_objects where rownum<=5;
Table created.

SQL> alter table master add primary key(r);


Table altered.

SQL> create unique index master_ind on master(line);

Index created.

SQL> create materialized view log on master with primary key;

Materialized view log created.

SQL> create materialized view snap refresh fast with primary key as select * from master;

Materialized view created.

SQL> create unique index snap_ind on snap(line);

Index created.

SQL> select * from master;

R LINE
---------- -------------------------------------------
1 ROW1
2 ROW2
3 ROW3
4 ROW4
5 ROW5

After these steps we have a master table, an mview log on it and a fast refreshable mview of the master table. Now we run some updates on the master table to swap the LINE column values of two rows.

SQL> update master set line='DUMMY' where r=1;

1 row updated.

SQL> update master set line='ROW1' where r=2;

1 row updated.

SQL> update master set line='ROW2' where r=1;

1 row updated.

SQL> commit;

Commit complete.

SQL> select * from master;

R LINE
---------- -------------------------------------------
1 ROW2
2 ROW1
3 ROW3
4 ROW4
5 ROW5

Now when we try to refresh the mview we get the error.

SQL> exec dbms_mview.refresh('SNAP');
BEGIN dbms_mview.refresh('SNAP'); END;

*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00001: unique constraint (YAS.SNAP_IND) violated
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 820
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 877
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 858
ORA-06512: at line 1

If we drop the unique index and recreate it as nonunique we can refresh the mview without errors.

SQL> drop index snap_ind;

Index dropped.

SQL> create index snap_ind on snap(line);

Index created.

SQL> exec dbms_mview.refresh('SNAP');

PL/SQL procedure successfully completed.

SQL> select * from snap;

R LINE
---------- -------------------------------------------
1 ROW2
2 ROW1
3 ROW3
4 ROW4
5 ROW5

The reason for this error is explained in Metalink note 284101.1. According to that note, the refresh mechanism may temporarily leave the data in an inconsistent state during the refresh: the modifications are not applied in the same order as they were made on the master table. Because of this, creating unique indexes, foreign keys or primary keys on mviews can cause problems in the refresh process. This makes sense, as it is better to keep the integrity checks on the master table rather than on the mview. An mview is just a copy of the master table, so if we define the integrity constraints on the master table the mview will take care of itself.

This behaviour reproduces in 10G also.

Wednesday, August 15, 2007

Invisible indexes in 11G

11G has a new feature called invisible indexes. An invisible index is invisible to the optimizer by default. Using this feature we can test a new index without affecting the execution plans of existing sql statements, or test the effect of dropping an index without actually dropping it.

We can create an index as invisible or we can alter an index to become invisible.


YAS@11G>drop table t;

Table dropped.

YAS@11G>create table t as select * from all_objects;

Table created.

YAS@11G>create index t_ind on t(object_name) invisible;

Index created.

USER_INDEXES has a new column named VISIBILITY to indicate whether the index is visible or not.

YAS@11G>select index_name,VISIBILITY from user_indexes where index_name='T_IND';

INDEX_NAME VISIBILIT
------------------------------ ---------
T_IND INVISIBLE

There is a new initialization parameter, modifiable at the system or session level, called OPTIMIZER_USE_INVISIBLE_INDEXES. This parameter is FALSE by default, meaning the optimizer will not consider invisible indexes when evaluating alternative execution plans.

YAS@11G>show parameter visible

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_use_invisible_indexes boolean FALSE

We can alter an existing index to become invisible or visible.

YAS@11G>alter index t_ind visible;

Index altered.

YAS@11G>select index_name,VISIBILITY from user_indexes where index_name='T_IND';

INDEX_NAME VISIBILIT
------------------------------ ---------
T_IND VISIBLE

YAS@11G>alter index t_ind invisible;

Index altered.

YAS@11G>select index_name,VISIBILITY from user_indexes where index_name='T_IND';

INDEX_NAME VISIBILIT
------------------------------ ---------
T_IND INVISIBLE

We can see that the invisible index is not considered by the optimizer using a 10053 trace.

YAS@11G>alter session set events '10053 trace name context forever, level 1';

Session altered.

YAS@11G>explain plan for select * from t where object_name='STANDARD';

Explained.

In the trace file the index is marked as UNUSABLE and is not considered.

BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: T Alias: T (NOT ANALYZED)
#Rows: 80129 #Blks: 981 AvgRowLen: 100.00
Index Stats::
Index: T_IND Col#: 2
LVLS: 2 #LB: 323 #DK: 38565 LB/K: 1.00 DB/K: 1.00 CLUF: 33236.00
UNUSABLE

And the plan is:

YAS@11G>explain plan for select * from t where object_name='STANDARD';

Explained.

YAS@11G>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 1738 | 268 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| T | 11 | 1738 | 268 (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OBJECT_NAME"='STANDARD')

Note
-----
- dynamic sampling used for this statement

17 rows selected.

We can change the parameter OPTIMIZER_USE_INVISIBLE_INDEXES to see if the index will be used.

YAS@11G>alter session set optimizer_use_invisible_indexes=true;

Session altered.

YAS@11G>explain plan for select * from t where object_name='STANDARD';

Explained.

YAS@11G>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1376202287

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 202 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 2 | 202 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IND | 2 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_NAME"='STANDARD')

14 rows selected.

YAS@11G>alter session set optimizer_use_invisible_indexes=false;

Session altered.

If we make the index visible we see that the index is considered and used.

YAS@11G>alter index t_ind visible;

Index altered.

YAS@11G>explain plan for select * from t where object_name='STANDARD';

Explained.

YAS@11G>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1376202287

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 316 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 2 | 316 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IND | 2 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_NAME"='STANDARD')

Note
-----
- dynamic sampling used for this statement

18 rows selected.

Trying to gather statistics on an invisible index, using either gather_index_stats or gather_table_stats with the cascade option, fails with an error.

YAS@11G>alter index t_ind invisible;

Index altered.

YAS@11G>exec dbms_stats.gather_index_stats(user,'T_IND');
BEGIN dbms_stats.gather_index_stats(user,'T_IND'); END;

*
ERROR at line 1:
ORA-00904: : invalid identifier
ORA-06512: at "SYS.DBMS_STATS", line 14037
ORA-06512: at "SYS.DBMS_STATS", line 14060
ORA-06512: at line 1

YAS@11G>exec dbms_stats.gather_table_stats(user,'T',cascade=>true);
BEGIN dbms_stats.gather_table_stats(user,'T',cascade=>true); END;

*
ERROR at line 1:
ORA-00904: : invalid identifier
ORA-06512: at "SYS.DBMS_STATS", line 17806
ORA-06512: at "SYS.DBMS_STATS", line 17827
ORA-06512: at line 1

YAS@11G>alter index t_ind visible;

Index altered.

YAS@11G>exec dbms_stats.gather_index_stats(user,'T_IND');

PL/SQL procedure successfully completed.

YAS@11G>exec dbms_stats.gather_table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.

I did not check whether this is a bug or intended behaviour, but we need a way to collect statistics on an invisible index; otherwise, how can we test whether it will be used? There is a way: set the parameter OPTIMIZER_USE_INVISIBLE_INDEXES to TRUE first.

YAS@11G>alter session set optimizer_use_invisible_indexes=true;

Session altered.

YAS@11G>exec dbms_stats.gather_index_stats(user,'T_IND');

PL/SQL procedure successfully completed.

YAS@11G>alter session set optimizer_use_invisible_indexes=false;

Session altered.

Tuesday, August 14, 2007

Bind variable peeking in 11G

In one of my recent posts I mentioned Gregory Guillou's posts about bind variable peeking in 11G. I have done a simple test and some research on this.

First here is the test case.


YAS@11G>create table t(id number,name varchar2(30));

Table created.

YAS@11G>insert into t select mod(rownum,2),object_name from all_objects;

64901 rows created.

YAS@11G>commit;

Commit complete.

YAS@11G>create index t_ind on t(id);

Index created.

YAS@11G>drop table t;

Table dropped.

YAS@11G>create table t(id number,name varchar2(30));

Table created.

YAS@11G>insert into t select mod(rownum,2),object_name from all_objects;

64901 rows created.

YAS@11G>insert into t select 2,'test' from dual;

1 row created.

YAS@11G>create index t_ind on t(id);

Index created.

YAS@11G>exec dbms_stats.gather_table_stats(user,'T',cascade=>true,method_opt=>'for columns id size 254');

PL/SQL procedure successfully completed.

Now we have a table with three distinct values in its ID column: values 0 and 1 have more than 30,000 rows each, and value 2 has only one row.

YAS@11G>var v_id number;
YAS@11G>exec :v_id := 0;

PL/SQL procedure successfully completed.

YAS@11G>set lines 200
YAS@11G>column PLAN_TABLE_OUTPUT format a100
YAS@11G>set pages 50
YAS@11G>select count(name) from t where id = :v_id;

COUNT(NAME)
-----------
32450

YAS@11G>select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID gsmm31bu4zyca, child number 0
-------------------------------------
select count(name) from t where id = :v_id

Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 103 (100)| |
| 1 | SORT AGGREGATE | | 1 | 20 | | |
|* 2 | TABLE ACCESS FULL| T | 32433 | 633K| 103 (1)| 00:00:02 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("ID"=:V_ID)

19 rows selected.




A plan using a full table scan is used for this value. Prior to 11G this plan would be fixed and used for all subsequent executions, regardless of the bind value, as long as you have CURSOR_SHARING=EXACT (if cursor_sharing is not set to EXACT things are different). In 11G, if we run the same sql with a different value, things change.

YAS@11G>exec :v_id := 2;

PL/SQL procedure successfully completed.

YAS@11G>select count(name) from t where id = :v_id;

COUNT(NAME)
-----------
1

YAS@11G>select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID gsmm31bu4zyca, child number 1
-------------------------------------
select count(name) from t where id = :v_id

Plan hash value: 2602990223

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 20 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T | 1 | 20 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_IND | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("ID"=:V_ID)

20 rows selected.


Notice that I had to run this sql several times for the plan to change: after several runs of the same statement with the value 2, the plan changes to an index range scan. Why is this so?

This is a change in the behaviour of bind variable peeking in 11G, described here. There is no need to rephrase what the documentation says, so just to quote:

"To ensure the optimal choice of cursor for a given bind value, Oracle Database uses bind-aware cursor matching. The system monitors the data access performed by the query over time, depending on the bind values. If bind peeking takes place, and a histogram is used to compute selectivity of the predicate containing the bind variable, then the cursor is marked as a bind-sensitive cursor. Whenever a cursor is determined to produce significantly different data access patterns depending on the bind values, that cursor is marked as bind-aware, and Oracle Database will switch to bind-aware cursor matching to select the cursor for that statement. When bind-aware cursor matching is enabled, plans are selected based on the bind value and the optimizer's estimate of its selectivity. With bind-aware cursor matching, it is possible that a SQL statement with user-defined bind variable will have multiple execution plans, depending on the bind values."


The view V$SQL_CS_STATISTICS gives information about cursor statistics. We can use it to see the effect of this change.

YAS@11G>alter system flush shared_pool;

System altered.

YAS@11G>exec :v_id := 0;

PL/SQL procedure successfully completed.

YAS@11G>select /* test */ count(name) from t where id = :v_id;

COUNT(NAME)
-----------
32450

YAS@11G>select sql_text,sql_id from v$sql where upper(sql_text) like 'SELECT%TEST%';

SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID
-------------
select /* test */ count(name) from t where id = :v_id
12dcwm6639zmr

select sql_text,sql_id from v$sql where upper(sql_text) like 'SELECT%TEST%'
51hmrnzmuaswy


YAS@11G>select child_number,BIND_SET_HASH_VALUE,PEEKED,EXECUTIONS,ROWS_PROCESSED,BUFFER_GETS
2 from V$SQL_CS_STATISTICS
3 where sql_id='12dcwm6639zmr';

CHILD_NUMBER BIND_SET_HASH_VALUE P EXECUTIONS ROWS_PROCESSED BUFFER_GETS
------------ ------------------- - ---------- -------------- -----------
0 1475158189 Y 1 32451 380

YAS@11G>exec :v_id := 2;

PL/SQL procedure successfully completed.

YAS@11G>select /* test */ count(name) from t where id = :v_id;

COUNT(NAME)
-----------
1

YAS@11G>select child_number,BIND_SET_HASH_VALUE,PEEKED,EXECUTIONS,ROWS_PROCESSED,BUFFER_GETS
2 from V$SQL_CS_STATISTICS
3 where sql_id='12dcwm6639zmr';

CHILD_NUMBER BIND_SET_HASH_VALUE P EXECUTIONS ROWS_PROCESSED BUFFER_GETS
------------ ------------------- - ---------- -------------- -----------
1 2064090006 Y 1 3 3
0 1475158189 Y 1 32451 380


As you can see there are two child cursors for this sql because of the different bind values.
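V$SQL also gets two new columns in 11G, IS_BIND_SENSITIVE and IS_BIND_AWARE, which show the cursor states described in the documentation quote above. A quick way to check them for our statement, using the sql_id from the run above:

select child_number, is_bind_sensitive, is_bind_aware
from v$sql
where sql_id = '12dcwm6639zmr';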

First 11G installation

I have finished my first 11G installation, using VMware Server and Oracle Enterprise Linux 5, and created my first 11G database. The installation is no different from the 10G installation, but there are a few differences in database creation. I created the database using dbca.

When it comes to setting the SGA and PGA sizes, dbca asks whether I want to use automatic memory management. With automatic memory management we set a single value for both the SGA and the PGA, and the database adjusts the memory distribution automatically. If this option is not selected, it reverts to the 10G behaviour.
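Behind the scenes this single value corresponds to the new MEMORY_TARGET and MEMORY_MAX_TARGET parameters. A minimal sketch if you want to enable it manually (the sizes are just illustrative, and an instance restart is needed because MEMORY_MAX_TARGET is static):

alter system set memory_max_target=1G scope=spfile;
alter system set memory_target=800M scope=spfile;
-- sga_target and pga_aggregate_target can then be left at 0 to let the instance manage both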

There is a new option related to security settings: 11G asks if you want to use the enhanced 11G security settings, which cover password management and auditing. If you select this option it enables the password policies in the default profile and starts auditing a set of privileges.
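After the database is created you can see what the password part of this option did by looking at the default profile (a simple check; the exact limits dbca sets are release dependent):

select resource_name, limit
from dba_profiles
where profile = 'DEFAULT'
and resource_type = 'PASSWORD';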

There is also a new step asking whether you want to enable automatic maintenance tasks, such as statistics gathering, the automatic segment advisor and the sql tuning advisor.

Other than these, database creation is just like 10G.

Friday, August 10, 2007

11G excitement in the community

Today when I was walking through the OraNA feed I saw that nearly all of the posts were about 11G being available for download. 11G is starting to get hot it seems. I am sure we will start seeing lots of posts about the new features soon.

Gregory Guillou of the Pythian Group has two posts about bind variable peeking in 11G, here and here. This is an interesting change in the cursor sharing behavior. I will test it and see how it behaves myself if I can ever finish my download; it is going really slowly for now.

By the way, the zip file is more than twice the size of the 10G R2 file: the 10G R2 installation zip was about 700MB while this one is 1.7GB.

11G Available for Linux

11G for Linux is available for download here and I have just started downloading it.

At about the same time, the first three parts of the Top New Features series by Arup Nanda were published: database replay, partitioning and schema management. You can subscribe to the series' rss feed here.

Friday, July 13, 2007

Modifying sprepins.sql for Custom Statspack Reports III

Rows per Exec

There is a section titled "SQL ordered by Gets" in statspack reports. This section reports the sql statements performing most of the buffer gets.

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------


One important thing missing from these columns is the number of rows processed per execution of each sql statement. With a simple change we can add a "Rows per Exec" column showing this information. Knowing how many rows a statement processes lets me decide whether its number of buffer gets is acceptable. If a sql statement processes a few rows but performs lots of buffer gets, I need to see if something can be done about it.


If the sql calculates aggregates, as group by statements do, this information may not be that useful and a more detailed analysis of the statement is needed. But for simple sql statements with no aggregates it can be helpful.


After this change that section looks like the following:


CPU Elapsd
Buffer Gets Executions Gets per Exec Rows per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- -------------- ------ -------- --------- ----------


To make this change find this line in sprepins.sql.


-- SQL statements ordered by buffer gets


Remove the lines between that line and this line.


-- SQL statements ordered by physical reads


In between these two lines insert the following code. The part that adds the new column is marked with a comment.

ttitle lef 'SQL ordered by Gets for ' -
'DB: ' db_name ' Instance: ' inst_name ' '-
'Snaps: ' format 999999 begin_snap ' -' format 999999 end_snap -
skip 1 -
'-> End Buffer Gets Threshold: ' ebgt -
skip 1 -
'-> Note that resources reported for PL/SQL includes the ' -
'resources used by' skip 1 -
' all SQL statements called within the PL/SQL code. As ' -
'individual SQL' skip 1 -
' statements are also reported, it is possible and valid ' -
'for the summed' skip 1 -
' total % to exceed 100' -
skip 2;

-- Bug 1313544 requires this rather bizarre SQL statement
set lines 95
set underline off;
col aa format a196 heading -
' CPU Elapsd| Buffer Gets Executions Gets per Exec Rows per Exec %Total Time (s) Time (s) Hash Value |--------------- ------------ -------------- -------------- ------ -------- --------- ----------'

column hv noprint;
break on hv skip 1;

select aa, hv
from ( select /*+ ordered use_nl (b st) */
decode( st.piece
, 0
, lpad(to_char((e.buffer_gets - nvl(b.buffer_gets,0))
,'99,999,999,999')
,15)||' '||
lpad(to_char((e.executions - nvl(b.executions,0))
,'999,999,999')
,12)||' '||
lpad((to_char(decode(e.executions - nvl(b.executions,0)
,0, to_number(null)
,(e.buffer_gets - nvl(b.buffer_gets,0)) /
(e.executions - nvl(b.executions,0)))
,'999,999,990.0'))
,14) ||' '||
--Reports rows processed per exec
lpad((to_char((e.rows_processed - nvl(b.rows_processed,0))/(e.executions - nvl(b.executions,0))
,'999,999,990.0'))
, 14)
||' '||
lpad((to_char(100*(e.buffer_gets - nvl(b.buffer_gets,0))/:gets
,'990.0'))
, 6) ||' '||
lpad( nvl(to_char( (e.cpu_time - nvl(b.cpu_time,0))/1000000
, '9990.00')
, ' '),8) || ' ' ||
lpad( nvl(to_char( (e.elapsed_time - nvl(b.elapsed_time,0))/1000000
, '99990.00')
, ' '),9) || ' ' ||
lpad(e.hash_value,10)||''||
decode(e.module,null,st.sql_text
,rpad('Module: '||e.module,95)||st.sql_text)
, st.sql_text) aa
, e.hash_value hv
from stats$sql_summary e
, stats$sql_summary b
, stats$sqltext st
where b.snap_id(+) = :bid
and b.dbid(+) = e.dbid
and b.instance_number(+) = e.instance_number
and b.hash_value(+) = e.hash_value
and b.address(+) = e.address
and b.text_subset(+) = e.text_subset
and e.snap_id = :eid
and e.dbid = :dbid
and e.instance_number = :inst_num
and e.hash_value = st.hash_value
and e.text_subset = st.text_subset
and st.piece < &&num_rows_per_hash
and e.executions > nvl(b.executions,0)
order by (e.buffer_gets - nvl(b.buffer_gets,0)) desc, e.hash_value, st.piece
)
where rownum < &&top_n_sql;



Rollback Segments

The sections named "Rollback Segment Stats" and "Rollback Segment Storage" provide information about individual rollback segments. The first one gives figures like undo bytes written, wraps, extends, shrinks, etc. The second one gives the average, maximum and optimum sizes of the rollback segments. I do not remember an occasion where I used these sections on a database with automatic undo management, so I remove them as I do not need them with automatic undo. This change reduces my report file size from about 500KB to about 100KB; not a major change, but it helps keep unused data out of the report.

To make this change find the lines "-- Rollback segment" and "-- Undo Segment" and comment or remove the lines between them.


The last three posts were about modifications to sprepins.sql for custom statspack reports. There are actually more changes I use to customize my reports, but I think these three posts can help you get started if you want to customize sprepins.sql yourself.


Tuesday, May 29, 2007

Modifying sprepins.sql for Custom Statspack Reports II

My last post was about modifying $ORACLE_HOME/rdbms/admin/sprepins.sql which is the script that produces the statspack report.

Another change I make in my sprepins.sql concerns the "Top 5 Timed Events" section. In 9.2 this section lists the wait events and CPU time together with their percentages of the total database time, so we can see which wait event caused most of the waits in that period, or whether CPU time took most of it. 8.1.7 has a section titled "Top 5 Wait Events" instead, which does not include CPU time. I originally made this change for 8.1.7 to include CPU time in that section, but I still use it in 9.2 with a few changes. The version below is tested on 9.2.

In this modification I divide the CPU time into three parts, as described in Metalink note 232443.1: "parse time cpu", "recursive cpu usage" and "cpu other". "parse time cpu" is the time spent parsing, "recursive cpu usage" is the time spent in recursive calls such as plsql calls from sql, and "cpu other" is the remaining cpu time. This way, when I look at the Top 10 Timed Events section (I list 10 events, not 5), I can see which part consumes more time and whether it is a problem or not.
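Put as a formula, using the statistic names as they appear in STATS$SYSSTAT:

cpu other = 'CPU used by this session' - 'parse time cpu' - 'recursive cpu usage'

which is exactly what the modified query further down computes.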

After this change you will have a new section titled "Top 10 Timed Events" like below.


Top 10 Timed Events
~~~~~~~~~~~~~~~~~~~ Wait
Event Time (s) % TOTAL TIME
-------------------------------------------- ------------ ------------
SQL*Net message from dblink 4,700 36.01
cpu other 2,928 22.43
recursive cpu usage 2,479 19
SQL*Net more data to client 587 4.49
enqueue 456 3.49
sbtwrite2 372 2.85
async disk IO 354 2.71
log file parallel write 261 2
db file sequential read 244 1.87
single-task message 188 1.44


To make this change find the following lines in sprepins.sql.

set heading on;
repfooter left -
'-------------------------------------------------------------';

--
-- Top Wait Events


Just above this add the following lines.

----------------------------------------TOP 10 TIMED EVENTS--------------------------------------
set heading on;
repfooter left -
'-------------------------------------------------------------';

col event format a44 heading 'Top 10 Timed Events|~~~~~~~~~~~~~~~~~~~|Event';
col time_waited format 999,999,990 heading 'Wait|Time (s)' just c;
col pct format 999.99 heading '% Total|Time';

SELECT NAME "EVENT",VALUE "TIME_WAITED",ROUND(PCT_TOTAL*100,2) "% TOTAL TIME"
FROM (SELECT NAME, VALUE, ratio_to_report (VALUE) OVER () pct_total
FROM (SELECT s1.NAME, (s1.VALUE - s2.VALUE)/100 VALUE
FROM stats$sysstat s1, stats$sysstat s2
WHERE s1.NAME IN
('recursive cpu usage',
'parse time cpu')
AND s1.NAME = s2.NAME
AND s1.snap_id = :eid
AND s2.snap_id = :bid
UNION ALL
SELECT 'cpu other', diff/100
FROM (SELECT NAME, VALUE,
VALUE
- (LEAD (VALUE, 1) OVER (ORDER BY NAME))
- (LEAD (VALUE, 2) OVER (ORDER BY NAME))
diff
FROM (SELECT s1.NAME, s1.VALUE - s2.VALUE VALUE
FROM stats$sysstat s1, stats$sysstat s2
WHERE s1.NAME IN
('recursive cpu usage',
'parse time cpu',
'CPU used by this session'
)
AND s1.NAME = s2.NAME
AND s1.snap_id = :eid
AND s2.snap_id = :bid))
WHERE NAME = 'CPU used by this session'
UNION ALL
SELECT event, time_waited
FROM (SELECT s1.event,
(s1.time_waited_micro - nvl(s2.time_waited_micro,0))/1000000 time_waited,
ratio_to_report ( s1.time_waited_micro
- nvl(s2.time_waited_micro,0)
) OVER () pct
FROM stats$system_event s1, stats$system_event s2
WHERE s1.event = s2.event
AND s1.snap_id = :eid
AND s2.snap_id = :bid
AND s1.event NOT IN (SELECT event
FROM stats$idle_event
)))
ORDER BY pct_total DESC)
WHERE ROWNUM <= 10;
---------------------------------------END TOP 10 TIMED EVENTS-------------------------


With the help of this section I can see what the top 10 events are, including the breakdown of CPU time, and drill down from there to find the cause.

The next post will be about adding a "Rows per Exec" column to the "SQL ordered by Gets" section and about removing the "Rollback Segment Stats" and "Rollback Segment Storage" sections for databases with automatic undo management.

Thursday, May 24, 2007

Modifying sprepins.sql for Custom Statspack Reports I

I use STATSPACK extensively for comparing two time periods, comparing a time period to a baseline, monitoring trends and diagnosing performance problems. It provides important data about database performance. The code producing statspack reports is open, so you can modify it to fit your needs.

You run $ORACLE_HOME/rdbms/admin/spreport.sql to produce statspack reports. spreport.sql calls sprepins.sql, which is the script that actually produces the report. You can easily modify sprepins.sql to add, remove and change sections in the report; by sections I mean the parts that start with headers like "SQL ordered by Gets", "Instance Activity Stats", "Load Profile", etc. I use a modified sprepins.sql to produce statspack reports that suit my needs, and I want to share those modifications in a few posts here. You can use them as they are, adapt them to your needs, or introduce new changes like these. Note that these changes are tested on 9.2 only.
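Since sprepins.sql is a stock script, it is a good idea to keep a pristine copy around before you start editing (nothing more than a backup; the file name is up to you):

cd $ORACLE_HOME/rdbms/admin
cp sprepins.sql sprepins.sql.orig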

The first two modifications I made are: prompting for the number of days to list snapshots for, and using a standard name for report outputs.

Number of Days

When spreport.sql is run it lists all the available statspack snapshots and asks for a begin snapshot id and an end snapshot id, then produces a report of the delta values between those snapshots. If you keep days, weeks or even months of data in the PERFSTAT schema, listing all the snapshots can take time and you have to wait for a lot of pages to scroll by. To prevent this I add a new prompt that asks for the number of days to list snapshots for (awrrpt.sql, which produces the AWR report in 10G, does this too). To make this change find the line with this sentence in sprepins.sql.

-- Ask for the snapshots Id's which are to be compared

This is the part where it lists the snapshots. Just above this line, add the following lines.


set termout on
prompt
prompt Specify Number of Days
prompt ~~~~~~~~~~~~~~~~~~~~~~
accept num_days number default 1 prompt 'Enter number of days:'


This will ask you to enter the number of days to list and put your input into the variable num_days. If you do not enter anything, the default is 1 day. Now change the select statement after these lines to select only the snapshots for the last num_days days. The line you need to add is the snap_time filter just before the order by.

select to_char(s.startup_time,' dd Mon "at" HH24:mi:ss') instart_fmt
, di.instance_name inst_name
, di.db_name db_name
, s.snap_id snap_id
, to_char(s.snap_time,'dd Mon YYYY HH24:mi') snapdat
, s.snap_level lvl
, substr(s.ucomment, 1,60) commnt
from stats$snapshot s
, stats$database_instance di
where s.dbid = :dbid
and di.dbid = :dbid
and s.instance_number = :inst_num
and di.instance_number = :inst_num
and di.dbid = s.dbid
and di.instance_number = s.instance_number
and di.startup_time = s.startup_time
and s.snap_time>=trunc(sysdate-&&num_days+1)
order by db_name, instance_name, snap_id;


Now if you save sprepins.sql and run spreport.sql, it will ask for the number of days:

Specify Number of Days
~~~~~~~~~~~~~~~~~~~~~~
Enter number of days:


After that it will print the snapshots for that number of days and you can continue as usual by entering snapshot ids.

Report Names

After you enter the begin and end snapshot ids, sprepins.sql asks for a report name and uses a default like sp_1_2.lst if you do not enter one (where 1 is the begin snapshot id and 2 is the end snapshot id).

To produce a standard report name I remove the report name prompt and use a generated name instead. With this modification I get report names like PROD_22_05_2007_13_30__13_45.LST, where PROD is the instance name. This way I can quickly tell which instance and which time period a report covers: this name says the report is for the instance PROD and spans the period between 22-MAY-2007 13:30 and 22-MAY-2007 13:45.

To make this change remove or comment the following lines in sprepins.sql.

prompt
prompt Specify the Report Name
prompt ~~~~~~~~~~~~~~~~~~~~~~~
prompt The default report file name is &dflt_name.. To use this name,
prompt press <return> to continue, otherwise enter an alternative.

set heading off;
column report_name new_value report_name noprint;
select 'Using the report name ' || nvl('&&report_name','&dflt_name')
, nvl('&&report_name','&dflt_name') report_name
from sys.dual;
spool &report_name;

In place of these lines enter the following.

column begin_time new_value begin_time noprint;
column end_time new_value end_time noprint;

select to_char(snap_time,'DD_MM_YYYY_HH24_MI') begin_time
from stats$snapshot
where snap_id=&begin_snap;

select to_char(snap_time,'_HH24_MI') end_time
from stats$snapshot
where snap_id=&end_snap;

set escape on
spool &&inst_name\_&begin_time\_&end_time;


Be aware that if you take a report for a period spanning two days the report name will be misleading, as I take only the hour-minute part for the end snapshot. This name standardization makes keeping and referencing old reports easier.

The next post will be about changing the "Top 10 Timed Events" section of the statspack report.