Thursday, December 22, 2005

Touch count bug

I came upon this comment from Jonathan Lewis about a bug related to the touch count: on versions before 10g, tables read by full scan do not get their touch counts incremented, so be aware that they are not kept in the buffer cache no matter how frequently you scan them.

Here is a simple test case in 9.2.0.7.

SQL> create table t as select object_id from user_objects where rownum=1;

Table created.

SQL> select data_object_id from user_objects
  2  where object_name='T';

DATA_OBJECT_ID
--------------
         20505

SQL> select distinct dbms_rowid.rowid_block_number(rowid) from t;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------
                               11970

SQL> begin
  2    for i in 1..10 loop
  3      for t in (select * from t) loop
  4        null;
  5      end loop;
  6    end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> select file#,dbablk,tch from x$bh where obj=20505;

     FILE#     DBABLK        TCH
---------- ---------- ----------
         1      11969          3
         1      11970          0


After scanning the small table 10 times, we see that the touch count for its block is still 0.

So, how can we keep the small parameter tables that are scanned frequently in the cache? One thing to consider is the file system cache, as Jonathan Lewis mentions in the same link. If you are using a file system cache, then it is likely that these tables stay in that cache and can be read back into the database buffer cache cheaply when requested.

Another option is to use a keep pool sized according to the tables we want to keep, and alter those tables to use that pool. This way we can keep the tables cached even though scanning them does not increment the touch count.
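
A sketch of that setup (the 16M figure is only an example, and in 9i db_keep_cache_size can be resized online only if sga_max_size leaves enough headroom):

-- size the keep pool
alter system set db_keep_cache_size = 16M;

-- direct the table's blocks to the keep pool
alter table t storage (buffer_pool keep);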

Thursday, November 24, 2005

Be careful about bind peeking

I have been trying to tune a PL/SQL package that runs for more than 1.5 hours. Looking at the raw SQL trace file, I saw that a select statement was causing thousands of db file sequential read waits for some of the bind variable values. The pseudo-code is like this:


cursor c1 is select <col list> from <master table> order by <col>;
cursor c2(v1 in number) is select <col list> from <table> where <col> = v1;

open c1;
loop
  fetch from c1;
  exit when c1%notfound;
  process c1 row;
  open c2(<value from c1>);
  loop
    fetch from c2;
    exit when c2%notfound;
    process c2 row;
  end loop;
  close c2;
end loop;
close c1;

The problem SQL statement was the select in cursor c2. It turned out that it was performing well for some parameter values and running awfully for others. The reason is that the first time c2 is opened, Oracle produces an execution plan based on the bind value it sees and uses that plan for all subsequent executions of the same statement. This is called bind variable peeking. So if the next value for the bind variable is not similar to the peeked one in terms of statistics (number of matching rows, etc...), the same plan may not be appropriate for it.

If you have a column with a skewed data distribution and a histogram on it, and you are using bind variables against that column, you may suffer from the same problem.

Here is a very very simplified test case of the problem.

SQL> create table bind_test as select 'MANY' object_name,object_id from all_objects;

Table created.

SQL> insert into bind_test select * from bind_test;

24221 rows created.

SQL> r
1* insert into bind_test select * from bind_test

48442 rows created.

SQL> r
1* insert into bind_test select * from bind_test

96884 rows created.

SQL> commit;

Commit complete.

SQL> create index bind_test_ind on bind_test(object_name);

Index created.

SQL> insert into bind_test select 'FEW',object_id from all_objects where rownum<=10;

10 rows created.

SQL> commit;

Commit complete.

SQL> select object_name,count(*) from bind_test group by object_name;

OBJE   COUNT(*)
---- ----------
FEW          10
MANY     193768

SQL> EXEC dbms_stats.gather_table_stats(ownname=>null,tabname=>'BIND_TEST',cascade=>true,method_opt=>'for all indexed columns size 2');

PL/SQL procedure successfully completed.

Now I have a table with many rows that have ‘MANY’ as the object_name, very few rows that have ‘FEW’, and a histogram on the object_name column.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> declare
  2    b1 varchar2(4);
  3  begin
  4    b1 := 'FEW';
  5    for t in (select * from bind_test where object_name=b1) loop
  6      null;
  7    end loop;
  8    b1 := 'MANY';
  9    for t in (select * from bind_test where object_name=b1) loop
 10      null;
 11    end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

I run the same SQL first for the value ‘FEW’ and then for the value ‘MANY’. The trace output shows:

SELECT *
FROM
BIND_TEST WHERE OBJECT_NAME=:B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.01       0.01          0          0          0           0
Fetch   193770      5.33       5.70        398     387539          0      193768
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   193774      5.34       5.71        398     387539          0      193768

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 27 (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  TABLE ACCESS BY INDEX ROWID BIND_TEST
      0   INDEX RANGE SCAN BIND_TEST_IND (object id 82756)

It uses the index on object_name, which is good for the value ‘FEW’ but bad for the value ‘MANY’, because when the statement was parsed the bind variable had the value ‘FEW’ and the execution plan was produced for that value. Subsequent executions of the statement used the same plan even though the bind variable’s value had completely different statistics.

If you have a similar situation, you can use literals instead of bind variables. For this, you either hard-code the values as literals into the statements, or use dynamic SQL and build the statement with literals at run time (a sketch follows below). Be aware of the consequences (hard parsing, many different SQL texts in the shared pool, etc...).
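
A minimal sketch of the dynamic SQL approach against the test table above (only sensible for trusted, low-cardinality values, because every distinct value causes a hard parse and a new cursor in the shared pool):

declare
  type t_cur is ref cursor;
  c   t_cur;
  rec bind_test%rowtype;
  v   varchar2(4) := 'FEW';
begin
  -- the value is embedded as a literal, so 'FEW' and 'MANY' become
  -- two distinct SQL texts, each optimized on its own
  open c for 'select * from bind_test where object_name = ''' || v || '''';
  loop
    fetch c into rec;
    exit when c%notfound;
    null; -- process the row
  end loop;
  close c;
end;
/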

If the performance of the query for most of the bind values is fine with one specific plan and only negligibly worse for the other values under that same plan (unlike the simple example above), you can use hints to force that plan for all of the bind values. Or you can use stored outlines instead of hints (a sketch follows below). This was the way I chose: I created a stored outline to force the same plan for all bind values, and the execution time dropped from more than an hour to less than 15 minutes.
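
A sketch of the outline approach (9i syntax; the outline and category names are illustrative, and creating an outline requires the create any outline privilege). The outline records whatever plan the optimizer produces when it is created, so verify that it is the plan you want before enabling it:

create or replace outline bind_test_ol
  for category bind_fix
  on select * from bind_test where object_name = :b1;

-- sessions that should use the stabilized plan
alter session set use_stored_outlines = bind_fix;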

Are there any other solutions to the bind peeking problem that I have not remembered?

Tuesday, November 15, 2005

Bulk Collect

Executing SQL statements in PL/SQL programs causes a context switch between the PL/SQL engine and the SQL engine. Too many context switches may degrade performance dramatically. In order to reduce the number of these switches we can use a feature named bulk binding. Bulk binding lets us transfer rows between the SQL engine and the PL/SQL engine as collections, and it is available for select, insert, delete and update statements; a small sketch of the DML side follows.
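
For insert, update and delete, bulk binding is expressed with forall. A minimal sketch (some_table and its id column are illustrative):

declare
  type t_ids is table of number;
  ids t_ids := t_ids(10, 20, 30);
begin
  -- one context switch ships all the deletes to the SQL engine,
  -- instead of one switch per row
  forall i in 1 .. ids.count
    delete from some_table where id = ids(i);
end;
/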

Bulk collect is the bulk binding syntax for select statements. All examples below are simplified versions of live cases I have seen.

One thing I usually come across is that developers tend to process data by declaring a cursor, opening it, fetching from it row by row in a loop, and processing each row as it is fetched.

declare
  cursor c1 is select <column list> from <table_name>;
  rec1 c1%rowtype;
begin
  open c1;
  loop
    fetch c1 into rec1;
    exit when c1%notfound;
    -- process rows...
  end loop;
  close c1;
end;

Here is a simple test case to compare the performance of fetching row by row and using bulk collect to fetch all rows into a collection.
SQL> create table t_all_objects as select * from all_objects;

Table created.

SQL> insert into t_all_objects select * from t_all_objects;

3332 rows created.

SQL> r
1* insert into t_all_objects select * from t_all_objects

6664 rows created.

---replicated a couple of times

SQL> select count(*) from t_all_objects;

  COUNT(*)
----------
    213248

SQL> declare
  2    cursor c1 is select object_name from t_all_objects;
  3    rec1 c1%rowtype;
  4  begin
  5    open c1;
  6    loop
  7      fetch c1 into rec1;
  8      exit when c1%notfound;
  9      null;
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:44.75

SQL> declare
  2    cursor c1 is select object_name from t_all_objects;
  3    type c1_type is table of c1%rowtype;
  4    rec1 c1_type;
  5  begin
  6    open c1;
  7    fetch c1 bulk collect into rec1;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.32

As can be clearly seen, bulk collecting the rows shows a huge performance improvement over fetching row by row.

The above method (which fetches all the rows at once) may not be applicable to all cases. When there are many rows to process, we can limit the number of rows to bulk collect, process those rows, and fetch again; otherwise process memory grows with every row fetched.
SQL> declare
  2    cursor c1 is select object_name from t_all_objects;
  3    type c1_type is table of c1%rowtype;
  4    rec1 c1_type;
  5  begin
  6    open c1;
  7    loop
  8      fetch c1 bulk collect into rec1 limit 200;
  9      for i in 1..rec1.count loop
 10        null;
 11      end loop;
 12      exit when c1%notfound;
 13    end loop;
 14  end;
 15  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:04.07

Detailed documentation for bulk collect can be found in Chapter 5 of the PL/SQL User’s Guide and Reference. The examples above are from 9.2.

Wednesday, October 12, 2005

Synonyms for remote synonyms

In my last post I mentioned the SQL*Net message from client waits in Oracle Forms. I was working on the same problem and found out that it was related to the synonym the form accesses.

I have a synonym that points to a remote synonym in another database, which in turn points to a table in another schema of that remote database.

In the remote database, I have a table owned by user_b which I want to select from user_a. So, I create a synonym for it in user_a's schema.

User_A: create synonym tab1 for user_b.tab1; --(assuming select privilege on user_b.tab1 has been granted to user_a)

In the local database (which the form connects):

create database link remotedb.world
  connect to user_a identified by user_a
  using 'remotedb';

create synonym tab1 for tab1@remotedb.world;

Then I create a simple form which connects to the user in the local database and uses tab1 as its base table. The form queries one row from the table and updates that row. The SQL trace output for this form (the trace contains a select for update of one row and an update of that row) shows:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     167        0.00          0.00
  SQL*Net message from client                   167        0.07          0.26
  SQL*Net message to dblink                      59        0.00          0.00
  SQL*Net message from dblink                    59        0.09          0.23
  latch free                                      1        0.00          0.00
  log file sync                                   1        0.12          0.12

I see 167 waits for the SQL*Net message from client event and 59 waits for the SQL*Net message from dblink event. The time waited here gets much worse for remote clients where bandwidth is limited.

In the local database, if I recreate the synonym specifying the table owner explicitly, there is a boost in performance:

create or replace synonym tab1 for user_b.tab1@remotedb.world;

When I run the form again:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      14        0.00          0.00
  SQL*Net message from client                    14        0.31          0.48
  SQL*Net message to dblink                       8        0.00          0.00
  SQL*Net message from dblink                     8        0.02          0.06

Look at the number of waits: they are far fewer than before.

So, if you are using Oracle Forms with a synonym pointing to a remote synonym as your base table, check whether that synonym is created with the table owner specified. As you see, we got a huge performance improvement just by recreating the synonym with the table owner specified.
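
One quick way to check is the dictionary; for a synonym over a database link, a null table_owner means the remote side has to resolve the name itself (a sketch):

select synonym_name, table_owner, table_name, db_link
  from user_synonyms
 where synonym_name = 'TAB1';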

(Both the local and remote databases are 9.2.0.5 in this test).


Tuesday, September 20, 2005

SQL*Net message from client

SQL*Net message from client is generally considered an idle event, as the Oracle server process is just waiting for the client process, and some DBAs pay no attention to it. It is good that this seems to be changing.

The following case shows the importance of this event.
A user queries a simple base-table block form which gets one row from a table (which contains that single row), changes one column and commits the change. The user complains about the performance of this form, saying he always waits 5-6 seconds for this update. The SQL trace for the session shows the following statements:

Select row from table;
Select row from table where rowid=:1 for update;
Update table set col=value where rowid=:2;

Each statement’s statistics show an elapsed time of 0.00. But here are the totals at the end of the trace file:

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        4      0.01       0.00          0          0          0           0
Execute      7      0.00       0.01          0          0          0           1
Fetch        5      0.00       0.00          0         12          0           5
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       16      0.01       0.02          0         12          0           6

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     172        0.00          0.00
  SQL*Net message from client                   172        0.68          6.20
  latch free                                      1        0.00          0.00
  SQL*Net message to dblink                       8        0.00          0.00
  SQL*Net message from dblink                     8        0.00          0.01
  log file sync                                   1        0.00          0.00

Here we see why he talks about 5 seconds of response time for this simple form: SQL*Net message from client waits account for over 6 seconds of elapsed time, across 172 waits. Clearly no action other than reducing the SQL*Net message from client waits can improve performance here. When we stopped using the base-table block, populated the block with a “select col_list into ...” statement and then updated the row ourselves, the number of SQL*Net message from client waits dropped from 172 to 36 and the response time from 6.20 seconds to about 1 second.
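
The replacement logic in the form was roughly the following (a sketch; blk, col and key are illustrative names):

-- populate the item with a single query instead of
-- base-table block processing
select col
  into :blk.col
  from tab
 where key = :blk.key;

-- later, apply the user's change directly by key
update tab
   set col = :blk.col
 where key = :blk.key;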

Tuesday, September 13, 2005

How to make $1 million

Look at this idea for making one million dollars: a British student launched a web page and sells pixels. Here is what he is trying to do. Let’s see if he will be able to make it. Seems so. (from The Guardian online blog)

Monday, September 12, 2005

Oracle buys Siebel

Oracle buys Siebel and becomes No. 1 in the CRM market.

Thursday, September 01, 2005

New Orleans hit by Katrina

Katrina hits New Orleans. I was there in May. It is sad to see such a beautiful city like this. I love the city and its people. It is ironic that the most famous drink of the city is also named the Hurricane.

We do not see hurricanes here in Turkey, but we know earthquakes well. The quakes in 1999 left thousands of people dead and many more homeless.

I can understand what the people there are going through. Hard times.

Friday, August 26, 2005

Configuring Shared Server

We have recently configured shared server in some of our databases. Steps to follow to enable shared server are below.

The following configuration is on Enterprise Edition 9.2.0.5. The numbers used in the settings shown are only examples. We are running an OLTP system with about 2000 concurrent users with these parameters without any problem.

The parameters you need to set for the shared server and their default values are:

NAME                           VALUE
------------------------------ ----------
circuits                       0
dispatchers
max_dispatchers                5
max_shared_servers             20
mts_circuits                   0
mts_dispatchers
mts_max_dispatchers            5
shared_server_sessions         0
shared_servers                 0

The ones starting with mts_ exist for backward compatibility; setting circuits, dispatchers and max_dispatchers or mts_circuits, mts_dispatchers and mts_max_dispatchers has the same effect, so the choice is up to you.

If we go over these parameters one by one:

Circuits: This is a static parameter and defaults to the value of the sessions parameter when shared server is used. You can leave it at the default if your users do not use database links extensively, because every session connected through the shared server adds to the circuit count for each distinct database link it opens. In our case we had sessions=3500 and circuits=3500 derived from it, but each user queried over two or three database links, so when the session count reached about 1500 we exhausted the circuits and began to get “error 18 creating virtual circuit” errors. So you may need to set this parameter explicitly to a higher value if you are using database links.

Dispatchers: This parameter is said to be dynamic in the Database Reference, but if it is null in your database, as in the settings above, you cannot set it dynamically.

SQL> alter system set dispatchers='(protocol=tcp)(dispatchers=2)';
alter system set dispatchers='(protocol=tcp)(dispatchers=2)'
*
ERROR at line 1:
ORA-00105: dispatching mechanism not configured for network protocol
(ADDRESS=(PARTIAL=YES)(PROTOCOL=tcp))

You cannot even set it with scope=spfile.

SQL> alter system set dispatchers='(protocol=tcp)(dispatchers=2)' scope=spfile;
alter system set dispatchers='(protocol=tcp)(dispatchers=2)' scope=spfile
                                                             *
ERROR at line 1:
ORA-02065: illegal option for ALTER SYSTEM

If you have it null on your system, the only way to set it is to create a pfile from the spfile, add the dispatchers line to the pfile, shut down the database, create a new spfile from the pfile and start up.

Shared_servers: This is a dynamic parameter and defaults to 1 when shared server is used.

Max_shared_servers: This is a static parameter and defaults to 20 or 2*shared_servers, whichever is greater.

The steps we followed were:

SQL> alter system set circuits = 10000 scope=spfile;

System altered.

SQL> alter system set max_shared_servers=200 scope=spfile;

System altered.

SQL> alter system set shared_servers=50 scope=spfile;

System altered.

SQL> create pfile from spfile;                                  

File created.

We then inserted the following line into the pfile created:

dispatchers='(protocol=tcp)(dispatchers=5)'

Then shut down, create the spfile from the pfile and start the instance up.
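
After startup, a quick sanity check might look like this (a sketch):

-- dispatchers and shared servers that came up
select name, status from v$dispatcher;
select name, status from v$shared_server;

-- confirm sessions really come in through the shared server
select server, count(*) from v$session group by server;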

Thursday, August 25, 2005

TARs

I wonder if Oracle is aware of users’ feelings about the quality of support for TARs. I do everything I can to save myself from opening a TAR and dealing with it. I open a TAR, Support requests some logs and traces, I supply them. Two weeks pass, I update the TAR asking for a response; they ask for more traces. Two more weeks pass, and again I ask for a response. Then look at what Support says:
“We have noticed that the logs you sent are missing some lines, can you upload all the logs available?”
This is where words end. Every TAR becomes a new challenge to deal with. I do not know what Oracle thinks about the quality of its support.

Monday, August 22, 2005

Magic Exceptions Into

One of our developers sent me the code of a procedure that removes duplicate records from a table. It was something like this:

cursor c1 is select * from tab1;
r1 c1%rowtype;
...
open c1;
loop
  fetch c1 into r1;
  exit when c1%notfound;
  delete from tab1
   where col1 = r1.col1
     and rownum = 1;
  commit;
end loop;
close c1;

This was supposed to delete about 600M rows. Surely this is not the way to remove duplicate records from a table.

There is more than one way to remove duplicate records. Here is a simplified test case.

create table t_dups as select * from all_objects;
insert into t_dups select * from t_dups;

DELETE FROM t_dups a
 WHERE ROWID <> (SELECT MAX(ROWID)
                   FROM t_dups b
                  WHERE a.object_id = b.object_id);

This does the work in a reasonable time if there is an index on object_id. But in this case we did not have an index on the column. Without an index on object_id, here are the statistics:

DELETE FROM t_dups a
 WHERE ROWID <> (SELECT MAX(ROWID) FROM t_dups b WHERE a.object_id = b.object_id)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.06          0          0          0           0
Execute      1    342.16     335.17          0    2306810       7113        6396
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2    342.17     335.23          0    2306810       7113        6396



Instead of creating an index on a big table just for the purpose of removing duplicates, we can use the exceptions into clause. The EXCEPTIONS table can be created using $ORACLE_HOME/rdbms/admin/utlexcpt.sql.
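
For reference, the table that script creates looks like this (as in the 9i version of utlexcpt.sql):

create table exceptions(row_id     rowid,
                        owner      varchar2(30),
                        table_name varchar2(30),
                        constraint varchar2(30));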

alter table t_dups add constraint cons01 unique(object_id) exceptions into exceptions;
*
ERROR at line 1:
ORA-02299: cannot validate (CONS01) - duplicate keys found

This attempt fails and records the rowids of the duplicate rows in the EXCEPTIONS table. The thing to be careful about is that it records the rowids of all the duplicates, including the rows we want to keep: if you have two rows with the same object_id, both are in EXCEPTIONS, not just one of them.
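
A quick way to see this in the test case above (a sketch; since every row has exactly two copies here, the exception count is twice the number of duplicated keys):

select count(*) total_exceptions,
       count(distinct t.object_id) duplicated_keys
  from exceptions e, t_dups t
 where t.rowid = e.row_id;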

Since we have the rowids of the duplicate rows, we can remove one row of each duplicate pair using the following SQL:

delete from t_dups
 where rowid in (select minrid
                   from (select object_id, min(e.row_id) minrid
                           from exceptions e, t_dups t
                          where t.rowid = e.row_id
                          group by object_id));



call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.10          0          1          0           0
Execute      1      2.27       2.27          0      19429       7247        6396
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      2.30       2.38          0      19430       7247        6396


2.38 seconds compared to 335.23 seconds! So, the exceptions into clause is a good alternative for removing duplicates when we do not have an index on the related columns.

First Post and Blogger Word Add-in

I have been thinking about starting a blog about my day-to-day Oracle work for a while, trying to make up my mind about what I would post. The hard thing is to begin; continuing to post will be easier, I hope.

I will try to publish anything I find interesting, mostly related to the Oracle database, and occasionally non-Oracle stuff. Any corrections to my English are welcome. So, here is one more blog added to the Oracle blogosphere.

Last week Google provided a free Word add-in for publishing blog posts. This post was written with it.