Tuesday, May 29, 2007

Modifying sprepins.sql for Custom Statspack Reports II

My last post was about modifying $ORACLE_HOME/rdbms/admin/sprepins.sql which is the script that produces the statspack report.

Another change I make in my sprepins.sql is about the section "Top 5 Timed Events". In 9.2 this section lists the wait events and CPU time and their percentages contributing to the database time. This way we can see which wait event caused most of the waits for that period or if it is CPU time that takes most of the time. 8.1.7 has a section titled "Top 5 Wait Events" and does not include CPU time in this section. I originally made this change for 8.1.7 to include CPU time in that section but I am still using it in 9.2 with a few changes. This last version is tested in 9.2.

In this modification I divide the CPU time into 3 parts as described in Metalink note 232443.1. as "parse time cpu", "recursive cpu usage" and "cpu other". "parse time cpu" is the time spent in parsing, "recursive cpu usage" is the time spent in recursive calls such as plsql calls from sql and "cpu other" is the remaining cpu time. This way when I look at the Top 10 Timed Events section (I list 10 events not 5) I can see which part consumes more time and if it is a problem or not.

After this change you will have a new section titled "Top 10 Timed Events" like below.


Top 10 Timed Events
~~~~~~~~~~~~~~~~~~~ Wait
Event Time (s) % TOTAL TIME
-------------------------------------------- ------------ ------------
SQL*Net message from dblink 4,700 36.01
cpu other 2,928 22.43
recursive cpu usage 2,479 19
SQL*Net more data to client 587 4.49
enqueue 456 3.49
sbtwrite2 372 2.85
async disk IO 354 2.71
log file parallel write 261 2
db file sequential read 244 1.87
single-task message 188 1.44


To make this change find the following lines in sprepins.sql.

set heading on;
repfooter left -
'-------------------------------------------------------------';

--
-- Top Wait Events


Just above this add the following lines.

----------------------------------------TOP 10 TIMED EVENTS--------------------------------------
set heading on;
repfooter left -

col event format a44 heading 'Top 10 Timed Events|~~~~~~~~~~~~~~~~~~~|Event';
col time_waited format 999,999,990 heading 'Wait|Time (s)' just c;
col pct format 999.99 heading '% Total|Time';

SELECT NAME "EVENT",VALUE "TIME_WAITED",ROUND(PCT_TOTAL*100,2) "% TOTAL TIME"
FROM (SELECT NAME, VALUE, ratio_to_report (VALUE) OVER () pct_total
FROM (SELECT s1.NAME, (s1.VALUE - s2.VALUE)/100 VALUE
FROM stats$sysstat s1, stats$sysstat s2
WHERE s1.NAME IN
('recursive cpu usage',
'parse time cpu')
AND s1.NAME = s2.NAME
AND s1.snap_id = :eid
AND s2.snap_id = :bid
UNION ALL
SELECT 'cpu other', diff/100
FROM (SELECT NAME, VALUE,
VALUE
- (LEAD (VALUE, 1) OVER (ORDER BY NAME))
- (LEAD (VALUE, 2) OVER (ORDER BY NAME))
diff
FROM (SELECT s1.NAME, s1.VALUE - s2.VALUE VALUE
FROM stats$sysstat s1, stats$sysstat s2
WHERE s1.NAME IN
('recursive cpu usage',
'parse time cpu',
'CPU used by this session'
)
AND s1.NAME = s2.NAME
AND s1.snap_id = :eid
AND s2.snap_id = :bid))
WHERE NAME = 'CPU used by this session'
UNION ALL
SELECT event, time_waited
FROM (SELECT s1.event,
(s1.time_waited_micro - nvl(s2.time_waited_micro,0))/1000000 time_waited,
ratio_to_report ( s1.time_waited_micro
- nvl(s2.time_waited_micro,0)
) OVER () pct
FROM stats$system_event s1, stats$system_event s2
WHERE s1.event = s2.event
AND s1.snap_id = :eid
AND s2.snap_id = :bid
AND s1.event NOT IN (SELECT event
FROM stats$idle_event
)))
ORDER BY pct_total DESC)
WHERE ROWNUM <= 10; ---------------------------------------END TOP 10 TIMED EVENTS-------------------------


With the help of this section I can see what the top 10 events are including the parts of CPU time and drill down from there to find the cause.

The next post will be about adding a "Rows Processed" column to the "SQL ordered by Gets" section and the removal of the sections "Rollback Segment Stats" and "Rollback Segment Storage" for databases with automatic undo management.

Thursday, May 24, 2007

Modifying sprepins.sql for Custom Statspack Reports I

I use STATSPACK extensively for comparing two time periods, comparing a time period to a baseline, monitoring trends and diagnosing performance problems. It provides important data about the database performance. The code producing statspack reports is open and you can modify it to your needs.

You run $ORACLE_HOME/rdbms/admin/spreport.sql to produce statspack reports. spreport.sql calls sprepins.sql which is the script that produces the report. You can easily modify sprepins.sql to add, remove and change sections in the report. By sections I mean parts in the report that start with headers like "SQL ordered by Gets", "Instance Activity Stats", "Load Profile", etc... I use a modified sprepins.sql to produce statspack reports that suit my needs. I want to share those modifications in a few posts here. You can use these modifications, modify them to your needs or even introduce new changes like these. Note that these changes are tested on 9.2 only.

The first two modifications I made are; entering number of days to list snapshots for and using a standard name for report outputs.

Number of Days

When spreport.sql is run it prints all the available statspack snapshots and asks for a begin snapshot id and an end snapshot id to produce a report that outputs the delta values between these snapshots. If you keeps days, weeks or even months of data in the PERFSTAT schema the listing of all snapshots can take time and you need to wait for a lot of pages to flow. To prevent this I add a new prompt that asks the number of days to list snapshots for (awrrpt.sql which produces the AWR report in 10G does this also). To make this change find the line with this sentence in sprepins.sql.

-- Ask for the snapshots Id's which are to be compared

This is the part where it lists the snapshots. Above this line put these lines.


set termout on
prompt
prompt Specify Number of Days
prompt ~~~~~~~~~~~~~~~~~~~~~~
accept num_days number default 1 prompt 'Enter number of days:'


This will ask you to enter the number of days to list and put your input into the variable num_days. If you do not enter something the default is 1 day. Now change the select statement after these lines to select only the snapshots for num_days number of days. The bold line below is the line you need to add.

select to_char(s.startup_time,' dd Mon "at" HH24:mi:ss') instart_fmt
, di.instance_name inst_name
, di.db_name db_name
, s.snap_id snap_id
, to_char(s.snap_time,'dd Mon YYYY HH24:mi') snapdat
, s.snap_level lvl
, substr(s.ucomment, 1,60) commnt
from stats$snapshot s
, stats$database_instance di
where s.dbid = :dbid
and di.dbid = :dbid
and s.instance_number = :inst_num
and di.instance_number = :inst_num
and di.dbid = s.dbid
and di.instance_number = s.instance_number
and di.startup_time = s.startup_time
and s.snap_time>=trunc(sysdate-&&num_days+1)
order by db_name, instance_name, snap_id;


Now if you save sprepins.sql and run spreport.sql it will ask the number of days:

Specify Number of Days
~~~~~~~~~~~~~~~~~~~~~~
Enter number of days:


After that it will print the snapshots for that number of days and you can continue as usual by entering snapshot ids.

Report Names

After entering the begin snapshot id and the end snapshot id sprepins.sql asks you for a report name and it will use a default report name like sp_1_2.lst if you do not enter a name (where 1 is the begin snapshot id and 2 is the end snapshot id).

To produce a standard report name I remove the report name prompt and use a default name. With this modification I have report names like PROD_22_05_2007_13_30__13_45.LST where PROD is the instance name. This way I can quickly understand which instance and which time period the report is produced for. This name tells that this report is for the instance PROD, reporting the period between 22-MAY-2007 13:30 and 22-MAY-2007 13:45.

To make this change remove or comment the following lines in sprepins.sql.

prompt
prompt Specify the Report Name
prompt ~~~~~~~~~~~~~~~~~~~~~~~
prompt The default report file name is &dflt_name.. To use this name,
prompt press to continue, otherwise enter an alternative.

set heading off;
column report_name new_value report_name noprint;
select 'Using the report name ' || nvl('&&report_name','&dflt_name')
, nvl('&&report_name','&dflt_name') report_name
from sys.dual;
spool &report_name;

In place of these lines enter the following.

column begin_time new_value begin_time noprint;
column end_time new_value end_time noprint;

select to_char(snap_time,'DD_MM_YYYY_HH24_MI') begin_time
from stats$snapshot
where snap_id=&begin_snap;

select to_char(snap_time,'_HH24_MI') end_time
from stats$snapshot
where snap_id=&end_snap;

set escape on
spool &&inst_name\_&begin_time\_&end_time;


Be aware that if you are taking a report for a period spanning two days the report name will be misleading as I only take the hour-minute part for the end snapshot. This name standardization makes keeping and referencing old reports easier.

The next post will be about changing the "Top 10 Timed Events" section of the statspack report.

Monday, May 21, 2007

Performance of Autotrace

I have been having some performance problems when using autotrace in some production systems. I finally had time to check out why.

I enabled sql trace in my autotrace session to see what was causing the slow response. As you may already know setting autotrace on opens another session other than the session you are executing your query in. This second session reports statistics of the main session which executes your query. This second session is the one I have enabled sql trace for.

In the original session, I set the timing on to see the elapsed times and run a simple query which returns immediately.

SQL> set timing on
SQL> select sid from v$mystat where rownum=1;

SID
----------
20892

Elapsed: 00:00:00.00
SQL> select * from dual;

D
-
X

Elapsed: 00:00:00.00

Now I set autotrace on.

SQL> set autot trace

This opens another session for me. You can now query v$session and find out that you have 2 sessions, one with sid=20892 which is the main session and another one with a different sid. I enable sql trace for that other session to see what it is doing. From another session run the following to enable sql trace.

SQL> exec dbms_system.SET_SQL_TRACE_IN_SESSION(133,9526,true);

PL/SQL procedure successfully completed.


Now from the main session I execute the same simple query again and set autotrace off after that.


SQL> select * from dual;

Elapsed: 00:01:14.08

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'DUAL'




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
205 bytes sent via SQL*Net to client
273 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> set autot off
Look at the elapsed time! It is more than 1 minutes with autotrace on. Now when I look at the trace file produced I see that the sql running this long is the following.


SELECT PT.VALUE
FROM
SYS.V_$SESSTAT PT WHERE PT.SID=:1 AND PT.STATISTIC# IN (7,40,41,42,115,239,
240,241,245,246) ORDER BY PT.STATISTIC#


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 4 73.41 73.03 0 0 0 20
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 73.41 73.03 0 0 0 20

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 32

Rows Row Source Operation
------- ---------------------------------------------------
10 SORT ORDER BY
10 FILTER
10 FIXED TABLE FULL X$KSUSESTA
1 FIXED TABLE FULL X$KSUSGIF
The view V_$SESSTAT is based on two fixed tables, named as X$KSUSESTA and X$KSUSGIF. X$KSUSESTA is a large object holding the statistics for all sessions. So, if you have thousands of sessions it may take a long time to query this fixed table and autotrace suffers from that. Here is the number of rows from this view in my instance.

SQL> select count(*) from X$KSUSESTA;

COUNT(*)
----------
7000000