Thursday, May 24, 2012

Strange ORA-14404, or not?

I was trying to drop a tablespace which I knew had no segments in it. A simple query on dba_segments returned no rows, meaning no segments are allocated in this tablespace. But strangely I got this:


SQL> drop tablespace psapsr3old including contents and datafiles;
drop tablespace psapsr3old including contents and datafiles
*
ERROR at line 1:
ORA-14404: partitioned table contains partitions in a different tablespace

How come I cannot drop a tablespace with no segments in it?

Enter deferred segment creation. Things were simpler in 9i and 10g. The database I got this error on is an 11.2.0.2 database, and starting with 11.2.0.1 things may fool you when it comes to segments in the database. 11.2 brought us a feature called "deferred segment creation", which means no segments are created for the tables you create until you insert data into them. Have a look at the note "11.2 Database New Feature Deferred Segment Creation [Video] (Doc ID 887962.1)" about this feature. It is there to save disk space in case you have lots of tables without data in them. In 11.2.0.1 it applied only to non-partitioned heap tables; starting with 11.2.0.2 it is also used for partitioned tables.
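A minimal sketch of the behavior (the table name is made up and the output is abbreviated, but this is what you see on an 11.2 database with deferred_segment_creation at its default of TRUE):

```sql
SQL> create table t_deferred (id number) tablespace psapsr3old;

Table created.

-- No segment has been allocated yet
SQL> select count(*) from dba_segments where segment_name='T_DEFERRED';

  COUNT(*)
----------
         0

-- But the table exists in the dictionary, with no segment created
SQL> select segment_created from dba_tables where table_name='T_DEFERRED';

SEG
---
NO
```

The segment appears in dba_segments only after the first row is inserted into the table.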


Coming back to my problem: even though dba_segments reported no segments, there are tables and partitions created in this tablespace whose segments have not been created yet. Let's look at the tables and partitions in that tablespace:

SQL> select segment_created,count(*) from dba_tables
  2  where tablespace_name='PSAPSR3OLD'
  3  group by segment_created;


SEG   COUNT(*)
--- ----------
NO       13482


SQL> select segment_created,count(*) from dba_tab_partitions
  2  where tablespace_name='PSAPSR3OLD'
  3  group by segment_created;


SEGM   COUNT(*)
---- ----------
NO         1237

There are thousands of objects in there.

What is the solution then?

Obviously it is to get rid of these objects by moving them to a different tablespace. The standard "alter table move" and "alter table move partition" commands do the job. Then the question becomes: will a move operation create the segment in the new tablespace? If you are on 11.2.0.1, yes it will, defeating the whole purpose of this feature. If you are on 11.2.0.2 it will not create the segments. This is explained in the note "Bug 8911160 - ALTER TABLE MOVE creates segment with segment creation deferred table (Doc ID 8911160.8)".
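With thousands of objects you will want to generate the move commands from the dictionary rather than type them. Something like this would do it (a sketch; PSAPSR3NEW is a made-up target tablespace name, and remember that any indexes with segments on moved tables would need a rebuild afterwards):

```sql
select 'alter table '||owner||'.'||table_name||
       ' move tablespace PSAPSR3NEW;'
from dba_tables
where tablespace_name='PSAPSR3OLD'
union all
select 'alter table '||table_owner||'.'||table_name||
       ' move partition '||partition_name||' tablespace PSAPSR3NEW;'
from dba_tab_partitions
where tablespace_name='PSAPSR3OLD';
```

Spool the output to a script and run it against the database.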


After everything is moved you can safely drop the tablespace without this error.


UPDATE: Gokhan Atil made me aware of Randolf Geist's post about the same issue. See that post here.

Thursday, February 24, 2011

Direct NFS Clonedb

Direct NFS Clonedb is a feature in 11.2 that you can use to clone your databases. Kevin Closson explains what it is in this post. In his demo videos he is using a perl script to automate the process of generating the necessary scripts. That script is not publicly available as of today but the MOS note 1210656.1 explains how to do the clone manually without the perl script.

Tim Hall also has a step by step guide on how to do the cloning in this post. He also uses the perl script mentioned above.

We have been testing backups and clones on Exadata connected to a 7410 ZFS Storage Appliance, and I wanted to share our test of Direct NFS Clonedb. This test is on a quarter rack X2-2 connected to a 7410 storage via InfiniBand. A RAC database will be cloned as a single instance database and the clone database will be opened on one db node.

Enable Direct NFS on Exadata


For security reasons, the default Exadata installation as of today disables some services needed by NFS. To use NFS on the Exadata db nodes we enabled those services first.


service portmap start
service nfslock start
chkconfig --level 345 portmap on
chkconfig --level 345 nfslock on

Recent Exadata installations come with Direct NFS (dNFS) enabled. You can check if it is enabled by looking at the database alert log; when the database is started you will see this line in the alert log if dNFS is enabled.

Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0

If it is not enabled you can use this command after stopping the database to enable it.

make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk dnfs_on
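Besides the alert log line, once the database has accessed a file over an NFS mount you can verify that dNFS is actually serving the I/O by querying the dNFS views (these views stay empty until a file is opened through dNFS):

```sql
-- NFS servers the database is talking to through dNFS
select svrname, dirname from v$dnfs_servers;

-- files currently open through dNFS
select filename from v$dnfs_files;
```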

Mount the NFS share

I am assuming 7410 is configured and NFS sharing is up on it at this point. To mount the NFS share you can use a mount command like this on Exadata db nodes.

mount 192.168.27.105:/export/backup1 /backup -o rw,bg,hard,intr,rsize=131072,wsize=131072,nodev,nosuid,actimeo=0

Back up the source database

You can use OS copies or RMAN image copies to back up the database for use in the cloning process. Here are the commands we used; do not forget to create the target directory beforehand.

sql 'alter database begin backup';
backup as copy database format '/backup/clone_backup/%U';
sql 'alter database end backup';

Prepare the clone db

To start the clone database we need an init.ora file and a create controlfile script. You can back up the source database's control file to a text file and use it. Run this in the source database to get the script; it will produce a trace file under the udump directory (/u01/app/oracle/diag/rdbms/dbm/dbm1/trace on Exadata).

SQL> alter database backup controlfile to trace;

Database altered.

After editing the script this is the one we can use for the clone database.

CREATE CONTROLFILE REUSE SET DATABASE "clone" RESETLOGS  NOARCHIVELOG
    MAXLOGFILES 1024
    MAXLOGMEMBERS 5
    MAXDATAFILES 32767
    MAXINSTANCES 32
    MAXLOGHISTORY 33012
LOGFILE
  GROUP 1 (
    '/u01/app/oradata/clone/redo01.log'
  ) SIZE 4096M BLOCKSIZE 512,
  GROUP 2 (
    '/u01/app/oradata/clone/redo02.log'
  ) SIZE 4096M BLOCKSIZE 512,
  GROUP 3 (
    '/u01/app/oradata/clone/redo03.log'
  ) SIZE 4096M BLOCKSIZE 512
DATAFILE
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-UNDOTBS1_FNO-3_bnm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-SYSTEM_FNO-1_blm5ajro',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-SYSAUX_FNO-2_bmm5ajro',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-UNDOTBS2_FNO-4_bom5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-GNS_DATA01_FNO-7_bqm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-5_bpm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-9_bsm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-10_btm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-8_brm5ajrp'
CHARACTER SET AL32UTF8
;

/u01/app/oradata/clone is a directory on the local disks; you can also use NFS for the redo logs if you want to. The DATAFILE section lists the image copies we have just produced using RMAN. You can get this list using the following sql. Be careful about the completion time because you may have previous image copies in the same directory.

select name,completion_time from V$BACKUP_COPY_DETAILS;

Now we need an init.ora file; we can just copy the source database's file and edit it.

SQL> create pfile='/backup/clone.ora' from spfile;

File created.

Since the source database is a RAC database you need to remove parameters related to RAC (like cluster_database, etc...). You also need to change the paths to reflect the new clone database, like in the parameter control_files. Here is the control_files parameter in this test.

*.control_files='/u01/app/oradata/clone/control.ctl'

I also use a local directory, not NFS, for the control file.

There is one parameter you need to add when cloning a RAC database to a single instance database.

_no_recovery_through_resetlogs=TRUE

If you do not set this parameter you will get an error when you try to open the clone database with resetlogs. MOS note 334899.1 explains why we need to set it. This is the error you will get when opening the database without it:

RMAN> sql 'alter database open resetlogs';

sql statement: alter database open resetlogs
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of sql command on default channel at 02/22/2011 16:13:07
RMAN-11003: failure during parse/execution of SQL statement: alter database open resetlogs
ORA-38856: cannot mark instance UNNAMED_INSTANCE_2 (redo thread 2) as enabled

Now we are ready to create the clone database.

Create the clone db

After preparing the init.ora file and the create controlfile script we can create the database.

export ORACLE_SID=clone
SQL> startup nomount pfile='/backup/clone.ora';
ORACLE instance started.

Total System Global Area 3991842816 bytes
Fixed Size                  2232648 bytes
Variable Size             754978488 bytes
Database Buffers         3087007744 bytes
Redo Buffers              147623936 bytes

SQL>  @cr_control

Control file created.

Now we need to rename the datafiles and point them to the NFS location we want. dbms_dnfs is the package that does this.

begin
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-UNDOTBS1_FNO-3_bnm5ajrp','/backup/oradata/undotbs1.263.740679581');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-SYSTEM_FNO-1_blm5ajro','/backup/oradata/system.261.740679559');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-SYSAUX_FNO-2_bmm5ajro','/backup/oradata/sysaux.262.740679571');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-UNDOTBS2_FNO-4_bom5ajrp','/backup/oradata/undotbs2.265.740679601');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-GNS_DATA01_FNO-7_bqm5ajrp','/backup/oradata/gns_data01.264.741356977');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-5_bpm5ajrp','/backup/oradata/users.266.740679611');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-9_bsm5ajrp','/backup/oradata/users.274.741357097');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-10_btm5ajrp','/backup/oradata/users.275.741357121');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-8_brm5ajrp','/backup/oradata/users.273.741357075');
end;
/




The first parameter to dbms_dnfs.clonedb_renamefile is the backup image copy name we set in the controlfile script; the second parameter is the target filename, which must reside on NFS. You can create this script using the following sql on the source database.

select 'dbms_dnfs.clonedb_renamefile('''||b.name||''',''/backup/oradata/'||
       substr(d.file_name,instr(d.file_name,'/',-1)+1)||''');'
from v$backup_copy_details b,dba_data_files d
where b.file#=d.file_id
and b.completion_time>sysdate-1/24;

If you have multiple image copies be careful about the completion_time predicate. In this example I am looking at the image copies of the last hour.

If the target location in the dbms_dnfs call is not on NFS here is what you will get:

SQL> exec dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-10_b1m59uem','/u01/app/oradata/clone/users.275.741357121');
BEGIN dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-10_b1m59uem','/u01/app/oradata/clone/users.275.741357121'); END;

*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-17513: dNFS package call failed
ORA-06512: at "SYS.X$DBMS_DNFS", line 10
ORA-06512: at line 1

I got this error on my first try and searched MOS and Google for some time with no results. Then I realized that my target location was not on NFS but on local disk.

Now that the datafiles are renamed and pointing at the NFS location, we can open the clone database.

SQL> alter database open resetlogs;

Database altered.

At this point we have a clone database ready to use.

The target directory we used shows our files.

[oracle@dm01db01 oradata]$ ls -l
total 98523
-rw-r-----+ 1 oracle dba 10737426432 Feb 22 16:44 gns_data01.264.741356977
-rw-r-----+ 1 oracle dba 17179877376 Feb 22 16:44 sysaux.262.740679571
-rw-r-----+ 1 oracle dba 17179877376 Feb 22 16:44 system.261.740679559
-rw-r-----+ 1 oracle dba 17179877376 Feb 22 16:44 undotbs1.263.740679581
-rw-r-----+ 1 oracle dba 17179877376 Feb 22 16:44 undotbs2.265.740679601
-rw-r-----+ 1 oracle dba 10737426432 Feb 22 16:44 users.266.740679611
-rw-r-----+ 1 oracle dba  2147491840 Feb 22 16:44 users.273.741357075
-rw-r-----+ 1 oracle dba  2147491840 Feb 22 16:44 users.274.741357097
-rw-r-----+ 1 oracle dba  2147491840 Feb 22 16:44 users.275.741357121

Even though ls shows their size equal to the source database file sizes, du shows us the truth.

[oracle@dm01db01 oradata]$ du -hs *
129K    gns_data01.264.741356977
74M     sysaux.262.740679571
2.8M    system.261.740679559
15M     undotbs1.263.740679581
5.4M    undotbs2.265.740679601
129K    users.266.740679611
129K    users.273.741357075
129K    users.274.741357097
129K    users.275.741357121

The datafile sizes are minimal since we have not done any write activity on the clone database yet. These sizes will grow after some activity.
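You can see the same sparseness from inside the clone database: the dictionary reports the full allocated sizes (matching ls), while du shows the blocks actually written. A quick check, assuming the clone is open:

```sql
-- allocated size per datafile as the database sees it
select file_name, bytes/1024/1024 as alloc_mb
from dba_data_files
order by file_name;
```

Comparing alloc_mb with the du output above makes the copy-on-write nature of Clonedb obvious.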

Tuesday, January 04, 2011

Database smart flash cache wait events

When you enable the database smart flash cache and start using it you will see new wait events related to it. These events help you find out whether a problem is related to the flash cache or not.

The ones I have seen so far are "db flash cache single block physical read", "db flash cache multiblock physical read" and "write complete waits: flash cache". These are from an 11.2.0.1 database using the F5100 flash array as the database smart flash cache.
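You can list all the wait events related to the flash cache from v$event_name:

```sql
select name, wait_class
from v$event_name
where lower(name) like '%flash cache%';
```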

db flash cache single block physical read

"db flash cache single block physical read" is the flash cache equivalent of "db file sequential read". Read waits from the flash cache are not accounted for in the "db file sequential read" event and have their own wait event. The following is from an AWR report of 30 mins from a database using the database smart flash cache.


Top 5 Timed Foreground Events
    Event                                       Waits       Time(s)  Avg wait (ms)  % DB time  Wait Class
    db file sequential read                     649,576      15,469             24      47.33  User I/O
    DB CPU                                                   10,451                    31.98
    db flash cache single block physical read   11,217,303    5,044              0      15.43  User I/O
    buffer busy waits                           71,323        1,845             26       5.65  Concurrency
    log file switch (checkpoint incomplete)     2,305           251            109       0.77  Configuration


    There are over 11 million "db flash cache single block physical read" waits which took about 0.44ms on average (AWR reports it as 0ms). "db file sequential read" waits are over 600,000. This means we had a high flash cache hit ratio, most of the reads were coming from the flash cache, not the disks.

    This is the wait event histogram from the same AWR report.




                                                            % of Waits
    Event                                      Total Waits  <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
    db flash cache single block physical read        11.2M  99.0    .9    .1    .0     .0     .0    .0


    99% of all flash cache single block reads were under 1ms, and none of them were over 4ms.

    db flash cache multiblock physical read


    "db flash cache multiblock physical read" is the flash cache equivalent of "db file scattered read". It is the event we see when we are reading multiple blocks from the flash cache. The AWR report I am using in this post does not contain many multiblock operations but here is the event from the foreground wait events section.


    Event                                    Waits  %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits /txn  % DB time
    db flash cache multiblock physical read  1,048           0                    1              1        0.00       0.00


    We had 1048 waits of 1ms on average.



     
                                                          % of Waits
    Event                                    Total Waits  <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
    db flash cache multiblock physical read         1171  83.2  12.4   3.2    .9     .3




    The wait event histogram shows that again most of the waits are below 1ms with some being up to 16ms.

    write complete waits: flash cache

    This is the wait event we see when DBWR is trying to write a block from the buffer cache to the flash cache and a session wants to modify that block. The session waits on this event and goes on with the update after DBWR finishes its job. You can find a good explanation and a good example of this in Guy Harrison's post. Here is the event and its histogram from the same AWR report.


    Event                              Waits  %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits /txn  % DB time
    write complete waits: flash cache    345           0                    1              4        0.00       0.00





                                                    % of Waits
    Event                              Total Waits  <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
    write complete waits: flash cache          345  22.9  23.2  27.2  19.7    5.5     .3   1.2


    I had 345 waits in 30 mins with an average time of 4ms. But the important thing is that the contribution of this event to DB time was only 1 second in a 30 minute workload.

    This event can start climbing in some situations, especially if you are having a problem with your flash cache device and writes to it start taking longer.

    Here is one case where poor DBWR processes are struggling to write to the flash cache. Database control was showing high waits in the "Configuration" class and those waits were "write complete waits", "free buffer waits" and "write complete waits: flash cache". This was all because my flash cache device was a single conventional HDD, not even an SSD. After changing the db_flash_cache_file parameter to use the F5100 array this picture was not seen anymore.



    Thursday, December 30, 2010

    Sizing the SGA or the buffer cache when enabling the database flash cache

    The documentation about the database flash cache feature recommends increasing db_cache_size, sga_target or memory_target (whichever you are using) to account for the metadata of the blocks kept in the flash cache: for each block in the flash cache, some metadata is kept in the buffer cache. The recommendation is to add 100 bytes per block for a single instance database and 200 bytes per block for RAC, multiplied by the number of blocks that can fit in the flash cache. So you need to calculate how many blocks the flash cache can hold (by dividing db_flash_cache_size in bytes by the block size) and increase the buffer cache accordingly.
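    As a worked example, for a 20g flash cache with an 8K block size on a single instance database (the documented 100 bytes per block figure), the arithmetic can be done right in SQL:

```sql
-- number of blocks that fit in a 20g flash cache with 8K blocks,
-- and the documented metadata overhead (100 bytes per block) in MB
select 20*1024*1024*1024/8192 as flash_blocks,
       round(20*1024*1024*1024/8192*100/1024/1024) as metadata_mb
from dual;

FLASH_BLOCKS METADATA_MB
------------ -----------
     2621440         250
```

    So by the documented figure you would add roughly 250m to the buffer cache for this configuration; as measured below, the actual overhead turns out to be 128 bytes per block, i.e. 320m.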

    If you do not do this and the buffer cache size is too low, it is automatically increased to cater for this metadata, which means you will have less space for actual data blocks.

    I happened to notice this adjustment by accident, and it gave me the opportunity to calculate exactly how much space the flash cache metadata needs.

    This is on a single instance 11.2.0.1 database on an M4000 server running Solaris 10.

    I start with db_cache_size=320m (I am not using sga_target or memory_target because I want to control the size of the buffer cache explicitly) and db_flash_cache_size=20g, the instance starts up without any errors or warnings but the alert log shows:


    The value of parameter db_cache_size is below the required minimum
    The new value is 4MB multiplied by the number of cpus plus the memory required for the L2 cache.
    WARNING: More than half the granules are being allocated to the L2 cache when db_cache_size is set to 335544320. Either decrease the size of L2 cache, or increase the pool size to
    671088640

    If you look at db_cache_size at this point it shows 448m; the database automatically increased it from 320m. It also warns me that most of this space will be used for the flash cache metadata. This is a server with 32 CPUs (cores actually), so multiplying by 4m gives 128m, which is the space that will be used for actual data blocks. The remaining 320m will be used for the flash cache metadata. I have 20g of flash cache and my block size is 8K, so 2,621,440 blocks can fit in there. Let's see how much space is needed for the metadata of one block: since I have 320m for the metadata, I convert it to bytes and divide by the number of blocks, 320*1024*1024/2621440, which gives me 128 bytes.

    The documentation states 100 bytes for a single instance database but it is actually a little bit higher.

    Another case to verify. This time I start with db_cache_size=448m and db_flash_cache_size=60g. Similar messages are written to the alert log again.


    The value of parameter db_cache_size is below the required minimum
    The new value is 4MB multiplied by the number of cpus plus the memory required for the L2 cache.
    WARNING: More than half the granules are being allocated to the L2 cache when db_cache_size is set to 469762048. Either decrease the size of L2 cache, or increase the pool size to
    2013265920

    When I look at db_cache_size now I see that it is increased to 1088m.

    Of the 1088m buffer cache, again 128m will be used for data blocks, the remaining 960m is for the flash cache metadata. 60g of flash can hold 7,864,320 blocks, doing the math again tells me that the metadata for a single block is again 128 bytes.

    If you are starting with a small buffer cache, remember to check the alert log and the current size of the buffer cache. If it is already high and you do not see any adjustments be aware that you will use 128 bytes for the metadata for each block. This means you will have less memory for data blocks. It is a good practice to calculate this need beforehand and size the buffer cache accordingly.

    Wednesday, December 29, 2010

    Using an ASM diskgroup as flash cache

    We have been testing the F5100 flash array in our humble lab (borrowed that term from a colleague, he knows who he is). There are two ways to use it: one is to place your datafiles on it, the other is to use it as the database flash cache.

    The database flash cache feature came with 11.2 and is a way to extend the SGA. It is not the same thing as the flash cache in Exadata; read this post by Kevin Closson to find out what the difference is. The F5100 is one of the products you can use as the flash cache, the other is the F20 card.

    It is possible (and may be the best option) to use ASM to configure the flash cache. The documentation states that you can use a filename or an ASM diskgroup name for the db_flash_cache_file parameter, which is the parameter that enables the flash cache feature.

    So, how do we do this?

    The first step is creating an ASM diskgroup on the flash devices. In my test, for simplicity, I use one flash device, /dev/rdsk/c1t9d0s0. This is just one flash module (of size 24GB) from the F5100. The process for creating the diskgroup is no different than creating a diskgroup on conventional disks; just make sure your asm_diskstring includes the flash device path. Using asmca I created a diskgroup named FLASH with external redundancy on this single flash device.
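    If you prefer SQL over asmca, the equivalent steps on the ASM instance would look something like this (a sketch; the device path is the flash module from my test, and your diskstring pattern may differ):

```sql
-- make sure ASM can discover the flash device
SQL> alter system set asm_diskstring='/dev/rdsk/*' scope=both;

System altered.

-- create the diskgroup on the single flash module
SQL> create diskgroup FLASH external redundancy
  2  disk '/dev/rdsk/c1t9d0s0';

Diskgroup created.
```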

    Then following the documentation I set the parameters to enable the flash cache.


    SQL> alter system set db_flash_cache_size=20G scope=spfile;


    System altered.


    SQL> alter system set db_flash_cache_file='+FLASH' scope=spfile;

    System altered.

    Now time to restart to make the new parameters effective.

    SQL> startup force;
    ORACLE instance started.

    Total System Global Area 2606465024 bytes
    Fixed Size                  2150840 bytes
    Variable Size            2113932872 bytes
    Database Buffers          469762048 bytes
    Redo Buffers               20619264 bytes
    Database mounted.
    ORA-03113: end-of-file on communication channel
    Process ID: 17560
    Session ID: 2471 Serial number: 3

    The instance fails to start. Looking at the alert log file we see this.

    Errors in file /u01/app/oracle/diag/rdbms/flash/flash/trace/flash_dbw0_18053.trc:
    ORA-01565: error in identifying file '+FLASH'
    ORA-17503: ksfdopn:2 Failed to open file +FLASH
    ORA-15045: ASM file name '+FLASH' is not in reference form
    DBW0 (ospid: 18053): terminating the instance due to error 1565

    So it seems that we need to specify the actual file name, not the name of the diskgroup. Let's see what we have in the diskgroup FLASH using asmcmd.

    ASMCMD> pwd
    +FLASH/FLASH/FLASHFILE
    ASMCMD> ls
    bufpool#2147472020.256.739041793

    Even though we got an error when starting the instance, there is a file created under +FLASH/FLASH/FLASHFILE. The first FLASH in the path is the diskgroup name, the second is my database name.

    What we need to do is set this file as the db_flash_cache_file parameter to be able to start the instance with flash cache enabled. I edit my text init.ora file and change the parameter.

    db_flash_cache_file='+FLASH/FLASH/FLASHFILE/bufpool#2147472020.256.739041793'

    Now we can start the instance.

    SQL> create spfile from pfile='/tmp/x.ora';

    File created.

    SQL> startup
    ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
    ORACLE instance started.

    Total System Global Area 2606465024 bytes
    Fixed Size                  2150840 bytes
    Variable Size            2113932872 bytes
    Database Buffers          469762048 bytes
    Redo Buffers               20619264 bytes
    Database mounted.
    Database opened.
    SQL> show parameter flash_cache

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    db_flash_cache_file                  string      +FLASH/flash/flashfile/bufpool
                                                     #2147472020.256.739041793
    db_flash_cache_size                  big integer 20G

    The instance is up and I can see that the flash cache is enabled and using the correct ASM diskgroup.

    The documentation states that we can use the diskgroup name but as we saw it needs some correction.

    A problem and a lesson

    The path I followed to this point is a little embarrassing and teaches a lesson about setting parameters.

    At first I used db_flash_cache_file='FLASH' assuming it would use the ASM diskgroup. After restarting the instance I immediately started a workload on the database to see the effects of the flash cache. Here is what I saw in the Enterprise Manager performance page.


    The waits of class "Configuration" were holding back the database. When I clicked on "Configuration" link I saw this.



    The system was mostly waiting on "free buffer waits", "write complete waits" and "write complete waits: flash cache". This is because of DBWR: someone has to write the blocks from the buffer cache to the flash when the buffer cache is full, and that someone is DBWR. Since I love using the DTrace toolkit I used the iosnoop script in that toolkit to find out what DBWR was doing. iosnoop can show you which files a process is reading from or writing to. So I ran iosnoop for one of the DBWR processes.

    bash-3.00# ./iosnoop -p 16631
      UID   PID D     BLOCK   SIZE       COMM PATHNAME
      101 16631 W    175311   8704     oracle
      101 16631 W 246883104   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
      101 16631 W 246883104   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
      101 16631 W    175328   8704     oracle
      101 16631 W 246883120   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
      101 16631 W 246883120   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
      101 16631 W    175345   8704     oracle
      101 16631 W 246883136   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
      101 16631 W 246883136   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH

    The file it is trying to write to is $ORACLE_HOME/dbs/FLASH and since that directory is on a slow local disk it is taking a long time and causing the waits.

    If I had looked at the db_flash_cache_file parameter after restarting the database to see if it was right, I would have caught this before starting the workload. So, once more, the lesson is to check that a parameter was set the way you want before taking further action.