Thursday, December 30, 2010

Sizing the SGA or the buffer cache when enabling the database flash cache

The documentation about using the database flash cache feature recommends increasing db_cache_size or sga_target or memory_target, whichever you are using, to account for the metadata for the blocks kept in the flash cache. This is because for each block in the flash cache, some metadata is kept in the buffer cache. The recommendation is to add 100 bytes for a single instance database and 200 bytes for RAC multiplied by the number of blocks that can fit in the flash cache. So you need to calculate how many blocks the flash cache can keep (by dividing db_flash_cache_size in bytes with the block size) and increase the buffer cache.

If you do not do this and the buffer cache size is too low, it is automatically increased to cater for this metadata. It also means you will have less space for actual data blocks if you do not increase your buffer cache size.

I happened to learn this adjustment by chance and it gave me a chance to calculate exactly how much space the flash cache metadata needs.

This is on a single instance 11.2.0.1 database on an M4000 server running Solaris 10.

I start with db_cache_size=320m (I am not using sga_target or memory_target because I want to control the size of the buffer cache explicitly) and db_flash_cache_size=20g, the instance starts up without any errors or warnings but the alert log shows:


The value of parameter db_cache_size is below the required minimum
The new value is 4MB multiplied by the number of cpus plus the memory required for the L2 cache.
WARNING: More than half the granules are being allocated to the L2 cache when db_cache_size is set to 335544320. Either decrease the size of L2 cache, or increase the pool size to
671088640

If you look at the db_cache_size at this point it shows 448m. The database automatically increased 320m to 448m. It also warns me that most of this space will be used for the flash cache metadata. This is a server with 32 CPUs (cores actually) so I multiply this by 4m, it makes 128m which is the space that will be used for actual data blocks. The remaining 320m will be used for the flash cache metadata. I have 20g of flash cache and my block size is 8K, this means 2,621,440 blocks can fit in there. Let's see how much space is needed for the metadata on one block, since I have 320m for the metadata I convert it to bytes and divide by the number of blocks, 320*1024*1024/2621440, which gives me 128 bytes.

The documentation states 100 bytes for a single instance database but it is actually a little bit higher.

Another case to verify. This time I start with db_cache_size=448m and db_flash_cache_size=60g. Similar messages are written to the alert log again.


The value of parameter db_cache_size is below the required minimum
The new value is 4MB multiplied by the number of cpus plus the memory required for the L2 cache.
WARNING: More than half the granules are being allocated to the L2 cache when db_cache_size is set to 469762048. Either decrease the size of L2 cache, or increase the pool size to
2013265920

When I look at db_cache_size now I see that it is increased to 1088m.

Of the 1088m buffer cache, again 128m will be used for data blocks, the remaining 960m is for the flash cache metadata. 60g of flash can hold 7,864,320 blocks, doing the math again tells me that the metadata for a single block is again 128 bytes.

If you are starting with a small buffer cache, remember to check the alert log and the current size of the buffer cache. If it is already high and you do not see any adjustments be aware that you will use 128 bytes for the metadata for each block. This means you will have less memory for data blocks. It is a good practice to calculate this need beforehand and size the buffer cache accordingly.

Wednesday, December 29, 2010

Using an ASM diskgroup as flash cache

We have been testing the F5100 flash array in our humble lab (borrowed that term from a colleague, he knows who he is). There are two ways to use it, one is to place your datafiles on it, the other is to use it as the database flash cache.

The database flash cache feature came with 11.2 and is a way to extend the SGA. It is not the same thing as the flash cache in Exadata, read Kevin Closson's this post to find out what the difference is. F5100 is one of the products you can use as the flash cache, the other is the F20 card.

It is possible and (may be the best option) to use ASM to configure the flash cache. The documentation states that you can use a filename or an ASM diskgroup name for db_flash_cache_file parameter which is the parameter to enable the flash cache feature.

So, how do we do this?

The first step is creating an ASM diskgroup on the flash devices. In my test, for simplicity I use one flash device which is /dev/rdsk/c1t9d0s0. This is just one flash module (of size 24GB) from F5100. The process for creating the diskgroup is no different than creating a diskgroup on conventional disks, just make sure your asm_diskstring includes the flash device path. By using asmca I created a diskgroup named FLASH using external redundancy on this single flash device.

Then following the documentation I set the parameters to enable the flash cache.


SQL> alter system set db_flash_cache_size=20G scope=spfile;


System altered.


SQL> alter system set db_flash_cache_file='+FLASH' scope=spfile;

System altered.

Now time to restart to make the new parameters effective.

SQL> startup force;
ORACLE instance started.

Total System Global Area 2606465024 bytes
Fixed Size                  2150840 bytes
Variable Size            2113932872 bytes
Database Buffers          469762048 bytes
Redo Buffers               20619264 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 17560
Session ID: 2471 Serial number: 3

The instance fails to start. Looking at the alert log file we see this.

Errors in file /u01/app/oracle/diag/rdbms/flash/flash/trace/flash_dbw0_18053.trc:
ORA-01565: error in identifying file '+FLASH'
ORA-17503: ksfdopn:2 Failed to open file +FLASH
ORA-15045: ASM file name '+FLASH' is not in reference form
DBW0 (ospid: 18053): terminating the instance due to error 1565

So it seems that we need to specify the actual file name, not the name of the diskgroup. Let's see what we have in the diskgroup FLASH using amscmd.

ASMCMD> pwd
+FLASH/FLASH/FLASHFILE
ASMCMD> ls
bufpool#2147472020.256.739041793

Even if we got an error when starting the instance there is a file created under +FLASH/FLASH/FLASHFILE. The first FLASH in the path is the diskgroup name, the second is my database name.

What we need to do is set this file as the db_flash_cache_file parameter to be able to start the instance with flash cache enabled. I edit my text init.ora file and change the parameter.

db_flash_cache_file='+FLASH/FLASH/FLASHFILE/bufpool#2147472020.256.739041793'

Now we can start the instance.

SQL> create spfile from pfile='/tmp/x.ora';

File created.

SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 2606465024 bytes
Fixed Size                  2150840 bytes
Variable Size            2113932872 bytes
Database Buffers          469762048 bytes
Redo Buffers               20619264 bytes
Database mounted.
Database opened.
SQL> show parameter flash_cache

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_flash_cache_file                  string      +FLASH/flash/flashfile/bufpool
                                                 #2147472020.256.739041793
db_flash_cache_size                  big integer 20G

The instance is up and I can see that the flash cache is enabled and using the correct ASM diskgroup.

The documentation states that we can use the diskgroup name but as we saw it needs some correction.

A problem and a lesson

The path I followed to this point is a little embarrassing and teaches a lesson about setting parameters.

At first I used db_flash_cache_file='FLASH' assuming it would use the ASM diskgroup. After restarting the instance I immediately started a workload on the database to see the effects of the flash cache. Here is what I saw in the Enterprise Manager performance page.


The waits of class "Configuration" were holding back the database. When I clicked on "Configuration" link I saw this.



The system was mostly waiting on "free buffer waits", "write complete waits" and "write complete waits: flash cache". This is because of DBWR. Someone has to write the blocks from the buffer cache to the flash when the buffer cache is full. This process is DBWR. Since I love using the DTrace toolkit I used the iosnoop script in that toolkit to find out what DBWR was doing. iosnoop can show you what file a process reading from or writing to most. So I ran iosnoop for one of the DBWR processes.

bash-3.00# ./iosnoop -p 16631
  UID   PID D     BLOCK   SIZE       COMM PATHNAME
  101 16631 W    175311   8704     oracle
  101 16631 W 246883104   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
  101 16631 W 246883104   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
  101 16631 W    175328   8704     oracle
  101 16631 W 246883120   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
  101 16631 W 246883120   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
  101 16631 W    175345   8704     oracle
  101 16631 W 246883136   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
  101 16631 W 246883136   8192     oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH

The file it is trying to write to is $ORACLE_HOME/dbs/FLASH and since that directory is on a slow local disk it is taking a long time and causing the waits.

If I had looked at the db_flash_cache_file parameter after restarting the database to see if it was alright I would have seen this before starting the workload. So, once more it teaches to check if the parameter was set the way you want it before taking further action.

Monday, December 27, 2010

What does ISDEFAULT mean in v$parameter?

There is a common misunderstanding among DBAs about the column ISDEFAULT in the view v$parameter. Some think that when this column is TRUE it means the current value of the parameter is the default value. This leads to wrong conclusions and sometimes wrong settings for even production environments.

The documentation says this about this column:

"Indicates whether the parameter is set to the default value (TRUE) or the parameter value was specified in the parameter file (FALSE)"

This explanation is not a clear one and different people may understand different things from it.

This column is dependent on the setting in the parameter file when the instance is started. It does not show if the current value is the default or not. It only shows if the parameter is set in the parameter file or not. When it is TRUE it means that you did not set this in the parameter file when starting the instance. When it is FALSE it means this parameter was set in the parameter file when starting the instance.

Here is a real life case about this. When changing a few parameters an Exadata DBA accidentally sets the parameter cell_offload_processing to FALSE using an alter system command. When he tries to take back the settings he looks at v$parameter.ISDEFAULT to find out if cell_offload_processing=FALSE is the default setting. He sees that ISDEFAULT=TRUE and arrives at the wrong conclusion that cell_offload_processing=FALSE is the default value and leaves the parameter that way. This causes all Exadata storage offloading to be disabled and may cause query times to go over the roof.

Let's look at this with an example on 11.2.0.2. This is from a database created with default parameters.


SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
  2  where name='cell_offload_processing';


NAME                           VALUE                          ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing        TRUE                           TRUE      FALSE

We see that the parameter is set to TRUE, ISMODIFIED is false meaning we did not modify the parameter and ISDEFAULT is TRUE meaning we did not set this parameter in the parameter file.

Now let's change it.


SQL> alter system set cell_offload_processing=FALSE;


System altered.


SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
  2  where name='cell_offload_processing';


NAME                           VALUE                          ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing        FALSE                          TRUE      SYSTEM_MOD

After we set it FALSE we see that ISMODIFIED reflected the change, but ISDEFAULT is still TRUE. From this if a DBA concludes that this is the default value and takes action based on that, the result will be wrong. As we did not set this parameter in the parameter file when starting the instance, ISDEFAULT still shows TRUE.

Let's do a restart and look at v$parameter again.


SQL> startup force;
ORACLE instance started.


Total System Global Area  839282688 bytes
Fixed Size                  2231128 bytes
Variable Size             583009448 bytes
Database Buffers          251658240 bytes
Redo Buffers                2383872 bytes
Database mounted.
Database opened.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
  2  where name='cell_offload_processing';


NAME                           VALUE                          ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing        FALSE                          FALSE     FALSE

Now we see that ISDEFAULT is FALSE because the parameter was in the parameter file.

What if we set it the parameter to the default value.


SQL> alter system set cell_offload_processing=TRUE;


System altered.


SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
  2  where name='cell_offload_processing';


NAME                           VALUE                          ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing        TRUE                           FALSE     SYSTEM_MOD

As you see even if the parameter has the default value now, ISDEFAULT still shows FALSE.

Again let's restart and look again.


SQL> startup force;
ORACLE instance started.


Total System Global Area  839282688 bytes
Fixed Size                  2231128 bytes
Variable Size             583009448 bytes
Database Buffers          251658240 bytes
Redo Buffers                2383872 bytes
Database mounted.
Database opened.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
  2  where name='cell_offload_processing';


NAME                           VALUE                          ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing        TRUE                           FALSE     FALSE

The parameter has the default value and ISDEFAULT is still FALSE after a restart. This is again because even if we set the parameter to the default value it was in the parameter file anyway.

What's the correct way to fix this? If we reset the parameter instead of setting it explicitly it will revert back to the default value.


SQL> alter system reset cell_offload_processing;


System altered.


SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
  2  where name='cell_offload_processing';


NAME                           VALUE                          ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing        TRUE                           FALSE     FALSE


SQL> startup force;
ORACLE instance started.


Total System Global Area  839282688 bytes
Fixed Size                  2231128 bytes
Variable Size             583009448 bytes
Database Buffers          251658240 bytes
Redo Buffers                2383872 bytes
Database mounted.
Database opened.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
  2  where name='cell_offload_processing';


NAME                           VALUE                          ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing        TRUE                           TRUE      FALSE

Now we are back to the defaults as the reset command removed the parameter from the parameter file.

So, do not count on the ISDEFAULT column when you are trying to find if the current value of a parameter is the default value or not. The documentation is the most reliable source to find out the default values of the parameters.

Friday, December 24, 2010

Creating listeners on the private network interfaces of RAC nodes or use that Exadata Infiniband network

In general usage, everyone configures tns listeners on the public interfaces of RAC nodes so that clients can connect through the public network. Conventionally the private network interfaces are used for the interconnect traffic so most people do not open them to the application because the application traffic may interfere with the interconnect messaging.

But what if we have a high-speed and high-bandwidth interconnect network that some applications can also use for fast communication to the nodes? Can we create tns listeners on the private interfaces, if we can how? This high-speed interconnect network is especially true for Exadata where we have an Infiniband private network that is used for RAC interconnect and for the communication between the storage nodes and the database nodes. Since it is a high-speed low-latency network the application servers or ETL servers can connect to that network and use it for fast data transfer.

To do this we need to create tns listeners on the private interfaces so that clients can connect through sqlnet. The following steps will show how I did this. I did this configuration on a half rack Database Machine but since I do not have access to that now the example here is based on a test Virtualbox system (which runs 11.2.0.2 RAC) so Infiniband is not here but the steps are the same for Exadata. Also DNS is not used, all addresses are defined in the /etc/hosts files of the nodes.

Here is my /etc/hosts file for this setup.


127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6


# Public
192.168.56.2   rac1.localdomain        rac1
192.168.56.3   rac2.localdomain        rac2
# Private
192.168.0.101   rac1-priv.localdomain   rac1-priv
192.168.0.102   rac2-priv.localdomain   rac2-priv
# Virtual
192.168.56.4   rac1-vip.localdomain    rac1-vip
192.168.56.5   rac2-vip.localdomain    rac2-vip
# SCAN
192.168.56.6   rac-scan.localdomain rac-scan

I have two nodes, rac1 and rac2. The private network is 192.168.0 and the public network is 192.168.56. After the default installation I have the default listeners created on the public interfaces, there is only one SCAN address as this is a test setup.

Assume I have an ETL server that is connected to the private network which needs to connect to the database through the private interface. What I need is a listener per node that is listening on the private IP addresses.

If you start with netca ( from the grid home because that is where the listeners run on 11.2) and try to create the listeners you will see that you will not be able to select the private network.


It will show only the public network because this selection is based on your virtual IP definitions. Since I do not have any VIPs on the private network I do not see it.

So the first step is to create VIPs on the private interfaces. I start by adding the new VIPs to the /etc/hosts files. These lines should be added to both nodes' /etc/hosts file.


# Private VIPs
192.168.0.103   rac1-ib.localdomain rac1-ib
192.168.0.104   rac2-ib.localdomain rac2-ib


With root I run "srvctl add vip" from the grid home to create the VIPs.


[root@rac1 ~]# cd /u01/app/11.2.0/grid/bin/
[root@rac1 bin]# ./srvctl add vip -n rac1 -A 192.168.0.103/255.255.255.0/eth1 -k 2
[root@rac1 bin]# ./srvctl add vip -n rac2 -A 192.168.0.104/255.255.255.0/eth1 -k 2

I created a VIP for each node on the eth1 interface which is the private interface and I have specified "-k 2" to indicate that the network number is 2. You can use "srvctl add vip -h" to see what options you have.

At this step if you look at the "ifconfig -a" output you will not see the new VIPs up because we have not started them up yet. Let's start them now.

[root@rac1 bin]# ./srvctl start vip -i rac1-ib

Now you will see the new IP up in the "ifconfig -a" output. This is the related line.


eth1:2    Link encap:Ethernet  HWaddr 08:00:27:8B:69:FB
          inet addr:192.168.0.103  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

We need to start the new VIP for the second node also. Go to the second node and repeat the same for that.

[root@rac2 bin]# ./srvctl start vip -i rac2-ib

We can now start netca and we see the new subnet there. You can create the listener by selecting the new subnet. I named mine as LISTENER_IB and chose 1522 as the port number.


After netca completes here is what I see on rac1.


[oracle@rac1 ~]$ lsnrctl status listener_ib


LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 24-DEC-2010 13:36:20


Copyright (c) 1991, 2010, Oracle.  All rights reserved.


Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_IB
Version                   TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date                24-DEC-2010 13:34:46
Uptime                    0 days 0 hr. 1 min. 35 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/rac1/listener_ib/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.103)(PORT=1522)))
The listener supports no services
The command completed successfully

The listener is up and running but it does support any services yet. Here is my init.ora parameters related to the listener.

SQL> show parameter listen


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
listener_networks                    string
local_listener                       string      (DESCRIPTION=(ADDRESS_LIST=(AD
                                                 DRESS=(PROTOCOL=TCP)(HOST=192.
                                                 168.56.4)(PORT=1521))))
remote_listener                      string      rac-scan:1521

I can change the local_listener parameter and add this listener so that my database registers with it. After that as the default installation sets the SCAN listener as the remote_listener SCAN will be able to direct connections to this listener as well. But, there is a problem with this. What happens if SCAN directs the connection from the ETL server to the public interface instead of the private one? Or vice-versa, what happens if it directs connections from the public network clients to the private interface? Users will get errors because they cannot reach the private network and the ETL server will get errors because it cannot reach the public network.

The correct way to register my database to the listeners is to use the listener_networks parameter. listener_networks is a new 11.2 parameter and serves the purpose of cross-registration when you have listeners on multiple networks. Basically with it you can say, "register my local listeners on the public interfaces to the SCAN listener, register my local listeners on the private interface to each other".

This way clients using SCAN will connect to the public interfaces and clients using the private network will connect to the private interfaces. Let's do it now.

Not to clutter the listener parameters with long tns descriptions let's add the definitions to the tnsnames.ora file and use the names instead. On both nodes's tnsnames.ora file residing in the database home I add these lines. Remember to change the hostnames to rac2 for the ones other ORCL_IB when editing the file on rac2.


ORCL_IB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-ib)(PORT = 1522))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-ib)(PORT = 1522))
  )


ORCL_PUBLIC_LOCAL=
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521))
  )


ORCL_PRIVATE_LOCAL=
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-ib)(PORT = 1522))
  )

ORCL_PUBLIC_LOCAL will be used as the local listener for the public network, ORCL_PRIVATE_LOCAL will be used as the local listener for the private network, ORCL_IB will be used as the remote listener for the private network so that the local private listeners can register to each other and the SCAN address will be used as the remote listener for the public network.

Now time to change the parameters.

SQL> alter system set listener_networks='((name=public_network)(local_listener=orcl_public_local)(remote_listener=rac-scan:1521))','((name=priv_network)(local_listener=orcl_private_local)(remote_listener=orcl_ib))';

System altered.

SQL> alter system set remote_listener='';

System altered.

SQL> show parameter listen

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
listener_networks                    string      ((name=public_network)(local_l
                                                 istener=orcl_public_local)(rem
                                                 ote_listener=rac-scan:1521)),
                                                 ((name=priv_network)(local_lis
                                                 tener=orcl_private_local)(remo
                                                 te_listener=rac-scan:1521))
local_listener                       string      (DESCRIPTION=(ADDRESS_LIST=(AD
                                                 DRESS=(PROTOCOL=TCP)(HOST=192.
                                                 168.56.4)(PORT=1521))))
remote_listener                      string

Now let's see what is registered to the new listener listener_ib.

[oracle@rac1 ~]$ lsnrctl status listener_ib

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 24-DEC-2010 15:03:17

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_IB
Version                   TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date                24-DEC-2010 13:34:46
Uptime                    0 days 1 hr. 28 min. 32 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/rac1/listener_ib/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.103)(PORT=1522)))
Services Summary...
Service "orcl" has 2 instance(s).
  Instance "orcl1", status READY, has 2 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully

Both instances are registered and can be served by this listener. Also have a look at the scan listener to make sure we did not make a mess.

[oracle@rac1 ~]$ lsnrctl status listener_scan1

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 24-DEC-2010 15:04:47

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date                24-DEC-2010 12:06:02
Uptime                    0 days 2 hr. 58 min. 45 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0/grid/log/diag/tnslsnr/rac1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.6)(PORT=1521)))
Services Summary...
Service "orcl" has 2 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully

It is OK too.

Now we can use this tnsnames entry to connect through the private network.

ORCL_ETL =
  (DESCRIPTION =
    (LOAD_BALANCE=on)
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-ib)(PORT = 1522))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-ib)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
    )
  )

It took quite a while for me to get this in writing, if you use this procedure and see problems or if you think there are points not explained clearly, let me know so I can edit the post, this way it will be more useful to others who may need it.

Wednesday, December 01, 2010

Applying Grid Infrastructure Patch Set 1 (GI PSU1) to non-RAC installations

If you are trying to patch an existing 11.2.0.1 installation on a non-RAC GI (also called single instance HA) system be aware that some steps will be different from the patch readme or the documentation.

Here is what I experienced.

The documentation for Grid Infrastructure 11.2 states that if you want to upgrade from 11.2.0.1 to 11.2.0.2 you need to apply some patches first.

"To upgrade existing 11.2.0.1 Oracle Grid Infrastructure installations to Oracle Grid Infrastructure 11.2.0.2, you must first do at least one of the following:

  • Patch the release 11.2.0.1 Oracle Grid Infrastructure home with the 9413827 and 9706490 patches.
  • Install Oracle Grid Infrastructure Patch Set 1 (GI PSU1) or Oracle Grid Infrastructure Patch Set 2 (GI PSU2)."
If you choose to apply GI PSU1 and follow the README there is a section that tells you to prepare the GI home.

"2.2.5 Prepare GI home for patch


      As an owner of Oracle GI home software owner, run the command:
      %/bin/srvctl stop home -o -s -n "

When I ran this on a single instance system I got:

[oracle@oel5 patches]$ srvctl stop home -o /u01/app/product/11.2.0/grid -s /tmp/stat.txt
PRCH-1002 : Failed  to stop resources running from  crs home /u01/app/product/11.2.0/grid
PRCH-1030 : one or more resources failed to stop PRCH-1026 : Failed to stop ASM
PRCD-1027 : Failed to retrieve database orcl
PRCD-1035 : orcl is the database unique name of a single instance database, not a cluster database

This is a problem with srvctl and this step must be skipped for non-RAC GI installations as the next step will stop GI. After this README tells you to unlock the grid home:

"You must invoke this script as root user to open protection on GI software files for patch application
      #/crs/install/rootcrs.pl -unlock"

rootcrs.pl is the script used for RAC environments. If you have non-RAC GI the script you need to use is roothas.pl. After changing rootcrs.pl with roothas.pl and unlocking the grid home you can continue with applying the patch.

MOS note 1089476.1 explains the steps to patch a non-RAC GI home. Keep this note in mind when you are working on such a system.