
Tuesday, January 04, 2011

Database smart flash cache wait events

When you enable the database smart flash cache and start using it, you will see new wait events related to it. These events help you determine whether a performance problem is related to the flash cache or not.

The ones I have come across so far are "db flash cache single block physical read", "db flash cache multiblock physical read" and "write complete waits: flash cache". These are from an 11.2.0.1 database using the F5100 flash array as the database smart flash cache.
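
If you want to see which flash cache related wait events exist in your version, a quick way is to query v$event_name (the exact list differs between releases):

select name, wait_class
from v$event_name
where lower(name) like '%flash cache%'
order by name;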

db flash cache single block physical read

"db flash cache single block physical read" is the flash cache equivalent of "db file sequential read". Read waits from the flash cache are not accounted for in the "db file sequential read" event and have their own wait event. The following is from an AWR report of 30 mins from a database using the database smart flash cache.


Top 5 Timed Foreground Events

Event                                        Waits        Time(s)   Avg wait (ms)   % DB time   Wait Class
db file sequential read                      649,576      15,469    24              47.33       User I/O
DB CPU                                                    10,451                    31.98
db flash cache single block physical read    11,217,303   5,044     0               15.43       User I/O
buffer busy waits                            71,323       1,845     26              5.65        Concurrency
log file switch (checkpoint incomplete)      2,305        251       109             0.77        Configuration


There are over 11 million "db flash cache single block physical read" waits, which took about 0.44 ms on average (AWR rounds this to 0 ms). "db file sequential read" waits are over 600,000. This means we had a high flash cache hit ratio; most of the single block reads were coming from the flash cache, not from the disks.
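
Outside of AWR you can get a similar picture from v$system_event. A minimal sketch (note these are cumulative figures since instance startup, not a 30-minute delta like the AWR report):

select event,
       total_waits,
       round(time_waited_micro / total_waits / 1000, 2) as avg_ms
from v$system_event
where event in ('db file sequential read',
                'db flash cache single block physical read');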

    This is the wait event histogram from the same AWR report.




                                                          % of Waits
Event                                        Total Waits   <1ms   <2ms   <4ms   <8ms   <16ms   <32ms   <=1s   >1s
db flash cache single block physical read    11.2M         99.0     .9     .1     .0      .0      .0     .0


99% of all flash cache single block reads were under 1 ms, and none of them were over 4 ms.
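
The same distribution is also available outside of AWR in v$event_histogram; something like this gives the bucketed counts directly:

select event, wait_time_milli, wait_count
from v$event_histogram
where event = 'db flash cache single block physical read'
order by wait_time_milli;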

    db flash cache multiblock physical read


    "db flash cache multiblock physical read" is the flash cache equivalent of "db file scattered read". It is the event we see when we are reading multiple blocks from the flash cache. The AWR report I am using in this post does not contain many multiblock operations but here is the event from the foreground wait events section.


Event                                      Waits   %Time-outs   Total Wait Time (s)   Avg wait (ms)   Waits /txn   % DB time
db flash cache multiblock physical read    1,048   0            1                     1               0.00         0.00


We had 1,048 waits of 1 ms on average.



     
                                                        % of Waits
Event                                      Total Waits   <1ms   <2ms   <4ms   <8ms   <16ms   <32ms   <=1s   >1s
db flash cache multiblock physical read    1171          83.2   12.4    3.2     .9      .3




The wait event histogram shows that again most of the waits are below 1 ms, with some taking up to 16 ms.

    write complete waits: flash cache

This is the wait event we see when DBWR is trying to write a block from the buffer cache to the flash cache and a session wants to modify that block. The session waits on this event and goes on with the update after DBWR finishes its job. You can find a good explanation and example of this in Guy Harrison's post. Here is the event and its histogram from the same AWR report.


Event                                Waits   %Time-outs   Total Wait Time (s)   Avg wait (ms)   Waits /txn   % DB time
write complete waits: flash cache    345     0            1                     4               0.00         0.00





                                                  % of Waits
Event                                Total Waits   <1ms   <2ms   <4ms   <8ms   <16ms   <32ms   <=1s   >1s
write complete waits: flash cache    345           22.9   23.2   27.2   19.7    5.5     .3     1.2


I had 345 waits in 30 minutes with an average time of 4 ms. But the important thing is that the contribution of this event to DB time was only 1 second in a 30-minute workload.

This event can start climbing in some situations, especially if you have a problem with your flash cache device and writes to it start taking longer.

Here is one case where the poor DBWR processes were struggling to write to the flash cache. Database Control was showing high waits in the "Configuration" class, and those waits were "write complete waits", "free buffer waits" and "write complete waits: flash cache". This was all because my flash cache device was a single conventional HDD, not even an SSD. After changing the db_flash_cache_file parameter to use the F5100 array, this picture was not seen anymore.
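
For reference, the change itself is just a parameter switch. db_flash_cache_file is a static parameter, so it goes into the spfile and takes effect at the next startup; the device path and size below are placeholders, not the actual values from my system:

alter system set db_flash_cache_file = '/dev/rdsk/c1t1d0s0' scope=spfile;  -- placeholder path for the flash device
alter system set db_flash_cache_size = 20G scope=spfile;                   -- placeholder size
-- restart the instance so the new flash cache file is used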



    Thursday, December 30, 2010

    Sizing the SGA or the buffer cache when enabling the database flash cache

The documentation about using the database flash cache feature recommends increasing db_cache_size or sga_target or memory_target, whichever you are using, to account for the metadata of the blocks kept in the flash cache. This is because for each block in the flash cache, some metadata is kept in the buffer cache. The recommendation is to add 100 bytes for a single instance database and 200 bytes for RAC, multiplied by the number of blocks that can fit in the flash cache. So you need to calculate how many blocks the flash cache can hold (by dividing db_flash_cache_size in bytes by the block size) and increase the buffer cache accordingly.
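
As a quick sketch, you can let the database do this arithmetic for you. The query below applies the documented 100 bytes per block (use 200 for RAC) to the current parameter values; db_flash_cache_size is reported in bytes in v$parameter:

select round(fc.value / bs.value * 100 / 1024 / 1024) as extra_buffer_cache_mb
from v$parameter fc, v$parameter bs
where fc.name = 'db_flash_cache_size'
  and bs.name = 'db_block_size';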

If you do not do this and the buffer cache size is too low, it is automatically increased to accommodate this metadata. It also means you will have less space for actual data blocks if you do not increase your buffer cache size yourself.

I happened to notice this adjustment by chance, and it gave me the opportunity to calculate exactly how much space the flash cache metadata needs.

    This is on a single instance 11.2.0.1 database on an M4000 server running Solaris 10.

I start with db_cache_size=320m (I am not using sga_target or memory_target because I want to control the size of the buffer cache explicitly) and db_flash_cache_size=20g. The instance starts up without any errors or warnings, but the alert log shows:


    The value of parameter db_cache_size is below the required minimum
    The new value is 4MB multiplied by the number of cpus plus the memory required for the L2 cache.
    WARNING: More than half the granules are being allocated to the L2 cache when db_cache_size is set to 335544320. Either decrease the size of L2 cache, or increase the pool size to
    671088640

If you look at db_cache_size at this point, it shows 448m; the database automatically increased it from 320m to 448m. It also warns me that most of this space will be used for the flash cache metadata. This is a server with 32 CPUs (cores, actually), so I multiply that by 4m, which makes 128m; this is the space that will be used for actual data blocks. The remaining 320m will be used for the flash cache metadata. I have 20g of flash cache and my block size is 8K, which means 2,621,440 blocks can fit in there. Let's see how much space is needed for the metadata of one block: since I have 320m for the metadata, I convert it to bytes and divide by the number of blocks, 320*1024*1024/2621440, which gives me 128 bytes.
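
The same arithmetic as a query, just to double check the numbers above:

select (20 * power(1024, 3)) / 8192                            as flash_cache_blocks,
       (320 * power(1024, 2)) / ((20 * power(1024, 3)) / 8192) as metadata_bytes_per_block
from dual;
-- returns 2621440 and 128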

The documentation states 100 bytes for a single instance database, but it is actually a little higher.

    Another case to verify. This time I start with db_cache_size=448m and db_flash_cache_size=60g. Similar messages are written to the alert log again.


    The value of parameter db_cache_size is below the required minimum
    The new value is 4MB multiplied by the number of cpus plus the memory required for the L2 cache.
    WARNING: More than half the granules are being allocated to the L2 cache when db_cache_size is set to 469762048. Either decrease the size of L2 cache, or increase the pool size to
    2013265920

When I look at db_cache_size now, I see that it has been increased to 1088m.
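
Two ways to check the adjusted size, for example:

-- in SQL*Plus, the parameter itself reflects the adjusted value
show parameter db_cache_size

-- or query the buffer cache component directly
select component, current_size / 1024 / 1024 as size_mb
from v$sga_dynamic_components
where component = 'DEFAULT buffer cache';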

Of the 1088m buffer cache, again 128m will be used for data blocks; the remaining 960m is for the flash cache metadata. 60g of flash can hold 7,864,320 blocks, and doing the math again tells me that the metadata for a single block is again 128 bytes.

If you are starting with a small buffer cache, remember to check the alert log and the current size of the buffer cache. If the buffer cache is already large and you do not see any adjustment, be aware that you will still use 128 bytes of metadata for each flash cache block, which means you will have less memory for data blocks. It is good practice to calculate this need beforehand and size the buffer cache accordingly.