Db2 Catalog Cache – unrecorded overflows

All application Db2 processing initially requires access to the catalog cache for CONNECT processing. Then, use of the following SQL statements will require access to the catalog cache for authorisation checking

  • any ALTER statement; AUDIT; CALL; CLOSE; COMMENT; any CREATE statement; DECLARE CURSOR; DECLARE GLOBAL TEMPORARY TABLE; DELETE; DROP; EXECUTE; EXECUTE IMMEDIATE; EXPLAIN; FETCH; any FLUSH statement; any GRANT; INSERT; LOCK TABLE; MERGE; OPEN; PREPARE; REFRESH TABLE; any RENAME; any REVOKE; SELECT; SET EVENT MONITOR STATE; SET INTEGRITY; SET PASSTHRU; SET SESSION AUTHORIZATION; SET USAGE LIST STATE; SET variable; TRANSFER OWNERSHIP; TRUNCATE; UPDATE; VALUES INTO;

In addition to the above staements, other statements have a caveat of " … No privileges are required to invoke the <statement> statement. However, the authorization ID of the statement must hold the necessary privileges to invoke the SQL statements and search condition that are embedded in the <statement> statement".

The catalog cache is one of the most important memory areas in a Db2 database, if not the most important area. With a poorly performing catalog cache your whole system is performing poorly.

Background

I identified the anomaly of unrecorded overflows whilst performing a case study with an off-site Db2 analysis service I've designed, MetricsAnalysis for IBM Db2. There were 4 analysis phases in the case study, with changes being applied between the phases. A summary for the analysis of the memory consumption of the catalog cache is is follows

  1. For the first phase, the initial size of the catalog cache was 256 pages but the workload used 352 pages with the high water mark being 720 pages. As expected, the monitor element CAT_CACHE_OVERFLOWS indicated there were overflows (2).

  2. For the second phase, the initial size of the catalog cache remained at 256 pages but the workload used 496 pages with the high water mark being 752 pages. As expected, CAT_CACHE_OVERFLOWS indicated there were overflows (3).

  3. For the third phase, the initial size of the catalog cache was increased to 800 pages. This time, the workload used 1328 pages with the high water mark being 1328 pages. UNEXPECTEDLY, there were no CAT_CACHE_OVERFLOWS indicated.

  4. For the fourth and final phase, the initial size of the catalog cache remained at 800 pages. This time, the workload used 1232 pages with the high water mark being 1376 pages. Yet again, there were no CAT_CACHE_OVERFLOWS indicated.

In the third and fourth phases, the fact Db2 records that it used more memory than configured but doesn't record an overflow doesn't seem right.

Catalog Cache Contents

catalogcache contentsThe diagram illustrates the Db2 internal format areas that are stored in the catalog cache for Db2 LUW 10.5.7, obtained using the command: db2pd -db <dbname> -catalogcache

The top of the output from db2pd contains a summary of the memory (not shown)

  • Configured Size - The number of bytes as specified by the catalogcache_sz database configuration parameter.
  • Current Size - The current number of bytes used in the catalog cache.
  • Maximum Size - The maximum amount of memory that is available to the cache (up to the maximum database global memory).
  • High Water Mark - The largest physical size reached during processing.

Not all of the areas contained entries. All, except for the *AUTH, have a status field which means that you can have multiple entries with a different status (Valid/Invalid/Soft Invalid).

Analysis

The first thing to analyse was the processing within MetricsAnalysis for IBM Db2, to see if it has introduced the anomaly.

MetricsAnalysis for IBM Db2 collects 23 sets of data, approximately 2100 data points, and processes them into a bespoke data warehouse and data mart enhancing the collected data into 70 objects with over 11000 data points. This facilitates a Best Practice multi-dimensional analysis every time.

There was nothing to show the anomaly was introduced.

Next I checked the IBM documentation to see if there was a hint that this can occur, but there was nothing I could find to describe this scenario.

I then checked if there is any outstanding maintenance that addresses the issue. Nothing.

After a few varied searches I found a reference on the Columbia University of New York website, https://www1.columbia.edu/sec/acis/db2/db2help/db2h2184.htm. "Once a transaction has experienced a catalog cache overflow, all subsequent attempts by the same transaction to insert table descriptor information into the catalog cache will report an overflow."

This describes exactly the behaviour observed. The storage area overflows but this is not recorded if the transaction causing the overflow doesn't try to insert again into the catalog cache.

What are the monitoring implications and how can you track if this scenario is occurring?

Monitoring Implications

There are a number of things to be considered if your system is experiencing a catalog cache overflow and CAT_CACHE_OVERFLOWS is not being incremented. The way in which you have configured your system to monitor for this may be masking the overflows.

  • db2diag scraping

If you are using a log file scraper or the db2diag utility and you are looking for message ADM4000W to identify the condition, but it is not triggered for the above reason, you will miss the fact the overflow is occurring.

  • cat_cache_overflows

If you are using the cat_cache_overflows monitor element, but it is not incremented for the above reason, you will miss the fact the overflow is occurring.

Tracking the condition

To track the condition you will need to

  1. Retrieve the database configured size (catalogcache_sz) and convert to bytes.
  2. Obtain the memory high water mark used for the catalog cache and convert to bytes
  3. Compare to see if an overflow has occurred

Use SQL like the following to get the catalogcache_sz in bytes

SELECT VALUE * 4096 AS CATALOGCACHE_SZ_BYTES
FROM SYSIBMADM.DBCFG
WHERE NAME ='catalogcache_sz'

or use the Db2 CLP command and multiply the result by 4096 to get the number of bytes.

db2 get db cfg for <dbname>|grep -i catalogcache_sz

Next, obtain the high water mark used for the catalog cache. Use the following SQL to obtain the memory used by the catalog cache in bytes, it is reported in KB.

SELECT MEMORY_POOL_USED * 1024 AS CAT_CACHE_USED_BYTES, MEMORY_POOL_USED_HWM * 1024 AS CAT_CACHE_USED_HWM_BYTES
FROM TABLE( MON_GET_MEMORY_POOL('DATABASE', CURRENT SERVER, -1))
WHERE MEMORY_POOL_TYPE = 'CAT_CACHE'

If CAT_CACHE_USED_HWM_BYTES > CATALOGCACHE_SZ_BYTES then an overflow has occurred.

  • memory_pool_used – current amount of committed memory allocated in KB
  • memory_pool_used_hwm – maximum amount of committed memory allocated in KB

You can combine the above SQL statements into a single statement.

SELECT CATALOGCACHE_SZ_BYTES, CAT_CACHE_USED_BYTES, CAT_CACHE_USED_HWM_BYTES,
CASE WHEN CAT_CACHE_USED_HWM_BYTES > CATALOGCACHE_SZ_BYTES THEN 1 ELSE 0 END OVERFLOW_IND,
CASE WHEN CAT_CACHE_USED_HWM_BYTES > CATALOGCACHE_SZ_BYTES THEN CAT_CACHE_USED_HWM_BYTES /4096 END CATALOGCACHE_SZ_REC
FROM
TABLE(SELECT VALUE * 4096 AS CATALOGCACHE_SZ_BYTES
FROM SYSIBMADM.DBCFG
WHERE NAME ='catalogcache_sz') AS A JOIN TABLE(
SELECT MEMORY_POOL_USED * 1024 AS CAT_CACHE_USED_BYTES, MEMORY_POOL_USED_HWM * 1024 AS CAT_CACHE_USED_HWM_BYTES
FROM TABLE( MON_GET_MEMORY_POOL('DATABASE', CURRENT SERVER, -1))
WHERE MEMORY_POOL_TYPE = 'CAT_CACHE') AS B
ON 1=1

If OVERFLOW_IND = 1 then an overflow has occurred and CATALOGCACHE_SZ_REC is the value that can be used to set catalogcache_sz, calculated from CAT_CACHE_USED_HWM_BYTES.

Note

It is recommended to monitor for the overflow over a period of time in case there is a one-off spike. 

Processing implications

There are a number of implications if your system is experiencing this anomaly.

  1. In deleting and re-inserting data into the catalog cache, Db2 is redoing reusable work it previously did. This is an expensive process that should be avoided where possible. Note that this may be occurring if entries become invalidated. That can occur, for example, if DDL or RUNSTATs is run and changes objects in the catalog cache.

  2. A higher number of catalog accesses required via the IBMDEFAULTBP bufferpool, causing an increase in logical reads and hence an increase in CPU.

To address the issue, a number of configuration choices that relate to the database memory need to be made.

If an increase in the catalogcache_sz is to be made then the initial allocation for dbheap should be increased first by the same amount. This is required as the catalogcache_sz is allocated from the initial dbheap size.

The dbheap is allocated from database_memory, so this may need to be increased too.

Recommendation

The IBM documentation states in the description for cat_cache_lookups, that a hit-rate of over 80% is acceptable for the catalog cache. Other recommendations are >95% . Is this measurement a true indication of the efficiency of the catalog cache? To see that we need at least 2 other counters

  • cat_cache_deletes – incremented when an entry is deleted

  • cat_cache_invalidates – incremented when an entry is invalidated

A better indicator is the catalog cache HWM. Using the calculation in the section Tracking the condition, if CAT_CACHE_USED_HWM_BYTES > CATALOGCACHE_SZ_BYTES then consider increasing the catalogcache_sz to the HWM and continue to monitor.

IBMDEFAULTBP

When considering the overall performance of the Db2 system, the catalog cache isn't the only area of importance, we also need to look at the catalog bufferpool, IBMDEFAULTBP. - this should be reserved for the catalog objects only. Sharing it with non-catalog objects, even if they are deemed 'read only', introduces the possibility of slowing the catalog processing. 

If you don't share the IBMDEFAULTBP bufferpool with any other objects then expect a reduction in accesses as you increase catalogcache_sz. There should be a double saving in CPU

  1. Less accesses (logical reads) to IBMDEFAULTBP means less CPU is being used.

  2. Less converting of catalog table descriptors into internal format means less CPU is being used.

In addition, an improved throughput in your system as less time is being spent by Db2 redoing reusable work previously completed.

Conclusion

The anomaly described above should not be occurring. Db2 should be maintaining its overflow counters without condition - an overflow is an overflow. 

Monitoring of Db2 for overflow may be looking for a message in the db2diag log or looking for an increment in the counter. If neither of these are occurring then overflows may be happening more frequently than thought. As indicated in Recommendation, monitoring the high water mark and comparing it to the configured size is a more reliable way to determine if an overflow has occurred.

Note

It is possible that due to the processing in your system, a high hit rate may not be achievable. Avoiding overflows is just as important as achieving a high hit rate. The objective is to reduce unnecessary processing where possible.

MetricsAnalysis

MetricsAnalysis for IBM Db2 is a low cost high value knowledge as a service offering for analysing Db2, its use and its configuration. Anonymised run-time Db2 metrics are collected on the client side and provided to MetricsAnalysis.

The metrics are processed through a data warehouse and data mart, enabling a best practice multi-dimensional analysis to be performed every time. This includes analysing the memory areas, their configuration and high water mark usage. The output is a document that includes detailed instructions on re-configuring Db2 to match its use - the workload.

There are potentially huge gains that can be achieved via configuration changes. These are shown in the case study  which highlighted this anomaly and lead me to write this blog.

Please leave a comment or contact us for more information or to organise an analysis.