Knowledge page of Ciber Netherlands

OBIEE cache management.

Some remarks upfront:

  • OBIEE is NOT a database. It’s a tool to plough through multidimensional data from a data warehouse or data mart.
  • OBIEE is NOT an extraction tool. It’s not designed to extract 100k+ rows to a CSV or XLS file.
  • OBIEE reports returning more than 1000 rows are often used as a source for other tools (like Excel) and don’t contain any information for the user unless he/she ploughs through the data manually.
  • OBIEE caching is NOT in memory! It’s a file based system.

How does the OBIEE cache work?
The program first hashes the request string and looks in the cache directory to see whether a valid file is already present; if not, it executes the request against the database.
If you take a close look at your cache directory you will see that it’s a collection of “.TBL” files. If you open one of these files in an editor you will recognize things like the user, the repository, the executed request and of course the resulting data.
The cache entries are personal, so a different user executing the same request will lead to a new cache entry, especially if you have row-level security inside OBIEE.
Since all of the cache is written to disk, the cache directory needs to be on the quickest drive available, preferably with its own disk controller.
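The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea (hash the request, look for a file, otherwise go to the database), not OBIEE's actual implementation: the SHA-256 hash, the JSON file content and the function names are assumptions, and the "valid" period and purging are not modelled.

```python
import hashlib
import json
from pathlib import Path


def cache_key(user: str, repository: str, request: str) -> str:
    """Hash user, repository and request text into a file name (assumed scheme)."""
    raw = "\x1f".join((user, repository, request))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def run_request(cache_dir: Path, user: str, repository: str, request: str, execute):
    """Return (rows, hit): rows come from the cache file if present, else from execute()."""
    entry = cache_dir / (cache_key(user, repository, request) + ".TBL")
    if entry.exists():
        # Cache hit: read the stored result from disk, the database is not touched.
        return json.loads(entry.read_text(encoding="utf-8"))["rows"], True
    rows = execute(request)  # cache miss: run the request against the database
    entry.write_text(
        json.dumps({"user": user, "repository": repository,
                    "request": request, "rows": rows}),
        encoding="utf-8",
    )
    return rows, False
```

Because the user is part of the key, a second user running the same request produces a second file, which matches the "personal" behaviour described above.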
Before we go into detail about the possible cache settings, we first have to ask ourselves what the valid reasons to use the OBIEE cache are.

  • The user executes the same request over and over within the “valid” period of the cache.
  • The result of the request is the rollup of a large amount of underlying data. But if the rollup level is greater than 1 to 100 you really should go back to your data warehouse design.

Caching a report because “it takes too long” is usually a sign that your data warehouse or data mart doesn’t meet the user requirements.

The parameters for cache management can be found in NQSConfig.INI.

ENABLE
To enable the cache set the ENABLE parameter to YES.

DATA_STORAGE_PATHS
This parameter specifies one or more directory paths where the cached query result data is stored and from which it is read when a cache hit occurs. Each path is followed by its maximum capacity in bytes, kilobytes, megabytes or gigabytes. The maximum capacity for each path is 4 GB. For optimal performance, the directories specified should be on high-performance storage systems.
Each directory listed needs to be an existing, fully-qualified, writable directory pathname, with double quotes ( " ) surrounding the pathname. Specify mapped directories only.
UNC path names ("\\server.name.edu\somefolder") and network mapped drives are allowed only if the service runs under a qualified user account. To change the account under which the service is running, see the corresponding topic in the Oracle Business Intelligence Enterprise Edition Deployment Guide.
Specify multiple directories with a comma separated list.
When you specify more than one directory, they should reside on different physical drives. (If you have multiple cache directory paths that all resolve to the same physical disk, both available and used space may be double-counted.)
Example: DATA_STORAGE_PATHS = "d:\OracleBI\cache" 256MB, "f:\OracleBI\cache" 256MB ;
OBIEE fills these directories alternately: request 1 goes to the first directory, request 2 to the second, request 3 to the first again, and so on.
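To make the format and the alternating behaviour concrete, here is a minimal Python sketch that parses a DATA_STORAGE_PATHS value and hands out the directories in turn. The function names, the regular expression and the 1 KB = 1024 bytes conversion are assumptions made for this example; it is not how the BI Server itself reads the file.

```python
import itertools
import re

# Assumed unit conversion: 1 KB = 1024 bytes.
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
MAX_PATH_BYTES = 4 * 1024**3  # 4 GB ceiling per path, as stated above

# A quoted path followed by a number and an optional unit.
ENTRY = re.compile(r'"([^"]+)"\s*(\d+)\s*(KB|MB|GB)?', re.IGNORECASE)


def parse_storage_paths(value: str):
    """Parse a DATA_STORAGE_PATHS value into a list of (path, capacity_in_bytes)."""
    paths = []
    for path, number, unit in ENTRY.findall(value):
        size = int(number) * UNITS[unit.upper()]
        if size > MAX_PATH_BYTES:
            raise ValueError(f"{path}: capacity exceeds 4 GB")
        paths.append((path, size))
    return paths


def alternate(paths):
    """Yield the cache directories in turn: first, second, first, ..."""
    return itertools.cycle(path for path, _ in paths)
```

With the example value above, `alternate(parse_storage_paths(...))` yields the d: directory, then the f: directory, then the d: directory again.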

Some notes:

  1. Don’t make a directory larger than 4 GB. This is caused by the fact that some of the OBIEE core products are VC++ based and use an unsigned 32-bit integer for the memory allocation.
  2. Working with an organization of 8000 users who run an average of 10 reports during the cache valid period (average cache entry size 10 KB), I have never seen the total allocated cache space become more than 1 GB during a six-month monitoring period. If you really need a cache allocation of more than 2 GB there is probably something seriously wrong with your DWH or DM.
  3. Invest in one or more small high-speed disks with low access times and large cache memories. Give the drive(s) its/their own controller. If you are on a SAN, ask your SAN system manager to give you a high-priority access drive, so you don’t have to “wait”. You might want to consider creating an "in memory" drive.
  4. An Oracle Business Intelligence Server defined as a clustered server does not share cached data. The DATA_STORAGE_PATHS entry needs to be unique for each server defined as a cluster participant.
  5. Specifying more than one directory per drive does not improve performance, because file input and output (I/O) takes place through the same I/O controller. In general, specify only one directory per disk drive. Specifying multiple directories on different drives may improve the overall I/O throughput of the OBIEE Server internally by distributing I/O across multiple devices.
  6. Seeding the OBIEE cache with large tables (more than 1 MB) as the source for relatively small requests (less than 50 KB) forces OBIEE to do a full file read for every request. Most modern databases have large shared pools and can do this read much more quickly from memory.
  7. Most data centres have dedicated VLANs for communication between the databases and OBIEE. Most of these connections are much quicker than reading from disk.

MAX_ROWS_PER_CACHE_ENTRY
Specifies the maximum number of rows in a query result set to qualify for storage in the query cache. Limiting the number of rows is a useful way to avoid using up the cache space with runaway queries that return large numbers of rows. If the number of rows a query returns is greater than the value specified in the MAX_ROWS_PER_CACHE_ENTRY parameter, the query is not cached. When set to 0, there is no limit to the number of rows per cache entry.
Default: MAX_ROWS_PER_CACHE_ENTRY = 100000 ;
Setting this any higher makes OBIEE an ETL tool.

MAX_CACHE_ENTRY_SIZE
Specifies the maximum size for a cache entry. Potential entries that exceed this size are not cached. The default size is 1 MB.
Specify GB for gigabytes, KB for kilobytes, MB for megabytes, and no units for bytes.
Example: MAX_CACHE_ENTRY_SIZE = 1 MB ;
Altering this value should be done in conjunction with the setting of MAX_ROWS_PER_CACHE_ENTRY.
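Why the two limits belong together becomes clearer with a small sketch. The function names are made up for this example and the check is a simplification of whatever the server does internally; the defaults are the ones quoted above.

```python
def qualifies_for_cache(row_count: int, size_bytes: int,
                        max_rows: int = 100_000,
                        max_entry_bytes: int = 1024 * 1024) -> bool:
    """Apply both limits; max_rows == 0 means there is no row limit."""
    within_rows = max_rows == 0 or row_count <= max_rows
    within_size = size_bytes <= max_entry_bytes
    return within_rows and within_size


def implied_bytes_per_row(max_rows: int, max_entry_bytes: int) -> float:
    """Average row width at which both limits are reached at the same time."""
    return max_entry_bytes / max_rows
```

With the defaults, `implied_bytes_per_row(100_000, 1024 * 1024)` is roughly 10.5 bytes per row, so for most real result sets the size limit is hit long before the row limit. Changing one without looking at the other leaves one of the two settings without effect.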

MAX_CACHE_ENTRIES
Specifies the maximum number of cache entries allowed in the query cache. Limiting the total number of cache entries provides another parameter with which to manage your cache storage. The actual limit of cache entries might vary slightly depending on the number of concurrent queries. The default value is 1000.
Example: MAX_CACHE_ENTRIES = 1000 ;
Rule of thumb: number of users * average number of reports per user per “cache period”.
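Applied to the organization from the notes above (8000 users, 10 reports each, 10 KB per entry), the rule of thumb works out as follows. The helper name is made up for this example, and it assumes the worst case in which no two users share a cache entry.

```python
def estimate_cache(users: int, reports_per_user: int, avg_entry_kb: float):
    """Return (number of cache entries, total size in MB) for one cache period."""
    entries = users * reports_per_user
    total_mb = entries * avg_entry_kb / 1024
    return entries, total_mb


entries, total_mb = estimate_cache(8000, 10, 10)
print(entries, round(total_mb))  # 80000 781
```

So MAX_CACHE_ENTRIES would be 80000, and the estimated footprint of about 781 MB is in line with the observation that the cache stayed under 1 GB.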

POPULATE_AGGREGATE_ROLLUP_HITS
Specifies whether to aggregate data from an earlier cached query result set and create a new entry in the query cache for rollup cache hits. The default value is NO.
Typically, if a query gets a cache hit from a previously executed query, then the new query is not added to the cache. A user may have a cached result set containing information at a particular level of detail (for example, sales revenue by ZIP Code). A second query may ask for this same information, but at a higher level of aggregation (for example, sales revenue by state). The POPULATE_AGGREGATE_ROLLUP_HITS parameter overrides this default when the cache hit occurs by rolling up an aggregate from a previously executed query, in this example, by aggregating data from the first result set stored in the cache. That is, the sales revenue for all ZIP Codes in a particular state can be added to obtain the sales revenue by state. This is referred to as a rollup cache hit.
Normally, a new cache entry is not created for queries that result in cache hits. You can override this behavior specifically for cache rollup hits by setting POPULATE_AGGREGATE_ROLLUP_HITS to YES.
Non-rollup cache hits are not affected by this flag. If a query result is satisfied by the cache (that is, the query gets a cache hit), then this query is not added to the cache. When this parameter is set to YES and a query gets an aggregate rollup hit (for example, "sales by region" is answered from "sales by district, region"), the result is put into the cache. Setting this parameter to YES may result in better performance, but results in more entries being added to the cache.
Example: POPULATE_AGGREGATE_ROLLUP_HITS = YES ;
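A rollup cache hit can be pictured as a simple aggregation over rows that are already in the cache. The sketch below uses made-up data and a hypothetical helper, and it only covers additive measures such as a sum of revenue; it does not show how the BI Server decides that a rollup is possible.

```python
from collections import defaultdict


def roll_up(cached_rows, key_index: int, measure_index: int):
    """Sum one measure of the cached rows per value of the chosen key column."""
    totals = defaultdict(float)
    for row in cached_rows:
        totals[row[key_index]] += row[measure_index]
    return dict(totals)


# Cached result at ZIP Code level: (state, zip_code, revenue). Illustrative data only.
by_zip = [("CA", "90210", 120.0), ("CA", "94105", 80.0), ("NY", "10001", 50.0)]

# "Sales revenue by state" answered from the cached ZIP Code rows.
by_state = roll_up(by_zip, key_index=0, measure_index=2)
```

With POPULATE_AGGREGATE_ROLLUP_HITS = YES, the equivalent of `by_state` would be written back as a new cache entry, so the next "by state" request does not need to repeat the aggregation.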

USE_ADVANCED_HIT_DETECTION
When caching is enabled, each query is evaluated to determine whether it qualifies for a cache hit. A cache hit means that the server was able to use cache to answer the query and did not go to the database at all. The Oracle BI Server can use query cache to answer queries at the same or later level of aggregation (Data Mart Automation).
The parameter USE_ADVANCED_HIT_DETECTION enables an expanded search of the cache for hits. The expanded search has a performance impact, which is not easily quantified because of variable customer requirements. Customers that rely heavily on query caching and are experiencing misses might want to test the trade-off between better query matching and overall performance for high user loads.
Example: USE_ADVANCED_HIT_DETECTION = YES;

Reasons Why a Query is Not Added to the Cache:

  • Non-cacheable SQL element. If a SQL request contains Current_Timestamp, Current_Time, Rand, Populate, or a parameter marker then it is not added to the cache.
  • Non-cacheable table. Physical tables in the Oracle BI Server repository can be marked ‘noncacheable’. If a query references any non-cacheable table then the query results will not be added to the cache.
  • Cache hit. In general, if the query gets a cache hit on a previously cached query, then the results of the current query are not added to the cache. The exception is query hits that are aggregate roll-up hits.
  • Result set is too big.
  • Query is cancelled. This can happen by explicit cancellation from Oracle BI Presentation Services or the Administration Tool, or implicitly through timeout.
  • Oracle BI Server is clustered. Queries that fall into the ‘cache seeding’ family are propagated throughout the cluster. Other queries continue to be stored locally. Therefore, even though a query may be put into the cache on Oracle BI Server node 1, it may not be on Oracle BI Server node 2.
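
The first reason in the list can be approximated with a quick text check, for example when reviewing logical SQL from the query log. This is a rough sketch under stated assumptions: the helper name is made up, the check is purely textual (a "?" inside a string literal would be flagged as well), and it is not the test the BI Server itself performs.

```python
import re

# Elements named in the list above, plus "?" as the parameter marker.
NON_CACHEABLE = re.compile(
    r"\b(CURRENT_TIMESTAMP|CURRENT_TIME|RAND|POPULATE)\b|\?",
    re.IGNORECASE,
)


def has_non_cacheable_element(sql: str) -> bool:
    """Return True if the request text contains one of the listed elements."""
    return NON_CACHEABLE.search(sql) is not None
```

A request such as `SELECT brand, revenue FROM sales` passes the check, because the word boundaries prevent "brand" from matching RAND.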

MAX_SUBEXPR_SEARCH_DEPTH
The parameter MAX_SUBEXPR_SEARCH_DEPTH allows you to configure how deep the hit detector looks for an inexact match in an expression of a query. The default is 5. At the default level (5), for example, a query on the expression sin(cos(tan(abs(round(trunc(profit)))))) misses on Profit, which is at level 7. Changing the search depth to 7 opens up Profit for a potential hit. Advice: set it to 99.
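The level counting in this example can be reproduced with a few lines of Python. The helper is hypothetical and simply counts nested parentheses, which matches the example above; how the hit detector walks an expression internally is not documented here.

```python
def operand_level(expression: str) -> int:
    """Level of the innermost operand: each nested function call adds one level."""
    depth = deepest = 0
    for ch in expression:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest + 1


level = operand_level("sin(cos(tan(abs(round(trunc(profit))))))")
reachable_at_default = level <= 5  # False: Profit sits at level 7
```

Six nested functions put Profit at level 7, so a search depth of 5 stops two levels short.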

The [CACHE] part of your NQSConfig.INI file could now look like this:
##################################################################
#
# Query Result Cache Section
#
##################################################################
[ CACHE ]
ENABLE = YES;
DATA_STORAGE_PATHS = "d:\OracleBI\cache" 1GB, "f:\OracleBI\cache" 1GB ;
MAX_ROWS_PER_CACHE_ENTRY = 10000 ;
MAX_CACHE_ENTRY_SIZE = 1 MB ;
MAX_CACHE_ENTRIES = 80000 ;
POPULATE_AGGREGATE_ROLLUP_HITS = YES;
USE_ADVANCED_HIT_DETECTION = YES;
MAX_SUBEXPR_SEARCH_DEPTH = 99;

If your OBIEE server is in a clustered environment use the following parameters for the “shared” part:

GLOBAL_CACHE_STORAGE_PATH
The physical location for storing cache entries that are shared across the cluster. All cluster nodes share the same location.
Example: GLOBAL_CACHE_STORAGE_PATH = "" SIZE;

MAX_GLOBAL_CACHE_ENTRIES
The maximum number of cache entries stored in the location specified by GLOBAL_CACHE_STORAGE_PATH.
Example: MAX_GLOBAL_CACHE_ENTRIES = 1000;

CACHE_POLL_SECONDS
The interval in seconds at which each node pulls from the shared location specified in GLOBAL_CACHE_STORAGE_PATH.
Example: CACHE_POLL_SECONDS = 300;

CLUSTER_AWARE_CACHE_LOGGING
Turns on logging for the cluster caching feature. Used only for troubleshooting. The default is NO.
Example: CLUSTER_AWARE_CACHE_LOGGING = NO;

 

Till Next Time

John Minkjan is a senior BI consultant at Ciber in the Netherlands. The text of this article is also published on his personal blog: http://obiee101.blogspot.com/

3 Comments so far

  1. Gilles December 18th, 2008 15:09

    Hello John,

    I read your article with great interest, but I do have a question. You state that a cache result is user-specific. Suppose I have a large report that has no filters on it when I run it as admin. I would like to have that report in the cache, so that the users who log in after me can make use of that cache (and get the data presented filtered by their user name).

    If I wanted to cache this, according to your explanation, I would have to load this report into the cache for every user (preferably on separate disks as well) and in separate directories. That sounds illogical. Perhaps I misread you, but maybe you could explain this in a little more detail!

    Thanks in advance,
    Gilles

  2. miguel January 15th, 2011 6:27

    thank you

  3. WMS January 16th, 2011 15:57

    Any data management system available will help an unorganized system if ever the administrator decides to implement it. And if ever the administration decides to fix or upgrade a system, I hope it will be an automated one in order to keep up with the fast pace of change in the market. An automated system will bring a different and easy task within the company.
