Friday, August 31, 2007



The two are processed quite differently.

IN Clause

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;

The sub query is evaluated, distinct, indexed and then
joined to the original table -- typically.

As opposed to "EXIST" clause

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

for x in ( select * from t1 )
if ( exists ( select null from t2 where y = x.x )
end if
end loop

It always results in a full scan of T1 whereas the first query can make use of
an index on T1(x).

So, when is exists appropriate and in appropriate?

Lets say the result of the subquery
( select y from T2 )

is "huge" and takes a long time. But the table T1 is relatively small and
executing ( select null from t2 where y = x.x ) is fast (nice index on
t2(y)). Then the exists will be faster as the time to full scan T1 and do the
index probe into T2 could be less then the time to simply full scan T2 to build
the subquery we need to distinct on.

Lets say the result of the subquery is small -- then IN is typicaly more

If both the subquery and the outer table are huge -- either might work as well
as the other -- depends on the indexes and other factors.


Consider declaring NOT NULL columns

People sometimes do not bother to define columns as NOT NULL in the data dictionary, even though these columns should not contain nulls, and indeed never do contain nulls because the application ensures that a value is always supplied. You may think that this is a matter of indifference, but it is not. The optimizer sometimes needs to know that a column is not nullable, and without that knowledge it is constrained to choose a less than optimal execution plan.

1. An index on a nullable column cannot be used to drive access to a table unless the query contains one or more predicates against that column that exclude null values.

Of course, it is not normally desirable to use an index based access path unless the query contains such predicates, but there are important exceptions.

For example, if a full table scan would otherwise be required against the table and the query can be satisfied by a fast full scan (scan for which table data need not be read) against the index, then the latter plan will normally prove more efficient.

Test-case for the above reasoning

SQL> create index color_indx on automobile(color);

Index created.

SQL> select distinct color,count(*) from automobile group by color;

Execution Plan
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1046 Bytes=

1 0 SORT (GROUP BY) (Cost=4 Card=1046 Bytes=54392)
=1046 Bytes=54392)

SQL> alter table automobile modify color not null;

Table altered.

SQL> select distinct color,count(*) from automobile group by color;

Execution Plan
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1046 Bytes=

1 0 SORT (GROUP BY) (Cost=4 Card=1046 Bytes=54392)
ard=1046 Bytes=54392)

2. If you are calling a sub-query in a parent query using the NOT IN predicate, the indexing on column (in where clause of parent query) will not be used.

Because as per optimizer, results of parent query needs to be displayed only when there is no equi-matching from sub-query, And if the sub-query can potentially contain NULL value (UNKNOWN, incomparable), parent query will have no value to compare with NULL value, so it will not use the INDEX.

Test-case for the above Reasoning

SQL> create index sal_indx on emp(sal);

Index created.

SQL> create index ename_indx on emp(ename);

Index created.

SQL> select * from emp where sal not in (select sal from emp where ename='JONES');

Execution Plan
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=17 Card=13 Bytes=4

2 1 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Byt
es=518) –> you can see a full table scan even when index exist on SAL

ard=1 Bytes=10)


SQL> alter table emp modify sal not null;

Table altered.

SQL> select * from emp where sal not in (select sal from emp where ename='JONES');

Execution Plan
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=12 Bytes=56

1 0 MERGE JOIN (ANTI) (Cost=5 Card=12 Bytes=564)
ard=14 Bytes=518)

3 2 INDEX (FULL SCAN) OF 'SAL_INDX' (INDEX) (Cost=1 Card=1
4) -> Here you go, your index getting used now

4 1 SORT (UNIQUE) (Cost=3 Card=1 Bytes=10)
Card=1 Bytes=10)


The above article has been an inspiration after reading an article on ixora . The article was missing some of the testcases, so I thought of adding few for newbiews to relate to it.


Important point

If the statement is designed poorly, nothing much can be done by optimizer or indexes

Few known thumb rules

–Avoid Cartesian joins

–Use UNION ALL instead of UNION – if possible

–Use EXIST clause instead of IN - (Wherever appropiate)

–Use order by when you really require it – Its very costly

–When joining 2 views that themselves select from other views, check that the 2 views that you are using do not join the same tables!

–Avoid NOT in or NOT = on indexed columns. They prevent the optimizer from using indexes. Use where amount > 0 instead of where amount != 0

- Avoid writing where is not null. nulls can prevent the optimizer from using an index

- Avoid calculations on indexed columns. Write WHERE amount > 26000/3 instead of WHERE approved_amt/3 > 26000

- The query below will return any record where bmm_code = cORE, Core, CORE, COre, etc.

select appl_appl_id where upper(bmm_code) LIKE 'CORE%'

But this query can be very inefficient as it results in a full table scan. It cannot make use of the index on bmm_code.

Instead, write it like this:

select appl_appl_id from nci_appl_elements_t where (bmm_code like 'C%' or bmm_code like 'c%') and upper(bmm_code) LIKE 'CORE%'

This results in Index Range Scan.

You can also make this more efficient by using 2 characters instead of just one:

where ((bmm_code like 'CO%' or bmm_code like 'Co%' or bmm_code like 'cO%' or bmm_code like 'co%') and upper(bmm_code) LIKE 'CORE%')


The Clustering Factor

The clustering factor is a number which represent the degree to which data is randomly distributed in a table.

In simple terms it is the number of “block switches” while reading a table using an index.

Figure: Bad clustering factor

The above diagram explains that how scatter the rows of the table are. The first index entry (from left of index) points to the first data block and second index entry points to second data block. So while making index range scan or full index scan, optimizer have to switch between blocks and have to revisit the same block more than once because rows are scatter. So the number of times optimizer will make these switches is actually termed as “Clustering factor”.

Figure: Good clustering factor

The above image represents "Good CF”. In an event of index range scan, optimizer will not have to jump to next data block as most of the index entries points to same data block. This helps significantly in reducing the cost of your SELECT statements.

Clustering factor is stored in data dictionary and can be viewed from dba_indexes (or user_indexes)

SQL> create table sac as select * from all_objects;

Table created.

SQL> create index obj_id_indx on sac(object_id);

Index created.

SQL> select clustering_factor from user_indexes where index_name='OBJ_ID_INDX';


SQL> select count(*) from sac;


SQL> select blocks from user_segments where segment_name='OBJ_ID_INDX';


The above example shows that index has to jump 545 times to give you the full data had you performed full table scan using the index.

- A good CF is equal (or near) to the values of number of blocks of table.

- A bad CF is equal (or near) to the number of rows of table.

- Rebuilding of index can improve the CF.

Then how to improve the CF?

- To improve the CF, it’s the table that must be rebuilt (and reordered).
- If table has multiple indexes, careful consideration needs to be given by which index to order table.

Important point: The above is my interpretation of the subject after reading the book on Optimizer of Jonathan Lewis.


Are our cursors in shared pool shared at all?
How many DBA’s uses this feature of 9i (introduced in 8i but enhanced in 9i?)?

Actually, lot of us doesn’t use it all. Let’s first understand this feature and implement this in our systems.

CURSOR_SHARING is an init.ora parameter which decides whether a SQL send from user is a candidate for fresh parsing or will use an existing plan.

This parameter has 3 values.

1. CURSOR_SHARING = Exact (Default)

Definition: Share the plan only if text of SQL matches exactly with the text of SQL lying in shared pool

Let’s take an example

SQL> create table test1 (t1 number);

Table created.

SQL> insert into test1 values(1);

1 row created.

SQL> insert into test1 values(2);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from test1 where t1=1;


SQL> select * from test1 where t1=2;


SQL> select sql_text
2 from v$sql
3 where sql_text like 'select * from test1%'
4 order by sql_text;


select * from test1 where t1=1
select * from test1 where t1=2

As you see there were 2 statements in V$sql, so it generated 2 plans. Oracle had to do the same work again to generate the plan even when the difference between the two SQL was just literal value.

2. CURSOR_SHARING = Force (Introduced in 8.1.6)

Definition: Share the plan (forcibly) of a SQL if the text of SQL matches (except the literal values) with text of SQL in shared pool

This means if 2 SQL’s are same except their literal values, share the plan.

Let’s take an example:

I’m using the same table and data which is used in case of above example.

SQL> alter system flush shared_pool;

System altered.

SQL> alter session set cursor_sharing=force;

Session altered.

SQL> select * from test1 where t1=1;


SQL> select * from test1 where t1=2;


SQL> select sql_text
2 from v$sql
3 where sql_text like 'select * from test1%'
4 order by sql_text;


select * from test1 where t1=:"SYS_B_0"

You can see for both the statements there was only one entry in V$sql. This means for second occurrence, oracle did not generate a new plan.

This not only helps in savings DB server engine time for generating the plan but also helps in reducing the number of plans shared pool can hold.

Important note:

Cursor_sharing = force can have some flip behavior as well, so you must be careful to use this. Using this we are forcing oracle to use the same plan for 2(or more) SQL’s even when using the same plan may not be good for similar SQL’s.

Example: “where t1=2” may be a good candidate for index scan while “where t1=10” should use a full table scan because 90% of the rows in the table has t1=10 (assumption).

3. CURSOR_SHARING = SIMILAR (Introduced in 9i)

This is the tricky one, but most used.

Definition: SIMILAR causes statements that may differ in some literals, but are otherwise identical, to share a cursor, unless the literals affect either the meaning of the statement or the degree to which the plan is optimized. (Source: Oracle documentation)

Let’s understand this.
Re-quoting the example above > “where t1=2” may be a good candidate for index scan while “where t1=10” should use a full table scan because 90% of the rows in the table has t1=10 (assumption).

To avoid 2 statements using the same plan when the same plan is not good for one of them, we have cursor_sharing=similar

Let’s take an example:

SQL> alter system flush shared_pool;

System altered.

SQL> drop table test1;

Table dropped.

SQL> create table test1 (t1 number,t2 number);

Table created.

1 begin
2 for i in 1 .. 100 loop
3 insert into test1 values(1,i);
4 end loop;
5 commit;
6 update test1 set t1=2 where rownum <> /

PL/SQL procedure successfully completed.

In this case t1 has value “2” in first row and “1” in rest 99 rows
SQL> create index tt_indx on test1(t1);

Index created.

SQL> alter session set cursor_sharing=similar;

Session altered.

SQL> select * from test1 where t1=2;

1 row selected.

SQL> select * from test1 where t1=1;

99 rows selected.

SQL> select sql_text
2 from v$sql
3 where sql_text like 'select * from test1%'
4 order by sql_text;


select * from test1 where t1=:"SYS_B_0"
select * from test1 where t1=:"SYS_B_0"

This tells us that even though the 2 statements were similar, Oracle opted for a different plan. Now even if you put t1=30 (0 rows), Oracle will create another plan.

SQL> select * from test1 where t1=30; -- (0 rows)

SQL> select sql_text
2 from v$sql
3 where sql_text like 'select * from test1%'
4 order by sql_text;


select * from test1 where t1=:"SYS_B_0"
select * from test1 where t1=:"SYS_B_0"
select * from test1 where t1=:"SYS_B_0"

This is because the first time when the SQL ran, oracle engine found the literal value as “unsafe” because using the same literal value can cause bad plans for other similar SQL’s. So along with the PLAN, optimizer stored the literal value also. This will ensure the reusability of the plan only in case the same lieteral is provided. In case of any change, optimizer will generate a new plan.

But this doesn’t mean that SIMILAR and EXACT are same.

See this:

SQL> alter system flush shared_pool;

System altered.

SQL> select * from test1 where t1=2 and t1=22;

no rows selected

SQL> select * from test1 where t1=2 and t1=23;

no rows selected

SQL> select sql_text
2 from v$sql
3 where sql_text like 'select * from test1%'
4 order by sql_text;


select * from test1 where t1=:"SYS_B_0" and t1=:"SYS_B_1"

Optimizer used single plan for both.


1. Use CURSOR_SHARING=similar only when you have library cache misses and/or most of the SQL statements differ only in literal values

2. CURSOR_SHARING=force/similar significantly reduces the number of plans in shared pool


1. Oracle does not recommend setting CURSOR_SHARING to FORCE in a DSS environment or if you are using complex queries. Also, star transformation is not supported with CURSOR_SHARING set to either SIMILAR or FORCE

2. Setting CURSOR_SHARING to SIMILAR or FORCE causes an increase in the maximum lengths (as returned by DESCRIBE) of any selected expressions that contain literals (in a SELECT statement). However, the actual length of the data returned does not change.

Posted by Sachin at 4:09 AM 0 comments

Labels: Tuning

Thursday, November 30, 2006
Precaution while defining data types in Oracle

I was reading a wonderful article on Tom Kyte’s blog on repercussion of ill defined data types.

Some of the example he mentions is:

- Varchar2(40) Vs Varchar2(4000)
- Date Vs varchar2

Varchar2(40) Vs Varchar2(4000)

Generally developers ask for this to avoid issues in the application. They always want the uppermost limit to avoid any application errors. The fact on which they argue is “varchar2” data type will not reserve 4000 character space so disk space is not an issue. But what they don’t know is how costly are these.


- Generally application does an “array fetch” from the database i.e. they select 100 (may be more) rows in one go. So if you are selecting 10 varchar2 cols. Then effective RAM (not storage) usage will be 4000(char) x 10 (cols) x 100 (rows) = 4 MB of RAM. On contrary, had this column defined with 40 char size the usage would have been 40 x 10 x 100 ~ 40KB (approx)!! See the difference; also we didn’t multiply the “number of session”. That could be another shock!!

- Later on, it will be difficult to know for what the column was made. Ex: for first_name, if you define varchar2 (40000), it’s confusing for a new developer to know for what this column was made.

Date Vs varchar2

Again lots of developers define date cols as varchar2 (or char) for their convenience. But what they forget is not only data integrity (a date could be 01/01/03 .. what was dd,mm,yy .. then u don’t know what did you defined) but also performance.

While doing “Index range scans” by using “ between and ”
Optimizer will not be able to use the index as efficiently as in case of:
“ between and ”

I suggest you to read the full article by maestro himself.

Posted by Sachin at 4:43 AM 2 comments

Labels: Tuning

Tuesday, November 28, 2006
Oracle table compression
Compress your tables

This feature has been introduced in 9i rel 2 and is most useful in a warehouse environment (for fact tables).

How to compress? Simple

SQL> alter table test move compress;

Table altered.

How Oracle implements compression?

Oracle compress data by eliminating duplicate values within a data-block. Any repetitive occurrence of a value in a block is replaced by a symbol entry in a “symbol table” within the data block. So for example deptno=10 is repeated 5 times within a data block, it will be only stored once and rest 4 times a symbol entry will be stored in symbol table.
Its very important to know that every data block is self contained and sufficient to rebuild the uncompressed form of data.

Table compression can significantly reduce disk and buffer cache requirements for database tables while improving query performance. Compressed tables use fewer data blocks on disk, reducing disk space requirements.

Identifying tables to compress:

First create the following function which will get you the extent of compression

create function compression_ratio (tabname varchar2)
return number is — sample percentage
pct number := 0.000099;
blkcnt number := 0; blkcntc number; begin
execute immediate ' create table TEMP$$FOR_TEST pctfree 0
as select * from ' tabname ' where rownum < 1';
while ((pct < 100) and (blkcnt < 1000)) loop
execute immediate 'truncate table TEMP$$FOR_TEST';
execute immediate 'insert into TEMP$$FOR_TEST select *
from ' tabname ' sample block (' pct ',10)';
execute immediate 'select
from TEMP$$FOR_TEST' into blkcnt;
pct := pct * 10;
end loop;
execute immediate 'alter table TEMP$$FOR_TEST move compress ';
execute immediate 'select
from TEMP$$FOR_TEST' into blkcntc;
execute immediate 'drop table TEMP$$FOR_TEST';
return (blkcnt/blkcntc);

1 declare
2 a number;
3 begin
4 a:=compression_ratio('TEST');
5 dbms_output.put_line(a);
6 end
7 ;
8 /


PL/SQL procedure successfully completed.

1 select bytes/1024/1024 "Size in MB" from user_segments
2* where segment_name='TEST'
SQL> /

Size in MB

SQL> alter table test move compress;

Table altered.

SQL> select bytes/1024/1024 "Size in MB" from user_segments
2 where segment_name='TEST';

Size in MB

After compressing the table, you need to rebuild indexes because the rowid's have changed.


- This feature can be best utilized in a warehouse environment where there are lot of duplicate values (for fact tables). Infact a larger block size is more efficient, becuase duplicate values will be only stored once within a block.

- This feature has no -ve effect, infact it accelerates the performance of queries accessing large amount of data.

- I suggest you to read the following white paper by Oracle which explains the whole algorithm in details along with industry recognized TPC test cases.

I wrote the above article after reading the oramag. I suggest you to read the full article on Oracle site


Awareness is first step towards resolution.
I hope you will agree with me.

One of the important tasks of the DBA is to know what the high CPU consuming processes on database server are.
In my last organization, we used get number of request saying that DB server is running slow.
Now the problem is that, this server is hosting 86 databases, and finding out which is the culprit process and database sounds a daunting task (but it isn't).

See this:

First find out the top CPU processes on your system:

You may use TOP (or ps aux) or any other utility to find the top cpu consuming process.

Here is a sample top output:

bash-3.00$ top
17480 oracle 11 59 0 1576M 1486M sleep 0:09 23.51% oracle
9172 oracle 258 59 2 1576M 1476M sleep 0:43 1.33% oracle
9176 oracle 14 59 2 1582M 1472M sleep 0:52 0.43% oracle
17550 oracle 1 59 0 3188K 1580K cpu/1 0:00 0.04% top
9178 oracle 13 59 2 1571M 1472M sleep 2:30 0.03% oracle

You can see the bold section. Process# 17480 is consuming 23.51 % CPU.

Now this process can belong to any process out of many instances on this server.
To find out which instance this process belongs to:

bash-3.00$ ps -ef grep 17480

oracle 17480 17479 0 03:02:03 ? 0:48 oracleorcl (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

The instance name is highlighted in red.

Now you know which instance is holding that session.

Change your environmental settings (ORACLE_SID, ORACLE_HOME) related to this database.
and connect to the database as SYSDBA

bash-3.00$ sqlplus "/ as sysdba"

SQL*Plus: Release - Production on Thu Dec 21 04:03:44 2006

Copyright (c) 1982, 2005, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Release - Production

SQL> select ses.sid SID,sqa.SQL_TEXT SQL from
2 v$session ses, v$sqlarea sqa, v$process proc
3 where ses.paddr=proc.addr
4 and ses.sql_hash_value=sqa.hash_value
5 and proc.spid=17480;

--------- -----------------
67 delete from test

Now you have the responsible SQL behind 23% CPU using process.
In my case it was a deliberate DELETE statement to induct this test but in your case it can be a query worth tuning.

Mostly knowing what is the problem is solution of the problem. (At least you know what the issue is).
Issue is to be addressed right away or to be taken to development team is a subjective issue which i don’t want to comment.


I have seen many developers getting confused on index usage with like operator. Few are of the feeling that index will be used and few are against this feeling.

Let’s see this with example:

SQL> create table sac as select * from all_objects;

Table created.

SQL> create index sac_indx on sac(object_type);

Index created.

SQL> set autotrace trace explain

SQL> select * from sac where object_type='TAB%';

Execution Plan
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=128

d=1 Bytes=128)

2 1 INDEX (RANGE SCAN) OF 'SAC_INDX' (INDEX) (Cost=1 Card=1)

Above example shows that using % wild card character towards end probe an Index search.

But if it is used towards end, it will not be used. And sensibly so, because Oracle doesn’t know which data to search, it can start from ‘A to Z’ or ‘a to z’ or even 1 to any number.

See this.
SQL> select * from sac where object_type like '%ABLE';

Execution Plan
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=148 Card=1004 Byte

1 0 TABLE ACCESS (FULL) OF 'SAC' (TABLE) (Cost=148 Card=1004 B

Now how to use the index if you are using Like operator searches. The answer is Domain Indexes.

See the following example:

SQL> connect / as sysdba

SQL> grant execute on ctx_ddl to public;
Grant succeeded.

SQL> connect sac/******
SQL> begin
2 ctx_ddl.create_preference('SUBSTRING_PREF',
4 ctx_ddl.set_attribute('SUBSTRING_PREF',
6 end;
8 /
PL/SQL procedure successfully completed.

SQL> drop index sac_indx;
Index dropped.

SQL> create index sac_indx on sac(object_type) indextype is ctxsys.context parameters ('wordlist SUBSTRING_PREF memory 50m');
Index created.

SQL> set autotrace trace exp
SQL> select * from sac where contains (OBJECT_TYPE,'%PACK%') > 0
2 /
Execution Plan
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=19 Bytes=17
d=19 Bytes=1786)

In this case the index is getting used.
For proximity, soundex and fuzzy searchs, use domain indexes.


They 2 considered to be the most important parameter for shared pool tuning, but I guess most of us generally don’t use them or sometimes use them incorrectly.

The idea to put them here to understand “what they do?”, “when to use them?”, “how to use them?” and finally “see the impact”.


In most of the environments, there are many SQL’s which are re-fired many a times within a session, and every time they are issued, the session searches the shared pool for the parsed state, if it doesn’t get the parsed version, it will “hard parse” it, and if it exists in shared pool, it will still do a “soft parse”.

As we know “hard parse” is a costly operation, even a “soft parse” requires library cache latch and CPU overhead, which if aggregated is a significant number.

This parameter if set to a non-zero value (default is 50), improves the “soft parse” performance by doing a softer soft parse.


If enabled, oracle maintains a local session cache which stores recently closed cursors of a session.

To avoid this space getting misused or overused, oracle maintains the cursors for which there have been 3 parsed calls in the past, so all the SQL’s issued by a session are not here. Remember each cursor if pinned here, is not freeable and hence you may require more shared pool area.

A normal cursor in shared pool is sum of 2 components:
a) Heap 0 – size 1KB
b) SQL Area – size multiple of 4k

When we use session_cached_cursors only first component of cursor which is HEAP 0 is pinned in local session cache and if there is a call for re-parse for a statement, Oracle first checks the existence of the cursor in local cache and if found, it gets the address of the rest of the cursor which is in SQL Area (assuming if it is not aged out), so hereby saving CPU overhead and library cache latch contention.

How much it is getting used ?

SQL> select max(value) from v$sesstat
2 where STATISTIC# in (select STATISTIC# from v$statname where name='session cursor cache count');


This shows maximum value for session_cached_cursors in past. If this value= “session_cached_cursors” (init.ora parameter), you should consider increasing it.

If you want to see how is your session cache doing?

SQL> select cache/tot*100 "Session cursor cache%"
2 from
3 (select value tot from v$sysstat where name='parse count (total)'),
4 ( select value cache from sys.v_$sysstat where name = 'session cursor cache hits' );

Session cursor cache%

A value near 100 is considered very good. But you may still consider increasing this parameter if MAX(VALUE) in query one shows you equal number of cached cursor which you have set.

Conclusion: In an OLTP application, where the same set of SQL is issued number of times, one must configure this parameter more than its default value (50).
Also increasing this parameter will mean extra memory required for shared pool, so you must increase your shared pool when you use this parameter.


SQL can be aged out of shared pool in 2 cases:
a) When the cursor is closed: When the cursors are closed by application, they can still be in shared pool, until there comes a request for a new cursor and oracle needs to use LRU algorithm. SESSION_CACHED_CURSORS helps you in pinning (partial because it only pins HEAP 0) when the cursors are closed.

b) When the cursor is open: Oracle requires parsed state of SQL at PARSE and EXECUTE phase. If oracle parses (soft or hard) a statement, there is a likely hood that Oracle may age out your SQL out of shared pool after PARSE state if it requires to accommodate a new SQL coming its way. So in the EXECUTE state, there is a possibility that parsed information is lost and oracle parse it again.

CURSOR_SPACE_FOR_TIME if set to TRUE, ensures that SQL is not aged out before the cursor is closed, so in EXECUTE phase, you will have the PARSE information.

But this is generally a rare case and happens in a very highly active environment because to accommodate a new SQL, Oracle first check the free space and if it doesn’t get, it checks the closed cursors and see if any cursor can be aged out and when there is no space which can be reclaimed, Oracle comes to open cursors which are not EXECUTED.
This generally happens when the space of shared pool is too less.

I don’t suggest setting this parameter to TRUE in most of the cases.

There are some other serious tradeoffs also.
When a cursor is pinned, it cant be aged out and related memory cannot be freed for any new SQL and if you set this parameter to TRUE, you are telling Oracle to keep all the open cursors pinned and not freeable.
If you use this parameter, you are pinning the whole cursor not just the HEAP 0 which is 1k, you are pinning HEAP 0 (1k) + SQL Area (multiple of 4k) which makes shared pool life tough because of space issues.


As I said, I don’t suggest setting this parameter to TRUE in most of the cases. An alternative to set this parameter is to increase shared pool size or/and check your code on how many numbers of cursors you are opening/closing. That will be a better approach. Setting this parameter is like taking paracetamol without knowing the cause of fever.


With Oracle 9i, CBO is equipped with many more features, one of them is “Index skip scan” .This means even if you have a composite index on more than one column and you use the non-prefix column alone in your SQL, it may still use index.

I said “may” because CBO will calculate the cost of using the index and if it is more than that of full table scan, then it may not use index.

Index skip scan works differently from a normal index (range) scan.
A normal range scan works from top to bottom first and then move horizontal.
But a Skip scan includes several range scans in it. Since the query lacks the leading column it will rewrite the query into smaller queries and each doing a range scan.

SQL> create table test (a number, b number, c number);

Table created.

SQL> create index test_i on test(a,b);

Index created.

SQL> begin
2 for i in 1 .. 100000
3 loop
4 insert into test values(mod(i, 5), i, 100);
5 end loop;
6 commit;
7 end;
8 /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname => 'SAC', tabname => 'test', cascade => true);

PL/SQL procedure successfully completed.

SQL> set autotrace trace exp
SQL> select * from test where b=95267;

Execution Plan
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=22 Card=1 Bytes=10)

1 0

2 1
INDEX (SKIP SCAN) OF 'TEST_I' (INDEX) (Cost=21 Card=1)

I above example, “select * from test where b=95267” was broken down to several small range scan queries. It was effectively equivalent to following

Select * from test where a=0 and b=95267
Select * from test where a=1 and b=95267
Select * from test where a=2 and b=95267
Select * from test where a=3 and b=95267
Select * from test where a=4 and b=95267;

In concrete, saying that skip scan is not as efficient as normal “single range scan” is correct. But yet saves some disk space and overhead of maintaining another index.


Nested loop (loop over loop)

In this algorithm, an outer loop is formed which consists of few entries and then for each entry, and inner loop is processed.


Select tab1.*, tab2.* from tabl, tab2 where tabl.col1=tab2.col2;

It is processed like:

For i in (select * from tab1) loop
For j in (select * from tab2 where col2=i.col1) loop
Display results;
End loop;
End loop;

The Steps involved in doing nested loop are:

a) Identify outer (driving) table

b) Assign inner (driven) table to outer table.

c) For every row of outer table, access the rows of inner table.

In execution plan it is seen like this:


When optimizer uses nested loops?

Optimizer uses nested loop when we are joining tables containing small number of rows with an efficient driving condition. It is important to have an index on column of inner join table as this table is probed every time for a new value from outer table.

Optimizer may not use nested loop in case:

No of rows of both the table is quite high
Inner query always results in same set of records
The access path of inner table is independent of data coming from outer table.

Note: You will see more use of nested loop when using FIRST_ROWS optimizer mode as it works on model of showing instantaneous results to user as they are fetched. There is no need for selecting caching any data before it is returned to user. In case of hash join it is needed and is explained below.

Hash join

Hash joins are used when the joining large tables. The optimizer uses smaller of the 2 tables to build a hash table in memory and the scans the large tables and compares the hash value (of rows from large table) with this hash table to find the joined rows.

The algorithm of hash join is divided in two parts

Build a in-memory hash table on smaller of the two tables.
Probe this hash table with hash value for each row second table

In simpler terms it works like

Build phase

For each row RW1 in small (left/build) table loop
Calculate hash value on RW1 join key
Insert RW1 in appropriate hash bucket.
End loop;

Probe Phase

For each row RW2 in big (right/probe) table loop
Calculate the hash value on RW2 join key
For each row RW1 in hash table loop
If RW1 joins with RW2
Return RW1, RW2
End loop;
End loop;

When optimizer uses hash join?

Optimizer uses has join while joining big tables or big fraction of small tables.

Unlike nested loop, the output of hash join result is not instantaneous as hash joining is blocked on building up hash table.

Note: You may see more hash joins used with ALL_ROWS optimizer mode, because it works on model of showing results after all the rows of at least one of the tables are hashed in hash table.

Sort merge join

Sort merge join is used to join two independent data sources. They perform better than nested loop when the volume of data is big in tables but not as good as hash joins in general.

They perform better than hash join when the join condition columns are already sorted or there is no sorting required.

The full operation is done in two parts:

Sort join operation
get first row RW1 from input1
get first row RW2 from input2.

Merge join operation
while not at the end of either input loop
if RW1 joins with RW2
get next row R2 from input 2
return (RW1, RW2)
else if RW1 < style=""> get next row RW1 from input 1
get next row RW2 from input 2
end loop

Note: If the data is already sorted, first step is avoided.

Important point to understand is, unlike nested loop where driven (inner) table is read as many number of times as the input from outer table, in sort merge join each of the tables involved are accessed at most once. So they prove to be better than nested loop when the data set is large.

When optimizer uses Sort merge join?

a) When the join condition is an inequality condition (like <, <=, >=). This is because hash join cannot be used for inequality conditions and if the data set is large, nested loop is definitely not an option.

b) If sorting is anyways required due to some other attribute (other than join) like “order by”, optimizer prefers sort merge join over hash join as it is cheaper.

Note: Sort merge join can be seen with both ALL_ROWS and FIRST_ROWS optimizer hint because it works on a model of first sorting both the data sources and then start returning the results. So if the data set is large and you have FIRST_ROWS as optimizer goal, optimizer may prefer sort merge join over nested loop because of large data. And if you have ALL_ROWS as optimizer goal and if any inequality condition is used the SQL, optimizer may use sort-merge join over hash join


These days, we are working on data warehouse in which we have a master table which will have 1.5m (approx) rows inserted every half hour and we have few fast refresh materialized view based on it. These mviews have some aggregate functions on it, which makes it a bit complex.

To start the experiment, each mview refreshes used to take some 18-20 mins, which is totally against the business requirement. Then we tried to figure out on why the mview refresh is taking so much time, in spite of dropping all the bitmap indexes on the mview (generally b-map indexes are not good for inserts/updates).

The 10046 trace (level 12) highlighted that there were many “db file sequential reads” on mview because of optimizer using “I_SNAP$_mview” to fetch the rows from mview and merge the rows with that of master table to make the aggregated data for the mview.

Good part of the story is access to master table was quite fast because we used direct load (using sqlldr direct=y) to insert the data in it. When you use direct load to insert the data, oracle maintains the list of rowids added to table in a view called “SYS.ALL_SUMDELTA”. So while doing fast mview refresh, news rows inserted are picked directly from table using the rowids given from ALL_SUMDELTA view and not from Mview log, so this saves time.

Concerned part was still Oracle using I_SNAP$ index while fetching the data from mview and there were many “db file sequential read” waits and it was clearly visible that Oracle waited on sequential read the most. We figured it out that full table scan (which uses scattered read, and multi block read count) was very fast in comparison to index access by running simple test against table. Also the tables are dependent mviews are only for the day. End of the day the master table and mview’s data will be pushed to historical tables and master table and mviews will be empty post midnight.

I gathered the stats of mview and then re-ran the mview refresh, and traced the session, and this time optimizer didn’t use the index which was good news.

Now the challenge was to run the mview stats gathering job every half an hour or induce wrong stats to table/index to ensure mview refresh never uses index access or may be to lock the stats using DBMS_STATS.LOCK_TABLE_STATS.

But we found another solution by creating the mview with “USING NO INDEX” clause. This way “I_SNAP$” index is not created with “CREATE MATERIALIZED VIEW’ command. As per Oracle the “I_SNAP$” index is good for fast refresh but it proved to be reverse for us because our environment is different and the data changes is quite frequent.

Now, we ran the tests again, we loaded 48 slices of data (24 hrs x 2 times within hour) and the results were above expectations. We could load the data with max 3 mins per load of data.

This is not the end of story. In the trace we could see the mview refresh using “MERGE” command and using full table scan access to mview (which we wanted) and rowid range access to master table.

The explain plan looks like:

Rows Row Source Operation
------- ---------------------------------------------------
2 MERGE SF_ENV_DATA_MV (cr=4598 pr=5376 pw=0 time=47493463 us)
263052 VIEW (cr=3703 pr=3488 pw=0 time=24390284 us)
263052 HASH JOIN OUTER (cr=3703 pr=3488 pw=0 time=24127224 us)
263052 VIEW (cr=1800 pr=1790 pw=0 time=14731732 us)
263052 SORT GROUP BY (cr=1800 pr=1790 pw=0 time=14205624 us)
784862 VIEW (cr=1800 pr=1790 pw=0 time=3953958 us)
784862 NESTED LOOPS (cr=1800 pr=1790 pw=0 time=3169093 us)
1 VIEW ALL_SUMDELTA (cr=9 pr=0 pw=0 time=468 us)
1 FILTER (cr=9 pr=0 pw=0 time=464 us)
1 MERGE JOIN CARTESIAN (cr=9 pr=0 pw=0 time=459 us)
1 NESTED LOOPS (cr=6 pr=0 pw=0 time=99 us)
1 TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=0 pw=0 time=56 us)
1 INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=0 pw=0 time=23 us)(object id 36)
1 TABLE ACCESS CLUSTER USER$ (cr=3 pr=0 pw=0 time=40 us)
1 INDEX UNIQUE SCAN I_USER# (cr=1 pr=0 pw=0 time=7 us)(object id 11)
1 BUFFER SORT (cr=3 pr=0 pw=0 time=354 us)
1 INDEX RANGE SCAN I_SUMDELTA$ (cr=3 pr=0 pw=0 time=243 us)(object id 158)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us)
0 INDEX RANGE SCAN I_OBJAUTH1 (cr=0 pr=0 pw=0 time=0 us)(object id 103)
0 FIXED TABLE FULL X$KZSRO (cr=0 pr=0 pw=0 time=0 us)
0 FIXED TABLE FULL X$KZSPR (cr=0 pr=0 pw=0 time=0 us)
784862 TABLE ACCESS BY ROWID RANGE SF_ENV_SLICE_DATA (cr=1791 pr=1790 pw=0 time=2383760 us)
708905 MAT_VIEW ACCESS FULL SF_ENV_DATA_MV (cr=1903 pr=1698 pw=0 time=6387829 us)

You can see the access pattern as above.

Interesting twist in the story is when I saw the wait events in trace file.

Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 2253 0.74 7.73
db file scattered read 240 1.05 6.77
log file switch completion 6 0.98 3.16
log file switch 8 0.98 2.47
rdbms ipc reply 6 0.00 0.00
log buffer space 3 0.42 0.61

Again, even when we are doing full table scan, there are “db file sequential reads”?

To confirm I opened the raw trace file (before tkprof), and checked the obj# on sequential read wait event, it was the mview (SF_ENV_DATA_MV) !! and there were many. To further investigate I checked if there were any scattered reads to mview or not. I found there were scattered reads but there were many sequential reads also on which Oracle waited more than that of scattered read which did most of the data fetching.

After giving some thought, I realized that we created the mviews without storage clause, which means Oracle created the mview with default storage clause.

So assuming there are 17 blocks in an mview (container table) extent and Multi block read count is 16, Oracle will use scattered read mechanism (multiple blocks) to read the first 16 blocks and for the rest 1 it will use sequential read mechanism (one block), so you will find many sequential reads wait events sandwiched between scattered reads. To overcome this we created the mview with larger extent sizes and also multiple of MBCR (multi block read count).

Also, another cause of sequential read is chained or migrated rows, if your mview (or table) rows are migrated, the pointer to the next row is maintained in old (original) block, which will always be read by a single i/o call i.e. by sequential read.You can check the count of chained rows using DBA_TABLES.CHAIN_CNT after analysing the table . So to overcome this, we created the mview with genuine pctfree so when the merge runs (as a part of mview refresh) and updates few rows, the rows are not moved to a different block and hence avoiding sequential read.


Mview creation with “USING NO INDEX” does not create “I_SNAP$” index which sometimes help in fast refresh when the data changes are quite frequent and you cannot afford to collect stats after every few mins.
Create mview with storage clause suiting to your environment. Default extent sizes may not be always good.
PCTFREE can be quite handy to avoid sequential reads and extra block read.

Posted by Sachin at 2:01 AM 4 comments

Labels: CBO, Tips, Tuning

Saturday, March 3, 2007
Optimizer_mode – ALL_ROWS or FIRST_ROWS?

Out of all Oracle RDBMS modules, optimizer code is actually the most complicated code and different optimizer modes seem like jack while lifting your car in case of a puncture.

This paper focuses on how optimizer behaves differently when you have optimizer mode set to ALL_ROWS or FIRST_ROWS.

Possible values for optimizer_mode = choose/ all_rows/ first_rows/ first_rows[n]

By default, the value of optimizer_mode is CHOOSE which basically means ALL_ROWS (if statistics on underlying tables exist) else RULE (if there are no statistics on underlying tables). So it is very important to have statistics collected on your tables on regular intervals or else you are living in Stone Age.

FIRST_ROWS and ALL_ROWS are both cost based optimizer features. You may use them according to their requirement.


In simple terms it ensures best response time of first few rows (n rows).

This mode is good for interactive client-server environment where server serves first few rows and by the time user scroll down for more rows, it fetches other. So user feels that he has been served the data he requested, but in reality the request is still pending and query is still fetching the data in background.

Best example for this is toad, if you click on data tab, it instantaneously start showing you data and you feel toad is faster than sqlplus, but the fact is if you scroll down, you will see the query is still running.

Ok, let us simulate this on SQLPLUS

Create a table and index over it:

SQL> create table test as select * from all_objects;

Table created.

SQL> create index test_in on test(object_type);

Index created.

SQL> exec dbms_stats.gather_table_stats(‘SAC’,'TEST')

PL/SQL procedure successfully completed.

SQL> select count(*) from test;


SQL> select count(*) from test where object_type='JAVA CLASS';


You see out of almost 38k records, 15k are of JAVA class. And now if you select the rows having object_type=’JAVA_CLASS’, it should not use index as almost half of the rows are JAVA_CLASS. It will be foolish of optimizer to read the index first and then go to table.

Check out the Explain plans

SQL> set autotrace traceonly exp
SQL> select * from test where object_type='JAVA CLASS';

Execution Plan
Plan hash value: 1357081020

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1001 | 94094 | 10 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 1001 | 94094 | 10 (0)| 00:00:01 |

As you see above, optimizer has not used Index we created on this table.

Now use FIRST_ROWS hint:

SQL> select /*+ FIRST_ROWS*/ * from test where object_type='JAVA CLASS';

Execution Plan
Plan hash value: 3548301374

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14662 | 1345K| 536 (1)| 00:00:07 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 14662 | 1345K| 536 (1)| 00:00:07 |
|* 2 | INDEX RANGE SCAN | TEST_IN | 14662 | | 43 (3)| 00:00:01 |

In this case, optimizer has used the index.

Q> Why?

Ans> Because you wanted to see first few rows quickly. So, following your instructions oracle delivered you first few rows quickly using index and later delivering the rest.

See the difference in cost, although the response time (partial) of second query was faster but resource consumption was high.

But that does not mean that this optimizer mode is bad. As I said this mode may be good for interactive client-server model. In most of OLTP systems, where users want to see data fast on their screen, this mode of optimizer is very handy.

Important facts about FIRST_ROWS

It gives preference to Index scan Vs Full scan (even when index scan is not good).
It prefers nested loop over hash joins because nested loop returns data as selected (& compared), but hash join hashes one first input in hash table which takes time.
Cost of the query is not the only criteria for choosing the execution plan. It chooses plan which helps in fetching first rows fast.
It may be a good option to use this in an OLTP environment where user wants to see data as early as possible.


In simple terms, it means better throughput

While FIRST_ROWS may be good in returning first few rows, ALL_ROWS ensures the optimum resource consumption and throughput of the query. In other words, ALL_ROWS is better to retrieve the last row first.

In above example while explaining FIRST_ROWS, you have already seen how efficient ALL_ROWS is.

Important facts about ALL_ROWS

ALL_ROWS considers both index scan and full scan and based on their contribution to the overall query, it uses them. If Selectivity of a column is low, optimizer may use index to fetch the data (for example ‘where employee_code=7712’), but if selectivity of column is quite high ('where deptno=10'), optimizer may consider doing Full table scan. With ALL_ROWS, optimizer has more freedom to its job at its best.
Good for OLAP system, where work happens in batches/procedures. (While some of the report may still use FIRST_ROWS depending upon the anxiety level of report reviewers)
Likes hash joins over nested loop for larger data sets.


Cost based optimizer gives you flexibility to choose response time or throughput. So use them based on your business requirement.