Tuesday, October 23, 2007

Why isn't my index getting used?---Case Study

Why isn't my index getting used?

There are many possible causes of this. In this section, we'll take a look at some of the most common.
Case 1

We're using a B*Tree index, and our predicate does not use the leading edge of an index. In this case, we might have a table T with an index on T(x,y). We query SELECT * FROM T WHERE Y = 5. The optimizer will tend not to use the index since our predicate did not involve the column X -- it might have to inspect each and every index entry in this case (we'll discuss an index skip scan shortly where this is not true). It will typically opt for a full table scan of T instead. That does not preclude the index from being used. If the query was SELECT X,Y FROM T WHERE Y = 5, the optimizer would notice that it did not have to go to the table to get either X or Y (they are in the index) and may very well opt for a fast full scan of the index itself, as the index is typically much smaller than the underlying table. Note also that this access path is only available with the CBO.

There is another case whereby the index on T(x,y) could be used with the CBO is during an index skip scan. The skip scan works well if and only if the leading edge of the index (X in the previous example) has very few distinct values and the optimizer understands that. For example, consider an index on (GENDER, EMPNO) where GENDER has the values M and F, and EMPNO is unique. A query such as

select * from t where empno = 5;

might consider using that index on T to satisfy the query in a skip scan method, meaning the query will be processed conceptually like this:

select * from t where GENDER='M' and empno = 5
UNION ALL
select * from t where GENDER='F' and empno = 5;

It will skip throughout the index, pretending it is two indexes: one for Ms and one for Fs. We can see this in a query plan easily. We'll set up a table with a bivalued column and index it:

ops$tkyte@ORA10GR1> create table t
2 as
3 select decode(mod(rownum,2), 0, 'M', 'F' ) gender, all_objects.*
4 from all_objects
5 /
Table created.

ops$tkyte@ORA10GR1> create index t_idx on t(gender,object_id)
2 /
Index created.

ops$tkyte@ORA10GR1> begin
2 dbms_stats.gather_table_stats
3 ( user, 'T', cascade=>true );
4 end;
5 /
PL/SQL procedure successfully completed.

Now, when we query this, we should see the following:

ops$tkyte@ORA10GR1> set autotrace traceonly explain
ops$tkyte@ORA10GR1> select * from t t1 where object_id = 42;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1 Bytes=95)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=4 Card=1 Bytes=95)
2 1 INDEX (SKIP SCAN) OF 'T_IDX' (INDEX) (Cost=3 Card=1)

The INDEX SKIP SCAN step tells us that Oracle is going to skip throughout the index, looking for points where GENDER changes values and read down the tree from there, looking for OBJECT_ID=42 in each virtual index being considered. If we increase the number of distinct values for GENDER measurably, as follows:

ops$tkyte@ORA10GR1> update t
2 set gender = chr(mod(rownum,256));
48215 rows updated.

ops$tkyte@ORA10GR1> begin
2 dbms_stats.gather_table_stats
3 ( user, 'T', cascade=>true );
4 end;
5 /
PL/SQL procedure successfully completed.

we'll see that Oracle stops seeing the skip scan as being a sensible plan. It would have 256 mini indexes to inspect, and it opts for a full table scan to find our row:

ops$tkyte@ORA10GR1> set autotrace traceonly explain
ops$tkyte@ORA10GR1> select * from t t1 where object_id = 42;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=158 Card=1 Bytes=95)
1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=158 Card=1 Bytes=95)

Case 2

We're using a SELECT COUNT(*) FROM T query (or something similar) and we have a B*Tree index on table T. However, the optimizer is full scanning the table, rather than counting the (much smaller) index entries. In this case, the index is probably on a set of columns that can contain nulls. Since a totally null index entry would never be made, the count of rows in the index will not be the count of rows in the table. Here the optimizer is doing the right thing -- it would get the wrong answer if it used the index to count rows.
Case 3

For an indexed column, we query using the following:

select * from t where f(indexed_column) = value

and find that the index on INDEX_COLUMN is not used. This is due to the use of the function on the column. We indexed the values of INDEX_COLUMN, not the value of F(INDEXED_COLUMN). The ability to use the index is curtailed here. We can index the function if we choose to do it.

Case 4

We have indexed a character column. This column contains only numeric data. We query using the following syntax:

select * from t where indexed_column = 5

Note that the number 5 in the query is the constant number 5 (not a character string). The index on INDEXED_COLUMN is not used. This is because the preceding query is the same as the following:

select * from t where to_number(indexed_column) = 5

We have implicitly applied a function to the column and, as noted in case 3, this will preclude the use of the index. This is very easy to see with a small example. In this example, we're going to use the built-in package DBMS_XPLAN. This package is available only with Oracle9i Release 2 and above (in Oracle9i Release 1, we will use AUTOTRACE instead to see the plan easily, but we will not see the predicate information—that is only available in Oracle9i Release 2 and above):

ops$tkyte@ORA10GR1> create table t ( x char(1) constraint t_pk primary key,
2 y date );
Table created.

ops$tkyte@ORA10GR1> insert into t values ( '5', sysdate );
1 row created.

ops$tkyte@ORA10GR1> delete from plan_table;
3 rows deleted.

ops$tkyte@ORA10GR1> explain plan for select * from t where x = 5;
Explained.

ops$tkyte@ORA10GR1> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------
Plan hash value: 749696591
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 12 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER("X")=5)

As you can see, it full scanned the table, and even if we were to hint the query

ops$tkyte@ORA10GR1> explain plan for select /*+ INDEX(t t_pk) */ * from t
2 where x = 5;
Explained.

ops$tkyte@ORA10GR1> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------
Plan hash value: 3473040572
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 34 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 34 (0)| 00:00:01 |
|* 2 | INDEX FULL SCAN | T_PK | 1 | | 26 (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TO_NUMBER("X")=5)

it uses the index, but not for a UNIQUE SCAN as we might expect -- it is FULL SCANNING this index. The reason lies in the last line of output there: filter(TO_NUMBER("X")=5). There is an implicit function being applied to the database column. The character string stored in X must be converted to a number prior to comparing to the value 5. We cannot convert 5 to a string, since our NLS settings control what 5 might look like in a string (it is not deterministic), so we convert the string into a number, and that precludes the use of the index to rapidly find this row. If we simply compare strings to strings

ops$tkyte@ORA10GR1> delete from plan_table;
2 rows deleted.

ops$tkyte@ORA10GR1> explain plan for select * from t where x = '5';
Explained.

ops$tkyte@ORA10GR1> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------
Plan hash value: 1301177541
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | T_PK | 1 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("X"='5')

we get the expected INDEX UNIQUE SCAN, and we can see the function is not being applied. You should always avoid implicit conversions anyway. Always compare apples to apples and oranges to oranges. Another case where this comes up frequently is with dates. We try to query:

-- find all records for today
select * from t where trunc(date_col) = trunc(sysdate);

and discover that the index on DATE_COL will not be used. We can either index the TRUNC(DATE_COL) or, perhaps more easily, query using range comparison operators. The following demonstrates the use of greater than and less than on a date. Once we realize that the condition

TRUNC(DATE_COL) = TRUNC(SYSDATE)

is the same as the condition

select *
from t
where date_col >= trunc(sysdate)
and date_col < trunc(sysdate+1)

this moves all of the functions to the right-hand side of the equation, allowing us to use the index on DATE_COL (and it has the same exact effect as WHERE TRUNC(DATE_COL) = ➥ TRUNC(SYSDATE)).

If possible, you should always remove the functions from database columns when they are in the predicate. Not only will doing so allow for more indexes to be considered for use, but also it will reduce the amount of processing the database needs to do. In the preceding case, when we used

where date_col >= trunc(sysdate)
and date_col < trunc(sysdate+1)

the TRUNC values are computed once for the query, and then an index could be used to find just the qualifying values. When we used TRUNC(DATE_COL) = TRUNC(SYSDATE), the TRUNC(DATE_COL) had to be evaluated once per row for every row in the entire table (no indexes).

Case 5

The index, if used, would actually be slower. I see this a lot -- people assume that, of course, an index will always make a query go faster. So, they set up a small table, analyze it, and find that the optimizer doesn't use the index. The optimizer is doing exactly the right thing in this case. Oracle (under the CBO) will use an index only when it makes sense to do so. Consider this example:

ops$tkyte@ORA10GR1> create table t
2 ( x, y , primary key (x) )
3 as
4 select rownum x, object_name
5 from all_objects
6 /
Table created.

ops$tkyte@ORA10GR1> begin
2 dbms_stats.gather_table_stats
3 ( user, 'T', cascade=>true );
4 end;
5 /
PL/SQL procedure successfully completed.

If we run a query that needs a relatively small percentage of the table, as follows:

ops$tkyte@ORA10GR1> set autotrace on explain
ops$tkyte@ORA10GR1> select count(y) from t where x < 50;

COUNT(Y)
----------
49

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=1 Bytes=28)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=3 Card=41 Bytes=1148)
3 2 INDEX (RANGE SCAN) OF 'SYS_C009167' (INDEX (UNIQUE)) (Cost=2 Card=41)

it will happily use the index; however, we'll find that when the estimated number of rows to be retrieved via the index crosses a threshold (which varies depending on various optimizer settings, physical statistics, and so on), we'll start to observe a full table scan:

ops$tkyte@ORA10GR1> select count(y) from t where x < 15000;

COUNT(Y)
----------
14999

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=57 Card=1 Bytes=28)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=57 Card=14994 Bytes=419832)

This example shows the optimizer won't always use an index and, in fact, it makes the right choice in skipping indexes. While tuning your queries, if you discover that an index isn't used when you think it "ought to be," don't just force it to be used -- test and prove first that the index is indeed faster (via elapsed and I/O counts) before overruling the CBO. Reason it out.
Case 6

We haven't analyzed our tables in a while. They used to be small, but now when we look at them, they have grown quite large. An index will now make sense, whereas it didn't originally. If we analyze the table, it will use the index.

Without correct statistics, the CBO cannot make the correct decisions.
Index Case Summary

In my experience, these six cases are the main reasons I find that indexes are not being used. It usually boils down to a case of "They cannot be used -- using them would return incorrect results" or "They should not be used -- if they were used, performance would be terrible."

3 comments:

Anonymous said...

Me and ozzy fucked harder, trying to show to my own god!
FUCK YES!' just before cumming inside my warm pussy. were still fucking

My web page ... hcg injections

Anonymous said...

Whаt's up to every body, it's my fiгst paу a ѵisit of thiѕ blog; this wеbpage contains
amazing and in faсt fine data in favor оf readers.


Feel free to ѵisit my blog pοst :: steroids penis

Anonymous said...

Piece of writing writing is also a excitement, if you know afterward you can write
otherwise it is difficult to write.

Feel free to visit my homepage ... height to waist ratio