Adaptive Plans - Inflection Iteration

In Summary

Oracle 12c appears to take the following steps in order to compute the inflection point for a join:

  1. Identifies the number of rows coming from the "left side" table. Zero will be the lower bound for the inflection point, and the left side row count (minus the number of NULLs in the join field) will be the upper bound.
  2. Computes the sub-query cost at a left side cardinality of 1,
  3. Computes the sub-query cost at a left side cardinality of the left side table row count (minus number of NULLs),
  4. Then picks a row source cardinality value half way between one and the left table row count, and computes the cost
  5. If the nested loop join cost is less than the hash join cost in 4., then the cardinality in 4. will be used as the lower bound for the inflection point, and a cardinality half way between the new lower bound and the left table row count will be used next. Otherwise, when the hash join cost is the lower, the cardinality in 4. will be used as the upper bound for the inflection point, and a cardinality half way between one and the new upper count will be used next.
  6. In a similar manner, a binary split approach is used to iterate onto the left row source cardinality which gives the same cost for a nested loop join as for a hash join. This row source cardinality is then the inflection point for the join.

After the first three cardinality cost pairs have been identified, I might have expected the optimiser to fit a curve through the three points to come up with a much better guess at the inflection point. This might reduce the number of iterations, and associated computation effort required to converge on the inflection point.

An Example

Consider the join

SELECT count(*)
FROM   infl_double_twos   a
JOIN   infl_triple_threes b USING (id1)
JOIN   infl_five_fives    c USING (id1);
      

The statistics which will be of primary interest to the optimiser will be the number of rows in each table, and the number of distinct "ID1"s in each table:

SQL> COLUMN table_name  FORMAT a30;
SQL> COLUMN column_name FORMAT a30;
SQL>
SQL> SELECT table_name, num_rows, blocks
  2  FROM   tabs
  3  WHERE  table_name IN ('INFL_DOUBLE_TWOS', 'INFL_TRIPLE_THREES', 'INFL_FIVE_FIVES');

TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ---------- ----------
INFL_TRIPLE_THREES                 253962       9077
INFL_FIVE_FIVES                    169308       5788
INFL_DOUBLE_TWOS                   338616      12137

SQL>
SQL> SELECT table_name, column_name, num_distinct, num_nulls, density
  2  FROM   user_tab_columns
  3  WHERE  table_name IN ('INFL_DOUBLE_TWOS', 'INFL_TRIPLE_THREES', 'INFL_FIVE_FIVES')
  4  AND    column_name = 'ID1';

TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY
------------------------------ ------------------------------ ------------ ---------- ----------
INFL_DOUBLE_TWOS               ID1                                  173440          0 5.7657E-06
INFL_FIVE_FIVES                ID1                                   33756      12345 .000029624
INFL_TRIPLE_THREES             ID1                                   88592          0 .000011288
      

The plan (taken from the 10053 trace) is:

----------------------------------------------------------+-----------------------------------+
| Id  | Operation                  | Name                 | Rows  | Bytes | Cost  | Time      |
----------------------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT           |                      |       |       |  3398 |           |
| 1   |  SORT AGGREGATE            |                      |     1 |    18 |       |           |
| 2   |   HASH JOIN                |                      |  858K |   15M |  3398 |  00:00:41 |
| 3   |    INDEX FAST FULL SCAN    | INFL_DOUBLE_TWOS_PK  |  331K | 1984K |   385 |  00:00:05 |
| 4   |    HASH JOIN               |                      |  439K | 5273K |  2208 |  00:00:27 |
| 5   |     NESTED LOOPS           |                      |  439K | 5273K |  2208 |  00:00:27 |
| 6   |      STATISTICS COLLECTOR  |                      |       |       |       |           |
| 7   |       TABLE ACCESS FULL    | INFL_FIVE_FIVES      |  153K |  920K |  1571 |  00:00:19 |
| 8   |      INDEX RANGE SCAN      | INFL_TRIPLE_THREES_PK|     3 |    18 |   284 |  00:00:04 |
| 9   |     INDEX FAST FULL SCAN   | INFL_TRIPLE_THREES_PK|  248K | 1488K |   284 |  00:00:04 |
----------------------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("A"."ID1"="B"."ID1")
4 - access("B"."ID1"="C"."ID1")
7 - filter("C"."ID1" IS NOT NULL)
8 - access("B"."ID1"="C"."ID1")
      

Which implies that for this query, Oracle is not considering anything other than a hash join for INFL_DOUBLE_TWOS, but will decide between nested loops and a hash join for the INFL_FIVE_FIVES to INFL_TRIPLE_THREES join at runtime.

The initial bounds for the inflection point are 1 and 156963. 156963 = 169308 - 12345 (number of rows in INFL_FIVE_FIVES minus number of NULLs in the ID1 field in INFL_FIVE_FIVES).

The nested loop cost at a cardinality of 1 is computed as 1572.78, and the hash join cost is computed as 1856.01. The nested loop cost at a cardinality of 156963 is computed as 315559.34, and the hash join cost is computed as 2208.35. At this point we might suspect that the inflection point is rather closer to 1 than to 156963, but Oracle will perform binary split, and try a cardinality of 78481.5 next.

In this example, the iteration then proceeded as follows
IterationLower BoundUpper Bound Lower Bound CostsUpper Bound Costs
Nested LoopsHash JoinNested LoopsHash Join
11.00156963.001572.781856.01315559.342208.35
2 1.00 78481.50 1572.78 1856.01158566.06 2141.03
3 1.00 39240.75 1572.78 1856.01 80068.42 1856.16
4 1.00 19620.38 1572.78 1856.01 40818.60 1856.09
5 1.00 9810.19 1572.78 1856.01 21194.69 1856.05
6 1.00 4905.09 1572.78 1856.01 11382.73 1856.03
7 1.00 2452.55 1572.78 1856.01 6477.76 1856.02
8 1.00 1226.27 1572.78 1856.01 4023.27 1856.02
9 1.00 613.14 1572.78 1856.01 2797.02 1856.02
10 1.00 306.57 1572.78 1856.01 2184.90 1856.02
11 1.00 153.28 1572.78 1856.01 1876.84 1856.01
12 76.64 153.28 1724.81 1856.01 1876.84 1856.01
13 114.96 153.28 1800.82 1856.01 1876.84 1856.01
14 134.12 153.28 1838.83 1856.01 1876.84 1856.01
15 134.12 143.70 1838.83 1856.01 1858.84 1856.01
16 138.91 143.70 1848.83 1856.01 1858.84 1856.01
17 141.31 143.70 1852.83 1856.01 1858.84 1856.01
18 141.31 142.51 1852.83 1856.01 1856.84 1856.01
19 141.31 142.51 1852.83 1856.01 1854.83 1856.01
with the trace file reporting: Found point of inflection for NLJ vs. HJ: card = 141.91