Query processing cost formulae
Legend
Symbol Description
NKeys(Col) The number of distinct values of column Col
High(Col) The highest value of column Col
Low(Col) The lowest value of column Col
NTuples(R) The number of tuples of relation R
NPages(R) The number of pages of relation R
NPages(I) The number of pages of index I
Height(I) The height of index I
RFi The product of the reduction factors of all predicates in the query (RF1 * RF2 * ...)
NTuples(Ri) The product of the numbers of tuples of all relations taking part in a join
num_passes(R) The number of passes for sorting relation R
PF Projection factor (fraction of all columns kept by the projection, between 0 and 1)
1. Reduction factor (Selectivity)
a. Col = value
RF = 1/NKeys(Col)
b. Col > value
RF = (High(Col) – value) / (High(Col) – Low(Col))
c. Col < value
RF = (value – Low(Col)) / (High(Col) – Low(Col))
d. Col_A = Col_B (for joins)
RF = 1/ (Max (NKeys(Col_A), NKeys(Col_B)))
e. If there is no information about NKeys or the value interval, use the “magic number” 1/10
RF = 1/10
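The rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and both the clamping of range results to [0, 1] and the fallback to 1/10 when High(Col) equals Low(Col) are assumptions added here, not part of the formulae.

```python
from typing import Optional

DEFAULT_RF = 1 / 10  # the "magic number" used when statistics are missing


def _clamp(rf: float) -> float:
    # Assumption (not in the formulae): keep range estimates within [0, 1].
    return min(1.0, max(0.0, rf))


def rf_equals(nkeys: Optional[int]) -> float:
    """Col = value: RF = 1 / NKeys(Col)."""
    return 1 / nkeys if nkeys else DEFAULT_RF


def rf_greater(value: float, low: Optional[float] = None,
               high: Optional[float] = None) -> float:
    """Col > value: RF = (High - value) / (High - Low)."""
    if low is None or high is None or high == low:
        return DEFAULT_RF
    return _clamp((high - value) / (high - low))


def rf_less(value: float, low: Optional[float] = None,
            high: Optional[float] = None) -> float:
    """Col < value: RF = (value - Low) / (High - Low)."""
    if low is None or high is None or high == low:
        return DEFAULT_RF
    return _clamp((value - low) / (high - low))


def rf_join(nkeys_a: Optional[int], nkeys_b: Optional[int]) -> float:
    """Col_A = Col_B: RF = 1 / max(NKeys(Col_A), NKeys(Col_B))."""
    if not nkeys_a or not nkeys_b:
        return DEFAULT_RF
    return 1 / max(nkeys_a, nkeys_b)


print(rf_equals(50))            # 0.02
print(rf_greater(75, 0, 100))   # 0.25
print(rf_join(10, 40))          # 0.025
print(rf_equals(None))          # 0.1 (no statistics available)
```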
2. Result size calculations
a. Single table
Result_size = NTuples(R) * RFi
b. Joins
Result_size = NTuples(Ri) * RFi
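Both result-size formulae have the same shape: a product of tuple counts times a product of reduction factors. A small sketch, with the hypothetical helper `result_size` and made-up statistics chosen only for illustration:

```python
from math import prod
from typing import Sequence


def result_size(ntuples: Sequence[int], reduction_factors: Sequence[float]) -> float:
    """Product of the tuple counts times the product of the reduction factors.

    For a single table, pass a one-element list of tuple counts.
    An empty list of reduction factors yields a factor of 1 (no predicates).
    """
    return prod(ntuples) * prod(reduction_factors)


# Single table: 10,000 tuples, two predicates with RF 0.1 and 0.5 (about 500 tuples).
single = result_size([10_000], [0.1, 0.5])

# Join of R (1,000 tuples) and S (5,000 tuples) on a column with
# NKeys 100 and 5,000, so RF = 1 / max(100, 5000) (about 1,000 tuples).
joined = result_size([1_000, 5_000], [1 / max(100, 5_000)])

print(round(single), round(joined))
```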
3. Indexing Cost
a. B+-tree index
i. Just a single tuple (selection over a primary key)
Cost = Height(I) + 1
ii. Clustered index (multiple tuples)
Cost = (NPages(I)+NPages(R)) * RFi
iii. Unclustered (multiple tuples)
Cost = (NPages(I)+NTuples(R)) * RFi
b. Hash Index
i. Just a single tuple (selection over a primary key)
Cost = 1.2 + 1 = 2.2
ii. Clustered index (multiple tuples)
Cost = NPages(R) * RFi * 2.2
iii. Unclustered index (multiple tuples)
Cost = NTuples(R) * RFi * 2.2
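The index cost formulae translate directly into code. The sketch below uses hypothetical function names and invented statistics; the constant 2.2 is taken as-is from the formula 1.2 + 1 above. The example compares a clustered and an unclustered B+-tree for the same predicate against the heap-scan cost NPages(R).

```python
HASH_COST = 1.2 + 1  # per-lookup cost of a hash index, as given in the formulae


def btree_single(height: int) -> float:
    """Single tuple via a B+-tree: Height(I) + 1."""
    return height + 1


def btree_clustered(npages_i: int, npages_r: int, rf: float) -> float:
    """Clustered B+-tree: (NPages(I) + NPages(R)) * RF."""
    return (npages_i + npages_r) * rf


def btree_unclustered(npages_i: int, ntuples_r: int, rf: float) -> float:
    """Unclustered B+-tree: (NPages(I) + NTuples(R)) * RF."""
    return (npages_i + ntuples_r) * rf


def hash_clustered(npages_r: int, rf: float) -> float:
    """Clustered hash index: NPages(R) * RF * 2.2."""
    return npages_r * rf * HASH_COST


def hash_unclustered(ntuples_r: int, rf: float) -> float:
    """Unclustered hash index: NTuples(R) * RF * 2.2."""
    return ntuples_r * rf * HASH_COST


# Illustrative statistics: R has 1,000 pages and 100,000 tuples,
# the index has 50 pages, and the predicates have a combined RF of 0.01.
clustered = btree_clustered(50, 1_000, 0.01)        # about 10.5 page reads
unclustered = btree_unclustered(50, 100_000, 0.01)  # about 1000.5 page reads
heap_scan = 1_000                                   # NPages(R)

# With these numbers the unclustered index is slightly worse than a full scan.
print(clustered < heap_scan < unclustered)
```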
4. Sequential Scan (i.e. Heap Scan) Cost
Cost = NPages(R)
5. Joins (between relations R and S, R = outer, S = inner) Cost
a. Nested Loops Join (NLJ)
i. Tuple-oriented NLJ
Cost = NPages(R) + NTuples(R) * NPages(S)
ii. Page-oriented NLJ
Cost = NPages(R) + NPages(R) * NPages(S)
iii. Block-oriented NLJ (for block_size B)
Cost = NPages(R) + ceil(NPages(R)/(B-2)) * NPages(S)
b. Hash Join
Cost = 3*(NPages(R) + NPages(S))
c. Sort-Merge Join
Cost = NPages(R) + NPages(S)
     + 2 * NPages(R) * num_passes(R)
     + 2 * NPages(S) * num_passes(S)
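To compare the join algorithms side by side, the formulae above can be evaluated on the same inputs. The function names and the statistics below are hypothetical; the check that B is at least 3 is an assumption added so that B-2 stays positive.

```python
from math import ceil


def tuple_nlj(npages_r: int, ntuples_r: int, npages_s: int) -> int:
    """Tuple-oriented NLJ: NPages(R) + NTuples(R) * NPages(S)."""
    return npages_r + ntuples_r * npages_s


def page_nlj(npages_r: int, npages_s: int) -> int:
    """Page-oriented NLJ: NPages(R) + NPages(R) * NPages(S)."""
    return npages_r + npages_r * npages_s


def block_nlj(npages_r: int, npages_s: int, b: int) -> int:
    """Block-oriented NLJ: NPages(R) + ceil(NPages(R) / (B - 2)) * NPages(S)."""
    if b < 3:
        raise ValueError("B must be at least 3 so that B - 2 is positive")
    return npages_r + ceil(npages_r / (b - 2)) * npages_s


def hash_join(npages_r: int, npages_s: int) -> int:
    """Hash join: 3 * (NPages(R) + NPages(S))."""
    return 3 * (npages_r + npages_s)


def sort_merge_join(npages_r: int, npages_s: int,
                    passes_r: int, passes_s: int) -> int:
    """Sort-merge join: merge pass plus the cost of sorting R and S."""
    return (npages_r + npages_s
            + 2 * npages_r * passes_r
            + 2 * npages_s * passes_s)


# Illustrative statistics: R has 1,000 pages / 100,000 tuples, S has 500 pages,
# B = 102, and each relation is sorted in 2 passes.
print(tuple_nlj(1_000, 100_000, 500))       # 50001000
print(page_nlj(1_000, 500))                 # 501000
print(block_nlj(1_000, 500, 102))           # 6000
print(hash_join(1_000, 500))                # 4500
print(sort_merge_join(1_000, 500, 2, 2))    # 7500
```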
6. Projections Cost
a. Sort-based
Cost = ReadTable + WriteProjectedPages + SortProjectedPages + ReadProjectedPages
     = NPages(R) + NPages(R)*PF + 2*num_passes*NPages(R)*PF + NPages(R)*PF
b. Hash-Based
Cost = ReadTable + WriteProjectedPages + ReadProjectedPages
= NPages(R) + NPages(R)*PF + NPages(R)*PF
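A short sketch of the two projection formulae, again with hypothetical function names and invented numbers. The only difference between them is the sorting term 2*num_passes*NPages(R)*PF.

```python
def sort_projection(npages_r: int, pf: float, num_passes: int) -> float:
    """Read table + write projected pages + sort them + read them back."""
    projected = npages_r * pf  # pages remaining after projection
    return npages_r + projected + 2 * num_passes * projected + projected


def hash_projection(npages_r: int, pf: float) -> float:
    """Read table + write projected pages + read them back."""
    projected = npages_r * pf
    return npages_r + projected + projected


# Illustrative statistics: 1,000 pages, a quarter of the columns kept, 2 sort passes.
print(sort_projection(1_000, 0.25, 2))  # 2500.0
print(hash_projection(1_000, 0.25))     # 1500.0
```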