Saturday, June 14, 2008

Components of the Query Optimizer

If we can divide the function of query optimizer then we can get the following components of a query optimizer.

1)Query Transformer.
2)Estimator.
3)Plan Generator.

The functions of these components are given below.
Transforming Queries
--------------------------------------

Parser passed the parsed query to the query transformer as input, which is represented by a set of query blocks. The main objective of the query transformer is to determine if it is advantageous to change the form of the query so that it enables generation of a better query plan. Several different query transformation techniques are employed by the query transformer, including:

i)View Merging
ii)Predicate Pushing
iii)Subquery Unnesting
iv)Query Rewrite with Materialized Views
v)OR-expansion

Any combination of these transformations can be applied to a given query.

i)View Merging: The query transformer analyze the view query block separately from rest of the query and generate a view subplan. The optimizer then processes the rest of the query by using the view subplan in the generation of an overall query plan. The query transformer then removes the potentially suboptimal plan by merging the view query block into the query block that contains the view. This technique usually leads to a suboptimal query plan, because the view is optimized separately from rest of the query.

ii)Predicate Pushing: For those views that are not merged, the query transformer can push the relevant predicates from the containing query block into the view query block. This technique improves the subplan of the non-merged view, because the pushed-in predicates can be used either to access indexes or to act as filters.

iii)Subquery Unnesting: Often the performance of queries that contain subqueries can be improved by unnesting the subqueries and converting them into joins. Most subqueries are unnested by the query transformer. For those subqueries that are not unnested, separate subplans are generated. To improve execution speed of the overall query plan, the subplans are ordered in an efficient manner.

iv)Query Rewrite with Materialized Views:
A materialized view is like a query with a result stored in a table. When a user query is found compatible with the query associated with a materialized view, the user query can be rewritten in terms of the materialized view. This technique improves the execution of the user query, because most of the query result has been precomputed. The query transformer looks for any materialized views that are compatible with the user query and selects one or more materialized views to rewrite the user query. The use of materialized views to rewrite a query is cost-based. That is, the query is not rewritten if the plan generated without the materialized views has a lower cost than the plan generated with the materialized views.

So if you convert a view query into a materialized view then there is possibly improving performance of the query.
v)OR-expansion:

2)Estimator.
--------------------------

The estimator generates three different types of measures:

i)Selectivity
ii)Cardinality
iii)Cost
These measures are related to each other, and one is derived from another. The end goal of the estimator is to estimate the overall cost of a given plan.

i)Selectivity: The selectivity of a predicate indicates how many rows from a row set will pass the predicate test. Selectivity lies in a value range from 0.0 to 1.0. A selectivity of 0.0 means that no rows will be selected from a row set, and a selectivity of 1.0 means that all rows will be selected.

ii)Cardinality: Cardinality represents the number of rows in a row set. Here, the row set can be a base table, a view, or the result of a join or GROUP BY operator.

iii)Cost: The cost represents units of work or resource used. The query optimizer uses disk I/O, CPU usage, and memory usage as units of work. So, the cost used by the query optimizer represents an estimate of the number of disk I/Os and the amount of CPU and memory used in performing an operation.

The access path determines the number of units of work required to get data from a base table. The access path can be a table scan, a fast full index scan, or an index scan. During table scan or fast full index scan, multiple blocks are read from the disk in a single I/O operation. Therefore, the cost of a table scan or a fast full index scan depends on the number of blocks to be scanned and the multiblock read count value. The cost of an index scan depends on the levels in the B-tree, the number of index leaf blocks to be scanned, and the number of rows to be fetched using the rowid in the index keys. The cost of fetching rows using rowids depends on the index clustering factor.

The join cost represents the combination of the individual access costs of the two row sets being joined, plus the cost of the join operation.

3)Generating Plans
-----------------------------------

The main function of the plan generator is to try out different possible plans for a given query and pick the one that has the lowest cost. Many different plans are possible because of the various combinations of different access paths, join methods, and join orders that can be used to access and process data in different ways and produce the same result.

No comments:

Post a Comment