Heuristic Optimization of SQL Queries

In general, relational algebra expressions and query trees could be equivalent; that is, they may correspond to the same query. But, the performance or cost of query may vary depending on the query technique that we apply. An optimization technique helps reduce the query execution time as well as the cost by re-formatting the query. We applied heuristic optimization in our queries and could reduce the execution time to a greater extent and thus reduced the cost quite a bit.
Figure 1: Initial Query
If we write this query in SQL we get:
Select Name from E, J, P where E.EID=J.EID and PCode=Code and P.Name='GreenLife' and E.BDate>'31-12-1967'
Which is basically the cross product of E, J and P

Outline of a Heuristic Algebraic Optimization Algorithm:

  1. Break up any select operations with conjunctive conditions into a cascade of select operations.

  2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition.

  3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation.

  4. Combine a Cartesian product operation with a subsequent select operation in the tree into a join operation.

  5. Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed.

  6. Identify sub trees that represent groups of operations that can be executed by a single algorithm.

Searching on Non-Indexed Field


[Before Heuristic Optimization]:


SET STATISTICS IO ON


SET STATISTICS TIME ON


DECLARE @StartTime datetime,@EndTime datetime


SELECT @StartTime=GETDATE()


SELECT tblProvince.Province_Name, tblCities.City_Name, tblDistricts.District_Name, tblCommunities.Community_Name, tblCategories.Category_Name, tblSubCategories.SubCategory_Name, tblProperties.Price, tblProperties.Fireplace, tblProperties.CentralAC, tblProperties.Pool, tblProperties.GreenSpace, tblProperties.WaterFront, tblProperties.BasementDeveloped, tblProperties.StateArea, tblProperties.BedRooms, tblProperties.BathRooms, tblProperties.BusService, tblProperties.Shopping, tblProperties.School, tblProperties.Clinic, tblProperties.Description, tblProperties.RealtorID, tblProperties.Comments


FROM tblProperties,tblProvince,tblCommunities,tblDistricts,tblCities,tblCategories, tblSubCategories where (tblProperties.Community_ID = tblCommunities.ID) and (tblProperties.District_ID = tblDistricts.ID) and (tblProperties.City_ID = tblCities.ID) and (tblProperties.Province_ID = tblCities.Province_ID) and (tblProperties.SubCategory_ID = tblSubCategories.ID) and (tblProperties.Category_ID = tblCategories.ID)


and Province_Name='AB' and City_Name='City1' and District_Name='NW' and Community_Name='Community1' and Category_Name='Category2' and subcategory_name='SubCategory3'


SELECT @EndTime=GETDATE()


SELECT DATEDIFF(ms,@StartTime,@EndTime) AS [Duration in millisecs]


Execution Time: CPU time = 94 ms, Elapsed time = 282 ms.


Searching on Indexed Field [IDs]:


[Before Heuristic Optimization]:


SET STATISTICS IO ON


SET STATISTICS TIME ON


DECLARE @StartTime datetime,@EndTime datetime


SELECT @StartTime=GETDATE()


SELECT tblProvince.Province_Name, tblCities.City_Name, tblDistricts.District_Name, tblCommunities.Community_Name, tblCategories.Category_Name, tblSubCategories.SubCategory_Name, tblProperties.Price, tblProperties.Fireplace, tblProperties.CentralAC, tblProperties.Pool, tblProperties.GreenSpace, tblProperties.WaterFront, tblProperties.BasementDeveloped, tblProperties.StateArea, tblProperties.BedRooms, tblProperties.BathRooms, tblProperties.BusService, tblProperties.Shopping, tblProperties.School, tblProperties.Clinic, tblProperties.Description, tblProperties.RealtorID, tblProperties.Comments


FROM tblProperties,tblProvince,tblCommunities,tblDistricts,tblCities,tblCategories, tblSubCategories where (tblProperties.Community_ID = tblCommunities.ID) and (tblProperties.District_ID = tblDistricts.ID) and (tblProperties.City_ID = tblCities.ID) and (tblProperties.Province_ID = tblCities.Province_ID) and (tblProperties.SubCategory_ID = tblSubCategories.ID) and (tblProperties.Category_ID = tblCategories.ID)


and tblProperties.Province_Id=1 and tblProperties.City_Id=1 and tblProperties.District_Id=1 and tblProperties.Community_Id=1 and tblProperties.Category_Id=1 and tblProperties.subcategory_Id=1


SELECT @EndTime=GETDATE()


SELECT DATEDIFF(ms,@StartTime,@EndTime) AS [Duration in millisecs]


Execution Time: CPU time = 15 ms, Elapsed time = 236 ms.



Searching on Non-Indexed Field


[After Heuristic Optimization]:


SET STATISTICS IO ON


SET STATISTICS TIME ON


DECLARE @StartTime datetime,@EndTime datetime


SELECT @StartTime=GETDATE()


SELECT tblProperties.ID, tblProvince.Province_Name, tblCities.City_Name, tblDistricts.District_Name, tblCommunities.Community_Name, tblCategories.Category_Name, tblSubCategories.SubCategory_Name, tblProperties.Price, tblProperties.Fireplace, tblProperties.CentralAC, tblProperties.Pool, tblProperties.GreenSpace, tblProperties.WaterFront, tblProperties.BasementDeveloped, tblProperties.StateArea, tblProperties.BedRooms, tblProperties.BathRooms, tblProperties.BusService, tblProperties.Shopping, tblProperties.School, tblProperties.Clinic, tblProperties.Description, tblProperties.RealtorID, tblProperties.Comments


FROM (((((tblProperties LEFT JOIN tblProvince ON tblProperties.Province_ID=tblProvince.ID) LEFT JOIN tblCities ON (tblProperties.Province_ID=tblCities.Province_ID) AND (tblProperties.City_ID=tblCities.ID)) LEFT JOIN tblDistricts ON (tblProperties.City_ID=tblDistricts.City_ID) AND (tblProperties.Province_ID=tblDistricts.Province_ID) AND (tblProperties.District_ID=tblDistricts.ID)) LEFT JOIN tblCommunities ON (tblProperties.District_ID=tblCommunities.District_ID) AND (tblProperties.City_ID=tblCommunities.City_ID) AND (tblProperties.Province_ID=tblCommunities.Province_ID) AND (tblProperties.Community_ID=tblCommunities.ID)) LEFT JOIN tblCategories ON (tblProperties.PropertyType_ID=tblCategories.PropertyTypeID) AND (tblProperties.Category_ID=tblCategories.ID)) LEFT JOIN tblSubCategories ON (tblProperties.Category_ID=tblSubCategories.CategoryID) AND (tblProperties.PropertyType_ID=tblSubCategories.PropertyTypeID) AND (tblProperties.SubCategory_ID=tblSubCategories.ID) WHERE Province_Name='AB' and City_Name='City1' and District_Name='NW' and Community_Name='Community1' and Category_Name='Category2' and subcategory_name='SubCategory3'


SELECT @EndTime=GETDATE()


SELECT DATEDIFF(ms,@StartTime,@EndTime) AS [Duration in millisecs]


CPU time = 47 ms, Elapsed time = 209 ms.



Searching on Indexed Field [IDs]:


[After Heuristic Optimization]:



SET STATISTICS IO ON


SET STATISTICS TIME ON


DECLARE @StartTime datetime,@EndTime datetime


SELECT @StartTime=GETDATE()


SELECT tblProperties.ID, tblProvince.Province_Name, tblCities.City_Name, tblDistricts.District_Name, tblCommunities.Community_Name, tblCategories.Category_Name, tblSubCategories.SubCategory_Name, tblProperties.Price, tblProperties.Fireplace, tblProperties.CentralAC, tblProperties.Pool, tblProperties.GreenSpace, tblProperties.WaterFront, tblProperties.BasementDeveloped, tblProperties.StateArea, tblProperties.BedRooms, tblProperties.BathRooms, tblProperties.BusService, tblProperties.Shopping, tblProperties.School, tblProperties.Clinic, tblProperties.Description, tblProperties.RealtorID, tblProperties.Comments


FROM (((((tblProperties LEFT JOIN tblProvince ON tblProperties.Province_ID=tblProvince.ID) LEFT JOIN tblCities ON (tblProperties.Province_ID=tblCities.Province_ID) AND (tblProperties.City_ID=tblCities.ID)) LEFT JOIN tblDistricts ON (tblProperties.City_ID=tblDistricts.City_ID) AND (tblProperties.Province_ID=tblDistricts.Province_ID) AND (tblProperties.District_ID=tblDistricts.ID)) LEFT JOIN tblCommunities ON (tblProperties.District_ID=tblCommunities.District_ID) AND (tblProperties.City_ID=tblCommunities.City_ID) AND (tblProperties.Province_ID=tblCommunities.Province_ID) AND (tblProperties.Community_ID=tblCommunities.ID)) LEFT JOIN tblCategories ON (tblProperties.PropertyType_ID=tblCategories.PropertyTypeID) AND (tblProperties.Category_ID=tblCategories.ID)) LEFT JOIN tblSubCategories ON (tblProperties.Category_ID=tblSubCategories.CategoryID) AND (tblProperties.PropertyType_ID=tblSubCategories.PropertyTypeID) AND (tblProperties.SubCategory_ID=tblSubCategories.ID) WHERE tblProperties.Province_Id=1 and tblProperties.City_Id=1 and tblProperties.District_Id=1 and tblProperties.Community_Id=1 and tblProperties.Category_Id=1 and tblProperties.subcategory_Id=1


SELECT @EndTime=GETDATE()


SELECT DATEDIFF(ms,@StartTime,@EndTime) AS [Duration in millisecs]


CPU time = 15 ms, Elapsed time = 113 ms.



So, we found that the searching technique utilizing Indexed Fields for property searches and after doing the heuristic query optimization takes minimal time [CPU time = 15 ms, Elapsed time = 113 ms.] to produce the query output.


Comments

Popular posts from this blog

Cloud Computing Technology Assessment

Database Testing With DBUnit

Data Science, Big Data & Microsoft Machine Learning