Friday, July 07, 2006

Avoid Table Scan In SQL Server

How SQL Server handles a query exactly? Its query optimizer takes the query, performs some analysis of related objects, and comes up a execution plan. The worst scenario is a full table scan with large amount of data. The full table scan is not efficient because every row in the table will be searched no matter it's qualified or not; it's also not scalable as data grows.

We can use indexing to avoid the full table scan. With indexed column(s), SQL Server will do the binary search (Index Seek) which is log2(N) complexity. A single record search with full table scan against 1 million records results in 1 million of row process, and the same search using indexed column(s) could have maximum of 20 row process (log2(1000000) < 20). It's a big difference when the row number is huge.

An index should include all columns that are involved in the Where clause. Also we need to write our queries carefully so that SQL Server can understand to use the index for searching. Suppose we have an Employee table with FirstName, LastName, StartDate and Country columns. All columns except Country column have been indexed (non-cluster indexing). Following queries will be index-based and fast:
SELELCT * FROM Employee WHERE FirstName = 'Mike'
SELELCT * FROM Employee WHERE FirstName = 'Mike' AND LastName = 'Bauer'
SELELCT * FROM Employee WHERE FirstName = 'Mike' OR LastName = 'Bauer'
SELELCT * FROM Employee WHERE FirstName = 'Mike' AND Country = 'Canada'
SELELCT * FROM Employee WHERE FirstName LIKE 'M%'
SELELCT * FROM Employee WHERE StartDate BETWEEN '20000101' AND '20001231'
But following queries will be less efficient because of the table scan:
SELELCT * FROM Emoloyee WHERE Country = 'Canada'
SELELCT * FROM Emoloyee WHERE FirstName = "Mike' OR Country = 'Canada'
SELELCT * FROM Employee WHERE FirstName LIKE '%M'
SELELCT * FROM Employee WHERE FirstName <> 'Mike'
SELELCT * FROM Employee WHERE StartDate = DATEPART(yyyy, OrderDate) = 2000