Wednesday, August 29, 2012

Explaining Table Indexes from a SQL Point of View

As part of the MVP club, we get to network with MVPs in other products in the Microsoft world. So, after some recent discussions within our development group at Sikich, I decided to get an official explanation on a few index properties to get a more technically accurate and concise explanation. (For the record, we did know what these properties mean, but it wouldn't make for good reading. :-))
So I emailed a fellow MVP, Dan Guzman, who has been a SQL MVP for 10 years!

My first question to Dan is more a lead-in into the next question. I asked Dan: "What is the difference between a clustered and non-clustered index?"
SQL Server implements relational indexes as b-tree structures consisting of multiple index levels. These levels form a hierarchy consisting of a single root node at the top of the hierarchy, zero or more intermediate levels below, and finally the index leaf nodes at the bottom with one entry per row. Each index node points to a node in the next lower level in the hierarchy.

SQL Server traverses an index hierarchy during query execution to efficiently locate a row by key rather than scanning the entire table from beginning to end. SQL Server also maintains indexes pages in logical key order using previous/next pointers. This allows SQL Server to use indexes to return data in key sequence without sorting the data.

In the case of a non-clustered index, the index leaf nodes contain the index key plus a unique row locator that points to the corresponding data row. With a clustered index, the index leaf node is the actual data row, which is why a table can have only one clustered index. Every table should typically have a clustered index and the best choice for the clustered index for a given table depends on your queries.

Considering that, what is the difference between a primary key and a clustered index?
All ANSI-compliant relational databases support primary keys. A primary key specifies one or more columns that uniquely identify a row. A table can have only one primary key, consisting of one or more not-null columns that uniquely identify a row in the table.

SQL Server creates a unique index on the primary key column(s) to enforce uniqueness. Like all indexes, the primary key index can be used by SQL Server to optimize query performance and you have the choice between clustered or non-clustered. SQL Server will create a clustered index for the primary key unless a clustered index already exists on the table or you specifically specify a non-clustered primary key.

My third question is a question I have been asked quite a few times because this is a new feature in our Dynamics AX 2012 environment: What are indexes with included columns?
Included columns are additional non-key columns stored in non-clustered leaf nodes. Included columns are applicable only to non-clustered indexes because clustered index leaf nodes are the actual data pages and therefore already include all non-key columns.

A technique sometimes used to improve query performance is a covering index. A covering index is one that contains all columns needed by a query, thus avoiding the need to read the actual data row. Indexes with included non-key columns allow an index to cover a query. Consider the following query with a non-clustered index on DepartmentCode:

SELECT DepartmentCode, FirstName, LastName
FROM dbo.Employee
WHERE DeparmentCode = ‘HR’;

SQL Server will use the non-clustered index on DepartmentCode to locate the employees for the HR department and then read the corresponding employee data row for each employee in order to get the FirstName and LastName values needed by the query. By adding FirstName and LastName as included columns to the DepartmentCode index, SQL Server can use the DepartmentCode, FirstName and LastName columns in the index leaf nodes and avoid touching the data row entirely.