Darth Rob - SQL Topics

Thursday, June 29, 2006

Clustered vs. Non Clustered Index

In SQL Server, there are two types of indexes: clustered and non-clustered. Both serve the same general purpose: to allow you to quickly find specific information in a table or view. However, each has their specific advantages and disadvantages as you will see.

Clustered Indexes
A clustered index determines the order in which the individual records in a table are physically stored on disk. Due to this, there can be only one clustered index per table. The primary advantage of a clustered index is its speed. Since the leaf node of a clustered index actually contains the data for it corresponding row, it is extremely efficient. You can compare this to a dictionary where the definition of a word is next to the word that is being defined.

Non-Clustered Indexes
In contrast, non-clustered indexes do not determine the order of the data on disk, so there can be up to 249 non-clustered indexes per table in SQL 2005. The leaf node of a non-clustered index contains a pointer to the data in the clustered index. So the non-clustered index can be thought of as an indirect index. You can compare this to the index in the back of a book that references another page in the book where more information about that entry can be found.

Organization
There are many schools of thought on the best way to create indexes. In my experience, both Clustered and Non-Clustered Indexes benefit most when the cardinality of the data is taken into consideration when creating the index. The columns with the greatest cardinaltiy (that is the columns with the greatest number of distinct values) should come first in your index. The columns with the least cardinality (that is the columns with the fewest distinct values) should come last. An example of a low cardinality column would be one with a datatype of bit. An example of a high cardinality column would be an identity column.

Summary
It's best to organize indexes as you go rather than wait until queries get slow. At that point, you don't know who or what is reference your data. By changing a clustered index to benefit one query, you could inadvertently slow another query dramatically. Experiment with some typical queries and try to limit the data you pull into your application when at all possible. Instead of defaulting to an "All" view of invoices, only show invoices with a particular status or time frame by default. This way, you can include the status of the invoice in the clustered index, further improving the database's performance.

0 Comments:

Post a Comment

<< Home