Darth Rob - SQL Topics

Wednesday, August 27, 2008

Alternative to "SELECT COUNT(*)" to get the number of rows in a table

I do quite a bit of data integration, so "SELECT COUNT(*)" is a favorite of mine when refreshing data to determine the amount of time remaining before the refresh will complete. With small tables, this isn't a problem. Slap a (nolock) hint on it, and off ya go. With large tables (millions of records) this can be a problem. Anyone working with larger tables knows that they are far less forgiving than smaller tables and require experience, finess, luck, and clairvoyance to maintain performance. This holds true for the old "COUNT(*)".

Fortunately, SQL keeps a row count in sysindexes. It's not updated with every record, but it will give you an approximate count. Most times, that's all I really need.

So, instead of:

select count(*) from PurchaseOrderDetail (nolock)

Use:

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('PurchaseOrderDetail') AND indid < 2

If you absolutely have to have up to the millisecond reporting on the number of rows, you can always run DBCC UpdateUsage to update the sysindexes table. However, if you're doing that, you're probably better off just using count(*).

Tuesday, December 04, 2007

Named Pipes and YOU!

Don't laugh, but we still use LEI 3.2 where I work. The other day, I started getting errors like the following:

Error: ErrorRecord: HResult: 0x80004005 Description: [DBNMPNTW]Connection broken. SQLErrorInfo: 08S01 Source: Microsoft OLE DB Provider for SQL Server , Connector 'Yadda Yadda', Method -Connect- (12325)

DBNMPNTW is the name of the dll used to connect to SQL Server using Named Pipes. By default, Named Pipes are usually not enabled for SQL 2005. LEI should have been using TCP/IP like a good little client. But, I can't blame LEI for this one. LEI's installed on a Win2K server (stop laughing). So, I opened the Client Configuration tool (cliconfg.exe) on the server to set the "default protocol" to TCP/IP. Unfortunately, I wasn't a local admin on the box so I kept getting the following annoying error:

The default network protocol that has been specified is not one of the default protocols. Please make sure it is installed correctly.

What it should have said is:

You're not a local admin, so you can't use this tool. If you do anything on this screen, it will be silently erased causing you to pull out your hair.

So, I had someone with the appropriate rights log into the server and run cliconfg.exe and set the "default protocol" to TCP/IP.

The DBNMPNTW error is now gone, and everything's working normally.

Thursday, July 06, 2006

Code Monkey

One day while working diligently on a project, one of our security guys sent me link to a song called Code Monkey. Apparently, Code Monkey has a lot in common with some of the people who work for me. The song's here. Enjoy.

Monday, July 03, 2006

DevPapers - article Tips And Tricks For Advanced MS SQL Server Developers

Here's a great article entitled DevPapers - article Tips And Tricks For Advanced MS SQL Server Developers. It includes a few tips I'd never heard of before such as how to quickly determine how many rows are in a table without using "count(*)."

For reference, here's the contents of the above mentioned article

1. Use "TRUNCATE TABLE" statement instead of "DELETE" clause if you want to delete all rows from a table. It is much faster then "DELETE" statement without any conditions. "TRUNCATE TABLE" frees all the space occupied by that table's data and indexes, without logging the individual row deletes.

2. Always use owner prefix in T-SQL queries:

SELECT mycolumn FROM dbo.mytable

In this case query optimizer does not have to decide whether to retrieve from dbo.mytable or other owner’s table and avoids recompilation. Recompilation results in no performance advantages of stored procedures usage.

3. Don't use "sp_" as your prefix for stored procedures – it is a reserved prefix in MS SQL server! MS SQL server searches for a stored procedure with "sp_" prefix in the system procedures first, and only after that looks for them in client procedures.

4. If you are unable to install MSDE at home because of unknown error – check that you did not stop "Server" system service on you PC…

5. There are thousands of examples, when developers use "SELECT COUNT(*)" statement. But there is another, much faster way to accomplish the task:

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('Table_Name') AND indid < 2

6. Include "SET NOCOUNT ON" statement in your stored procedures to greatly reduce network traffic.

7. Use the "BETWEEN" clause instead of "IN" for greater performance:

SELECT productId FROM customer WHERE productId BETWEEN 1 AND 9

Instead of:

SELECT productId FROM customer WHERE productId IN (1, 2, 3, 4,5,6,7,8,9)

8. Use Table variables - new feature of MS SQL 2000 instead of temp tables. Table variables are created in memory, not written to the tempdb database, and therefore they are much faster. However, be careful to use them only with not very huge amount of data that you want to allocate in temp tables, otherwise you can easily get the server down.

Thursday, June 29, 2006

Clustered vs. Non Clustered Index

In SQL Server, there are two types of indexes: clustered and non-clustered. Both serve the same general purpose: to allow you to quickly find specific information in a table or view. However, each has their specific advantages and disadvantages as you will see.

Clustered Indexes
A clustered index determines the order in which the individual records in a table are physically stored on disk. Due to this, there can be only one clustered index per table. The primary advantage of a clustered index is its speed. Since the leaf node of a clustered index actually contains the data for it corresponding row, it is extremely efficient. You can compare this to a dictionary where the definition of a word is next to the word that is being defined.

Non-Clustered Indexes
In contrast, non-clustered indexes do not determine the order of the data on disk, so there can be up to 249 non-clustered indexes per table in SQL 2005. The leaf node of a non-clustered index contains a pointer to the data in the clustered index. So the non-clustered index can be thought of as an indirect index. You can compare this to the index in the back of a book that references another page in the book where more information about that entry can be found.

Organization
There are many schools of thought on the best way to create indexes. In my experience, both Clustered and Non-Clustered Indexes benefit most when the cardinality of the data is taken into consideration when creating the index. The columns with the greatest cardinaltiy (that is the columns with the greatest number of distinct values) should come first in your index. The columns with the least cardinality (that is the columns with the fewest distinct values) should come last. An example of a low cardinality column would be one with a datatype of bit. An example of a high cardinality column would be an identity column.

Summary
It's best to organize indexes as you go rather than wait until queries get slow. At that point, you don't know who or what is reference your data. By changing a clustered index to benefit one query, you could inadvertently slow another query dramatically. Experiment with some typical queries and try to limit the data you pull into your application when at all possible. Instead of defaulting to an "All" view of invoices, only show invoices with a particular status or time frame by default. This way, you can include the status of the invoice in the clustered index, further improving the database's performance.

Truncating SQL LDF Files

In SQL Server, quite often I find that LDF Files grow leaps and bounds even though the recovery mode is set to Simple. Annoying as it is, it's a fact of life that large transactions have to be stored somewhere. However, once the transaction has completed, LDF files don't automatically shrink. So, from time to time, we'll run the following command to shrink the log:

BACKUP LOG database_name WITH TRUNCATE_ONLY

Keep in mind that this should only be done in extreme cases when you either have no choice but to truncate the log or when you have no use for point-in-time recovery as this command basically blows away the transaction log without backing it up.