Friday, February 17, 2006

SQL Server vs SAN

Recently we have been looking at some performance problems on some of our servers. I found some interesting information concerning SAN's so here is a little high level overview of things to remember.

  • Format the Data drives with 64k sector size, the Log drive with at least 8k.
  • Align the disks using diskpar in Windows 2003 (diskpart for SP1+) or do this at SAN level if possible (10%-20% improvement in some studies).
  • Use StorPort drivers for your HBA's. SCSIPort was designed for direct attached storage while StorPort is optimized for SAN I/O) - Info
  • Choose RAID 10 over RAID5. Although certain SAN's have optimized RAID5 technology that reduce the overhead RAID10 is still the preferred level.
  • Spread over as many disks as possible.
  • Always separate your data from your log drives. Log is sequential while data is more random I/O, mixing these might cause log latency.
  • Stress test your SAN with realistic file sizes.

Tuesday, February 14, 2006

Clustered indexes on GUIDs

Hmm... this is actually a debate that will go on forever I suppose :-) I've created a little test script to prove a specific point but do NOT think of this as a complete negative advise for clustered indexes on GUIDs. Every DBA should know the sentence 'it all depends' by heart and actually you should *sigh* when someone asks: "What is the best solution for the database?" without knowing the data, the usage of the data and the environment it will run in.

What I wanted to test was fragmentation because of random uniqueidentifiers. When you run this script you will see amazing high figures for the fragmentation factor after just 1000 inserts! On my machine this small set of inserts generated 96,84% fragmentation and 587 fragments.

SQL Server 2005 has a new feature that may ease your fragmentation pain and that is sequential GUIDs. This generates uniqueidentifiers but based on the previous uniqueidentifier. This is great if you still want to use GUIDs but want to get rid of it's biggest strength and drawback which is it's random nature. When you change the script to use sequential GUIDs the fragmentation drops to 0.95% and 6 fragments. This is a quite spectacular drop. However, it is not always possible to use this because I know a lot of you are generating GUIDs on the clientside or middle tier making this unavailable since a sequential GUID can only be used as a default.

Fragmentation does have a lot of negative impact such as reduced scan speed and more pages are needed in memory because the data is spread over a lot of different pages. Is this ALWAYS a bad thing? Well there is something called a hotspot meaning all actions are concentrated on a single 'spot'. This may reduce performance but has been greatly reduced by the introduction of row locking. How many inserts are needed to create a hotspot? Well... it all depends :-) You can use the link from my previous post where a nice script has been provided to detect hotspots and lock contention using the new DMV's.

SET NOCOUNT ON
GO

CREATE TABLE myFragmentation
(myID uniqueidentifier)

GO

DECLARE @i int
SET @i = 0

WHILE @i < 100000
BEGIN
INSERT INTO myFragmentation (myID) VALUES (NewID())
SET @i = @i + 1
END
GO

SELECT * FROM
sys.dm_db_index_physical_stats
(DB_ID(), OBJECT_ID('myFragmentation'), NULL, NULL, 'DETAILED')
GO

CREATE UNIQUE CLUSTERED INDEX myCI ON myFragmentation (myID)
GO

DECLARE @i int
SET @i = 0

WHILE @i < 1000
BEGIN
INSERT INTO myFragmentation (myID) VALUES (NewID())
SET @i = @i + 1
END

SELECT * FROM
sys.dm_db_index_physical_stats
(DB_ID(), OBJECT_ID('myFragmentation'), 1, NULL, 'DETAILED')

DROP TABLE myFragmentation

Index information using DMV's

Wow, this is a GREAT post by the Customer Advisory Team.
http://blogs.msdn.com/sqlcat/archive/2006/02/13/531339.aspx

Sunday, February 12, 2006

SSIS Variables in Execute SQL Task

When trying to use an input variable in my Execute SQL Task I was getting the following error:"Parameter name is unrecognized." and the package obviously failed.

After some playing around I noticed that it had something to do with the difference in variable mapping declaration based on your connection type.

Based on your connection type you have to choose the correct name in your Parameter Mapping window.

Here is a little overview (credits go to Kirk Haselden):
OLEDB takes ? in the query and a number as name (0, 1, ...)
ODBC takes ? in the query and a number as name (1, 2, ...)
ADO takes ? in the query and a @Variable as name
ADO.Net takes @Variable in both the query and the name

Also make sure to choose the correct datatype for the parameter!