Thursday, January 26, 2006

Slow mass deletes

My favorite Lead DBA Patrick asked me to find out how we could delete millions of rows from a table in the fastest way possible. You guessed it... this too is linked to the archiving project :-)

I was inspired by a blogpost from Kimberly Tripp where she tested mass deletes extensively with all different kinds of indexing.

The conclusion was that deletes from tables with non-clustered indexes was a lot slower as opposed to tables with only a clustered index.

When I started testing the delete options I suddenly noticed that there was a lot of locking activity on TempDB. Freaky as I can get I needed to find out why this was happening.

So off to investigate!

I've created a small table with the following script:



SET NOCOUNT ON
GO
CREATE TABLE TestNCDelete
(myID int IDENTITY(1,1),
myChar char(500),
myVarChar varchar(500)
)

GO

DECLARE @i int
SET @i = 0

WHILE @i < 10000
BEGIN
INSERT INTO TestNCDelete (myChar, myVarChar)
VALUES (' ', REPLICATE('A', 500))

SET @i = @i + 1

END

CREATE UNIQUE CLUSTERED INDEX CI_myID ON TestNCDelete (myID)
GO
CREATE INDEX IX_myChar ON TestNCDelete (myChar)
GO
CREATE INDEX IX_myVarChar ON TestNCDelete (myVarChar)
GO


--DROP TABLE TestNCDelete
Let's delete everything from this table:


BEGIN TRAN
DELETE FROM TestNCDelete WITH (TABLOCKX)

--ROLLBACK

Check the active locks with sp_lock and there you go... lots and lots of extent locks on TempDB.

Now why is this happening?
It is actually very simple to find out... the query plan says it all!

After the Clustered Index Delete you will see a Table Spool/Eager Spool action for every non-clustered index. The description of this action is quite clear: "Stores the data from the input in a temporary table in order to optimize rewinds". This is followed by a Sort, an Index Delete, a Sequence and finally the delete is final.

You can imagine that these spools, sorts, ... can be quite intrusive when we are talking about 40 to 100 million rows.

Another mystery solved!

No comments: