Monday, January 23, 2006

Fun with Lookups in SSIS

One of the current hot projects here is archiving (the database grows with +1GB/day). For one of the solutions we need two files from the same table based on a staging table. This is now done in 2 steps so I tried to figure out a way to read the table only once.

I noticed that the Lookup Transformation has an option to redirect a row when an error occurs. This enables us to separate the output to 2 separate files based on the lookup table.

I started testing and I was thrilled by the performance of these transformations. So off to do the 'real' work, a lookup table with 40.000.000 records (holding just a guid) to export a table with 120.000.000 records. Unfortunately I hit the first 32-bit limitation :-( Because of the massive amount of data SSIS is having trouble reading all the records in memory. I get up to about 38.000.000 rows and then the process stalls. After quite some time I get the following error:
0xC0047031 DTS_E_THREADFAILEDCREATE
The Data Flow task failed to create a required thread and cannot begin running. The usually occurs when there is an out-of-memory state.

SSIS cannot use AWE memory so it depends entirely on the VAS memory meaning only 2GB (or 3 with /3GB enabled) on a 32-bit environment. If we were running a 64-bit environment all my worries would be over :-(

I could skip the cache or cache less data but then it becomes quite slow and the 'double' export runs better in that case.

Who knows... maybe tomorrow I will find a solution.

No comments: