Monday, March 20, 2006

One big lookup table vs many small lookup tables

Recently I was reading SQL Magazine when I came across an article "Designing for performance: lookup tables" (instadoc: 48811 - Michelle A Poolet). The question was whether or not one big lookup table was better than many small ones.

While I do agree that one big lookup table complicates things in terms of integrity and data management there are a couple of arguments I disagree with.

Locking and blocking increases

The nature of a lookup table is read-only and given this I would suspect that only shared locks are being taken on the lookup table. So how would you experience locking and blocking in such a scenario?

You create a hotspot on the disk

I don't see why you would have hotspots on a read-only lookup table. Being a hot table means that all the pages of that table would probably be in your data cache - meaning less disk I/O once the pages have been cached. You can even force the table to stay in memory by pinning it, but I don't think many people actually use this technique since SQL Server handles data caching quite well.

I think like any database question the only answer here is: "it all depends". We have many domain specific lookup tables but we also have a big lookup table with over 300 different types. In a database with 400 tables I wouldn't want to create 300 additional tables for simple lookups where most of them would contain 3 to 4 records.

4 comments:

tv said...

Your new carrier as a basher is taking new heights :-D

Gabriel Lozano-MorĂ¡n said...

If correctly designed the software will cache the data from reference tables and maybe use some caching dependency mechanisms to invalidate the cached data and retrieve the updated data.

There are several reasons why you might want to use seperate tables:
(1) Data normalization rules
(2) Microsoft has always claimed that having a lot of nullable columns in SQL Server decreases the performance because of the overhead during comparisons ...

What concerns number (2) we all know that this is not really true for simple CRUD operations

What concerns number (1) Personally I like guidelines, best practices and rules. So why break them if you don't need to?

So the real question here does performance really matter? and if so how important is this for the specific project? Are you running a dedicated server? Clustered? Shared server? ...

If performance is critical you would propably try out different scenarions comparing the execution plans.

Just my 50 cents though, cuz as you I am not really a SQL Server expert.

WesleyB said...

Thank you 'favourite' colleague :-)

Ola Gabriel ;-)
You are indeed right that most well designed applications use some sort of caching mechanism for lookup tables.

I too tend to follow the general guidelines and best practices. But sometimes you have to think if following them causes performance problems or maintenance issues and evaluate which is most important for your project.

This being said, you have the right concerns for a non-SQL Expert and I wish more developers had them :-)

tv said...

Did I wrote "carrier"? Oh well, I speak English very well, I learned it from a book ;-)