Why too much Database Normalization can be a Bad Thing

November 19th, 2008 by Scott Selikoff

As someone with a long history in database optimization, who even did his Master's project on database normalization, I'm probably the last person in the world to argue against database normalization. From a theoretical standpoint, database normalization is a wonderful thing, helping to organize your data into easy-to-manage and understandable parts. For this article, I will play devil's advocate and argue why too much normalization can be a bad thing.

Years of working in the professional software industry have taught me the practical implication of a fully normalized database system: it's often slow. It's a non-trivial task to design a database schema that is both fully normalized and performs well under a variety of circumstances. I think back to some systems I have seen in which a single common query joined a dozen or more very large tables. Keep in mind that the more normalized your data is, the more joins are required. For example, if you normalize a user's address into a separate table (so the user can have a set of addresses), you have to join the user table with the address table every time you want to display their address. Then you often have to join with city, state, and country tables for full normalization.
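To make the cost concrete, here is a rough sketch of what that single read can turn into in SQL. The table and column names are made up purely for illustration:

```sql
-- Hypothetical fully normalized schema: displaying one user's mailing address
-- means walking a chain of joins through the lookup tables.
SELECT u.username,
       a.street_line,
       ci.city_name,
       st.state_name,
       co.country_name
FROM   users     u
JOIN   addresses a  ON a.user_id     = u.user_id
JOIN   cities    ci ON ci.city_id    = a.city_id
JOIN   states    st ON st.state_id   = ci.state_id
JOIN   countries co ON co.country_id = st.country_id
WHERE  u.user_id = 42;
```

Four joins just to print an address on a profile page, and every one of them is more work the database has to do on every read.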

Let’s take some of the advantages of database normalization and look at why they might not be so great:

Normalization saves space, but space is cheap!
The most obvious advantage of normalization is space savings. For example, instead of storing the text "United States of America" in 10,000 records of a Users table, I can create a country table that lists the text once and then reference it with an integer. Clearly, 10,000 integers take less space than 10,000 copies of a 24-character text field. But these days, space is cheap! Terabyte drives are now common, so would normalizing some of the fields of a table really save much space? Probably not. I'm not saying denormalize all fields, but there are some where the space advantage is negligible.
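As a sketch, the two approaches might look like this (again, the table and column names are purely illustrative):

```sql
-- Normalized: the country text is stored once and referenced by an integer.
CREATE TABLE countries (
    country_id   INT PRIMARY KEY,
    country_name VARCHAR(100) NOT NULL
);

CREATE TABLE users_normalized (
    user_id    INT PRIMARY KEY,
    username   VARCHAR(50) NOT NULL,
    country_id INT NOT NULL REFERENCES countries (country_id)
);

-- Denormalized: the text is repeated in every user row,
-- trading a little disk space for join-free reads.
CREATE TABLE users_denormalized (
    user_id      INT PRIMARY KEY,
    username     VARCHAR(50) NOT NULL,
    country_name VARCHAR(100) NOT NULL
);
```

At 10,000 users the difference works out to a couple of hundred kilobytes, which is exactly the point.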

Normalization simplifies updates, but reads are more common!
Another common reason for normalization is to simplify updates and reduce anomalies. For example, in the case of the "United States of America" text in the Users table, it's a lot easier to update/change the text in a single record than it is to update 10,000 records. But that brings up an interesting point: how often does the text "United States of America" change? I would argue almost never. There are examples where data does change more frequently, such as a user's address, but it's common knowledge in database systems that in most tables reads are far more frequent than writes. Therefore, if you have a table with relatively stable data that changes infrequently, normalization isn't buying you a lot.
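Continuing the sketch from above (the tables and values remain hypothetical), the update side of the trade-off looks roughly like this:

```sql
-- Normalized: correcting the country text touches exactly one row.
UPDATE countries
SET    country_name = 'United States of America'
WHERE  country_id = 1;

-- Denormalized: the same correction has to rewrite every affected user row.
UPDATE users_denormalized
SET    country_name = 'United States of America'
WHERE  country_name = 'United States';
```

The normalized version clearly wins here, but it only wins on the rare occasions the text actually changes, while the join cost is paid on every read.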

Performance, performance, performance
Database administrators spend the bulk of their time worrying about performance: how to optimize queries and table structures. In that regard, normalization rarely helps. Those calls people tend to get at 2 AM about the system crashing over a specific query? They are often related to normalization in some way. If you have a case where you can eliminate a normalized table with minimal impact on functionality, it is better to do so. For example, in the case of addresses, you might need users to be able to have multiple addresses (a one-to-many relationship), so there is no way to avoid normalizing the data into a separate table. But there are other cases, such as states and countries, where constantly joining to two completely static tables is not helping. You can still have the tables present so a user can select a state or country from a drop-down list, but it may be better to save the text of the state or country in the user's table instead of a reference.
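A rough sketch of that compromise, using the same hypothetical tables as above:

```sql
-- The countries table still backs the drop-down list in the UI...
SELECT country_id, country_name
FROM   countries
ORDER BY country_name;

-- ...but the chosen text is copied into the user row at save time,
-- so the common read path needs no join at all.
SELECT username, country_name
FROM   users_denormalized
WHERE  user_id = 42;
```

The lookup table is effectively demoted from a join partner on every read to a reference list consulted only when a user edits their profile.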

Conclusion and Next Article
In this article I played devil's advocate, arguing that too much normalization can be a bad thing. On the other end of the spectrum, too little normalization can also be a bad thing. The mark of a good database designer and software developer is the ability to find a balance between the two that matches the database structure to the actual or expected use of the system. In the next article, I will discuss some of the tools available in most databases to help combat the performance issues of normalization, such as indexes, triggers, and materialized views.

Comments

Comment from Milos Misic
Posted: November 20, 2008 at 7:58 am

Nice article. I'm no db-expert but I agree with the point you made about too much normalization, though your example with the country table and the USA isn't that good. For instance, I am from Serbia and this country has changed its name several times, from 'Yugoslavia' to 'Serbia and Montenegro' to just 'Serbia'. So even the country table isn't 'static' as you wrote.

Comment from Lars Hoss
Posted: November 20, 2008 at 8:42 am

I agree that too much normalization can have a negative impact on read performance. But it helps to reduce redundancy and maintain integrity. Quite often it also depends on the database you are using. Almost every MySQL developer will tell you that joins suck and you better use denormalized tables for best speed. Things are different for PostgreSQL or even Oracle, though. And things like materialized views can help to improve read performance as well.
Some days ago I had to "fix" a table from someone who thinks that normalization is bad. This table has 80 columns: columns like cost_age30, cost_age31, cost_age32 …
Now imagine what queries look like on those tables *sigh*
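For illustration only (the real table isn't shown in the comment, so the names below are guesses at its shape), a wide table like that is usually normalized into one row per (id, age) pair:

```sql
-- Assumed shape of the 80-column table: one column per age bucket,
--   costs_wide(customer_id, cost_age30, cost_age31, ..., cost_age80)
-- A normalized alternative stores one row per (customer, age) pair.
CREATE TABLE costs (
    customer_id INT NOT NULL,
    age         INT NOT NULL,
    cost        DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (customer_id, age)
);

-- Against the wide table, every query has to spell out each column it wants;
-- against the normalized table, it is a simple range predicate.
SELECT age, cost
FROM   costs
WHERE  customer_id = 42
  AND  age BETWEEN 30 AND 35;
```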

Comment from Mike
Posted: November 20, 2008 at 11:36 am

It is true that joins have a cost, but the cost is primarily in CPU. The cost of denormalizing is in disk. Now it is true that disk is cheap, but CPU performance has improved more dramatically in recent years, more so than disk speed. (In fact the increase of disk capacity may have reduced the performance of many disk systems in practice, because a wider RAID array with more spindles is no longer necessary from a space point of view). From a performance point of view, disk is still orders of magnitude slower than memory.

I tested the countries example in MS SQL Server. At 10,000 rows there was not much difference, but by 160,000 rows there were clearly fewer logical reads in the normalized model.
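For anyone who wants to reproduce that kind of comparison, SQL Server reports logical reads per statement when STATISTICS IO is enabled. The tables below are just the hypothetical ones sketched in the article, not the actual schema used in the test:

```sql
SET STATISTICS IO ON;

-- Normalized: the country text comes from a join to the lookup table.
SELECT u.username, c.country_name
FROM   users_normalized u
JOIN   countries c ON c.country_id = u.country_id;

-- Denormalized: the text is read straight out of the user rows.
SELECT username, country_name
FROM   users_denormalized;

SET STATISTICS IO OFF;
```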

Comment from Joachim
Posted: November 20, 2008 at 3:09 pm

Great. So true!
This is one of the amazing aspects where the conclusion is so clear and easy to take up – but most developers don't ask that question. 3NF is holy, if you don't normalize till it hurts you're a bad developer and you'll contract an awful disease soon ;-) .
Good to read that someone has the ba… to have an opinion of their own.
Thanks!

Comment from Daoud AbdelMonem Faleh
Posted: November 20, 2008 at 4:58 pm

I once eliminated all FK relationships from the database and managed them in the code. It was more error prone, yes; it took more time to implement, yes; but it was a huge gain in performance. I went that way because denormalisation was not an option, as some of those tables were shared between many entities and updates were frequent on them. The key to success down this road was to test, test, test…
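A minimal sketch of that trade-off (the table and constraint names are hypothetical): the foreign key is dropped so the database no longer enforces it on every write, and the application takes over the check before inserting child rows.

```sql
-- Drop the constraint: inserts and updates no longer pay for FK enforcement.
ALTER TABLE orders DROP CONSTRAINT fk_orders_customer;

-- The application verifies the parent row itself before inserting an order;
-- as noted above, this is faster but more error prone.
SELECT COUNT(*)
FROM   customers
WHERE  customer_id = 42;   -- insert the order only if this returns 1
```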

Comment from Raditya Arthapraja
Posted: November 21, 2008 at 1:30 pm

Great article!

How about a performance chart comparing databases that store the same information, with the first database normalized too much, the second not overly normalized, and the third not normalized at all?

Comment from scott
Posted: November 23, 2008 at 4:38 am

I think a lot of you are taking me too literally… I never said get rid of normalization, but it's clear that too much of it can be a bad thing. In general, reads within a denormalized system are almost always faster than in a normalized one; the catch is that updates and data storage are worse.

Comment from scott
Posted: November 23, 2008 at 4:39 am

Great idea Raditya, why not create a chart as a project and I'll post the results?

Comment from Kevin
Posted: August 30, 2010 at 7:36 pm

An old topic, but still ranks in Google, so I’ll still comment.

I think you need to also consider disk latency when assessing the cost of disk space. It might be cheap per GB of storage, but if the database manager needs to bring in four times as many pages in order to perform a join or any other operation on the table, then that could have a very negative impact on performance. There’s usually more disk storage than memory allocated to a database.

Also, the sophistication of the database manager in generating access paths and the accuracy of database statistics will play a large part in determining whether it's faster to denormalise to plain text or to use lookup tables (as per your 'country' field example).

Comment from ARH95
Posted: October 15, 2010 at 5:18 pm

Fabulous article. I’m dealing with overnormalization right now, in an IT computer database. Normalizing, in this case, is just more work than it’s worth, especially for a small company.

I mean, I've got a table for CPUType and three entries for a P5. I will NEVER buy another P5. They don't make them. I don't want one… you get the picture, I hope.

I am in the process of "denormalizing," to a degree… but I will admit it HURTS because normalizing has been so drilled into my brain. :-)

Pingback from Memo: Avoid Functions in Database Queries | Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky
Posted: October 21, 2010 at 2:19 am

[...] although since reads are more common than writes, this is often easy in practice. I am aware some developers have issues with denormalized data, but it is often necessary on large tables with queries that can run for minutes or even [...]

Comment from john
Posted: October 7, 2013 at 8:03 pm

I’m poking around SQL Server 2012 with the Adventureworks database. So sick of all these numeric ids!!!
