Site menu:

Topics

Recent Posts

Blog

 

November 2008
M T W T F S S
« Oct    
 12
3456789
10111213141516
17181920212223
24252627282930

Past Posts

Links:

Why too much Database Normalization can be a Bad Thing

November 19th, 2008 by Scott Selikoff

As someone with a long history in database optimization and who even did their Master’s project on Database normalization, I’m probably the last person in the world to argue against database normalization. From a theoretical standpoint, database normalization is a wonderful thing, helping to organize your data into easy-to-manage and understandable parts. For this article, I will play devil’s advocate and argue why too much normalization can be a bad thing.

Database

The years of working in the professional software industry has taught me the practical implications of a fully normalized database system: it’s often slow. It’s a non-trivial issue to design a good database schema that is both fully normalized and performs well on a variety of circumstances. I think back to some systems that I have seen in which a single common query joins dozens or more very large tables. Keep in mind, the more normalized your data is, the more joins that are required. For example, if you normalize a user’s address into a separate table (so the user can have a set of addresses) you have to join the user table with the address table every time you want to display their address. Then, you often have to join with a city, state, and country tables for full normalization.

Let’s take some of the advantages of database normalization and look at why they might not be so great:

Normalization saves space, but space is cheap!
The most obvious advantage of normalization is spacing saving. For example, instead of listing a “United States of America” for 10,000 records in a Users table, I can create a country table that lists the text once, then create a reference with an integer. Clearly, 10,000 integers take less space than 10,000 24-digit text fields. But these days, space is cheap! Terabyte drives are now common, so would normalizing some of the fields of a table really save much space? Probably not. I’m not saying denormalize all fields, but there’s some where the advantages to space are negligible.

Normalization simplifies updates, but reads are more common!
Another common reason for normalization is to simplify updates and reduce anomalies. For example, in the case of “United States of America” text in the Users table, its a lot easier to update/change the text in a single record than it is to update 10,000 records. But that brings up an interesting point, how often does the text “United States of America” change? I would argue almost never. There are examples where data does change more frequently, such as a user’s address, but its common knowledge in database systems that in most tables reads are far more frequent than writes. Therefore, if you have a table with relatively stable data that changes infrequently, normalization isn’t buying you a lot.

Performance, performance, Performance
Database administrators spend the bulk of their time worrying about performance, how to optimize queries and table structure, and in that regard normalization rarely helps. Those calls people tend to get at 2AM about the system crashing over a specific query? They often related to normalization in some way. If you have a case where you can eliminate a normalized table with minimal impact on functionality, it is better to do so. For example, in the case of address you might need users to be able to have multiple addresses (1-to-many relationship), therefore there’s no way to avoid normalizing the data into a separate table. But there are other cases such as with States and Countries, where constantly joining to two completely static tables is not helping. You can still have the tables present in order for a user to select a state or country from a drop-down list, but it may be better to save the text of the state or country in the user’s table, instead of a reference.

Conclusion and Next Article
In this article I played devil’s advocate arguing that too much normalization can be a bad thing. On the other end of the spectrum, too little normalization can also be a bad thing. The mark of a good database designer and software developer is the ability to find a good balance between the two that matches the database structure against the actual or expected use of the system. In the next article, I will discuss some of the tools available in most databases to help combat the performance issues of normalization such as indexes, triggers, and materialized views.

why do we violate law of demeter?

November 15th, 2008 by Jeanne Boyarsky

I was watching a Google Tech Talk on Dependency Injection.  Sixteen minutes in the speaker gave an interesting example of the Law of Demeter: If buying item for $25 in store, do you hand clerk $25 or give clerk your wallet and have him/her retrieve the $25.

This got me thinking about why we so frequently violate the Law of Demeter.  I can think of three reasons that come up frequently.

Familiarity with code/process

When we think about what needs to be done, we think “I’ll get out my wallet and get the money.”  This does translate literally to handing the clerk the $25. Since the wallet is an inanimate object, we think more about the wallet than the cash.

Wallet wallet = customer.getWallet();
Cash cash = wallet.getMoney(25);
customer.payCashier(cash);

If it were a person, we would think of the separate step.  If a six year old was buying something, he might think of the transaction as “ask mommy or daddy for $25 and then hand that to the clerk $25.”  This translates better to:

Cash cash = parent.getMoney(25);
customer.payCashier(cash);

In the real life scenario of an intermediary person, the requestor (child) never even rifles through the wallet.  (Granted in the real life scenario, the child doesn’t really ask the wallet - rather the person holding the wallet.)

Access data we shouldn’t

This is really a variant of being overly familiar with all the objects/data.  We tend to think about what we are doing rather than from the object’s point of view.

Can’t touch other code

To me this is the most interesting reason.  We have an object that we think really should have a new method but can’t touch that object.  One solution to this is to create a new object that holds the original and does contain the new method.  This solution is useful when the object belongs to a third party and really can’t be changed.  Sometimes it’s more of a mental block.

I recently came across a case where two numbers where compared via a string comparison (and some other logic) because the author didn’t want to touch the object that owned the data (and was exposing it via a string.)  After further discussion we concluded the owning object really needed a method that told us if this number was positive or negative.  And the complex logic was a “code smell” pointing us to that fact.

Personally, I find the reasons behind things to be more interesting than the actual instance.  It’s easy to say “don’t violate the Law of Demeter.”  Discussing why is plenty interesting (as in the Google Tech Talk.)  It’s thinking about what drives us to such things that helps actually prevent them.

the different levels of a question

November 1st, 2008 by Jeanne Boyarsky

People with different levels of experience approach tend to approach a question in different ways.  This is something I see frequently on JavaRanch.

An example

The ideas apply to many questions, but I’ll use “how do I disable the browser back button” as a common example.  Consider some answers to this:

1) In Internet Explorer, you do something the onunload event. This isn’t an ideal answer because it teaches people a hack as if it were the generally accepted approach.  It also doesn’t work universally. [While writing this, I learned this even is implemented in Firefox 3.  Hopefully that doesn't mean people will start using it inappropriately.]

2) You can’t. - Accurate, but not very helpful as it doesn’t give the person with a problem any ideas on where to go from here.

3) Explanation of why bad practice with follow up question asking what trying to accomplish. - This is better as it tells the person something new and allows for an alternate approach to be suggested.

Answer 1 is a very developer centric way of looking at things.  They literally answer the question.  As such, this tends to be an approach given by someone who has less experience in development or is accustomed to receiving and implementing detailed specs.

Answer 3 is a higher level type answer.  It hones right in on what the person is trying to accomplish.  (maybe not submitting a form again or not seeing stale data.)  This is the type of answer more likely to be given by a senior developer or architect.

You’ve undoubtedly noticed I skipped answer 2 here.  That one could apply to any level of experience.  It is a correct answer if taken literally.  It could just as easily be given by an architect in a hurry - it is factually correct but less helpful than a full answer would be.

A more specific example:

The reason I was thinking about this is I came across a question asking how to identify the first column in a Collection in a JSP.  As of right now, there are two answers in the thread. One is a literal answer involving scriptlets.  The other is a suggestion that maybe the model should be represented.

The interesting thing is that the literal answer better answers the actual question that was asked.  By contrast, my answer (the second one), speculates about what the underlying cause might be without actually touching on the question.  I don’t think this thread particular highlights different levels of experience as much as it does different ways of approaching a problem.

On some level I think this is related to Defending the Code.  But only in that it is about speaking up and finding out what is realy wanted.

what to do for nulls

October 17th, 2008 by Jeanne Boyarsky

A logical follow up to null checking in the extreme and Never return Null Arrays, is what we should be doing when a null does get returned.

The problem

Consider the following method:

public void method() {
StringBuilder builder = UnreliableCode.createBuilder();
if ( builder != null ) {
builder.append(“data”);
}

This time, a null check is appropriate because we can’t guarantee createBuilder() works as advertised.  Let’s assume that the method is either designed to return null on error or is just plain buggy.  Let’s also assume it is not owned by us so we can’t change it.

That leaves us with the question of what we should do if builder returns null. The code currently fails silently. Yuck! This means if createBuilder returns null, we just don’t append the data. Finding this bug relies on hunting it down based on the side effects. Letting Java throw the NullPointerException (or throwing our own more logical exception) based on it is preferable to failing silently. Or in this case, it might make sense to recover by creating a new StringBuilder ourselves.

Another example of handling nulls poorly is:

public void method() {
StringBuilder builder = UnreliableCode.createBuilder();
if ( builder == null ) {
System.out.println(“builder is null”);
} else {
builder.append(“data”);
}

Ok so we logged it.  Does that mean we don’t care and can proceed?

I once got into a debate with a former coworker about what should happen when a variable is unexpectedly null. The essence of his argument was that “code shouldn’t throw null pointers.” While this is true in some respects, it isn’t always the case. If you have a null pointer error condition, why should the code do something else to mask it?

What I think we should do

Contrary to the above, I don’t advocate throwing null pointers on purpose.  In this case, where I think it is likely to happen, I would want to handle it explicitly.  However, seeing code littered with null checks because “you never know when something might return null” seems worse to me then letting the NullPointer get thrown.  Especially if there is nothing logical to do in handling it.  When a null pointer really is unexpected is when I think it should be treated as such rather than masked.

Regardless, I think the user shouldn’t see the NullPointer or ArrayIndexOutOfBounds or any other unexpected RuntimeException.  This winds up being a matter of writing a top level handler.  For an applet/swing type app, it winds up being a matter of intercepting them to show a user message.  For a web app, it can be redirection to a custom error page.  For a command line app, it can be a text error message.  The key being that the user doesn’t care that there was a NullPointer.  He/she cares that “something went wrong.”

Never return Null Arrays!

October 15th, 2008 by Scott Selikoff

Continuing on Jeanne’s theme of nulls, its a pet peeve of mine when I come across code that returns null arrays instead of empty arrays. The purpose of this post is to discuss some of the reasons why its a good practice to return empty arrays over null arrays, including Collection objects or typed array.

Null Pointer Exception

Consider the reusability of the following code:

public List getItems() {
   …
   // There are no items, return null
   return null;
}

Let’s say you want to iterate on the results, the following code would throw a NullPointerException if used in conjunction with the code above:

List items = getItems();
for(int i=0; i<items.size(); i++) {
   // If there are no items, this code will throw a null pointer
}

Iterating on an array is one of the most commons practices in Java, so returning null instead of an empty array means the person is going to have to do extra null checks when they really don’t need to. In this situation, an empty array would suffice and would not produce any errors.

To summarize, it makes the code shorter, easier to read, and less likely to throw a NullPointerException. Also, there is some confusion with returning null since you may (or may not) be saying an empty array is the same thing as null. For example, the following would be confusing logic:

List items = getItems();
if(items == null) {
   // If null do one thing
} else if(items.size==0) {
   // If empty do another?
}

In this situation its not clear why you might act differently on the two return values and this can lead to confusing, ambiguous code.

The only time I might consider returning null valid is if there was an error of some kind, but then you should be throwing an exception (something more programmers are hesitant to do). Think of it like this, would you rather read a stack trace with a detailed error what went wrong in the code created by you, or would you rather see NullPointerException and wonder what of a dozen objects might have been null?