The Joy of Null

Often in the database world, you do not have all the information needed to create a record. For example, you may have a person’s full name but not their middle name or initial, or you might be missing their date of birth. In such cases, the recommended solution is to fill that field with a special database value referred to as Null. Over the years and for a variety of reasons, I have noticed some database designers and software developers have invented their own equivalent of Null. This article is about how this came about and why the practice should be discontinued.

I’m inventing a new term today called null-equivalent value. A null-equivalent value is a value in a system that is not actually null, but meant to imply null. The most common null-equivalent value is a blank string.

Null

Part 1: What happened to null?

Oftentimes, database designers will (perhaps incorrectly) mark a column as “Not Null”, which translates to: this field is absolutely required for all records and we forbid anyone from inserting a record without this value populated. Some time later, software developers will then build an application on top of the database and realize they are missing some of the ‘required’ information needed to create a record. At this point the developer has one of three choices:

  • 1. Acquire the missing information before inserting the record
  • 2. Modify the database design to remove the “Not Null” attribute on the field
  • 3. Insert a null-equivalent value such as a blank string

Clearly, #1 is the best solution since it was the intention of the database designer that this field be populated, but, as I alluded to, that decision may have been incorrect. It may be that the field is not required, and creating records without this field make sense within the data model. In that case, the developer should consult a database designer and go with solution #2 – to remove the “not null” attribute from the record. Often, though, developers are prohibited from making changes to the database and the process of updating the database is long, vast, and sometimes risky. So the vast majority of developers will go with the ‘quick fix’ of #3, and insert a null-equivalent value since it’s the least work in the short term.

While this is the most common reason null-equivalent values are inserted into the database, it’s by far not the only reason. The second driving force results from a poor mix between the presentation layer and data layer. Assume a developer has the freedom to insert null values into a column, ie, the column is nullable. They may choose to instead insert a null-equivalent value such as a blank strings for a variety of reasons. For example, if a developer outputs the contents of the database directly to the screen, they often prefer to use empty strings rather than nulls since it’s a much more lazy approach – they do not need to convert the null values to empty strings, and can display them directly.

I’ve mentioned two of the main driving forces for null-equivalent values, but there others. I use the term null-equivalent instead of blank strings because developers will sometimes insert “special values” meant to imply null as well as have a secondary meaning such as “Lost Data”, “Missing”, or, in one case, “Null” (yes, these are real examples). In many of these cases, the data model should have been updated to include a boolean column, such as “Lost Data = true”, to explain the reason why the original field is blank. But many developers are nothing if not lazy, and the time required to make a database change, as I mentioned, is great. So, they will overload a column with a special single value, used to imply multiple values, all of which should have been fleshed out in a better database design.

The next article will continue this discussion and consider reasons why null-equivalent values are harmful to the database. In other words: TO BE CONTINUED.

8 thoughts on “The Joy of Null

  1. The problem with allowing null fields in a tables (at least in MySQL) is that it leads to non-static-length tables; which can offer a large performance hit on i/o-intensive operations. Using null-like characters instead of allowing null fields and taking a performance hit is perfectly fine when A) your programmer(s) is expecting this, and/or B) when you have i/o intensive DB operations going on.

    p.s. F’ your comment box. I had to rewrite this because I didn’t want to put an email/url in. The irony of not allowing nulls in this comment box section on this article is bittersweet.

  2. Perhaps in MySQL it is problematic, MySQL’s performance is among the weaker of the larger DBMS’s. But inserting null-equivalent versus null values is more about the data model than performance. I’ve never heard anyard anyone claim they inserted an empty string on purpose because of performance reasons. Also, keep in mind if the field is VARCHAR, it could grow in size just as easily as null fields, granted that’s more of a database-specific implementation issue to me.

  3. Just one question: how do you detect that the user hasn’t filled in his middle name (null) or he has a blank middle name (”) ?
    Generally speaking, when you have no flip switch to tell if a value is filled in or not, you never get nulls in the nullable columns, right?

  4. A user entering a blank value isn’t the same thing as the system inserting blank values. For example, if a user enters 2 spaces for a name, do you cut them off? This is far more dependent on your specific application model than anything that could be generalized. This article is more about developers writing code that actively inserting blank strings when it sees null or missing values.

  5. I believe for Oracle you don’t really get a choice, since empty strings (“”) are coerced to null.

    We had to create a “null-equivalent value” because our webapp would store empty strings for empty text fields, and then repopulate them with “(null)”.

  6. Most of our java based web apps are dependent on iSeries database which is maintained in a case tool called synon. The resulting dds(data definition specs) can only support 3 datatypes
    char,signed numeric and unsigned numeric. So a null in char(9) is nine blanks, a null in numeric is 0 etc.

  7. Codd actually proposed separating null into a couple different types based on exactly this distinction. You’ll find it in some of his later writing on databases.

    Of course, the major databases today don’t even implement the relational algebra in any meaningful way, so…

Leave a Reply

Your email address will not be published. Required fields are marked *