Where’s your database’s ER Diagram?

I was recently training a new software developer, explaining the joys of three-tier architecture and the importance of proper black-box encapsulation, when the subject switched to database design and ER diagrams. For those unfamiliar with the subject, entity-relationship diagrams, or ER diagrams for short, are a visual technique for modelling entities (tables, in a relational database) and the relationships between those entities, such as foreign key constraints and 1-to-many relationships. Below is a sample of such a diagram.


I. The Theory

Like many enterprise technologies, ER diagrams can be a bit of an overkill in single-developer projects, but come in handy as soon as you need to explain your design decisions to a room full of people. Since a software application is only as flexible as its underlying database, ER diagrams help define the initial set of business rules for how people will be able to interact with the system. As a software development practice, they are often encouraged, but in medium to large companies, they may be absolutely required. There are enough tools now to create ER diagrams quickly and easily, many of which will generate SQL database creation statements for a variety of platforms directly.

II. The Practice

Due to tight time constraints and ever-expanding scope creep, I find most developers skip creating or maintaining ER diagrams whenever the opportunity arises. One telling example of this is a developer who creates a diagram, starts building the application, and realizes their initial diagram was completely flawed. Given that they are now behind schedule because they made mistakes in modelling the data, they do not have time to go back and update the model, and their ER diagram becomes a distant memory compared to the final database. Most managers would rather see finished software products than accurate diagrams, although they will take both if offered.

ER diagrams are incredibly useful in the early stages of designing a new application, but as an experienced software developer, I spend more time enhancing and maintaining databases than I do creating them from scratch. Furthermore, it can be difficult to create an ER diagram for an existing database, especially if you were not involved in its creation. Even when companies do maintain ER diagrams, they tend to be months, often years, out of date, as it can be difficult to motivate each and every software developer to update the database documentation after making a change.
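One reason an out-of-date diagram is recoverable at all is that the relationships it draws are stored in the database catalog as foreign key constraints. Here is a minimal sketch of reading them back, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in for whatever database you actually run. The `customer` and `customer_order` tables are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical two-table schema; the names are illustrative only.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
"""


def list_relationships(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # PRAGMA foreign_key_list yields: id, seq, table, from, to, ...
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            edges.append((table, fk[3], fk[2], fk[4]))
    return edges


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
relationships = list_relationships(conn)
for child, child_col, parent, parent_col in relationships:
    print(f"{child}.{child_col} -> {parent}.{parent_col}")
```

Each printed line is one edge you would draw between two boxes. This only finds relationships that were declared as constraints. Anything enforced purely in application code stays invisible, which is part of why reverse-engineering a diagram is hard.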

III. Where’s your database’s ER Diagram?

What about your software application? Is there now or was there ever an ER diagram for your company’s database? Is it 100% accurate to the current production database? I’d like to hear from other developers to find out if people are diligently maintaining ER diagrams, or if it is really a common practice to let them fall by the wayside after a database is established.

sql query optimization – why temp tables can help

Last month, I migrated the jforum.net forum data to a coderanch jforum forum.  I had a requirement/goal to update the links in our forum so they work rather than pointing to the now-broken jforum.net links.

First I created the mapping.  (lesson: store this data as you migrate so you don’t have to do it later.)  I wound up mapping on subject lines and dates.  Luckily the threads were migrated in numeric order so I could fill in the gaps.  But that wasn’t interesting enough to blog about.  What was interesting enough to blog about was the SQL queries to update the database.

My goal

I wanted to make the changes entirely through the database – no Java code.  I also wanted to avoid postgres stored procedures because I encountered some time sinks last time I wrote a postgres stored procedure.  I am happy to say I achieved my goal.

Step 1 – Analysis

I noted that I need to update 375 rows.  Too many to do by hand.  There are just under 5000 posts in the jforum forum.  Which means there are at most 5000 search/replace strings to check.  This doesn’t seem bad for a computer.  Once I know the SQL, I can write code to generate 5000 of them using my local mappings and then run the SQL script on the server.

select * from jforum_posts_text where post_text like '%http://www.jforum.net/posts/list/%'

Step 2 – Horrible performing but functional query

It’s got to work before you can tune it.  My first attempt was:

explain update jforum_posts_text set post_text=replace(post_text, 'http://www.jforum.net/posts/list/5.page','http://www.coderanch.com/t/574339 ')  where post_text like '%http://www.jforum.net/posts/list/5.page%';

The query plan was:

Seq Scan on jforum_posts_text (cost=0.00..132971.82 rows=1 width=459) 
Filter: (post_text ~~ '%http://www.jforum.net/posts/list/5.page%'::text)

5000 of those is going to take over an hour of hitting the database hard.  Not good.  I could run it over the weekend when volumes are low if need be, but I can do better than that.

Step 3 – Trying to use an index

I know that most (95% maybe) of the updates are in the JForum forum.  We had a few “legacy” links to jforum.net in other forums, but not a lot.  I then tried adding a condition on forum id:

explain update jforum_posts_text set post_text=replace(post_text, 'http://www.jforum.net/posts/list/5.page','http://www.coderanch.com/t/574339 ')  where post_text like '%http://www.jforum.net/posts/list/5.page%' and post_id in (select post_id from jforum_posts where forum_id = 95);

The query plan was:

Nested Loop (cost=22133.48..97281.40 rows=1 width=459)
HashAggregate (cost=22133.48..22263.46 rows=12998 width=4)
Bitmap Heap Scan on jforum_posts (cost=245.18..22100.99 rows=12998 width=4)
Recheck Cond: (forum_id = 95)
Bitmap Index Scan on idx_posts_forum (cost=0.00..241.93 rows=12998 width=0)
Index Cond: (forum_id = 95)
Index Scan using jforum_posts_text_pkey on jforum_posts_text (cost=0.00..5.76 rows=1 width=459)
Index Cond: (jforum_posts_text.post_id = jforum_posts.post_id)
Filter: (jforum_posts_text.post_text ~~ '%http://www.jforum.net/posts/list/5.page%'::text)

Well, it is using the indexes now.  But it is only about a 30% drop in cost for the worst case. On an untuned complex query, I usually see at least an order of magnitude performance jump on my initial tuning.

Step 4 – time for a temp table

Most of the work is finding the 375 rows that need updating in a table with 1,793,111 rows.  And it has to happen for each of the 5000 times I run the query.

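I decided to use a temporary table so I could run the expensive part once.  (Strictly speaking, the statement below creates an ordinary table that I use as scratch space rather than one declared with TEMP, so it needs to be dropped once the updates are done.)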

create table jeanne_test as select * from jforum_posts_text where post_text like '%http://www.jforum.net/posts/list/%';

explain update jforum_posts_text set post_text=replace(post_text, 'http://www.jforum.net/posts/list/5.page','http://www.coderanch.com/t/574339 ')  where post_id in (select post_id from jeanne_test where post_text like '%http://www.jforum.net/posts/list/5.page%');
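To make the two-phase idea concrete, here is a toy sketch that runs the same pattern end to end. It assumes SQLite in memory (via Python's `sqlite3` module) rather than postgres, and the table names `posts_text` and `scratch` plus the three sample rows are made up for illustration. Only the 5.page to 574339 mapping comes from the statements above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts_text (post_id INTEGER PRIMARY KEY, post_text TEXT)")
conn.executemany(
    "INSERT INTO posts_text VALUES (?, ?)",
    [
        (1, "see http://www.jforum.net/posts/list/5.page for details"),
        (2, "nothing to rewrite here"),
        (3, "also http://www.jforum.net/posts/list/7.page"),
    ],
)

# Expensive step, run once: copy only the rows that contain any old link.
conn.execute(
    "CREATE TABLE scratch AS SELECT * FROM posts_text "
    "WHERE post_text LIKE '%http://www.jforum.net/posts/list/%'"
)
scratch_rows = conn.execute("SELECT count(*) FROM scratch").fetchone()[0]

# Cheap step, run once per mapping: the LIKE now scans only the scratch table,
# and the big table is reached through its primary key.
old = "http://www.jforum.net/posts/list/5.page"
new = "http://www.coderanch.com/t/574339"
cur = conn.execute(
    "UPDATE posts_text SET post_text = replace(post_text, ?, ?) "
    "WHERE post_id IN (SELECT post_id FROM scratch WHERE post_text LIKE ?)",
    (old, new, f"%{old}%"),
)
updated = cur.rowcount
rows = dict(conn.execute("SELECT post_id, post_text FROM posts_text"))
print(scratch_rows, updated, rows[1])
```

The scratch copy is a snapshot and is never updated itself. That is fine here because each old link is searched for exactly once.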

Now I’m doing the expensive part once.  It still takes a couple of seconds to do the first part.  But for the second part – the update I’m running 5000 times – the query plan drops to:

Nested Loop (cost=13.50..21.99 rows=1 width=459)
HashAggregate (cost=13.50..13.51 rows=1 width=4)
Seq Scan on jeanne_test (cost=0.00..13.50 rows=1 width=4)
Filter: (post_text ~~ '%http://www.jforum.net/posts/list/5.page%'::text)
Index Scan using jforum_posts_text_pkey on jforum_posts_text (cost=0.00..8.46 rows=1 width=459)
Index Cond: (jforum_posts_text.post_id = jeanne_test.post_id)

Nice.  Running the script with the 5000 update statements only took a few seconds.
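For anyone wanting to do something similar, generating the script from a local mapping can be sketched roughly like this. This is not the code I actually ran. The `mapping` dictionary, the helper names and the second id pair are hypothetical, and the sketch assumes the old links all have the /posts/list/N.page shape with no LIKE wildcard characters in them.

```python
OLD_PREFIX = "http://www.jforum.net/posts/list/"
NEW_PREFIX = "http://www.coderanch.com/t/"
SCRATCH_TABLE = "jeanne_test"


def sql_literal(value):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_update(old_id, new_id):
    """Build one update statement that goes through the scratch table."""
    old_url = f"{OLD_PREFIX}{old_id}.page"
    new_url = f"{NEW_PREFIX}{new_id}"
    return (
        "update jforum_posts_text "
        f"set post_text=replace(post_text, {sql_literal(old_url)}, {sql_literal(new_url)}) "
        f"where post_id in (select post_id from {SCRATCH_TABLE} "
        f"where post_text like {sql_literal('%' + old_url + '%')});"
    )


# Hypothetical mapping of old jforum.net thread ids to new coderanch thread ids.
# Only the first pair appears in this post; the second is made up.
mapping = {5: 574339, 6: 574340}

script = "\n".join(build_update(old, new) for old, new in mapping.items())
print(script)
```

Redirect the output to a file and you have a script with one statement per mapping.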

Conclusion

Database tuning is fun.  Explain is your friend.  As are different approaches.  And for those who aren’t doing the math, the performance jump was 3-4 orders of magnitude.
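For those who do want the math, here is the arithmetic on the upper cost figures from the three query plans. Keep in mind that planner costs are estimates in arbitrary units rather than seconds, so this is a rough comparison and not a benchmark.

```python
import math

seq_scan_cost = 132971.82      # Step 2: sequential scan of the full table
forum_filter_cost = 97281.40   # Step 3: forum_id condition added
scratch_table_cost = 21.99     # Step 4: update driven by the scratch table

# Percentage drop from Step 2 to Step 3.
drop_pct = (1 - forum_filter_cost / seq_scan_cost) * 100

# Ratio between Step 2 and Step 4, and the same ratio as a power of ten.
speedup = seq_scan_cost / scratch_table_cost
orders = math.log10(speedup)

print(f"Step 3 cost drop: {drop_pct:.0f}%")
print(f"Step 4 cost ratio: {speedup:.0f}x, about {orders:.1f} orders of magnitude")
```

The Step 3 drop comes out a little under 30%, and the Step 4 ratio lands between three and four orders of magnitude.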

jeanne’s ocpjp java programmer II experiences

About three weeks ago, I took the OCPJP Java Programmer II beta exam.  If I pass, I become an Oracle Certified Professional (OCP) Java SE 7 Programmer.  Hmm.  If you are already an OCM (Oracle Certified Master) from the OCMEA, is that a step backwards?  Just kidding.  It’s still cool.

I don’t know how I did because I took the beta version of the exam.  I’ll edit this post and leave a comment in a few months when I do know.  (My guess is I got between 70 and 80% though.  Since the scoring of a beta depends on which questions get chosen for the real exam, my score could be a lot lower or higher.  Probably lower – I can’t imagine they’d choose all the easy questions!)

 

Edit: I got a 78%.  I can’t believe how close my estimate was!

If you took the SCJP in the “good old days” (Java 6) or earlier, you only had to take one exam to become certified.  Now you need two.  I took the OCAJP a little over a week after the OCPJP.  Advice: take the exams in the proper order.  The OCAJP is a lot easier; it will build confidence before the harder exam.

Deciding to take the test

I wrote about this on my OCAJP post.  I scheduled the exam for the last week of the beta.  (Which got extended another month after that.)  One interesting factor is that I decided to take the exam 20 days before the date I scheduled the exam.  I went in figuring I would see how well I did given a time constraint.  Which means I could have gotten a higher score had I studied more.  I do recommend you actually feel fully prepared before taking the exam so you don’t waste $300.  (The beta is only $50 so the economics work out differently.)

How were the resources I tried

  1. K&B version 5 – Granted you’d buy version 6 at this point.  (or the OCA/OCP 7 version once it is out).  Of course, get the latest version available, but this is a great book.  Covers all sorts of exam trickery, how to think about exam questions and tough questions at the end of each chapter.  It’s worth buying the version 6 book even for the 7 exam because you learn over half the material and get a feel for what types of things to study for the rest.  I have no doubt that once the version 7 book is out, it will be awesome as well.
  2. K&B SCJP 6 mock exam book – Same deal.  The extra questions helped even though some were out of scope.  I routinely got between 35 and 45 (out of 60) questions right in a mock.  I didn’t spend a long time on the questions or treat it like a real exam though.  One note: the authors warn that the more times you see the same questions the better you do since you remember the questions.  I didn’t have this experience.  I did roughly as poorly each time I saw the questions.  I think it is because I don’t try to remember the answers and the questions all look similar.  As does the real exam for that matter.
  3. Anki – The act of creating and reviewing flashcards is what helped here and is absolutely critical to learning facts.  The time interval learning helped me optimize my time with the flashcards and manage the fact that I have 200 of them.  I’ll be writing a separate blog post on Anki and will link to it from here when it is up.
  4. epracticelabs – Deep breath.  What to say here.  It is Windows only which meant I was using my old computer.  I bought version 1.2 of the Java 7 upgrade to get some exposure to Java 7 questions since my study materials were so heavily weighted to the SCJP 5 and 6.  The “book” part wasn’t bad.  I didn’t find much value since I had already studied by the time I got there.  The mock exams were of poor quality.  I found errors in the answers, there were redundant questions and there were typos in questions.  I complained and received a refund.  I then tried version 1.3 which added another mock exam and fixed various quality issues.  Which it did.  I didn’t notice any typos in the new mock.  I did find two mock exam questions on JDBC with incorrect answers.  (I didn’t check the other topics as I like JDBC best.)  Some questions are good and others are like “which API copies a file: copy, file or move” or “is JDBC used for database code”.  There is heavier testing on API knowledge than concepts but you still need to know APIs for the exam.  Don’t rely on epracticelabs for the majority of your reviewing.  It’s an ok tool but has gaps.  In particular, it still misses a couple significant areas from the exam.  While the quality is improved from 1.2, it still has a ways to go.  I do believe they are working on it.  [Note: Since I had heard good things about epracticelabs from others I asked to try a different product in hopes this one was an anomaly.  It was.  The SCEA part 1 product was excellent content and quality.  And the CEO of epracticelabs acknowledges the quality issue in the Java 7 product and has a plan to deal with that.]
  5. BlackBeltFactory – Some helped me learn exam material (Java 7 features and SCJP mock) and others were just interesting (NIO covers pre-Java 7 NIO)
  6. Oracle tutorials (Programmer II and Upgrade 7) – there’s a lot of non-exam material to read through.  For the Java 7 parts it helped more since it wasn’t known well what the scope of the exam was.  For the “classic” material it was more effective to use the Sierra & Bates books.
  7. Well Grounded Java Developer – While it was a great review of the Java 7 features, it turned out the ones on the exam were a small percentage of what was in those chapters of the book.  Great book, not a good study tool.
  8. Java 7 New Features Cookbook – Almost the same deal as the Well Grounded Java Developer.  I say almost because each recipe has a “there’s more” section which covers tricky things.  And the exam likes tricky things.  Until the actual OCPJP 7 study guides come out, this isn’t a bad way of collecting some gotchas.
If you only have one resource, I’d choose the Sierra & Bates study guide.  If you have two, I’d add the Sierra & Bates mock exam book.

My impressions of the exam

  • I don’t know if I passed, but I think it was a good exam.  I didn’t feel like I was being tested on a ton of obscure things nobody would care about.  Still tricky, but didn’t feel as obscure and felt more real.
  • Since it was a beta, there were a lot of extra questions.  Which meant there was some duplication.  Ironically this didn’t help answer the questions as the exam creators were consistent in how they asked the things I didn’t know!
  • I really liked the JDBC questions.
  • While the OCPJP and OCAJP have different objectives, the OCPJP is essentially cumulative.
  • I’m glad I finally took it.

My study plan

As an added bonus to only having three weeks to study for the exam, I was also traveling for a few days during that period.  Luckily I was somewhat familiar with the content and had already read the Java 5/6 parts of the exam materials so it was more a matter of review.

I’m listing out how I studied day by day so you can see the quantity that comes with rushing.  And that I still allowed myself to do other big tech activities.  It also shows how I tried to follow the K&B study guide suggestion to study at least 15 minutes each day.

  1. Thursday – Registered for the beta.  Printed the objectives for the OCJP 7 exam.
  2. Friday (off from work this day) – Re-read chapters 1-3 of K&B SCJP 5 study guide.
  3. Saturday – Researched and downloaded Anki – a timed flashcard program.  Re-read chapters 4-6 of K&B SCJP 5 study guide.  Did a couple of easy BlackBeltFactory exams to get some pre-reqs for their SCJP mock out of the way.
  4. Sunday – Re-read chapters 7-8 of K&B SCJP 5 study guide.  Did the handful of OCPJP sample questions on Oracle’s website.  Did the K&B master exam on CD. Also spent 6 hours migrating jforum data.
  5. Monday – Re-read chapter 9 of K&B SCJP 5 study guide.  Created iPad flashcards for Anki.
  6. Tuesday – Re-read chapter 10 of K&B SCJP 5 study guide.  Created iPad flashcards for Anki.
  7. Wednesday – On flight, took the two short mock exams in K&B SCJP 6 mock exam book.  Also took mock exam #3 but didn’t review answers.  Reviewed 60 questions from Anki.  From this point forward, I went through all the cards Anki suggested every day.  (On the flight back, I slept so I’ll leave that out here.)
  8. Thursday – Reviewed mock exam #3.  Went through SCJP Coach questions on iPad app.
  9. Friday – Started mock exam #4.  Went through more questions on SCJP Coach
  10. Saturday – 15 minutes of flashcards (bare minimum of 15 minute rule)
  11. Sunday – finished mock exam #4
  12. Monday – Read the 2 minute drills from chapters 1-4 in K&B SCJP 5 study guide (they take a lot longer than 2 minutes)
  13. Tuesday – Took BlackBeltFactory advanced Java exams that were pre-reqs to SCJP one.  Made notes and flashcards for the Java classes added in the SCJP 6.  (just NavigableMap, NavigableSet and Console.)
  14. Wednesday – Read the 2 minute drills from chapters 5-7  in K&B SCJP 5 study guide
  15. Thursday – Read the 2 minute drills from chapters 8-10  in K&B SCJP 5 study guide [If you are paying attention, I’m now a week from the exam and haven’t done one iota of study on the Java 7 material which is a substantial part of the OCPJP 7 exam.]
  16. Friday – Skimmed Oracle tutorials on Programmer II and read those on Upgrade 7 topics.
  17. Saturday – mock exam #5.  More BlackBeltFactory exams.
  18. Sunday – mock exam #5.  More BlackBeltFactory exams.  Reread Well Grounded Java Developer chapters about new Java 7 features including NIO2 and concurrency.  Tried epracticelabs java 7 upgrade exam.
  19. Monday – Re-read Java 7 New Features Cookbook parts on exam scope items.
  20. Tuesday – Finish Java 7 New Features Cookbook parts on exam scope items.