sql query optimization – why temp tables can help

Last month, I migrated the jforum.net forum data to a coderanch jforum forum.  I had a requirement/goal to update the links in our forum so they work rather than point to the jforum broken links.

First I created the mapping.  (lesson: store this data as you migrate so you don’t have to do it later.)  I wound up mapping on subject lines and dates.  Luckily the threads were migrated in numeric order so I could fill in the gaps.  But that wasn’t interesting enough to blog about.  What was interesting enough to blog about was the SQL queries to update the database.

My goal

I wanted to make the changes entirely through the database – no Java code.  I also wanted to avoid postgres stored procedures because I encountered some time sinks last time I wrote a postgres stored procedure.  I am happy to say I achieved my goal.

Step 1 – Analysis

Noted that I need to update 375 rows.  Too many to do by hand.  There are just under 5000 posts in the jforum forum.  Which means therea rea t most 5000 search/replace strings to check.  This doesn’t seem bad for a computer.  Once I know the SQL, I can write code to generate 5000 of them using my local mappings and then run the SQL script on the server.

select * from jforum_posts_text where post_text like '%http://www.jforum.net/posts/list/%'

Step 2 – Horrible performing but functional query

It’s got to work before you can tune it.  My first attempt was:

explain update jforum_posts_text set post_text=replace(post_text, 'http://www.jforum.net/posts/list/5.page','http://www.coderanch.com/t/574339 ')  where post_text like '%http://www.jforum.net/posts/list/5.page%';

The query plan was:

Seq Scan on jforum_posts_text (cost=0.00..132971.82 rows=1 width=459) 
Filter: (post_text ~~ '%http://www.jforum.net/posts/list/5.page%'::text)

5000 of those is going to take over an hour of hitting the database hard. Not good. I could run it over the weekend when volumes are need be, but I can do better than that.

Step 3 – Trying to use an index

I know that most (95% maybe) of the updates are in the JForum forum.  We had a few “legacy” links to jforum.net in other forums, but not a lot.  I then tried adding a condition on forum id

explain update jforum_posts_text set post_text=replace(post_text, 'http://www.jforum.net/posts/list/5.page','http://www.coderanch.com/t/574339 ')  where post_text like '%http://www.jforum.net/posts/list/5.page%' and post_id in (select post_id from jforum_posts where forum_id = 95);

The query plan was:

Nested Loop (cost=22133.48..97281.40 rows=1 width=459)
HashAggregate (cost=22133.48..22263.46 rows=12998 width=4)
Bitmap Heap Scan on jforum_posts (cost=245.18..22100.99 rows=12998 width=4)
Recheck Cond: (forum_id = 95)
Bitmap Index Scan on idx_posts_forum (cost=0.00..241.93 rows=12998 width=0)
Index Cond: (forum_id = 95)
Index Scan using jforum_posts_text_pkey on jforum_posts_text (cost=0.00..5.76 rows=1 width=459)
Index Cond: (jforum_posts_text.post_id = jforum_posts.post_id)
Filter: (jforum_posts_text.post_text ~~ '%http://www.jforum.net/posts/list/5.page%'::text)

Well, it is using the indexes now.  But it is only about a 30% drop in cost for the worst case. On an untuned complex query, I usually see at least an order of magnitude performance jump on my initial tuning.

Step 4 – time for a temp table

Most of the work is finding the 375 rows that need updating in a table with 1,793,111 rows.  And it has to happen for each of the 5000 times I run the query.

I decided to use a temporary table so I could run the expensive part once.

create table jeanne_test as select * from jforum_posts_text where post_text like '%http://www.jforum.net/posts/list/%';

explain update jforum_posts_text set post_text=replace(post_text, 'http://www.jforum.net/posts/list/5.page','http://www.coderanch.com/t/574339 ')  where post_id in (select post_id from jeanne_test where post_text like '%http://www.jforum.net/posts/list/5.page%');

Now I’m doing the expensive part once.  It still takes a couple seconds to do the first part.  But the second part – the update I’m running 5000 times – drops the query plan to

Nested Loop (cost=13.50..21.99 rows=1 width=459)
HashAggregate (cost=13.50..13.51 rows=1 width=4)
Seq Scan on jeanne_test (cost=0.00..13.50 rows=1 width=4)
Filter: (post_text ~~ '%http://www.jforum.net/posts/list/5.page%'::text)
Index Scan using jforum_posts_text_pkey on jforum_posts_text (cost=0.00..8.46 rows=1 width=459)
Index Cond: (jforum_posts_text.post_id = jeanne_test.post_id

Nice.  Running the script with the 5000 update statements only took a few seconds.

Conclusion

Database tuning is fun.  Explain is your friend.  As are different approaches.  And for those who aren’t doing match, the performance jump was 3-4 orders of magnitude.

jeanne’s ocpjp java programmer II experiences

About three weeks ago, I took the OCPJP Java Programmer II beta exam.  If I pass, I become a Oracle Certified Professional (OCP) Java SE 7 Programmer.  Hmm.  If you are already an OCM (Oracle Certified Master) from the OCMEA, is that a step backwards?  Just kidding.  It’s still cool.

I don’t know how I did because I took the beta version of the exam.  I’ll edit this post and leave a comment in a few months when I do know.  (My guess is I got between 70 and 80% though.  Since the scoring of a beta depends on which questions get chosen for the real exam, my score could be a lot lower or higher.  Probably lower – I can’t imagine they’d choose all the easy questions!)

 

Edit: I got a 78%.  I can’t believe how close my estimate was!

If you took the SCJP in the “good old days” (Java 6) or earlier, you only had to take one exam to become certified.  Now you need two.  I took the OCAJP a little over a week after the OCPJP.  Advice: take the exams in the proper order.  The OCAJP is a lot easier; it will build confidence before the harder exam.

Deciding to take the test

I wrote about this on my OCAJP post.  I scheduled the exam for the last week of the beta.  (Which got extended another month after that.)  One interesting factor is that I decided to take the exam 20 days before the date I scheduled the exam.  I went in figuring I would see how well I did given a time constraint.  Which means I could have gotten a higher score had I studied more.  I do recommend you actually feel fully prepared before taking the exam so you don’t waste $300.  (The beta is only $50 so the economics work out differently.)

How were the resources I tried

  1. K&B version 5 – Granted you’d buy version 6 at this point.  (or the OCA/OCP 7 version once it is out).  Of course, get the latest version available, but this is a great book.  Covers all sorts of exam trickery, how to think about exam questions and tough questions at the end of each chapter.  It’s worth buying the version 6 book even for the 7 exam because you learn over half the material and get a feel for what types of things to study for the rest.  I have no doubt that once the version 7 book is out, it will be awesome as well.
  2. K&B SCJP 6 mock exam book – Same deal.  The extra questions helped even though some were out of scope.  I routinely got between 35 ad 45 (out of 60) questions right in a mock.  I didn’t spend a long time on the questions or treat it like a real exam though.  One note: the authors warn that the more times you see the same questions the better you do since you remember the questions.  I didn’t have this experience.  I did roughly as poorly each time I saw the questions.  I think it is because I don’t try to remember the answers and the questions all look similar.  As does the real exam for that matter.
  3. Anki – The act of creating and reviewing flashcards is what helped here and is absolutely critical to learning facts.  The time interval learning helped me optimize my time with the flashcards and manage the fact that I have 200 of them.  I’ll be writing a separate blog post on Anki and will link to it from here when it is up.
  4. epracticelabs – Deep breath.  What to say here.  It is Windows only which meant I was using my old computer.  I bought version 1.2 of the Java 7 upgrade to get some exposure to Java 7 questions since my study materials were so heavily weighted to the SCJP 5 and 6.  The “book” part wasn’t bad.  I didn’t find much value since I had already studied by the time I got there.  The mock exams were of poor quality.  I found errors in the answers, there were redundant questions and there were typos in questions.  I complained and received a refund.  I then tried version 1.3 which added another mock exam and fixed various quality issues.  Which it did.  I didn’t notice any typos in the new mock.  I did find two mock exam questions on JDBC with incorrect answers.  (I didn’t check the other topics as I like JDBC best.)  Some questions are good and others are like “which API copies a file: copy, file or move” or “is JDBC used for database code”.  There is heavier testing on API knowledge than concepts but you still need to know APIs for the exam.  Don’t rely on epracticelabs for the majority of your reviewing.  It’s an ok tool but has gaps.  In particular, it still misses a couple significant areas from the exam.  While the quality is improved from 1.2, it still has a ways to go.  I do believe they are working on it.  [Note: Since I had heard good things about epracticelabs from others I asked to try a different product in hopes this one was an anomaly.  It was.  The SCEA part 1 product was excellent content and quality.  And the CEO of epracticelabs acknowledges the quality issue in the Java 7 product and has a plan to deal with that.]
  5. BlackBeltFactory – Some helped me learn exam material (Java 7 features and SCJP mock) and others were just interesting (NIO covers pre-Java 7 NIO)
  6. Oracle tutorials (Programmer II and Upgrade 7) – there’s a lot of non-exam material to read through.  For the Java 7 parts it helped more since it wasn’t known well what the scope of the exam was.  For the “classic” material it was more effective to use the Sierra & Bates books.
  7. Well Grounded Java Developer – While it was a great review of the Java 7 features, it turned out the ones on the exam were a small percentage of what was in those chapters of the book.  Great book not a good study tool.
  8. Java 7 New Features Cookbook – Almost the same deal as the Well Grounded Java Developer.  I say almost because each recipe has a “there’s more” section which covers tricky things.  And the exam likes tricky things.  Until the actual OCPJP 7 study guides come out, this isn’t a bad way of collecting some gotchas.
If you only have one resource, I’d choose the Sierra & Bates study guide.  If you have two, I’d add the Sierra & Bates mock exam book.

My impressions of the exam

  • I don’t know if I passed, but I think it was a good exam.  I didn’t feel like I was being tested on a ton of obscure things nobody would care about.  Still tricky, but didn’t feel as obscure and felt more real.
  • Since it was a beta, there were a lot of extra questions.  Which meant there was some duplication.  Ironically this didn’t help answer the questions as the exam creators were consistent in how they asked the things I didn’t know!
  • I really liked the JDBC questions.
  • While the OCPJP and OCAJP have different objectives, the OCPJP is essentially cumulative.
  • I’m glad I finally took it.

My study plan

As an added bonus to only having three weeks to study for the exam, I was also traveling for a few days during that period.  Luckily I was somewhat familiar with the content and had already read the Java 5/6 parts of the exam materials so it was more a matter of review.

I’m listing out how I studying day by day so you can see the quantity that comes with rushing.  And that I still allowed myself to do other big tech activities.  It also shows how I tried to follow the K&B study guide suggestion to study at least 15 minutes each day.

  1. Thursday – Registered for the beta.  Printed the objectives for the OCJP 7 exam.
  2. Friday (off from work this day) – Re-read chapters 1-3 of K&B SCJP 5 study guide.
  3. Saturday – Researched and downloaded Anki – a timed flashcard program.  Re-read chapters 4-6 of K&B SCJP 5 study guide.  Did a couple of easy BlackBeltFactory exams to get some pre-reqs for their SCJP mock out of the way.
  4. Sunday – Re-read chapters 7-8 of K&B SCJP 5 study guide.  Did the handful of OCPJP sample questions on Oracle’s website.  Did the K&B master exam on CD. Also spent 6 hours migrating jforum data.
  5. Monday – Re-read chapter 9 of K&B SCJP 5 study guide.  Created iPad flashcards for Anki.
  6. Tuesday – Re-read chapter 10 of K&B SCJP 5 study guide.  Created iPad flashcards for Anki.
  7. Wednesday – On flight, took the two short mock exams in K&B SCJP 6 mock exam book.  Also took mock exam #3 but didn’t review answers.  Reviewed 60 questions from Anki.  From this point forward, I went through all the cards Anki suggested every day.  (On the flight back, I slept so I’ll leave that out here.)
  8. Thursday – Reviewed mock exam #3.  Went through SCJP Coach questions on iPad app.
  9. Friday – Started mock exam #4.  Went through more questions on SCJP Coach
  10. Saturday – 15 minutes of flashcards (bare minimum of 15 minute rule)
  11. Sunday – finished mock exam #4
  12. Monday – Read the 2 minute drills from chapters 1-4 in K&B SCJP 5 study guide (they take a lot longer than 2 minutes)
  13. Tuesday – Took BlackBeltFactory advanced Java exams that were pre-reqs to SCJP one.  Make notes and flashcards for the Java classes added on the SCJP 6.  (just NavigableMap, NavigableSet and Console.)
  14. Wednesday – Read the 2 minute drills from chapters 5-7  in K&B SCJP 5 study guide
  15. Thursday – Read the 2 minute drills from chapters 8-10  in K&B SCJP 5 study guide [If you are paying attention, I’m not a week to the exam and haven’t done one iota of study on the Java 7 material which is a substantial part of the OCPJP 7 exam.]
  16. Friday – Skimmed Oracle tutorials on Programmer II and read those on Upgrade 7 topics.
  17. Saturday – mock exam #5.  More BlackBeltFactory exams.
  18. Sunday – mock exam #5.  More BlackBeltFactory exams.  Reread Well Grounded Java Developer chapters about new Java 7 features including NIO2 and concurrency.  Tried epracticelabs java 7 upgrade exam.
  19. Monday – Re-read Java 7 New Features Cookbook parts on exam scope items.
  20. Tuesday – Finish Java 7 New Features Cookbook parts on exam scope items.

jeanne’s oca/ocajp java programmer I experiences

Today I passed the Java SE 7 Programmer I with a score of 98% which makes me a “Oracle Certified Associate, Java SE 7 Programmer“. (see other post for Java 8 exam)

Deciding to take the test

If you’ve been following this blog, you may know that I’ve been a Java developer.  (tech lead and architect count if you still code <smile>).  Last year I took the SCEA and Core Spring certs.  So why go back and take the “entry level cert.”  I’ve thought about taking the SCJP twice.   My thought process to finally getting here:

  1. Bert Bates mailed me a free copy of K&B version 5 so I could help spot plagiarism of questions on BlackBeltFactory years ago.  I read the book then.  I learned a lot.  I scored poorly on the sample questions because I didn’t take any time to memorize the  relevant information.  And I didn’t care to spend the time learning obscure details.
  2. In 2009, there were rumors of the Programmer Plus exam.  If that exam existed, I wanted to take the beta of it because it sounded *so cool.*  I re-read K&B version 5 and actually studied APIs.  The programmer plus never came to be and I wasn’t motivated to take the traditional SCJP.
  3. A year or two ago, I was one of the cadre of JavaRanch moderators providing technical review of the K&B SCJP 6 mock exam book.  Reviewing that book actually made me less likely to want to take the test because I read it questions about obscure details.  (It did make me interested in doing technical reviews of future books though.  I actually did one for a yet to be printed Manning book – The Well Grounded Java Developer.)
  4. When I noticed the Java 7 OCJPJP had JDBC and NIO sections, that got me interested again.  JDBC is one of my favorite topics and I moderated the CodeRanch JDBC forum for many years with Scott Selikoff.  The beta of part 2 was only $50 so I decided to take it.  I decided to take it on 4/19 and took the beta on 5/9.  I’ll be blogging about this separately in the next week or two.  While I don’t have the score yet, I think I either passed or came really close.  If I fail, I’ll pay full price to take it again.  I’m never going to be more prepared for it than I am right now and don’t want to learn obscure details again!  Once I realized this, I needed to take part 1 of the Java Programmer exam to actually get certified.

My study plan

In all fairness, most of the studying went on before I decided to take either exam.  Both by reading/reviewing the SCJP books and by just picking things up over the years.  I had a week and a day between part 2 of the exam and this part (part 1).  For two of those days, I didn’t do anything at all.    What I did the rest of the time:

  • re-read chapters 1-5 along with parts of chapters 6, 7 and 10 of K&B SCJP which Bert recommended
  • take all 6 mock exams from Enthuware JA +V7

How were the resources I tried

  1. K&B version 5 – Granted you’d buy version 6 at this point.  (or the OCA/OCP 7 version once it is out).  If you plan to take the OCP afterwards, I recommend going with this book to study.  If you just want to take the OCA, it is overkill as the information you need is mixed up with lots of harder information you don’t know.
  2. K&B SCJP 6 mock exam book – If you are planning to take the OCP afterwards and don’t mind the material being mixed together, this is a great resource.  If you want to get your OCA first without being exposed to the OCP information, it isn’t helpful though.  Also, some of the content is no longer on the exam so you have to ignore these parts.
  3. Enthuware JA +V7 – This was a great resource.  It was more difficult than the exam, but not overwhelmingly so.  The only topic that stood out as being on the mock and not needing to know was the ranges that primitive objects could hold.  A free 14 question trial is available so you can see what it is like before committing the $10.  Yes, you read that right.  You get 6 full length mock exams, analysis on your weak areas and the ability to retake questions by subject – all for ten bucks.  Even though there aren’t drag and drop questions, the mock exam still includes them because they are harder.  It also includes one or two fill in the blank questions.  (I found this to be practically impossible as a one character typo marks it wrong and then it is hard to compare to see why you were wrong.  But it isn’t frequent.)  The mock has been updated for Java 7 and the new objectives/difficulty.  I was highly satisfied with it. [My scores were 78%, 82%, 82%, 83%, 80% and 88%]
If you only have one resource, I would pick the Enthuware mock exams.

My impressions of the exam

  • Just like on the SCEA, I had a ton of time.  I had an hour left after taking all the questions and carefully reviewing them.  (I found one wrong answer on review.)
  • Unlike the SCEA, the questions are designed to be tricky/subtle.
  • It was similar to the Enthuware mocks in terms of facts/skills you had to know and format.  The Enthuware questions used some of the same variable names, class names and structures.  Kind of like the SCJP talks about horses.
  • I really like how the exam uses incredibly similar looking questions to ask different concepts.  Even within the same exam.  This prevents you from remembering much of use or memorizing much in advance.
  • A few questions had you flipping between the “exhibit” class and the question.  This was annoying because if you are looking for subtle details, it is a lot to remember between flips or *a lot* of flips.
  • While taking the mock exams, I found a technique that helped me limit stupid errors.  My score jumped 8-10% of the last mock I took (the only one using that technique) and then again on the actual exam.  On the mocks, I went too fast because the questions appeared easy.  On the final mock and real exam, I subvocalized as I read the code for *all* code questions.  This prevented my brain from going too fast and missing anything.
  • When you get your score report, it tells you which objectives you missed questions in.  Mine was “flow control” which tells me I made a stupid error in that space.

And finally, why you should visit a testing center in advance

Note: the testing center fixed the problems I had – see comment below this post for updates

I took the SCEA and Java Programmer part 2 exams at my local testing center which I am happy with. They provide you with pen/paper and a detached room from the office to take the exam.  The only distraction is if there is another person in the room with you and he/she stops and starts at a different time.  They take care to be quiet.

Then there was the center I was at today.  They gave me a *one sided* erasable markerboard to write on.  But that’s within their rights.  If you want paper, you should call and ask.  The exam was held in what looked like a closet.  A small table with two computers and poor ventilation.  It was hot!  And no soundproofing.  I heard everyone who walked into the office while I was there.  Which included an irate customer who had computer troubles and was dissatisfied with the speed at which they were fixing it (this went on for 10 minutes) and another potential customer who was inquiring about training. It was loud enough that I had to hold my ears to think.  Which doesn’t go well with using a mouse to select answers or trace code on paper.

I wanted to take the exam today because I was off work today.  And I figured minimizing the time between part 2 and part 1 would give me less time to get out of the habit of looking for details in exam questions.  It was the right decision.  But I wouldn’t recommend “Horizon Technical Consultants of Flushing” as a testing center for anyone.