Creating a tar.gz file in Java

Today’s article demonstrates how to create a tar.gz file in a single pass in Java. While there are a number of websites that provide instructions for creating a gzip or tar archive in Java, there aren’t any that tell you how to make a tar.gz file without performing the same operations twice.

Reviewing Tar and Gzip Compression

First, download the Apache Commons Compress library. It is a subset of the code found in the Ant JAR, intended for those performing compression operations that do not require all of Ant’s many features. Below is the code to create a tar archive and a gzip archive, respectively. TarArchiveOutputStream comes from Commons Compress, while GZIPOutputStream comes from the JDK’s java.util.zip package.

TarArchiveOutputStream out = null;
try {
     out = new TarArchiveOutputStream(
          new BufferedOutputStream(new FileOutputStream("myFile.tar")));
     // Add data to out and flush stream
     ...
} finally {
     if(out != null) out.close();
}

GZIPOutputStream out = null;
try {
     out = new GZIPOutputStream(
          new BufferedOutputStream(new FileOutputStream("myFile.gz")));
     // Add data to out and flush stream
     ...
} finally {
     if(out != null) out.close();
}

One subtlety in these examples is that we wrap the file stream in a BufferedOutputStream for performance reasons. Archive files are often large, so buffering the output is desirable. Another good practice is to always close your resources in a finally block when you are done with them.
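On Java 7 and later, a try-with-resources statement gives the same guarantee as the finally block with less code. Here is a minimal sketch of that idea, using only JDK classes so it runs without any extra jars. The class and method names (GzipFileExample, writeGzip, readGzip) are made up for illustration, and readAllBytes assumes Java 9 or later.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
final class GzipFileExample {

    // Compresses text into the given file. try-with-resources closes the
    // outermost stream, which flushes and closes the streams it wraps.
    static void writeGzip(Path target, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads the file back through a GZIPInputStream to recover the text.
    static String readGzip(Path source) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("example", ".gz");
        try {
            writeGzip(file, "hello, gzip");
            System.out.println(readGzip(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Closing only the outermost stream is enough here, because each wrapper closes the stream it wraps.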

The Solution

The solution is to wrap the tar stream around a gzip stream, since writes flow inward from the outermost stream to the innermost stream. In the code below, data written to the tar stream is formatted as tar, compressed by the gzip stream, buffered, and then written to disk.

TarArchiveOutputStream out = null;
try {
     out = new TarArchiveOutputStream(
          new GZIPOutputStream(
               new BufferedOutputStream(new FileOutputStream("myFile.tar.gz"))));
     // Add data to out and flush stream
     ...
} finally {
     if(out != null) out.close();
}

You can then treat the stream as a tar file using the TarArchiveEntry API to add entries and write data directly to the stream. The gzip compression will happen automatically as the stream is written.
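To see the wrapping order at work without the Commons Compress jar on the classpath, here is a rough sketch that uses only JDK classes. DataOutputStream stands in for TarArchiveOutputStream, which is an assumption made purely for illustration, so the output is a gzip-compressed record stream and not a real tar archive. The names LayeredStreams, writeRecords and readRecords are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: a format stream wrapped around a gzip stream.
final class LayeredStreams {

    // Writes go to the outer stream (the stand-in for the tar stream),
    // pass through gzip, and land in the byte array, all in one pass.
    static byte[] writeRecords(String... records) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out =
                new DataOutputStream(new GZIPOutputStream(sink))) {
            out.writeInt(records.length);
            for (String record : records) {
                out.writeUTF(record);
            }
        }
        // Closing the outer stream finished the gzip trailer, so the
        // buffer now holds a complete gzip stream.
        return sink.toByteArray();
    }

    // Reading mirrors writing: gunzip on the inside, format on the outside.
    static List<String> readRecords(byte[] compressed) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
            int count = in.readInt();
            List<String> records = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                records.add(in.readUTF());
            }
            return records;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] packed = writeRecords("first.txt", "second.txt");
        System.out.println(readRecords(packed));
    }
}
```

The caller only ever talks to the outer stream. The same shape applies when the outer stream is a TarArchiveOutputStream: the tar layer decides what bytes to emit, and the gzip layer compresses them as they pass through.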

jeanne’s SCEA 5 part 1 experiences

Six weeks ago, I decided to take the SCEA 5 (Sun Certified Enterprise Architect) exam.  Today I passed part one with a score of 87%.

(See part 2 and 3 notes as well)

Note: I did not pay for any mock exams.  I’m sure I would have scored above 90% had I done so.  I don’t think paying money for a higher passing score is worth it.  That isn’t learning for the sake of learning.  It is learning specifically for a test.

Since people at JavaRanch’s SCEA forum often ask how one studied and about experiences, I decided to blog about it.

Deciding to take the test

The first thing I did was make sure the SCEA was not going away.  After all, who wants to be one of the last people to take the test before Oracle changes things?  As you can see in “What’s next for Sun certifications now that Oracle is in charge?”, the SCEA looks quite stable and likely to continue.

While I’ve read the SCJP book, I’ve never taken the exam.  Multiple choice exams based on minutiae that one doesn’t encounter in real life aren’t high on my list of things to do.  The SCEA doesn’t have the SCJP as a pre-requisite and emphasizes the practical over trick questions and memorization, so I decided to go for it.  Granted, it is Sun’s view of the world, so Spring and the like don’t fit in, but I find it a lot more relevant than the other exams.

My study plan

My plan was a wee bit vague.  I intended to get the study guide, see what was involved, study/practice as needed and register for the test.  I planned to spend 5-10 hours a week on this.  (I probably did this half the weeks.)  Here’s what actually happened:  [note this is a list of what I did and not what is good to do; see the next section for comments on the resources]

  1. Week 0
    • Decided want to take SCEA.  (My employer thinks it would be nice for people to get certified.  I’ve thought about doing so before, but this provided the final motivation to actually go for it.)
    • Convinced myself it is ok to do shortly before Oracle changes things because the SCEA looks intact.
  2. Week 1
    • Did sample questions to see where I stand.  (I copy/pasted the questions to notepad and removed the (*) next to the correct answers since they aren’t in a form one can use.  It still takes discipline not to cheat, but I wanted it as a benchmark.  I got 5 or 6 out of 8 correct.  One of the answers was obviously incorrect.)  While this is (barely) above the passing score, I want to feel more confident before spending money to take the exam.  Decided if I read the SCEA book and practice some, I’ll be more reliably above it.
    • Ordered “Sun Certified Enterprise Architect for Java EE Study Guide”.
  3. Week 2
    • The only thing I did until the book came (Thursday) was look up where the testing centers near me are.
    • Once the book came, I read the first 5 chapters.  I was surprised by how easy the content was.  I guess this is because I already serve as an architect in some capacity.  This is good; I’ll be able to take Part 1 earlier than expected.
  4. Week 3
    • Registered for exam – since it doesn’t look overly difficult, I decided to take it soon.  I didn’t want to take it on July 4th weekend and happened to have a day off work July 16th anyway, so I decided to go with July 16th.
    • Read remainder of book.
    • I had/was fighting off a cold part of the week so didn’t get much done.
    • The book contains sample questions that are supposed to be the level of difficulty of the exam.  (They seemed too easy, but turned out to be correct about the level of difficulty.)  Looked at scores to see how I am doing.  My scores for the chapters were 100%, 50%, 85%, 85%, 67%, 75%.  Some of the questions are not included in this percentage because I saw the answer before choosing one.  It was difficult to cover up the answers well.
  5. Week 4
    • Identified the areas I want to get stronger at:
      1. Make sure I can identify the design pattern names from descriptions.  While I am familiar with all the patterns, I often look them up in real life.
      2. Web Services – I know them in the capacity I have used them, but not as extensively as some of the other content.
    • Made flash cards for the design pattern names.  There are a few that have similar sounding descriptions and I wanted to make sure I could tell them apart.  After making the cards, it turned out I know almost all of them.  I imagine the act of making the cards did that.
  6. Week 5
    • Took the free JavaBeat mock exams online to practice.  These were for the SCEA 4 exam, but they were interesting so I kept going.  Plus they gave me practice taking multiple choice closed book exams, a skill I haven’t used since college.
    • JavaBeat exams showed I need to review design patterns more.  I didn’t do much this week though.  It was just too hot in New York City to think!
  7. Week 6
    • Reviewed the questions I got wrong so they are top of mind
    • Looked at references from people in the SCEA forum at JavaRanch
    • Reminded myself not to read into questions (Sun doesn’t seem to think a web app is more secure than a fat client despite the logic being only on the server)
    • JavaChamp mock exam – got a low score, but it doesn’t always tell you how many correct answers there are, so you have to guess how much to think about each question.
  8. The night before
  9. Take the test!

How were the resources I tried or read about?

Resource | Cost | Tried? | Comments
Cade & Sheil Study Guide | $40 | Yes | I relied on it heavily.  See my review on Amazon (which should be up within a day or two) for more details.
Whizlab Mock Trial | free | Yes | 15 questions.  (Got 93% on this, which is slightly higher than the real exam.  Others have noted larger differences for Whizlab being easier.)  Satisfied with quality.
Whizlab Mock | $100 | No | n/a – didn’t need extra help passing exam
Epractice Mock Trial | free | No | I wanted to try this but you need a “free license key that gets e-mailed immediately” and it never came.
Epractice Mock | $80 | No | n/a – didn’t need extra help passing exam
Sun’s epractice exam | $65 | No | 120 SCEA 5 questions from Sun.  n/a – didn’t need extra help passing exam
JavaBeat SCEA questions | free | Yes | This is NOT for the SCEA 5 exam.  I found the design patterns and scenario questions to be good practice and comparable though.
About.com SCEA questions | free | Yes | This is NOT for the SCEA 5 exam.  I found the design patterns and scenario questions to be good practice and comparable though.
JavaChamp Express Exam | free | Yes | 20 question sample exam

Remember the SCEA 4 exam is quite different and study materials for it should only be used selectively.

What did I read?

Another thing people at JavaRanch typically state is what they read for the exam.  I really only read the Cade & Sheil study guide because so much of the content sounded familiar.  Prior to even thinking about taking the exam, I had read many books including:

  1. EJB 3 in Action – see my review at JavaRanch from 2007
  2. Core Java Server Faces – see my review at JavaRanch from 2004
  3. JEE 5 tutorial – see my review at JavaRanch from 2006
  4. Other books further in the past including Gang of Four, Core J2EE Patterns, Design Patterns, etc

My impressions of the exam

  • I had a ton of time.  I completed pass #1 in 45 minutes.  I wrote down on paper which ones I wasn’t 100% sure of.  After pass 1, I calculated the total and I already had more than 60% of the questions correct.  I then spent another 25 minutes on pass 2 thinking about the ones I wasn’t sure of or that were more complicated.  Caveat: In school, I always finished tests with a load of extra time.  This just means the SCEA was consistent with other exams.
  • I dislike that you can’t review drag and drop questions.
  • Two of the questions I got wrong were because I didn’t realize they were in scope of the test – Java Cryptography Architecture and CORBA.
  • The questions were not tricky.  Subjective maybe, but not tricky.
  • I think this is one of those exams where having more experience hurts you a bit because you read into the questions.  For example, I know that a three tier app is more secure than a two tier app because you aren’t executing business logic on the client.  Sun disagrees.  I know because it is in Cade’s book.  I imagine there are other things like that where the experience I have clashes with the test.
  • Most of the questions were high level.  There was only one I can recall that was at the level of what API/class name you would use and it was one every enterprise developer should know.
  • My section scores confirm these impressions.  I got 100% on web tier, applicability of JEE and Patterns.  I got 75% on Security (which would be the cryptography question), 66% on application design concepts and principles (which is only one wrong), 83% on integration and messaging (CORBA plus legitimate mistakes) and 87% on common architectures and business tier technologies (the two areas I suspect real world experience clashed.)

All in all, it was interesting and I felt the test was fair.

What’s next for Sun certifications now that Oracle is in charge?

At the moment, Sun’s certification page has a preview of the upcoming JEE 6 curriculum.  I’ve saved the image here in case it disappears like the SCJP Plus information did.  The learning paths on Oracle’s site do gel with this info so I think it can be safely assumed this is the plan as of now.

Disclaimer: I have no affiliation with Oracle or Sun.  This entire post is clues/speculation based on what is on the internet.

What is the implied mapping?

Based on the information available, the following chart shows what it looks like Oracle is planning.  Below the chart, I write my evidence for each. This just addresses exam naming at the moment.  I’m sure the content will change over time.  Sun was looking toward changes anyway with the Programmer Plus exam for Java 1.7.  For those uncertain about whether to get certified now, the name is important as is the fact that the certifications have a future.

Current Certification name | Future Certification name
Sun Certified Java Associate (SCJA) | Same
Sun Certified Java Programmer (SCJP) | Same (no word on the programmer plus yet)
Sun Certified Java Developer (SCJD) | Not enough info to tell
Sun Certified Web Component Developer (SCWCD) | Split into Sun Certified Servlet/JSP developer and Sun Certified JSF developer.  Add JPA to form the Master Sun Certified Enterprise Web Developer
Sun Certified Business Component Developer (SCBCD) | Split into Sun Certified EJB developer and Sun Certified JPA developer.  Combine to form the Master Sun Certified Business Application Developer
Sun Certified Developer for Java Web Services (SCDJWS) | Renamed to Sun Certified Web Services Developer.  Add JPA and Servlet/JSP to form the Master Sun Certified Web Services Developer
Sun Certified Mobile Application Developer (SCMAD) | Not enough info to tell
Sun Certified Enterprise Architect (SCEA) | Same

  1. SCJA – As Oracle uses the words associate, professional, master and expert for their own database certification, it is unlikely they would get rid of Sun’s associate exam.  While the SCJA doesn’t show up on the JEE 6 learning path, it didn’t for the JEE 5 one either.  It was treated as an optional pre-requisite to the SCJP or a standalone exam.  I see no reason this exam would not continue for the foreseeable future.
  2. SCJP – Explicitly mentioned in the JEE 6 curriculum.
  3. SCJD – Not part of JEE so no info available.
  4. SCWCD  – The JEE 6 learning path shows this split into Servlet/JSP and JSF.
  5. SCBCD  – The JEE 6 learning path shows this split into EJB and JPA.
  6. SCDJWS – The JEE 6 learning path clearly shows this as a renamed exam.
  7. SCMAD – Not part of JEE so no info available.
  8. SCEA – Explicitly mentioned in the JEE 6 curriculum.  Continues to be independent of the other exams.  (The SCJP pre-requisite has only been for training classes, not for the actual exam.)

What is implied overall?

  1. There are more exams in the new world.  More money for Oracle.
  2. Exams combine to form “master” certifications in an area.  This is good if you want to get certified on just part of an area.  Say you don’t use JSF or EJB but want the other part of the certification.
  3. The word “Sun” is still in the name.  This is good for Oracle as far as branding goes.  Keeping Sun as a brand preserves the legacy built around the certifications.  There is some precedent for this.  The Hyperion and Peoplesoft certifications still have their old parents’ names.

Interesting facts:

  1. This image is named Java-EE-6-Curriculum-Path_option2.gif.  Option 1 is not available on the web server, but it shows some thought has gone into the new certifications.  While these are class listings rather than certification details, they still give some insight into the thinking going forward.
  2. Oracle’s learning paths show the difference between JEE 5 and JEE 6. (When clicking the links, you may have to choose a country and then go back to click the link again.)  The fact that they mesh with the roadmap on Sun’s site shows good consistency.
  3. Oracle’s learning path for core Java is the same.  It still shows the programmer, developer and mobile paths.  It hasn’t been updated in a while so we can’t assume much from this.
  4. We know Oracle is planning JEE 6 exams under the Sun branding.  They are currently advertising betas for the Sun Certified EJB Developer and the Sun Certified JSP and Servlette Developer.  And no, that’s not a typo.  Oracle seems to think “Servlet” is spelled “Servlette.”

What’s next?

Only time will tell.  Until Oracle announces things, all we can do is look for clues.