Creating a tar.gz file in Java

Today’s article demonstrates how to create a tar.gz file in a single pass in Java. While there are a number of websites that provide instructions for creating a gzip or tar archive in Java, there aren’t any that explain how to make a tar.gz file without performing the same operations twice.

Reviewing Tar and Gzip Compression

First, download the Apache Commons Compress library. It grew out of the archive code found in the Ant jar and is aimed at those performing compression operations that do not require all of Ant’s many features. Below is the code to create a tar archive and a gzip file, respectively. TarArchiveOutputStream comes from Commons Compress, while GZIPOutputStream is the JDK class in java.util.zip.

TarArchiveOutputStream out = null;
try {
     out = new TarArchiveOutputStream(
          new BufferedOutputStream(new FileOutputStream("myFile.tar")));
     // Add data to out and flush stream
     ...
} finally {
     if(out != null) out.close();
}

GZIPOutputStream out = null;
try {
     out = new GZIPOutputStream(
          new BufferedOutputStream(new FileOutputStream("myFile.gz")));
     // Add data to out and flush stream
     ...
} finally {
     if(out != null) out.close();
}

One subtlety in these examples is that we wrap the file stream in a BufferedOutputStream for performance reasons. Archive files are often large, so buffering the output is desirable. Another good practice is to always close your resources in a finally block once you are done with them.
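
On Java 7 and later, try-with-resources can take the place of the explicit finally block. The following is a minimal sketch of the gzip case using only JDK classes. The class name GzipWriteSketch, the two helper methods and the temporary file are illustrative assumptions, not part of any library.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipWriteSketch {

    // Hypothetical helper: compresses text into a gzip file through a buffered file stream.
    public static void writeGzip(File target, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(new FileOutputStream(target)))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the outermost stream finishes the gzip data and closes the file
    }

    // Hypothetical helper: reads the gzip file back and returns the decompressed text.
    public static String readGzip(File source) throws IOException {
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream(source)))) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sketch", ".gz");
        try {
            writeGzip(file, "hello gzip");
            System.out.println(readGzip(file));
        } finally {
            file.delete();
        }
    }
}
```

Only the outermost stream is declared as a resource here. Closing it closes the wrapped streams in turn, which matches the behaviour of the finally blocks above.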

The Solution

The solution is to wrap the tar stream around a gzip stream, because data written to the outermost stream passes inward through each wrapped stream in turn. The code below writes tar-formatted data, which the gzip stream compresses as it passes through. The compressed bytes are then buffered and written to disk.

TarArchiveOutputStream out = null;
try {
     out = new TarArchiveOutputStream(
          new GZIPOutputStream(
               new BufferedOutputStream(new FileOutputStream("myFile.tar.gz"))));
     // Add data to out and flush stream
     ...
} finally {
     if(out != null) out.close();
}

You can then treat the stream as a tar file using the TarArchiveEntry API to add entries and write data directly to the stream. The gzip compression will happen automatically as the stream is written.
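
To make the entry-writing step concrete, here is a hedged sketch that adds a few in-memory entries and then lists them back. It assumes Apache Commons Compress is on the classpath. The class name TarGzSketch, the helper methods writeTarGz and listTarGz, and the entry names are hypothetical, chosen only for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class TarGzSketch {

    // Hypothetical helper: writes one tar entry per (name, text) pair into a tar.gz file.
    public static void writeTarGz(File target, Map<String, String> entries) throws IOException {
        try (TarArchiveOutputStream out = new TarArchiveOutputStream(
                new GZIPOutputStream(
                        new BufferedOutputStream(new FileOutputStream(target))))) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                byte[] data = e.getValue().getBytes(StandardCharsets.UTF_8);
                TarArchiveEntry entry = new TarArchiveEntry(e.getKey());
                entry.setSize(data.length); // the tar header records the size before the data
                out.putArchiveEntry(entry);
                out.write(data);
                out.closeArchiveEntry();
            }
        } // close() ends the tar archive, then closes the gzip and file streams
    }

    // Hypothetical helper: returns the entry names found in a tar.gz file, in order.
    public static List<String> listTarGz(File source) throws IOException {
        List<String> names = new ArrayList<>();
        try (TarArchiveInputStream in = new TarArchiveInputStream(
                new GZIPInputStream(
                        new BufferedInputStream(new FileInputStream(source))))) {
            ArchiveEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("hello.txt", "hello tar.gz");
        entries.put("notes/readme.txt", "second entry");

        File file = File.createTempFile("sketch", ".tar.gz");
        try {
            writeTarGz(file, entries);
            System.out.println(listTarGz(file));
        } finally {
            file.delete();
        }
    }
}
```

Each entry's size is set before its data is written, because the tar format stores the size in the header that precedes the content. Reading back mirrors the writing order: the gzip stream is unwrapped first and the tar stream sits on the outside.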