Today’s article demonstrates how to create a tar.gz file in a single pass in Java. While there are a number of websites that provide instructions for creating a gzip or tar archive in Java, I have not found any that explain how to make a tar.gz file without making a second pass over the data.
Reviewing Tar and Gzip Compression
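First, download the Apache Commons Compress library. Its code overlaps with the compression code found in the Ant jar, and it is intended for those performing compression operations that do not require all of Ant’s many features. Below is the code to create a tar archive and a gzip archive, respectively. The tar example uses TarArchiveOutputStream from the Compress library; the gzip example uses GZIPOutputStream from the JDK’s java.util.zip package.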
TarArchiveOutputStream out = null;
try {
out = new TarArchiveOutputStream(
new BufferedOutputStream(new FileOutputStream("myFile.tar")));
// Add data to out and flush stream
...
} finally {
if(out != null) out.close();
}
GZIPOutputStream out = null;
try {
out = new GZIPOutputStream(
new BufferedOutputStream(new FileOutputStream("myFile.gz")));
// Add data to out and flush stream
...
} finally {
if(out != null) out.close();
}
One subtlety in these examples is that the file stream is wrapped in a BufferedOutputStream for performance reasons. Archive files are often large, so buffering the output is desirable. Another good practice is to always close your resources in a finally block once you are done with them.
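To make the gzip case concrete, here is a minimal sketch that compresses a single file using only JDK classes. The class name GzipSketch, the method gzipFile, and the 8 KB copy buffer are illustrative choices, not part of the original code. It uses try-with-resources (Java 7 and later), which closes the streams automatically and serves the same purpose as the explicit finally block.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSketch {
    // Hypothetical helper: compresses the file at sourcePath into targetPath.
    public static void gzipFile(String sourcePath, String targetPath) throws IOException {
        // Both streams are closed automatically, even if the copy fails.
        try (InputStream in = new BufferedInputStream(new FileInputStream(sourcePath));
             GZIPOutputStream out = new GZIPOutputStream(
                 new BufferedOutputStream(new FileOutputStream(targetPath)))) {
            byte[] buffer = new byte[8192]; // arbitrary copy-buffer size
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```

Calling GzipSketch.gzipFile("myFile.txt", "myFile.txt.gz") would produce a gzip file that standard tools such as gunzip can read; the file names here are placeholders.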
The Solution
The solution is to wrap the tar stream around a gzip stream, since data written to a chain of streams flows from the outermost stream to the innermost one. In the code below, the data is first formatted as a tar archive, then compressed by the gzip stream, then buffered, and finally written to disk.
TarArchiveOutputStream out = null;
try {
out = new TarArchiveOutputStream(
new GZIPOutputStream(
new BufferedOutputStream(new FileOutputStream("myFile.tar.gz"))));
// Add data to out and flush stream
...
} finally {
if(out != null) out.close();
}
You can then treat the stream as a tar file using the TarArchiveEntry API to add entries and write data directly to the stream. The gzip compression will happen automatically as the stream is written.
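As a rough illustration of adding an entry, the sketch below writes one in-memory text entry to a tar.gz file. It assumes Apache Commons Compress is on the classpath; the class name TarGzSketch, the method writeTarGz, and its parameters are hypothetical names chosen for this example.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class TarGzSketch {
    // Hypothetical helper: writes a tar.gz at archivePath holding one text entry.
    public static void writeTarGz(String archivePath, String entryName, String text)
            throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        TarArchiveOutputStream out = null;
        try {
            out = new TarArchiveOutputStream(
                new GZIPOutputStream(
                    new BufferedOutputStream(new FileOutputStream(archivePath))));
            TarArchiveEntry entry = new TarArchiveEntry(entryName);
            entry.setSize(data.length); // the tar header records the size up front
            out.putArchiveEntry(entry); // writes the entry header
            out.write(data);            // writes the entry contents
            out.closeArchiveEntry();    // pads the entry to a full tar record
            out.finish();               // writes the tar end-of-archive records
        } finally {
            if (out != null) out.close();
        }
    }
}
```

Because the entry size must be set before the entry is written, this sketch holds the whole entry in memory. For large files, the size could instead be taken from the file on disk and the contents copied through in chunks.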