Short: which compression to use?
I recently had to push a few backups around and wanted to know whether there is a better choice than XZ around: there is!
Why? What? #
I usually use gzip(1) for archives that need to be made quickly and xz(1) for archives that need to be smaller. I had already learned about zstd(1) but barely used it so far.
That has to change!
Because I did a quick test on my website's files and was astonished:
Test #1 with default options #
I will archive my local website folder with tar and compress the archive within tar with a) gzip, b) xz and c) zstd.
GZIP #
% time tar -czf oe7drt-website.tar.gz oe7drt-website
tar -czf oe7drt-website.tar.gz oe7drt-website 18,87s user 0,63s system 101% cpu 19,199 total
XZ #
% time tar -cJf oe7drt-website.tar.xz oe7drt-website
tar -cJf oe7drt-website.tar.xz oe7drt-website 528,65s user 2,26s system 719% cpu 1:13,81 total
ZSTD #
% time tar --create --file oe7drt-website.tar.zst --zstd oe7drt-website
tar --create --file oe7drt-website.tar.zst --zstd oe7drt-website 3,25s user 1,08s system 278% cpu 1,555 total
Result #
% lld *.tar.*
580 MB oe7drt-website.tar.gz
868 MB oe7drt-website.tar.xz
871 MB oe7drt-website.tar.zst
% du -sh oe7drt-website/
915M oe7drt-website/
Compression | Time (user CPU) | Size | Ratio |
---|---|---|---|
GZIP | 00:18 | 580MB | 63% |
XZ | 08:48 | 868MB | 94% |
ZSTD | 00:03 | 871MB | 95% |
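The Ratio column can be reproduced from the listed sizes: it is the compressed size as a percentage of the original 915 MB, and the published figures match a value that is cut off rather than rounded. A minimal sketch of that arithmetic; the function name `ratio` is made up for this example:

```shell
# ratio COMPRESSED ORIGINAL: compressed size as a whole percentage of the original.
# Shell integer division truncates, which matches the figures in the table.
ratio() {
    echo $(( $1 * 100 / $2 ))
}

ratio 580 915   # gzip: prints 63
ratio 868 915   # xz:   prints 94
ratio 871 915   # zstd: prints 95
```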
With ZSTD being super-fast and only slightly bigger than XZ, it is the logical choice for me in terms of “saving time”. GZIP produced the smallest file (in this case!) but had to work quite some time for it.
And if I really want small files, I will probably compress outside of tar using zstd directly. But let’s see this in another example with the same directory again.
Test #2 with --best options #
I will create the “normal” archive with tar(1).
% time tar -cf oe7drt-website.tar oe7drt-website
tar -cf oe7drt-website.tar oe7drt-website 0,01s user 0,53s system 99% cpu 0,543 total
Now we will compress the tar archive with different tools:
GZIP #
% time gzip -k --best oe7drt-website.tar
gzip -k --best oe7drt-website.tar 25,96s user 0,38s system 99% cpu 26,417 total
XZ #
% time xz -k --best oe7drt-website.tar
xz -k --best oe7drt-website.tar 484,74s user 1,01s system 242% cpu 3:20,21 total
ZSTD #
% time zstd -k -9 oe7drt-website.tar
oe7drt-website.tar : 95.36% ( 911 MiB => 869 MiB, oe7drt-website.tar.zst)
zstd -k -9 oe7drt-website.tar 10,38s user 1,03s system 223% cpu 5,113 total
Here -9 is the equivalent of using --best on gzip; as I do not link gzip to zstd, I had to specify the level manually.
Result #
% lld *.tar.*
875 MB oe7drt-website.tar.gz
858 MB oe7drt-website.tar.xz
869 MB oe7drt-website.tar.zst
% lld *.tar
911 MB oe7drt-website.tar
Compression | Time (user CPU) | Size | Ratio |
---|---|---|---|
GZIP | 00:25 | 875MB | 96% |
XZ | 08:04 | 858MB | 94% |
ZSTD | 00:10 | 869MB | 95% |
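To repeat this kind of comparison on another tarball, the three runs can be wrapped in a small loop. This is only a sketch under a few assumptions: the function name `bench` is invented, it uses each tool's default level rather than --best or -9, it measures wall-clock seconds instead of the user CPU time shown above, and it relies on `date +%s`, which GNU and BSD date support.

```shell
# bench FILE: compress FILE with every installed tool and print "tool seconds bytes".
bench() {
    file=$1
    for tool in gzip xz zstd; do
        # Skip tools that are not installed on this machine.
        command -v "$tool" >/dev/null 2>&1 || continue
        start=$(date +%s)
        # -c writes to stdout, so no archive is created and FILE stays untouched.
        bytes=$("$tool" -c "$file" 2>/dev/null | wc -c)
        end=$(date +%s)
        # $((bytes)) strips the padding some wc implementations add.
        echo "$tool $((end - start)) $((bytes))"
    done
}
```

Calling `bench oe7drt-website.tar` prints one line per tool. For user and system time, prefix the individual commands with `time` as in the transcripts above.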
Some additional zstd commands #
zstd with --ultra is what I use sometimes, but now that I have tested this as well, I doubt it was a good idea back when I started using it:
% time zstd -k -T0 --ultra -20 oe7drt-website.tar
oe7drt-website.tar : 93.69% ( 911 MiB => 853 MiB, oe7drt-website.tar.zst)
zstd -k -T0 --ultra -20 oe7drt-website.tar 400,88s user 1,42s system 353% cpu 1:53,70 total
But today I found the option --max, which optimizes for maximum compression.
A ratio of 33% in 4 minutes with about 900MB of data – not bad:
% time zstd -k -T0 --max oe7drt-website.tar
oe7drt-website.tar : 33.46% ( 911 MiB => 305 MiB, oe7drt-website.tar.zst)
zstd -k -T0 --max oe7drt-website.tar 243,90s user 19,41s system 95% cpu 4:37,11 total
That last command made the laptop a bit laggy, though, as it is very time-consuming and resource-hungry.
Comparing them #
% lld *.tar *zst
911 MB oe7drt-website.tar
869 MB oe7drt-website.tar.best.zst
853 MB oe7drt-website.tar.ultra.zst
305 MB oe7drt-website.tar.max.zst
Compression options | Time (user CPU) | Original size | Size | Ratio |
---|---|---|---|---|
default | 00:03 | 915MB | 871MB | 95% |
--best | 00:10 | 911MB | 869MB | 95% |
--ultra | 06:40 | 911MB | 853MB | 93% |
--max | 04:03 | 911MB | 305MB | 33% |
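The Time values in this table line up with the user CPU seconds printed by `time` (for example 400,88s for --ultra), cut to whole seconds and written as mm:ss. A small sketch of that conversion; the helper name `mmss` is made up for this example:

```shell
# mmss SECONDS: format seconds as mm:ss, dropping any fractional part.
mmss() {
    # Cut at the first "." or "," so both 400.88 and 400,88 become 400.
    secs=${1%%[.,]*}
    printf '%02d:%02d\n' $((secs / 60)) $((secs % 60))
}

mmss 400,88   # --ultra: prints 06:40
mmss 243,90   # --max:   prints 04:03
```

Keep in mind that user CPU time can be much larger than the elapsed time once several threads are busy, so the wall-clock duration is the "total" value in the `time` output.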
My conclusion #
If you want to create an archive quickly: use zstd with its default settings.
If you want to create a small archive: use zstd with the --max option, and probably -T0 to increase the number of worker threads (0 tries to detect the number of physical cores; older zstd releases default to a single thread otherwise).
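Both recommendations fit into one small helper that pipes tar into zstd, so the compression happens outside of tar. Treat it as a sketch: the function name `zarchive` and its `fast`/`small` modes are invented here, and --max needs a zstd release recent enough to know that option.

```shell
# zarchive MODE DIR: write DIR.tar.zst using one of two presets.
#   fast  = zstd defaults, small = --max with -T0
zarchive() {
    mode=$1
    dir=$2
    case $mode in
        fast)
            tar -cf - "$dir" | zstd -q -o "$dir.tar.zst"
            ;;
        small)
            # Slow and resource-hungry, as seen in the timings above.
            tar -cf - "$dir" | zstd -q -T0 --max -o "$dir.tar.zst"
            ;;
        *)
            echo "usage: zarchive fast|small DIR" >&2
            return 2
            ;;
    esac
}
```

For example, `zarchive fast oe7drt-website` produces oe7drt-website.tar.zst in the current directory. Pass the directory name without a trailing slash, since the output name is derived from it.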