Re: 7z: ditch the underground, use xz instead
Published 2020-11-14 Updated 11:34pm 2020-11-16
Here's the short of it:
The problem with 7z is that there's no cross-platform tool that supports it - not even one!
And of the assorted tools that do exist, they're quirky, incomplete, hard to get, or just plain bad.
- Windows has the official 7-zip, but no CLI support
- Mac can only get 7z through p7zip on brew - a 400mb download!!
- Most Linuxes do have the 7z CLI tool, but xz support is ubiquitous
- unarr supports 7z, but it's buggy
Use xz instead!
xz also uses LZMA (and LZMA2) - so you get nearly identical compression to 7z. It is integrated into several tools (including 7zip and p7zip) and libraries, and has at least one cross-platform tool (arc) that is easy to install and easy to use from the CLI:
arc unarchive example.tar.xz
xz is also supported by 7zip on Windows, by tar on Linux, and by tar on Mac - if xz is installed (via brew, ugh!).
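As one example of the library side of that support: Python ships an lzma module in its standard library that reads and writes the .xz container. Here's a minimal sketch of a round trip (the sample data and variable names are made up for illustration, and it assumes your Python build includes lzma, which stock builds normally do):

```python
import lzma

# Made-up sample data: highly repetitive, so it compresses well.
original = b"hello from xz\n" * 1000

# lzma.compress() writes the .xz container format by default.
compressed = lzma.compress(original)

# Every .xz stream starts with the same 6 magic bytes.
has_xz_magic = compressed[:6] == b"\xfd7zXZ\x00"

# Decompress and get the original bytes back.
restored = lzma.decompress(compressed)

print(len(original), "bytes ->", len(compressed), "bytes, xz magic:", has_xz_magic)
```

The same module is what lets Python's tarfile open .tar.xz archives directly.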
With such widespread support for xz through various tools and libraries, and with basically no support for 7z... it just makes sense to me to use xz.
That all said... 90% of the time, I think it's better to just use zip (100% native cross-platform GUI support) or tar.gz (100% native cross-platform CLI support, including Windows).
Why 7z/LZMA at all?
As far as I can tell, 7z is mostly used by Windows "power users", hackers, and retro video game archivists.
Most compression algorithms work at the file level by creating small dictionaries (or "cheat sheets", if you will) that compress data within a certain chunk size - say 64kb or 900kb. After the window of X kb is exhausted, they start over and build a new dictionary for the next chunk.
That's generally good for small streams of data, especially when you need it to be fast and not take up too much RAM - such as for a webserver, an on-the-fly backup, or an installer - but it's not great for things like games and software on CD-ROM where you often have multiple copies of very similar files scattered across the entire disc.
In that case LZMA (the algorithm behind xz and 7z) really is a winner.
Rather than compressing per-file with a small window, it compresses everything in context of everything else. If you have many different copies of the same image texture or code library across hundreds of megabytes (or gigabytes) they all get compressed down into just one copy, which can be a significant savings.
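You can see the effect with a rough sketch in Python, using zlib (DEFLATE, the algorithm behind gzip, with a 32 KiB window) against lzma (the algorithm behind xz). The data is made up: a block of random bytes followed by an exact copy of itself 256 KiB later. That's too far apart for DEFLATE to notice, but well inside LZMA's dictionary, which is 8 MiB at the default preset. The "everything in context of everything else" effect only reaches as far as that dictionary size.

```python
import lzma
import os
import zlib

BLOCK = 256 * 1024                 # 256 KiB, far larger than DEFLATE's 32 KiB window
block = os.urandom(BLOCK)          # random bytes: incompressible on their own
data = block + block               # the duplicate starts 256 KiB after the original

deflate_size = len(zlib.compress(data, 9))   # small window: can't see the duplicate
xz_size = len(lzma.compress(data))           # large dictionary: stores it about once

print("input:  ", len(data), "bytes")
print("deflate:", deflate_size, "bytes")
print("xz:     ", xz_size, "bytes")
```

DEFLATE ends up at roughly the size of the input, while xz ends up at roughly half of it, because the second copy collapses into back-references to the first.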
The downside is that this means that compressing and decompressing are both slow and take up a lot of RAM. When decompressing it can't just grab a single file from an index "real quick", so it's not efficient to use as a virtual filesystem in the way that zip and tar files are.
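Here's a small sketch of that difference, using Python's zipfile and tarfile on made-up in-memory archives (the file names and contents are hypothetical). A zip has a central directory, so a reader can jump straight to one member. A .tar.xz is one compressed stream, so the reader has to walk past everything that comes before the file it wants.

```python
import io
import tarfile
import zipfile

# Five made-up files; we want the last one.
files = {f"file{i}.txt": f"contents {i}\n".encode() * 1000 for i in range(5)}
wanted = "file4.txt"

# Build a zip archive in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# Build a .tar.xz archive in memory with the same files.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:xz") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

# zip: the central directory says where the member lives; read it directly.
with zipfile.ZipFile(zip_buf) as zf:
    from_zip = zf.read(wanted)

# tar.xz: walk the stream from the start, counting members we pass over.
members_passed = 0
from_tar = None
tar_buf.seek(0)
with tarfile.open(fileobj=tar_buf, mode="r:xz") as tf:
    for member in tf:
        if member.name == wanted:
            from_tar = tf.extractfile(member).read()
            break
        members_passed += 1

print("members decompressed before reaching", wanted, "in tar.xz:", members_passed)
```

With five tiny files that costs nothing, but with gigabytes of game assets in front of the file you want, that walk is the slow part.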
If you want to understand this better you can read Unzipping the GZIP compression protocol, and listen to the Ars Technica War Stories episode on Crash Bandicoot, in which they explain the CD-ROM asset duplication strategy in entertaining detail:
- Ars Technica WAR STORIES S1.E21 - Crash Bandicoot
- Ars Technica WAR STORIES S1.E21 - Crash Bandicoot (extended)
There are more and more legitimate use cases for LZMA, but they're all happening in the xz ecosystem. 7z's long-lived legacy in piracy and the lack of good tools and libraries will probably always keep it out of the mainstream.
Time to ditch the underground. Just use xz.
By AJ ONeal