Re: 7z: ditch the underground, use xz instead
Published 2020-11-14
Here's the short of it:
The problem with 7z is that there's no cross-platform tool that supports it - not even one!
And the assorted tools that do exist are quirky, incomplete, hard to get, or just plain bad.
- Windows has the official 7-zip, but no CLI support
- Mac can only get 7z through p7zip on brew - a 400mb download!!
- Most Linuxes do have the 7z CLI tool, but xz support is ubiquitous
- unarr supports 7z, but it's buggy
Use xz instead!
xz also uses LZMA (and LZMA2) - so you get nearly identical compression to 7z but, unlike 7z, it is integrated into several tools (including 7zip and p7zip) and libraries, and has at least two cross-platform tools - XZ Utils and Arc - that are easy to install and easy to use from the CLI:
unxz example.xz
arc unarchive example.tar.xz
xz is also supported by 7zip on Windows, by tar on Linux, and by tar on Mac - if xz is installed (via brew, ugh!).
With such widespread support for xz through various tools and libraries, and with basically no support for 7z... it just makes sense to me to use xz.
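To give one concrete case of that library support: Python ships an `lzma` module in its standard library that reads and writes the same `.xz` container the CLI tools use. Here's a minimal sketch, assuming Python 3; the helper names `write_xz` and `read_xz` are made up for illustration and are not part of any tool mentioned above.

```python
import lzma

# First six bytes of every .xz file: FD 37 7A 58 5A 00
XZ_MAGIC = b"\xfd7zXZ\x00"

def write_xz(path: str, data: bytes) -> None:
    """Write data to path as an .xz file (hypothetical helper)."""
    with lzma.open(path, "wb", format=lzma.FORMAT_XZ) as f:
        f.write(data)

def read_xz(path: str) -> bytes:
    """Read and decompress an .xz file (hypothetical helper)."""
    with lzma.open(path, "rb") as f:
        return f.read()

def looks_like_xz(path: str) -> bool:
    """Check the file header for the .xz magic bytes."""
    with open(path, "rb") as f:
        return f.read(6) == XZ_MAGIC
```

A file written this way should be readable by `unxz`, since it is the same format.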
That all said... 90% of the time, I think it's better to just use zip (100% native cross-platform GUI support) or tar.gz (100% native cross-platform CLI support, including Windows).
Why 7z/LZMA at all?
As far as I can tell, 7z is mostly used by Windows "power users", hackers, and retro video game archivists.
Most compression algorithms work at the file level by creating small dictionaries (or "cheat sheets", if you will) that compress data within a certain chunk size - say 64kb or 900kb. After the window of X kb is exhausted, they start over and build a new dictionary for the next chunk.
That's generally good for small streams of data, especially when you need it to be fast and not take up too much RAM - such as for a webserver, an on-the-fly backup, or an installer - but it's not great for things like games and software on CD-ROM where you often have multiple copies of very similar files scattered across the entire disc.
In that case LZMA (the algorithm behind xz and 7z) really is a winner.
Rather than compressing per-file with a small window, it compresses everything in context of everything else. If you have many different copies of the same image texture or code library across hundreds of megabytes (or gigabytes) they all get compressed down into just one copy, which can be a significant savings.
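You can see the window effect with a rough experiment. The sketch below, assuming Python 3 and its standard `zlib` and `lzma` modules, builds a buffer out of two identical copies of random data placed 256 KiB apart. DEFLATE (the algorithm behind zip and gzip) only looks back 32 KiB, so it can't "see" the first copy by the time it reaches the second. xz at its default preset uses an 8 MiB dictionary, so it can. The function name and the block size are arbitrary choices for the demo.

```python
import lzma
import os
import zlib

def compressed_sizes(block_size=256 * 1024):
    """Return (original, deflate, xz) sizes for a buffer holding the
    same random block twice in a row."""
    block = os.urandom(block_size)  # random bytes: incompressible on their own
    data = block + block            # the duplicate starts block_size bytes later

    # DEFLATE has a 32 KiB window, far smaller than the gap between copies.
    deflate_size = len(zlib.compress(data, 9))

    # xz's default preset (6) uses an 8 MiB dictionary, which spans both copies.
    xz_size = len(lzma.compress(data))

    return len(data), deflate_size, xz_size

if __name__ == "__main__":
    total, deflate_size, xz_size = compressed_sizes()
    print(f"original: {total} bytes")
    print(f"deflate:  {deflate_size} bytes")
    print(f"xz:       {xz_size} bytes")
```

On a run like this I'd expect DEFLATE to come out at roughly the original size and xz at a bit over half, since only the second copy can be deduplicated. The exact numbers vary from run to run.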
The downside is that this means that compressing and decompressing are both slow and take up a lot of RAM. When decompressing it can't just grab a single file from an index "real quick", so it's not efficient to use as a virtual filesystem in the way that zip and tar files are.
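Here's a small sketch of that access-pattern difference, assuming Python 3 and only the standard `zipfile` and `tarfile` modules. It uses `.tar.xz` as a stand-in for "everything compressed as one LZMA stream"; I have not tried this against a real 7z archive, and the function names are made up for the example.

```python
import io
import tarfile
import zipfile

def build_archives(files):
    """Pack the same {name: bytes} mapping into a zip and a tar.xz,
    both held in memory. Returns (zip_bytes, tar_xz_bytes)."""
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)  # each member is compressed on its own

    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w:xz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))  # one xz stream wraps the whole tar
    return zip_buf.getvalue(), tar_buf.getvalue()

def read_from_zip(archive, name):
    # The zip central directory stores each member's offset, so the reader
    # can seek straight to that member and inflate only its bytes.
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)

def read_from_tar_xz(archive, name):
    # There is no index into the compressed stream: the reader has to
    # decompress everything that comes before the member it wants.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:xz") as tf:
        return tf.extractfile(name).read()
```

Both readers return the same bytes. The difference is how much of the archive has to be decompressed to get there, which is what starts to hurt once the archive is hundreds of megabytes.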
If you want to understand this better you can read Unzipping the GZIP compression protocol, and listen to the Ars Technica War Stories episode on Crash Bandicoot, in which they explain the CD-ROM asset duplication strategy in entertaining detail:
- Ars Technica WAR STORIES S1.E21 - Crash Bandicoot
- Ars Technica WAR STORIES S1.E21 - Crash Bandicoot (extended)
There are more and more legitimate use cases for LZMA, but they're all happening in the xz space.
I think 7z's long-lived legacy in piracy, and the lack of good tools and libraries, will probably always keep it out of the mainstream.
Time to ditch the underground. Just use xz.
By AJ ONeal