Here's the short of it:

The problem with 7z is that there's no good cross-platform tool that supports it - not a single one!

And the assorted tools that do exist are quirky, incomplete, hard to get, or just plain bad.

  • Windows has the official 7-zip, but no CLI support
  • Mac can only get 7z through p7zip on brew - a 400 MB download!!
  • Most Linuxes do have a 7z CLI tool available, but xz support is already ubiquitous
  • unarr supports 7z, but it's buggy

Use xz instead!

xz also uses LZMA (and LZMA2), so you get nearly identical compression to 7z. Unlike 7z, though, xz is integrated into several tools (including 7zip and p7zip) and libraries, and it has at least one cross-platform tool (arc) that's easy to install and easy to use from the CLI:

arc unarchive example.tar.xz

xz is also supported by 7zip on Windows, by tar on Linux, and by tar on Mac - if xz is installed (via brew, ugh!).
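
The library support is real, too: Python's standard library, for one, reads and writes the xz container directly through its lzma module. A minimal round-trip sketch (the payload is just made-up sample data):

```python
import lzma

payload = b"the same block of data, repeated " * 64

# lzma.compress() emits the .xz container format by default,
# so any xz-aware tool (xz, 7zip, arc) can decompress this blob
blob = lzma.compress(payload)

restored = lzma.decompress(blob)
assert restored == payload
assert len(blob) < len(payload)  # repetitive data compresses well
```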

With such widespread support for xz through various tools and libraries, and with basically no support for 7z... it just makes sense to me to use xz.

That all said... 90% of the time, I think it's better to just use zip (100% native cross-platform GUI support) or tar.gz (100% native cross-platform CLI support, including Windows).

Why 7z/LZMA at all?

As far as I can tell 7z is mostly used by Windows "power users", hackers, and retro video game archivists.

Most compression algorithms work at the file level by creating small dictionaries (or "cheat sheets", if you will) that compress data within a certain chunk size - say 64 KB or 900 KB. Once that window is exhausted, they start over and build a new dictionary for the next chunk.
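
You can watch that reset cost with Python's zlib (DEFLATE under the hood). Here sixteen identical "files" compressed one chunk at a time stay incompressible - each chunk starts from a fresh dictionary - while compressing them as one stream lets every later copy reference the first. A rough sketch:

```python
import os
import zlib

CHUNK = 1024
block = os.urandom(CHUNK)   # one "file" of incompressible data
data = block * 16           # sixteen identical copies, 16 KB total

# compress each chunk independently, resetting the dictionary every time
chunked = sum(
    len(zlib.compress(data[i:i + CHUNK]))
    for i in range(0, len(data), CHUNK)
)

# compress everything as one stream: later copies become back-references
whole = len(zlib.compress(data))

assert whole < chunked  # one shared dictionary beats sixteen fresh ones
```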

That's generally good for small streams of data, especially when you need it to be fast and not take up too much RAM - such as for a webserver, an on-the-fly backup, or an installer - but it's not great for things like games and software on CD-ROM where you often have multiple copies of very similar files scattered across the entire disc.

In that case LZMA (the algorithm behind xz and 7z) really is a winner.

Rather than compressing per-file with a small window, it compresses everything in context of everything else. If you have many different copies of the same image texture or code library across hundreds of megabytes (or gigabytes) they all get compressed down into just one copy, which can be a significant savings.
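
The window size is the whole trick, and you can see it with Python's stdlib: put identical 64 KB blocks farther apart than DEFLATE's 32 KB look-back window and zlib can't deduplicate them at all, while lzma's multi-megabyte dictionary collapses them to roughly one copy. A rough sketch:

```python
import os
import zlib
import lzma

block = os.urandom(64 * 1024)  # a 64 KB "asset", incompressible on its own
data = block * 8               # eight copies, each starting 64 KB apart

# zlib (DEFLATE) can only look back 32 KB, so every copy looks brand new
deflated = len(zlib.compress(data))

# lzma's default dictionary spans megabytes, so copies 2-8 become references
xzed = len(lzma.compress(data))

assert xzed < deflated  # LZMA stores roughly one copy; zlib stores eight
```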

The downside is that this means that compressing and decompressing are both slow and take up a lot of RAM. When decompressing, it can't just grab a single file from an index "real quick", so it's not efficient to use as a virtual filesystem in the way that zip and tar files are.
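
The index point is easy to see with zip: the archive ends in a central directory, so a reader can pull one member out without decompressing anything else - something a solid LZMA stream can't offer. A sketch with Python's stdlib (file names here are made up):

```python
import io
import zipfile

# build a tiny zip archive in memory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("readme.txt", "hello")
    zf.writestr("assets/level1.dat", "x" * 10_000)

# reopen it and grab a single member via the central directory index;
# only that one file's bytes get decompressed
with zipfile.ZipFile(buf) as zf:
    assert zf.read("readme.txt") == b"hello"
```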

If you want to understand this better, you can read Unzipping the GZIP compression protocol and listen to the Ars Technica War Stories episode on Crash Bandicoot, in which they explain the CD-ROM asset duplication strategy in entertaining detail.

There are more and more legitimate use cases for LZMA, but they're all happening in the xz space. I think 7z's long-lived legacy in piracy, and the lack of good tools and libraries will probably always keep it out of the mainstream.

Time to ditch the underground. Just use xz.

By AJ ONeal

If you loved this and want more like it, sign up!

Did I make your day?
Buy me a coffee

(you can learn about the bigger picture I'm working towards on my patreon page)