XZ(1) | XZ Utils | XZ(1) |
unxz is equivalent to xz --decompress.
When writing scripts that need to decompress files, it is recommended to always use the name xz with appropriate arguments (xz -d or xz -dc) instead of the names unxz and xzcat.
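A minimal sketch of the recommendation above. The function name decompress_to_stdout is hypothetical and not part of XZ Utils; the block only assumes that an xz binary is available on PATH when the function is called.

```shell
#!/bin/sh
# Hypothetical helper: decompress the given files to standard output.
# It calls "xz -dc" instead of the xzcat alias, as recommended above.
decompress_to_stdout() {
    xz -dc -- "$@"
}

# Example use (assumes archive.xz exists):
#   decompress_to_stdout archive.xz | wc -c
```

Spelling out the operation mode in the script keeps its behavior independent of which alias names happen to be installed.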
xz compresses or decompresses each file according to the selected operation mode. If no files are given or file is -, xz reads from standard input and writes the processed data to standard output. xz will refuse (display an error and skip the file) to write compressed data to standard output if it is a terminal. Similarly, xz will refuse to read compressed data from standard input if it is a terminal.
Unless --stdout is specified, files other than - are written to a new file whose name is derived from the source file name:
If the target file already exists, an error is displayed and the file is skipped.
Unless writing to standard output, xz will display a warning and skip the file if any of the following applies:
After successfully compressing or decompressing the file, xz copies the owner, group, permissions, access time, and modification time from the source file to the target file. If copying the group fails, the permissions are modified so that the target file doesn't become accessible to users who didn't have permission to access the source file. xz doesn't support copying other metadata like access control lists or extended attributes yet.
Once the target file has been successfully closed, the source file is removed unless --keep was specified. The source file is never removed if the output is written to standard output.
Sending SIGINFO or SIGUSR1 to the xz process makes it print progress information to standard error. This is of limited use, because when standard error is a terminal, --verbose already displays an automatically updating progress indicator.
Users of older systems in particular may find the possibility of very large memory usage annoying. To prevent uncomfortable surprises, xz has a built-in memory usage limiter, which is disabled by default. While some operating systems provide ways to limit the memory usage of processes, relying on them wasn't deemed to be flexible enough (e.g. using ulimit(1) to limit virtual memory tends to cripple mmap(2)).
The memory usage limiter can be enabled with the command line option --memlimit=limit. Often it is more convenient to enable the limiter by default by setting the environment variable XZ_DEFAULTS, e.g. XZ_DEFAULTS=--memlimit=150MiB. It is possible to set the limits separately for compression and decompression by using --memlimit-compress=limit and --memlimit-decompress=limit. Using these two options outside XZ_DEFAULTS is rarely useful because a single run of xz cannot do both compression and decompression and --memlimit=limit (or -M limit) is shorter to type on the command line.
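As a sketch of the separate-limit form described above, the following could go into a shell startup file. The two sizes are arbitrary illustration values, not recommendations.

```shell
# Separate limits for compression and decompression.
# The sizes below are arbitrary illustration values.
XZ_DEFAULTS="--memlimit-compress=400MiB --memlimit-decompress=150MiB"
export XZ_DEFAULTS

# A one-off limit for a single run can still be given on the command line:
#   xz -M 80MiB -d file.xz
```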
If the specified memory usage limit is exceeded when decompressing, xz will display an error and decompressing the file will fail. If the limit is exceeded when compressing, xz will try to scale the settings down so that the limit is no longer exceeded (except when using --format=raw or --no-adjust). This way the operation won't fail unless the limit is very small. The scaling of the settings is done in steps that don't match the compression level presets, e.g. if the limit is only slightly less than the amount required for xz -9, the settings will be scaled down only a little, not all the way down to xz -8.
It is possible to insert padding between the concatenated parts or after the last part. The padding must consist of null bytes and the size of the padding must be a multiple of four bytes. This can be useful e.g. if the .xz file is stored on a medium that measures file sizes in 512-byte blocks.
Concatenation and padding are not allowed with .lzma files or raw streams.
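The padding arithmetic can be sketched as follows. The helper names pad_len and pad_to_block are hypothetical. The sketch assumes that the size of the .xz file is already a multiple of four bytes, so that rounding up to a 512-byte boundary also yields padding whose size is a multiple of four.

```shell
# pad_len SIZE BLOCK: print how many null bytes round SIZE up to a
# multiple of BLOCK (0 if SIZE is already aligned). Hypothetical helper.
pad_len() {
    echo $(( ($2 - $1 % $2) % $2 ))
}

# pad_to_block FILE BLOCK: append that many null bytes to FILE.
pad_to_block() {
    size=$(wc -c < "$1")
    count=$(pad_len "$size" "$2")
    if [ "$count" -gt 0 ]; then
        dd if=/dev/zero bs=1 count="$count" 2>/dev/null >> "$1"
    fi
}

# Example use (assumes a.xz and b.xz exist):
#   pad_to_block a.xz 512
#   cat a.xz b.xz > ab.xz
```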
The special value max can be used to indicate the maximum integer value supported by the option.
Preset | DictSize | CompCPU | CompMem | DecMem |
-0 | 256 KiB | 0 | 3 MiB | 1 MiB |
-1 | 1 MiB | 1 | 9 MiB | 2 MiB |
-2 | 2 MiB | 2 | 17 MiB | 3 MiB |
-3 | 4 MiB | 3 | 32 MiB | 5 MiB |
-4 | 4 MiB | 4 | 48 MiB | 5 MiB |
-5 | 8 MiB | 5 | 94 MiB | 9 MiB |
-6 | 8 MiB | 6 | 94 MiB | 9 MiB |
-7 | 16 MiB | 6 | 186 MiB | 17 MiB |
-8 | 32 MiB | 6 | 370 MiB | 33 MiB |
-9 | 64 MiB | 6 | 674 MiB | 65 MiB |
Preset | DictSize | CompCPU | CompMem | DecMem |
-0e | 256 KiB | 8 | 4 MiB | 1 MiB |
-1e | 1 MiB | 8 | 13 MiB | 2 MiB |
-2e | 2 MiB | 8 | 25 MiB | 3 MiB |
-3e | 4 MiB | 7 | 48 MiB | 5 MiB |
-4e | 4 MiB | 8 | 48 MiB | 5 MiB |
-5e | 8 MiB | 7 | 94 MiB | 9 MiB |
-6e | 8 MiB | 8 | 94 MiB | 9 MiB |
-7e | 16 MiB | 8 | 186 MiB | 17 MiB |
-8e | 32 MiB | 8 | 370 MiB | 33 MiB |
-9e | 64 MiB | 8 | 674 MiB | 65 MiB |
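The CompMem column of the -0 ... -9 table above can be used to choose a preset for a given memory budget. The following is a rough sketch only: highest_preset_within is a hypothetical helper, the numbers are copied from that table, and xz itself does not select presets this way.

```shell
# highest_preset_within LIMIT_MIB: print the highest preset (-0 ... -9)
# whose CompMem value fits in LIMIT_MIB. Prints nothing and returns 1
# if even -0 does not fit. Hypothetical helper, not part of xz.
highest_preset_within() {
    limit=$1
    best=
    level=0
    # CompMem in MiB for -0 ... -9, copied from the table above.
    for mem in 3 9 17 32 48 94 94 186 370 674; do
        if [ "$mem" -le "$limit" ]; then
            best="-$level"
        fi
        level=$((level + 1))
    done
    if [ -n "$best" ]; then
        echo "$best"
    else
        return 1
    fi
}

# Example use (assumes foo exists):
#   xz "$(highest_preset_within 100)" foo
```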
A filter chain is comparable to piping on the command line. When compressing, the uncompressed input goes to the first filter, whose output goes to the next filter (if any). The output of the last filter gets written to the compressed file. The maximum number of filters in the chain is four, but typically a filter chain has only one or two filters.
Many filters have limitations on where they can be in the filter chain: some filters can work only as the last filter in the chain, some only as a non-last filter, and some work in any position in the chain. Depending on the filter, this limitation is either inherent to the filter design or exists to prevent security issues.
A custom filter chain is specified by using one or more filter options in the order they are wanted in the filter chain. That is, the order of filter options is significant! When decoding raw streams (--format=raw), the filter chain is specified in the same order as it was specified when compressing.
Filters take filter-specific options as a comma-separated list. Extra commas in options are ignored. Every option has a default value, so you need to specify only those you want to change.
Filter | Alignment | Notes |
x86 | 1 | 32-bit or 64-bit x86 |
PowerPC | 4 | Big endian only |
ARM | 4 | Little endian only |
ARM-Thumb | 2 | Little endian only |
IA-64 | 16 | Big or little endian |
SPARC | 4 | Big or little endian |
XZ_VERSION=XYYYZZZS
XYYYZZZS are the same on both lines if xz and liblzma are from the same XZ Utils release.
Examples: 4.999.9beta is 49990091 and 5.0.0 is 50000002.
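The XYYYZZZS layout can be decoded with plain integer arithmetic, as in the sketch below. The helper name decode_xz_version is hypothetical. The mapping of S to alpha, beta, and stable is an assumption here: only 1 (beta) and 2 (stable) are confirmed by the two examples above.

```shell
# decode_xz_version N: print "major.minor.patch stability" for a number
# in XYYYZZZS form. Hypothetical helper, not part of xz.
decode_xz_version() {
    v=$1
    major=$(( v / 10000000 ))
    minor=$(( v / 10000 % 1000 ))
    patch=$(( v / 10 % 1000 ))
    # Assumed meaning of the last digit: 0 alpha, 1 beta, 2 stable.
    case $(( v % 10 )) in
        0) stability=alpha ;;
        1) stability=beta ;;
        2) stability=stable ;;
        *) stability=unknown ;;
    esac
    echo "$major.$minor.$patch $stability"
}

decode_xz_version 49990091   # prints "4.999.9 beta"
decode_xz_version 50000002   # prints "5.0.0 stable"
```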
In the future, the output of xz --robot --info-memory may have more columns, but never more than a single line.
The columns of the file lines:
The columns of the stream lines:
The columns of the block lines:
If --verbose was specified twice, additional columns are included on the block lines. These are not displayed with a single --verbose, because getting this information requires many seeks and can thus be slow:
The columns of the totals line:
If --verbose was specified twice, additional columns are included on the totals line:
Future versions may add new line types and new columns can be added to the existing line types, but the existing columns won't be changed.
Notices (not warnings or errors) printed on standard error don't affect the exit status.
XZ_OPT=-2v tar caf foo.tar.xz foo
XZ_OPT=${XZ_OPT-"-7e"}
export XZ_OPT
Level | xz | LZMA Utils |
-0 | 256 KiB | N/A |
-1 | 1 MiB | 64 KiB |
-2 | 2 MiB | 1 MiB |
-3 | 4 MiB | 512 KiB |
-4 | 4 MiB | 1 MiB |
-5 | 8 MiB | 2 MiB |
-6 | 8 MiB | 4 MiB |
-7 | 16 MiB | 8 MiB |
-8 | 32 MiB | 16 MiB |
-9 | 64 MiB | 32 MiB |
The dictionary size differences affect the compressor memory usage too, but there are some other differences between LZMA Utils and XZ Utils, which make the difference even bigger:
Level | xz | LZMA Utils 4.32.x |
-0 | 3 MiB | N/A |
-1 | 9 MiB | 2 MiB |
-2 | 17 MiB | 12 MiB |
-3 | 32 MiB | 12 MiB |
-4 | 48 MiB | 16 MiB |
-5 | 94 MiB | 26 MiB |
-6 | 94 MiB | 45 MiB |
-7 | 186 MiB | 83 MiB |
-8 | 370 MiB | 159 MiB |
-9 | 674 MiB | 311 MiB |
The default preset level in LZMA Utils is -7 while in XZ Utils it is -6, so both use an 8 MiB dictionary by default.
xz supports decompressing .lzma files with or without an end-of-payload marker, but all .lzma files created by xz will use an end-of-payload marker and have the uncompressed size marked as unknown in the .lzma header. This may be a problem in some uncommon situations. For example, a .lzma decompressor in an embedded device might work only with files that have a known uncompressed size. If you hit this problem, you need to use LZMA Utils or LZMA SDK to create .lzma files with a known uncompressed size.
The implementation of the LZMA1 filter in liblzma requires that the sum of lc and lp not exceed 4. Thus, .lzma files that exceed this limitation cannot be decompressed with xz.
LZMA Utils creates only .lzma files which have a dictionary size of 2^n (a power of 2) but accepts files with any dictionary size. liblzma accepts only .lzma files which have a dictionary size of 2^n or 2^n + 2^(n-1). This is to decrease false positives when detecting .lzma files.
These limitations shouldn't be a problem in practice, since practically all .lzma files have been compressed with settings that liblzma will accept.
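The dictionary size condition can be expressed with a little integer arithmetic, since 2^n + 2^(n-1) equals 3 * 2^(n-1). The following sketch covers only the condition as stated above and is not taken from liblzma; the helper names is_pow2 and dict_size_ok are hypothetical.

```shell
# is_pow2 N: succeed if N is a positive power of two.
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# dict_size_ok BYTES: succeed if BYTES is 2^n or 2^n + 2^(n-1).
# Hypothetical helper; it checks only the rule described above.
dict_size_ok() {
    if is_pow2 "$1"; then
        return 0
    fi
    # 2^n + 2^(n-1) is three times a power of two.
    [ $(( $1 % 3 )) -eq 0 ] && is_pow2 $(( $1 / 3 ))
}

# Example use:
#   dict_size_ok $((12 << 20)) && echo "12 MiB matches the rule"
```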
If there is data left after the first .lzma stream, xz considers the file to be corrupt. This may break obscure scripts which have assumed that trailing garbage is ignored.
The above means that implementing --rsyncable to create rsyncable .xz files is not going to happen without freezing a part of the encoder implementation, which can then be used with --rsyncable.
Outside embedded systems, all .xz format decompressors support all the check types, or at least are able to decompress the file without verifying the integrity check if the particular check is not supported.
XZ Embedded supports BCJ filters, but only with the default start offset.
xz foo
Decompress bar.xz into bar and don't remove bar.xz even if decompression is successful:
xz -dk bar.xz
Create baz.tar.xz with the preset -4e (-4 --extreme), which is slower than e.g. the default -6, but needs less memory for compression and decompression (48 MiB and 5 MiB, respectively):
tar cf - baz | xz -4e > baz.tar.xz
A mix of compressed and uncompressed files can be decompressed to standard output with a single command:
xz -dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt
find . -type f \! -name '*.xz' -print0 \
| xargs -0r -P4 -n16 xz -T1
The -P option to xargs(1) sets the number of parallel xz processes. The best value for the -n option depends on how many files there are to be compressed. If there are only a couple of files, the value should probably be 1; with tens of thousands of files, 100 or even more may be appropriate to reduce the number of xz processes that xargs(1) will eventually create.
The option -T1 for xz is there to force it to single-threaded mode, because xargs(1) is used to control the amount of parallelization.
xz --robot --list *.xz | awk '/^totals/{print $5-$4}'
A script may want to know that it is using a new enough xz. The following sh(1) script checks that the version number of the xz tool is at least 5.0.0. This method is compatible with old beta versions, which didn't support the --robot option:
if ! eval "$(xz --robot --version 2> /dev/null)" ||
[ "$XZ_VERSION" -lt 50000002 ]; then
echo "Your xz is too old."
fi
unset XZ_VERSION LIBLZMA_VERSION
Set a memory usage limit for decompression using XZ_OPT, but if a limit has already been set, don't increase it:
NEWLIM=$((123 << 20)) # 123 MiB
OLDLIM=$(xz --robot --info-memory | cut -f3)
if [ "$OLDLIM" -eq 0 ] || [ "$OLDLIM" -gt "$NEWLIM" ]; then
XZ_OPT="$XZ_OPT --memlimit-decompress=$NEWLIM"
export XZ_OPT
fi
The CompCPU columns of the tables from the descriptions of the options -0 ... -9 and --extreme are useful when customizing LZMA2 presets. Here are the relevant parts collected from those two tables:
Preset | CompCPU |
-0 | 0 |
-1 | 1 |
-2 | 2 |
-3 | 3 |
-4 | 4 |
-5 | 5 |
-6 | 6 |
-5e | 7 |
-6e | 8 |
If you know that a file requires a somewhat big dictionary (e.g. 32 MiB) to compress well, but you want to compress it quicker than xz -8 would do, a preset with a low CompCPU value (e.g. 1) can be modified to use a bigger dictionary:
xz --lzma2=preset=1,dict=32MiB foo.tar
With certain files, the above command may be faster than xz -6 while compressing significantly better. However, it must be emphasized that only some files benefit from a big dictionary while keeping the CompCPU value low. The most obvious situation, where a big dictionary can help a lot, is an archive containing very similar files of at least a few megabytes each. The dictionary size has to be significantly bigger than any individual file to allow LZMA2 to take full advantage of the similarities between consecutive files.
If very high compressor and decompressor memory usage is fine, and the file being compressed is at least several hundred megabytes, it may be useful to use an even bigger dictionary than the 64 MiB that xz -9 would use:
xz -vv --lzma2=dict=192MiB big_foo.tar
Using -vv (--verbose --verbose) as in the above example can be useful to see the memory requirements of the compressor and decompressor. Remember that using a dictionary bigger than the size of the uncompressed file is a waste of memory, so the above command isn't useful for small files.
Sometimes the compression time doesn't matter, but the decompressor memory usage has to be kept low, e.g. to make it possible to decompress the file on an embedded system. The following command uses -6e (-6 --extreme) as a base and sets the dictionary to only 64 KiB. The resulting file can be decompressed with XZ Embedded (that's why there is --check=crc32) using about 100 KiB of memory.
xz --check=crc32 --lzma2=preset=6e,dict=64KiB foo
If you want to squeeze out as many bytes as possible, adjusting the number of literal context bits (lc) and number of position bits (pb) can sometimes help. Adjusting the number of literal position bits (lp) might help too, but usually lc and pb are more important. E.g. a source code archive contains mostly US-ASCII text, so something like the following might give a slightly (about 0.1 %) smaller file than xz -6e (try also without lc=4):
xz --lzma2=preset=6e,pb=0,lc=4 source_code.tar
Using another filter together with LZMA2 can improve compression with certain file types. E.g. to compress an x86-32 or x86-64 shared library using the x86 BCJ filter:
xz --x86 --lzma2 libfoo.so
Note that the order of the filter options is significant. If --x86 is specified after --lzma2, xz will give an error, because there cannot be any filter after LZMA2, and also because the x86 BCJ filter cannot be used as the last filter in the chain.
The Delta filter together with LZMA2 can give good results with bitmap images. It should usually beat PNG, which has a few more advanced filters than simple delta but uses Deflate for the actual compression.
The image has to be saved in uncompressed format, e.g. as uncompressed TIFF. The distance parameter of the Delta filter is set to match the number of bytes per pixel in the image. E.g. a 24-bit RGB bitmap needs dist=3, and it is also good to pass pb=0 to LZMA2 to accommodate the three-byte alignment:
xz --delta=dist=3 --lzma2=pb=0 foo.tiff
If multiple images have been put into a single archive (e.g. .tar), the Delta filter will work on that too as long as all images have the same number of bytes per pixel.
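The relation between pixel depth and the Delta distance can be sketched as below. The helper name delta_dist is hypothetical, and the sketch assumes that the pixel depth is a whole number of bytes. It only prints the resulting command for the 24-bit case instead of running it.

```shell
# delta_dist BITS_PER_PIXEL: print the number of bytes per pixel, which
# is the value to use for dist. Hypothetical helper, not part of xz.
delta_dist() {
    if [ $(( $1 % 8 )) -ne 0 ]; then
        echo "bits per pixel must be a multiple of 8" >&2
        return 1
    fi
    echo $(( $1 / 8 ))
}

# Print (not run) the command for a 24-bit RGB image named foo.tiff.
echo "xz --delta=dist=$(delta_dist 24) --lzma2=pb=0 foo.tiff"
```

For 24 bits per pixel this prints the same command as shown above.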
XZ Utils: <http://tukaani.org/xz/>
2010-10-04 | Tukaani |