sort(1)
NAME
sort - sort or merge files
SYNOPSIS
sort [-m] [-o output] [-AbdfinruM] [-t char] [-k keydef] [-y [kmem]]
[-z recsz] [-T dir] [file ...]
sort -c [-AbdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz]
[-T dir] [file ...]
DESCRIPTION
sort performs one of the following functions:
1. Sorts lines of all the named files together and writes the
result to the specified output.
2. Merges lines of all the named (presorted) files together and
writes the result to the specified output.
3. Checks that a single input file is correctly presorted.
The standard input is read if - is used as a file name or no input files
are specified.
Comparisons are based on one or more sort keys extracted from each line
of input. By default, there is one sort key, the entire input line.
Ordering is lexicographic by characters using the collating sequence of
the current locale. If the locale is not specified or is set to the
POSIX locale, then ordering is lexicographic by bytes in machine-collating
sequence. If the locale includes multi-byte characters, single-byte
characters are machine-collated before multi-byte characters.
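As a rough illustration of the default behavior, the sketch below sorts three invented lines read from standard input (the - operand), using the whole line as the key. LC_ALL=C is set as an assumption so that ordering is by byte value and the result does not depend on the caller's locale.

```shell
# Sort standard input with the default key (the entire line).
# LC_ALL=C selects byte-wise (machine-collating) order, so the
# uppercase "A" sorts before the lowercase letters.
result=$(printf 'pear\nApple\nbanana\n' | LC_ALL=C sort -)
printf '%s\n' "$result"
```

In a locale with its own collating sequence, the relative order of uppercase and lowercase letters may differ.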
Behavior Modification Options
The following options alter the default behavior:
-A  Sorts on a byte-by-byte basis using each character's encoded
value. On some systems, extended characters will be considered
negative values, and so sort before ASCII characters. If you are
sorting ASCII characters in a non-C/POSIX locale, sorting with
this flag is much faster.
-c  Check that the single input file is sorted according to the
ordering rules. No output is produced; the exit code is set to
indicate the result.
-m  Merge only; the input files are assumed to be already sorted.
-o output  The argument given is the name of an output file
to use instead of the standard output. This file
can be the same as one of the input files.
-u  Unique: suppress all but one in each
set of lines having equal keys. If used with the -c
option, check to see that there are no lines with
duplicate keys, in addition to checking that the
input file is sorted.
-y [kmem]  The amount of main memory used by the sort
can have a large impact on its performance. If this
option is omitted, sort begins using a system default
memory size, and continues to use more space as
needed. If this option is presented with a value,
kmem, sort starts using that number of kilobytes of
memory, unless the administrative minimum or maximum is
violated, in which case the corresponding extremum
will be used. Thus, -y 0 is guaranteed to start with
minimum memory. By convention, -y (with no argument)
starts with maximum memory.
-z recsz  The size of the longest line read is recorded
in the sort phase so that buffers can be allocated
during the merge phase. If the sort phase is omitted
via the -c or -m options, a popular system default
size will be used. Lines longer than the buffer
size will cause sort to terminate abnormally. Supplying
the actual number of bytes in the longest line to be
merged (or some larger value) will prevent abnormal
termination.
-T dir  Use dir as the directory for temporary scratch files
rather than the default directory, which is one
of the following, tried in order: the directory as
specified in the TMPDIR environment variable; and finally,
a system default directory.
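The options above might be combined as in the following sketch. It exercises only the check, merge, unique and output-file options; the byte-wise, memory and record-size options are left out because they may be unavailable or behave differently in other sort implementations. The file names a.txt, b.txt and out.txt are made up for the example.

```shell
# Work in a scratch directory so nothing outside it is touched.
tmp=$(mktemp -d)
printf '1\n3\n5\n' > "$tmp/a.txt"
printf '2\n3\n4\n' > "$tmp/b.txt"

# -c: check only. Nothing is printed; the exit status is the answer.
check_status=0
LC_ALL=C sort -c "$tmp/a.txt" || check_status=$?

# -m merges the presorted files, -u keeps one line per set of equal
# keys, and -o writes to the named file instead of standard output.
LC_ALL=C sort -m -u -o "$tmp/out.txt" "$tmp/a.txt" "$tmp/b.txt"
merged=$(cat "$tmp/out.txt")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

The duplicate line "3" appears once in the merged output because both copies have equal keys.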
Ordering Rule Options
When ordering options appear before restricted sort key specifications,
the ordering rules are applied globally to all sort keys. When
attached to a specific sort key (described below), the ordering options
override all global ordering options for that key.
The following options override the default ordering rules:
-d  Quasi-dictionary order:
only alphanumeric characters and blanks (spaces and
tabs), as defined by LC_CTYPE, are significant in comparisons
(see environ(5)).
(UNIX Standard only, see standards(5)) The behavior
is undefined for a sort key to which -i or -n also
applies.
-f  Fold letters.
Prior to being compared, all lowercase letters are
effectively converted into their uppercase equivalents,
as defined by LC_CTYPE.
-i  In non-numeric comparisons, ignore all characters which are
non-printable, as defined by LC_CTYPE. For the ASCII character
set, octal character codes 001 through 037 and 0177 are
ignored.
-n  The sort key is restricted to
an initial numeric string consisting of optional
blanks, an optional minus sign, zero or more digits
with optional radix character, and optional thousands
separators. The radix and thousands separator
characters are defined by LC_NUMERIC. The field is sorted by
arithmetic value. An empty (missing) numeric field
is treated as arithmetic zero. Leading zeros and
plus or minus signs on zeros do not affect the
ordering. The -n option implies the -b option (see
below).
-r  Reverse the sense of comparisons.
-M  Compare as months.
The first several non-blank characters of the field
are folded to uppercase and compared with the
langinfo(5) items ABMON_1 < ABMON_2 < ... < ABMON_12. An
invalid field is treated as being less than the ABMON_1
string. For example, American month names are compared
such that JAN < FEB < ... < DEC. An
invalid field is treated as being less than all
months. The -M option implies the -b option (see below).
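The ordering rules above can be tried on small inputs, as in the sketch below. The sample data is invented, and LC_ALL=C is an assumption made so that the fold and month comparisons use ASCII letters and English month abbreviations.

```shell
# -n compares by arithmetic value, so 9 sorts before 10.
numeric=$(printf '10\n9\n100\n' | LC_ALL=C sort -n)

# -r reverses the sense of the comparison.
reversed=$(printf '10\n9\n100\n' | LC_ALL=C sort -n -r)

# -f folds lowercase to uppercase before comparing, so "b" is
# compared as "B" and lands between "A" and "c".
folded=$(printf 'b\nA\nc\n' | LC_ALL=C sort -f)

# -M compares the leading characters as month abbreviations.
months=$(printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M)

printf '%s\n' "$numeric" "$reversed" "$folded" "$months"
```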
Field Separator Options
The treatment of field separators can be altered using the options:
-t char  Use char as the field separator character; char is not
considered to be part of a field (although it can be
included in a sort key). Each occurrence of char is
significant (for example, <char><char> delimits an
empty field). If -t is not specified, <blank> characters
will be used as default field separators; each
maximal sequence of <blank> characters that follows
a non-<blank> character is a field separator.
-b  Ignore leading blanks when determining the starting and ending
positions of a restricted sort key. If the -b option
is specified before the first -k option (+pos1 argument),
it is applied to all -k options (+pos1 arguments).
Otherwise, the -b option can be attached independently to
each -k field_start or field_end option (+pos1 or -pos2
argument; see below). Note that the -b option is only
effective when restricted sort key specifications
are given.
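A brief sketch of both separator options follows, using invented data. With the default separators, the blanks in front of a field belong to that field, so the second command compares "   z" with " y", while the third strips those leading blanks first. LC_ALL=C is assumed so that a space sorts before any letter.

```shell
# -t ':' makes the colon the field separator; sort on field 2.
by_colon=$(printf 'x:b\ny:a\nz:c\n' | LC_ALL=C sort -t ':' -k 2,2)

# Default separators: the key for "a   z" starts with three blanks,
# which sort before the "y" in the second position of " y".
with_blanks=$(printf 'b y\na   z\n' | LC_ALL=C sort -k 2,2)

# -b given before -k: leading blanks are skipped, so "y" < "z".
no_blanks=$(printf 'b y\na   z\n' | LC_ALL=C sort -b -k 2,2)

printf '%s\n' "$by_colon" "$with_blanks" "$no_blanks"
```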
Restricted Sort Key
-k keydef  The keydef argument defines a restricted sort key. The
format of this definition is
field_start[type][,field_end[type]]
which defines a key field beginning at field_start
and ending at field_end. The characters at positions
field_start and field_end are included in the
key field, providing that field_end does not precede
field_start. A missing field_end means the end of
the line. Fields and characters within fields are
numbered starting with 1. Note that this is different
from the obsolete form of restricted sort keys,
where numbering starts at 0. See WARNINGS below.
Specifying field_start and field_end involves the
notion of a field, a minimal sequence of characters
followed by a field separator or a new-line. By
default, the first blank of a sequence of blanks
acts as the field separator. All blanks in a
sequence of blanks are considered to be part of the
next field; for example, all blanks at the beginning
of a line are considered to be part of the first
field.
The arguments field_start and field_end each have
the form m.n, optionally followed by one or
more of the type options b, d, f, i, n, r, or M. These
modifiers have, for this key only, the functionality that
their command-line counterparts have for the entire record.
A field_start position specified by m.n is interpreted
to mean the nth character in the mth field. A missing
.n means .1, indicating the first character of the
mth field. If the -b option is in effect, n is counted
from the first non-blank character in the mth field.
A field_end position specified by m.n is interpreted to
mean the nth character in the mth field. If n is
missing, the mth field ends at the last character of
the field. If the -b option is in effect, n is counted
from the first non-<blank> character in the mth
field.
Multiple -k options are permitted and are significant
in command line order. A maximum of 9 -k options can
be given. If no -k option is specified, a default sort
key of the entire line is used. When there are multiple
sort keys, later keys are compared only after
all earlier keys compare equal. Lines that otherwise
compare equal are ordered with all bytes significant.
If all the specified keys compare equal,
the entire record is used as the final key.
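The key syntax can be sketched with two small invented data sets. The first command uses two keys, with the n and r type options attached to the second key only; the second command restricts the key to a character range inside one field. LC_ALL=C is assumed for byte-wise comparison.

```shell
# Hypothetical records: name, department, amount.
data='ann sales 300
bob eng 200
cat eng 300
dan sales 100'

# Primary key: field 2. Secondary key: field 3, numeric and reversed,
# consulted only when two lines have the same field 2.
by_dept=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2 -k 3,3nr)

# Character positions: the key is characters 2 through 3 of field 1.
by_chars=$(printf 'xbc\nyab\nzaa\n' | LC_ALL=C sort -k 1.2,1.3)

printf '%s\n' "$by_dept" "$by_chars"
```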
The -k option is intended to replace the obsolete [+pos1
[-pos2]] notation, using field_start and field_end
respectively. The fully specified [+pos1 [-pos2]] form:
+w.x -y.z
is equivalent to:
-k w+1.x+1,y.0 (if z == 0)
-k w+1.x+1,y+1.z (if z > 0)
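The equivalence above is plain arithmetic on the four numbers, which the following sketch spells out. The function name old_to_k is made up for this illustration and is not part of sort; it only prints the key definition and does not run a sort.

```shell
# old_to_k W X Y Z prints the -k form equivalent to "+W.X -Y.Z".
old_to_k() {
    w=$1 x=$2 y=$3 z=$4
    if [ "$z" -eq 0 ]; then
        # z == 0: the key ends at the last character of field y.
        printf '%s\n' "-k $((w + 1)).$((x + 1)),$y.0"
    else
        # z > 0: the key ends at character z of field y+1.
        printf '%s\n' "-k $((w + 1)).$((x + 1)),$((y + 1)).$z"
    fi
}

k_zero=$(old_to_k 1 0 2 0)   # from +1.0 -2.0
k_pos=$(old_to_k 2 0 2 1)    # from +2.0 -2.1
printf '%s\n' "$k_zero" "$k_pos"
```

The first call describes the whole second field; the second describes the first character of the third field.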
Obsolete Restricted Sort Key
The notation +pos1 -pos2 restricts a sort key to one beginning at pos1
and ending at pos2. The characters at positions pos1 and pos2 are
included in the sort key (provided that pos2 does not precede pos1). A
missing -pos2 means the end of the line.
Specifying pos1 and pos2 involves the notion of a field, a minimal
sequence of characters followed by a field separator or a new-line. By
default, the first blank (space or tab) of a sequence of blanks acts as
the field separator. All blanks in a sequence of blanks are considered
to be part of the next field; for example, all blanks at the beginning
of a line are considered to be part of the first field.
pos1 and pos2 each have the form m.n optionally followed by one or more of
the flags bdfinrM. A starting position specified by +m.n is interpreted
to mean character n+1 in field m+1. A missing .n means .0, indicating the
first character of field m+1. If the b flag is in effect, n is counted
from the first non-blank in field m+1; +m.0b refers to the first
non-blank character in field m+1.
A last position specified by -m.n is interpreted to mean the nth character
(including separators) after the last character of the mth field. A
missing .n means .0, indicating the last character of the mth field. If
the b flag is in effect, n is counted from the last leading blank in field
m+1; -m.1b refers to the first non-blank in field m+1.
EXTERNAL INFLUENCES
For information about the UNIX standard environment, see standards(5).
Environment Variables
LC_COLLATE determines the default ordering rules applied to the sort.
LC_CTYPE determines the locale for interpretation of sequences of bytes
of text data as characters (e.g., single- versus multibyte characters in
arguments and input files) and the behavior of character classification
for the -b, -d, -f, -i, and -n options.
LC_NUMERIC determines the definition of the radix and thousands separator
characters for the -n option.
LC_TIME determines the month names for the -M option.
LC_MESSAGES determines the language in which messages are displayed.
LC_ALL determines the locale to use to override the values of all the
other internationalization variables.
NLSPATH determines the location of message catalogs for the processing of
LC_MESSAGES.
LANG provides a default value for the internationalization variables that
are unset or null. If LANG is unset or null, the default value of "C" (see
lang(5)) is used.
If any of the internationalization variables contains an invalid setting,
sort behaves as if all internationalization variables are set to "C".
See environ(5).
International Code Set Support
Single- and multi-byte character code sets are supported.
EXAMPLES
Sort the contents of an input file with the second field as the sort key:
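One way to write this, as a sketch: the file name infile and its contents are assumptions made for the example.

```shell
tmp=$(mktemp -d)
printf 'a 3\nb 1\nc 2\n' > "$tmp/infile"

# The key starts and ends in field 2.
ex1=$(LC_ALL=C sort -k 2,2 "$tmp/infile")
printf '%s\n' "$ex1"

rm -rf "$tmp"
```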
Sort, in reverse order, the contents of two input files, placing the
output in an output file and using the first two characters of the second
field as the sort key:
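A possible command line is sketched below; infile1, infile2 and outfile are hypothetical names. Note that with the default separators the blank in front of the second field counts as its first character, so in this data the key is that blank plus one letter.

```shell
tmp=$(mktemp -d)
printf 'a ba\nb ab\n' > "$tmp/infile1"
printf 'c ca\nd dd\n' > "$tmp/infile2"

# -r reverses the order, -o names the output file, and the key runs
# from character 1 to character 2 of field 2.
LC_ALL=C sort -r -o "$tmp/outfile" -k 2.1,2.2 "$tmp/infile1" "$tmp/infile2"
ex2=$(cat "$tmp/outfile")
printf '%s\n' "$ex2"

rm -rf "$tmp"
```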
Sort, in reverse order, the contents of two input files, using the first
non-blank character of the fourth field as the sort key:
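The sketch below is one way to express this, with hypothetical files infile1 and infile2. The r type option is attached to the key itself rather than given globally, an assumption made here so that the result does not depend on how an implementation combines global options with a key that already carries the b modifier.

```shell
tmp=$(mktemp -d)
printf 'a b c x\na b c z\n' > "$tmp/infile1"
printf 'a b c y\n' > "$tmp/infile2"

# 4.1b: first character of field 4 after skipping leading blanks.
# The r attached to the key reverses the comparison for this key.
ex3=$(LC_ALL=C sort -k 4.1br,4.1b "$tmp/infile1" "$tmp/infile2")
printf '%s\n' "$ex3"

rm -rf "$tmp"
```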
Print the password file sorted by numeric user ID (the third
colon-separated field):
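A sketch of this command follows. So that the output is predictable, it runs against a small made-up file in password-file format instead of the real /etc/passwd; substituting /etc/passwd for the sample path gives the command described.

```shell
tmp=$(mktemp -d)
# Invented entries in password-file format (field 3 is the user ID).
printf '%s\n' \
    'root:x:0:0::/root:/bin/sh' \
    'bob:x:1001:100::/home/bob:/bin/sh' \
    'ann:x:20:100::/home/ann:/bin/sh' > "$tmp/passwd.sample"

# Colon-separated fields; field 3 is compared numerically.
ex4=$(LC_ALL=C sort -t ':' -k 3n,3 "$tmp/passwd.sample")
printf '%s\n' "$ex4"

rm -rf "$tmp"
```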
Print the lines of a presorted input file, suppressing all but the first
occurrence of lines having the same third field:
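This can be sketched as a merge of a single file combined with the unique option. The file name infile and its contents are invented; the file is already ordered on its third field, as the merge option requires.

```shell
tmp=$(mktemp -d)
printf 'a b 1\nc d 1\ne f 2\n' > "$tmp/infile"

# -m: the input is presorted, so no sort phase is needed.
# -u: of the lines whose third field compares equal, keep one.
ex5=$(LC_ALL=C sort -u -m -k 3,3 "$tmp/infile")
printf '%s\n' "$ex5"

rm -rf "$tmp"
```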
DIAGNOSTICS
sort exits with one of the following values:
0  All input files were output successfully, or
-c was specified and the input file was correctly presorted.
1  Under the -c option, the file was not ordered as specified, or if
the -c and -u options were both specified, two input lines were found
with equal keys. This exit status is not returned if the -c
option is not used.
>1  An error occurred such as when one or more input lines are too
long.
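The exit values for the check option can be observed with a few one-line inputs, as in this sketch. The data is invented, and diagnostics on standard error are discarded so that only the status is examined.

```shell
# Correctly ordered input: status 0.
sorted=0
printf '1\n2\n' | LC_ALL=C sort -c || sorted=$?

# Out-of-order input: status 1.
unsorted=0
printf '2\n1\n' | LC_ALL=C sort -c 2>/dev/null || unsorted=$?

# Ordered, but with duplicate keys under -c -u: status 1.
dup=0
printf '1\n1\n' | LC_ALL=C sort -c -u 2>/dev/null || dup=$?

printf 'sorted=%s unsorted=%s duplicate=%s\n' "$sorted" "$unsorted" "$dup"
```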
When the last line of an input file is missing a new-line character,
sort appends one, prints a warning message, and continues.
If an error occurs when accessing the tables that contain the collation
rules for the specified language, sort prints a warning message and
defaults to the POSIX locale.
If a -d, -f, or -i option is specified for a language with multi-byte
characters, sort prints a warning message and ignores the option.
WARNINGS
Numbering of fields and characters within fields (-k option) has changed
to conform to the POSIX standard. Beginning at HP-UX Release 9.0, the -k
option numbers fields and characters within fields, starting with 1. Prior
to HP-UX Release 9.0, numbering started at 0.
A field separator specified by the -t option is recognized only if it is a
single-byte character.
The character type classification categories alpha, digit, space, and
print are not defined for multi-byte characters. For languages with
multi-byte characters, all characters are significant in comparisons.
For non-text input files, the behavior is undefined.
AUTHOR
sort was developed by OSF and HP.
FILES
SEE ALSO
comm(1), join(1), uniq(1), environ(5), lang(5), standards(5).
STANDARDS CONFORMANCE