localedef(4)

localedef -- format for locale specification source file

Description

This manual page describes the format of the source file used as input to the localedef command. This file (called a localedef(4) file in this manual page) defines information about a locale. The localedef command uses the localization specifications in the source file to create the data files that the localization system uses to map system input and output into a local format.

A locale defines your execution environment in terms of language and cultural conventions. It consists of one or more categories corresponding to these environment variable names:

LC_COLLATE: Collation order used by commands such as sort and uniq.
LC_CTYPE: Character classification and case conversion.
LC_MESSAGES: Formats of informative and diagnostic messages and interactive responses.
LC_MONETARY: Monetary formatting.
LC_NUMERIC: Numeric, non-monetary formatting.
LC_TIME: Date and time formats.

When you run the localedef command, only the categories defined in the input localedef(4) file are defined for the locale that is created.

Category source definition

The localedef(4) file consists of one or more category source definitions. A category source definition consists of a category header (one of the category names listed above), the category body (one or more lines of text), and a trailer line consisting of the string END followed by the category name (for example, END LC_CTYPE).

A category body contains lines of text. Each line contains an identifier and (optionally) operands. Identifiers are keywords (identifying a locale element) or collating elements. Operands may be characters, strings of characters, or collating elements. (Strings are enclosed in double quotes; literal double quotes in strings should be escaped by a preceding backslash.)

The localedef(4) file may contain comments. By default, the comment character is a pound sign (#); blank lines and lines with the comment character in the first position are ignored. To change the comment character, precede the first category header in the file with a line of the following format:

   comment_char c

where c is the alternative comment character to use.

Lines can be continued by placing a backslash at the end of the line; the backslash and subsequent newline character are ignored. Comment lines are not treated in this way (the backslash is ignored, because it is read as part of the comment).
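
The line-level rules above (comment character, blank lines, continuation, header and END) can be sketched as a small reader. This is only an illustration under stated assumptions, not how localedef itself is implemented: the function names logical_lines and split_categories are hypothetical, the sample locale source is made up, and the sketch accepts a comment_char line anywhere rather than only before the first category header.

```python
CATEGORIES = {"LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
              "LC_MONETARY", "LC_NUMERIC", "LC_TIME"}


def logical_lines(text):
    """Yield logical lines: comments and blanks dropped, continuations joined."""
    comment_char = "#"
    pending = ""  # text collected from earlier continued lines
    for raw in text.splitlines():
        if not pending:
            # Comment and blank lines are dropped whole, so a trailing
            # backslash on a comment line never starts a continuation.
            if not raw.strip() or raw.startswith(comment_char):
                continue
            fields = raw.split()
            if fields[0] == "comment_char" and len(fields) == 2:
                comment_char = fields[1]
                continue
        if raw.endswith("\\"):
            pending += raw[:-1]  # drop the backslash and the newline
            continue
        yield pending + raw
        pending = ""


def split_categories(text):
    """Map each category header to the list of lines in its body."""
    sections, current = {}, None
    for line in logical_lines(text):
        fields = line.split()
        if not fields:
            continue
        if current is None and fields[0] in CATEGORIES:
            current = fields[0]
            sections[current] = []
        elif current is not None and fields[0] == "END":
            current = None
        elif current is not None:
            sections[current].append(line.strip())
    return sections


# Made-up sample input: '%' replaces '#' as the comment character.
SAMPLE = """\
comment_char %
% this whole line is a comment \\
LC_NUMERIC
decimal_point "<period>"
grouping 3;\\
3
END LC_NUMERIC
"""

sections = split_categories(SAMPLE)
print(sections)
```

Note how the continued grouping line is joined into a single logical line, while the backslash at the end of the comment line has no effect.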

Individual characters, characters in strings, and collating elements are represented by symbolic names. Alternatively, characters can be represented by themselves, or as octal, hexadecimal, or decimal values. Octal values are two or three digits preceded by a backslash; hexadecimal values are two digits preceded by \x; and decimal values are two or more digits preceded by \d.
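
As an illustration of the three numeric notations, the following sketch decodes them with one regular expression. The function name decode_numeric_escapes is hypothetical, and treating each value as a single character code is a simplifying assumption; localedef itself works with the coded character set named in the charmap file.

```python
import re

# \xHH (two hex digits), \dNN... (two or more decimal digits),
# \OOO (two or three octal digits), as described above.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})|\\d([0-9]{2,})|\\([0-7]{2,3})")


def decode_numeric_escapes(operand):
    """Replace numeric escapes in operand with the characters they denote."""
    def replace(match):
        hex_digits, dec_digits, oct_digits = match.groups()
        if hex_digits is not None:
            value = int(hex_digits, 16)
        elif dec_digits is not None:
            value = int(dec_digits, 10)
        else:
            value = int(oct_digits, 8)
        return chr(value)  # assumption: one value maps to one character
    return _ESCAPE.sub(replace, operand)


# The same three letters, written in hexadecimal, decimal, and octal.
print(decode_numeric_escapes(r"\x41\d66\103"))
```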

Symbolic names are enclosed in angle brackets (< and >). The symbolic name (including the angle brackets) must match a symbolic name defined in the charmap file specified by localedef -f.

LC_CTYPE

This category section defines character classification, case conversion, and other character attributes. In addition, a series of consecutive characters within a single character set can be abbreviated with an ellipsis, written as three adjacent periods (...).

The following keywords are recognized:

upper: Defines characters classified as uppercase.
lower: Defines characters classified as lowercase.
alpha: Defines characters classified as letters (upper- or lowercase).
digit: Defines characters classified as digits.
space: Defines characters classified as whitespace.
cntrl: Defines characters classified as control characters.
punct: Defines characters classified as punctuation.
graph: Defines characters classified as printable characters, not including the space character.
print: Defines characters classified as printable characters, including the space character.
xdigit: Defines characters classified as hexadecimal digits. (Note that both the uppercase and lowercase versions of the letters "A" to "F" are included in the POSIX locale; this practice is recommended.)
blank: Defines characters classified as blank characters.
toupper: Defines the mapping of lowercase letters to uppercase letters. (The operand consists of character pairs separated by semicolons; the characters in each pair are separated by a comma and enclosed in parentheses. The first character is lowercase, the second is its uppercase equivalent.)
tolower: Defines the mapping of uppercase letters to lowercase letters. (Same format as in toupper, except that the first character in each pair is uppercase and the second is lowercase.)
copy: Specifies an existing locale to be used as the definition of this category. If this keyword is specified, no other keyword is allowed.
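
The operand format of toupper and tolower can be illustrated with a small parser. This is a sketch only: parse_case_pairs is a hypothetical helper, the sample operand is made up, and the sketch does not handle numeric escapes or validate names against a charmap file.

```python
import re

# One pair: "(" first "," second ")", where each member is either a
# symbolic name such as <a> or a single literal character.
_PAIR = re.compile(r"\(\s*(<[^>]+>|.)\s*,\s*(<[^>]+>|.)\s*\)")


def parse_case_pairs(operand):
    """Return a dict mapping the first member of each pair to the second."""
    return {first: second for first, second in _PAIR.findall(operand)}


# A toupper operand lists (lowercase, uppercase) pairs.
to_upper = parse_case_pairs("(<a>,<A>);(<b>,<B>);(c,C)")

# A tolower operand has the same shape with the members swapped, so
# inverting the mapping gives the pairs a tolower line would list.
to_lower = {upper: lower for lower, upper in to_upper.items()}

print(to_upper)
print(to_lower)
```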

Note that some of the above character classes are mutually exclusive or automatically inclusive. For example, a character cannot be classified as both alpha and digit. See the X/Open Interface Definitions, Version 4 Issue 2 for further details.

LC_COLLATE

This category section provides a collation sequence for utilities that rely on sorting (such as sort, uniq, and awk), for regular expression matching, and for some system calls.

A collation sequence defines the order between collating elements (characters and multibyte collating sequences) in the locale. The following keywords are recognized:

collating-element

Defines a collating-element symbol representing a multicharacter collating element.

collating-symbol

Defines a collating symbol for use in collation order statements.

order_start

Defines collation rules. This statement is followed by one or more collation order statements that assign character collation values and weights to collating elements. The syntax of the order_start statement is:

   order_start <sort_rules>;<sort_rules> ... ;

The following sort_rules directives are allowed:

backward: Specifies that comparison operations for the weight level proceed from the end of the string towards the beginning of the string.
forward: Specifies that comparison operations for the weight level proceed from the beginning of the string towards the end of the string.
position: Specifies that comparison operations for the weight level will consider the relative position of elements in the strings not subject to IGNORE.

order_end

Defines the end of collation order statements.

copy

Specifies an existing locale to copy the collating sequence from. (If used, no other keyword is permitted.)
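
To see what the forward and backward directives change, the following sketch compares words on two weight levels. It is an illustration under stated assumptions, not the collation algorithm of any particular system: the WEIGHTS table is hypothetical (a primary weight for the base letter, a secondary weight for the accent), the position directive is not modeled, and sort_key is a made-up helper.

```python
# Hypothetical (primary, secondary) weights for a handful of characters.
WEIGHTS = {
    "c": (1, 0),
    "e": (2, 0), "\u00e9": (2, 1),   # e, e with acute accent
    "o": (3, 0), "\u00f4": (3, 1),   # o, o with circumflex
    "t": (4, 0),
}


def sort_key(word, rules=("forward", "backward")):
    """Build one list of weights per level, reversed for 'backward' levels."""
    key = []
    for level, rule in enumerate(rules):
        weights = [WEIGHTS[ch][level] for ch in word]
        if rule == "backward":
            weights.reverse()  # compare from the end of the string
        key.append(weights)
    return key


words = ["c\u00f4t\u00e9", "cote", "cot\u00e9", "c\u00f4te"]

# Second level compared backward: the last accent difference decides.
backward_second = sorted(words, key=sort_key)

# Second level compared forward: the first accent difference decides.
forward_second = sorted(
    words, key=lambda w: sort_key(w, ("forward", "forward")))

print(backward_second)
print(forward_second)
```

All four words have identical primary weights, so the order is decided entirely on the second level; only the direction of that comparison differs between the two results.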

See the X/Open Interface Definitions, Version 4 Issue 2 for further details.

LC_MONETARY

This category section defines the rules and symbols used to format monetary numeric information. The following items are defined in this section:

int_curr_symbol

Specifies the international currency symbol. This is a four-character string; the first three characters contain the alphabetic international currency symbol (in accordance with ISO 4217:1987), while the fourth character is the separator used between the currency symbol and the monetary quantity.

currency_symbol

Specifies the string used as the local currency symbol.

mon_decimal_point

Specifies the symbol used as the decimal delimiter in monetary formatted quantities.

mon_thousands_sep

Specifies the symbol used as a separator for groups of digits to the left of a decimal delimiter in monetary formatted quantities.

mon_grouping

Specifies the size of each group of digits in formatted monetary quantities. Requires as an operand a sequence of integers separated by semicolons. Each integer specifies the number of digits in each group, with the initial integer defining the size of the group immediately preceding the decimal delimiter, and the following integers defining the groups before that. If the last integer is not -1, then the size of the previous group will be repeatedly used for the remainder of the digits. If the last integer is -1, then no further grouping is performed.
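
The grouping rule can be sketched as follows. The helper group_digits is hypothetical and only mirrors the description above: sizes are consumed from the decimal delimiter leftwards, the last size repeats unless the list ends in -1, and the separator stands in for mon_thousands_sep.

```python
def group_digits(digits, grouping, sep):
    """Insert sep into a string of integer digits according to grouping."""
    sizes = list(grouping)
    repeat = True
    if sizes and sizes[-1] == -1:
        sizes.pop()       # -1 is a marker, not a group size
        repeat = False
    if not sizes:
        return digits     # no grouping defined
    groups = []
    rest = digits
    index = 0
    while rest:
        if index < len(sizes):
            size = sizes[index]
        elif repeat:
            size = sizes[-1]      # reuse the last size given
        else:
            groups.append(rest)   # -1 seen: leave the remainder ungrouped
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
        index += 1
    return sep.join(reversed(groups))


print(group_digits("1234567", [3], ","))      # last size repeats
print(group_digits("1234567", [3, 2], ","))   # 3, then 2 repeated
print(group_digits("1234567", [3, -1], ","))  # one group only
```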

positive_sign

Specifies the sign used to indicate a non-negative formatted monetary quantity.

negative_sign

Specifies the sign used to indicate a negative formatted monetary quantity.

int_frac_digits

Specifies an integer representing the number of fractional digits to be written in a formatted monetary quantity using int_curr_symbol.

frac_digits

Specifies an integer representing the number of fractional digits to be written in a formatted monetary quantity using currency_symbol.

p_cs_precedes

Specifies that the currency_symbol or int_curr_symbol precedes the monetary quantity (if the value is 1) or follows the monetary quantity (if the value is 0). This applies only to non-negative monetary quantities.

p_sep_by_space

Specifies that no space separates the currency symbol from the monetary quantity (if value is 0), or that a space separates the currency symbol from the quantity (if value is 1), or that a space separates the symbol and the sign string (if value is 2). This only applies to non-negative monetary quantities.

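n_cs_precedes

Specifies that the currency_symbol or int_curr_symbol precedes the monetary quantity (if the value is 1) or follows the monetary quantity (if the value is 0). This applies only to negative monetary quantities.

n_sep_by_space

Specifies how the currency symbol is separated from a negative monetary quantity. (Accepts the same values as p_sep_by_space.)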

p_sign_posn

Specifies the positioning of the positive_sign for a non-negative monetary quantity. The following values are possible:

0: Parentheses enclose the quantity and the currency symbol.
1: The sign string precedes the quantity and the currency symbol.
2: The sign string follows the quantity and the currency symbol.
3: The sign string precedes the currency symbol.
4: The sign string follows the currency symbol.

n_sign_posn

Specifies the positioning of the negative_sign for a negative monetary quantity. (Accepts the same values as p_sign_posn.)
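
The five sign positions can be illustrated with a small formatter. This is a sketch under simplifying assumptions: place_sign is a hypothetical helper, the currency symbol and sign strings are made up, and space separation (p_sep_by_space and n_sep_by_space) is ignored, as if those values were 0.

```python
def place_sign(quantity, symbol, sign, cs_precedes, sign_posn):
    """Combine quantity, currency symbol, and sign string.

    cs_precedes is 1 if the symbol precedes the quantity, else 0;
    sign_posn takes the values 0 to 4 described above.
    """
    if sign_posn == 3:
        symbol = sign + symbol      # sign precedes the currency symbol
    elif sign_posn == 4:
        symbol = symbol + sign      # sign follows the currency symbol
    body = symbol + quantity if cs_precedes else quantity + symbol
    if sign_posn == 0:
        return "(" + body + ")"     # parentheses enclose both
    if sign_posn == 1:
        return sign + body          # sign precedes both
    if sign_posn == 2:
        return body + sign          # sign follows both
    return body


for posn in range(5):
    print(posn, place_sign("1.25", "$", "-", 1, posn))
```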

copy

Specifies the name of an existing locale from which to copy the definition of this section. If specified, no other keywords are allowed.

See the X/Open Interface Definitions, Version 4 Issue 2 for further details.

LC_NUMERIC

This category defines the rules and symbols used to format non-monetary numeric information. The following keywords are recognized:

copy: Specifies the name of an existing locale to use as the definition for this category. (If specified, no other keywords are allowed.)
decimal_point: Specifies a string containing the symbol used as the decimal delimiter. This keyword cannot be omitted and cannot be set to the empty string.
grouping: Specifies the size of each group of digits to the left of the decimal point (or other separator). The operand is a sequence of integers separated by semicolons. Each integer specifies the number of digits in one group, starting with the group immediately to the left of the decimal point and moving leftwards. If the last integer is not -1, the size of the last group is repeatedly applied to the remaining digits; if the last integer is -1, then no further grouping is applied.
thousands_sep: Specifies the symbol used as a separator for groups of digits to the left of the decimal point.
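
Once a locale has been compiled and selected, programs can read these values back at run time. The sketch below uses the localeconv() function of Python's standard locale module and selects the "C" locale so that it does not depend on which locales are installed. Note one difference from the source format, stated here as background rather than taken from this page: the run-time grouping list marks "no further grouping" with CHAR_MAX rather than -1.

```python
import locale

# Select the "C" locale for every category so the result is predictable.
locale.setlocale(locale.LC_ALL, "C")
conv = locale.localeconv()

# These three entries correspond to the LC_NUMERIC keywords above.
print("decimal_point:", repr(conv["decimal_point"]))
print("thousands_sep:", repr(conv["thousands_sep"]))
print("grouping:     ", conv["grouping"])
```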

See the X/Open Interface Definitions, Version 4 Issue 2 for further details.

LC_TIME

This category defines the formatting of time and date values. The definitions imply a Gregorian style calendar: formatting time strings for other types of calendar is outside the scope of the X/Open specifications.

The following mandatory keywords are recognized:

abday: Specify the abbreviated weekday names, corresponding to the %a field descriptor. The operand consists of seven semicolon-separated strings, each surrounded by double quotes. The first string corresponds to Sunday, the second to Monday, and so on.
day: Specify the full weekday names, corresponding to the %A field descriptor. The operand consists of seven semicolon-separated strings, each surrounded by double quotes. The first string corresponds to Sunday, the second to Monday, and so on.
abmon: Specify the abbreviated month names, corresponding to the %b field descriptor. The operand consists of twelve semicolon-separated strings, each surrounded by double quotes. The first string corresponds to the first month (January), the second to the second month, and so on.
mon: Specify the full month names, corresponding to the %B field descriptor. The operand consists of twelve semicolon-separated strings, each surrounded by double quotes. The first string corresponds to the first month (January), the second to the second month, and so on.
d_t_fmt: Specify the date and time representation, corresponding to the %c field descriptor. The operand consists of a string and can contain any combination of characters and field descriptors. In addition, the string can contain escape sequences (\\, \a, \b, \f, \n, \r, \t, \v).
d_fmt: Specify the date representation, corresponding to the %x field descriptor. (Operand as for d_t_fmt.)
t_fmt: Specify the time representation, corresponding to the %X field descriptor. (Operand as for d_t_fmt.)
am_pm: Specify the AM/PM representation, corresponding to the %p field descriptor. The operand consists of two double-quoted strings, separated by a semicolon.
t_fmt_ampm: Specify the time representation in the 12-hour format, corresponding to the %r field descriptor. (Operand as for d_t_fmt.)
copy: Specifies the name of an existing locale from which to copy the definition of this section. If specified, no other keywords are allowed.
ABDAY_x: The abbreviated weekday names, where x is a number from 1 to 7.
DAY_x: The full weekday names, where x is a number from 1 to 7.
ABMON_x: The abbreviated month names, where x is a number from 1 to 12.
MON_x: The full month names, where x is a number from 1 to 12.
D_T_FMT: The appropriate date and time representation.
D_FMT: The appropriate date representation.
T_FMT: The appropriate time representation.
AM_STR: The appropriate AM affix.
PM_STR: The appropriate PM affix.
T_FMT_AMPM: The appropriate time representation in the 12-hour clock format with AM_STR and PM_STR.
ERA: The Era description segments (specifying how years are counted and displayed for each era in a locale).
ERA_D_FMT: The era date format.
ALT_DIGITS: Specifies the alternative symbols for digits, corresponding to the %O conversion specification modifier. The operands consist of semicolon-separated symbols. The first is the alternative symbol for zero, the second is the alternative symbol for one, and so on. (A maximum of 100 alternative symbols are supported.)
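
The field descriptors named above are the ones a program passes to strftime. The following sketch shows which LC_TIME keyword supplies each piece of output; it selects the "C" locale for LC_TIME so that the names are the familiar English ones, and the sample date is arbitrary.

```python
import datetime
import locale

# Use the "C" locale for date and time formatting only.
locale.setlocale(locale.LC_TIME, "C")

sample = datetime.datetime(2024, 1, 7, 15, 4, 5)  # a Sunday afternoon

# %a, %A, %b, %B are filled from abday, day, abmon, and mon.
names = sample.strftime("%a|%A|%b|%B")

# %p is filled from am_pm.
ampm = sample.strftime("%p")

print(names)
print(ampm)

# %c, %x, %X, %r are filled from d_t_fmt, d_fmt, t_fmt, and t_fmt_ampm.
for descriptor in ("%c", "%x", "%X", "%r"):
    print(descriptor, sample.strftime(descriptor))
```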

See the X/Open Interface Definitions, Version 4 Issue 2 for further details.

LC_MESSAGES

This category defines the format and values for affirmative or negative responses. The following keywords are recognized:

yesexpr: Specifies an operand that describes the acceptable affirmative response to a question expecting a yes/no response. Operand is an extended regular expression.
noexpr: Specifies an operand that describes the acceptable negative response to a question expecting a yes/no response. Operand is an extended regular expression.
copy: Specifies the name of an existing locale from which to copy the definition of this section. If specified, no other keywords are allowed.
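
A program that asks a yes/no question matches the reply against these two expressions. The sketch below assumes the values "^[yY]" and "^[nN]", which are the ones used by the POSIX locale; classify is a hypothetical helper, and Python's re module is not a POSIX extended regular expression engine, although it agrees with one for patterns this simple.

```python
import re

# Assumed operand values, as in the POSIX locale.
YESEXPR = r"^[yY]"
NOEXPR = r"^[nN]"


def classify(response):
    """Return 'yes', 'no', or None for an unrecognized response."""
    if re.search(YESEXPR, response):
        return "yes"
    if re.search(NOEXPR, response):
        return "no"
    return None


for reply in ("Yes", "nope", "maybe"):
    print(reply, "->", classify(reply))
```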

See the X/Open System Interface Definitions, Version 4 Issue 2 for further details.

References

awk(1), localedef(1), locale(1), locale(4), sort(1), uniq(1).

Standards conformance

localedef(4) conforms to the X/Open Interface Definitions, Version 4 Issue 2.