NLS(7) | Miscellaneous Information Manual | NLS(7) |
All information pertaining to cultural conventions and language is obtained at program run time.
“Internationalization” (often abbreviated “i18n”) refers to the process by which system software is developed to support multiple culture-specific and language-specific conventions. This is a generalization process by which the system is freed from depending only on English strings or other English-specific conventions. “Localization” (often abbreviated “l10n”) refers to the operations by which the user environment is customized so that input and output are handled in the manner appropriate for a specific language and set of cultural conventions. This is a specialization process by which generic methods already implemented in an internationalized system are used in specific ways. The formal description of the cultural conventions for a country, together with all associated translations targeted to the native language, is called the “locale”.
NetBSD provides extensive support to programmers and system developers to enable internationalized software to be developed. NetBSD also supplies a large variety of locales for system localization.
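A locale is divided into categories. A category is a group of related language-specific and culture-specific conventions. ISO C defines five standard categories (LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, and LC_TIME), and POSIX adds a sixth, LC_MESSAGES. NetBSD supports all six:
Category | Conventions covered |
LC_COLLATE | String collation order |
LC_CTYPE | Character classification and case conversion |
LC_MESSAGES | Formats of affirmative and negative responses, and the language of message catalogs |
LC_MONETARY | Formatting of monetary values |
LC_NUMERIC | Formatting of non-monetary numeric values |
LC_TIME | Formatting of dates and times |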
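In C, each category is selected by the macro of the same name passed to setlocale(3); passing a null pointer as the locale argument queries the current setting without changing it. The following is a minimal sketch; the helper names category_locale and print_categories are hypothetical, and LC_MESSAGES is assumed to be available from <locale.h>, as POSIX requires.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Pair each category macro with a printable name. */
struct category {
    int         id;
    const char *name;
};

static const struct category categories[] = {
    { LC_COLLATE,  "LC_COLLATE"  },
    { LC_CTYPE,    "LC_CTYPE"    },
    { LC_MESSAGES, "LC_MESSAGES" },
    { LC_MONETARY, "LC_MONETARY" },
    { LC_NUMERIC,  "LC_NUMERIC"  },
    { LC_TIME,     "LC_TIME"     },
};

/* Return the current locale name of one category, or NULL. */
const char *category_locale(int id)
{
    /* A NULL locale argument queries the setting without modifying it. */
    return setlocale(id, NULL);
}

/* Hypothetical helper: print the current setting of every category. */
void print_categories(FILE *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < sizeof categories / sizeof categories[0]; i++) {
        const char *cur = category_locale(categories[i].id);
        fprintf(out, "%-12s %s\n", categories[i].name,
                cur != NULL ? cur : "(unknown)");
    }
}
```

For example, calling setlocale(LC_ALL, "") and then print_categories(stdout) would list the locale that each category picked up from the environment.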
Localization of the system is achieved by setting environment variables that identify which locale should be used. These environment variables have the same names as their respective locale categories. The LANG, LC_ALL, and NLSPATH environment variables are also used: LC_ALL and LANG also determine the current locale, while NLSPATH specifies a colon-separated list of pathname templates used to locate the message catalog files of the NLS database.
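catopen(3) does not treat NLSPATH entries as plain directory names: POSIX defines them as templates in which %N is replaced by the catalog name and %L by the name of the current LC_MESSAGES locale (further conversions such as %l, %t and %c also exist). A minimal sketch of that substitution for a single template follows; expand_nlspath is a hypothetical helper that handles only %N, %L and %%, and the template in the usage note is illustrative rather than NetBSD's actual default.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: expand one NLSPATH template into buf.
 * Handles %N (catalog name), %L (locale name) and %% only; any other
 * character, including an unrecognized %, is copied through as-is.
 * Returns 0 on success, -1 if buf is too small.
 */
int expand_nlspath(const char *tmpl, const char *name, const char *locale,
                   char *buf, size_t size)
{
    size_t len = 0;

    assert(tmpl != NULL && name != NULL && locale != NULL && buf != NULL);
    if (size == 0)
        return -1;
    for (const char *p = tmpl; *p != '\0'; p++) {
        char one[2] = { *p, '\0' };
        const char *piece = one;

        if (p[0] == '%' && p[1] == 'N') {
            piece = name;
            p++;
        } else if (p[0] == '%' && p[1] == 'L') {
            piece = locale;
            p++;
        } else if (p[0] == '%' && p[1] == '%') {
            piece = "%";
            p++;
        }

        size_t n = strlen(piece);
        if (len + n >= size)
            return -1;          /* keep room for the terminating NUL */
        memcpy(buf + len, piece, n);
        len += n;
    }
    buf[len] = '\0';
    return 0;
}
```

For instance, expanding /usr/share/nls/%L/%N.cat for the catalog hello in the locale da_DK.ISO8859-1 yields /usr/share/nls/da_DK.ISO8859-1/hello.cat.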
The value of each of these environment variables is a string of the form:
language[_territory][.codeset][@modifier]
Valid values for the language field come from the ISO 639 standard, which defines two-letter codes for many languages. In locale names the codes are written in lower case. Some common language codes are:
Language Name | Code | Language Family |
ABKHAZIAN | AB | IBERO-CAUCASIAN |
AFAN (OROMO) | OM | HAMITIC |
AFAR | AA | HAMITIC |
AFRIKAANS | AF | GERMANIC |
ALBANIAN | SQ | INDO-EUROPEAN (OTHER) |
AMHARIC | AM | SEMITIC |
ARABIC | AR | SEMITIC |
ARMENIAN | HY | INDO-EUROPEAN (OTHER) |
ASSAMESE | AS | INDIAN |
AYMARA | AY | AMERINDIAN |
AZERBAIJANI | AZ | TURKIC/ALTAIC |
BASHKIR | BA | TURKIC/ALTAIC |
BASQUE | EU | BASQUE |
BENGALI | BN | INDIAN |
BHUTANI | DZ | ASIAN |
BIHARI | BH | INDIAN |
BISLAMA | BI | |
BRETON | BR | CELTIC |
BULGARIAN | BG | SLAVIC |
BURMESE | MY | ASIAN |
BYELORUSSIAN | BE | SLAVIC |
CAMBODIAN | KM | ASIAN |
CATALAN | CA | ROMANCE |
CHINESE | ZH | ASIAN |
CORSICAN | CO | ROMANCE |
CROATIAN | HR | SLAVIC |
CZECH | CS | SLAVIC |
DANISH | DA | GERMANIC |
DUTCH | NL | GERMANIC |
ENGLISH | EN | GERMANIC |
ESPERANTO | EO | INTERNATIONAL AUX. |
ESTONIAN | ET | FINNO-UGRIC |
FAROESE | FO | GERMANIC |
FIJI | FJ | OCEANIC/INDONESIAN |
FINNISH | FI | FINNO-UGRIC |
FRENCH | FR | ROMANCE |
FRISIAN | FY | GERMANIC |
GALICIAN | GL | ROMANCE |
GEORGIAN | KA | IBERO-CAUCASIAN |
GERMAN | DE | GERMANIC |
GREEK | EL | LATIN/GREEK |
GREENLANDIC | KL | ESKIMO |
GUARANI | GN | AMERINDIAN |
GUJARATI | GU | INDIAN |
HAUSA | HA | NEGRO-AFRICAN |
HEBREW | HE | SEMITIC |
HINDI | HI | INDIAN |
HUNGARIAN | HU | FINNO-UGRIC |
ICELANDIC | IS | GERMANIC |
INDONESIAN | ID | OCEANIC/INDONESIAN |
INTERLINGUA | IA | INTERNATIONAL AUX. |
INTERLINGUE | IE | INTERNATIONAL AUX. |
INUKTITUT | IU | |
INUPIAK | IK | ESKIMO |
IRISH | GA | CELTIC |
ITALIAN | IT | ROMANCE |
JAPANESE | JA | ASIAN |
JAVANESE | JV | OCEANIC/INDONESIAN |
KANNADA | KN | DRAVIDIAN |
KASHMIRI | KS | INDIAN |
KAZAKH | KK | TURKIC/ALTAIC |
KINYARWANDA | RW | NEGRO-AFRICAN |
KIRGHIZ | KY | TURKIC/ALTAIC |
KURUNDI | RN | NEGRO-AFRICAN |
KOREAN | KO | ASIAN |
KURDISH | KU | IRANIAN |
LAOTHIAN | LO | ASIAN |
LATIN | LA | LATIN/GREEK |
LATVIAN | LV | BALTIC |
LINGALA | LN | NEGRO-AFRICAN |
LITHUANIAN | LT | BALTIC |
MACEDONIAN | MK | SLAVIC |
MALAGASY | MG | OCEANIC/INDONESIAN |
MALAY | MS | OCEANIC/INDONESIAN |
MALAYALAM | ML | DRAVIDIAN |
MALTESE | MT | SEMITIC |
MAORI | MI | OCEANIC/INDONESIAN |
MARATHI | MR | INDIAN |
MOLDAVIAN | MO | ROMANCE |
MONGOLIAN | MN | |
NAURU | NA | |
NEPALI | NE | INDIAN |
NORWEGIAN | NO | GERMANIC |
OCCITAN | OC | ROMANCE |
ORIYA | OR | INDIAN |
PASHTO | PS | IRANIAN |
PERSIAN (FARSI) | FA | IRANIAN |
POLISH | PL | SLAVIC |
PORTUGUESE | PT | ROMANCE |
PUNJABI | PA | INDIAN |
QUECHUA | QU | AMERINDIAN |
RHAETO-ROMANCE | RM | ROMANCE |
ROMANIAN | RO | ROMANCE |
RUSSIAN | RU | SLAVIC |
SAMOAN | SM | OCEANIC/INDONESIAN |
SANGHO | SG | NEGRO-AFRICAN |
SANSKRIT | SA | INDIAN |
SCOTS GAELIC | GD | CELTIC |
SERBIAN | SR | SLAVIC |
SERBO-CROATIAN | SH | SLAVIC |
SESOTHO | ST | NEGRO-AFRICAN |
SETSWANA | TN | NEGRO-AFRICAN |
SHONA | SN | NEGRO-AFRICAN |
SINDHI | SD | INDIAN |
SINGHALESE | SI | INDIAN |
SISWATI | SS | NEGRO-AFRICAN |
SLOVAK | SK | SLAVIC |
SLOVENIAN | SL | SLAVIC |
SOMALI | SO | HAMITIC |
SPANISH | ES | ROMANCE |
SUNDANESE | SU | OCEANIC/INDONESIAN |
SWAHILI | SW | NEGRO-AFRICAN |
SWEDISH | SV | GERMANIC |
TAGALOG | TL | OCEANIC/INDONESIAN |
TAJIK | TG | IRANIAN |
TAMIL | TA | DRAVIDIAN |
TATAR | TT | TURKIC/ALTAIC |
TELUGU | TE | DRAVIDIAN |
THAI | TH | ASIAN |
TIBETAN | BO | ASIAN |
TIGRINYA | TI | SEMITIC |
TONGA | TO | OCEANIC/INDONESIAN |
TSONGA | TS | NEGRO-AFRICAN |
TURKISH | TR | TURKIC/ALTAIC |
TURKMEN | TK | TURKIC/ALTAIC |
TWI | TW | NEGRO-AFRICAN |
UIGUR | UG | |
UKRAINIAN | UK | SLAVIC |
URDU | UR | INDIAN |
UZBEK | UZ | TURKIC/ALTAIC |
VIETNAMESE | VI | ASIAN |
VOLAPUK | VO | INTERNATIONAL AUX. |
WELSH | CY | CELTIC |
WOLOF | WO | NEGRO-AFRICAN |
XHOSA | XH | NEGRO-AFRICAN |
YIDDISH | YI | GERMANIC |
YORUBA | YO | NEGRO-AFRICAN |
ZHUANG | ZA | |
ZULU | ZU | NEGRO-AFRICAN |
For example, the locale for the Danish language as spoken in Denmark, using the ISO 8859-1 character set, is da_DK.ISO8859-1. Here da stands for the Danish language and DK for Denmark. The shorter form da_DK is also sufficient to indicate this locale.
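The C library splits locale names internally and does not expose that parser, so the following is only an illustrative sketch of the language[_territory][.codeset][@modifier] grammar. The struct locale_name type, its field sizes, and parse_locale_name are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_name {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy n bytes of src into dst (capacity cap) and NUL-terminate. */
static int copy_field(char *dst, size_t cap, const char *src, size_t n)
{
    if (n >= cap)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Absent optional fields are left as empty strings.
 * Returns 0 on success, -1 if the language is empty or a field is too long.
 */
int parse_locale_name(const char *s, struct locale_name *out)
{
    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    size_t lang = strcspn(s, "_.@");
    if (lang == 0 || copy_field(out->language, sizeof out->language, s, lang) != 0)
        return -1;
    s += lang;

    if (*s == '_') {                     /* territory runs up to '.' or '@' */
        size_t n = strcspn(++s, ".@");
        if (copy_field(out->territory, sizeof out->territory, s, n) != 0)
            return -1;
        s += n;
    }
    if (*s == '.') {                     /* codeset runs up to '@' */
        size_t n = strcspn(++s, "@");
        if (copy_field(out->codeset, sizeof out->codeset, s, n) != 0)
            return -1;
        s += n;
    }
    if (*s == '@') {                     /* modifier is the remainder */
        s++;
        if (copy_field(out->modifier, sizeof out->modifier, s, strlen(s)) != 0)
            return -1;
    }
    return 0;
}
```

Applied to da_DK.ISO8859-1, this sketch yields language da, territory DK, and codeset ISO8859-1, with an empty modifier.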
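The environment variable settings are queried in the following order of priority. If LC_ALL is set to a non-empty value, it is used for every category. Otherwise, the variable named after the category (for example, LC_CTYPE for character classification) is used if it is set to a non-empty value. Otherwise, LANG is used if it is set to a non-empty value. If none of these is set, the default “C” locale is used.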
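POSIX specifies the lookup order for each category: a non-empty LC_ALL wins, then the non-empty variable named after the category, then a non-empty LANG, and finally an implementation default, typically the C locale. setlocale(LC_ALL, "") performs this resolution itself; the sketch below merely mirrors it for illustration, and effective_locale is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the value of an environment variable, or NULL if unset or empty. */
static const char *nonempty_env(const char *var)
{
    const char *v = getenv(var);
    return (v != NULL && *v != '\0') ? v : NULL;
}

/*
 * Hypothetical helper mirroring the POSIX lookup order for one category:
 * LC_ALL, then the category variable (e.g. "LC_CTYPE"), then LANG,
 * then the "C" default.
 */
const char *effective_locale(const char *category_var)
{
    const char *v;

    assert(category_var != NULL);
    if ((v = nonempty_env("LC_ALL")) != NULL)
        return v;
    if ((v = nonempty_env(category_var)) != NULL)
        return v;
    if ((v = nonempty_env("LANG")) != NULL)
        return v;
    return "C";
}
```

Note that an empty value is treated the same as an unset variable, so setting LC_ALL to the empty string does not override the other variables.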
The following character sets are supported in NetBSD:
The NetBSD wscons(4) console provides support for loading fonts using the wsfontload(8) utility. Currently, only fonts for the ISO8859-1 family of character sets are supported.
Access to locale information is provided through the setlocale(3) and nl_langinfo(3) interfaces. See their respective man pages for further information.
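A minimal sketch of both interfaces: setlocale(3) adopts a locale (an empty string means "take it from the environment"), and nl_langinfo(3) then reports individual items such as CODESET and RADIXCHAR. The function report_locale and its output format are hypothetical.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/*
 * Switch every category to `requested` ("" means: use LC_ALL, the LC_*
 * variables and LANG), then print the codeset and radix character.
 * Returns 0 on success, -1 if the locale is unavailable; in that case
 * the previous locale stays in effect.
 */
int report_locale(FILE *out, const char *requested)
{
    assert(out != NULL && requested != NULL);
    const char *name = setlocale(LC_ALL, requested);
    if (name == NULL)
        return -1;
    fprintf(out, "locale:  %s\n", name);
    fprintf(out, "codeset: %s\n", nl_langinfo(CODESET));
    fprintf(out, "radix:   %s\n", nl_langinfo(RADIXCHAR));
    return 0;
}
```

Calling report_locale(stdout, "") adopts the locale named by the environment; passing "C" selects the portable default, in which the radix character is a period.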
Message source files containing application messages are created by the programmer and converted to message catalogs. These catalogs are used by the application to retrieve and display messages, as needed.
NetBSD supports two message catalog interfaces: the X/Open catgets(3) interface and the Uniforum gettext(3) interface. The catgets(3) interface has the advantage of belonging to a well-supported standard. Unfortunately, the interface is complicated to use, the catalogs are difficult to maintain, and the implementation does not support different character sets. The gettext(3) interface has not yet been standardized; however, it is supported by an increasing number of systems, and it comes with many additional tools that make programming and catalog maintenance much easier.
A wide character is specified in ISO C as being a fixed number of bits wide and stateless. There are two types for wide characters: wchar_t and wint_t. wchar_t can hold one wide character, much as the char type holds one single-byte character. wint_t can hold any wide character or WEOF (the wide-character end-of-file value).
Several functions operate on wchar_t and substitute for the corresponding functions that operate on char; see wmemchr(3) and towlower(3) for details. There are also some additional functions that operate on wchar_t; see wctype(3) and wctrans(3) for details.
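The wide-character counterparts are used much like their char versions. In this minimal sketch, towlower(3) takes and returns wint_t, which is why each wchar_t is converted on the way in and out, and wmemchr(3) searches a counted wide-character buffer the way memchr(3) searches bytes. The helper names wcs_tolower and wfind are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Lower-case a NUL-terminated wide string in place. */
void wcs_tolower(wchar_t *s)
{
    assert(s != NULL);
    for (; *s != L'\0'; s++)
        /* towlower() takes and returns wint_t, not wchar_t. */
        *s = (wchar_t)towlower((wint_t)*s);
}

/* Index of the first c among the first n wide characters of s, or -1. */
ptrdiff_t wfind(const wchar_t *s, size_t n, wchar_t c)
{
    assert(s != NULL);
    const wchar_t *p = wmemchr(s, c, n);
    return p != NULL ? p - s : -1;
}
```

Both helpers depend on the LC_CTYPE category only through towlower(3); in the C locale they behave like their ASCII char equivalents.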
Wide characters should be used for all I/O processing which may rely on locale-specific strings. The two primary issues requiring special use of wide characters are:
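One common way to apply this, assumed here rather than taken from the text, is to convert input to wide characters immediately after reading it and convert back to multibyte characters just before writing. The sketch below uses mbstowcs(3) and wcstombs(3); both depend on the LC_CTYPE category, so a program would normally call setlocale(LC_ALL, "") first. The helper names to_wide and to_multibyte are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multibyte string (in the current LC_CTYPE codeset) to a
 * newly allocated wide string. Returns NULL on an invalid sequence or
 * allocation failure; the caller frees the result.
 */
wchar_t *to_wide(const char *mb)
{
    assert(mb != NULL);
    size_t n = mbstowcs(NULL, mb, 0);   /* length in wide chars, or (size_t)-1 */
    if (n == (size_t)-1)
        return NULL;
    wchar_t *ws = malloc((n + 1) * sizeof *ws);
    if (ws == NULL)
        return NULL;
    mbstowcs(ws, mb, n + 1);            /* also copies the terminating L'\0' */
    return ws;
}

/* Convert a wide string back to a newly allocated multibyte string. */
char *to_multibyte(const wchar_t *ws)
{
    assert(ws != NULL);
    size_t n = wcstombs(NULL, ws, 0);   /* length in bytes, or (size_t)-1 */
    if (n == (size_t)-1)
        return NULL;
    char *mb = malloc(n + 1);
    if (mb == NULL)
        return NULL;
    wcstombs(mb, ws, n + 1);            /* also copies the terminating NUL */
    return mb;
}
```

Because the multibyte length in bytes and the wide length in characters differ for most non-ASCII codesets, each conversion first asks for the required size and only then allocates.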
February 21, 2007 | NetBSD 6.1 |