
The toupper function

4.4 Localization <locale.h>

  C has become an international language. Users of the language outside the United States have been forced to deal with the various Americanisms built into the standard library routines.

Areas affected by international considerations include:

Alphabet.
The English language uses 26 letters derived from the Latin alphabet. This set of letters suffices for English, Swahili, and Hawaiian; all other living languages use either the Latin alphabet plus other characters, or other, non-Latin alphabets or syllabaries.

In English, each letter has an upper-case and lower-case form. The German ``sharp S'', ß, occurs only in lower-case. European French usually omits diacriticals on upper-case letters. Some languages do not have the concept of two cases.

Collation.
In both EBCDIC and ASCII the code for `z' is greater than the code for `a', and so on for other letters in the alphabet, so a ``machine sort'' gives not unreasonable results for ordering strings. In contrast, most European languages use a codeset resembling ASCII in which some of the codes used in ASCII for punctuation characters are used for alphabetic characters. (See .2.1.) The ordering of these codes is not alphabetic. In some languages letters with diacritics sort as separate letters; in others they should be collated just as the unmarked form. In Spanish, ``ll'' sorts as a single letter following ``l''; in German, ``ß'' sorts like ``ss''.
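
The difference between a ``machine sort'' and a locale-aware sort can be sketched with the standard functions strcmp and strcoll. This is only an illustration: the names by_code, by_locale, and sort_words are hypothetical helpers, not part of the Standard, and the ordering strcoll produces depends entirely on which locales the implementation provides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order by character code values ("machine sort"). */
static int by_code(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* qsort comparator: order defined by the current LC_COLLATE setting. */
static int by_locale(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort n strings in place.
 * A nonzero use_locale selects strcoll instead of strcmp. */
void sort_words(const char **words, size_t n, int use_locale)
{
    assert(words != NULL);
    qsort(words, n, sizeof *words, use_locale ? by_locale : by_code);
}
```

With an ASCII codeset, sort_words(words, n, 0) places every upper-case word before every lower-case one. Passing a nonzero use_locale after a successful setlocale(LC_COLLATE, ...) call lets the selected locale decide the order instead.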

Formatting of numbers and currency amounts.
In the United States the period is invariably used for the decimal point; this usage was built into the definitions of such functions as printf and scanf. Prevalent practice in several major European countries is to use a comma; a raised dot is employed in some locales. Similarly, in the United States a comma is used to separate groups of three digits to the left of the decimal point; a period is common in Europe, and in some countries digits are not grouped by threes. In printing currency amounts, the currency symbol (which may be more than one character) may precede, follow, or be embedded in the digits.
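
As a minimal sketch of how a program can discover and use the decimal point of the current locale, the following uses localeconv and the printf family. The helper names current_decimal_point and format_amount are made up for this illustration, and snprintf is a later (C99) addition to the library, so an older implementation would need sprintf with a sufficiently large buffer.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: the decimal-point string of the current locale. */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}

/* Hypothetical helper: format value with two decimals into buf.
 * The printf family uses the LC_NUMERIC decimal point, so the
 * separator in the result follows the selected locale.
 * Returns what snprintf returns. */
int format_amount(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.2f", value);
}
```

In the "C" locale the decimal point is a period. Note that printf does not insert a thousands separator; the grouping and currency information reported by localeconv has to be applied by the program itself.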

Date and time.
The standard function asctime returns a string which includes abbreviations for month and weekday names, and returns the various elements in a format which might be considered unusual even in its country of origin.

Various common date formats include

    1776-07-04                  ISO Format
    4.7.76
    7/4/76                      customary U.S. usage
    4.VII.76                    Italian usage
    76186                       Julian date (YYDDD)
    04JUL76                     airline usage
    Thursday, July 4, 1776      full U.S. format
    Donnerstag, 4. Juli 1776    full German format

Time formats are also quite diverse:

    3:30 PM     customary U.S. and British format
    1530        U.S. military format
    15h.30      Italian usage
    15.30       German usage
    15:30       common European usage
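
Unlike asctime, the standard function strftime lets the program or the locale choose the layout. The sketch below is illustrative only: format_full_date and format_locale_date_time are hypothetical names, and what the %x and %X conversions produce depends on the selected LC_TIME locale.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: weekday, month name, day, and year in a
 * fixed layout; the names themselves come from the current locale.
 * Returns the number of characters written, or 0 if buf is too small. */
size_t format_full_date(char *buf, size_t size, const struct tm *when)
{
    assert(buf != NULL && when != NULL);
    return strftime(buf, size, "%A, %B %d, %Y", when);
}

/* Hypothetical helper: the locale's own preferred date (%x) and
 * time (%X) representations, separated by a space. */
size_t format_locale_date_time(char *buf, size_t size, const struct tm *when)
{
    assert(buf != NULL && when != NULL);
    return strftime(buf, size, "%x %X", when);
}
```

The first helper fixes the order of the elements and leaves only the names to the locale; the second leaves the whole layout to the locale, which is usually the better choice for text shown to users.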

The Committee has introduced mechanisms into the C library to allow these and other issues to be treated in the appropriate locale-specific manner.

The localization features of the Standard are based on these principles:

English for C source.
The C language proper is based on English. Keywords are based on English words. A program which uses ``national characters'' in identifiers is not strictly conforming. (Use of national characters in comments is strictly conforming, though what happens when such a program is printed in a different locale is unspecified.) The decimal point must be a period in C source, and no thousands delimiter may be used.

Runtime selectability.
The locale must be selectable at runtime, from an implementation-defined set of possibilities. Translate-time selection does not offer sufficient flexibility. Software vendors do not want to supply different object forms of their programs in different locales. Users do not want to use different versions of a program just because they deal with several different locales.

Function interface.
Locale is changed by calling a function, thus allowing the implementation to recognize the change, rather than by, say, changing a memory location that contains the decimal point character.

Immediate effect.
When a new locale is selected, affected functions reflect the change immediately. (This is not meant to imply that, if a signal-handling function were to change the selected locale and then return to a library function, the return value from that library function must be completely correct with respect to the new locale.)
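
The runtime-selection and function-interface principles above can be sketched with setlocale, the function the Standard provides for this purpose. The wrappers select_locale and current_locale_name are hypothetical names for this illustration; only "C" and "" (the implementation-defined native environment) are locale names every implementation must accept.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to select the named locale for all
 * categories. setlocale returns a null pointer on failure and then
 * leaves the current locale unchanged. Returns 1 on success, 0 on failure. */
int select_locale(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_ALL, name) != NULL;
}

/* Hypothetical helper: query the current locale without changing it
 * (a null second argument makes setlocale a pure query). */
const char *current_locale_name(void)
{
    return setlocale(LC_ALL, NULL);
}
```

A program would typically call select_locale("") once at startup to adopt the user's environment, and fall back to the "C" locale, which is in effect at program startup anyway, if that call fails.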




Pete Jinks
Fri Jan 19 12:31:56 GMT 1996