Internationalization Using Standard C++
C/C++ Users Journal, September 1997
Computer users all over the world prefer to interact with their systems using their own language and cultural conventions. Cultural differences affect, for instance, the display of monetary values and of date and time. Just think of the way numeric values are formatted in different cultures: 1,000,000.00 in the US is 1.000.000,00 in Germany and 10,00,000.00 in Nepal. If you aim for high international acceptance of your products, you must build into your software the flexibility to adapt to the varying requirements that stem from cultural differences. Building into software the potential for worldwide use is called internationalization. It is one of the challenges of software development today.

Traditionally, internationalization was achieved by means of C. Standards like POSIX and X/Open define locales and wide character input and output for standard C. Windows 95 and Windows NT have a C interface, too: the Win32 NLSAPI. None of the Win32 NLSAPI interfaces matches any of the standard C interfaces, though, and locales are thread-specific in Windows whereas they are provided per process in Unix. These are important differences. The concept and level of support, however, are equivalent. There is a common notion of locales, and the services provided cover almost the same range of i18n problems.

Naturally, C++ cannot stand back. The ISO/ANSI C++ standard defines an extensible framework that facilitates internationalization of C++ programs in a portable manner. Its main elements are locales and facets. This article gives an overview of the locale framework and the standard facets defined by ISO/ANSI C++.
Overview
A Recap - C Locales
The C++ Locales
C Locales vs. C++ Locales
Relationship between C Locale and C++ Locales
Using C++ Locales and Facets
Locales and IOStreams
Summary
Code Examples
A Recap - C Locales

As a software developer and reader of C++ Users Journal, you may already have some background in the C programming language and the internationalization services provided by the ANSI C library. For this reason, let us start with a short recap of the internationalization services provided by the C library, and then build on existing knowledge to describe the C++ locales in terms of the C locale.
Internationalization requires that developers consciously design and implement their software and avoid hard-coding information or rules that can be localized. For example, careful developers never assume specific conventions for formatting numeric or monetary values, or for displaying date and time, not even for comparing or sorting strings. For internationalization, all culture and language dependencies need to be represented in a kind of language table. Such a table is called a locale.
A locale in the C library supports several problem domains. The information in a C locale is composed of categories. Each of the categories represents a set of related information:
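In the standard C interface, categories are selected with macros from <clocale> such as LC_NUMERIC, LC_MONETARY, LC_TIME, LC_CTYPE and LC_COLLATE, plus LC_ALL for the whole locale. The following is a minimal sketch of how one category of the global C locale can be switched and inspected; the helper names set_numeric_locale and current_decimal_point are made up for illustration, and only the "C" locale name can be assumed to exist on every platform.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: switches only the numeric category of the global
// C locale. Returns false if the C library does not know the name.
// Locale names are platform-specific; only "C" is guaranteed to exist.
bool set_numeric_locale(const char* name) {
    return std::setlocale(LC_NUMERIC, name) != nullptr;
}

// Hypothetical helper: reports the decimal point that the C library
// currently uses for non-monetary numbers (governed by LC_NUMERIC).
std::string current_decimal_point() {
    return std::localeconv()->decimal_point;
}
```

Note that both helpers act on a single, process-wide data structure, which is the property the rest of the article contrasts with C++ locale objects.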
The C++ Locales

In C++, internationalization semantics are broken out into separate classes, so-called facets. Each facet offers a set of internationalization services. For instance, the formatting of monetary values is encapsulated in the money_put<> facet. (Don't get distracted by the angle brackets; they are added because all facets are class templates.) Facets may also represent a set of information about certain culture and language dependencies. The rules and symbols for monetary information are an example; they are contained in a facet called moneypunct<>.

In C++, there is also a class called locale. Unlike a C locale, which is a global data structure representing various culture and language dependencies, the C++ class locale is an abstraction that manages facets. Basically, you can think of a C++ locale as a container of facets. This concept is illustrated graphically below:

[Figure: a C++ locale as a container of facets]

The Standard Facets

The C++ standard defines a number of standard facets. They provide services and information similar to those contained in the C library. As we have seen, the C locale is composed of six categories of information. Similarly, there are six groups of standard facets. Here is a brief overview:
The facet numpunct<charT> specifies numeric formats and punctuation. It provides functions like decimal_point(), thousands_sep(), etc.
The facet moneypunct<charT, bool International> handles monetary formats and punctuation in the same way that numpunct<charT> handles numeric formats and punctuation. It comes with functions like curr_symbol(), etc.

C Locale vs. C++ Locales

As we have seen, the C locale and the C++ locale, along with the standard facets, offer similar services. However, the semantics of the C++ locale are different from the semantics of the C locale:
Let's discuss an application that works with multiple locales. Say the application runs at a US company that ships products worldwide. Our application's responsibility is printing invoices to be sent to customers all over the world. Of course, the invoices need to be printed in the customer's native language. Say the application reads input (the product price list) in US English and writes output (the invoice) in the customer's native language, say German. Since there is only one global locale in C that affects both input and output, the global locale must change between input and output operations. Here is the C code that corresponds to the previous example:

float price;

Using C++ locale objects dramatically simplifies the task of using multiple locales. The iostreams in the Standard C++ Library are internationalized so that streams can be imbued with separate locale objects. For example, the input stream can be imbued with an English locale object, and the output stream can be imbued with a German locale object. In this way, switching locales becomes unnecessary. Here is the C++ code corresponding to the previous example:

priceFile.imbue(locale("En_US"));

With toy examples like these, switching locales might look like a minor inconvenience. However, consider the need for multiple locales in an application with multiple threads of execution. Because all threads share one global locale in C, access to the global locale must be serialized by means of mutual exclusion. A lot of locking would occur, and it would slow down the program considerably. Ideally, you would want locales to be completely independent of each other. Each component should have a locale of its own that is unrelated to other locales in your program. This is what you have in C++. You can create as many independent, light-weight locale objects as you like, attach them to streams, exchange them between components, or pass them around as function arguments, for instance.
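The invoice scenario can be sketched without relying on platform-specific locale names such as "En_US" or "De_DE", which may not be installed everywhere. The sketch below is an illustration under that assumption, not the article's original listing: GermanStylePunct is a hypothetical numpunct replacement that mimics German number punctuation, and reformat_price is a made-up helper that reads a price with one locale and writes it with another.

```cpp
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for the numeric facet of a German locale:
// comma as decimal point, period as thousands separator, groups of three.
class GermanStylePunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Reads a price written in classic ("C") notation and returns it in
// German-style notation. Each stream carries its own locale object,
// so no global state has to be switched between input and output.
std::string reformat_price(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());

    std::ostringstream out;
    // The new locale takes ownership of the facet object.
    out.imbue(std::locale(std::locale::classic(), new GermanStylePunct));

    double price = 0.0;
    in >> price;

    out.setf(std::ios_base::fixed, std::ios_base::floatfield);
    out.precision(2);
    out << price;
    return out.str();
}
```

With named locales installed, the two imbue() calls would simply receive locale objects constructed by name; the structure of the code stays the same.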
Relationship between the C Locale and the C++ Locale

The C locale and the C++ locales are mostly unrelated. There is only one occasion when they affect each other: making a C++ locale global. The reason is that there is a global locale in C++, as there is in C. You can make a given locale object global by calling locale::global(). The notion of a global C++ locale was added for all those users who do not want to bother with internationalization and rely on internationalized components to pick a sensible default locale. The global C++ locale is often used as the default locale. IOStreams, for instance, uses it; if you do not explicitly imbue your streams with any particular locale object, a snapshot of the global locale is used.

Making a C++ locale object global via locale::global() affects the global C locale in that it results in a call to setlocale(), provided the locale object has a name. When this happens, locale-sensitive C functions called from within a C++ program will use the global C++ locale. Conversely, there is no way to affect the C++ locale from within a C program.
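The "snapshot" behavior described above can be observed with a small experiment. This is a sketch under stated assumptions: CommaPunct is a hypothetical facet used only to make the change of global locale visible, and snapshot_demo is a made-up helper name. Because the locale built here has no name, the call to locale::global() in this sketch does not touch the C locale.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that only changes the decimal point to a comma.
class CommaPunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Formats the same value on a stream created before and on a stream
// created after the global C++ locale is replaced.
std::string snapshot_demo() {
    const std::locale previous;          // copy of the current global locale

    std::ostringstream before;           // takes a snapshot of the old global locale
    std::locale::global(std::locale(previous, new CommaPunct));
    std::ostringstream after;            // takes a snapshot of the new global locale

    before << 2.5;                       // still formatted with the old snapshot
    after << 2.5;                        // formatted with the new global locale

    std::locale::global(previous);       // restore the original global locale
    return before.str() + " " + after.str();
}
```

A stream that already exists is not affected by a later call to locale::global(); only streams constructed afterwards, or streams imbued explicitly, pick up the new locale.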
Using C++ Locales and Facets

After this brief overview of C++ locales and facets, let us now explore how they are used. Remember, a locale in C++ is a container of facets, and a facet is a set of internationalization services and information. The general pattern of usage is:
Creating Locales

Class locale has numerous constructors; see Box 2 for a comprehensive list. Basically they fall into three categories:
class locale {

The following example uses the first constructor and shows how you can construct a locale object as a copy of the classic locale object with the classic numeric facets replaced by the numeric facet objects taken from a German locale object.

locale loc ( locale::classic(), locale("De_DE"), locale::numeric );

The classic locale is obtained via locale::classic(); the German locale is created via locale("De_DE"). locale::numeric is a locale category; it corresponds to the C category LC_NUMERIC. As mentioned earlier, the facets fall into categories, and locale::numeric is the category that designates all numeric facets in a locale. Note that some of the constructors are member templates, a feature that is relatively new to the language and not supported by all compilers.

Immutability of Locales. It's important to understand that locales are immutable objects: once a locale object is created, it cannot be modified, i.e. no facets can be replaced after construction. This makes locales reliable and easy to use, and you can safely pass them around between components.

Copying locales. Copying a locale object is a cheap operation. You should have no hesitation about passing locale objects around by value. You may copy locale objects for composing new locale objects; you may pass copies of locale objects as arguments to functions, etc. Locales are implemented using reference counting and the handle-body idiom: when a locale object is copied, only its handle is duplicated, a fast and inexpensive action. The following figure gives an overview of the locale architecture. A locale is a handle to a body that maintains a sequence of pointers to facets. The facets are reference-counted, too.
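Composition and immutability can be sketched as follows. To stay independent of installed locale names, the sketch uses a hypothetical CommaPunct facet in place of a real German locale; the helper names with_comma_punct, classic_with_numeric_from and decimal_point_of are made up for illustration.

```cpp
#include <locale>

// Hypothetical facet standing in for the numeric facet of a German locale.
class CommaPunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Looks up the decimal point stored in a locale's numpunct<char> facet.
char decimal_point_of(const std::locale& loc) {
    return std::use_facet<std::numpunct<char>>(loc).decimal_point();
}

// Composition by facet: a copy of the classic locale with one facet replaced.
// The new locale object takes ownership of the facet.
std::locale with_comma_punct() {
    return std::locale(std::locale::classic(), new CommaPunct);
}

// Composition by category: a copy of the classic locale whose numeric
// facets are taken from another locale object.
std::locale classic_with_numeric_from(const std::locale& other) {
    return std::locale(std::locale::classic(), other, std::locale::numeric);
}
```

Neither function modifies locale::classic(); each returns a new locale object, which is cheap to return and pass by value because only the handle is copied.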
Accessing a Locale's Facets

Access to the facet objects of a locale object is via two template functions, use_facet and has_facet:

template <class Facet> const Facet& use_facet(const locale&);
template <class Facet> bool has_facet(const locale&);

The function use_facet gives access to a facet by providing a constant reference to it. The function has_facet checks whether a certain facet is present in a given locale. The requested facet is specified via its type. Note that both functions are template functions. The template parameter they take is the type of the facet they try to access in a locale. In other words, these functions are capable of deciding which facet object is meant from just the information about the facet's type. It works because a locale contains at most one exemplar of a certain facet type. This kind of compile-time dispatch is a novel technique in C++. A discussion of it and of the design of the locale framework's architecture is beyond the scope of this article. A detailed description can be found in C++ Report, September 1997, "The Locale Framework" by Klaus Kreft & Angelika Langer.

The code below demonstrates how these functions are used to get access to a facet and invoke an internationalization service. It is an example of the conversion service tolower() from the ctype facet; all upper case letters of a string read from the standard input stream are converted to lower case letters and are written to the standard output stream.

string in;

The function template use_facet< ctype<char> >() returns a constant reference to the locale's facet object. Then the facet object's member function tolower() is called. It has the functionality of the C function tolower(); it converts all upper case letters into lower case letters. A couple of further comments on this example:

Explicit Template Argument Specification. The syntax of the call use_facet< ctype<char> >(locale()) might look surprising to you. It is an example of explicit template argument specification, a language feature that is relatively new to C++. Template arguments of a function instantiated from a function template can either be explicitly specified in a call or be deduced from the function arguments. The explicit template argument specification is needed in the call to use_facet above, because the compiler can only deduce a template argument if it is the type of one of the function arguments.

Storing references to facets. Note that we do not store the reference to the facet, but just use the temporary reference returned by use_facet for immediately calling the desired member function of that facet. This is a safe way of using facets retrieved from a locale. If you kept the reference, you would need to keep track of the object's lifetime and validity. The facet reference does stay valid throughout the lifetime of the locale object it was retrieved from. Moreover, the facet referred to does not change in any way; it is immutable. However, when the locale goes out of scope, the references obtained from it might become invalid. For this reason it is advisable to combine retrieval and invocation as shown in the example above, unless you have a need to do otherwise.

Need for has_facet. Note also that we did not call has_facet< ctype<char> >() in order to check whether the locale has a ctype facet. In most situations, you do not have to check for the presence of a standard facet object like ctype<char>. This is because locale objects are created by composition; you start with the classic locale or a locale object constructed "by name" from a C locale's external representation. Because you can only add or replace facet objects in a locale object, you cannot compose a locale that lacks one of the standard facets. A call to has_facet() is useful, however, when you expect that a certain non-standard facet object should be present in a locale object.
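Both access functions can be sketched in a few lines. The to_lower helper below combines retrieval and invocation in one expression, as recommended above. UnitFacet is a hypothetical non-standard facet, invented here only to show a case where has_facet() is actually needed; unit_or_default is likewise a made-up name.

```cpp
#include <locale>
#include <string>
#include <utility>

// Converts a string to lower case with the ctype<char> facet of the given
// locale. The facet reference is used immediately and never stored.
std::string to_lower(std::string text, const std::locale& loc = std::locale()) {
    if (!text.empty()) {
        std::use_facet<std::ctype<char>>(loc).tolower(&text[0], &text[0] + text.size());
    }
    return text;
}

// Hypothetical non-standard facet carrying a unit of measurement.
// A facet type needs a public static member of type locale::id.
class UnitFacet : public std::locale::facet {
public:
    static std::locale::id id;
    explicit UnitFacet(std::string unit) : unit_(std::move(unit)) {}
    const std::string& unit() const { return unit_; }
private:
    std::string unit_;
};
std::locale::id UnitFacet::id;

// A non-standard facet may be absent, so its presence is checked first.
std::string unit_or_default(const std::locale& loc) {
    return std::has_facet<UnitFacet>(loc)
               ? std::use_facet<UnitFacet>(loc).unit()
               : std::string("none");
}
```

Calling use_facet for a facet that is not present throws std::bad_cast, which is why the has_facet() check matters for user-defined facets but is rarely needed for standard ones.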
Locales and IOStreams

The standard iostreams are an example of an internationalized component that uses locales and facets. This feature of iostreams enables you to implement locale-sensitive standard i/o operations for your user-defined types. Each stream has a locale object attached. Attaching a locale to a stream is done via the stream's imbue() operation. If you do not explicitly imbue a locale, the stream uses a snapshot of the current global locale as a default.

Here is an example that demonstrates how one can use a stream's locale for printing a date. Let us assume we have a date object of type tm, which is the time structure defined in the standard C library, and we want to print it. Let's assume our program is supposed to run in a German-speaking canton of Switzerland. Hence, we attach a Swiss locale to the standard output stream. When we print the date we expect an output like: 1. September 1989 or 01.09.89

struct tm aDate;

As there is no operator<<() defined in the Standard C++ Library for the time structure tm from the C library, we have to provide this inserter ourselves. The following code suggests a way this can be done. To keep it simple, the handling of exceptions thrown during the formatting is omitted.

template<class Ostream > ... }

There's a lot going on here. Let's discuss the interface of the shift operator first. The code above shows a typical stream inserter. As function arguments it takes a reference to an output stream and a constant reference to the object to be printed. It returns a reference to the same stream. The inserter is a template function because the standard iostreams are templates; they take a character type and an associated traits type describing the character type as template arguments. Naturally, we have the same template parameters for our date inserter. Now, we need to get hold of the stream's locale object, because we want to use its time formatting facet for output of our date object.
As you can see in the code above, the stream's locale object is obtained via the stream's member function getloc(). We retrieve the time formatting facet from the locale via use_facet; that is old hat by now. We then call the facet's member function put(). The put() function does all the magic, i.e. it produces a character sequence that represents the date object, formatted according to culture-dependent rules and information. It then inserts the formatted output into the stream via an output iterator.

Before we delve into the details of the put() function, let us take a look at its return value. The put() function returns an output iterator that points to the position immediately after the last inserted character. The output iterator used here is an output stream buffer iterator. These are special-purpose iterators contained in the standard C++ library that bypass the stream's formatting layer and write directly to the output stream's underlying stream buffer. Output stream buffer iterators have a member function failed() for error indication, so we can check for errors that occur during time formatting. If there was an error, we set the stream's state accordingly via the stream's setstate() function.

Let's return to the facet's formatting service put() and see what arguments it takes. Here is the function's interface:
Here is the actual call:

nextpos = fac.put(os,os,os.fill(),&date,'x');

Now let's see what the arguments mean:
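The inserter and the call discussed above can be sketched as follows, with each argument of put() annotated in a comment. This is a reconstruction under stated assumptions rather than the article's original listing: it relies on the time_put<charT, ostreambuf_iterator<charT, Traits> > specializations that the standard guarantees for char and wchar_t with the default traits, it uses a sentry, and it omits exception handling as the text does. The helper format_date is a made-up name for demonstration.

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Locale-sensitive inserter for the C time structure tm.
template <class charT, class Traits>
std::basic_ostream<charT, Traits>&
operator<<(std::basic_ostream<charT, Traits>& os, const std::tm& date) {
    typedef std::ostreambuf_iterator<charT, Traits> Iter;
    typedef std::time_put<charT, Iter> Facet;

    typename std::basic_ostream<charT, Traits>::sentry guard(os);
    if (guard) {
        const Facet& fac = std::use_facet<Facet>(os.getloc());
        Iter nextpos = fac.put(
            Iter(os),   // output iterator: where the characters are written
            os,         // ios_base object: source of formatting information
            os.fill(),  // fill character of the stream
            &date,      // pointer to the time structure to be formatted
            'x');       // format specifier as in strftime(): the locale's date
        if (nextpos.failed()) {
            os.setstate(std::ios_base::badbit);
        }
    }
    return os;
}

// Hypothetical helper: formats a calendar date with the given locale.
std::string format_date(int year, int month, int day,
                        const std::locale& loc = std::locale::classic()) {
    std::tm date = std::tm();        // zero-initialize all fields
    date.tm_year = year - 1900;      // tm counts years from 1900
    date.tm_mon = month - 1;         // tm counts months from 0
    date.tm_mday = day;

    std::ostringstream out;
    out.imbue(loc);
    out << date;
    return out.str();
}
```

Passing a Swiss German locale object to format_date, where such a locale is installed, would be expected to produce the culture-specific date representation instead of the classic one.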
Summary
Acknowledgements

This article is based on material we put together for a book on "Standard C++ IOStreams and Locales" to be published by Addison-Wesley-Longman in 1998. Part of the article was inspired by work Angelika Langer did for Rogue Wave Software, Inc. in 1996. We also want to thank Nathan Myers, who initially proposed locales and facets to the standards committee. He patiently answered countless questions during the past months.
Code Examples

The subsequent code examples are compiled with MVC V4.0 and the locale component that comes with it. This locale component complies only partly with the standard. Workarounds are marked as such.

Example 1: Multiple locales in C and C++

MVC does not support namespaces, hence we had to comment out the otherwise necessary using statement. The locale names used are those supported on Windows 95 and Windows NT. Also, we read input from standard input rather than from a price list file, as was suggested in the example.

#include <stdio.h>

Example 2: Class locale as defined in the C++ standard

namespace std {

Example 3: Accessing a Locale's Facet

MVC does not support namespaces, hence we had to comment out the otherwise necessary using statement. MVC does not support explicit template argument specification. For this reason the standard interface of function templates like use_facet and has_facet cannot be implemented. The library that comes with MVC 4.0 offers a workaround, which is shown below.

#include <iostream>

Example 4: A locale-sensitive date inserter

MVC does not support namespaces, hence we had to comment out the otherwise necessary using statement. MVC does not support explicit template argument specification. For this reason the standard interface of function templates like use_facet and has_facet cannot be implemented. The library that comes with MVC 4.0 offers a workaround, which is shown below. MVC does not support default template arguments. For this reason we simplified the example and implemented the inserter only for tiny character streams of type ostream, instead of providing an inserter template, which would be natural. MVC does not support the standard interface of the time_put facet's put() function; we had to omit one of the arguments.

#include <iostream>
© Copyright 1995-2005 by Angelika Langer. All Rights Reserved. URL: <http://www.AngelikaLanger.com/Articles/Cuj/Internationalization/I18N.html> last update: 16 Aug 2005