User-Defined Facets

Extending the Locale Framework

C++ Report, February 1998
Klaus Kreft & Angelika Langer

Locales and facets in the standard C++ library together form an extensible framework for support of internationalization. A short recap: A locale is a class that represents a container of facets; a facet is a class that contains information and provides functionality related to a certain aspect of internationalization. The standard library contains a number of standard facets. They support classification of characters, collation of character sequences, character code conversion, retrieval of message texts from message catalogues, and the parsing and formatting of structured information like numbers, monetary values, date, and time. The locale framework as a whole is designed to be extensible: user-defined facet types can be added. Locales are used by standard iostreams for parsing and formatting of numeric values and for code conversion.

In this article we will show you how user-defined facets can be added to the locale framework. We will demonstrate the technique of building facet classes and their usage in conjunction with the input and output streams of the standard library.

A facet for international address formats

Locales are designed so that they can hold any information that varies with cultural conventions. All kinds of examples are conceivable, for instance: conversion of date and time according to time zones, types and sizes of paper sheets and envelopes, parsing and formatting of telephone numbers, conversion of weights and measures, and many more. In this column we will pick an arbitrary example: international address formats. Imagine a program that prints the address labels for an international mail order service. It should be capable of handling the differences between address formats. To illustrate the issue, we'll show you two examples of address formats. Most readers will probably be familiar with the US American way of formatting addresses. Here is the general pattern and an example for addresses in private mail exchange:

<FirstName> <SecondName> <LastName>
<Address1>
<Address2>
<City>, <State> <PostalCode>
[<Country>]

Irene Myer
28 SW 10th Street
Eugene, OR 97330
U.S.A.

Now, in Germany addresses have a slightly different format: It is, for instance, not customary to print a person's second name. A country code is placed in front of the zip code separated by a hyphen. States are irrelevant. And so on and so forth. Here is the general pattern and an example of an address in Germany:

<FirstName> <LastName>
<Address1>
<Address2>
<blank line>
[<CountryCode>]-<PostalCode> <City>

Irene Myer
Lindenstraße 5
D-80727 München

Of course, we cannot show you how to build a full-fledged address formatting facet in this article. Instead we will drastically simplify matters. We want to focus on the techniques of building any kind of user-defined facet, of integrating it into the standard locale framework, and of using it with standard iostreams. The address formatting facet is just an example for a generally applicable programming technique.

The address class

We start the implementation by introducing a simple address class. Actually, it is a class template, because we want the address representation to be flexible enough to consist of either wide or narrow character strings. For instance, it should be capable of representing Japanese addresses that contain Kanji characters.

template<class charT>
class address
{

friend ostream& operator<<(ostream& os, const address<charT>& ad);
typedef basic_string<charT> String;
public:
address(const String& firstname, const String& secname, const String& lastname,
        const String& address1, const String& address2,
        const String& town, const String& zipcode,
        const String& state, const String& country, const String& cntrycode)
: firstname_(firstname), secname_(secname), lastname_(lastname),address1_(address1),
             address2_(address2), town_(town), zipcode_(zipcode), state_(state),
             country_(country), cntrycode_(cntrycode) {}
private:
String firstname_;
String secname_;
String lastname_;
String address1_;
String address2_;
String town_;
String zipcode_;
String state_;
String country_;
String cntrycode_;

};

Listing 1: An address class

The address class contains private data members that hold the various elements of an address. The constructor initializes these elements. An operator<<(), also called an inserter, prints addresses according to a stream's current locale object. It is a friend function of the address class. We will see its implementation later.

The address formatting facet

Now we come to the design and implementation of the address facet. In a previous article (/1/) we described the locale framework and explained that facets must have the following two properties: Facets have to be subclasses of class locale::facet. Additionally, they must contain a facet identification in the form of a static data member that is declared as static locale::id id; This identification is used for maintenance and retrieval of facets from a locale and identifies an entire family of facets: All facets with the same identification belong to the same facet family. A locale cannot contain two facets with identical identification. Hence, facets from the same family can only be replacements of each other. New types of facets can be added in two ways: either by deriving from an existing facet type, in which case the facet identification is inherited and the new facet belongs to an already existing facet family, or by defining a new facet class that has a facet identification of its own, in which case a new facet family is introduced.
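The two requirements can be sketched in a few lines. The facet greeting and the helper greeting_of() below are hypothetical names chosen for illustration only; they are not part of the address example. Note that the static id member is only declared inside the class and needs one definition outside of it.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical facet that introduces a new facet family.
class greeting : public std::locale::facet
{
public:
    static std::locale::id id;   // identifies the facet family

    explicit greeting(const std::string& text, std::size_t refs = 0)
        : std::locale::facet(refs), text_(text) {}

    const std::string& text() const { return text_; }

private:
    std::string text_;
};

// The static data member needs exactly one definition.
std::locale::id greeting::id;

// Returns the greeting stored in loc, or an empty string
// if loc does not contain a greeting facet.
std::string greeting_of(const std::locale& loc)
{
    if (!std::has_facet<greeting>(loc))
        return std::string();
    return std::use_facet<greeting>(loc).text();
}
```

A locale that holds the facet can then be built with std::locale(std::locale::classic(), new greeting("Servus")); the locale takes ownership of the facet object because refs is 0.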

In our example, address formatting shall be present in a locale in addition to other internationalization facilities and is not meant to replace any existing information. Hence, we define a new facet family for address formatting by building a new facet type with an identification of its own.

Following the naming conventions of the standard, we call our address facet address_put because it handles the formatting of addresses. This is in line with the names of the standard facets num_put (formatting of numeric values), money_put (formatting of monetary values), and time_put (formatting of time and date). The formatting operation is a member function called put() .

For the implementation of address_put we follow the design and implementation idioms for formatting facets that are established in the standard library:

Output iterators. Formatting operations in the standard library, like num_put<charT>::put(), take an iterator to the beginning of the output sequence as an argument. This approach allows a flexible solution and fits smoothly into the overall concept of the entire standard library, where iterators are used as generic connectors between independent components. (See sidebar on stream and stream buffer iterators for more information.) In line with this policy we, too, use an output iterator to designate the target location of the formatted address string. The output stream buffer iterator is the default iterator type of address_put.

Public and virtual protected interface. An idiom used in many places throughout the standard library is delegation to protected virtual member functions. A public member function foo() calls a protected virtual function do_foo(), which does the real work. Derived classes can only override the protected virtual member function. Hence, the public member function can contain certain functionality that must not be changed in derived classes. A typical example of such functionality is the acquisition and release of a mutex lock, which would be needed to allow use of this facet in a multi-threaded environment. Accordingly, address_put has a public member function put(), which delegates to the protected virtual member function do_put().
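The same idiom can be observed with a standard facet. The following sketch derives from numpunct<char> and overrides only the protected virtual do_decimal_point(); the public decimal_point() stays untouched and is what the stream's formatting machinery ends up using. The names comma_punct and format_with_comma() are hypothetical and serve only as an illustration.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Derived facet: only the protected virtual function is overridden.
// The id is inherited, so this facet belongs to the numpunct<char> family.
class comma_punct : public std::numpunct<char>
{
protected:
    virtual char do_decimal_point() const { return ','; }
};

// Formats value through a stream whose locale contains comma_punct
// as a replacement for the standard numpunct<char> facet.
std::string format_with_comma(double value)
{
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new comma_punct));
    os << value;
    return os.str();
}
```

Because the override is reached only through the public interface, callers such as the stream inserters need no knowledge of the derived class.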

Listing 2 shows the implementation of the address_put facet.

template<class charT, class OutIter = ostreambuf_iterator<charT> >
class address_put : public locale::facet
{

typedef basic_string<charT> String;

public:

typedef OutIter iter_type;
static locale::id id;
address_put(size_t refs = 0) : locale::facet(refs) {}
void put(OutIter oi,
         const String& firstname, const String& secname, const String& lastname,
         const String& address1, const String& address2,
         const String& town, const String& zipcode, const String& state,
         const String& country, const String& cntrycode) const
{
do_put(oi, firstname, secname, lastname,
address1, address2, town, zipcode, state, country, cntrycode);
}

protected:

virtual void do_put (OutIter oi,
                     const String& firstname, const String& secname, const String& lastname,
                     const String& address1, const String& address2,
                     const String& town, const String& zipcode, const String& state,
                     const String& country, const String& cntrycode) const;
void put_string(OutIter oi, String s) const
{
typename String::iterator si, end;
for (si=s.begin(), end= s.end(); si!=end ; si++, oi++)
*oi = *si;
}

};

Listing 2: The address formatting facet

In Listing 2 above, you can see the design decisions made so far:

The new facet type is a class derived from locale::facet with an identification of its own.
It's a class template taking the character type and the output iterator type as parameters.
It has a public put() and a protected do_put() function. (The member function put_string() is a helper function that writes strings to an output iterator.)

Design options rejected, for the sake of simplicity, were:

The patterns for international address formats could have been encapsulated into an addresspunct facet, similar to a numpunct or moneypunct facet. The "punct" facets in the standard library are used by related formatting and parsing facets for finding rules, patterns, and other information. We decided in favor of an alternative technique and put the knowledge about specific address patterns directly into the respective formatting operations, rather than factoring it out into a separate facet. This technique can be found in the standard library, too. It is demonstrated by the standard time and date facets time_put and time_get, which, unlike num_put/num_get and money_put/money_get, do not rely on a timepunct facet.

Facets for different cultural areas

So far, we've left open how a facet comes to represent the knowledge of a certain cultural area, i.e., what turns our address facet into a German or a US address facet? In the standard library, facets support the concept of locale names. These are strings that specify a cultural area, e.g. De_DE (for Germany) or En_US (for US English). The "byname" facets, like numpunct_byname, time_put_byname, etc., take a locale name as an argument to their constructor. They have the knowledge to retrieve the respective culture-dependent information from somewhere. "Somewhere" can be a database, or a couple of files, or anything else that a (library) vendor ships to provide the required information. It fully depends on the implementation of the facet. The C++ standard does not impose any requirements with regard to the maintenance of culture-dependent data. Not even the locale names are standardized.

To keep our example focused on extending the standard locale framework rather than the maintenance of culture-dependent data, we are going to use a different solution. Instead of putting all the intelligence into a "byname" facet, which would also force us into dealing with the maintenance of address format patterns in general, we derive an address facet for each specific cultural area from the base class template address_put. Also, we restrict the demonstration to US and German address formatting.

As we do not prefer any particular way of formatting over others, we refrain from defining a default formatting. For this reason we make the base class template an abstract base class by turning its address_put<>::do_put() function into a pure virtual function with no implementation.

The derived class templates US_address_put and German_address_put can be found in Listing 3.

template<class charT, class OutIter = ostreambuf_iterator<charT> >
class address_put : public locale::facet
{
... // as in Listing 2: The address formatting facet
protected:
virtual void do_put (OutIter oi,
const String& firstname, const String& secname, const String& lastname,
const String& address1, const String& address2,
const String& town, const String& zipcode, const String& state,
const String& country, const String& cntrycode) const = 0;
};
template<class charT, class OutIter = ostreambuf_iterator<charT> >
class US_address_put : public address_put<charT, OutIter>
{
public:
US_address_put(size_t refs = 0) : address_put<charT,OutIter>(refs) {}
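private:
typedef basic_string<charT> String; // repeated here: names of a dependent base class are not found unqualified
protected: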
void do_put(OutIter oi,
const String& firstname, const String& secname, const String& lastname,
const String& address1, const String& address2,
const String& town, const String& zipcode, const String& state,
const String& country, const String& cntrycode) const
{
String s(firstname);
s.append(" ").append(secname).append(" ").append(lastname)
.append("\n");
s.append(address1).append("\n");
s.append(address2).append("\n");
s.append(town).append(", ").append(state).append(" ").append(zipcode)
.append("\n");
s.append(country).append("\n");
this->put_string(oi,s); // this-> is needed: put_string is a member of a dependent base class
}
};
template<class charT, class OutIter = ostreambuf_iterator<charT> >
class German_address_put : public address_put<charT, OutIter>
{
public:
German_address_put(size_t refs = 0) : address_put<charT,OutIter>(refs) {}
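private:
typedef basic_string<charT> String; // repeated here: names of a dependent base class are not found unqualified
protected: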
void do_put(OutIter oi,
const String& firstname, const String& secname, const String& lastname,
const String& address1, const String& address2,
const String& town, const String& zipcode, const String& state,
const String& country, const String& cntrycode) const
{
String s(firstname);
s.append(" ").append(lastname).append("\n");
s.append(address1).append("\n");
s.append(address2).append("\n");
s.append("\n");
s.append(cntrycode).append("-").append(zipcode).append(" ").append(town)
.append("\n");
this->put_string(oi,s); // this-> is needed: put_string is a member of a dependent base class
}
};

Listing 3: The US and the German address formatting facets

The core of these address facets is the implementation of the respective do_put() function. do_put() concatenates the address elements to one large address string, according to US and German address formatting rules respectively. The helper function address_put<>::put_string() then writes the formatted string to the output iterator.

The address inserter

Finally, we implement the stream inserter for the address class that was mentioned earlier. Its implementation, shown in Listing 4, is a simplified one that focuses on the usage of the newly defined address_put facet.

friend ostream& operator<<(ostream& os, const address<charT>& ad)
{

locale loc = os.getloc();
try
{
const address_put<charT>& apFacet = use_facet<address_put<charT> > (loc);
apFacet.put(os, ad.firstname_, ad.secname_, ad.lastname_,
ad.address1_, ad.address2_, ad.town_, ad.zipcode_, ad.state_,
ad.country_, ad.cntrycode_);
}
catch (bad_cast&)
{
// locale does not contain an address_put facet
}
return (os);

}

Listing 4: The inserter for addresses

For culture-sensitive address formatting, the inserter must retrieve the address formatting facet from the stream's current locale. Streams have a member function getloc() that returns the stream's locale object. From that locale the address facet can be retrieved via the template function use_facet<Facet>(), as was explained in /1/. Note that the user-defined address formatting facet address_put is retrieved in exactly the same way as any standard facet.
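The failure case of this retrieval deserves a brief illustration: use_facet<Facet>() throws bad_cast when the locale contains no facet of the requested family. In the sketch below, the facet marker and the function retrieval_fails() are hypothetical names introduced only for the demonstration.

```cpp
#include <cstddef>
#include <locale>
#include <typeinfo>

// Hypothetical facet with an identification of its own.
class marker : public std::locale::facet
{
public:
    static std::locale::id id;
    explicit marker(std::size_t refs = 0) : std::locale::facet(refs) {}
};

std::locale::id marker::id;

// Returns true if loc does not contain a marker facet,
// i.e. if use_facet reports the failure by throwing bad_cast.
bool retrieval_fails(const std::locale& loc)
{
    try
    {
        (void) std::use_facet<marker>(loc);
        return false;
    }
    catch (const std::bad_cast&)
    {
        return true;
    }
}
```

Where an exception is not wanted, has_facet<Facet>() can be used to test for the presence of the facet beforehand.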

The inserter then calls the facet's put() function and delegates the actual formatting to it. All the elements of an address are passed as arguments to the put() function. The first argument to put() is expected to be the iterator designating the beginning of the output sequence. A stream buffer iterator pointing to the current position of the output stream can be created from a reference to an output stream. (See sidebar on stream and stream buffer iterators for more information.) Hence we pass in the stream itself. The implicit conversion mechanism for function arguments in C++ takes care of constructing an output stream buffer iterator.
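The implicit conversion can be tried out in isolation. In this sketch, write_text() stands in for a facet's put() function and label_header() for an inserter; both names are hypothetical.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Takes the output iterator by value, as a facet's put() function does.
void write_text(std::ostreambuf_iterator<char> out, const std::string& text)
{
    for (std::string::const_iterator it = text.begin(); it != text.end(); ++it, ++out)
        *out = *it;
}

// Passes the stream itself; the converting constructor of
// ostreambuf_iterator<char> turns it into an iterator that
// writes at the stream's current position.
std::string label_header()
{
    std::ostringstream os;
    os << "To: ";
    write_text(os, "Irene Myer");
    return os.str();
}
```

The characters written through the iterator end up behind the text that was inserted with operator<<(), because both use the same stream buffer.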

Equipping locales with address facets

We have seen above how an address_put facet is retrieved from a locale object. In addition to retrieval, we need to consider ways and means of storing address facets in locale objects in the first place. In /1/ we explained that locales are immutable objects. Facets are stuffed into a locale when the locale object is created and cannot be replaced or added later on. Locale objects are built by composition: You start off with the copy of an existing locale and replace and add facets to create a new locale object.

In our example, we want to equip a "standard" locale, i.e. one that contains all standard facets for a cultural environment, with an additional address formatting facet.

A standard locale can be created by means of the following constructor:

explicit locale(const char* name);

It constructs a locale object that contains all standard facets for a cultural environment specified by the locale name.
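Since locale names are not standardized, a given name may not be known on a particular platform; the constructor then throws runtime_error. A defensive sketch could fall back to the classic "C" locale. The function name make_locale() is hypothetical, and the fallback policy is an assumption, not something the article prescribes.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Tries to create the named locale; falls back to the classic "C"
// locale if the platform does not know the name.
std::locale make_locale(const std::string& name)
{
    try
    {
        return std::locale(name.c_str());
    }
    catch (const std::runtime_error&)
    {
        return std::locale::classic();
    }
}
```

Which names are accepted depends entirely on the library implementation and the installed locale data.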

A new locale object containing all facets from an existing locale object, plus an additional new facet, can be composed via the following locale member template constructor:

template<class Facet> locale(const locale& other, Facet* facetPtr);

To add a US_address_put facet object to the locale that contains all standard US facets, we have to write:

locale usLocaleWithAddressPut(locale("En_US"), new US_address_put<char>);
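Composition and immutability can be illustrated with a small, self-contained facet family. The classes salutation, us_salutation, and german_salutation below are hypothetical stand-ins for the address facets; the classic locale is used as the starting point so that the sketch does not depend on any platform-specific locale name.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical facet family: the abstract base class owns the id.
class salutation : public std::locale::facet
{
public:
    static std::locale::id id;
    explicit salutation(std::size_t refs = 0) : std::locale::facet(refs) {}
    std::string text() const { return do_text(); }
protected:
    virtual std::string do_text() const = 0;
};

std::locale::id salutation::id;

class us_salutation : public salutation
{
protected:
    virtual std::string do_text() const { return "Dear"; }
};

class german_salutation : public salutation
{
protected:
    virtual std::string do_text() const { return "Sehr geehrte"; }
};

// Retrieves the facet via its family, i.e. via the base class type.
std::string salutation_in(const std::locale& loc)
{
    return std::use_facet<salutation>(loc).text();
}
```

Composing std::locale us(std::locale::classic(), new us_salutation); and then std::locale de(us, new german_salutation); yields two independent locale objects: de contains the German facet as a replacement within the same family, while us still contains the US facet and the classic locale contains neither.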

Putting the pieces together

Listing 5 shows a function that puts all the elements together. It receives an output stream, an address, and a locale name. The output stream is temporarily "imbued" with a locale that has an address facet, before the address is eventually inserted into the stream.

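void printAddressWithoutFactory(ostream& os, const address<char>& add, const string& locname)
{
// declared as address_put<char>* rather than locale::facet*, so that the locale
// constructor template deduces a facet type that has an id member
address_put<char>* addr_put = 0;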
if (locname == "En_US")
addr_put = new US_address_put<char>;
else if (locname == "De_DE")
addr_put = new German_address_put<char>;
if (addr_put)
{
locale original = os.imbue(locale(locale(locname.c_str()),addr_put));
os << add <<endl;
os.imbue(original);
}
else
os << add <<endl;

}

Listing 5: Printing an address according to international address formats

Summary

In this column we demonstrated a technique for adding arbitrary, user-defined facets to the locale framework in the standard library and their usage in conjunction with iostreams. The example of choice was an address formatting facet. The technique itself, however, is more general and can be applied to arbitrary facet types. Here is a wrap-up of the essentials:

Mandatory. A user-defined facet type must be derived from class locale::facet and must have a facet identification in the form of a static data member named id of type locale::id.

Recommended.

A facet name should follow the naming conventions of the standard facets.
Formatting and parsing operations should access source or destination via iterators. Formatting and parsing facets should be templatized on the iterator type and use stream buffer iterators as a default.
Public member functions should delegate to protected virtual member functions.

References

Klaus Kreft & Angelika Langer

Klaus Kreft & Angelika Langer

X/Open Consortium.

Nadine Kano

David Schmitt

Erich Gamma, et. al.

INTERNATIONAL STANDARD

Sidebar: Stream and Stream Buffer Iterators

Iterators in general

Iterators are pointer-like objects that allow traversing a sequence of elements of the same type and accessing these elements without any further knowledge of the sequence and the way it is organized. Iterators were introduced into the Standard C++ Library with the adoption of the STL (= the Standard Template Library) as part of the standard. Therefore, iterators are typically used for access to the library’s generic containers, such as list<T>, map<Key, T>, and vector<T>.

Iterator categories

According to their properties, iterators are classified into five iterator categories. Their characteristics in brief:

Input iterators allow algorithms to advance the iterator and give "read only" access to the value.
Output iterators allow algorithms to advance the iterator and give "write only" access to the value.
Forward iterators combine read and write access, but only in one direction (i.e., forward).
Bi-directional iterators allow algorithms to traverse the sequence in both directions, forward and backward.
Random access iterators allow jumps and "pointer arithmetics".

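Each category adds new features to the previous one. The iterator categories obey the following order: input and output iterators are the weakest categories, followed by forward iterators, bi-directional iterators, and finally random access iterators.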

Note that an iterator category is an abstraction. It represents a set of requirements to an iterator.
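In the library, each category is represented by a tag type, and the category of a concrete iterator type can be queried through iterator_traits. The following sketch maps the tag to a readable name by overloading on the tag types; name_of() and category_of() are hypothetical helper names.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <vector>

// One overload per category tag; the exact match is selected.
inline const char* name_of(std::input_iterator_tag)         { return "input"; }
inline const char* name_of(std::output_iterator_tag)        { return "output"; }
inline const char* name_of(std::forward_iterator_tag)       { return "forward"; }
inline const char* name_of(std::bidirectional_iterator_tag) { return "bidirectional"; }
inline const char* name_of(std::random_access_iterator_tag) { return "random access"; }

// Reports the category that the iterator type advertises.
template <class Iter>
std::string category_of(const Iter&)
{
    typedef typename std::iterator_traits<Iter>::iterator_category category;
    return name_of(category());
}
```

For example, category_of() applied to a vector<int> iterator reports "random access", whereas a list<int> iterator reports "bidirectional".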

Iterator ranges

Related to iterators is the notion of iterator ranges. An iterator range is a pair of iterators. The first iterator designates the beginning of a sequence; the second iterator points to the position past the end of the sequence. It is important to keep in mind that the end iterator always points to the past-the-end position of a sequence, which need not be a valid, accessible element. Therefore, never dereference the end iterator!
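A loop over an iterator range therefore compares against the end iterator but never dereferences it. A minimal sketch, with sum() as a hypothetical example function:

```cpp
#include <vector>

// Adds up all elements in the range [first, last).
int sum(std::vector<int>::const_iterator first,
        std::vector<int>::const_iterator last)
{
    int total = 0;
    for (; first != last; ++first)   // last is only compared, never dereferenced
        total += *first;
    return total;
}
```

An empty sequence is simply a range whose begin and end iterators are equal, so the loop body is never entered.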

Stream and Stream Buffer Iterators

The principle of accessing collections of elements through iterators was extended to sequences other than the STL containers. Stream iterators and stream buffer iterators are examples of such extensions. They fall into the input and output iterator categories.

Creation of the begin iterator. A stream or stream buffer iterator pointing to the current position of an output or input stream can be created from the stream itself. This is possible because stream and stream buffer iterators have a constructor taking a reference to a stream object.

Creation of the end iterator. An input stream or stream buffer iterator designating the end of an input stream can be created via the default constructor for input stream or stream buffer iterators. For these iterator types, the default constructor creates an end-of-stream iterator, i.e. one that designates the position one step past the end of the sequence.

Stream iterators allow a stream to be seen as a sequence of elements of type T that are extracted from or inserted into the stream while traversing it. The stream iterator types in the standard library are istream_iterator and ostream_iterator. Here is an example of using stream iterators. Strings are read from the input stream cin into a vector, and all elements of the vector are then written to the output stream cout:

vector<string> names;
istream_iterator<string> begin(cin), end; // "end()" would declare a function instead of an iterator
for (istream_iterator<string> i = begin; i != end; ++i)
    names.push_back(*i);
ostream_iterator<string> out(cout,"\n");
for (vector<string>::iterator i = names.begin(); i != names.end(); ++i, ++out)
    *out = *i;

Stream buffer iterators see a stream buffer as a sequence of characters and thus allow traversing a stream buffer character by character. All formatting and parsing facets in the Standard C++ Library access their source or destination directly via stream buffer iterators. The stream buffer iterator types in the standard library are istreambuf_iterator and ostreambuf_iterator.

Formatting operations, like num_put<charT>::put(), take one iterator, the iterator to the begin of the output sequence. Parsing operations like num_get<charT>::get() take an iterator range designating begin and end of the subsequence of characters to be parsed.
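Both calling conventions can be sketched with the standard numeric facets. The helper names format_long() and parse_long() are hypothetical; the facet member functions num_put<char>::put() and num_get<char>::get() are used with their default stream buffer iterator types.

```cpp
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Formatting: put() receives a single output iterator.
std::string format_long(long value)
{
    std::ostringstream os;
    const std::num_put<char>& np = std::use_facet<std::num_put<char> >(os.getloc());
    np.put(std::ostreambuf_iterator<char>(os), os, os.fill(), value);
    return os.str();
}

// Parsing: get() receives an iterator range [begin, end).
// Returns false if no number could be extracted.
bool parse_long(const std::string& text, long& value)
{
    std::istringstream is(text);
    std::ios_base::iostate err = std::ios_base::goodbit;
    const std::num_get<char>& ng = std::use_facet<std::num_get<char> >(is.getloc());
    ng.get(std::istreambuf_iterator<char>(is),   // begin: current stream position
           std::istreambuf_iterator<char>(),     // end: default-constructed iterator
           is, err, value);
    return (err & std::ios_base::failbit) == 0;
}
```

The stream object passed as the ios_base& argument supplies the format flags and the locale; the characters themselves travel only through the iterators.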
