User-Defined Facets
Extending the Locale Framework
C++ Report, February 1998
Klaus Kreft & Angelika Langer
Locales and facets in the standard C++ library together form an extensible
framework for support of internationalization. A short recap: A locale
is a class that represents a container of facets; a facet is a class that
contains information and provides functionality related to a certain aspect
of internationalization. The standard library contains a number of standard
facets. They support classification of characters, collation of character
sequences, character code conversion, retrieval of message texts from message
catalogues, and the parsing and formatting of structured information like
numbers, monetary values, date, and time. The locale framework as a whole
is designed to be extensible: user-defined facet types can be added. Locales
are used by standard iostreams for parsing and formatting of numeric values
and for code conversion.
In this article we will show you how user-defined facets can be added
to the locale framework. We will demonstrate the technique of building
facet classes and their usage in conjunction with the input and output
streams of the standard library.
A facet for international address formats
Locales are designed so that they can hold any information that varies
with cultural conventions. All kinds of examples are conceivable, for instance:
conversion of date and time according to time zones, types and sizes of
paper sheets and envelopes, parsing and formatting of telephone numbers,
conversion of weights and measures, and many more. In this column we will
pick an arbitrary example: international address formats. Imagine a program
that prints the address labels for an international mail order service.
It should be capable of handling the differences between address formats.
To illustrate the issue, we'll show you two examples of address formats.
Most readers will probably be familiar with the US American way of formatting
addresses. Here is the general pattern and an example for addresses in
private mail exchange:
<FirstName> <SecondName> <LastName>
<Address1>
<Address2>
<City>, <State> <PostalCode>
[<Country>]
Irene Myer
28 SW 10th Street
Eugene, OR 97330
U.S.A.
Now, in Germany addresses have a slightly different format: It is, for
instance, not customary to print a person's second name. A country code
is placed in front of the zip code separated by a hyphen. States are irrelevant.
And so on and so forth. Here is the general pattern and an example of an
address in Germany:
<FirstName> <LastName>
<Address1>
<Address2>
<blank line>
[<CountryCode>]-<PostalCode> <City>
Irene Myer
Lindenstraße 5
D-80727 München
Of course, we cannot show you how to build a full-fledged address formatting
facet in this article. Instead we will drastically simplify matters. We
want to focus on the techniques of building any kind of user-defined facet,
of integrating it into the standard locale framework, and of using it with
standard iostreams. The address formatting facet is just an example for
a generally applicable programming technique.
The address class
We start the implementation by introducing a simple address class. Actually
it is a class template, because we want the address representation to be
flexible enough to consist of either wide or narrow character strings.
For instance, it should be capable of representing Japanese addresses that
contain Kanji characters.
template<class charT>
class address
{
  // inserter, implemented in Listing 4
  friend ostream& operator<<(ostream& os, const address<charT>& ad);
  typedef basic_string<charT> String;
public:
  address(const String& firstname, const String& secname, const String& lastname,
          const String& address1, const String& address2,
          const String& town, const String& zipcode,
          const String& state, const String& country, const String& cntrycode)
  : firstname_(firstname), secname_(secname), lastname_(lastname),
    address1_(address1), address2_(address2), town_(town), zipcode_(zipcode),
    state_(state), country_(country), cntrycode_(cntrycode) {}
private:
  String firstname_;
  String secname_;
  String lastname_;
  String address1_;
  String address2_;
  String town_;
  String zipcode_;
  String state_;
  String country_;
  String cntrycode_;
};
Listing 1: An address class
The address class contains private data members that hold the various
elements of an address. The constructor initializes these elements. An
operator<<(), also called inserter, shall print addresses according to a
stream's current locale object. It is a friend function of the address class.
We will see its implementation later.
The address formatting facet
Now we come to the design and implementation of the address facet. In a
previous article (/1/) we described the locale framework and explained that
facets must have the following two properties: Facets have to be subclasses
of class locale::facet. Additionally, they must contain a facet
identification in form of a static data member that is declared as
static locale::id id;
This identification is used for maintenance and retrieval
of facets from a locale and identifies an entire family of facets: All
facets with the same identification belong to the same facet family. A
locale cannot contain two facets with identical identification. Hence,
facets from the same family can only be replacements of each other. New
types of facets can be added by either deriving from existing facet types,
in which case the facet identification is inherited and the new facet belongs
to an already existing facet family, or by defining a new facet class that
has a facet identification of its own, in which case a new facet family
is introduced.
In our example, address formatting shall be present in a locale in addition
to other internationalization facilities and is not meant to replace any
existing information. Hence, we define a new facet family for address formatting
by building a new facet type with an identification of its own.
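The notion of a facet family can be illustrated with a small sketch. The facet names greeting and german_greeting and the helper functions are hypothetical and serve only this illustration; they are not part of the address example. The derived facet inherits the identification of its base class, so a locale hands it out when asked for the base facet type.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical facet that introduces a new family through its own id.
class greeting : public std::locale::facet {
public:
    static std::locale::id id;                 // identifies the family
    explicit greeting(std::size_t refs = 0) : std::locale::facet(refs) {}
    virtual std::string text() const { return "Hello"; }
};
std::locale::id greeting::id;                  // the static member needs a definition

// Derived facet: inherits greeting::id, so it belongs to the same family.
class german_greeting : public greeting {
public:
    explicit german_greeting(std::size_t refs = 0) : greeting(refs) {}
    std::string text() const { return "Guten Tag"; }
};

// Looks up the greeting family in a locale; empty string if it is absent.
std::string greet(const std::locale& loc) {
    if (!std::has_facet<greeting>(loc)) return "";
    return std::use_facet<greeting>(loc).text();
}

std::string greet_classic() {
    return greet(std::locale::classic());
}
std::string greet_base() {
    return greet(std::locale(std::locale::classic(), new greeting));
}
std::string greet_german() {
    return greet(std::locale(std::locale::classic(), new german_greeting));
}
```

The classic locale knows nothing about the new family, so greet_classic() yields an empty string, while the two composed locales return the text of whichever family member was installed.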
Following the naming conventions of the standard, we call our address
facet address_put because it handles the formatting of addresses. This is
in line with the names of the standard facets num_put (formatting of numeric
values), money_put (formatting of monetary values), and time_put (formatting
of time and date). The formatting operation is a member function called put().
For the implementation of address_put we follow the design and implementation
idioms for formatting facets that are established in the standard library:
- Output iterators. Formatting operations in the standard library, like
num_put<charT>::put(), take an iterator to the beginning of the output
sequence as an argument. This approach allows a flexible solution and fits
smoothly into the overall concept of the entire standard library, where
iterators are used as generic connectors between independent components.
(See sidebar on stream and stream buffer iterators for more information.)
In line with this policy we, too, use an output iterator to designate the
target location of the formatted address string.
In the standard library, the type of this output iterator is a template
argument of the respective facet class template. By default, the output
iterator type is a so-called output stream buffer iterator. It allows
direct access to a stream buffer and is a sensible default for use of facets
in iostreams. (See sidebar on stream and stream buffer iterators for more
information.) We adopt this policy for the address_put facet and make it a
class template taking the output iterator type as a template argument.
- Public and virtual protected interface. An idiom used in many places
throughout the standard library is delegation to protected virtual member
functions. A public member function foo() calls a protected virtual function
do_foo(), which does the real work. Derived classes can only override the
protected virtual member function. Hence, the public member function can
contain certain functionality that must not be changed in derived classes.
A typical example for such functionality is the acquisition and release of
a mutex lock, which would be needed to allow use of this facet in a
multi-threaded environment.
In our example, the public interface of the address_put class template
contains a member function put(), which calls a protected virtual function
do_put(), which does the real work. These functions take the output iterator
that specifies the target location, and all elements that form the address
(e.g. name, city, etc.) as parameters.
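The flexibility gained from output iterators can be seen in a minimal sketch. The function template put_string below is a free-standing stand-in written for this illustration (the helper functions around it are hypothetical as well); the same formatting code writes into a string or into a stream buffer, depending only on the iterator type it is given.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Writes every character of s through the output iterator
// and returns the advanced iterator.
template <class OutIter>
OutIter put_string(OutIter out, const std::string& s) {
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it, ++out)
        *out = *it;
    return out;
}

// Target 1: append to a std::string through a back_insert_iterator.
std::string to_string_target(const std::string& s) {
    std::string result;
    put_string(std::back_inserter(result), s);
    return result;
}

// Target 2: write into a stream buffer through an ostreambuf_iterator.
std::string to_stream_target(const std::string& s) {
    std::ostringstream os;
    put_string(std::ostreambuf_iterator<char>(os), s);
    return os.str();
}
```

Both helper functions produce the same characters; only the destination differs.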
Listing 2 shows the implementation of the address_put facet.
template<class charT, class OutIter = ostreambuf_iterator<charT> >
class address_put : public locale::facet
{
  typedef basic_string<charT> String;
public:
  typedef OutIter iter_type;
  static locale::id id;
  address_put(size_t refs = 0) : locale::facet(refs) {}
  void put(OutIter oi,
           const String& firstname, const String& secname, const String& lastname,
           const String& address1, const String& address2,
           const String& town, const String& zipcode, const String& state,
           const String& country, const String& cntrycode) const
  {
    // delegate to the protected virtual function, passing all elements on
    do_put(oi, firstname, secname, lastname,
           address1, address2, town, zipcode, state,
           country, cntrycode);
  }
protected:
  virtual void do_put(OutIter oi,
           const String& firstname, const String& secname, const String& lastname,
           const String& address1, const String& address2,
           const String& town, const String& zipcode, const String& state,
           const String& country, const String& cntrycode) const;
  // helper: writes a string, character by character, to the output iterator
  void put_string(OutIter oi, const String& s) const
  {
    typename String::const_iterator si, end;
    for (si = s.begin(), end = s.end(); si != end; ++si, ++oi)
      *oi = *si;
  }
};
// definition of the static facet identification
template<class charT, class OutIter>
locale::id address_put<charT, OutIter>::id;
Listing 2: The address formatting facet
In Listing 2 above, you can see the design decisions made so far:
- The new facet type is a class derived from locale::facet with an
identification of its own.
- It's a class template taking the character type and the output iterator
type as parameters.
- It has a public put() and a protected do_put() function. (The member
function put_string() is a helper function that writes strings to an output
iterator.)
Design options rejected, for the sake of simplicity, were:
- The patterns for international address formats could have been encapsulated
into an addresspunct facet, similar to a numpunct or moneypunct facet. The
"punct" facets in the standard library are used by related formatting
and parsing facets for finding rules, patterns, and other information. We
decided in favor of an alternative technique and put the knowledge about
specific address patterns directly into the respective formatting operations,
rather than factoring it out into a separate facet. This technique can
be found in the standard library, too. It is demonstrated by the standard
time and date facets time_put and time_get, which, unlike num_put/num_get
and money_put/money_get, do not rely on a timepunct facet.
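How a formatting facet consults a "punct" facet can be sketched with the standard numpunct facet. The class comma_numpunct and the function format_with_comma are hypothetical names chosen for this illustration. The derived facet replaces numpunct<char> in a locale, and the number formatting done by the stream picks up the changed decimal point.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Replacement for the standard numpunct<char> facet
// (same family, because the facet identification is inherited).
class comma_numpunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const { return ','; }
};

// Formats a floating point value with a stream whose locale
// contains the replacement facet.
std::string format_with_comma(double value) {
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new comma_numpunct));
    os << value;            // num_put asks numpunct for the decimal point
    return os.str();
}
```

An addresspunct facet would play the same role for address patterns; the address example in this article does without it.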
Facets for different cultural areas
So far, we've left open how a facet comes to represent the knowledge
of a certain cultural area, i.e. what turns our address facet into a German
or a US address facet? In the standard library, facets support the concept
of locale names. These are strings that specify a cultural area,
e.g. De_DE (for Germany) or En_US (for US English). The "byname" facets,
like numpunct_byname, time_put_byname, etc. take a locale name as argument
to their constructor. They have the knowledge to retrieve the respective
culture-dependent information from somewhere. "Somewhere" can be a database,
or a couple of files, or anything else that a (library) vendor ships to
provide the required information. It fully depends on the implementation of
the facet. The C++ standard does not impose any requirements with regard to
the maintenance of culture-dependent data. Not even the locale names are
standardized.
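A "byname" facet can be tried out with the standard numpunct_byname facet, as in the following sketch. The two helper functions are hypothetical. The only locale name that can be relied on everywhere is "C"; any other name depends on the platform, and an unknown name makes the facet constructor throw std::runtime_error.

```cpp
#include <locale>

// Builds a locale whose numpunct facet is taken from a named cultural area.
// Only "C" is guaranteed to exist; other names are platform-specific.
std::locale with_named_numpunct(const char* name) {
    return std::locale(std::locale::classic(),
                       new std::numpunct_byname<char>(name));
}

// Returns the decimal point used in the named cultural area.
char decimal_point_of(const char* name) {
    return std::use_facet<std::numpunct<char> >(
               with_named_numpunct(name)).decimal_point();
}
```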
To keep our example focused on extending the standard locale framework
rather than the maintenance of culture-dependent data, we are going to
use a different solution. Instead of putting all the intelligence into
a "byname" facet, which would also force us into dealing with the maintenance
of address format patterns in general, we derive an address facet for each
specific cultural area from the base class template address_put. Also, we
restrict the demonstration to US and German address formatting.
As we do not prefer any particular way of formatting over others, we
refrain from defining a default formatting. For this reason we make the
base class template an abstract base class by turning its
address_put<>::do_put() function into a pure virtual function with no
implementation.
The derived class templates US_address_put and German_address_put can be
found in Listing 3.
template<class charT, class OutIter = ostreambuf_iterator<charT> >
class address_put : public locale::facet
{
  ... // as in Listing 2: The address formatting facet
protected:
  virtual void do_put(OutIter oi,
           const String& firstname, const String& secname, const String& lastname,
           const String& address1, const String& address2,
           const String& town, const String& zipcode, const String& state,
           const String& country, const String& cntrycode) const = 0;
};
template<class charT, class OutIter = ostreambuf_iterator<charT> >
class US_address_put : public address_put<charT, OutIter>
{
  // the typedef of the base class is private, hence repeated here
  typedef basic_string<charT> String;
public:
  US_address_put(size_t refs = 0) : address_put<charT,OutIter>(refs) {}
protected:
  void do_put(OutIter oi,
           const String& firstname, const String& secname, const String& lastname,
           const String& address1, const String& address2,
           const String& town, const String& zipcode, const String& state,
           const String& country, const String& cntrycode) const
  {
    // the narrow string literals restrict this sketch to charT == char
    String s(firstname);
    s.append(" ").append(secname).append(" ").append(lastname).append("\n");
    s.append(address1).append("\n");
    s.append(address2).append("\n");
    s.append(town).append(", ").append(state).append(" ").append(zipcode)
     .append("\n");
    s.append(country).append("\n");
    // member of a dependent base class, hence the explicit this->
    this->put_string(oi,s);
  }
};
template<class charT, class OutIter = ostreambuf_iterator<charT> >
class German_address_put : public address_put<charT, OutIter>
{
  typedef basic_string<charT> String;
public:
  German_address_put(size_t refs = 0) : address_put<charT,OutIter>(refs) {}
protected:
  void do_put(OutIter oi,
           const String& firstname, const String& secname, const String& lastname,
           const String& address1, const String& address2,
           const String& town, const String& zipcode, const String& state,
           const String& country, const String& cntrycode) const
  {
    String s(firstname);
    s.append(" ").append(lastname).append("\n");
    s.append(address1).append("\n");
    s.append(address2).append("\n");
    s.append("\n");
    s.append(cntrycode).append("-").append(zipcode).append(" ").append(town)
     .append("\n");
    this->put_string(oi,s);
  }
};
Listing 3: The US and the German address formatting facets
The core of these address facets is the implementation of the respective
do_put() function. do_put() concatenates the address elements to one large
address string, according to US and German address formatting rules
respectively. The helper function address_put<>::put_string() then
writes the formatted string to the output iterator.
The address inserter
Finally, we are going to implement the already mentioned stream inserter
for the address class. Its implementation, shown in Listing 4, is a
simplified one, focused on the usage of the newly defined address_put facet.
friend ostream& operator<<(ostream& os, const address<charT>& ad)
{
  locale loc = os.getloc();
  try
  {
    const address_put<charT>& apFacet = use_facet<address_put<charT> >(loc);
    apFacet.put(os, ad.firstname_, ad.secname_, ad.lastname_,
                ad.address1_, ad.address2_,
                ad.town_, ad.zipcode_, ad.state_,
                ad.country_, ad.cntrycode_);
  }
  catch (bad_cast&)
  {
    // locale does not contain an address_put facet
  }
  return (os);
}
Listing 4: The inserter for addresses
For culture-sensitive address formatting, the inserter must retrieve
the address formatting facet from the stream's current locale. Streams
have a member function getloc() that returns the stream's locale object.
From that locale the address facet can be retrieved via the template
function use_facet<Facet>(), as was explained in /1/. Note that the
user-defined address formatting facet address_put is retrieved in exactly
the same way as any standard facet.
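The retrieval step can be sketched in isolation. The facet missing_facet and both functions are hypothetical and exist only for this illustration. A standard facet is always found in a stream's locale, whereas the request for a facet that was never installed is answered with a std::bad_cast exception, which is why the inserter wraps the call in a try block.

```cpp
#include <cstddef>
#include <locale>
#include <ostream>
#include <sstream>
#include <typeinfo>

// Hypothetical user-defined facet that is never installed in this sketch.
class missing_facet : public std::locale::facet {
public:
    static std::locale::id id;
    explicit missing_facet(std::size_t refs = 0) : std::locale::facet(refs) {}
};
std::locale::id missing_facet::id;

// Standard facets are always present in a stream's locale.
char stream_decimal_point(const std::ostream& os) {
    return std::use_facet<std::numpunct<char> >(os.getloc()).decimal_point();
}

// use_facet reports a missing facet by throwing std::bad_cast.
bool lookup_throws_bad_cast(const std::ostream& os) {
    try {
        (void)std::use_facet<missing_facet>(os.getloc());
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}
```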
The inserter then calls the facet's put() function and delegates the actual
formatting to it. All the elements of an address are passed as arguments to
the put() function. The first argument to put() is expected to be the
iterator designating the beginning of the output sequence.
A stream buffer iterator pointing to the current position of the output
stream can be created from a reference to an output stream. (See sidebar
on stream and stream buffer iterators for more information.) Hence we pass
in the stream itself. The implicit conversion mechanism for function arguments
in C++ takes care of constructing an output stream buffer iterator.
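The implicit conversion from a stream to an output stream buffer iterator can be observed in a short sketch. The functions write_chars and write_via_stream are hypothetical; write_chars merely mimics a put() function that takes its iterator by value.

```cpp
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Takes the iterator by value, as a facet's put() function does.
void write_chars(std::ostreambuf_iterator<char> out, const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        *out++ = s[i];
}

// Passes the stream itself where an iterator is expected.
std::string write_via_stream(const std::string& s) {
    std::ostringstream os;
    os << "To: ";
    write_chars(os, s);     // os converts implicitly to ostreambuf_iterator<char>
    return os.str();
}
```

The characters land directly behind what was already written to the stream, because the iterator refers to the stream buffer's current position.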
Equipping locales with address facets
We have seen above how an address_put facet is retrieved from a locale
object. In addition to retrieval, we need to consider ways and means of
storing address facets in locale objects in the first place. In /1/ we
explained that locales are immutable objects. Facets are stuffed into a
locale when the locale object is created and cannot be replaced or added
later on. Locale objects are built by composition: You start off with the
copy of an existing locale and replace and add facets to create a new locale
object.
In our example, we want to equip a "standard" locale, i.e. one that
contains all standard facets for a cultural environment, with an additional
address formatting facet.
A standard locale can be created by means of the following constructor:
explicit locale(const char* name);
It constructs a locale object that contains all standard facets
for a cultural environment specified by the locale name.
A new locale object containing all facets from an existing locale object,
plus an additional new facet, can be composed via the following locale
member template constructor:
template<class Facet> locale(const locale& other, Facet* facetPtr);
To add a US_address_put facet object to the locale that contains all
standard US facets, we have to write:
locale usLocaleWithAddressPut(locale("En_US"), new US_address_put<char, osIter>);
Here osIter stands for the output iterator type of the facet, for instance
ostreambuf_iterator<char>.
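Composition and immutability can be checked with a few lines of code. The facet label_facet and the function below are hypothetical and carry no formatting logic; the sketch only observes which locale objects contain the facet.

```cpp
#include <cstddef>
#include <locale>

// Hypothetical facet used only to observe locale composition.
class label_facet : public std::locale::facet {
public:
    static std::locale::id id;
    explicit label_facet(std::size_t refs = 0) : std::locale::facet(refs) {}
};
std::locale::id label_facet::id;

// Returns true if composing a new locale leaves the source locale untouched.
bool composition_leaves_original_unchanged() {
    std::locale original = std::locale::classic();
    std::locale extended(original, new label_facet);   // copy plus one facet
    return !std::has_facet<label_facet>(original)
        &&  std::has_facet<label_facet>(extended)
        &&  std::has_facet<std::numpunct<char> >(extended);  // standard facets carried over
}
```

The facet is created with the default reference count argument of 0, which leaves the management of its lifetime to the locale.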
Putting the pieces together
Listing 5 shows a function that puts all the elements together. It receives
an output stream, an address, and a locale name. The output stream is temporarily
"imbued" with a locale that has an address facet, before the address is
finally inserted into the stream.
void printAddressWithoutFactory(ostream& os, const address<char>& add, const string locname)
{
  // typed pointer: the locale constructor template needs access to Facet::id
  address_put<char>* addr_put = 0;
  if (locname == "En_US")
    addr_put = new US_address_put<char>;
  else if (locname == "De_DE")
    addr_put = new German_address_put<char>;
  if (addr_put)
  {
    locale original = os.imbue(locale(locale(locname.c_str()), addr_put));
    os << add << endl;
    os.imbue(original);
  }
  else
    os << add << endl;
}
Listing 5: Printing an address according to international address formats
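For readers who want to experiment, here is a condensed, self-contained sketch of the whole technique. It is not the code of the listings: the names simple_address, simple_address_put, and format_address are hypothetical, the address record has fewer fields, the facet is not a template, and the facets are added to the classic locale rather than to a named one, because locale names are not portable. Unlike Listing 4, the inserter sets the stream's failbit when no address facet is present; that is a choice made for this sketch.

```cpp
#include <cstddef>
#include <iterator>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>
#include <typeinfo>

// Reduced address record (fewer fields than the article's address class).
struct simple_address {
    std::string name, street, town, zipcode, state, cntrycode;
};

// Abstract facet: put() is the fixed public entry point, do_put() the hook.
class simple_address_put : public std::locale::facet {
public:
    typedef std::ostreambuf_iterator<char> iter_type;
    static std::locale::id id;
    explicit simple_address_put(std::size_t refs = 0) : std::locale::facet(refs) {}
    iter_type put(iter_type out, const simple_address& a) const { return do_put(out, a); }
protected:
    virtual iter_type do_put(iter_type out, const simple_address& a) const = 0;
    static iter_type put_string(iter_type out, const std::string& s) {
        for (std::string::size_type i = 0; i < s.size(); ++i) *out++ = s[i];
        return out;
    }
};
std::locale::id simple_address_put::id;

class us_simple_address_put : public simple_address_put {
protected:
    iter_type do_put(iter_type out, const simple_address& a) const {
        return put_string(out, a.name + "\n" + a.street + "\n" +
                               a.town + ", " + a.state + " " + a.zipcode + "\n");
    }
};

class german_simple_address_put : public simple_address_put {
protected:
    iter_type do_put(iter_type out, const simple_address& a) const {
        return put_string(out, a.name + "\n" + a.street + "\n\n" +
                               a.cntrycode + "-" + a.zipcode + " " + a.town + "\n");
    }
};

// Inserter: formats through the facet found in the stream's locale.
std::ostream& operator<<(std::ostream& os, const simple_address& a) {
    try {
        std::use_facet<simple_address_put>(os.getloc()).put(os, a);
    } catch (const std::bad_cast&) {
        os.setstate(std::ios_base::failbit);   // no address facet installed
    }
    return os;
}

// Formats one address with the given facet on top of the classic locale.
std::string format_address(const simple_address& a, simple_address_put* facet) {
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), facet));
    os << a;
    return os.str();
}
```

A call such as format_address(a, new german_simple_address_put) hands ownership of the facet to the locale created inside the function.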
Summary
In this column we demonstrated a technique for adding arbitrary, user-defined
facets to the locale framework in the standard library and their usage
in conjunction with iostreams. The example of choice was an address formatting
facet. The technique itself, however, is more general and can be applied
to arbitrary facet types. Here is a wrap-up of the essentials:
Mandatory. It is required that a user-defined facet type is derived from
class locale::facet and has a facet identification in form of a static data
member named id of type locale::id.
Recommended.
- A facet name should follow the naming conventions of the standard facets.
- Formatting and parsing operations should access source or destination via
iterators. Formatting and parsing facets should be templatized on the iterator
type and use stream buffer iterators as a default.
- Public member functions should delegate to protected virtual member
functions.
References
/1/ Klaus Kreft & Angelika Langer
The Locale Framework
C++ Report, September 1997
/2/ Klaus Kreft & Angelika Langer
Standard C++ Locale II - The Standard Facets
C++ Report, November 1997
/3/ X/Open Consortium
X/Open Guide: Internationalization Guide
/4/ Nadine Kano
Developing International Software for Windows 95 and Windows NT
Microsoft Press
/5/ David Schmitt
International Programming for Windows
Microsoft Press, April 2000
/6/ Erich Gamma, et al.
Design Patterns: Elements of Reusable Object-Oriented Software
Addison-Wesley
/7/ INTERNATIONAL STANDARD
Programming languages - C++
ISO/IEC IS 14882:1998(E)
Sidebar: Stream and Stream Buffer Iterators
Iterators in general
Iterators are pointer-like objects that allow you to traverse a sequence
of elements of the same type and to access these elements without any further
knowledge of the sequence and the way it is organized. Iterators were introduced
into the Standard C++ Library with the adoption of the STL (= the Standard
Template Library) as part of the standard. Therefore, iterators are typically
used for access to the library’s generic containers, such as list<T>,
map<Key,T>, and vector<T>.
Iterator categories
According to their properties, iterators are classified into five iterator
categories. Their characteristics in brief:
- Input iterators allow algorithms to advance the iterator and give
"read only" access to the value.
- Output iterators allow algorithms to advance the iterator and give
"write only" access to the value.
- Forward iterators combine read and write access, but only in one
direction (i.e., forward).
- Bi-directional iterators allow algorithms to traverse the sequence
in both directions, forward and backward.
- Random access iterators allow jumps and "pointer arithmetic".
Each category adds new features to the previous one. The iterator categories
obey the following order, from least to most capable: input and output
iterators, forward iterators, bi-directional iterators, random access
iterators.
Note that an iterator category is an abstraction. It represents a set
of requirements on an iterator.
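In the library, the categories show up as tag types that can be queried through iterator_traits. The following sketch assumes a C++11 compiler for the <type_traits> header; the template category_of and the four functions are hypothetical names used for this illustration.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Category tag of an iterator type, as reported by iterator_traits.
template <class Iter>
struct category_of {
    typedef typename std::iterator_traits<Iter>::iterator_category type;
};

bool vector_is_random_access() {
    return std::is_same<category_of<std::vector<int>::iterator>::type,
                        std::random_access_iterator_tag>::value;
}

bool list_is_bidirectional() {
    return std::is_same<category_of<std::list<int>::iterator>::type,
                        std::bidirectional_iterator_tag>::value;
}

bool streambuf_iterator_is_output() {
    return std::is_same<category_of<std::ostreambuf_iterator<char> >::type,
                        std::output_iterator_tag>::value;
}

// The tag types form an inheritance chain that mirrors the category order.
bool random_access_refines_forward() {
    return std::is_base_of<std::forward_iterator_tag,
                           std::random_access_iterator_tag>::value;
}
```

Note that the inheritance chain of the tag types starts at the input iterator tag; the output iterator tag stands on its own.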
Iterator ranges
Related to iterators is the notion of iterator ranges. An iterator range
is a pair of iterators. The first iterator designates the beginning of a
sequence; the second iterator points to the element past the end of the
sequence. It is important to keep in mind that the end iterator is always
pointing to the past-the-end element of a sequence, which need not be a
valid, accessible element. Therefore, never dereference the end iterator!
Stream and Stream Buffer Iterators
The principle of accessing collections of elements through iterators
was extended to sequences other than the STL containers. Stream iterators
and stream buffer iterators are examples of such extensions. They fall
into the input and output iterator categories.
Creation of the begin iterator. A stream or stream buffer iterator
pointing to the current position of an output or input stream can be created
from the stream itself. This is possible because stream and stream buffer
iterators have a constructor taking a reference to a stream object.
Creation of the end iterator. An input stream or stream buffer
iterator designating the end of an input stream can be created via the
default constructor for input stream or stream buffer iterators. Such a
default-constructed iterator serves as the end-of-stream iterator, i.e. it
designates the position one step past the end of the sequence.
Stream iterators allow you to see a stream as a sequence of elements of
type T that are extracted from or inserted into a stream while traversing
the stream. The stream iterator types in the standard library are
istream_iterator and ostream_iterator.
Here is an example of using stream iterators. Strings are read from the
input stream cin into a vector, and all elements of the vector are then
written to the output stream cout:
vector<string> names;
// no parentheses after end: "end()" would declare a function
istream_iterator<string> begin(cin), end;
for (istream_iterator<string> i = begin; i != end; i++)
  names.push_back(*i);
ostream_iterator<string> out(cout,"\n");
for (vector<string>::iterator i = names.begin(); i != names.end(); i++,out++)
  *out = *i;
Stream buffer iterators see a stream buffer as a sequence of characters
and thus allow you to traverse a stream buffer character by character. All
formatting and parsing facets in the Standard C++ Library access their
source or destination directly via stream buffer iterators. The stream
buffer iterator types in the standard library are istreambuf_iterator
and ostreambuf_iterator.
Formatting operations, like num_put<charT>::put(), take one iterator, the
iterator to the beginning of the output sequence. Parsing operations like
num_get<charT>::get() take an iterator range designating beginning and end
of the subsequence of characters to be parsed.
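The following sketch shows both kinds of stream buffer iterators at work; the functions copy_stream and round_trip are hypothetical names. An iterator range over the input stream buffer is copied to a single output stream buffer iterator. Since stream buffer iterators work on the unformatted character sequence, white space is preserved.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Copies all remaining characters of in to out, character by character.
void copy_stream(std::istream& in, std::ostream& out) {
    std::istreambuf_iterator<char> begin(in);   // current read position
    std::istreambuf_iterator<char> end;         // default-constructed: end of stream
    std::copy(begin, end, std::ostreambuf_iterator<char>(out));
}

// Sends a text through an input and an output string stream.
std::string round_trip(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    copy_stream(in, out);
    return out.str();
}
```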