|
|||||||||||||||||||
HOME | COURSES | TALKS | ARTICLES | GENERICS | LAMBDAS | IOSTREAMS | ABOUT | CONTACT | | | | |||||||||||||||||||
|
Unary Predicates in the STL
|
||||||||||||||||||
Several algorithms in the standard library use a unary predicate to
accomplish their tasks. Examples are the
_if
algorithms like
count_if
,
find_if
,
remove_if
,
or
replace_if
, but also algorithms like
partition
. In this
installment of the column, we take a closer look at unary predicates and
what they may and must not do.
Let us see how the Standard defines a unary predicate. It is simply
called
predicate
in the Standard
[1]
.
The Predicate parameter is used whenever an algorithm expects a function object that when applied to the result of dereferencing the corresponding iterator returns a value testable as true. In other words, if an algorithm takes Predicate pred as its argument and first as its iterator argument, it should work correctly in the construct (pred(*first)){...} . The function object pred shall not apply any non-constant function through the dereferenced iterator.
This function object may be a pointer to function, or an object of a
type with an appropriate function call operator.
Side-Effect Properties
Other Properties Basic Properties 1, 2, and 3A unary predicate must be callable, but need not necessarily be copyable, and must accept one argument and return a Boolean value.
These properties are more or less evident when we consider how the standard
defines use of a unary predicate by an algorithm, namely that the predicate
"should work correctly in the construct
if (pred(*first)){...}
".
Here is a typical implementation of an algorithm that demonstrates the
intended use of a unary predicate:
template <class InputIterator, class Predicate> typename iterator_traits<InputIterator>::difference_type count_if(InputIterator first, InputIterator last, Predicate pred) { typename iterator_traits<InputIterator>::difference_type n = 0; for ( ; first != last; ++first) if ( pred(*first) ) ++n; return n; }In other words, a unary predicate is called like a function. The requirement of "callable" is met by a pointer to a function, but also by an object (or a reference to an object) of a type that has the function call operator defined (a so-called functor ). One argument is supplied when the predicate is called. The argument is the result of dereferencing an iterator, that is, a reference to a sequence element. The return value is used as a conditional expression and must be convertible to type bool . And this fully describes the purpose of a unary predicate: it is called to produce a Boolean result for elements in a sequence. In particular, there is no requirement regarding the copy semantics of a unary predicate. It need not even be copyable at all. As a general rule, an algorithm must not rely on any properties that are not required of the objects it uses. This includes that an algorithm must not copy the predicate, because a user is not required to provide any reasonable copy semantics for his predicates. It's just fine if we declare copy constructor and copy assignment as private members of our predicate type and pass around our predicate objects by reference, if that makes sense to us. It should not break any algorithm. In practice, you will find libraries that assume copyability of predicates, although they shouldn't. One of the surprising effects of such standard library implementations have been discussed in an article in C++ Report [2] . Meanwhile some library implementations have eliminated this restriction and behave as expected; for instance, see Metrowerks CodeWarrior 6.0. Taking into account the different library implementations, we would say that unary predicates with "interesting" or no copy semantics are best avoided.
In practice, most predicates have normal copy semantics. This is because
usually we invoke an algorithm in a way that the predicate is passed by
value. To do this, the predicate must be copyable. Non-copyable predicates
can be useful, but they are unusual, because they must be passed by reference,
which takes extra care and requires interesting-looking template syntax.
We discussed some of the related issues like pass-by-reference of function
objects in a previous article, how we can do it, and why we might want
to do it
[3]
. We'll spare you any further discussion of
the matter right here. Instead let’s move on to the remaining predicate
properties, which address the issue of side effects produced by a unary
predicate.
Side-Effect Properties 4, 5, and 6A unary predicate can have any side effects except modification of its argument and invalidation of iterators or sequences that the algorithm works with.
The Standard prohibits certain side effects, but allows others. Why?
To understand this requirement, consider what happens inside an algorithm
that uses a unary predicate. There are two side-effect-producing entities:
the algorithm itself and the unary predicate. The algorithm iterates over
the sequence of input elements, inspects the elements, supplies them as
an argument to the predicate, modifies and copies them, and might produce
other side effects. The unary predicate receives a reference to an element
in the input sequence and might also inspect and modify the element and
produce other side effects. Naturally, these activities might collide.
For this reason, let us take a look at the side effects produced by a unary
predicate and classify them regarding their potential for conflict into
harmful, potentially harmful, and harmless side effects.
Harmful Side EffectsHarmful side effects invalidate any iterator or sequence that the algorithm operates on. (No function object should ever produce a harmful side effect. This is a general rule that applies to all function objects, not just to predicates. The Standard does not even explicitly prohibit these side-effects, probably because they are considered "common sense.")An example of a harmful predicate would be a predicate that internally keeps a pointer or reference to the sequence that the algorithm operates on and uses the reference for erasing sequence elements. The removal of elements might invalidate iterators that were supplied to the algorithm in order to describe the input or output sequence, and under these circumstance, the algorithm is likely to produce a program crash.
Removal of sequence elements is a fairly obvious source of problems,
but at times the invalidation of the sequence is less obvious. If for instance
the algorithms relies on the sorting order of the sequence and the predicate
intentionally or inadvertently modifies the sorting order when it is invoked,
then this would lead to unpredictable results.
In any case, predicates with harmful side effects must be avoided under
all circumstances. As a rule, never use function objects that invalidate
any iterator or sequence that the algorithm operates on.
Potentially Harmful Side EffectsThis category of side effects is explicitly prohibited by the Standard. All unary predicates that apply non-constant functionality to their arguments fall into this category, because they modify sequence elements. Let us refer to such predicates as mutating unary predicates. (Note the difference between "modifying the sequence" (harmful) and "modifying the sequence elements" (only potentially harmful): "modifying the sequence" means that elements are inserted or removed or shuffled around so that certain iterators or iterator ranges become invalid. "Modifying the sequence elements" means that sequence elements are accessed and their content is changed, but it does not invalidate any iterator.)The potential harm of a mutating unary predicate stems from the fact that the predicate is not the only one that can access elements from the input sequence and change them. The algorithm itself might attempt to modify the same sequence of elements. In such a case, there are two side-effect-producing entities (the algorithm and the predicate), and their activities might collide.
When does such a collision happen? Not all algorithms modify elements
from the input sequence, but some of them indeed do. Algorithms fall into
several categories: non-modifying and mutating algorithms, and the mutating
algorithms can be distinguished as in-place and copy algorithms. The
non-modifying
algorithms (e.g.,
count_if
) only inspect sequence elements; they
don't change anything. The
mutating copy
algorithms (e.g.,
replace_copy_if
)
do not modify elements in the input sequence, but copy them to the output
sequence; they modify elements in the output sequence. The
mutating
in-place
algorithms (e.g.,
replace_if
) modify elements "in place,"
which means that they modify elements in the input sequence; they are the
critical ones. Hence, the potential conflict between predicate and algorithm
occurs for mutating in-place algorithms in conjunction with a mutating
unary predicate.
Application of two modifications to the same sequence element raises
a couple of questions such as: Which modification is performed first and
potentially overwritten by the second? Is the result predictable at all?
In order to avoid any such conflict between the algorithm and the predicate,
the Standard requires that a unary predicate must not modify the elements
in the input sequence that it receives as arguments. Note that the mutating
side effect is disallowed for
all
unary predicates, not just for
those that are supplied to mutating in-place algorithms.
This restriction is often reflected in the predicate's function signature: typically a unary predicate takes its argument either by value or by constant reference, in order to make sure that its argument (the referred-to input sequence element) cannot be modified. Harmless Side EffectsLast, but not least, a predicate may have harmless side effects. All non-mutating access to sequence elements falls into this category. A predicate may apply constant functionality to its arguments; that is, it may inspect the sequence elements, but it must not modify them. In addition, a unary predicate may modify objects other than its argument. It may for instance have data members and change their values. Or it may refer to unrelated element sequences and change them. This is harmless as long as the sequences that are changed by the predicate are not simultaneously used by the algorithm.Why Do We Care?We've seen the different types of side effects that a predicate can produce and that many of the side effects are prohibited by the Standard. Why would we want to use predicates with side effects under these circumstances? Is a unary predicate with side effects a rare or a fairly common case?
Well, it depends. If you look at typical examples of predicates that
are given in C++ text books, then you'll find unary predicates such as
isEven
defined as
bool isEven(int elem) { return elem %2 == 0; }
or expressions
such as
bind2nd(greater<int>(),4)
, which is a unary predicate
created from a predefined binary predicate by means of a so-called binder.
Even if you study unary predicates that are implemented as function object
types (i.e., they are classes with an overloaded function call operator),
they rarely have data members or do anything complicated, and they never
have side effects.
In practice, this is slightly different. For instance, we care about
efficiency. Obviously, it is faster to perform several things in one pass
over the sequence instead of stepping through a lengthy sequence repeatedly.
Consider a couple of examples.
Say, we have a container that represents our customer base. We need to determine the number of all frequent buyers for our internal statistics, and, as we go, we want to build up a mailing list because we intend to send promotional mail to the frequent buyers. However, the mailing list should not exceed a limit of 5,000 addressees. That's the task. How can we achieve this? One conceivable solution is a unary predicate that yields true for a frequent buyer and accumulates information for the mailing list as it goes. When supplied to the count_if algorithm, it would produce the desired count and would build up the mailing list as a side effect. Such a unary predicate strictly conforms to the rules. It accepts a constant reference to a client, inspects the client, and produces a harmless side effect, namely the mailing list.
Let us consider another, similar example. We need to remove all
infrequent
buyers from our customer base and update the records of the remaining frequent
buyers regarding applicable discounts. Again, we try to be efficient and
want to do both in one pass over the customer base. In comparison to the
previous approach, we could try providing a unary predicate to
remove_if
that yields true for infrequent buyers (so that the algorithm removes them)
and adds information about the discounts to the remaining buyers. In contrast
to the earlier example, this is illegal, because adding information to
elements in the input sequence is a prohibited side effect. Remember: The
predicate must not modify its argument. But that's exactly what we intend
to do: we want to update the remaining elements in the sequence. So, what
do we do?
There is not much that we can do. A thorough study of the algorithms
section of the Standard reveals that the
for_each
algorithm is the
only algorithm that accepts a function object
and
allows the function
object to modify its argument. (We discussed
for_each
in a previous
article
[4]
.) For this reason, we are very restricted
in our choice of algorithms for a given task if that task includes modification
of elements in the input sequence;
for_each
is the basically the
only choice.
The consequence is that we must split the task into the non-modifying activities (implemented as a unary predicate for remove_if ) and the mutating activities (implemented as a functor for for_each ). This way an additional pass over the entire sequence cannot be avoided, including the inevitable loss in efficiency. The alternatives are:
If such a modification is desired (e.g., for efficiency reasons), then
we must split the task into modifying and non-modifying aspects and must
accept multiple passes over the input sequence if we want to use the standard
algorithms.
Property 7A unary predicate must be order-insensitive; that is, its effects must be independent of the order of invocation.
In conjunction with side effects of unary predicates, two other aspects
are relevant: order and number of invocation of the predicate. If a side
effect is produced each time the predicate is applied to an element of
the input sequence, then we would want to know how often and in which order
the side effects are produced. For instance, in our example, we increment
a count in order to determine the maximum size for the generated mailing
list. Naturally, it makes a difference if the predicate is applied exactly
once per element or maybe repeatedly. Depending on the nature of the side
effect, the order and number of invocations plays a role.
The number of invocations is exactly specified: algorithms like count_if or remove_if apply the unary predicate exactly once per element in the input sequence. Order of invocation is different: none of the algorithms that take a unary predicate specifies the order in which it supplies sequence elements to the predicate. As a consequence, unary predicates must be independent of the order of invocation. If we use a predicate that depends on the order of invocation, the result is unpredictable.
Here is an example: a (order-sensitive) predicate that yields true for
every n
th
element in the sequence:
class Nth { public: Nth(int n) : theN(n), theCnt(0) {} bool operator()(int) { return (++theCnt)%theN; } private: const int theN; int theCnt; };If we supply a predicate such as Nth(3) to an algorithm like remove_copy_if , then one might expect that it would move every third element from the input sequence to the output sequence. But this is not guaranteed, because the sequence elements are not necessarily supplied in any definite order. All that we can reasonably rely on is that a third of all elements will be moved from the input sequence to the output sequence.
Why doesn't the Standard give a guarantee regarding the order of invocation
of the unary predicates? This is because certain algorithms can be optimized
for certain types of iterators. For instance, an algorithm might step through
the sequence from beginning to end if the iterator is an input iterator,
but could do random jumps for a random access iterator. Since the Standard
does not want to restrict the potential for such optimizations, it does
not give any guarantees for the order of invocation of the unary predicates.
The consequence for users of the STL is that all unary predicates must
not depend on the order in which sequence elements are supplied to it.
If we want to use order-sensitive predicates, then we must implement our
user-defined algorithms that give us the guarantee we need regarding the
order of invocation of the unary predicate.
Related to the order and number of invocations of the unary predicate,
there is the following last observation.
Property 8A unary predicate need not yield the same result for different invocations with the same argument.
This property might sound a little far-fetched. We include this into
the list of properties because we've observed that occasionally it is assumed
that there is an implied requirement that predicates exhibit "stable" behavior,
in the sense that they yield the same result each time they are invoked
with the same or an "equal" argument. This assumption is not justified;
the Standard does not define any such requirement.
Why then is it occasionally assumed that "stable" behavior is required
of a unary predicate? Because it would give the implementations a considerable
amount of latitude. With "stable" behavior, it would not matter how often
the predicate is called for the same sequence element, since the result
would always be the same. It would also be irrelevant whether a reference
to a sequence element is supplied to the algorithm or a temporary copy
of the element (namely an "equal" element).
However, for unary predicates in the STL, there is no requirement of "stable" behavior. It perfectly makes sense to define an "instable" predicate. An example would be a predicate that yields true for all elements with a certain property until a limit is reached. It could be used to copy a maximum number of elements with a given property from an input sequence to an output sequence using remove_copy_if . SummaryThe following algorithms in the standard library use a unary predicate: replace_if , remove_if , partition , stable_partition , replace_copy_if , remove_copy_if , count_if , and find_if .
If a unary predicate has the properties listed below, then it can be
provided to any of the algorithms listed above, and the result will be
portable and predictable.
References[1] International Standard. Programming languages — C++ ISO/IEC IS 14882:1998(E).[2] Nicolai M. Josuttis. "Predicates vs. Function Objects," C++ Report , June 2000. [3] Klaus Kreft and Angelika Langer. "Effective Standard C++ Library: Explicit Function Template Argument Specification and the STL," C/C++ Users Journal , December 2000, http://www.cuj.com/experts/1812/langer.html .
[4] Klaus Kreft and Angelika Langer. "Effective Standard
C++ Library:
for_each
vs.
transform
,"
C/C++ Users Journal
,
February 2001.
http://www.cuj.com/experts/1902/langer.html
|
|||||||||||||||||||
© Copyright 1995-2012 by Angelika Langer. All Rights Reserved. URL: < http://www.AngelikaLanger.com/Articles/Cuj/04.UnaryPredicates/UnaryPredicates.html> last update: 22 Aug 2012 |