|
|||||||||||||||||||
HOME | COURSES | TALKS | ARTICLES | GENERICS | LAMBDAS | IOSTREAMS | ABOUT | CONTACT | | | | |||||||||||||||||||
|
Sequence Points and Expression Evaluation in C++
|
||||||||||||||||||
Does anybody know what a sequence point is? Every C++ programmer
should know it. However, when we pose this question even experienced
programmers with many years of hands-on experience concede they have no
idea. In this article we want to shed some light on the secret and
explain what sequence points are and why it is important to be aware of
them.
What is a sequence point?A C++ program consists of basically two types of statements:
Side Effects of Expression EvaluationBefore we delve into sequence points, let's clarify what the side effects of expression evaluation are. The side effects that a C++ program can produce are:
Order of Evaluation of ExpressionsBack to sequence points: The compiler will generate executable instructions that will eventually evaluates all expressions in a program. The order in which the expressions will be evaluated is roughly the order in which they appear in the source code, that is, top to bottom and left to right. We say "roughly" because the compiler has some latitude to rearrange the order of expression evaluation if it sees potential for optimizations. However, the latitude granted to the compiler is limited - by sequence points. A sequence point is a point in the sequence of all expressions where the compiler is required to have finished the evaluation of all preceding expressions and has not yet started evaluation of any subsequent expressions.Which means that a sequence point is a point in the program where we as programmers know which expressions (or subexpressions) have already been evaluated and which expressions (or subexpressions) have not yet been evaluated. In other words, sequence points are points in the program where we know where we stand in the process of execution of your program. Between sequence points we know nothing about the order of evaluation of expressions and subexpressions. To most programmers' surprise there are only very few sequence points defined in C++, which means that most of the time we have no idea what has happened yet and which subexpressions have not yet been evaluated. ExamplesLet's take a look at some examples. When we see an expression like i=2; then we expect that once control flow reaches the semicolon variable i will have the value 2. This expectation is correct because there is indeed a sequence point at the end of every "full expression". (For the time being do not worry about the term "full expression"; usually the end of an expression is the semicolon that marks the end of a statement.) Which means that at the end of each statement, the statement has been executed as expected.But wait! How about this example: x[i]=i++ + 1;Let's assume variable i has the value 1 before we enter the statement. What will be the result of evaluation of this expression? The correct answer is: we don't know. However, programmers ever too often believe that they know what this program fragment does. Typical answers include: "x[1] will have the value 2", or "x[2] will have the value 2", or even "x[1] will have the value 3". The third option is definitely wrong. This will not happen because i++ is a postfix increment and returns i's initial value 1; hence the value of the right hand side of the assignment is 2, and definitely not 3. (For further information on the difference between prefix and postfix increment, see box 2 .) So far so good, but we do not know which entry of the array x will be modified. Will the index be 1 or 2 when the right hand side value will be assigned to x[i]?
There is no definite answer to this question. It fully depends on the
order in which the compiler evaluates the subexpressions. If the
compiler starts on the right hand side of the assignment and evaluates
i++ + 1 before it figures out which position in the array x must be assigned
to then x[2] will be modified because i will have already been incremented
in the course of evaluating the subexpression i++. Conversely, if
the compiler starts on the left hand side and figures out that it must
assign to position i in array x, which at that time will still be position
1, before it evaluates the right hand side then we'll end up with a modification
of x[1]. Both outcomes are equally likely and equally correct.
How can it be that the result of an innocent statement such as
x[i]=i++ + 1; is undefined? Well, it is the immediate consequence
of the fact that the order of evaluation of expressions and subexpression
between sequences points is undefined. Remember, the only sequence point
in this code fragment is at the end of the full expression, that is, at
the semicolon. The common misconception is that people believe the
assignment operator would introduce a sequence point, in which case the
compiler would have to evaluate the left hand side before the right hand
side. But the assignment operator does not mark a sequence point; practically
none of the operators in C++ is a sequence point. (The only exception
is the rarely used comma operator, as we will see later.)
Problematic vs. Safe ExpressionsWhat is it that renders the assignemnt x[i]=i++ + 1; a problematic one whereas the assignement i=2; is harmless, in the sense that its result is well-defined and predictable? The crux is that in the expression x[i]=i++ + 1; there are two accesses to variable i and one of the accesses, namely the i++, is a modifying access. Since the order of evaluation between sequence points is not defined we do not know whether i will be modified before it will be read or whether it will be read before the modification. Hence the root of the problem is multiple access to a variable between sequence points if one the accesses is a modification.Here is another example. What will happen here if i and j have values 1 and 2 before the statement is executed? f(i++, j++, i+j);Which value will be passed to function f as the third argument? Again, we don't know. It could be any of the following: 3, 4, or 5. It depends on the order in which the function arguments are evaluated. The common misconception here is that the arguments would be evaluated left to right. Or maybe right to left? In fact, there is no order whatsoever mandated by the language definition. Latitude for OptimizationsWhy actually does the language specification leave it open in which order compilers evaluate expressions between sequence points? Wouldn't matters be much clearer if a definite order of evaluation were defined?
The reason for the undefined order of evaluation between sequence points
is, like ever so often in C++, efficiency. Compilers are given the
liberty to evaluate expressions in the order that is most efficient for
a given platform and processor. Which means that the sequence
problems explained above are usually portability problems. The undefined
order of evaluation might never ever bother you as long as you stick to
the same compiler on the same platform. However, it can hit you as
soon as you install the latest version of your compiler and recompile your
source code. Or it hits you when you port your program to another platform.
In any case, this type of bug is hard to track down because it hits you
when you least expect it, because you can swear that you did not change
a single line of code ...
The Complete List of Sequence Points in C++So far we have only talked of the sequence point at the end of the full expression. Here is the complete list of all sequence points in C++:
End of a full expression . The end of the full expression is the semicolon in regular statements. In a conditional expression such as if(i==0 && j==f(x,y,z)) the end of the full expression not a semicolon but the closing bracket Function calls. The sequence points before and after a function call mean that execution of the invoked function and the calling context is not intermingled: all statements before the function call are executed, the function arguments are evaluated (in undefined order), and then the function body is executed. Similarly on return from the function: all statements of the function body are executed; the return expression is evaluated, and then control flow continues in the calling context with the statement after the function call.
Operators
. The sequence points in the logical expressions
such as && and || and ternary operator ?: and the comma operator
mean that the left hand side operand is evaluated before the right hand
side operand. These few operands are the only operands in C++ that
introduce sequence points.
Constructor initialization list. The last sequence point between initializations of base classes and members in the constructor initialization list exists because the order of these initializations is well-defined and not left to the compiler's discretion. Note, that the order is not the order in which bases and members appear in the constructor initialization list, but is the order in which they are listed in the class definition. Example: class Array {The order of initialization is data(new int(size)) followed by size(high-low+1) followed by lBound(low) followed by hBound(high), which of course is problematic in this case because size will be used before is will be initialized to a sensible value. This is another known pitfall in C++, which you find explained in / MEY /. The sequence point between initialization of each base and member in the constructor initialization list just makes sure the initializations are not intermingled. Instead they happen one after other in the order the bases and members are listed in the class definition. Hidden DependenciesNow that you know all sequence points you are ready for looking into a couple of additional examples. So far, we have only seen examples where the multiple access to at least variable was fairly obvious. How about this one?x = f() + g() + h(); Sequence Points and ExceptionsHere is another interesting case:f( new X(i++), new Y(i) );Lots of things are going on here: we produce the side effects of modification of variable i, memory allocation for objects of type X and Y, and construction of these objects. We do not know in which order these side effects will be produced. Well, we do know that i++ will be evaluated before the constructor of X will be invoked and we can safely assume that the runtime system will allocate memory before it will try to initialize that memory be means of the constructor calls. But we do not know whether i will be incremented before it will be passed to the constructor of Y. And we do not know which object will be allocated first and/or which constructor will be called first. It is even allowed that the compiler may allocate the memory for both objects before it calls any of the constructors.
This lack of certainty is particularly problematic in case of exceptions
being thrown by either operator new or any of the constructors. In
case of an exception we would not know what has already happened and for
this reason we would have no idea how to react and roll-back appropriately.
Avoid Complex ExpressionsThe problem in the example above was again: multiple access to the same variable including a modification. We simply tried to squeeze too many things into one expression. As a rule of thumb: avoid complex expressions. All of the problems discussed in this article can be prevented by breaking complex expressions down into smaller expressions. For instance, instead of sayingx[i]=i++ + 1;we could say x[i]=i + 1;or whatever else we want to see happening. By taking the expression apart we introduce additional sequence points and as a result we have a defined order of expression evaluation. The same applies to the last examples: instead of saying f( new X(i++), new Y(i) );we could say X* xptr = new X(i++);If we now catch a Yexception (from whose type we can tell that it was thrown by the Y constructor) then we would be able to tell that the X object must have already been successfully created and we have a much better chance to handle the exception more appropriately. Or we could wrap each of these statements into a try-block of its own if we wanted to catch and handle the bad_alloc exceptions from either of the calls to operator new. RecommendationAvoid complex expressions and especially those that include read and write access to the same object within one expression. With simpler expression we have significantly more sequence points and thus more control over the order of evaluation of the expressions that make up our program.ConclusionThe order of evaluation of expressions and subexpressions is undefined between sequence points. C++ has amazingly few sequence points defined. The pitfall lies in the common misconception that there were many more sequence points than there really are. Most operators are not sequence points; the only exceptions are &&, ||, ?:, and the comma operator. Separators are never sequence points. The only sequence point you can really rely on is the end of a full expression, which is normally a semicolon.
Sequence point problems are typically portability problem that are difficult
to track down in practice. Hence the recommendation: avoid them before
they hit you. Keep your source code simple. Avoid complex expressions
and especially those that contain multiple access to the same variable
if one of the accesses is a modification. In case of doubt, break
the expression down into several statements whereby you will introduce
additional sequence points and will have a defined order of evaluation
of your expressions.
References
|
|||||||||||||||||||
© Copyright 1995-2012 by Angelika Langer. All Rights Reserved. URL: < http://www.AngelikaLanger.com/Articles/VSJ/SequencePoints/SequencePoints.html> last update: 22 Aug 2012 |