AngelikaLanger.com - Stream Iterators - Angelika Langer Training/Consulting

Angelika Langer - Training & Consulting

	HOME


	OVERVIEW BY TOPIC JAVA C++ BY COLUMN EFFECTIVE JAVA EFFECTIVE STDLIB BY MAGAZINE JAVA MAGAZIN JAVA SPEKTRUM JAVA WORLD JAVA SOLUTIONS JAVA PRO C++ REPORT CUJ OTHER
	GENERICS
	LAMBDAS
	IOSTREAMS
	ABOUT
	CONTACT

Stream Iterators

Stream Iterators
Stream Iterators

C++ Report, May 1999
Klaus Kreft & Angelika Langer

Let us continue our exploration of iterator types in the standard library: In our last article we discussed insert iterators; this time we examine stream iterators. You might remember the old, non-standard IOStreams library, which was initially developed by Jerry Schwarz at AT&T many years ago. This classic IOStreams library did not have stream iterators. The idea for those iterators came up when the standard committee adopted HP’s STL as part of the standard library. The STL is based on the idea of generic programming: iterators are provided by containers, and algorithms use these iterators to access the container elements. Here we had two entirely unrelated domains: the classic IOStreams abstraction and the STL with its concept of generic programming. Stream iterators build the bridge between both domains. They allow to see a stream as a sequence of elements, much like an STL containers, and allow to apply existing generic algorithms directly to streams. From this design effort two new iterators emerged:

istream iterators , which can be used to read from a stream, and
ostream iterators , which can be used to write to a stream.

Using Stream Iterators

Before we delve into the details of stream iterators, let’s first have a look at two examples. They show how powerful the combination of generic algorithms and streams via stream iterators can be in practice. Assume we have a text file and want to count how often the word ‘the’ is contained in the text. We can achieve it in the following way:

ifstream str("my_text_file");
istream_iterator<string> beginÍter(str);
istream_iterator<string> endIter;
cout << "number of _the_: " << count(beginIter, endIter, "the"); This solution uses, on one hand, the IOStreams functionality: A stream is constructed from a file name and formatted input is read from the stream via the istream_iterator . On the other hand, it uses STL components: The generic count -algorithm determines how often the word ‘the’ can be found inside the iterator range [ beginIter, endIter ).

The second example, that we want to look at, deals with the problem that standard library containers do not support any stream i/o directly. So, the question is: What is the best way to print a container? Here is a solution that prints all container elements separated by a blank. The approach is based on the ostream_iterator and the copy -algorithm:

List<int> myList;
// fill in some elements
copy (myList.begin(), myList.end(), ostream_iterator<int>(cout, " "));

Both examples show that stream iterators allow algorithms to see a stream as a container of homogeneous elements. Algorithms can apply their functionality to the stream in the same way as they would do to any other standard library container. In this sense, there is a certain analogy between stream iterators and container iterators. Naturally, there are differences, too. We will explore them in the following while we have a detailed look at the way stream iterators work.

Ostream Iterators

We start our exploration with the ostream_iterator because it has a lot in common with the insert_iterator that we discussed in the last column. This becomes clearer when we revisit the example above and see how the code changes when we inline the copy -algorithm:

List<int> myList;
// fill in some elements
List<int>::const_iterator sourceIter = myList.begin();
List<int>::const_iterator endIter = myList.end()
ostream_iterator<int> targetIter(cout, " ");
while (sourceIter != endIter)
*targetIter++ = *sourceIter++; We see that the ostream_iterator is used in a similar way as the insert_iterator :

It is used as an output iterator, i.e. values are always written via the iterator, but never read.
The applied operations are: operator*() and operator++() or operator++(int) .

Due to the similar usage, we expect that an output_iterator’s implementation has a lot in common with that of an insert_iterator . And so it is: template <class T, class charT=char, class traits=char_traits<charT> >
class ostream_iterator :
public iterator<output_iterator_tag,void,void,void,void>
{
public:
    typedef charT char_type;
    typedef traits traits_type;
    typedef basic_ostream<charT,traits> ostream_type;
    ostream_iterator(ostream_type& s) : ost(&s),delim(0) {}
    ostream_iterator(ostream_type& s, const charT* d) : ost(&s),delim(d){};
    ostream_iterator<T,charT,traits>& operator= (const T& t)
    {
        *ost << t;
        if (delim != 0) *ost << delim;
        return this;
    }
    ostream_iterator<T,charT,traits>& operator*() { return *this; }
    ostream_iterator<T,charT,traits>& operator++() { return *this; }
    ostream_iterator<T,charT,traits>& operator++(int); {return *this; }
private:
    const charT* delim;
    basic_ostream<charT,traits>* ost
}; Listing 1: Implementation of the ostream_iterator type

Listing 1 shows a typical ostream_iterator implementation. We do not want to go through each of ostream_iterator ‘s member functions. They are implemented in analogy to the insert_iterator’ s member functions, which we explained in detail in our last column (see [ 1 ]). The reasons for this similarity between the ostream_iterator and insert_iterator implementation are subject of a future article.

Interaction with IOStreams

One point in the ostream_iterator ’s implemention is worth being examined in more detail. operator=(const T& t) uses T ’s stream inserter to write t to the stream. As a consequence, the ostream_iterator template can only be instantiated for types that have an associated stream inserter. Otherwise, a compile time error will occur.

In our example above we used the copy -algorithm and the ostream_iterator to write a list<int> to standard output. This works out nicely. We instantiate the ostream_iterator template as ostream_iterator<int> . Hence, a stream inserter for int is needed, and such an inserter is supplied by the standard library.

Another observation is that each time the ostream_iterator ‘s assignment operator inserts an object into the output stream, the current stream position is advanced. This demonstrates that the ostream_iterator is not an artificial abstraction contrived out of the need to combine generic programming and output streams. Instead, it has genuine iterator-like semantics: It keeps track of the stream position at which the next object should be inserted, and it allows write access to this position via operator=.

Error Indication

The ostream_iterator has no specific feature to indicate that the insertion of an object into the stream failed or caused an error. For error detection only the error indication mechanisms of the underlying stream are available. (For details of new stream features such as IOStreams exceptions see [ 2 ].)

In our example, we can set badbit , eofbit , and failbit in cout’s exception mask before we apply the copy -algorithm to the ostream_iterator :

cout.exceptions(ios_base::badbit | ios_base::eofbit | ios_base::failbit);
copy (myList.begin(), myList.end(), ostream_iterator<int>(cout, " ")); The stream will then throw an ios_base::failure exception when an error occurs. Alternatively, we can check cout ’s stream state after the copy -algorithm has been run, to see if an error had occurred: copy (myList.begin(), myList.end(), ostream_iterator<int>(cout, " "));
if (!cout.good())
{
// do something about the error !
} Istream Iterator

While the ostream_iterator is an output iterator the istream_iterator naturally is an input iterator. It allows read access to the value that it refers to and it can be advanced in single steps.

Its implementation (see Listing 2) is straight forward. A private data member ( value ) buffers the next value from the input stream. Read only access to this data member is given by operator*() and operator->() : they return a const reference respectively a const pointer to this data member.

template <class T, class charT = char, class Traits = char_traits<charT>, class Distance = ptrdiff_t>
class istream_iterator : public iterator<input_iterator_tag, T, Distance, const T*, const T&>
{
public:
    typedef charT char_type
    typedef Traits traits_type;
    typedef basic_istream<charT,Traits> istream_type;
    istream_iterator() : istp(0) {}
    istream_iterator(istream_type& s) : istp(&s) { readElem(); }
    const T& operator*() const { return value; }
    const T* operator->() const { return &value; }
    istream_iterator<T,charT,Traits,Distance>& operator++()
    {
        readElem();
        return *this;
    }
    istream_iterator<T,charT,Traits,Distance> operator++(int)
    {
        istream_iterator<T,charT,Traits,Distance> tmp = *this;
        readElem();
        return tmp;
    }
private:
    void readElem()
    {
        if (istp != 0)
            if (!(*istp >> value))
                ist = 0;
    }
    basic_istream<charT,traits>* istp;
    T value;
}; Listing 2: Implementation of the istream_iterator type

Interaction with IOStreams

When we examined the ostream_iterator ’s implemention, we saw that it is based on T ’s inserter. Similarly, the istream_iterator ‘s implementation is based on T ’ s extractor. The extractor is used in the private member function readElem() . It reads the next object of type T from the stream and buffers it in the private data member value .

The standard specifies that the next object is extracted from the stream whenever operator++() or operator++(int) is applied to the istream_iterator . It is, however, implementation specific when the first object is read by the istream_iterator . According to the standard this can be done either when the istream_iterator is constructed or, as a kind of lazy evaluation, when operator*() or operator->() are used for the first time to access the value buffered in the iterator. The implementation given in Listing 2 extracts the first object in its constructor. Please note that the stream position of the input stream (that is the position of the read pointer) is moved forward in the stream with every extraction of an object from the stream. This means that an istream_iterator has true iterator-like semantics: It iterates over the input stream giving read access to the objects contained in the stream.

Error Indication and End-of-Stream Iterators

What happens if readElem() tries to extract the next object of type T from the stream and none is available? By IOStreams convention, the failbit is set in the stream state to indicate that an extraction has failed. After each extraction readElem() checks the stream state. If the stream state is not good() anymore, the private member istp is set to 0 , which indicates that the iterator is detached from its stream. As a consequence, the stream iterator cannot be used to extract any further objects. An istream_iterator that is in this state is called a end-of-stream iterator . While this name might be a bit misleading, it is easy to understand where it comes from. After the extraction of an object the istream_iterator checks whether the stream state is still good() . That is, it checks if either failbit , badbit , eofbit , or a combination of them has been set. If no error ever occurs, neither failbit nor badbit get set. The only other event that can cause an istream_iterator to turn into an end-of-stream iterator is hitting the end of the stream because then eofbit is set. To sum it up: When an input iterator turns into an end-of-stream iterator, then this either signals an error situation or indicates that the end of the input stream was reached. As with the output stream iterator, we have to fall back to the underlying stream’s error indication mechanisms to distinguish between these two situations.

Input Iterator Ranges

Generic algorithms, which read input from a containers, often receive two input iterators as parameters. The first one specifies the beginning and the second one the end of a sequence of iterator positions, to which the algorithm shall be applied. This sequence of iterator positions is called an iterator range . By convention, the iterator that specifies the beginning of the range is part of the range, while the iterator that specifies the end is not, that is, the algorithm is not applied to the end iterator. Say, we want to apply a generic algorithm to an input stream. How can we specify an according iterator range in terms of istream_iterator s?

When we construct an istream_iterator from a valid input stream, the iterator refers to the current stream position of that input stream. Hence, we can use this istream_iterator to specify the beginning of the iterator range. Still, which iterator shall we use to specify the end of the range?

Iterator ranges are typically used in while -loops like this:

while (beginIter != endIter)
{
... // do something
beginIter++; // increment iterator
} As an end iterator, we need an iterator that is reachable from the beginning of the range, that is, successive increments of the first iterator must eventually yield the second iterator.

A word on the while -condition: We silently assumed the existence of a (in-)equality operator for istream_iterator ’s. How is that defined? The standard requires the following semantics:

Two end-of-stream iterators of the same type are always equal.
An end-of-stream iterator is not equal to a non-end-of-stream iterator.
Two non-end-of-stream iterators are equal when they are constructed from the same stream.

From the third requirement follows that two istream_iterator s which are equal do not necessarily refer to the same stream position. One might intuitively expect such a property, because it is true for pointers and container iterators. Note, that it is not true for input stream iterators.

Back to our problem: how do we express an iterator range of istream_iterator s? For the begin iterator we can simply use the istream_iterator constructed from the input stream. It represents the current stream position. If we successively increment this iterator, which means that we successively extract items from the stream, we will eventually hit the end of the stream. For the end iterator we therefore need an input stream iterator in end-of-stream state. How do we get one? The istream_iterator ’s default constructor creates it.

Note, that the only input stream iterator ranges are from the current stream position to the end of the stream. It is not possible to specify a range from one stream position to another stream position, because any two non-end-of-stream iterators referring to the same stream always compare equal.

These explanations hopefully clarify the previous example, where we counted how often the word ‘the’ occurred in a text file:

ifstream str("my_text_file");
istream_iterator<string> beginÍter(str); // line 1
istream_iterator<string> endIter; // line 2
cout << "number of _the_: " << count(beginIter, endIter, "the"); // line 3 In line 1, we construct the begin iterator from the stream that we want to use. In line 2, the end iterator is created by the istream_iterator ’s default constructor. In line 3, we apply the count -algorithm to the iterator range [ beginIter,endIter ).

Reset after an error has occurred

By comparing an istream_iterator to an end-of-stream iterator, it is possible to detect if a stream error has occurred. Yet, istream_iterator s have no feature to reset an iterator that has gone into an end-of-stream state. If the error is not fatal we can do the following: clear the stream’s error state, construct a new istream_iterator , which then represents the current stream position, and restart the algorithm with this iterator as the begin iterator:

str.clear();
istream_iterator<string> newBeginIter(str);
cout << "... and we have found ";
cout << count(newBeginIter, endIter, "the");
cout << " more _the_"; More Differences Between Stream And Container Iterators

The detailed descriptions of the ostream_iterator and istream_iterator revealed quite a number of the differences between stream and container iterators.

Stream iterators impose specific requirements on the type of elements they contain: The element type must have an inserter and an extractor defined. Container iterators, on the other hand, make relatively weak, easy to meet, requirements like: the elements must be destructible, assignable, copy-constructible, and so on.
The end iterator of an input stream iterator range is always an end-of-stream input iterator. With container iterators you can specify any range of iterator positions within a container. Also, an end-of-stream iterator is independent of any stream objects. Not so with container iterators: The end iterator of an range of container iterators must refer to a position in the same container as the begin iterator, or end iterator must be obtained by that container’s member function end() .
Input stream iterators of the same type constructed from the same stream compare equal, independently of the stream position the refer to. Container iterators compare equal when they represent the same position in the container.

One-pass vs. multi-pass iterators

Yet another difference is that stream iterators are one-pass iterators , that is, you can access an element referred to by a stream iterator only once. For instance, we cannot re-read elements through a memorized iterator. The following would fail: memorizing the begin position of the stream, then extracting elements from the stream, and later trying to re-read the first element through the memorized begin iterator. The reason is that once extracted the element is consumed and cannot be re-read. The single-pass property can be best understood in terms of i/o from/to an terminal. Once we’ve read from the terminal stream, the input is consumed. Once we’ve written to the terminal stream, we cannot re-position and overwrite the output. In contrast, container iterators are multi-pass iterators. We can repeatedly access any element referred to by any iterator (except the end iterator, of course). The one-pass or multi-pass property is expressed in the iterator categories. Iterators in the input and output iterator category are one-pass iterators. Iterators in the forward, bidirectional, or random-access iterator categories must be multi-pass iterators.

Is the single-pass property a restriction? Can all algorithms live with this restriction? Let’s see what algorithms typically need of an iterator.

Algorithms that write output via an output iterator usually do not access the same output position twice. That would mean that they overwrite a position they had previously filled. No algorithm we know of does anything like that. Therefore, all algorithms that write output via an iterator require an iterator type of the output iterator category and happily live with its one-pass property. For this reason, an ostream_iterator can be used in all algorithms that require an output iterator.

Algorithms that read input via an iterator usually take an input iterator range. Not all such algorithms can live with the one-pass restriction of input iterators like the istream_iterator . The find_end() algorithm, for instance, does a look ahead and for that matter needs the multi-pass property. In order to explain this, let’s take a closer look at the find_end() algorithm. It finds a subsequence of equal values in a sequence and returns an iterator to the begin of the subsequence. Here is an example of how it would be used:

string s1 = "abcdefghijk";
string s2 = "def";
string::iterator i = find_end(s1.begin(),s1.end(),s2.begin(),s2.end());
cout << i << endl; The result would be an iterator to the letter ‘d’ in s1 . The algorithm maintains two iterators: the first refers to the first input sequence, the other one to the potential subsequence. In the beginning the first iterator points to the ‘a’ in s1 , the second iterator points to the ‘d’ in s2 . Then the algorithm looks for a match, that is, whether the ‘a’ is the beginning of the subsequence "def". It performs this search by successively advancing both iterators and comparing the referred to characters. As it can’t find a match here, it resets both iterators: the first one to the ‘b’ in s1 and the second iterator back to the beginning of "def". Then it starts looking for the match again. And so on and so forth. The crux is that find_end() algorithm needs to re-read elements from the input sequences. This cannot be done with iterators from the input iterator category because they only support one-pass access. And indeed, the interface description of the find_end() function asks for an iterator from the forward_iterator category: template<class ForwardIterator1, class ForwardIterator2> inline
ForwardIterator1 find_end(ForwardIterator1 first1, ForwardIterator1 last1
,ForwardIterator2 first2, ForwardIterator2 last2); Note, that the find_end() algorithm does not need the entire functionality that is required of a forward iterator. Forward iterators allow multi-pass access for reading and writing. Write access isn’t needed in find_end() . Hence, all that this algorithm really needs is a multi-pass input iterator. However, there is no such iterator category.

Let us hasten to add that there, of course, are algorithms for whom one-pass input iterators perfectly suffice. The often quoted find() algorithm is an example, and so is the count() algorithm that we used in our examples. These algorithms read elements successively until they find what they’re looking for or until they’ve counted all relevant elements. No element needs to be accessed twice, no look ahead, no repositioning is required. Their interface description only asks for iterators from the input iterator category:

template<class InputIterator, class T>
InputIterator find (InputIterator first, InputIterator last, const T& val);
template<class InputIterator, class T>
size_t count(InputIterator first, InputIterator last, const T& val); In sum, where input iterators are required, istream_iterator s can be provided. When an algorithm asks for a forward iterator, an istream_iterator will not suffice.

Summary

Stream iterators connect the world of generic programming with the input and output stream abstractions. They show that it is possible to include a formerly completely unrelated domain like IOStreams into the concepts of generic programming. The general idea is: see a stream as a kind of container. We have also recognized that the analogy between streams and containers sometimes breaks, and that a number of smart design details (such as the end-of-stream iterator) were needed to make the analogy work. However, the architectural bridge built by the stream iterators might serve as an inspiration for your own work. It takes no more than a pair of iterators to connect an old domain with the generic abstractions from the standard library.

References

New Features in Standard IOStreams
Klaus Kreft & Angelika Langer
C++ Report, June 1997

If you are interested to hear more about this and related topics you might want to check out the following seminar:

Seminar


	Effective STL Programming - The Standard Template Library in Depth 4-day seminar (open enrollment and on-site)
	IOStreams and Locales - Standard C++ IOStreams and Locales in Depth 5-day seminar (open enrollment and on-site)