|
|||||||||||||||||||
HOME | COURSES | TALKS | ARTICLES | GENERICS | LAMBDAS | IOSTREAMS | ABOUT | CONTACT | | | | |||||||||||||||||||
|
Stream Iterators
|
||||||||||||||||||
Stream Iterators
C++ Report, May 1999
Let us continue our exploration of iterator types in the standard library: In our last article we discussed insert iterators; this time we examine stream iterators. You might remember the old, non-standard IOStreams library, which was initially developed by Jerry Schwarz at AT&T many years ago. This classic IOStreams library did not have stream iterators. The idea for those iterators came up when the standard committee adopted HP’s STL as part of the standard library. The STL is based on the idea of generic programming: iterators are provided by containers, and algorithms use these iterators to access the container elements. Here we had two entirely unrelated domains: the classic IOStreams abstraction and the STL with its concept of generic programming. Stream iterators build the bridge between both domains. They allow to see a stream as a sequence of elements, much like an STL containers, and allow to apply existing generic algorithms directly to streams. From this design effort two new iterators emerged:
Before we delve into the details of stream iterators, let’s first have a look at two examples. They show how powerful the combination of generic algorithms and streams via stream iterators can be in practice. Assume we have a text file and want to count how often the word ‘the’ is contained in the text. We can achieve it in the following way: istream_iterator<string> beginÍter(str); istream_iterator<string> endIter; cout << "number of _the_: " << count(beginIter, endIter, "the"); The second example, that we want to look at, deals with the problem that standard library containers do not support any stream i/o directly. So, the question is: What is the best way to print a container? Here is a solution that prints all container elements separated by a blank. The approach is based on the ostream_iterator and the copy -algorithm: List<int> myList;Both examples show that stream iterators allow algorithms to see a stream as a container of homogeneous elements. Algorithms can apply their functionality to the stream in the same way as they would do to any other standard library container. In this sense, there is a certain analogy between stream iterators and container iterators. Naturally, there are differences, too. We will explore them in the following while we have a detailed look at the way stream iterators work. We start our exploration with the ostream_iterator because it has a lot in common with the insert_iterator that we discussed in the last column. This becomes clearer when we revisit the example above and see how the code changes when we inline the copy -algorithm: // fill in some elements List<int>::const_iterator sourceIter = myList.begin(); List<int>::const_iterator endIter = myList.end() ostream_iterator<int> targetIter(cout, " "); while (sourceIter != endIter) *targetIter++ = *sourceIter++;
class ostream_iterator : public iterator<output_iterator_tag,void,void,void,void> { public: typedef charT char_type; typedef traits traits_type; typedef basic_ostream<charT,traits> ostream_type; ostream_iterator(ostream_type& s) : ost(&s),delim(0) {} ostream_iterator(ostream_type& s, const charT* d) : ost(&s),delim(d){}; ostream_iterator<T,charT,traits>& operator= (const T& t) { *ost << t; if (delim != 0) *ost << delim; return this; } ostream_iterator<T,charT,traits>& operator*() { return *this; } ostream_iterator<T,charT,traits>& operator++() { return *this; } ostream_iterator<T,charT,traits>& operator++(int); {return *this; } private: const charT* delim; basic_ostream<charT,traits>* ost }; Listing 1 shows a typical ostream_iterator implementation. We do not want to go through each of ostream_iterator ‘s member functions. They are implemented in analogy to the insert_iterator’ s member functions, which we explained in detail in our last column (see [ 1 ]). The reasons for this similarity between the ostream_iterator and insert_iterator implementation are subject of a future article. One point in the ostream_iterator ’s implemention is worth being examined in more detail. operator=(const T& t) uses T ’s stream inserter to write t to the stream. As a consequence, the ostream_iterator template can only be instantiated for types that have an associated stream inserter. Otherwise, a compile time error will occur. In our example above we used the copy -algorithm and the ostream_iterator to write a list<int> to standard output. This works out nicely. We instantiate the ostream_iterator template as ostream_iterator<int> . Hence, a stream inserter for int is needed, and such an inserter is supplied by the standard library. Another observation is that each time the ostream_iterator ‘s assignment operator inserts an object into the output stream, the current stream position is advanced. This demonstrates that the ostream_iterator is not an artificial abstraction contrived out of the need to combine generic programming and output streams. Instead, it has genuine iterator-like semantics: It keeps track of the stream position at which the next object should be inserted, and it allows write access to this position via operator=. The ostream_iterator has no specific feature to indicate that the insertion of an object into the stream failed or caused an error. For error detection only the error indication mechanisms of the underlying stream are available. (For details of new stream features such as IOStreams exceptions see [ 2 ].) In our example, we can set badbit , eofbit , and failbit in cout’s exception mask before we apply the copy -algorithm to the ostream_iterator : copy (myList.begin(), myList.end(), ostream_iterator<int>(cout, " ")); if (!cout.good()) { // do something about the error ! } While the ostream_iterator is an output iterator the istream_iterator naturally is an input iterator. It allows read access to the value that it refers to and it can be advanced in single steps. Its implementation (see Listing 2) is straight forward. A private data member ( value ) buffers the next value from the input stream. Read only access to this data member is given by operator*() and operator->() : they return a const reference respectively a const pointer to this data member. class istream_iterator : public iterator<input_iterator_tag, T, Distance, const T*, const T&> { public: typedef charT char_type typedef Traits traits_type; typedef basic_istream<charT,Traits> istream_type; istream_iterator() : istp(0) {} istream_iterator(istream_type& s) : istp(&s) { readElem(); } const T& operator*() const { return value; } const T* operator->() const { return &value; } istream_iterator<T,charT,Traits,Distance>& operator++() { readElem(); return *this; } istream_iterator<T,charT,Traits,Distance> operator++(int) { istream_iterator<T,charT,Traits,Distance> tmp = *this; readElem(); return tmp; } private: void readElem() { if (istp != 0) if (!(*istp >> value)) ist = 0; } basic_istream<charT,traits>* istp; T value; }; When we examined the ostream_iterator ’s implemention, we saw that it is based on T ’s inserter. Similarly, the istream_iterator ‘s implementation is based on T ’ s extractor. The extractor is used in the private member function readElem() . It reads the next object of type T from the stream and buffers it in the private data member value . The standard specifies that the next object is extracted from the stream whenever operator++() or operator++(int) is applied to the istream_iterator . It is, however, implementation specific when the first object is read by the istream_iterator . According to the standard this can be done either when the istream_iterator is constructed or, as a kind of lazy evaluation, when operator*() or operator->() are used for the first time to access the value buffered in the iterator. The implementation given in Listing 2 extracts the first object in its constructor. Please note that the stream position of the input stream (that is the position of the read pointer) is moved forward in the stream with every extraction of an object from the stream. This means that an istream_iterator has true iterator-like semantics: It iterates over the input stream giving read access to the objects contained in the stream. Error Indication and End-of-Stream Iterators What happens if readElem() tries to extract the next object of type T from the stream and none is available? By IOStreams convention, the failbit is set in the stream state to indicate that an extraction has failed. After each extraction readElem() checks the stream state. If the stream state is not good() anymore, the private member istp is set to 0 , which indicates that the iterator is detached from its stream. As a consequence, the stream iterator cannot be used to extract any further objects. An istream_iterator that is in this state is called a end-of-stream iterator . While this name might be a bit misleading, it is easy to understand where it comes from. After the extraction of an object the istream_iterator checks whether the stream state is still good() . That is, it checks if either failbit , badbit , eofbit , or a combination of them has been set. If no error ever occurs, neither failbit nor badbit get set. The only other event that can cause an istream_iterator to turn into an end-of-stream iterator is hitting the end of the stream because then eofbit is set. To sum it up: When an input iterator turns into an end-of-stream iterator, then this either signals an error situation or indicates that the end of the input stream was reached. As with the output stream iterator, we have to fall back to the underlying stream’s error indication mechanisms to distinguish between these two situations. Generic algorithms, which read input from a containers, often receive two input iterators as parameters. The first one specifies the beginning and the second one the end of a sequence of iterator positions, to which the algorithm shall be applied. This sequence of iterator positions is called an iterator range . By convention, the iterator that specifies the beginning of the range is part of the range, while the iterator that specifies the end is not, that is, the algorithm is not applied to the end iterator. Say, we want to apply a generic algorithm to an input stream. How can we specify an according iterator range in terms of istream_iterator s? When we construct an istream_iterator from a valid input stream, the iterator refers to the current stream position of that input stream. Hence, we can use this istream_iterator to specify the beginning of the iterator range. Still, which iterator shall we use to specify the end of the range? Iterator ranges are typically used in while -loops like this: { ... // do something beginIter++; // increment iterator } A word on the while -condition: We silently assumed the existence of a (in-)equality operator for istream_iterator ’s. How is that defined? The standard requires the following semantics:
Back to our problem: how do we express an iterator range of istream_iterator s? For the begin iterator we can simply use the istream_iterator constructed from the input stream. It represents the current stream position. If we successively increment this iterator, which means that we successively extract items from the stream, we will eventually hit the end of the stream. For the end iterator we therefore need an input stream iterator in end-of-stream state. How do we get one? The istream_iterator ’s default constructor creates it. Note, that the only input stream iterator ranges are from the current stream position to the end of the stream. It is not possible to specify a range from one stream position to another stream position, because any two non-end-of-stream iterators referring to the same stream always compare equal. These explanations hopefully clarify the previous example, where we counted how often the word ‘the’ occurred in a text file: istream_iterator<string> beginÍter(str); // line 1 istream_iterator<string> endIter; // line 2 cout << "number of _the_: " << count(beginIter, endIter, "the"); // line 3 Reset after an error has occurred By comparing an istream_iterator to an end-of-stream iterator, it is possible to detect if a stream error has occurred. Yet, istream_iterator s have no feature to reset an iterator that has gone into an end-of-stream state. If the error is not fatal we can do the following: clear the stream’s error state, construct a new istream_iterator , which then represents the current stream position, and restart the algorithm with this iterator as the begin iterator: istream_iterator<string> newBeginIter(str); cout << "... and we have found "; cout << count(newBeginIter, endIter, "the"); cout << " more _the_"; The detailed descriptions of the ostream_iterator and istream_iterator revealed quite a number of the differences between stream and container iterators.
Yet another difference is that stream iterators are one-pass iterators , that is, you can access an element referred to by a stream iterator only once. For instance, we cannot re-read elements through a memorized iterator. The following would fail: memorizing the begin position of the stream, then extracting elements from the stream, and later trying to re-read the first element through the memorized begin iterator. The reason is that once extracted the element is consumed and cannot be re-read. The single-pass property can be best understood in terms of i/o from/to an terminal. Once we’ve read from the terminal stream, the input is consumed. Once we’ve written to the terminal stream, we cannot re-position and overwrite the output. In contrast, container iterators are multi-pass iterators. We can repeatedly access any element referred to by any iterator (except the end iterator, of course). The one-pass or multi-pass property is expressed in the iterator categories. Iterators in the input and output iterator category are one-pass iterators. Iterators in the forward, bidirectional, or random-access iterator categories must be multi-pass iterators. Is the single-pass property a restriction? Can all algorithms live with this restriction? Let’s see what algorithms typically need of an iterator. Algorithms that write output via an output iterator usually do not access the same output position twice. That would mean that they overwrite a position they had previously filled. No algorithm we know of does anything like that. Therefore, all algorithms that write output via an iterator require an iterator type of the output iterator category and happily live with its one-pass property. For this reason, an ostream_iterator can be used in all algorithms that require an output iterator. Algorithms that read input via an iterator usually take an input iterator range. Not all such algorithms can live with the one-pass restriction of input iterators like the istream_iterator . The find_end() algorithm, for instance, does a look ahead and for that matter needs the multi-pass property. In order to explain this, let’s take a closer look at the find_end() algorithm. It finds a subsequence of equal values in a sequence and returns an iterator to the begin of the subsequence. Here is an example of how it would be used: string s2 = "def"; string::iterator i = find_end(s1.begin(),s1.end(),s2.begin(),s2.end()); cout << i << endl; ForwardIterator1 find_end(ForwardIterator1 first1, ForwardIterator1 last1 ,ForwardIterator2 first2, ForwardIterator2 last2); Let us hasten to add that there, of course, are algorithms for whom one-pass input iterators perfectly suffice. The often quoted find() algorithm is an example, and so is the count() algorithm that we used in our examples. These algorithms read elements successively until they find what they’re looking for or until they’ve counted all relevant elements. No element needs to be accessed twice, no look ahead, no repositioning is required. Their interface description only asks for iterators from the input iterator category: InputIterator find (InputIterator first, InputIterator last, const T& val); template<class InputIterator, class T> size_t count(InputIterator first, InputIterator last, const T& val); SummaryStream iterators connect the world of generic programming with the input and output stream abstractions. They show that it is possible to include a formerly completely unrelated domain like IOStreams into the concepts of generic programming. The general idea is: see a stream as a kind of container. We have also recognized that the analogy between streams and containers sometimes breaks, and that a number of smart design details (such as the end-of-stream iterator) were needed to make the analogy work. However, the architectural bridge built by the stream iterators might serve as an inspiration for your own work. It takes no more than a pair of iterators to connect an old domain with the generic abstractions from the standard library.References
Klaus Kreft & Angelika Langer C++ Report, February 1999
New Features in Standard IOStreams
|
|||||||||||||||||||
© Copyright 1995-2010 by Angelika Langer. All Rights Reserved. URL: < http://www.AngelikaLanger.com/Articles/C++Report/StreamIterators/StreamIterators.html> last update: 10 May 2010 |