ANSI C++ - White Paper
New Features in ANSI C++
How can we get the most out of them?
DevelopMentor Whitepaper, Vol.3, 1998
To give you an idea of what is awaiting you, here is a list of the major new features:
    #include <iostream.h>
    #include <fstream.h>
    #include <stdlib.h>
    #include <string.h>

    /* helper functions, defined below */
    static void quit();
    static void expandLinBuf(char*& linBuf, size_t& linBufSiz, ifstream& inFile);
    static void storeTok(char**& buf, size_t& linCnt, size_t& bufSiz, const char* token);
    static int cmpLines(const void* a, const void* b);

    void doIt(const char* in, const char* out)
    {
        /* allocate buffer with initial capacity */
        size_t bufSiz = 1024;
        char** buf = (char**) malloc(sizeof(char*)*bufSiz);
        if (buf == 0) quit();
        size_t linCnt = 0;
        buf[linCnt] = 0;

        /* allocate line buffer as destination for read */
        size_t linBufSiz = 256;
        char* linBuf = (char*) malloc(sizeof(char)*linBufSiz);
        if (linBuf == 0) quit();
        linBuf[0] = '\0';

        /* open input file */
        ifstream inFile(in);

        /* read input */
        while (!(inFile.getline(linBuf,linBufSiz)).eof() && !inFile.bad())
        {   /* while there is still input */
            expandLinBuf(linBuf,linBufSiz,inFile);
            storeTok(buf,linCnt,bufSiz,linBuf);
        }

        /* sort strings; qsort() passes pointers to the array elements */
        qsort(buf, linCnt, sizeof(char*), cmpLines);

        /* open output file and write sorted strings to output file */
        ofstream outFile(out);
        for (size_t i = 0; i<linCnt; i++)
            outFile << buf[i] << endl;
    }

It needs a couple of helper functions, which are shown below:

    static void quit()
    {
        cerr << "memory exhausted" << endl;
        exit(1);
    }

    static void expandLinBuf(char*& linBuf, size_t& linBufSiz, ifstream& inFile)
    {
        while (!inFile.eof() && !inFile.bad() && strlen(linBuf)==linBufSiz-1)
        {   /* while line does not fit into string buffer */
            /* reallocate line buffer */
            linBufSiz += linBufSiz;
            linBuf = (char*) realloc(linBuf,sizeof(char)*linBufSiz);
            if (linBuf == 0) quit();
            /* read more into buffer */
            inFile.getline(linBuf+linBufSiz/2-1,linBufSiz/2+1);
        }
    }

    static void storeTok(char**& buf, size_t& linCnt, size_t& bufSiz, const char* token)
    {
        /* allocate memory for a copy of the token */
        size_t tokLen = strlen(token);
        buf[linCnt] = (char*) malloc(sizeof(char)*tokLen+1);
        if (buf[linCnt] == 0) quit();
        /* copy the token */
        strncpy(buf[linCnt++],token,tokLen+1);
        /* expand the buffer, if full */
        if (linCnt == bufSiz)
        {
            bufSiz += bufSiz;
            buf = (char**) realloc(buf,sizeof(char*)*bufSiz);
            if (buf == 0) quit();
        }
    }

    static int cmpLines(const void* a, const void* b)
    {   /* the array elements are char*, so a and b point to char* */
        return strcmp(*(const char* const*)a, *(const char* const*)b);
    }

Quite a bit of code, isn't it? Basically, the program must provide and manage the memory for a line buffer into which the characters extracted from the file are stored. In addition, it manages the memory for an array that holds all lines for the subsequent invocation of the qsort() function. Both buffers must be of dynamic size, because neither the length nor the number of lines is known in advance. Reallocations might be necessary, which complicates matters even further.

In ANSI C++ the read-sort-write program boils down to something as concise and elegant as this:

    #include <fstream>
    #include <string>
    #include <set>
    #include <algorithm>
    #include <iterator>
    using namespace ::std;

    void doIt(const char* in, const char* out)
    {
        set<string> buf;
        string linBuf;
        ifstream inFile(in);
        while (getline(inFile,linBuf))
            buf.insert(linBuf);
        ofstream outFile(out);
        copy(buf.begin(),buf.end(),ostream_iterator<string>(outFile,"\n"));
    }

Why is it such a piece of cake in ANSI C++ compared to the effort that it takes in classic C++? The answer lies in the use of abstractions such as string and set. They take over all the memory management chores that we had to do manually in the pre-standard version of the program. All allocation and reallocation is handled by string and set; they manage their memory themselves and we don't have to care any longer. Plus, the set is an ordered collection of elements, so we do not even have to sort it explicitly.

Error indication is also much simpler. Situations such as memory exhaustion need not be indicated explicitly; instead the operator new, which is called somewhere in the innards of string and set, will raise a bad_alloc exception that is automatically propagated to the caller of our doIt() function. We need not do anything for error indication.
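One difference between the two versions is worth keeping in mind: a set stores each distinct value only once, so the set-based program writes a repeated line a single time, whereas the qsort() version keeps every copy. The following is a minimal sketch of that difference, not part of the original program. The helper names sortedUniqueLines, sortedAllLines and demoSetVersusMultiset are hypothetical, and in-memory streams stand in for the files.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the distinct lines of `in` in sorted order.
// A std::set stores each distinct value only once.
std::vector<std::string> sortedUniqueLines(std::istream& in)
{
    std::set<std::string> buf;
    std::string line;
    while (std::getline(in, line))
        buf.insert(line);
    return std::vector<std::string>(buf.begin(), buf.end());
}

// Hypothetical helper: returns every line of `in` in sorted order,
// duplicates included, because std::multiset permits equal elements.
std::vector<std::string> sortedAllLines(std::istream& in)
{
    std::multiset<std::string> buf;
    std::string line;
    while (std::getline(in, line))
        buf.insert(line);
    return std::vector<std::string>(buf.begin(), buf.end());
}

// Usage sketch with in-memory input instead of a file.
void demoSetVersusMultiset()
{
    std::istringstream first("pear\napple\npear\n");
    assert(sortedUniqueLines(first).size() == 2);   // "apple", "pear"

    std::istringstream second("pear\napple\npear\n");
    assert(sortedAllLines(second).size() == 3);     // "apple", "pear", "pear"
}
```

If repeated lines must survive, swapping set for multiset in doIt() should be all that is needed; the rest of the function stays the same.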
    void doIt(const char* in, const char* out)
    {
        vector<string> buf;
        vector<string>::iterator insAt = buf.end();
        string linBuf;
        ifstream inFile(in);
        while (getline(inFile,linBuf))
            buf.insert(insAt,linBuf);
        sort(buf.begin(),buf.end());
        ofstream outFile(out);
        copy(buf.begin(),buf.end(),ostream_iterator<string>(outFile,"\n"));
    }

Looks good, doesn't it? It compiles, but - too bad - at run time it crashes. Why? What is wrong here?

We need to look under the hood of the vector container if we want to understand what is happening here. How is vector organized and what precisely does the insert() function do? A vector internally is a contiguous memory space. Insertion into a vector means that all elements after the point of insertion are moved to the back, in order to make room for the new element, and then the new element is added to the collection. A side effect is that all references to elements after the point of insertion become invalid. If the insertion exceeds the vector's current capacity, the entire memory block is reallocated and every iterator into the vector becomes invalid.

Now, the insert() function inserts the new element before the specified location. The point of insertion itself, in our example designated by the iterator insAt, becomes invalid as a side effect of the insertion. Any subsequent access through insAt might lead to a crash. This explains why our innocent program crashes after the first insertion of a line into the vector container.

There are several solutions to this problem. The insert() function returns an iterator to the newly inserted element and we can use this new, valid position as the point of insertion for subsequent additions to the vector.
It would look like this:

    void doIt(const char* in, const char* out)
    {
        vector<string> buf;
        vector<string>::iterator insAt = buf.end();
        string linBuf;
        ifstream inFile(in);
        while (getline(inFile,linBuf))
            insAt = buf.insert(insAt,linBuf);
        sort(buf.begin(),buf.end());
        ofstream outFile(out);
        copy(buf.begin(),buf.end(),ostream_iterator<string>(outFile,"\n"));
    }

More elegant and easier to comprehend is the use of the push_back() function instead of the insert() function. It inserts elements at the end of a vector. Our example then looks like this:

    void doIt(const char* in, const char* out)
    {
        vector<string> buf;
        string linBuf;
        ifstream inFile(in);
        while (getline(inFile,linBuf))
            buf.push_back(linBuf);
        sort(buf.begin(),buf.end());
        ofstream outFile(out);
        copy(buf.begin(),buf.end(),ostream_iterator<string>(outFile,"\n"));
    }

What do we conclude from the program crash that we inadvertently caused? To effectively use the standard library, and all the other new language features, we need to thoroughly understand them. They come with subtle pitfalls that we need to know, so that we can avoid them.

Before you get scared and think: "Well, the new stuff looks cool and will most likely save me a lot of work, but it also lures me into lots of booby traps - is it really worth it?", let me tell you that we have barely touched on the possibilities that open up for you by using the standard library. Just as an example: How are the lines ordered in the code snippet above? We didn't care, so what happens? Basically, what happens is a strcmp() style comparison: the strings are ordered by comparing the ASCII codes of the contained characters. Where did we say so? Well, we did not. It is the default behavior of the sort() algorithm. If no compare function, in ANSI C++ more generally called a comparator, is provided to the sort() function, then it uses the operator<() of the element type, which in our example is string.
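The default ordering is easy to check on a handful of strings. The sketch below is an illustration only: sortDefault and demoDefaultOrder are hypothetical names, and the expected order assumes an ASCII-compatible execution character set.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: sorts a copy of `lines` with the default criterion,
// that is, operator< of std::string.
std::vector<std::string> sortDefault(std::vector<std::string> lines)
{
    std::sort(lines.begin(), lines.end());
    return lines;
}

// Usage sketch, assuming an ASCII-compatible character set.
void demoDefaultOrder()
{
    std::vector<std::string> lines;
    lines.push_back("apple");
    lines.push_back("Zebra");
    lines.push_back("Apple");

    std::vector<std::string> sorted = sortDefault(lines);

    // In ASCII, 'A' (65) and 'Z' (90) both precede 'a' (97).
    assert(sorted[0] == "Apple");
    assert(sorted[1] == "Zebra");
    assert(sorted[2] == "apple");
}
```

Note that "Zebra" lands before "apple": every capital letter sorts ahead of every lower case letter.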
The ANSI string class has an operator<() defined and this operator performs an ASCII compare. The sort() algorithm implicitly uses it as the sorting criterion in the example above.

Equipped with this knowledge, we can consider other sorting orders. Ordering by ASCII codes does not meet the requirements of dictionary-like sorting, where upper case letters appear next to their lower case equivalents. In ASCII the capital letters precede all the lower case letters, so that capital 'Z' precedes lower case 'a'. Can we provide a dictionary type ordering instead of the ASCII default? How about a case-insensitive ordering? How about culture dependent sorting? Foreign alphabets include interesting special characters. How do they affect the sorting order? Lots of questions ....

As an example, let us consider a culture sensitive sorting order. The standard library includes predefined abstractions for internationalization of programs. Among them is class locale, which provides culture dependent string collation via its overloaded function call operator. An object of a class type that has the function call operator overloaded is called a functor in ANSI C++ and can be invoked like a function. In particular, we can pass it to the sort() algorithm as the sorting criterion. Here is the respective code:

    #include <fstream>
    #include <string>
    #include <vector>
    #include <algorithm>
    #include <iterator>
    #include <locale>
    using namespace ::std;

    void doIt(const char* in, const char* out)
    {
        vector<string> buf;
        string linBuf;
        ifstream inFile(in);
        while (getline(inFile,linBuf))
            buf.push_back(linBuf);   // works
        // the locale name "German" is platform dependent
        sort(buf.begin(),buf.end(),locale("German"));
        ofstream outFile(out);
        copy(buf.begin(),buf.end(),ostream_iterator<string>(outFile,"\n"));
    }

The culture dependent sorting order just serves as an example here. We can define any other sorting criterion, as a function or as a functor, and plug it in with comparable ease.
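Locale names such as "German" are not portable, and constructing a locale from a name the platform does not know throws a runtime_error. The following sketch therefore falls back to the classic "C" locale, which is always available. The helper names localeOrClassic, sortWithLocale and demoLocaleSort are hypothetical, and the ordering checked in the demo assumes that the classic locale collates like the plain character comparison on an ASCII platform.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: tries to construct the named locale and falls back
// to the classic "C" locale if the platform does not provide that name.
std::locale localeOrClassic(const char* name)
{
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: sorts a copy of `lines`, using the locale object
// itself as the comparison functor.
std::vector<std::string> sortWithLocale(std::vector<std::string> lines,
                                        const std::locale& loc)
{
    std::sort(lines.begin(), lines.end(), loc);
    return lines;
}

// Usage sketch. Only the classic locale's order is checked, because the
// order produced by a named locale differs between platforms.
void demoLocaleSort()
{
    std::vector<std::string> lines;
    lines.push_back("apple");
    lines.push_back("Zebra");

    std::vector<std::string> classic =
        sortWithLocale(lines, std::locale::classic());
    assert(classic[0] == "Zebra");
    assert(classic[1] == "apple");

    // Never throws, whether or not "German" is a known locale name.
    std::vector<std::string> german =
        sortWithLocale(lines, localeOrClassic("German"));
    assert(german.size() == lines.size());
}
```

Whether silently falling back is appropriate depends on the application; reporting the missing locale to the user may be the better choice.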
It works so nicely because the sort() algorithm is a function template that has the type of the comparator as a template argument. This way you can use any type of comparator for sorting. As you can see, the standard library makes your programs significantly more flexible and easier to extend.
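As an illustration of plugging in your own criterion, here is a minimal sketch with one comparator written as a functor and one written as a plain function. The names NoCaseLess, shorterFirst, sortedCopy and demoCustomComparators are hypothetical and not part of the standard library. The case-insensitive comparison relies on tolower() and is therefore only adequate for single-byte character sets.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical functor: orders strings without regard to letter case.
struct NoCaseLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(), lessChar);
    }

    // Compares two characters after converting both to lower case.
    static bool lessChar(char x, char y)
    {
        return std::tolower(static_cast<unsigned char>(x))
             < std::tolower(static_cast<unsigned char>(y));
    }
};

// Hypothetical plain function used as a comparator: shorter strings first.
bool shorterFirst(const std::string& a, const std::string& b)
{
    return a.size() < b.size();
}

// Like sort(), this helper takes the comparator type as a template
// argument, so functors and function pointers are both accepted.
template <class Compare>
std::vector<std::string> sortedCopy(std::vector<std::string> lines, Compare cmp)
{
    std::sort(lines.begin(), lines.end(), cmp);
    return lines;
}

// Usage sketch.
void demoCustomComparators()
{
    std::vector<std::string> lines;
    lines.push_back("kiwi");
    lines.push_back("Zebra");
    lines.push_back("banana");

    std::vector<std::string> noCase = sortedCopy(lines, NoCaseLess());
    assert(noCase[0] == "banana");
    assert(noCase[1] == "kiwi");
    assert(noCase[2] == "Zebra");

    std::vector<std::string> byLength = sortedCopy(lines, shorterFirst);
    assert(byLength[0] == "kiwi");
    assert(byLength[1] == "Zebra");
    assert(byLength[2] == "banana");
}
```

Either comparator could be passed as the third argument to sort() in doIt() in place of the locale object.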
© Copyright 1995-2012 by Angelika Langer. All Rights Reserved. URL: <http://www.AngelikaLanger.com/Articles/Papers/AnsiC++/AnsiC++WhitePaper.htm> last update: 4 Nov 2012