Text File I/O in C and in C++

A text file is a disk file consisting of lines of text. The end of each line is marked by a special byte or a special pair of bytes. In the DOS/Windows environment an end-of-line is marked by the pair carriage return (byte value 13) plus linefeed (byte value 10); on Unix-like systems a linefeed alone is used. Every C/C++ programmer has to master the techniques of reading data from text files and writing data to text files. This is done differently in C++ from the way it is done in C: C uses the FILE-based functions declared in <stdio.h>, whereas C++ normally uses the stream classes declared in <fstream>.
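As a small illustration of the end-of-line markers described above, here is a sketch of a helper that removes a trailing linefeed or carriage return plus linefeed from a line already held in memory. The name strip_eol is hypothetical and is not taken from REMTAB. Note that when a file is opened in text mode ("r") on DOS/Windows, the C runtime already translates the pair into a single '\n' on input; both bytes are only seen when the file is opened in binary mode ("rb").

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of REMTAB): remove any trailing
   carriage return (13) and linefeed (10) bytes from a NUL-terminated
   line, in place. Returns the length of the line afterwards. */
size_t strip_eol(char *line)
{
    size_t len = strlen(line);

    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        line[--len] = '\0';
    }
    return len;
}
```

Such a helper would typically be applied to each buffer returned by fgets() before the line is processed further.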

The difference will be illustrated here by two versions of a program called REMTAB, which is designed to remove tab characters (byte value 9) from text files. Contrary to what you might think, this program (i) has a practical use beyond the pedagogical one and (ii) is non-trivial to write.

The program is useful, for example, when processing files that contain tab characters but in which certain parts of each line must occur at certain column positions (e.g. FORTRAN source code files). Tab characters upset this alignment. The solution is to replace each tab character with the appropriate number of spaces.

This is non-trivial to do because it is not sufficient simply to replace each tab character with a fixed number (e.g. four) of spaces. A tab represents the number of spaces required to move to the next tab stop, and this depends on the position of the tab character itself. For example, if tab stops occur at every 4th position in a line then the byte sequence "[A][tab][B]" should be expanded to "[A][space][space][space][B]", whereas the byte sequence "[A][B][tab][C]" should be expanded to "[A][B][space][space][C]" (i.e. the tab character is to be replaced by two spaces, not by three as in the first case).
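The rule just described can be sketched in C as follows. This is only a minimal illustration working on a string in memory, not the code from REMTAB.C; the function name expand_tabs and its parameters are assumptions made for the example. The key step is that the number of spaces is computed from the current column, which is reset at each end-of-line byte.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from REMTAB.C): copy the
   NUL-terminated string `in` to `out`, replacing each tab by the
   number of spaces needed to reach the next tab stop. Tab stops are
   assumed to lie at every `tab_size` columns, counting from column 0.
   Returns the length of the result, or -1 if `out` is too small or
   `tab_size` is zero. */
int expand_tabs(const char *in, char *out, size_t out_size, size_t tab_size)
{
    size_t col = 0;   /* current column within the current line */
    size_t n = 0;     /* number of bytes written to out so far  */

    if (tab_size == 0 || out_size == 0) {
        return -1;
    }

    for (; *in != '\0'; ++in) {
        if (*in == '\t') {
            /* Distance from the current column to the next tab stop. */
            size_t spaces = tab_size - (col % tab_size);
            if (n + spaces >= out_size) {
                return -1;
            }
            col += spaces;
            while (spaces-- > 0) {
                out[n++] = ' ';
            }
        } else {
            if (n + 1 >= out_size) {
                return -1;
            }
            out[n++] = *in;
            /* A carriage return or linefeed starts a new line. */
            col = (*in == '\n' || *in == '\r') ? 0 : col + 1;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

With a tab size of 4, the input "A", tab, "B" yields "A" followed by three spaces and "B", while "A", "B", tab, "C" yields "AB" followed by two spaces and "C", matching the two cases above. A file-based version would apply the same column arithmetic while copying bytes from the input file to the output file.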

REMTAB.C contains the C source code for a program to do this. REMTAB.CPP contains the same program rewritten in C++. REMTAB.EXE is the 32-bit Intel executable. All three files are contained in REMTAB.ZIP. The ZIP file also contains a small file eg.txt which contains 10 tab characters.

The REMTAB.EXE in the ZIP file is the C version. Its size is 35,328 bytes, compared to the 49,152 bytes of the C++ version (both compiled using Visual C++ Version 5.0 in Release mode).

The syntax of the program is:

REMTAB input_file_name output_file_name

or

REMTAB input_file_name output_file_name size_of_tab

The tab size defaults to 4. The name of the output file must be different from that of the input file.
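The command-line rules above could be checked along the following lines. This is a sketch under stated assumptions, not the argument handling actually used in REMTAB.C: the function name parse_remtab_args, the macro names and the upper limit of 64 on the tab size are all hypothetical. The file names are compared with strcmp(), which is only an approximation, since on DOS/Windows two differently spelled names (for example differing in case) can refer to the same file.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_TAB_SIZE 4   /* default stated in the text          */
#define MAX_TAB_SIZE     64  /* assumed limit, not from the original */

/* Hypothetical helper: validate the REMTAB command line.
   On success stores the two file names and the tab size and
   returns 0; returns -1 if the arguments are not acceptable. */
int parse_remtab_args(int argc, char *argv[],
                      const char **in_name, const char **out_name,
                      int *tab_size)
{
    /* Either two file names, or two file names plus a tab size. */
    if (argc != 3 && argc != 4) {
        return -1;
    }

    /* The output file must not be the input file. */
    if (strcmp(argv[1], argv[2]) == 0) {
        return -1;
    }

    *tab_size = DEFAULT_TAB_SIZE;
    if (argc == 4) {
        char *end;
        long value = strtol(argv[3], &end, 10);

        /* Reject empty input, trailing junk and out-of-range sizes. */
        if (end == argv[3] || *end != '\0' ||
            value < 1 || value > MAX_TAB_SIZE) {
            return -1;
        }
        *tab_size = (int)value;
    }

    *in_name = argv[1];
    *out_name = argv[2];
    return 0;
}
```

A main() function would call this first and print the syntax shown above if it returns -1.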