A simple utility

When you are writing computer programs, it is very useful to be able to copy your Algol 68 source programs to a printer with line numbers. Many editors, including vim, Emacs and FTE, use line numbers. When the Algol 68 compiler finds an error in your program, it displays on the screen the offending line together with its number, the number of the character in the line where the error occurred, and a descriptive message. However, it is insufficient to merely copy the contents of a file to the printer (unless you are using the spooling facility of a header file) because the output will not contain any identifying information.

What is required is a small program which will optionally write line numbers and which will write the name of the file being printed together with the date and time at which the file was last modified. A page number is another useful item as it prevents pages being lost when the listing is made on separate sheets of paper. It would also be very useful to be able to specify where in a file a listing should start and where it should finish. Such a program is called a utility. Notice that the program must be able to handle zero-length lines as well as lines which are too long to be printed on one line alone. Lastly, some editors allow you to insert tab characters into your document, so the utility must be able to print the file with the correct indentation.

The preceding problem analysis shows that we could write such a program if we knew how to obtain the date and time of last modification of a file from the operating system. In the directory /usr/share/doc/algol68toc/, you will find the source of the program lf which solves the problem described above for the Linux operating system. The source of lf is 526 lines long. Compile it and run it with the argument -h. Help information of this kind should be displayed by every program you write which is used at the command line: it prevents accidental use from causing damage to your operating system files or directories.

The source code

There are many ways of tackling the understanding of a program, but here is a method which does help with Algol 68 programs. In summary,

  1. See what the program does.
  2. Look at the principal processing.
  3. Examine the mode declarations.
  4. Examine the routines.
  5. Repeat steps 2-4 for each routine.

Stage one of examining a program is to see what it does. Examples of its output, and possibly its input, help you to identify the actions of various parts of the program. Documentation of the input and output would suffice, but neither exists in this case because the input is a plain text file and the output is better seen than described. Compile the Algol 68 example program lf in

   /usr/share/doc/a68toc/examples

and use it to list the file test.lf (in the same directory) with line numbers on your printer using the command

   lf -pg -n test.lf | lpr

to pipe the output to the printer (if you have a LaserJet 4 or 6L, you can omit the -pg argument). Notice that the time and date the file was last modified appear at the top of each page, together with the identifier of the file and the page number. If you used the -n parameter to print the test file, each line will be preceded by a line number and a colon. If you did not list the file with line numbers, do so now because the line numbers will highlight another feature of the program. The first line in test.lf is too long to be printed on one line, so the program breaks it into two parts. The second part does not have a line number since it is part of the same line in the input.
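The folding behaviour just described can be sketched in C. This is only an illustration under stated assumptions, not the code of lf: the function name list_line, the 5-digit number field and the width parameter are all invented for the example. It shows that only the first piece of a folded line carries a number and that a zero-length line still produces one output line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one input line to `out`, folded into pieces of at most `width`
   characters. Only the first piece carries the line number and a colon;
   continuation pieces are indented by the same amount so that the text
   stays aligned. Returns the number of physical lines written.
   The name, the 5-digit field and `width` are hypothetical choices. */
int list_line(FILE *out, long number, const char *text, int width)
{
    assert(width > 0);
    size_t len = strlen(text);
    size_t pos = 0;
    int pieces = 0;

    do {                              /* runs once even for an empty line */
        size_t chunk = len - pos;
        if (chunk > (size_t)width)
            chunk = (size_t)width;
        if (pieces == 0)
            fprintf(out, "%5ld: ", number);   /* 7 characters of prefix */
        else
            fprintf(out, "%7s", "");          /* 7 spaces, no number */
        fwrite(text + pos, 1, chunk, out);
        fputc('\n', out);
        pos += chunk;
        pieces++;
    } while (pos < len);

    return pieces;
}
```

A call such as list_line(stdout, 1, text, 72) would be made once per input line, with the caller incrementing the number.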

The second stage in understanding a program is to look at the principal processing. Since procedures and other values must be declared before use in the a68toc compiler, the last part of the program contains the main processing logic. Now print (or display) the source of lf.a68 using the command

   lf -n /usr/share/doc/a68toc/pame/lf.a68

In the source, the main processing logic is on lines 427-517. Examine those lines now.

Before processing any command line arguments, the program defines the actions to take when the last argument has been read; in other words, what should be done when the logical end of file has been reached for comm line. The default action is to terminate the program immediately with a suitable error message. In lf, no identification is given for comm line in the open procedure, because it isn't relevant, but if you insert such an identification, for example, command line file, then any error message issued by the transput system will include it. Notice that although the anonymous procedure used as the second parameter for on logical file end on line 448 occurs within the IF ... FI clause, it has global scope because it is a denotation (a procedure denotation). That is one of the reasons why anonymous procedures are so useful. Also note the use of SKIP to yield a value of mode BOOL: in fact, it will never be used because stop is a synonym for GOTO end of program.

In lines 442-517, the program processes the command line argument by argument. If an argument starts with “-”, it is assumed to be an option; otherwise it is assumed to be a filename. Note the use of skip terminators to skip spaces in the command line. Options that require a number (-s and -t) expect it to follow the option directly (see lines 493 and 495). Lines 500-506 process a solitary - to mean “list the standard input”. Lines 507-516 process a named file. As you examine the code, underline the identifiers of all procedure calls.
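The classification rules in the preceding paragraph can be sketched in C as follows. This is a hedged illustration, not a transcription of lf: lf reads its arguments through the transput file comm line, whereas this sketch assumes each argument is already a separate string, and the names arg_kind, classify_arg and option_number are invented here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the three cases described in the text. */
enum arg_kind { ARG_STDIN, ARG_OPTION, ARG_FILE };

/* A solitary "-" means the standard input, any other argument starting
   with '-' is an option, and everything else is a filename. */
enum arg_kind classify_arg(const char *arg)
{
    assert(arg != NULL);
    if (strcmp(arg, "-") == 0)
        return ARG_STDIN;
    if (arg[0] == '-')
        return ARG_OPTION;
    return ARG_FILE;
}

/* Reads the number that directly follows the option letter, as in "-t4".
   Returns `fallback` when no digit follows the letter. */
long option_number(const char *arg, long fallback)
{
    assert(arg != NULL && arg[0] == '-' && arg[1] != '\0');
    if (!isdigit((unsigned char)arg[2]))
        return fallback;
    return strtol(arg + 2, NULL, 10);
}
```

Keeping the number attached to its option letter means that an option and its value never have to be matched up across two separate arguments.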

The next stage in understanding a program is to look at all the mode declarations. There are three in this program: PRINTER, SEC and STAT. You should scan the program to see what identifiers have that or a related mode and where they are used.

Routines

Finally, you need to examine the routines declared. It is a good idea, especially in a more complicated program, to list the identifiers of all procedures with nested declarations of procedures indented under their parent procedure identifiers. This helps to fix the structure of the program in your mind. Then you should examine the procedures used in the main processing loop. In lf, they are:

   char in string      help
   close               open
   disp error          print
   get                 print file
   get mtime           process file name
   get numeric arg     reset parameters
   get sections        skip terminators

When you examine each procedure, do the same as you did for the whole program: first the main logic, then the modes, then the procedures and operators. You will need to backtrack several times in a large program. If a lot of names are declared, prepare a list together with a description of what each name is used for, where it is declared and the places where it is used. A cross-reference program would be really useful, but it is not a simple program to write for Algol 68.

The principal processing is performed by the procedure print file on lines 258-322. Firstly, tab stops are set according to the current value of tabs, then lines is initialised and an initialisation string output to the printer. If letter quality has been chosen (option -q), a special string is sent to the printer accordingly. Then the logical file end event procedure is set. Each section specified on the command line (or the default section if no sections were specified) is then printed using the procedure do line. Each line is input using get line whose principal function is to expand tab characters to the required number of spaces (3 unless set by the -t option). Lines are not output until the beg OF ss line is reached (1 unless set by the -s option). Notice the code following FROM in the preamble to the inner DO ... OD loop (on lines 313-316) which ensures that the file is reset if the sections to be printed are not ordered (the definition of ordered is in the procedure get sections, lines 381-425).
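The idea of resetting the file when sections are not ordered can be sketched in C. This is a minimal sketch under assumptions, not the logic of print file itself: the struct section, its field end and the function print_sections are invented for the illustration, and "not ordered" is taken here to mean simply that the next section starts at or before the last line already read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical description of one section: first and last line numbers. */
struct section { long beg, end; };

/* Copies the requested sections of `in` to `out`, in the order given.
   The input is rewound whenever the next section begins at or before the
   last line already read. Returns the number of newline-terminated lines
   written. */
long print_sections(FILE *in, FILE *out, const struct section *ss, int n)
{
    assert(in != NULL && out != NULL && ss != NULL);
    long current = 0;   /* number of complete lines read so far */
    long printed = 0;

    for (int i = 0; i < n; i++) {
        if (ss[i].beg <= current) {   /* wanted line already passed */
            rewind(in);
            current = 0;
        }
        int c;
        while (current < ss[i].end && (c = fgetc(in)) != EOF) {
            if (current + 1 >= ss[i].beg)   /* inside the section */
                fputc(c, out);
            if (c == '\n') {
                current++;
                if (current >= ss[i].beg)
                    printed++;
            }
        }
    }
    return printed;
}
```

Rewinding only when necessary means that sections given in ascending order are printed in a single pass over the file.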

Similar to your list of nested procedures, prepare a list of procedures where indented procedures identify procedures called by the parent procedure. Here is part of the list for lf:

   fstat
      linux fstat
   help
      exit, newline, put
   reset parameters
   lf print
      ODD, print
   get mtime
      fstat, linux ctime
   get sections
      +:=
      add section
         char in string
         get numeric arg
      char in string

Dry-running example

The procedure get line (lines 232-250) and its associated procedures set tabs (lines 220-224) and tab pos (lines 226-227) are worth examining in detail. The best way to see how they work is to dry-run them. Take a blank sheet of paper and make a vertical list of all the names, both local and global, used by the procedures. Opposite in line, write a piece of text containing tab characters (a piece of indented program, for example). Then work your way through the procedure, marking the value referenced by each name as you complete each step. You should also note the value of each non-name; for example, the loop identifier i. Here is what your list could look like after going 3 times round the outer loop (the inner loop is on lines 241-244):

tabstops FFTFFTFFTFFTFFTFFTFFTFFT...
line(ln) T
in line => THEN ch:="A"
op 1 2 3 4 5 6
i 1 2 3
c  υ T

Struck-out values have been superseded and υ denotes a space. Dry-running is a very useful method, if laborious and time-consuming, of finding bugs. tab ch is declared in the standard prelude.
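For comparison while dry-running, here is a minimal C sketch of tab expansion. It is not a translation of get line, set tabs or tab pos: it assumes tab stops at every multiple of a fixed width (3 by default in lf, according to the text) and that a tab always produces at least one space. The name expand_tabs and its parameters are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Expands the tab characters of `in` into spaces, writing the result to
   `out`, which has room for `cap` characters including the terminating
   NUL. A tab moves the output column to the next multiple of `tab`.
   Returns the length of the expanded text, or (size_t)-1 if it does not
   fit. All names here are hypothetical. */
size_t expand_tabs(const char *in, char *out, size_t cap, int tab)
{
    assert(in != NULL && out != NULL && cap > 0 && tab > 0);
    size_t col = 0;                     /* next free output position */

    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            do {                        /* at least one space per tab */
                if (col + 1 >= cap)
                    return (size_t)-1;
                out[col++] = ' ';
            } while (col % (size_t)tab != 0);
        } else {
            if (col + 1 >= cap)
                return (size_t)-1;
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return col;
}
```

Dry-running this function on a line that begins with a tab followed by THEN, using a tab width of 3, is a useful check on your paper exercise: the output column after the tab should be 3.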

This utility program (lf) is quite short, but we have analysed its working in detail so that you can see how it is done.

ALIEN procedures

The utility lf uses some of the extensions provided by the a68toc compiler, in particular, the ALIEN construct which provides access to procedures compiled by other compilers. In this section we shall look at the get cwd and the fstat procedures.

The procedure fstat

The procedure fstat is on lines 100-105. It depends on a call of the linux fstat procedure whose second parameter is a name referring to a value of mode STAT. The declaration of STAT is on lines 24-41.

If you investigate the file /usr/include/statbuf.h, you will find the C definition of the stat structure therein. The STAT mode accurately reflects this structure using LONG or SHORT as appropriate. Briefly, a C unsigned int is equivalent to an Algol 68 BITS. For historical reasons, the C unsigned long int has the same meaning as an unsigned int so BITS could have been used for those fields as well. However, because the value is required as an integer (and is stored as a positive integer), it is possible to regard them as having mode INT. Some of the C modes are hidden by further mode declarations, but if you hunt for __dev_t you will find it is an unsigned long long int which is equivalent to the Algol 68 LONG BITS or, as is used in STAT, LONG INT.

Now look at the declaration of linux fstat on lines 85-89. Most of this construction is C source code. The ALIEN construct may be written as

   <mode> <identifier> = ALIEN "<symbol>"
      "<C source code>";

where the angle brackets denote items to be replaced. In the declaration for linux fstat we have

followed by three lines of C source code. It is not my intention to delve into the mysteries of C. If you don't understand that language, consult someone who does. However, the point of the declaration is to map the Algol 68 modes onto the C equivalents. The C procedure fstat takes two parameters: the first has mode int (equivalent to INT) and the second has mode struct stat* which is equivalent to REF STAT. The cast in C consists of a mode in parentheses (compare with the Algol 68 cast in section 10.5) so the third line of C code ensures that the second parameter of the Algol 68 procedure linux fstat has the right mode. The A_int_INT(...) construct is a C language macro for a cast which ensures that the yielded C integer is equivalent to the Algol 68 INT. If you want to see what the a68toc compiler generates, look for FSTAT in the file lf.c.

Reverting to line 102, the field sys file OF f has the correct mode for use as the “file descriptor” for fstat. You should check the manual page of fstat (in section 2 of the Linux Programming Manual) for details of its functioning and yield.
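On the C side, obtaining the modification time from a file descriptor can be sketched as below. This is an illustration only, assuming a POSIX system: the function names modification_time and format_time are invented, and strftime with a fixed format is used here in place of whatever lf does with linux ctime.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Returns the time of last modification of the file open on descriptor
   `fd`, or (time_t)-1 if fstat reports an error. */
time_t modification_time(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return (time_t)-1;
    return st.st_mtime;
}

/* Formats `t` as local time in the form YYYY-MM-DD HH:MM:SS.
   Returns the number of characters written, or 0 if `buf` is too small.
   The format is an arbitrary choice made for this sketch. */
size_t format_time(time_t t, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    struct tm *tm = localtime(&t);
    if (tm == NULL)
        return 0;
    return strftime(buf, cap, "%Y-%m-%d %H:%M:%S", tm);
}
```

The value returned by modification_time would normally be checked against (time_t)-1 before being passed to format_time.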

The procedure get cwd

The procedure get cwd is more complicated because it uses several facilities provided by the standard prelude as well as another extension provided by the a68toc compiler. Firstly, look at the ALIEN declaration of linux getcwd on lines 91-93. The mode VECTOR[]CHAR is similar to the mode []CHAR, but the lower bound is always 1 and is omitted from the generated construct. In fact, a68toc translates this mode into the C equivalent of

   STRUCT(REF CHAR data, INT gc, upb)

The gc field is an integer provided for the garbage-collector (the run-time memory management system which looks after the heap). The data field is a reference to the actual data (in fact it is a memory address).

The C procedure getcwd requires two parameters: a reference to an area which it can use to return the full path of the current working directory and an integer which states how big that area is. The C source code in the declaration for linux getcwd contains the C macro

   A_VC_charptr(buf)

which expands into buf.data (equivalent to the Algol 68 expression data OF buf) and the C macro A_INT_int which converts an Algol 68 INT into a C int (directly equivalent on Linux).

The yield of linux getcwd is a reference to the area in which the current working directory path has been put. Strictly speaking, this is identical to the first parameter of the C procedure getcwd, but the GNU C compiler complains if it is used as such. To get around this, the author used the cast (void *) which effectively causes the reference to be a reference to an anonymous piece of memory. The Algol 68 equivalent is CPTR which is defined in the standard prelude as REF BITS.
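What the translated call amounts to can be sketched in plain C. This is a hedged illustration, assuming a POSIX system: the struct char_vector is an invented stand-in for the descriptor a68toc generates for VECTOR[]CHAR (only the field names data, gc and upb come from the text), and cwd_into is a hypothetical name, not one of the a68toc macros.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the generated VECTOR[]CHAR descriptor. */
struct char_vector {
    char *data;   /* address of the first element */
    int   gc;     /* reserved for the garbage-collector */
    int   upb;    /* upper bound; the lower bound is always 1 */
};

/* Passes the vector's storage and size to getcwd and yields the result
   as an untyped pointer, which is NULL if the path did not fit or the
   call failed. On success the pointer equals buf.data. */
void *cwd_into(struct char_vector buf)
{
    assert(buf.data != NULL && buf.upb > 0);
    return (void *)getcwd(buf.data, (size_t)buf.upb);
}
```

The (void *) cast discards the information that the result points at characters, which is why a further conversion is needed before the path can be used as a string.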

Now comes the clever bit. Look at line 98. The value of mode CPTR (REF BITS) is converted by the operator CPTRTOCSTR into a value of mode CSTR (declared in the standard prelude as REF STRUCT 16000000 CHAR). Now look at the definition of that operator (on line 95)! BIOP stands for “built-in operator” and BIOP 99 is the only built-in operator implemented by the a68toc translator. BIOP 99 maps its parameter (of one mode) onto its yield (of another mode). It effectively acts as a cast (in this case) from one REF mode to another REF mode. Have a look at the C source code in lf.c if you are interested in the details. Then the value of mode CSTR is converted using the operator CSTRTORVC to a value of mode REF VECTOR[]CHAR which is dereferenced and then coerced to a value of mode STRING. In fact, the a68toc compiler will silently coerce values of mode REF STRUCT i MODE to mode REF VECTOR[]MODE and thence to REF[]MODE. Notice that you cannot coerce a value of mode REF VECTOR[]MODE to REF FLEX[]MODE. The mode STRING has no flexibility (it is equivalent to []CHAR).

Lastly, note that the parameter of linux getcwd is an anonymous VECTOR[]CHAR whose scope is limited to the scope of get cwd (the Algol 68 procedure).

If you want to examine the other macros used for the translated C source, have a look at the files in the directories

   /usr/share/a68toc/Linux
   /usr/share/a68toc/include

Sian Mountbatten 2012-01-19