When you are writing computer programs, it is very useful to be able to copy your Algol 68 source programs to a printer with line numbers. Many editors, including vim, Emacs and FTE, use line numbers. When the Algol 68 compiler finds an error in your program, it displays on the screen the offending line together with its number, the number of the character in the line where the error occurred and a descriptive message. However, it is insufficient merely to copy the contents of a file to the printer (unless you are using the spooling facility of a header file), because the output will not contain any identifying information.
What is required is a small program which will optionally write line numbers and which will write the name of the file being printed together with the date and time at which the file was last modified. A page number is another useful item as it prevents pages being lost when the listing is made on separate sheets of paper. It would also be very useful to be able to specify where in a file a listing should start and where it should finish. Such a program is called a utility. Notice that the program must be able to handle zero-length lines as well as lines which are too long to be printed on one line alone. Lastly, some editors allow you to insert tab characters into your document, so the utility must be able to print the file with the correct indentation.
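For illustration, here is a minimal C sketch (C being the language a68toc generates) of two parts of the page layout just described: a header line carrying the file name, the modification time and the page number, and the mapping from an output line to the page it falls on. The names format_header and page_of_line, the 60-line page and the header layout are assumptions made for this sketch; lf's own layout may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical page length; lf's real value is not given in the text. */
#define LINES_PER_PAGE 60

/* Write "<file>  <mtime>" followed by "Page <n>" right-aligned so that
   the header is `width` columns wide, keeping at least one space before
   the page number when the left part is too long.  Returns the length
   of the full header, as snprintf does. */
int format_header(char *buf, size_t size, int width, const char *file,
                  const char *mtime, int page)
{
    char right[32];
    int rlen = snprintf(right, sizeof right, "Page %d", page);
    int pad = width - (int)(strlen(file) + 2 + strlen(mtime));

    if (pad < rlen + 1)
        pad = rlen + 1;               /* always separate with a space */
    return snprintf(buf, size, "%s  %s%*s", file, mtime, pad, right);
}

/* Page (counting from 1) on which output line n (counting from 1) falls. */
int page_of_line(int n)
{
    return (n - 1) / LINES_PER_PAGE + 1;
}
```

A listing loop would call format_header whenever page_of_line changes, so that no sheet leaves the printer without its identifying header.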
The preceding problem analysis shows that we could write such a program if we knew how to obtain the date and time of last modification of a file from the operating system. In the directory /usr/share/doc/algol68toc/, you will find the source of the program lf, which solves the problem described above for the Linux operating system. The source of lf is 526 lines long. Compile it and run it with the argument -h. Every program you write that is used at the command line should display help information like this: it prevents accidental use from causing damage to your operating system files or directories.
There are many ways of tackling the understanding of a program, but here is a method which helps with Algol 68 programs.
Stage one of examining a program is to see what it does. Examples of its output, and possibly its input, help you to identify the actions of various parts of the program. Documentation of the input and output would suffice, but neither exists in this case because the input is a plain text file and the output is better seen than described. Compile the Algol 68 example program lf in /usr/share/doc/a68toc/examples and use it to list the file test.lf (in the same directory) with line numbers on your printer, piping the output to the printer with the command
lf -pg -n test.lf | lpr
If you have a LaserJet 4 or 6L, you can omit the -pg argument. Notice that the time and date the file was last modified appear at the top of each page, together with the name of the file and the page number. If you used the -n parameter to print the test file, each line will be preceded by a line number and a colon. If you did not list the file with line numbers, do so now because the line numbers will highlight another feature of the program. The first line in test.lf is too long to be printed on one line, so the program breaks it into two parts. The second part does not have a line number since it is part of the same line in the input.
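The numbering behaviour just described, including the unnumbered continuation of an over-long line and the handling of zero-length lines, can be sketched in C as follows. The function name print_numbered, the five-digit number field and the fixed break width are assumptions for illustration; the sketch does not try to reproduce lf's exact output.

```c
#include <stdio.h>
#include <string.h>

/* Print one input line preceded by its number and a colon, breaking it
   into pieces of at most `width` characters (width must be > 0).
   Continuation pieces are indented by the width of the number field but
   carry no number.  A zero-length line still produces one output line.
   Returns the number of output lines written. */
int print_numbered(FILE *out, int number, const char *line, size_t width)
{
    size_t len = strlen(line);
    size_t pos = 0;
    int pieces = 0;

    do {
        size_t n = len - pos < width ? len - pos : width;
        if (pieces == 0)
            fprintf(out, "%5d: ", number);     /* numbered first piece */
        else
            fprintf(out, "%7s", "");           /* 7 blanks, no number */
        fwrite(line + pos, 1, n, out);
        fputc('\n', out);
        pos += n;
        pieces++;
    } while (pos < len);

    return pieces;
}
```

The do ... while form is what guarantees that an empty line is still printed with its number.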
The second stage in understanding a program is to look at the principal processing. Since procedures and other values must be declared before use in the a68toc compiler, the last part of the program contains the main processing logic. Now print (or display) the source of lf.a68 using the command
lf -n /usr/share/doc/a68toc/pame/lf.a68
In the source, the main processing logic is on lines 427-517. Examine those lines now.
Before processing any command line arguments, the program defines the actions to take when the last argument has been read; in other words, what should be done when the logical end of file has been reached for comm line. The default action is to terminate the program immediately with a suitable error message. In lf, no identification is given for comm line in the open procedure because it isn't relevant, but if you insert such an identification (for example, "command line file"), any error message issued by the transput system will include it. Notice that although the anonymous procedure used as the second parameter for on logical file end on line 448 occurs within the IF ... FI clause, it has global scope because it is a denotation (a procedure denotation). That is one of the reasons why anonymous procedures are so useful. Also note the use of SKIP to yield a value of mode BOOL: in fact, that value will never be used because stop is a synonym for GOTO end of program.
In lines 442-517, the program processes the command line, argument by argument. If an argument starts with “-”, it is assumed to be an option; otherwise it is assumed to be a filename. Note the use of skip terminators to skip spaces in the command line. Options that require a number (-s and -t) expect it to follow the option directly (see lines 493 and 495). Lines 500-506 interpret a solitary “-” as meaning “list the standard input”. Lines 507-516 process a named file. As you examine the code, underline the identifiers of all procedure calls.
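The argument classification described above (an option starts with “-”, a solitary “-” means the standard input, anything else is a filename, and a numeric option carries its number directly after the letter, as in -t4) might look like this in C. This is a hypothetical sketch, not a translation of lines 442-517; the names classify and numeric_arg are invented, and the defaults of 3 for -t and 1 for -s are the ones stated for print file in this section.

```c
#include <ctype.h>
#include <stdlib.h>

enum arg_kind { ARG_FILE, ARG_STDIN, ARG_OPTION };

/* Classify one command-line argument: a solitary "-" means the standard
   input, anything else starting with '-' is an option, and everything
   else is taken to be a file name. */
enum arg_kind classify(const char *arg)
{
    if (arg[0] != '-')
        return ARG_FILE;
    if (arg[1] == '\0')
        return ARG_STDIN;
    return ARG_OPTION;
}

/* For an option such as "-t4", return the number written directly after
   the option letter, or dflt if no digit follows.  opt must have at
   least two characters ("-" plus a letter). */
int numeric_arg(const char *opt, int dflt)
{
    const char *p = opt + 2;              /* skip "-" and the letter */
    if (!isdigit((unsigned char)*p))
        return dflt;
    return (int)strtol(p, NULL, 10);
}
```

A main loop would walk argv, calling classify on each element and, for -t or -s, numeric_arg(argv[i], 3) or numeric_arg(argv[i], 1).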
The next stage in understanding a program is to look at all the mode declarations. There are three in this program: PRINTER, SEC and STAT. You should scan the program to see which identifiers have one of those modes (or a related mode) and where they are used.
Finally, you need to examine the routines declared. It is a good idea, especially in a more complicated program, to list the identifiers of all procedures with nested declarations of procedures indented under their parent procedure identifiers. This helps to fix the structure of the program in your mind. Then you should examine the procedures used in the main processing loop. In lf, they are:
char in string      help
close               open
disp error
get                 print file
get mtime           process file name
get numeric arg     reset parameters
get sections        skip terminators
When you examine each procedure, do the same as you did for the whole program: first the main logic, then the modes, then the procedures and operators. You will need to backtrack several times in a large program. If a lot of names are declared, prepare a list together with a description of what each name is used for, where it is declared and the places where it is used. A cross-reference program would be really useful, but it is not a simple program to write for Algol 68.
The principal processing is performed by the procedure print file on lines 258-322. First, tab stops are set according to the current value of tabs, then lines is initialised and an initialisation string is output to the printer. If letter quality has been chosen (option -q), a special string is sent to the printer accordingly. Then the logical file end event procedure is set. Each section specified on the command line (or the default section if no sections were specified) is then printed using the procedure do line. Each line is input using get line, whose principal function is to expand tab characters to the required number of spaces (3 unless set by the -t option). Lines are not output until line beg OF ss is reached (1 unless set by the -s option). Notice the code following FROM in the preamble to the inner DO ... OD loop (on lines 313-316), which ensures that the file is reset if the sections to be printed are not ordered (the definition of ordered is in the procedure get sections, lines 381-425).
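To make the resetting logic concrete, here is a C sketch of printing several line ranges from one file, rewinding the file whenever a section starts before the line already reached. It assumes that a section is an inclusive range of line numbers and that “not ordered” means a section begins before the current line; struct section and print_sections are hypothetical names, and lf's own definition of ordered (in get sections) may differ.

```c
#include <stdio.h>

/* Hypothetical section: an inclusive range of line numbers. */
struct section {
    long beg;
    long end;
};

/* Copy the lines of each section in turn from in to out.  If a section
   begins before the line already reached (the sections are not in
   ascending order), rewind the file and count lines from 1 again.
   Returns the number of complete (newline-terminated) lines written. */
long print_sections(FILE *in, FILE *out, const struct section *ss, int nsec)
{
    long line = 1;                        /* number of the next line to read */
    long written = 0;
    int c;

    for (int k = 0; k < nsec; k++) {
        if (ss[k].beg < line) {           /* out of order: start again */
            rewind(in);                   /* also clears the EOF indicator */
            line = 1;
        }
        while (line <= ss[k].end && (c = getc(in)) != EOF) {
            int inside = line >= ss[k].beg;
            if (inside)
                putc(c, out);
            if (c == '\n') {
                if (inside)
                    written++;
                line++;
            }
        }
    }
    return written;
}
```

With sections 4-5 followed by 1-2, the second section forces a rewind; with sections in ascending order the file is read only once.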
As with your list of nested procedures, prepare a list in which the procedures called by each procedure are indented under it. Here is part of the list for lf:
fstat
    linux fstat
help
    exit, newline, put
reset parameters
    lf print
        ODD, print
get mtime
    fstat, linux ctime
get sections
    +:=
        add section
    char in string
    get numeric arg
        char in string
The procedure get line (lines 232-250) and its associated procedures set tabs (lines 220-224) and tab pos (lines 226-227) are worth examining in detail.
The best way to see how they work is to dry-run them. Take a blank sheet of paper and make a vertical list of all the names, both local and global, used by the procedures. Opposite in line, write a piece of text containing tab characters (a piece of indented program, for example). Then work your way through the procedure, marking the value referenced by each name as you complete each step. You should also note the value of each non-name; for example, the loop identifier i. Here is what your list could look like after going 3 times round the outer loop (the inner loop is on lines 241-244):
tabstops    FFTFFTFFTFFTFFTFFTFFTFFT...
line(ln)    T
in line     =>  THEN ch:="A"
op
i
c
Struck-out values have been superseded and υ denotes a space. Dry-running is a very useful, if laborious and time-consuming, method of finding bugs. Note that tab ch is declared in the standard prelude.
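A C version of the same technique may help when you dry-run the Algol 68: a table of booleans marks the tab stops (F F T F F T ... for the default of 3, as in the tabstops row of the dry-run list), and each tab is replaced by spaces up to and including the next stop. The names set_tabs and expand_tabs echo set tabs and get line, but this is an independent sketch; MAX_COLS is an assumed line limit, and lf's procedures may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed maximum width of an expanded line. */
#define MAX_COLS 256

/* tabstops[c] is true when column c (counting from 1) is a tab stop. */
static bool tabstops[MAX_COLS + 1];

/* Make every tabs-th column a tab stop (tabs must be > 0).  For tabs == 3
   columns 1, 2, 3, 4, ... read F F T F F T ... */
void set_tabs(int tabs)
{
    for (int c = 1; c <= MAX_COLS; c++)
        tabstops[c] = (c % tabs == 0);
}

/* Copy in to out (which must hold MAX_COLS + 1 chars), replacing each tab
   by spaces up to and including the next tab stop.  Output is cut off at
   MAX_COLS columns.  Returns the length of the expanded line. */
size_t expand_tabs(const char *in, char *out)
{
    size_t col = 0;                       /* columns filled so far */

    for (; *in != '\0' && col < MAX_COLS; in++) {
        if (*in == '\t') {
            do
                out[col++] = ' ';         /* column col is now filled */
            while (col < MAX_COLS && !tabstops[col]);
        } else {
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return col;
}
```

Dry-running expand_tabs on "\tA" with tabs set to 3 shows col stepping through 1, 2 and 3, stopping at the first T, so A lands in column 4.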
This utility program (lf) is quite short, but we have analysed its working in detail so that you can see how it is done.
The utility lf uses some of the extensions provided by the a68toc compiler, in particular the ALIEN construct, which provides access to procedures compiled by other compilers. In this section we shall look at the get cwd and fstat procedures.
The procedure fstat is on lines 100-105. It depends on a call of the linux fstat procedure, whose second parameter is a name referring to a value of mode STAT. The declaration of STAT is on lines 24-41. If you investigate the file /usr/include/statbuf.h, you will find the C definition of the stat structure. The STAT mode accurately reflects this structure, using LONG or SHORT as appropriate. Briefly, a C unsigned int is equivalent to an Algol 68 BITS. For historical reasons, on 32-bit Linux the C unsigned long int has the same size as an unsigned int, so BITS could have been used for those fields as well. However, because the value is required as an integer (and is stored as a positive integer), it is possible to regard them as having mode INT. Some of the C modes (footnote 13.3) are hidden by further mode declarations (footnote 13.4), but if you hunt for __dev_t you will find that it is an unsigned long long int, which is equivalent to the Algol 68 LONG BITS or, as is used in STAT, LONG INT.
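Because the correspondence between C types and Algol 68 modes depends on the platform, it is worth checking the sizes on your own machine. This small C sketch prints them; the function name show_stat_sizes is invented for illustration. On 32-bit Linux, unsigned int and unsigned long int are both 4 bytes; on 64-bit Linux, unsigned long int is 8 bytes, so a STAT declaration written for one cannot be assumed to match the other.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Print the sizes, in bytes, of the C types discussed above so that you
   can decide which Algol 68 mode (INT/BITS or LONG INT/LONG BITS)
   corresponds to each on your own machine. */
void show_stat_sizes(FILE *out)
{
    fprintf(out, "unsigned int            %zu\n", sizeof(unsigned int));
    fprintf(out, "unsigned long int       %zu\n", sizeof(unsigned long int));
    fprintf(out, "unsigned long long int  %zu\n", sizeof(unsigned long long int));
    fprintf(out, "st_dev (dev_t)          %zu\n", sizeof(dev_t));
    fprintf(out, "st_size (off_t)         %zu\n", sizeof(off_t));
    fprintf(out, "st_mtime (time_t)       %zu\n", sizeof(time_t));
}
```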
Now look at the declaration of linux fstat on lines 85-89. Most of this construction is C source code. The ALIEN construct may be written as
<mode> <identifier> = ALIEN "<symbol>" "<C source code>";
where the angle brackets denote items to be replaced. In the declaration for linux fstat we have
<mode> = PROC(INT,REF STAT)INT
<identifier> = linux fstat
<symbol> = FSTAT
followed by three lines of C source code. It is not my intention to delve into the mysteries of C. If you don't understand that language, consult someone who does. However, the point of the declaration is to map the Algol 68 modes onto the C equivalents.
The C procedure fstat takes two parameters: the first has mode int (equivalent to INT) and the second has mode struct stat*, which is equivalent to REF STAT. A cast in C consists of a mode in parentheses (compare with the Algol 68 cast in section 10.5), so the third line of C code ensures that the second parameter of the Algol 68 procedure linux fstat has the right mode. The A_int_INT(...) construct is a C language macro (footnote 13.5) for a cast which ensures that the yielded C integer is equivalent to the Algol 68 INT. If you want to see what the a68toc compiler generates, look for FSTAT in the file lf.c.
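The glue code in an ALIEN declaration does essentially what this hand-written C wrapper does: it receives an untyped reference (as a REF STAT would arrive) and casts it to the pointer mode that the C fstat expects. The name linux_fstat is chosen only to mirror the Algol 68 identifier; the C actually generated by a68toc (including the A_int_INT macro) will look different. In C the conversion from void * is implicit, so the explicit cast here only mirrors the ALIEN source.

```c
#include <sys/stat.h>

/* Illustrative wrapper in the spirit of the ALIEN glue code: the second
   argument arrives as an untyped reference (as a REF STAT would) and is
   cast to the pointer type that fstat expects.  Returns fstat's result:
   0 on success, -1 on error (with errno set). */
int linux_fstat(int fd, void *statbuf)
{
    return fstat(fd, (struct stat *)statbuf);
}
```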
Reverting to line 102, the field sys file OF f has the correct mode for use as the “file descriptor” for fstat. You should check the manual page of fstat (in section 2 of the Linux Programmer's Manual) for details of its functioning and yield.
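In C, the counterpart of taking sys file OF f is calling fileno on a stream. The following sketch combines that with fstat and ctime (whose Algol 68 counterparts are fstat and linux ctime, the procedures get mtime calls) to produce the kind of modification time printed in the page header. It is a C analogue written for illustration, not lf's code; stream_mtime is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Put the last-modification time of the open stream f into buf in
   ctime format (e.g. "Thu Jan 19 10:00:00 2012") without ctime's
   trailing newline.  fileno(f) plays the part of sys file OF f.
   Returns buf, or NULL if fstat or ctime fails. */
char *stream_mtime(FILE *f, char *buf, size_t size)
{
    struct stat st;

    if (fstat(fileno(f), &st) != 0)
        return NULL;
    const char *s = ctime(&st.st_mtime);
    if (s == NULL)
        return NULL;
    snprintf(buf, size, "%.*s", (int)strcspn(s, "\n"), s);
    return buf;
}
```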
The procedure get cwd is more complicated because it uses several facilities provided by the standard prelude as well as another extension provided by the a68toc compiler.
First, look at the ALIEN declaration of linux getcwd on lines 91-93. The mode VECTOR[]CHAR is similar to the mode []CHAR, but the lower bound is always 1 and is omitted from the generated construct. In fact, a68toc translates this mode into the C equivalent of
STRUCT(REF CHAR data, INT gc, upb)
The gc field is an integer provided for the garbage collector (the run-time memory management system which looks after the heap). The data field is a reference to the actual data (in fact it is a memory address) (footnote 13.6).
The C procedure getcwd requires two parameters: a reference to an area which it can use to return the full path of the current working directory, and an integer which states how big that area is. The C source code in the declaration for linux getcwd contains the C macro A_VC_charptr(buf), which expands into buf.data (equivalent to the Algol 68 expression data OF buf), and the C macro A_INT_int, which converts an Algol 68 INT into a C int (directly equivalent on Linux).
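Assuming the translated C looks roughly like the STRUCT shown above, the call made by linux getcwd can be imitated as follows. The struct name vector_char and the macro VC_DATA (standing in for A_VC_charptr) are hypothetical; a68toc's real definitions live in its own headers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hypothetical C picture of a68toc's VECTOR[]CHAR, following
   STRUCT(REF CHAR data, INT gc, upb): the lower bound is always 1, so
   only the upper bound is kept. */
struct vector_char {
    char *data;   /* address of the characters */
    int gc;       /* field reserved for the garbage collector */
    int upb;      /* upper bound, i.e. the number of characters */
};

/* Stand-in for the macro A_VC_charptr(buf): the address of the data. */
#define VC_DATA(v) ((v).data)

/* Pass the vector's storage and its size to getcwd, as linux getcwd
   does.  Returns the same address as VC_DATA(buf), or NULL on failure. */
char *vector_getcwd(struct vector_char buf)
{
    return getcwd(VC_DATA(buf), (size_t)buf.upb);
}
```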
The yield of linux getcwd is a reference to the area in which the current working directory path has been put. Strictly speaking, this is identical to the first parameter of the C procedure getcwd, but the GNU C compiler complains if it is used as such. To get around this, the author used the cast (void *), which effectively causes the reference to be a reference to an anonymous piece of memory. The Algol 68 equivalent is CPTR, which is defined in the standard prelude as REF BITS.
Now comes the clever bit. Look at line 98. The value of mode CPTR (REF BITS) is converted by the operator CPTRTOCSTR into a value of mode CSTR (declared in the standard prelude as REF STRUCT 16000000 CHAR). Now look at the definition of that operator (on line 95)! BIOP stands for “built-in operator”, and BIOP 99 is the only built-in operator implemented by the a68toc translator. BIOP 99 maps its parameter (of one mode) onto its yield (of another mode): in this case, it effectively acts as a cast from one REF mode to another REF mode. Have a look at the C source code in lf.c if you are interested in the details. Then the value of mode CSTR is converted using the operator CSTRTORVC to a value of mode REF VECTOR[]CHAR, which is dereferenced and then coerced to a value of mode STRING. In fact, the a68toc compiler will silently coerce values of mode REF STRUCT i MODE to mode REF VECTOR[]MODE and thence to REF[]MODE.
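The chain CPTR → CSTR → REF VECTOR[]CHAR can be pictured in C as a pair of conversions that copy no characters: a cast from void * to char * (the job BIOP 99 does here) and the construction of a descriptor whose upper bound is the string's length. Both functions, and the use of strlen to find the upper bound, are assumptions for illustration; the text does not say how CSTRTORVC computes the bound.

```c
#include <string.h>

/* Descriptor as sketched for VECTOR[]CHAR (hypothetical layout). */
struct vector_char {
    char *data;
    int gc;
    int upb;
};

/* Analogue of BIOP 99 as used by CPTRTOCSTR: treat an untyped reference
   as a reference to characters.  Nothing is copied. */
char *cptr_to_cstr(void *p)
{
    return (char *)p;
}

/* Analogue of CSTRTORVC: describe a NUL-terminated string as a vector
   whose upper bound is its length (an assumption of this sketch).  The
   characters are shared with the original string, not copied. */
struct vector_char cstr_to_vector(char *s)
{
    struct vector_char v = { s, 0, (int)strlen(s) };
    return v;
}
```

Because the descriptor shares the characters, its lifetime is tied to the area it describes, which is the same concern as the scope of the anonymous VECTOR[]CHAR in get cwd.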
Notice that you cannot coerce a value of mode REF VECTOR[]MODE to REF FLEX[]MODE. The mode STRING has no flexibility (it is equivalent to []CHAR).
Lastly, note that the parameter of linux getcwd is an anonymous VECTOR[]CHAR whose scope is limited to the scope of get cwd (the Algol 68 procedure).
If you want to examine the other macros used in the translated C source, have a look at the files in the directories /usr/share/a68toc/Linux and /usr/share/a68toc/include.
Sian Mountbatten 2012-01-19