The first stage in the development of a new program consists of analysing the problem that the program must solve. Unfortunately, there is no known method or methodology which will solve every kind of problem. However, a particularly good book on problem solving was written by George Pólya (see the Bibliography), and although the book is geared towards mathematical problems, it will help you solve most technical problems.
Problem analysis is not usually taught to beginners at computer programming because, so far as we know, it is mainly an intuitive activity (it is a branch of Heuristics). Learning to analyse a problem with the intention of writing a computer program is largely accomplished by writing simple programs followed by programs of increasing sophistication--this is sometimes called “learning by doing”. When we start analysing actual programs later in the chapter, each such analysis will be preceded by a problem analysis. You will be able to see how the program, as presented, accords with that analysis.
Nevertheless, even though no definitive method can be given, there are guidelines which help you to appreciate and analyse problems suitable for computer solution. In the field of systems analysis, you will find various methodologies (such as SSADM). These are usually geared towards large-scale systems and are designed to prevent systems designers from forgetting details. In the context of program design, knowing the data to be used by the program and the data to be produced by the program is the principal guide to knowing what manipulations the program must perform. Data knowledge specifies the books accessed by the program and usually constitutes a substantial part of the program's documentation.
Once you know the data your program operates on, you can determine the actual manipulations, or calculations, required. At this stage, you should be able to determine which data structures are suitable for the solution of your problem. The data structures in turn lead you to the mode declarations. The kind of data structure also helps to determine the kind of procedures required. Some examples: if your data structures include a queue, then queue procedures will be needed; if you are using multiples (repeated data), then you will almost invariably be using loops. Similarly, if an input book contains structured data, such as an item which is repeated many times, then your program will contain a processing loop. The Jackson programming methodology is a useful way of specifying procedures given the data structures to be manipulated (see the Bibliography).
After you have determined suitable modes and procedures, you need to analyse the problem in a top-down manner. Basically, top-down analysis consists of determining the principal actions needed to perform a given action, then analysing each of the principal actions in the same way. For example, suppose we wished to write a program to copy a book whose identifier is given on the command line. The topmost statement of the problem could be
copy an identified book
The next stage could be
get the book identifier
open the book
establish the output copy book
copy the input book to output
close both books
At this stage, the process “copy the input book to output” will depend on the structure of the input book. If it is text, with lines of differing length, you could use a name of mode REF STRING. If the book contains similar groupings of data, called records, then it would be more appropriate to declare a structured mode and write appropriate input and output procedures:
DO
   get record from input book
   put record to output book
OD
The analysis is continued until each action can be directly coded.
Before you start coding the program (writing the actual Algol 68 source program), you should be aware of various programming strategies besides the different means of manipulating data structures. The first to address is the matter of source program layout.
In the examples given in this book, code has been indented to reflect program structure, but even in this matter, there are choices. For example, some people indent the THEN and ELSE clauses of an IF clause:
IF ...
   THEN ...
   ELSE ...
FI
instead of
IF ...
THEN ...
ELSE ...
FI
Others regard the parts of the IF clause as some kind of bracketing:
IF
   ...
THEN
   ...
ELSE
   ...
FI
Some people write a procedure as:
PROC ...
BEGIN
   ...
END
Others never use BEGIN and END, but only use parentheses.
Another point is whether to put more than one phrase on the same line. And what about blank lines--these usually improve a program's legibility. Whatever you decide, keep to your decision throughout the program (or most of the program) otherwise the format of the code may prove confusing. Of course, you will learn by your mistakes and usually you will change your programming style over the years.
Another matter is whether to group declarations. Unlike many programming languages, Algol 68 allows you to place declarations wherever you wish. This does not mean that you should therefore sprinkle declarations throughout your program, although there is something to be said for declarations being as local as possible. There are also advantages in grouping all your global declarations so that they can be found easily. Generally speaking, it is a good idea to group all global names together (those in the outermost range) and within that grouping, to declare together all names which use the same base mode (for example, group declarations of modes CHAR, []CHAR and STRING). Some of the exercises in this book only declare names when they are immediately followed by related procedures. If your program needs many global names, it makes sense to declare them near the beginning of the program, after mode declarations, so that if subsequent changes are required, you know that all the global name declarations are together and therefore you are unlikely to miss any.
The next consideration is breaking your code into procedures. As you analyse the problem, you will find that some of the processing can be specified in a single line which must be analysed further before it can be directly coded. Such a line is a good indication that the process should be written as a procedure. Even a procedure which is used only once is worth writing if its internal logic involves more than a couple of conditional clauses, or even more than one.
You also have to decide between repeating a procedure in a loop, or placing the loop in the procedure. Deciding the level at which logic should be put in a procedure is largely the product of experience--yours and other people's--another reason for maintaining existing programs.
When you have decided where to use procedures, you should then consider the interface between each procedure and the code that calls it. What parameters should it have? What should it yield? Should you use a united mode for the yield? Try to have as few parameters as possible, but prefer parameters to assigning to names global to the procedure. The design of individual procedures is similar to the design of a complete program.
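As a small illustration of that advice, the sketch below passes its data as a parameter and delivers its result as the yield, so the procedure touches no global names. The procedure mean and the multiple readings are invented for this example, and the program is written as a plain enclosed clause; add whatever program wrapper your compiler requires.

```algol68
BEGIN
   # Data comes in as a parameter and the result goes out as the yield. #
   PROC mean = ([]REAL values)REAL:
      IF UPB values < LWB values
      THEN 0.0    # an empty multiple has nothing to average #
      ELSE REAL sum := 0.0;
           FOR i FROM LWB values TO UPB values
           DO sum +:= values[i] OD;
           sum / (UPB values - LWB values + 1)
      FI;

   []REAL readings = (1.5, 2.5, 5.0);
   print((fixed(mean(readings), 0, 2), newline))
END
```

Because mean depends only on its parameter, it can be tested on its own and reused with any multiple of reals.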
When you are coding a procedure, be especially careful with compound Boolean formulæ. From experience, this is where most mistakes arise. If you are writing a procedure which manipulates a linked list, draw a diagram of what you are trying to do. That is much easier than trying to picture the structures in your head.
Problems can arise when dealing with money in computer programs because the value stored must be exact. For this reason, it is usually argued that only integers should be used. In fact, real numbers can be used provided that the precision of the mantissa is not exceeded. Real numbers are stored in two parts: the mantissa, which contains the significant digits of the value, and the exponent, which gives the power of the radix by which the mantissa is multiplied. For example, in decimal notation, the number 3.14159×10⁻⁴³ has 3.14159 as its mantissa and -43 as its exponent. Because real numbers are stored in binary (radix 2), the mantissa is stored as a value in the range 1 ≤ value < 2 and the exponent, a power of 2, is adjusted appropriately.
There are a number of identifiers declared in the standard prelude, known as environment enquiries, which serve to determine the range and precision of real numbers. The real precision is the number of bits used to store the mantissa, while the value max exp real is the maximum exponent which can be stored for a binary mantissa (not the number of bits, although it is a guide to that number). The real width and exp width say how many decimal digits can be written for the mantissa and the exponent. The values max real and min real are the maximum and minimum real numbers which can be stored in the computer. All these values are specified by the IEEE 754-1985 standard on “Binary Floating-Point Arithmetic” which is implemented by most microprocessors today.
The value of real width is 15, meaning that 15 decimal digits can be stored accurately. Leaving a margin of safety, we can say that an integer with 14 digits can be stored accurately, so that the maximum amount is 99,999,999,999,999 units. If the unit of currency is divided into smaller units, such as the sterling pound into pence, or the dollar into cents, then the monetary value should be stored in the smaller unit unless it is known that the smaller unit is not required. Thus the greatest sterling amount that can be handled would appear to be £999,999,999,999.99.
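The reason for storing the smaller unit can be seen in a short experiment. This is only a sketch: it assumes that REAL is an IEEE 754 double, and the names pounds and pence are invented for the example.

```algol68
BEGIN
   # Pounds with a fractional part: 0.10 and 0.20 have no exact binary form. #
   REAL pounds = 0.10 + 0.20;
   print((pounds = 0.30, newline));

   # Whole pence: integers of this size are held exactly in the mantissa. #
   REAL pence = 10.0 + 20.0;
   print((pence = 30.0, newline))
END
```

On such an implementation the first comparison should print F, because the fractions are only approximated in binary, while the second should print T.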
However, Algol 68 allows arithmetic values to be stored to a lesser or greater precision. The modes INT, REAL, COMPL and BITS can be preceded by any number of SHORTs or LONGs (but not both). Thus
LONG LONG LONG REAL r;
is a valid declaration for a name which can refer to an exceptionally precise real. When declaring identifiers of other precisions, denotations of the required precision can be obtained by using a cast with the standard denotation of the value as in
LONG REAL lr = LONG REAL(1);
One alternative is to use LONG with the denotation:
LONG REAL lr = LONG 1.0;
Another is to use the LENG operator, which converts a value of mode INT or REAL to a value of the next longer precision; the SHORTEN operator converts in the opposite direction:
LONG REAL lr = LENG 1.0;
SHORT SHORT INT ssi = SHORTEN SHORTEN 3;
All the arithmetic operators are valid for all the LONG and SHORT modes. Although you can write as many LONGs or SHORTs as you like, any implementation of Algol 68 will provide only a limited number.
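The number of different precisions available is given by some identifiers in the standard prelude called environment enquiries. They are int lengths and int shorths for integers, real lengths and real shorths for reals, and bits lengths and bits shorths for the mode BITS.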
The values for complex numbers are the same as those for reals. For integers, where int lengths is greater than 1, long max int and so on are also declared, and similarly for short max int. If int lengths is 1, then only the mode INT is available.
In this implementation, int lengths = 2 and int shorths = 3.
Thus it is meaningful to write
LONG INT long int:=long max int;
INT int:=max int;
SHORT INT sh int:=short max int;
SHORT SHORT INT sh sh int:= short short max int;
The same applies to the mode BITS. Try writing a program which prints out the values of the environment enquiries mentioned in this section. The transput procedures get, put, get bin and put bin all handle the available LONG and SHORT modes.
Although you can still write
LONG LONG INT lli=LONG LONG 3;
the actual value created may be no different from a LONG INT value, depending on the value of int lengths. Note that you cannot transput a value which is not covered by the available lengths/shorths. Use LENG or SHORTEN before trying to transput.
For monetary values, LONG INT is available with the value of long max int being 9,223,372,036,854,775,807, which should be big enough for most amounts.
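A minimal sketch of this approach follows, assuming that LONG INT is available and that amounts are held as whole pence. The names price, postage and total, and the amounts themselves, are invented for the example.

```algol68
BEGIN
   # Amounts are held as whole pence, so addition is exact. #
   LONG INT price = LONG 1999;      # 19.99 pounds #
   LONG INT postage = LONG 350;     # 3.50 pounds #
   LONG INT total = price + postage;

   # Split into pounds and pence only when the amount is displayed. #
   LONG INT pounds = total OVER LONG 100;
   LONG INT pence = total MOD LONG 100;
   print((whole(pounds, 0), "."));
   IF pence < LONG 10 THEN print("0") FI;   # keep two digits of pence #
   print((whole(pence, 0), newline))
END
```

With these figures the program should print 23.49. Keeping the conversion to pounds and pence at the point of output means that every intermediate calculation stays in integers.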
There are two well-known rules about optimisation:
1. Don't do it.
2. Don't do it yet.
However, often there is a great temptation to optimise code, particularly if two procedures are very similar. Using identity declarations is a good form of optimisation because not only do they save some writing, they also lead to more efficient code. However, you should avoid procedure optimisation like the plague because it usually leads to more complicated or obscure code. A good indicator of bad optimisation is the necessity of extra conditional clauses. In general, optimisation is never a primary consideration: you might save a few milliseconds of computer time at the expense of a few hours of programmer time.
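The point about identity declarations can be sketched as follows. The multiple grid and the names row, col and cell are invented for the example; whether the saving is measurable depends on your compiler.

```algol68
BEGIN
   [1:3, 1:3]INT grid;
   FOR i TO 3 DO FOR j TO 3 DO grid[i, j] := 0 OD OD;

   INT row = 2, col = 3;

   # Without the identity declaration, grid[row, col] would be written,
     and possibly elaborated, once for every use below. #
   REF INT cell = grid[row, col];   # the slice is elaborated once #
   cell +:= 5;
   cell *:= 2;
   print((whole(cell, 0), newline))
END
```

The program should print 10. The identity declaration shortens the code without adding any conditional clauses, which is what distinguishes it from the kind of optimisation to avoid.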
When writing a program, there is a strong tendency to write hundreds of lines of code and then test it all at once. Resist it. The actual writing of a program rarely occupies more than 30% of the whole development time. If you write your overall logic, test it and it works, you will progress much faster than if you had written the whole program. Once your overall logic works, you can code constituent procedures, gradually refining your test data (see below) so that you are sure your program works at each stage. By the time you complete the writing of your program, most of it should already be working. You can then test it thoroughly. The added advantage of step-wise testing is that you can be sure of exercising more of your code. Your test data will also be simpler.
The idea behind devising test data is not just to give your program correct data to see whether it will produce the desired results. Almost every program is designed to deal with exception conditions. For example, the lf program has to be able to cope with blank lines (usually, zero-length lines), so the test data should contain not only a single blank line, but also two consecutive blank lines. It also has to be able to cope with extra-long lines, so the test data should contain at least one of those. Programs which check input data for validity need to be tested extensively with erroneous data.
It is particularly important that you test your programs with data designed to exercise boundary conditions. For example, suppose the creation of an output book fails due to a full hard disk. Have you tested it, and does your program terminate sensibly with a meaningful error message? You could try testing your program with the output book being created on a floppy disk which is full.
Sometimes a program will fault with a run-time error such as
Run time fault (aborting): Subscript out of bounds
or errors associated with slicing or trimming multiples. A good way of discovering what has gone wrong is to write a monitor procedure on the lines of
PROC monitor=(INT a, []UNION(SIMPLOUT, PROC(REF FILE)VOID) r)VOID:
BEGIN
   print(("*** ",whole(a,0)));
   print(r)
END
and then call monitor with an identifying number and string at various points in the program. For example, if you think a multiple subscript is suspect, you could write
monitor(20,("Subscript=",whole(subscript,0)))
By placing monitors at judicious points, you can follow the action of your program. This can be particularly useful for a program that loops unexpectedly: monitors will tell you what has gone wrong. If you need to collect a large amount of monitor output, it is best to send the output to a book. The disadvantage of this is that the operating system does not register a book as having a size until it has been closed after being created. This means that if your program creates a monitoring book, writes a large amount of data to it and fails before the book is closed, you will not be able to read any of the contents because, according to most operating systems, there will not be any contents. A way round this problem is to open the book whenever you want to write to it, move the writing position to the end of the book, write your data and then close the book. This will ensure that the book contains all the executed monitors (unless, of course, it is a monitor which has caused the program to fail!). The procedure debug given in section 9.9 will do this.
An alternative method of tracing the action of a program at run-time is to use a source-level debugger. The DDD program can help you debug the C source program produced by the a68toc compiler, but unless you understand the C programming language and the output of the a68toc compiler, you will not find it useful. Monitors, although an old-fashioned solution to program debugging, are still the best means of gathering data about program execution.
Another proven method of debugging (the process of removing bugs) is dry-running. This involves acting as though you are the computer and executing a small portion of program accordingly. An example will be given in the analysis of the lf program later.
Sometimes, no matter what you do, it just seems impossible to find out what has gone wrong. There are three ploys you can try. The first, and easiest, is to imagine that you are explaining your program to a friend. The second is to actually explain it to a friend! This finds most errors. Finally, if all else fails, contact the author.
You can trust the compiler to find grammatical errors in your program if any are there. The compiler will not display an error message for some weird, but legal, construction. If your program is syntactically correct (that is, it is legal according to the rules of the language), then it will parse correctly.
When compiling a program of more than a hundred lines, say, you can use the parsing option (-check), which will more than double the speed of compilation. When your program parses without error, it is then worth doing a straight compilation (see the online documentation for program mm in the a68toc compilation system).
A definitive list of error messages can be found in the file
algol68toc-1.12/src/message.a68
You will find that most of the messages are easy to understand. Occasionally, you will get a message which seems to make no sense at all. This is usually because the actual error occurs much earlier in your program. By the time the compiler has discovered something wrong, it may well have compiled (or tried to compile) several hundred lines of code. A typical error of this sort is starting a comment and not finishing it, especially if you start the comment with an opening brace ({), which gives rise to the following error message:
ERROR (112) end of file inside comment or pragmat
If you start a comment with a sharp (#) and forget to finish it likewise, the next time a sharp appears at the beginning of another comment, the compiler will announce all sorts of weird errors.
Another kind of troublesome error is to insert an extra closing parenthesis or END. This can produce lots of spurious errors. For example:
ERROR (118) FI expected here (at character 48)
ERROR (203) ELSE not expected here (at character 4)
ERROR (140) BOOL, INT or UNION required here, not VOID
ERROR (116) brackets mismatch (at character 2)
ERROR (159) elements of in-parts must be units
ERROR (117) FINISH expected here (at character 3)
Omitting a semicolon, or inadvertently inserting one, will also cause the appearance of curious error messages. Messages about UNIONs usually mean that you should use a cast to ensure that the compiler knows which mode you mean. If, for example, you have a procedure which expects a multiple of mode
[]UNION(STRING,[]INT)
and you present a parameter like
((1,2),(4,2),(0,4))
then the compiler will not know whether the display is a row-display or a structure-display. Either you should precede it with a suitable mode, or modify your procedure to take a single []INT and loop through it in twos. Having to modify your program because the compiler does not like what you have written is rare, however.
Sometimes your program will fail at the time of elaboration or “run-time” due to arithmetic overflow. If, during a calculation, an intermediate result exceeds the capacity of an INT, no indication will be given other than erroneous results.
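Since integer overflow goes unreported, one defensive approach is to test the operands before adding them. The following is a minimal sketch and not part of the standard prelude: the procedure add overflows is hypothetical, and it takes -max int as the lower limit, which is slightly conservative on two's-complement machines.

```algol68
BEGIN
   # Yields TRUE if a + b would fall outside -max int .. max int. #
   # The tests are arranged so that they cannot themselves overflow. #
   PROC add overflows = (INT a, b)BOOL:
      IF   b > 0 THEN a > max int - b
      ELIF b < 0 THEN a < -max int - b
      ELSE FALSE
      FI;

   INT x = max int - 1, y = 5;
   IF add overflows(x, y)
   THEN print(("sum would overflow", newline))
   ELSE print((whole(x + y, 0), newline))
   FI
END
```

Here the sum is too large for an INT, so the program should report the overflow instead of printing an erroneous result. A similar test can be written for multiplication using OVER.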
Overflow of REAL numbers can be detected by the floating-point unit. The standard prelude contains the value fpu cw algol 68 round of mode SHORT BITS and the procedure
PROC set fpu cw = (SHORT BITS cw)VOID:
The small test program testov (to be found with the a68toc compilation system documentation) illustrates testing for overflow both with integers and real numbers.
The most tedious aspect of writing a program is documenting it. Even if you describe what the program is going to do before you write it, but after you have designed it, documentation is not usually a vitally interesting task. Large programming teams often have the services of a technical writer whose job it is to ensure that all program documentation is completed.
Existing programs are usually documented and there is no doubt that the best way of learning to document a program is to see how others have done it. There are several documentation standards in use, although most large companies have their own. Generally speaking, the documentation for a program should contain at least the following
but not necessarily in the order given above. The aim of program documentation is to make it easy to amend the program, or to use it for a subsequent rewrite.
Lastly, it is worthwhile saying “don't be rigid in program design”. If, as you reach the more detailed stages of designing your program, you discover that you have made a mistake in the high-level design, be willing to backtrack and revise it. Design faults are usually attributable to faulty analysis of the problem.
Sian Mountbatten 2012-01-19