C Program Checker: Lint Overview
S. C. Johnson
Set/Used Information
Lint attempts to detect cases where a variable is used before it is set. This is very difficult to do well;
many algorithms take a good deal of time and space, and still produce messages about perfectly valid pro-
grams. Lint detects local variables (automatic and register storage classes) whose first use appears physi-
cally earlier in the input file than the first assignment to the variable. It assumes that taking the address of a
variable constitutes a ‘‘use,’’ since the actual use may occur at any later time, in a data dependent fashion.
The restriction to the physical appearance of variables in the file makes the algorithm very simple and
quick to implement, since the true flow of control need not be discovered. It does mean that lint can com-
plain about some programs which are legal, but these programs would probably be considered bad on
stylistic grounds (e.g. might contain at least two goto’s). Because static and external variables are initial-
ized to 0, no meaningful information can be discovered about their uses. The algorithm deals correctly,
however, with initialized automatic variables, and variables which are used in the expression which first sets
them.
The set/used information also permits recognition of those local variables which are set and never
used; these form a frequent source of inefficiencies, and may also be symptomatic of bugs.
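The physical-order rule described above can be sketched as a single pass over ‘‘set’’ and ‘‘use’’ events taken in the order in which they appear in the file. This is only an illustration under stated assumptions, not lint's actual code: the event encoding, MAXVARS, and the names scan_set_used, USED_BEFORE_SET, and SET_NEVER_USED are all hypothetical.

```c
#define MAXVARS 16   /* hypothetical limit on local variables per function */

enum ev_kind { EV_SET, EV_USE };   /* taking an address is reported as EV_USE */

struct event {
    int var;             /* index of the local variable, 0..MAXVARS-1 */
    enum ev_kind kind;
};

#define USED_BEFORE_SET 1
#define SET_NEVER_USED  2

/*
 * Walk the events in file order, ignoring the true flow of control.
 * On return, flags[v] holds the complaints for variable v (0 if none).
 */
void scan_set_used(const struct event *ev, int n, int flags[MAXVARS])
{
    int seen_set[MAXVARS] = {0};
    int seen_use[MAXVARS] = {0};
    int i, v;

    for (v = 0; v < MAXVARS; v++)
        flags[v] = 0;

    for (i = 0; i < n; i++) {
        v = ev[i].var;
        if (v < 0 || v >= MAXVARS)
            continue;                      /* ignore out-of-range indices */
        if (ev[i].kind == EV_USE) {
            if (!seen_set[v])              /* use appears before any set */
                flags[v] |= USED_BEFORE_SET;
            seen_use[v] = 1;
        } else {
            seen_set[v] = 1;
        }
    }

    for (v = 0; v < MAXVARS; v++)
        if (seen_set[v] && !seen_use[v])   /* set, but never used */
            flags[v] |= SET_NEVER_USED;
}
```

Because the pass never follows a branch, it is cheap, and it is also why a legal program that jumps backwards with goto can draw a complaint.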
Flow of Control
Lint attempts to detect unreachable portions of the programs which it processes. It will complain
about unlabeled statements immediately following goto, break, continue, or return statements. An
attempt is made to detect loops which can never be left at the bottom, detecting the special cases while( 1 )
and for(;;) as infinite loops. Lint also complains about loops which cannot be entered at the top; some valid
programs may have such loops, but at best they are bad style, at worst bugs.
Lint has an important area of blindness in the flow of control algorithm: it has no way of detecting
functions which are called and never return. Thus, a call to exit may cause unreachable code which lint
does not detect; the most serious effects of this are in the determination of returned function values (see the
next section).
One form of unreachable statement is not usually complained about by lint; a break statement that
cannot be reached causes no message. Programs generated by yacc [Johnson Yacc 1975], and especially lex
[Lesk Lex], may have literally hundreds of unreachable break statements. The −O flag in the C compiler will
often eliminate the resulting object code inefficiency. Thus, these unreached statements are of little impor-
tance, there is typically nothing the user can do about them, and the resulting messages would clutter up the
lint output. If these messages are desired, lint can be invoked with the −b option.
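As a small illustration of the unreachable break statements mentioned above, the following sketch imitates the shape of generated code, in which every case ends in a break whether or not it can be reached. The function name classify and its return values are hypothetical.

```c
/*
 * Each case returns, so the break that follows can never be reached.
 * By the rule described above, this draws no message unless -b is given.
 */
int classify(int c)
{
    switch (c) {
    case 'a':
        return 1;
        break;      /* unreachable break */
    case 'b':
        return 2;
        break;      /* unreachable break */
    default:
        return 0;
    }
}
```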
Function Values
Sometimes functions return values which are never used; sometimes programs incorrectly use func-
tion ‘‘values’’ which have never been returned. Lint addresses this problem in a number of ways.
Type Checking
Lint enforces the type checking rules of C more strictly than the compilers do. The additional check-
ing is in four major areas: across certain binary operators and implied assignments, at the structure selection
operators, between the definition and uses of functions, and in the use of enumerations.
There are a number of operators which have an implied balancing between types of the operands.
The assignment, conditional ( ? : ), and relational operators have this property; the argument of a return
statement, and expressions used in initialization also suffer similar conversions. In these operations, char,
short, int, long, unsigned, float, and double types may be freely intermixed. The types of pointers must
agree exactly, except that arrays of x’s can, of course, be intermixed with pointers to x’s.
The type checking rules also require that, in structure references, the left operand of the -> be a
pointer to structure, the left operand of the . be a structure, and the right operand of these operators be a
member of the structure implied by the left operand. Similar checking is done for references to unions.
Strict rules apply to function argument and return value matching. The types float and double may
be freely matched, as may the types char, short, int, and unsigned. Also, pointers can be matched with
the associated arrays. Aside from this, all actual arguments must agree in type with their declared counter-
parts.
With enumerations, checks are made that enumeration variables or members are not mixed with other
types, or other enumerations, and that the only operations applied are =, initialization, ==, !=, and function
arguments and return values.
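The enumeration rules can be illustrated by a fragment that stays within the permitted operations. This is a sketch only; the types color and shape and the functions is_red and default_color are hypothetical names.

```c
enum color { RED, GREEN, BLUE };
enum shape { CIRCLE, SQUARE };

/* == between a variable and a member of the same enumeration is permitted */
int is_red(enum color c)
{
    return c == RED;
}

/* initialization and use as a return value are permitted */
enum color default_color(void)
{
    enum color c = GREEN;
    return c;
}

/*
 * By the rules above, expressions such as  c + 1,  c < BLUE,  or
 * c == CIRCLE  (a member of a different enumeration) would be questioned.
 */
```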
Type Casts
The type cast feature in C was introduced largely as an aid to producing more portable programs.
Consider the assignment
p=1;
where p is a character pointer. Lint will quite rightly complain. Now, consider the assignment
p = (char *)1 ;
in which a cast has been used to convert the integer to a character pointer. The programmer obviously had
a strong motivation for doing this, and has clearly signaled his intentions. It seems harsh for lint to con-
tinue to complain about this. On the other hand, if this code is moved to another machine, such code
should be looked at carefully. The −c flag controls the printing of comments about casts. When −c is in
effect, casts are treated as though they were assignments subject to complaint; otherwise, all legal casts are
passed without comment, no matter how strange the type mixing seems to be.
Strange Constructions
Several perfectly legal, but somewhat strange, constructions are flagged by lint; the messages hope-
fully encourage better code quality, clearer style, and may even point out bugs. The −h flag is used to
enable these checks. For example, in the statement
*p++ ;
the * does nothing; this provokes the message ‘‘null effect’’ from lint. The program fragment
unsigned x ;
if( x < 0 ) ...
is clearly somewhat strange; the test will never succeed. Similarly, the test
if( x > 0 ) ...
is equivalent to
if( x != 0 )
which may not be the intended action. Lint will say ‘‘degenerate unsigned comparison’’ in these cases. If
one says
if( 1 != 0 ) ....
lint will report ‘‘constant in conditional context’’, since the comparison of 1 with 0 gives a constant result.
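The behavior of the unsigned comparisons above can be checked directly. In this sketch the function names are hypothetical; the comparisons themselves are the ones discussed in the text.

```c
/* x < 0 can never be true for an unsigned operand */
int below_zero(unsigned x)
{
    return x < 0;
}

/* for an unsigned operand, x > 0 ... */
int above_zero(unsigned x)
{
    return x > 0;
}

/* ... gives the same answer as x != 0 */
int nonzero(unsigned x)
{
    return x != 0;
}
```

A modern compiler may also warn about the first function when extra warnings are enabled.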
Another construction detected by lint involves operator precedence. Bugs which arise from misun-
derstandings about the precedence of operators can be accentuated by spacing and formatting, making such
bugs extremely hard to find. For example, the statements
if( x&077 == 0 ) ...
or
x<<2 + 40
probably do not do what was intended. The best solution is to parenthesize such expressions, and lint
encourages this by an appropriate message.
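To make the precedence problem concrete, the sketch below writes each expression twice: once parenthesized the way C actually parses it, and once the way it was probably intended. The function names are hypothetical, and the shift example uses the constant 3 rather than 40 so that the shift count stays within the width of an int on any machine.

```c
/* x&077 == 0  is parsed as  x & (077 == 0),  which is x & 0 */
int mask_test_as_parsed(int x)
{
    return x & (077 == 0);
}

/* what was probably meant: are the low six bits all zero? */
int mask_test_intended(int x)
{
    return (x & 077) == 0;
}

/* x<<2 + 3  is parsed as  x << (2 + 3) */
unsigned shift_as_parsed(unsigned x)
{
    return x << (2 + 3);
}

/* what was probably meant */
unsigned shift_intended(unsigned x)
{
    return (x << 2) + 3;
}
```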
Finally, when the −h flag is in force lint complains about variables which are redeclared in inner
blocks in a way that conflicts with their use in outer blocks. This is legal, but is considered by many
(including the author) to be bad style, usually unnecessary, and frequently a bug.
Ancient History
There are several forms of older syntax which are being officially discouraged. These fall into two
classes, assignment operators and initialization.
The older forms of assignment operators (e.g., =+, =−, . . . ) could cause ambiguous expressions, such
as
a =−1 ;
which could be taken as either
a =− 1 ;
or
a = −1 ;
The situation is especially perplexing if this kind of ambiguity arises as the result of a macro substitution.
The newer, and preferred operators (+=, −=, etc. ) have no such ambiguities. To spur the abandonment of
the older forms, lint complains about these old fashioned operators.
A similar issue arises with initialization. The older language allowed
int x 1 ;
to initialize x to 1. This also caused syntactic difficulties: for example,
int x ( −1 ) ;
looks somewhat like the beginning of a function declaration:
int x ( y ) { . . .
and the compiler must read a fair ways past x in order to be sure what the declaration really is. Again, the
problem is even more perplexing when the initializer involves a macro. The current syntax places an equals
sign between the variable and the initializer:
int x = −1 ;
This is free of any possible syntactic ambiguity.
Pointer Alignment
Certain pointer assignments may be reasonable on some machines, and illegal on others, due entirely
to alignment restrictions. For example, on the PDP-11, it is reasonable to assign integer pointers to double
pointers, since double precision values may begin on any integer boundary. On the Honeywell 6000,
double precision values must begin on even word boundaries; thus, not all such assignments make sense.
Lint tries to detect cases where pointers are assigned to other pointers, and such alignment problems might
arise. The message ‘‘possible pointer alignment problem’’ results from this situation whenever either the
−p or −h flags are in effect.
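One way to avoid the questionable pointer assignment altogether is to copy the bytes into a properly aligned object instead of converting the pointer. This is a different technique from anything lint itself does, offered only as a sketch; the function name load_double is hypothetical.

```c
#include <string.h>

/*
 * Fetch a double from storage that may not be suitably aligned.
 * Copying the bytes avoids dereferencing a converted pointer.
 */
double load_double(const unsigned char *p)
{
    double d;

    memcpy(&d, p, sizeof d);
    return d;
}
```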
Implementation
Lint consists of two programs and a driver. The first program is a version of the Portable C Compiler
[Johnson Ritchie BSTJ Portability Programs System] [Johnson portable compiler 1978], which is the basis of
the IBM 370, Honeywell 6000, and Interdata 8/32 C compilers. This compiler does lexical and syntax
analysis on the input text, constructs and maintains symbol tables, and builds trees for expressions. Instead
of writing an intermediate file which is passed to a code generator, as the other compilers do, lint produces
an intermediate file which consists of lines of ascii text. Each line contains an external variable name, an
encoding of the context in which it was seen (use, definition, declaration, etc.), a type specifier, and a
source file name and line number. The information about variables local to a function or file is collected by
accessing the symbol table, and examining the expression trees.
Comments about local problems are produced as detected. The information about external names is
collected onto an intermediate file. After all the source files and library descriptions have been collected,
the intermediate file is sorted to bring all information collected about a given external name together. The
second, rather small, program then reads the lines from the intermediate file and compares all of the defini-
tions, declarations, and uses for consistency.
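The sort-and-compare step of the second pass can be sketched as follows. This is an assumption-laden illustration, not lint's code: the record layout, the context codes, and the names ext_rec, by_name_then_type, and count_type_conflicts are hypothetical, and types are compared as plain strings for simplicity.

```c
#include <stdlib.h>
#include <string.h>

enum ctx { CTX_DEFINITION, CTX_DECLARATION, CTX_USE };

/* one line of the intermediate file (hypothetical layout) */
struct ext_rec {
    const char *name;      /* external name */
    enum ctx    context;   /* how the name was seen */
    const char *type;      /* type specifier, as text */
    const char *file;      /* source file name */
    int         line;      /* source line number */
};

/* order by name, then by type, so the result is deterministic */
static int by_name_then_type(const void *a, const void *b)
{
    const struct ext_rec *x = a;
    const struct ext_rec *y = b;
    int c = strcmp(x->name, y->name);

    return c != 0 ? c : strcmp(x->type, y->type);
}

/*
 * Sort so that all records for one name are adjacent, then count the
 * places where two neighbouring records for the same name disagree
 * about the type.
 */
int count_type_conflicts(struct ext_rec *recs, size_t n)
{
    size_t i;
    int conflicts = 0;

    qsort(recs, n, sizeof recs[0], by_name_then_type);
    for (i = 1; i < n; i++) {
        if (strcmp(recs[i].name, recs[i - 1].name) == 0 &&
            strcmp(recs[i].type, recs[i - 1].type) != 0)
            conflicts++;
    }
    return conflicts;
}
```

Sorting first means the comparison needs to hold only two adjacent records in memory at a time.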
The driver controls this process, and is also responsible for making the options available to both
passes of lint.
Portability
C on the Honeywell and IBM systems is used, in part, to write system code for the host operating
system. This means that the implementation of C tends to follow local conventions rather than adhere
strictly to UNIX® system conventions. Despite these differences, many C programs have been successfully
moved to GCOS and the various IBM installations with little effort. This section describes some of the dif-
ferences between the implementations, and discusses the lint features which encourage portability.
Uninitialized external variables are treated differently in different implementations of C. Suppose
two files both contain a declaration without initialization, such as
int a ;
outside of any function. The UNIX loader will resolve these declarations, and cause only a single word of
storage to be set aside for a. Under the GCOS and IBM implementations, this is not feasible (for various
stupid reasons!) so each such declaration causes a word of storage to be set aside and called a. When load-
ing or library editing takes place, this causes fatal conflicts which prevent the proper operation of the pro-
gram. If lint is invoked with the −p flag, it will detect such multiple definitions.
A related difficulty comes from the amount of information retained about external names during the
loading process. On the UNIX system, externally known names have seven significant characters, with the
upper/lower case distinction kept. On the IBM systems, there are eight significant characters, but the case
distinction is lost. On GCOS, there are only six characters, of a single case. This leads to situations where
programs run on the UNIX system, but encounter loader problems on the IBM or GCOS systems. Lint −p
causes all external symbols to be mapped to one case and truncated to six characters, providing a worst-
case analysis.
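The worst-case mapping can be sketched as a function that folds a name to one case and truncates it to six characters; two distinct names that map to the same result would be indistinguishable to the most restrictive loader. The names worst_case_name and names_collide are hypothetical, and folding to upper case (rather than lower) is an assumption.

```c
#include <ctype.h>
#include <string.h>

#define WORST_CASE_LEN 6   /* six significant characters, as on GCOS */

/* fold to a single case and truncate to WORST_CASE_LEN characters */
void worst_case_name(const char *name, char out[WORST_CASE_LEN + 1])
{
    int i;

    for (i = 0; i < WORST_CASE_LEN && name[i] != '\0'; i++)
        out[i] = (char)toupper((unsigned char)name[i]);
    out[i] = '\0';
}

/* nonzero if two different names become identical after the mapping */
int names_collide(const char *a, const char *b)
{
    char ca[WORST_CASE_LEN + 1];
    char cb[WORST_CASE_LEN + 1];

    worst_case_name(a, ca);
    worst_case_name(b, cb);
    return strcmp(a, b) != 0 && strcmp(ca, cb) == 0;
}
```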
A number of differences arise in the area of character handling: characters in the UNIX system are
eight bit ascii, while they are eight bit ebcdic on the IBM, and nine bit ascii on GCOS. Moreover, character
strings go from high to low bit positions (‘‘left to right’’) on GCOS and IBM, and low to high (‘‘right to
left’’) on the PDP-11. This means that code attempting to construct strings out of character constants, or
attempting to use characters as indices into arrays, must be looked at with great suspicion. Lint is of little
help here, except to flag multi-character character constants.
Of course, the word sizes are different! This causes less trouble than might be expected, at least
when moving from the UNIX system (16 bit words) to the IBM (32 bits) or GCOS (36 bits). The main prob-
lems are likely to arise in shifting or masking. C now supports a bit-field facility, which can be used to
write much of this code in a reasonably portable way. Frequently, portability of such code can be enhanced
by slight rearrangements in coding style. Many of the incompatibilities seem to have the flavor of writing
x &= 0177700 ;
to clear the low order six bits of x. This suffices on the PDP-11, but fails badly on GCOS and IBM. If the
bit field feature cannot be used, the same effect can be obtained by writing
x &= ~077 ;
which will work on all these machines.
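The difference between the two masks shows up as soon as the word is wider than 16 bits. In this sketch unsigned long stands in for a wider machine word, and the function names are hypothetical: the 16-bit constant also wipes out everything above bit 15, while a mask written as the complement of the low six bits (~077) leaves the high bits alone.

```c
/* mask written with a 16-bit word in mind */
unsigned long clear_low6_16bit_mask(unsigned long x)
{
    return x & 0177700UL;   /* also clears every bit above bit 15 */
}

/* mask written as the complement of the bits to be cleared */
unsigned long clear_low6_portable(unsigned long x)
{
    return x & ~077UL;      /* clears only the low six bits */
}
```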
The right shift operator is arithmetic shift on the PDP-11, and logical shift on most other machines.
To obtain a logical shift on all machines, the left operand can be typed unsigned. Characters are consid-
ered signed integers on the PDP-11, and unsigned on the other machines. This persistence of the sign bit
may be reasonably considered a bug in the PDP-11 hardware which has infiltrated itself into the C lan-
guage. If there were a good way to discover the programs which would be affected, C could be changed; in
any case, lint is no help here.
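Both difficulties can be sidestepped in the source by converting to an unsigned type before the operation. The following is a sketch with hypothetical function names; it assumes a shift count smaller than the width of an int.

```c
/* typing the left operand unsigned yields a logical (zero-filling) shift */
unsigned logical_shift_right(int x, int n)
{
    return (unsigned)x >> n;
}

/*
 * Converting to unsigned char before using a character as an array
 * index gives a non-negative value whether or not char is signed.
 */
int char_as_index(char c)
{
    return (unsigned char)c;
}
```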
The above discussion may have made the problem of portability seem bigger than it in fact is. The
issues involved here are rarely subtle or mysterious, at least to the implementor of the program, although
they can involve some work to straighten out. The most serious bar to the portability of UNIX system utili-
ties has been the inability to mimic essential UNIX system functions on the other systems. The inability to
seek to a random character position in a text file, or to establish a pipe between processes, has involved far
more rewriting and debugging than any of the differences in C compilers. On the other hand, lint has been
very helpful in moving the UNIX operating system and associated utility programs to other machines.
Shutting Lint Up
There are occasions when the programmer is smarter than lint. There may be valid reasons for ‘‘ille-
gal’’ type casts, functions with a variable number of arguments, etc. Moreover, as specified above, the flow
of control information produced by lint often has blind spots, causing occasional spurious messages about
perfectly reasonable programs. Thus, some way of communicating with lint, typically to shut it up, is
desirable.
The form which this mechanism should take is not at all clear. New keywords would require current
and old compilers to recognize these keywords, if only to ignore them. This has both philosophical and
practical problems. New preprocessor syntax suffers from similar problems.
What was finally done was to cause a number of words to be recognized by lint when they were
embedded in comments. This required minimal preprocessor changes; the preprocessor just had to agree to
pass comments through to its output, instead of deleting them as had been previously done. Thus, lint
directives are invisible to the compilers, and the effect on systems with the older preprocessors is merely
that the lint directives don’t work.
The first directive is concerned with flow of control information; if a particular place in the program
cannot be reached, but this is not apparent to lint, this can be asserted by the directive
/* NOTREACHED */
at the appropriate spot in the program. Similarly, if it is desired to turn off strict type checking for the next
expression, the directive
/* NOSTRICT */
can be used; the situation reverts to the previous default after the next expression. The −v flag can be
turned on for one function by the directive
/* ARGSUSED */
Complaints about variable number of arguments in calls to a function can be turned off by the directive
/* VARARGS */
preceding the function definition. In some cases, it is desirable to check the first several arguments, and
leave the later arguments unchecked. This can be done by following the VARARGS keyword immediately
with a digit giving the number of arguments which should be checked; thus,
/* VARARGS2 */
will cause the first two arguments to be checked, the others unchecked. Finally, the directive
/* LINTLIBRARY */
at the head of a file identifies this file as a library declaration file; this topic is worth a section by itself.
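The directives might appear in a program as sketched below. The function names are hypothetical, placing ARGSUSED immediately before the function definition is an assumption, and the variable-argument function is written with the modern <stdarg.h> mechanism, which is not part of the C described in this paper.

```c
#include <stdarg.h>
#include <stdlib.h>

int checked_positive(int n)
{
    if (n > 0)
        return n;
    exit(1);
    /* NOTREACHED */
}

/* ARGSUSED */
int first_of_two(int a, int unused)
{
    return a;               /* the second argument is deliberately ignored */
}

/* VARARGS2 */
int sum_ints(int base, int count, ...)
{
    va_list ap;
    int total = base;

    va_start(ap, count);
    while (count-- > 0)     /* only base and count are fixed arguments */
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

Since the directives are ordinary comments, a compiler that knows nothing about lint accepts this file unchanged.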
Bugs, etc.
Lint was a difficult program to write, partially because it is closely connected with matters of pro-
gramming style, and partially because users usually don’t notice bugs which cause lint to miss errors which
it should have caught. (By contrast, if lint incorrectly complains about something that is correct, the pro-
grammer reports that immediately!)
A number of areas remain to be further developed. The checking of structures and arrays is rather
inadequate; size incompatibilities go unchecked, and no attempt is made to match up structure and union
declarations across files. Some stricter checking of the use of the typedef is clearly desirable, but what
checking is appropriate, and how to carry it out, is still to be determined.
Lint shares the preprocessor with the C compiler. At some point it may be appropriate for a special
version of the preprocessor to be constructed which checks for things such as unused macro definitions,
macro arguments which have side effects which are not expanded at all, or are expanded more than once,
etc.
The central problem with lint is the packaging of the information which it collects. There are many
options which serve only to turn off, or slightly modify, certain features. There are pressures to add even
more of these options.
In conclusion, it appears that the general notion of having two programs is a good one. The compiler
concentrates on quickly and accurately turning the program text into bits which can be run; lint concen-
trates on issues of portability, style, and efficiency. Lint can afford to be wrong, since incorrectness and over-
conservatism are merely annoying, not fatal. The compiler can be fast since it knows that lint will
cover its flanks. Finally, the programmer can concentrate at one stage of the programming process solely
on the algorithms, data structures, and correctness of the program, and then later retrofit, with the aid of
lint, the desirable properties of universality and portability.