What Are "Tentative" Symbols?

Ali Bahrami — Friday September 22, 2006

Surfing with the Linker-Aliens

In the Linker and Libraries Guide, you will encounter discussion of tentative symbols. Based on the name, we might expect that such a symbol is missing something, but what? And why does the linker have to treat them as a special case?

A tentative symbol is a symbol used to track a global variable when we don't yet know its size or initial value. In other words, it is a symbol for which we have not yet assigned a storage address. Tentative symbols are also known as "common block" symbols, because they have their origins in the implementation of Fortran COMMON blocks. They are historical baggage: something that needs to work for compatibility with the past, but also something to avoid in new code.

Consider the following two C declarations, made at outer file scope:

        int foo;
        int foo = 0;
Superficially, these both appear to declare a global variable named foo with an initial value of 0. However, the first definition is tentative — it will have a value of 0 only if some other file doesn't explicitly give it a different value. The outcome depends on what else we link this file against.

To get a better handle on this, let's create two separate C files (t1.c, and t2.c) and experiment:

t1.c
        #include <stdio.h>

        #ifdef TENTATIVE_FOO
        int foo;
        #else
        int foo = 0;
        #endif

        int
        main(int argc, char *argv[])
        {
                printf("FOO: %d\n", foo);
                return (0);
        }
t2.c
        int foo = 12;

First, we compile and link t1.c by itself, using both forms of declaration for variable foo:

        % cc -DTENTATIVE_FOO t1.c; ./a.out
        FOO: 0
        % cc t1.c; ./a.out
        FOO: 0

As expected, they give identical results. Now, let's add t2.c to the mix and see what happens:

        % cc -DTENTATIVE_FOO t1.c t2.c; ./a.out
        FOO: 12
        % cc t1.c t2.c; ./a.out
        ld: fatal: symbol `foo' is multiply-defined:
                (file t1.o type=OBJT; file t2.o type=OBJT);
        ld: fatal: File processing errors. No output written to a.out
        ./a.out: No such file or directory
As you can see, the two different ways of declaring foo are not 100% equivalent. The tentative declaration of foo in t1.c took on the value provided by the declaration in t2.c. In contrast, the linker was unwilling to merge the two non-tentative definitions of foo that had different values, and instead issued a fatal link error.

Normal C rules say that a variable at file scope without an explicit value is assigned an initial value of 0. However, the existence of other global variables with the same name can change this. The C compiler is only able to see the code in the single file it is compiling, and cannot know how to handle this case. So, it marks it as tentative by giving the symbol a type of STT_COMMON, and leaves it for the linker to figure out. The linker is in a position to match up all of these symbols and merge them into a single instance. The linker has no insight into programmer intent though, and it cannot protect you from doing this by accident. The result usually works, but is fragile.

The other declaration form (with a value) causes a non-tentative symbol to be created (STT_OBJECT). In this case, the linker ensures that all the declarations agree. This is the right behavior if you care about robust and scalable code.
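The merging rules seen in the experiment above can be modeled in a few lines of C. This is only a toy sketch, not how ld is actually implemented: the names sym_kind, struct sym, and resolve() are hypothetical, and a real linker tracks far more state (binding, visibility, alignment, and so on).

```c
#include <stddef.h>

/* Hypothetical model of a global symbol as a linker might see it. */
enum sym_kind { SYM_TENTATIVE, SYM_DEFINED };

struct sym {
        enum sym_kind kind;     /* "int foo;" vs. "int foo = 12;" */
        size_t size;            /* size of the object in bytes */
        long value;             /* initial value, used only if SYM_DEFINED */
};

/*
 * Merge two global symbols that share a name into *out.
 * Returns 0 on success, or -1 for a multiply-defined error.
 */
int
resolve(const struct sym *a, const struct sym *b, struct sym *out)
{
        if (a->kind == SYM_DEFINED && b->kind == SYM_DEFINED)
                return (-1);    /* two real definitions: fatal */
        if (a->kind == SYM_DEFINED) {
                *out = *a;      /* a definition overrides a tentative */
                return (0);
        }
        if (b->kind == SYM_DEFINED) {
                *out = *b;
                return (0);
        }
        /* Both tentative: keep the larger one, zero-filled. */
        *out = (a->size >= b->size) ? *a : *b;
        out->value = 0;
        return (0);
}
```

In this model, a tentative foo combined with a defined foo of 12 resolves to 12, while two defined symbols produce the error, mirroring the two link attempts with t1.c and t2.c.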

It is worth noting that you will never see a tentative symbol with local scope. It can only happen to global symbols, because global symbols in different files are the only way you can get this form of aliasing to occur.

History

Tentative symbols are bad software engineering. A declaration in one file should not be able to alter one in another file. The need for them dates from the early days of the Fortran language. In Fortran, you can declare a common block in more than one file, with each file independently specifying the number, types, and sizes of the variables. The linker then takes all of these blocks, allocates enough space to satisfy the largest one, and makes all of them point at that space. This is a very crude form of a union (variant), and is therefore a very useful (and dangerous) Fortran technique.
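For readers who think in C rather than Fortran, the effect is roughly what an explicit union gives you. The sketch below is an analogy only: the type and function names (view_a, view_b, common_block, set_count, first_value) are hypothetical, and in real common block usage the overlay is done by the linker across files rather than declared in one place.

```c
/* Hypothetical layouts: two files describing the same block differently. */
struct view_a { int count; int flags; };
struct view_b { int values[8]; };

/* One storage area, large enough to satisfy the largest view. */
union common_block {
        struct view_a a;
        struct view_b b;
};

union common_block blk;         /* zero-initialized at file scope */

/* A write through one view ... */
void
set_count(int n)
{
        blk.a.count = n;
}

/* ... is visible through the other, because they share storage. */
int
first_value(void)
{
        return (blk.b.values[0]);
}
```

The danger is the same as with common blocks: nothing stops one view from scribbling over data that the other view interprets differently.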

Sadly, it didn't stop there. We still sometimes find this practice in C code. Two files will both declare:

        int foo;
and then expect that they are both referring to a single global variable, with an initial value of 0. This is not necessary. The proper solution has existed for decades. The safe way to do the above is to have exactly one declaration for the global variable in a single file. The other files that need to access it use the "extern" keyword to let the compiler know what is going on. The statement
        extern int foo;
is a reference, not a declaration, and it has a single unambiguous interpretation.
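A minimal sketch of that arrangement follows, collapsed into a single listing for brevity. The file names in the comments (foo.h, foo.c, user.c) and the helper next_foo() are hypothetical; in practice each part would live in its own file.

```c
/* foo.h (hypothetical): a reference only, no storage is allocated. */
extern int foo;

/* foo.c (hypothetical): the single definition, with an explicit value. */
int foo = 0;

/* user.c (hypothetical): sees foo through the extern line from foo.h. */
int
next_foo(void)
{
        return (++foo);
}
```

Because foo is given an explicit value in exactly one place, any second definition elsewhere shows up as a multiply-defined link error instead of being silently merged.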

Moral: Don't Do That!

Don't use common block binding in your code. It was a bad idea 40 years ago, and it hasn't improved with age. The necessity of backward compatibility is such that compilers and linkers must support common block binding. We are stuck with it, but we don't have to use it.

You should always try to minimize or eliminate global variables. However, when you do use them:

Published Elsewhere

https://blogs.sun.com/ali/entry/what_are_tentative_symbols/
https://blogs.oracle.com/ali/entry/what_are_tentative_symbols/
https://blogs.oracle.com/ali/what-are-tentative-symbols/
