Using Stub Objects

Ali Bahrami — Friday November 11, 2011

Surfing with the Linker-Aliens

Having told the long and winding tale of where stub objects came from and how we use them to build Solaris, I'd like to focus now on the the nuts and bolts of building and using them. The following new features were added to the Solaris link-editor (ld) to support the production and use of stub objects:

-z stub

This new command line option informs ld that it is to build a stub object rather than a normal object. In this mode, it accepts the same command line arguments as usual, but will quietly ignore any objects and sharable object dependencies.

STUB_OBJECT Mapfile Directive

In order to build a stub version of an object, its mapfile must specify the STUB_OBJECT directive. When producing a non-stub object, the presence of STUB_OBJECT causes the link-editor to perform extra validation to ensure that the stub and non-stub objects will be compatible.

ASSERT Mapfile Directive

All data symbols exported from the object must have an ASSERT symbol directive in the mapfile that declares them as data and supplies the size, binding, bss attributes, and symbol aliasing details. When building the stub objects, the information in these ASSERT directives is used to create the data symbols. When building the real object, these ASSERT directives will ensure that the real object matches the linking interface presented by the stub.
Although ASSERT was added to the link-editor in order to support stub objects, they are a general purpose feature that can be used independently of stub objects. For instance you might choose to use an ASSERT directive if you have a symbol that must have a specific address in order for the object to operate properly and you want to automatically ensure that this will always be the case.

The material presented here is derived from a document I originally wrote during the development effort, which had the dual goals of providing supplemental materials for the stub object PSARC case, and as a set of edits that were eventually applied to the Oracle Solaris Linker and Libraries Manual (LLM). The Solaris 11 LLM contains this information in a more polished form.

Stub Objects

A stub object is a shared object, built entirely from mapfiles, that supplies the same linking interface as the real object, while containing no code or data. Stub objects cannot be used at runtime. However, an application can be built against a stub object, where the stub object provides the real object name to be used at runtime, and then use the real object at runtime.

When building a stub object, the link-editor ignores any object or library files specified on the command line, and these files need not exist in order to build a stub. Since the compilation step can be omitted, and because the link-editor has relatively little work to do, stub objects can be built very quickly.

Stub objects can be used to solve a variety of build problems:

Speed

Modern machines, using a version of make with the ability to parallelize operations, are capable of compiling and linking many objects simultaneously, and doing so offers significant speedups. However, it is typical that a given object will depend on other objects, and that there will be a core set of objects that nearly everything else depends on. It is necessary to impose an ordering that builds each object before any other object that requires it. This ordering creates bottlenecks that reduce the amount of parallelization that is possible and limits the overall speed at which the code can be built.
Complexity/Correctness

In a large body of code, there can be a large number of dependencies between the various objects. The makefiles or other build descriptions for these objects can become very complex and difficult to understand or maintain. The dependencies can change as the system evolves. This can cause a given set of makefiles to become slightly incorrect over time, leading to race conditions and mysterious rare build failures.
Dependency Cycles

It might be desirable to organize code as cooperating shared objects, each of which draw on the resources provided by the other. Such cycles cannot be supported in an environment where objects must be built before the objects that use them, even though the runtime linker is fully capable of loading and using such objects if they could be built.

Stub shared objects offer an alternative method for building code that sidesteps the above issues. Stub objects can be quickly built for all the shared objects produced by the build. Then, all the real shared objects and executables can be built in parallel, in any order, using the stub objects to stand in for the real objects at link-time. Afterwards, the executables and real shared objects are kept, and the stub shared objects are discarded.

Stub objects are built from a mapfile, which must satisfy the following requirements.

The mapfile must specify the STUB_OBJECT directive. This directive informs the link-editor that the object can be built as a stub object, and as such causes the link-editor to perform validation and sanity checking intended to guarantee that an object and its stub will always provide identical linking interfaces.
All function and data symbols that make up the external interface to the object must be explicitly listed in the mapfile.
The mapfile must use symbol scope reduction ('*'), to remove any symbols not explicitly listed from the external interface.
All global data exported from the object must have an ASSERT symbol attribute in the mapfile to specify the symbol type, size, and bss attributes. In the case where there are multiple symbols that reference the same data, the ASSERT for one of these symbols must specify the TYPE and SIZE attributes, while the others must use the ALIAS attribute to reference this primary symbol.

Given such a mapfile, the stub and real versions of the shared object can be built using the same command line for each, adding the '-z stub' option to the link for the stub object, and omiting the option from the link for the real object.

To demonstrate these ideas, the following code implements a shared object named idx5, which exports data from a 5 element array of integers, with each element initialized to contain its zero-based array index. This data is available as a global array, via an alternative alias data symbol with weak binding, and via a functional interface.

% cat idx5.c
int _idx5[5] = { 0, 1, 2, 3, 4 };
#pragma weak idx5 = _idx5

int
idx5_func(int index)
{
        if ((index < 0) || (index > 4))
                return (-1);
        return (_idx5[index]);
}

A mapfile is required to describe the interface provided by this shared object.

% cat mapfile
$mapfile_version 2
STUB_OBJECT;
SYMBOL_SCOPE {
        _idx5   {
                        ASSERT { TYPE=data; SIZE=4[5] };
                };
        idx5    {
                        ASSERT { BINDING=weak; ALIAS=_idx5 };
                };
        idx5_func;
    local:
        *;
};

The following main program is used to print all the index values available from the idx5 shared object.

% cat main.c
#include <stdio.h>

extern int      _idx5[5], idx5[5], idx5_func(int);

int
main(int argc, char **argv)
{
        int     i;
        for (i = 0; i < 5; i++)
                (void) printf("[%d] %d %d %d\n",
                    i, _idx5[i], idx5[i], idx5_func(i));
        return (0);
}

The following commands create a stub version of this shared object in a subdirectory named stublib. elfdump is used to verify that the resulting object is a stub. The command used to build the stub differs from that of the real object only in the addition of the -z stub option, and the use of a different output file name. This demonstrates the ease with which stub generation can be added to an existing makefile.

% cc -Kpic -G -M mapfile -h libidx5.so.1 idx5.c -o stublib/libidx5.so.1 -zstub
% ln -s libidx5.so.1 stublib/libidx5.so
% elfdump -d stublib/libidx5.so | grep STUB
      [11]  FLAGS_1           0x4000000           [ STUB ]

The main program can now be built, using the stub object to stand in for the real shared object, and setting a runpath that will find the real object at runtime. However, as we have not yet built the real object, this program cannot yet be run. Attempts to cause the system to load the stub object are rejected, as the runtime linker knows that stub objects lack the actual code and data found in the real object, and cannot execute.

% cc main.c -L stublib -R '$ORIGIN/lib' -lidx5 -lc
% ./a.out
ld.so.1: a.out: fatal: libidx5.so.1: open failed: No such file or directory
Killed
% LD_PRELOAD=stublib/libidx5.so.1 ./a.out
ld.so.1: a.out: fatal: stublib/libidx5.so.1: stub shared object cannot be used at runtime
Killed

We build the real object using the same command as we used to build the stub, omitting the -z stub option, and writing the results to a different file.

% cc -Kpic -G -M mapfile -h libidx5.so.1 idx5.c -o lib/libidx5.so.1

Once the real object has been built in the lib subdirectory, the program can be run.

% ./a.out
[0] 0 0 0
[1] 1 1 1
[2] 2 2 2
[3] 3 3 3
[4] 4 4 4

Mapfile Changes

The version 2 mapfile syntax was extended in a number of places to accommodate stub objects.

Conditional Input

The version 2 mapfile syntax has the ability conditionalize mapfile input using the $if control directive. As you might imagine, these directives are used frequently with ASSERT directives for data, because a given data symbol will frequently have a different size in 32 or 64-bit code, or on differing hardware such as x86 versus sparc.

The link-editor maintains an internal table of names that can be used in the logical expressions evaluated by $if and $elif. At startup, this table is initialized with items that describe the class of object (_ELF32 or _ELF64) and the type of the target machine (_sparc or _x86). We found that there were a small number of cases in the Solaris code base in which we needed to know what kind of object we were producing, so we added the following new predefined items in order to address that need:

Name Meaning

... ...

_ET_DYN shared object

_ET_EXEC executable object

_ET_REL relocatable object

... ...

Name	Meaning
...	...
_ET_DYN	shared object
_ET_EXEC	executable object
_ET_REL	relocatable object
...	...

STUB_OBJECT Directive

The new STUB_OBJECT directive informs the link-editor that the object described by the mapfile can be built as a stub object.

STUB_OBJECT;

A stub shared object is built entirely from the information in the mapfiles supplied on the command line. When the -z stub option is specified to build a stub object, the presence of the STUB_OBJECT directive in a mapfile is required, and the link-editor uses the information in symbol ASSERT attributes to create global symbols that match those of the real object.

When the real object is built, the presence of STUB_OBJECT causes the link-editor to verify that the mapfiles accurately describe the real object interface, and that a stub object built from them will provide the same linking interface as the real object it represents.

All function and data symbols that make up the external interface to the object must be explicitly listed in the mapfile.
The mapfile must use symbol scope reduction ('*'), to remove any symbols not explicitly listed from the external interface.
All global data in the object is required to have an ASSERT attribute that specifies the symbol type and size.
If the ASSERT BIND attribute is not present, the link-editor provides a default assertion that the symbol must be GLOBAL.
If the ASSERT SH_ATTR attribute is not present, or does not specify that the section is one of BITS or NOBITS, the link-editor provides a default assertion that the associated section is BITS.
All data symbols that describe the same address and size are required to have ASSERT ALIAS attributes specified in the mapfile. If aliased symbols are discovered that do not have an ASSERT ALIAS specified, the link fails and no object is produced.

These rules ensure that the mapfiles contain a description of the real shared object's linking interface that is sufficient to produce a stub object with a completely compatible linking interface.

SYMBOL_SCOPE/SYMBOL_VERSION ASSERT Attribute

The SYMBOL_SCOPE and SYMBOL_VERSION mapfile directives were extended with a symbol attribute named ASSERT. The syntax for the ASSERT attribute is as follows:

ASSERT {
	ALIAS = symbol_name;
	BINDING = symbol_binding;
	TYPE = symbol_type;
	SH_ATTR = section_attributes;

	SIZE = size_value;
	SIZE = size_value[count];
};

The ASSERT attribute is used to specify the expected characteristics of the symbol. The link-editor compares the symbol characteristics that result from the link to those given by ASSERT attributes. If the real and asserted attributes do not agree, a fatal error is issued and the output object is not created.

In normal use, the link editor evaluates the ASSERT attribute when present, but does not require them, or provide default values for them. The presence of the STUB_OBJECT directive in a mapfile alters the interpretation of ASSERT to require them under some circumstances, and to supply default assertions if explicit ones are not present. See the definition of the STUB_OBJECT Directive for the details.

When the -z stub command line option is specified to build a stub object, the information provided by ASSERT attributes is used to define the attributes of the global symbols provided by the object.

ASSERT accepts the following:

ALIAS

Name of a previously defined symbol that this symbol is an alias for. An alias symbol has the same type, value, and size as the main symbol. The ALIAS attribute is mutually exclusive to the TYPE, SIZE, and SH_ATTR attributes, and cannot be used with them. When ALIAS is specified, the type, size, and section attributes are obtained from the alias symbol.
BIND

Specifies an ELF symbol binding, which can be any of the STB_ constants defined in <sys/elf.h>, with the STB_ prefix removed (e.g. GLOBAL, WEAK).
TYPE

Specifies an ELF symbol type, which can be any of the STT_ constants defined in <sys/elf.h>, with the STT_ prefix removed (e.g. OBJECT, COMMON, FUNC). In addition, for compatibility with other mapfile usage, FUNCTION and DATA can be specified, for STT_FUNC and STT_OBJECT, respectively. TYPE is mutually exclusive to ALIAS, and cannot be used in conjunction with it.
SH_ATTR

Specifies attributes of the section associated with the symbol. The section_attributes that can be specified are given in the following table:

Section Attribute Meaning

BITS Section is not of type SHT_NOBITS

NOBITS Section is of type SHT_NOBITS

SH_ATTR is mutually exclusive to ALIAS, and cannot be used in conjunction with it.
SIZE

Specifies the expected symbol size. SIZE is mutually exclusive to ALIAS, and cannot be used in conjunction with it. The syntax for the size_value argument is as described in the discussion of the SIZE attribute below.

Section Attribute	Meaning
BITS	Section is not of type SHT_NOBITS
NOBITS	Section is of type SHT_NOBITS

SIZE

The SIZE symbol attribute existed before support for stub objects was introduced. It is used to set the size attribute of a given symbol. This attribute results in the creation of a symbol definition.

Prior to the introduction of the ASSERT SIZE attribute, the value of a SIZE attribute was always numeric. While attempting to apply ASSERT SIZE to the objects in the Solaris ON consolidation, I found that many data symbols have a size based on the natural machine wordsize for the class of object being produced. Variables declared as long, or as a pointer, will be 4 bytes in size in a 32-bit object, and 8 bytes in a 64-bit object. Initially, I employed the conditional $if directive to handle these cases as follows:

$if _ELF32
        foo     { ASSERT { TYPE=data; SIZE=4 } };
        bar     { ASSERT { TYPE=data; SIZE=20 } };
$elif _ELF64
        foo     { ASSERT { TYPE=data; SIZE=8 } };
        bar     { ASSERT { TYPE=data; SIZE=40 } };
$else
$error UNKNOWN ELFCLASS
$endif

I found that the situation occurs frequently enough that this is cumbersome. To simplify this case, I introduced the idea of the addrsize symbolic name, and of a repeat count, which together make it simple to specify machine word scalar or array symbols. Both the SIZE, and ASSERT SIZE attributes support this syntax:

The size_value argument can be a numeric value, or it can be the symbolic name addrsize. addrsize represents the size of a machine word capable of holding a memory address. The link-editor substitutes the value 4 for addrsize when building 32-bit objects, and the value 8 when building 64-bit objects. addrsize is useful for representing the size of pointer variables and C variables of type long, as it automatically adjusts for 32 and 64-bit objects without requiring the use of conditional input.
The size_value argument can be optionally suffixed with a count value, enclosed in square brackets. If count is present, size_value and count are multiplied together to obtain the final size value.

Using this feature, the example above can be written more naturally as:

        foo     { ASSERT { TYPE=data; SIZE=addrsize } };
        bar     { ASSERT { TYPE=data; SIZE=addrsize[5] } };

Exported Global Data Is Still A Bad Idea

As you can see, the additional plumbing added to the Solaris link-editor to support stub objects is minimal. Furthermore, about 90% of that plumbing is dedicated to handling global data.

We have long advised against global data exported from shared objects. There are many ways in which global data does not fit well with dynamic linking. Stub objects simply provide one more reason to avoid this practice. It is always better to export all data via a functional interface. You should always hide your data, and make it available to your users via a function that they can call to acquire the address of the data item. However, If you do have to support global data for a stub, perhaps because you are working with an already existing object, it is still easilily done, as shown above.

Oracle does not like us to discuss hypothetical new features that don't exist in shipping product, so I'll end this section with a speculation. It might be possible to do more in this area to ease the difficulty of dealing with objects that have global data that the users of the library don't need. Perhaps someday...

Conclusions

It is easy to create stub objects for most objects. If your library only exports function symbols, all you have to do to build a faithful stub object is to add

STUB_OBJECT;

and then to use the same link command you're currently using, with the addition of the -z stub option.

Happy Stubbing!

Surfing with the Linker-Aliens

Published Elsewhere

https://blogs.oracle.com/ali/entry/using_stub_objects/
https://blogs.oracle.com/ali/using-stub-objects/

Surfing with the Linker-Aliens

[20] Stub Objects Blog Index (ali) [22] Stub Proto: Not Just For Stub Objects