OSNet Direct Binding

Rod Evans — Wednesday May 14, 2008

Surfing with the Linker-Aliens

Direct Binding refers to a symbol search and binding model that has been available in Solaris for quite some time. See Library Bindings.

At runtime, a symbol reference from an object must be located by the runtime linker (ld.so.1(1)). Under direct bindings, a symbol definition is searched for directly in the dependency that provides it. The provider of the symbol definition was determined by the link-editor (ld(1)) when the object was originally built.

This direct binding model differs from the traditional symbol search and binding model. In the traditional model, the symbol search starts with the application and advances through each object that is loaded within the process until a symbol definition is found.

Given that direct binding capabilities have been available for some time, and a number of other consolidations have been happily using them, why did it take so long to get this model employed to build the OSNet consolidation (that's the Solaris core OS and networking)?

Basically, there were a number of corner cases to solve. One advantage of direct bindings is that this model can protect against unintentional interposition. One disadvantage of direct bindings is that this model can circumvent intentional interposition. Determining whether interposition exists, and whether it is intentional or unintentional is the fun part. The core Solaris libraries seem to be a frequent target of interposition.

So first, what is interposition? Suppose a process is made up of several shared objects, and two shared objects, libX.so and libY.so, export the same symbol xy(). Under the traditional symbol search model, any references to the symbol xy() will be bound to the first instance of xy() that is found. So, if libX.so is loaded before libY.so, then the instance of xy() within libX.so is used to satisfy all references. The instance of xy() within libX.so is said to interpose on the instance in libY.so.

Now, suppose that two other shared objects within the process, libA.so and libB.so, reference xy(). Under the traditional symbol search model, both of these objects will bind to libX.so. But, if libA.so was built to depend on libX.so, and libB.so was built to depend on libY.so, and both employed direct bindings, then libA.so would bind to xy() in libX.so, and libB.so would bind to xy() in libY.so.
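
The difference between the two models can be sketched with a toy simulation. This is only an illustration, not the algorithm ld.so.1 actually implements: the load order, the tables, and the function names are all made up for the example, and the fallback to the default search when no direct provider applies is an assumption of the sketch.

```python
# Toy model of the two symbol search models. Everything here is hypothetical
# scaffolding for illustration; it is not how ld.so.1 is implemented.

# Objects in the order they were loaded into the process.
LOAD_ORDER = ["main", "libA.so", "libB.so", "libX.so", "libY.so"]

# Symbols exported by each object.
EXPORTS = {
    "main": set(),
    "libA.so": set(),
    "libB.so": set(),
    "libX.so": {"xy"},
    "libY.so": {"xy"},
}

# Provider recorded by the link-editor when each caller was built.
DIRECT = {
    ("libA.so", "xy"): "libX.so",
    ("libB.so", "xy"): "libY.so",
}


def bind_traditional(caller, symbol):
    """Search every loaded object in load order; the first definition wins."""
    for obj in LOAD_ORDER:
        if symbol in EXPORTS[obj]:
            return obj
    return None


def bind_direct(caller, symbol):
    """Look in the recorded provider; otherwise use the traditional search."""
    provider = DIRECT.get((caller, symbol))
    if provider is not None and symbol in EXPORTS[provider]:
        return provider
    return bind_traditional(caller, symbol)


if __name__ == "__main__":
    for caller in ("libA.so", "libB.so"):
        print(caller, "traditional:", bind_traditional(caller, "xy"))
        print(caller, "direct:     ", bind_direct(caller, "xy"))
```

In this model both callers bind to libX.so under the traditional search, while under direct bindings libB.so reaches the xy() in libY.so that it was built against.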

One avenue to observe this difference in binding is to employ lari(1), a utility that looks for interesting binding events. Not surprisingly, most interesting events revolve around multiple instances of a symbol. From our example, the traditional symbol search model will reveal:

    % lari main
    [2:2E]: xy(): ./libX.so
    [2:0]: xy(): ./libY.so

Here, we see the two instances of xy(), with libX.so being the recipient of the two external bindings (2E).

However, if libA.so and libB.so employ direct bindings then the symbol search model will reveal:

    % lari main
    [2:1ED]: xy(): ./libX.so
    [2:1ED]: xy(): ./libY.so

Here, libX.so and libY.so are each the recipient of one external, direct binding (1ED).
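
For readers who want to compare such listings mechanically, the bracketed prefix can be picked apart with a small parser. This is a rough sketch that assumes every line has exactly the shape shown in the two listings above (instance count, binding count, optional flag letters, symbol, object); real lari output may contain other forms, and `parse_lari_line` is a name invented for this example.

```python
import re

# Assumed shape, taken from the sample listings: "[2:1ED]: xy(): ./libX.so"
LARI_LINE = re.compile(r"\[(\d+):(\d+)([A-Z]*)\]:\s+(\S+):\s+(\S+)")


def parse_lari_line(line):
    """Split one lari-style line into its fields (hypothetical helper)."""
    match = LARI_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError("unrecognized line: " + repr(line))
    instances, bindings, flags, symbol, obj = match.groups()
    return {
        "instances": int(instances),  # how many definitions of the symbol exist
        "bindings": int(bindings),    # how many bindings this instance received
        "flags": set(flags),          # e.g. E (external), D (direct)
        "symbol": symbol,
        "object": obj,
    }


if __name__ == "__main__":
    for text in ("[2:2E]: xy(): ./libX.so", "[2:1ED]: xy(): ./libY.so"):
        print(parse_lari_line(text))
```

Only the E and D letters are explained in this entry; the parser keeps whatever other flag letters it finds without interpreting them.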

The question now is what did the developer of libX.so intend? Did they want to capture all bindings to xy(), or was their choice of the name xy() an unintended name-clash with the existing symbol in libY.so?

It is this latter name-clash issue that was one of the main motivators in having the OSNet consolidation use direct bindings for all system libraries. There have been numerous instances of user applications breaking system functionality by unintentionally interposing on a symbol that exists within a system library. However, although we wished to protect our libraries from unintentional interposition, we still wished to provide for interposition where it was intended.

Although the direct bindings implementation prevents unintentional interposition, the implementation does allow for interposition. However, if you want interposition then you now need to be explicit. Explicit interposition can be achieved with LD_PRELOAD (an old favorite), or by tagging the associated object with -z interpose, or by identifying symbols within an executable with INTERPOSE mapfile directives.

Alternatively, if you design a library with the intent that users be allowed to interpose on symbols within the library, you can disable direct binding to the library. Disabling can be achieved for the whole library using the link-editor's -B nodirect option, or by identifying individual symbols with NODIRECT mapfile directives or as singletons.
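
How these two escape hatches interact with a direct binding can be sketched in a toy model. As before, this is an illustration under stated assumptions rather than the runtime linker's real logic: the `resolve` function and its tables are hypothetical, "interposers" stands in for objects loaded with LD_PRELOAD or tagged with -z interpose, and "nodirect" stands in for definitions that refuse direct bindings.

```python
# Toy model only: names and data layout are invented for this sketch.

LOAD_ORDER = ["main", "libX.so", "libY.so", "libB.so"]
EXPORTS = {
    "main": set(),
    "libX.so": {"xy"},
    "libY.so": {"xy"},
    "libB.so": set(),
}
# libB.so was built against libY.so with direct bindings.
DIRECT = {("libB.so", "xy"): "libY.so"}


def resolve(caller, symbol, interposers=frozenset(), nodirect=frozenset()):
    """Return the object that satisfies caller's reference to symbol."""
    # 1. Objects explicitly identified as interposers are consulted first.
    for obj in LOAD_ORDER:
        if obj in interposers and symbol in EXPORTS[obj]:
            return obj
    # 2. Otherwise follow the direct binding, unless the definition
    #    has been marked as not accepting direct bindings.
    provider = DIRECT.get((caller, symbol))
    if (provider is not None and symbol in EXPORTS[provider]
            and (provider, symbol) not in nodirect):
        return provider
    # 3. Fall back to the traditional search in load order.
    for obj in LOAD_ORDER:
        if symbol in EXPORTS[obj]:
            return obj
    return None


if __name__ == "__main__":
    print(resolve("libB.so", "xy"))
    print(resolve("libB.so", "xy", interposers={"libX.so"}))
    print(resolve("libB.so", "xy", nodirect={("libY.so", "xy")}))
```

With no tags the reference stays with libY.so even though libX.so was loaded first. Tagging libX.so as an interposer, or marking the xy() in libY.so as not directly bindable, lets libX.so capture the reference again.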

If you suspect an issue with direct bindings in effect, you can return to the traditional symbol search model by setting the environment variable LD_NODIRECT=yes. A suggestion for investigating the issue further would be:

    % lari main > direct
    % LD_NODIRECT=yes lari main > no-direct
    % diff direct no-direct

Standard interposition dates from an era when applications had very few dependencies. Times have changed, and the number of dependencies has dramatically increased. Although interposition can be powerful, it can also be fragile and scale badly. Diagnosing the occurrence of interposition can be a challenge.

Given the ability to time travel, direct binding would probably have been the only model for symbol binding, and explicit interposition the only means of defining an interposer. Having to support direct bindings and the traditional model with the various flags and options is the cost of backward compatibility. However, the ability of ELF to stretch this far speaks to the overall quality of its initial design, warts and all.

The OSNet consolidation uses the various binding-control flags to both identify interposers, and prevent direct bindings to commonly interposed upon symbols. All the gory details of direct binding, the various flags that can be used, and examples of their use, can be found in the Direct Binding chapter of the Linker and Libraries Guide.

Comments

Oliver Kiddle — Monday July 21, 2008

Do you use -z direct or -B direct? It seems -B direct implies -z lazyload and a big disadvantage of lazy loading is that you don't get an immediate error at runtime if a library can't be found so the program can fail once it is already in the middle of something. -z direct also apparently doesn't explicitly bind symbols in the object being created. In what situation could that make a difference?

Are there other differences between -zdirect and -Bdirect that I've missed?

Thanks

Rod — Monday July 21, 2008

Oliver, your questions (hopefully) have been answered in the latest blog entry:

http://www.linker-aliens.org/blogs/rie/entry/direct_binding_the_zdirect_bdirect

Published Elsewhere

https://blogs.sun.com/rie/entry/direct_binding_now_the_default/
https://blogs.oracle.com/rie/entry/direct_binding_now_the_default/
https://blogs.oracle.com/rie/direct-binding-now-the-default-for-osnet-components/
