Init and Fini Processing - who designed this?

Rod Evans — Tuesday September 27, 2005

Surfing with the Linker-Aliens

Recently we had to make yet another modification to our runtime .init processing to compensate for an undesirable application interaction. I thought an overview of our torturous history of .init processing might be entertaining.

During the creation of a dynamic object, the link-editor ld(1), arranges for any .init and .fini sections to be collected into blocks of code that are executed by the runtime linker ld.so.1(1). These blocks of code are typically used to implement constructors and destructors, or code identified with #pragma init, or #pragma fini.

The original System V ABI was rather vague in regard these sections, stating simply:

        After the dynamic linker has built the process image and performed the
        relocations, each shared object gets the opportunity to execute some
        initialization code.  These initialization functions are called in no
        specific order, but all shared object initializations happen before the
        executable file gains control.

        Similarly, shared objects may have termination functions, which are
        executed with the atexit(3c) mechanism after the base process begins
        its termination sequence. Once again, the order in which the dynamic
        linkers calls termination functions is unspecified.

The system has evolved since this was written, and today there is an expectation that any initialization functions are called before any code within the same object is referenced. This holds true for dependencies of the application and objects that are dynamically added to a process with dlopen(3c). The reverse is expected on process termination, and when objects are removed from the process with dlclose(3c).

Todays processes, language requirements (C++), lazy-loading, together with dlopen(), and dlclose() use, have resulted in a complex .init/.fini execution model that attempts to satisfy a users expectations. However, scenarios can still exist where expectations can not be achieved. Typically, the implementation details of .init/.fini processing aren't something any developers design to. But as the dynamic dependencies of a process change, unforeseen side-effects of this processing can cause problems. Given the complexity of todays processes, these problems can be difficult to detect let alone prepare against.

For the rest of this discussion, let's use .init sections for examples.

At First it was Simple

In the early days of Solaris, .init sections were run in reverse load order, sometimes referred to as reverse breadth first order. If an application had the following dependencies:

        % ldd a.out
                lib1.so.1 =>     /opt/ISV/lib/lib1.so.1
                lib2.so.1 =>     /opt/ISV/lib/lib2.so.1
                libc.so.1 =>     /usr/lib/libc.so.1

then the initialization sequence would be libc.so.1 followed by lib2.so.1 followed by lib1.so.1.

This level of simplicity proved insufficient for calling .init sections in their expected order. All that was required was for a dependency to have its own dependency. For example, if lib1.so.1 had a dependency on lib3.so.1, the load order would reveal:

        % ldd a.out
                lib1.so.1 =>     /opt/ISV/lib/lib1.so.1
                lib2.so.1 =>     /opt/ISV/lib/lib2.so.1
                libc.so.1 =>     /usr/lib/libc.so.1
                lib3.so.1 =>     /opt/ISV/lib/lib3.so.1

The result of this loading was that the .init for lib3.so.1 was called before the system library libc.so.1. In practice this wasn't a big issue back in our very early releases. Although libc.so.1 is typically the dependency of every application and library created, its .init used to contributed little that was required by the .init's of other objects.

Issues really started to arise as C++ and/or threads use started to expand. The libraries libC.so.1 and libthread.so.1, and the C++ objects themselves, had far more complex initialization requirements. It became essential that libC.so.1 and libthread.so.1's .init ran before any other objects .init.

Topological Sorting Arrives

In Solaris 6, the runtime linker started constructing a dependency ordered list of .init sections to call. This list is built from the dependency relationships expressed by each object and any bindings that have been processed outside of the expressed dependencies.

Explicit dependencies are established at link-edit i.e., lib1.so.1 needs lib3.so.1. However, explicit dependency relationships are often insufficient to establish complete dependency relationships. It is still very typical for a shared object to be generated that does not express its dependencies. Use of the link-editors -z defs option would enforce that dependencies be expressed, but this isn't always the case.

Therefore, the runtime-linker also adds any dependencies established by relocations to the information used for topological sorting. ldd(1) can be used to display expected .init order:

        % ldd -di a.out
                lib1.so.1 =>     ./lib1.so.1
                lib2.so.1 =>     ./lib2.so.1
                libc.so.1 =>     /usr/lib/libc.so.1
                lib3.so.1 =>     ./lib3.so.1

           init object=/usr/lib/libc.so.1
           init object=./lib3.so.1
           init object=./lib1.so.1
           init object=./lib2.so.1

But there's still something missing. The above example shows ldd processing only immediate (data) relocations. This is normal when executing an application, i.e., without LD_BIND_NOW being set. Typically, functions are not resolved when objects are loaded, but are done lazily when the function is first called.

The problem with the runtime linkers dependency analysis, is that without resolving all relocations, including functions, the exact dependency relationship may not be known at the time .init firing starts.

Of course, we had one library that had to be treated differently, libthread.so.1. Even though this library could have dependencies on other system libraries, it's .init had to fire first. This was insured with the .dynamic flag DF_1_INITFIRST. But, this also excited others who claim they'd like to be first too! Better mechanisms have since evolved to insure the new merged libthread and libc are initialized appropriately.

As a side note, when topological sorting was added, the environment variable LD_BREADTH was also provided. This variable suppressed topological sorting and reverted to the original breadth first sorting. This fall back was provided in case applications were found to be dependent upon breadth first sorting, or in case bugs existed in the topological sort mechanism. Sadly, the latter proved true, and LD_BREADTH found its way into scripts and user startup files. But, as systems became more complex, LD_BREADTH became increasingly inappropriate, and its existence caused more problems than it solved. This environment variables processing was finally removed.

Note, the debugging capabilities that are available with the runtime linker in OpenSolaris have been significantly updated so that LD_DEBUG=init,detail provides detailed information on the topological sorting process.

Dynamic .init Calling

To complete the initialization model, each time the runtime-linker resolves a function call (through a .plt), the defining objects .init is called if it hasn't already executed. With dynamic .init calling, a lazily bound family of objects can assume that an objects .init is called before code in that object is referenced.

Suppose in our a.out example, that the .init code executed for lib3.so.1 makes a call to lib2.so.1. It follows that the .init for lib2.so.1 should be called before it had previously been scheduled. This dynamic initialization can not be observed under ldd, but can be seen with the runtime-linkers debugging:

        % LD_DEBUG=init a.out
        ......
        09086: calling .init (from sorted order): /usr/lib/libc.so.1
        09086:
        09086: calling .init (done): /usr/lib/libc.so.1
        09086:
        09086: calling .init (from sorted order): ./lib3.so.1
        09086:
        09086: calling .init (dynamically triggered): ./lib2.so.1
        09086:
        09086: calling .init (done): ./lib2.so.1
        09086:
        09086: calling .init (done): ./lib3.so.1
        09086:
        09086: calling .init (from sorted order): ./lib1.so.1
        09086:
        09086: calling .init (done): ./lib1.so.1

But, there is still something missing. What if a family of objects are not lazily bound? And, can a user assume that an objects .init has completed before the code in that object is referenced?

Loss of Lazy Binding

It has been observed that users frequently call dlopen() with RTLD_NOW. This results in functions being bound at relocation time. An interesting side-effect is how this can interact with .init firing.

Suppose a family of objects have been loaded under the default mode of lazy binding. The runtime linker has sorted this family and is in the process of firing the .init's. Now let's say that a particular .init calls dlopen() with RTLD_NOW, on members of this already loaded family. Effectively, this dlopen() may have altered the dependency ordering, as function relocations would have contributed to the topological sorting process. In addition, as there are no longer outstanding function relocations (.plt's) that would give the runtime-linker control, no dynamic .init calling for these fully relocated objects can be performed.

This observation prompted an additional level of dynamic .init calling. When ever a family of objects are dlopen()'ed, any .init sections for those objects that have not been called, would be sorted and called before return from the dlopen().Although this has always occurred for newly loaded objects, any existing objects .init's would have been skipped as they were already part of a pending initialization thread. By always collecting the .init's of a dlopen() family we could compensate for any loss of lazy binding, and insure that an objects .init is called before their code is first referenced.

No Lazy Binding

Objects can be loaded without lazy-binding, either under control of LD_BIND_NOW, the dlopen() flag RTLD_NOW, or because they were built with link-editors -z now option. Although the relocation information provided to the topological sorting may be more precise than under lazy-loading, there is no longer any dynamic .init calling possible.

Cyclic Dependencies

Cyclic dependencies seem to be quite common. It has been observed that calling one objects .init can result in one or more other objects being called, which in turn reference code from the originating object. Problems start manifesting themselves when this return to the originating object exercises code whose initialization has not yet completed.

When topological sorting detects cycles, the members of the cycles will have their .init's fired in reverse load order. With dynamic .init calling, this order may be a fine starting point, but without dynamic .init calling this order may not be sufficient to prepare for the execution path of code throughout the cyclic objects.

A similar issue has arisen between different threads of control. One thread may be in the process of calling an .init and get preempted for another thread that references the same object. In fact, the runtime linker is fully capable of using condition variables to synchronize .inituse. However, this functionality is not enabled by default because of cyclic dependencies, and the possible deadlock conditions that can result.

Recursion with dlopen(0)

When we added the reevaluation and firing of .init's if any loaded objects where referenced by a dlopen(), dlopen(0) fell under the same model. And, although this model was integrated into Solaris a couple of years ago, it wasn't until recently that some existing applications were observed to fail because of the .init reevaluation.

The problem was that unexpected recursion was occurring. .init's were being fired, and then the code within a .init called dlopen(0). This caused a reevaluation of the outstanding .init sections, some .init's to fire, and a thread of execution lead to running code within an object whose .init had not yet completed.

Yet another refinement was added. Now, if any dlopen() operation only references objects that are already loaded, and that dlopen() operation does not promote any objects relocation requirements, i.e., doesn't use RTLD_NOW, then the thread of initialization that must presently be in existence is left to finish the job. If however, the dlopen() operation adds new objects, or promotes any existing objects relocations, then the family of objects referenced will have their .init's reevaluated, and a new thread of initialization is kicked off to process this family.

Conclusions

As you can now see, the dynamic initialization of objects within a process is quite complex, and no-one in the right mind would have ever designed it this way. This complexity has evolved, from some incredibly vague starting point, to an implementation that has become necessary to satisfy existing dynamic objects. Whether a lot of this complexity is by accident or by design, is open to debate. And whether developers have ever considered initialization requirements or designed to some goal, seems doubtful. Given the inability to initialize groups of cyclic dependencies in a correct order, you have to wonder if users ever meant to create such an environment. Personally I think most initialization "requirements" have worked more through luck than judgment,

I've tried to provide documentation to educate users of the issues facing object initialization. However, the Linker and Libraries Guide isn't the first place folks look. Documentation should probably start with the compiler documents, which are the first place developers usually go. Documenting how users should use .init's and .fini's might also be useful. Although this really falls back to the languages, like C++, to document constructor and deconstructor use.

Better methods of analyzing initialization dependencies may also be useful. The runtime linkers debugging capabilities are a start. Objects can also be exercised under ldd using the -i and -d/r options. But this still seems a little too late in the development cycle. It would be nice if we could flag things like cyclic dependencies during object development. Specifically, the cyclic dependencies of .init's and .fini's. However, one problem with todays applications is that they are not all created in one build environment. Many components, asynchronously delivered from different ISV's are brought together to produce a complete application.

The bottom line is keep it simple. Try and get rid of .init code. It's amazing how much exists, and what it does (dlopen()/dlclose(), firing off new threads, I've even seen .init sections fork()/exec() processes). .init code often initializes an object for all eventualities, whereas much of the initialization is never used for a particular thread of execution. Make the initialization self contained by eliminating and reducing the references to external objects from the initialization code. Reducing exported interfaces can help too. By keeping things simple you can avoid much of the initialization interactions that the runtime linkers implementation has evolved to handle. Your process will start up faster too. Folks often comment how long it takes for a process to get to main. Have a look how much initialization processing comes before this!

Surfing with the Linker-Aliens

Published Elsewhere

https://blogs.sun.com/rie/entry/init_and_fini_processing_who/
https://blogs.oracle.com/rie/entry/init_and_fini_processing_who/
https://blogs.oracle.com/rie/init-and-fini-processing-who-designed-this/

Surfing with the Linker-Aliens

[17] reducing dlsym() overhead
Blog Index (rie)
[19] Very Slow Link - get patch