Init and Fini Processing - who designed this?
Rod Evans Tuesday September 27, 2005
Recently we had to make yet another modification to our runtime .init processing to compensate for an undesirable application interaction. I thought an overview of our torturous history of .init processing might be entertaining.
During the creation of a dynamic object, the link-editor ld(1), arranges for any .init and .fini sections to be collected into blocks of code that are executed by the runtime linker ld.so.1(1). These blocks of code are typically used to implement constructors and destructors, or code identified with #pragma init, or #pragma fini.
The original System V ABI was rather vague in regard these sections, stating
After the dynamic linker has built the process image and performed the
relocations, each shared object gets the opportunity to execute some
initialization code. These initialization functions are called in no
specific order, but all shared object initializations happen before the
executable file gains control.
Similarly, shared objects may have termination functions, which are
executed with the atexit(3c) mechanism after the base process begins
its termination sequence. Once again, the order in which the dynamic
linkers calls termination functions is unspecified.
The system has evolved since this was written, and today there is an expectation that any initialization functions are called before any code within the same object is referenced. This holds true for dependencies of the application and objects that are dynamically added to a process with dlopen(3c). The reverse is expected on process termination, and when objects are removed from the process with dlclose(3c).
Todays processes, language requirements (C++), lazy-loading, together
For the rest of this discussion, let's use .init sections for examples.
In the early days of Solaris, .init sections were run in reverse load order, sometimes referred to as reverse breadth first order. If an application had the following dependencies:
% ldd a.out lib1.so.1 => /opt/ISV/lib/lib1.so.1 lib2.so.1 => /opt/ISV/lib/lib2.so.1 libc.so.1 => /usr/lib/libc.so.1
then the initialization sequence would be libc.so.1 followed by lib2.so.1 followed by lib1.so.1.
This level of simplicity proved insufficient for calling .init sections in their expected order. All that was required was for a dependency to have its own dependency. For example, if lib1.so.1 had a dependency on lib3.so.1, the load order would reveal:
% ldd a.out lib1.so.1 => /opt/ISV/lib/lib1.so.1 lib2.so.1 => /opt/ISV/lib/lib2.so.1 libc.so.1 => /usr/lib/libc.so.1 lib3.so.1 => /opt/ISV/lib/lib3.so.1
The result of this loading was that the .init for lib3.so.1 was called before the system library libc.so.1. In practice this wasn't a big issue back in our very early releases. Although libc.so.1 is typically the dependency of every application and library created, its .init used to contributed little that was required by the .init's of other objects.
Issues really started to arise as C++ and/or threads use started to expand. The libraries libC.so.1 and libthread.so.1, and the C++ objects themselves, had far more complex initialization requirements. It became essential that libC.so.1 and libthread.so.1's .init ran before any other objects .init.
In Solaris 6, the runtime linker started constructing a dependency ordered list of .init sections to call. This list is built from the dependency relationships expressed by each object and any bindings that have been processed outside of the expressed dependencies.
Explicit dependencies are established at link-edit i.e., lib1.so.1 needs lib3.so.1. However, explicit dependency relationships are often insufficient to establish complete dependency relationships. It is still very typical for a shared object to be generated that does not express its dependencies. Use of the link-editors -z defs option would enforce that dependencies be expressed, but this isn't always the case.
Therefore, the runtime-linker also adds any dependencies established by relocations to the information used for topological sorting. ldd(1) can be used to display expected .init order:
% ldd -di a.out lib1.so.1 => ./lib1.so.1 lib2.so.1 => ./lib2.so.1 libc.so.1 => /usr/lib/libc.so.1 lib3.so.1 => ./lib3.so.1 init object=/usr/lib/libc.so.1 init object=./lib3.so.1 init object=./lib1.so.1 init object=./lib2.so.1
But there's still something missing. The above example shows ldd processing only immediate (data) relocations. This is normal when executing an application, i.e., without LD_BIND_NOW being set. Typically, functions are not resolved when objects are loaded, but are done lazily when the function is first called.
The problem with the runtime linkers dependency analysis, is that without resolving all relocations, including functions, the exact dependency relationship may not be known at the time .init firing starts.
Of course, we had one library that had to be treated differently, libthread.so.1. Even though this library could have dependencies on other system libraries, it's .init had to fire first. This was insured with the .dynamic flag DF_1_INITFIRST. But, this also excited others who claim they'd like to be first too! Better mechanisms have since evolved to insure the new merged libthread and libc are initialized appropriately.
As a side note, when topological sorting was added, the environment variable LD_BREADTH was also provided. This variable suppressed topological sorting and reverted to the original breadth first sorting. This fall back was provided in case applications were found to be dependent upon breadth first sorting, or in case bugs existed in the topological sort mechanism. Sadly, the latter proved true, and LD_BREADTH found its way into scripts and user startup files. But, as systems became more complex, LD_BREADTH became increasingly inappropriate, and its existence caused more problems than it solved. This environment variables processing was finally removed.
Note, the debugging capabilities that are available with the runtime linker in OpenSolaris have been significantly updated so that LD_DEBUG=init,detail provides detailed information on the topological sorting process.
To complete the initialization model, each time the runtime-linker resolves a function call (through a .plt), the defining objects .init is called if it hasn't already executed. With dynamic .init calling, a lazily bound family of objects can assume that an objects .init is called before code in that object is referenced.
Suppose in our a.out example, that the .init code executed for lib3.so.1 makes a call to lib2.so.1. It follows that the .init for lib2.so.1 should be called before it had previously been scheduled. This dynamic initialization can not be observed under ldd, but can be seen with the runtime-linkers debugging:
% LD_DEBUG=init a.out ...... 09086: calling .init (from sorted order): /usr/lib/libc.so.1 09086: 09086: calling .init (done): /usr/lib/libc.so.1 09086: 09086: calling .init (from sorted order): ./lib3.so.1 09086: 09086: calling .init (dynamically triggered): ./lib2.so.1 09086: 09086: calling .init (done): ./lib2.so.1 09086: 09086: calling .init (done): ./lib3.so.1 09086: 09086: calling .init (from sorted order): ./lib1.so.1 09086: 09086: calling .init (done): ./lib1.so.1
But, there is still something missing. What if a family of objects are not lazily bound? And, can a user assume that an objects .init has completed before the code in that object is referenced?
It has been observed that users frequently call
Suppose a family of objects have been loaded under the default mode of
lazy binding. The runtime linker has sorted this family and is
in the process of firing the .init's. Now let's say that a
particular .init calls
This observation prompted an additional level of dynamic .init
calling. When ever a family of objects are
Objects can be loaded without lazy-binding, either under
control of LD_BIND_NOW, the
Cyclic dependencies seem to be quite common. It has been observed that calling one objects .init can result in one or more other objects being called, which in turn reference code from the originating object. Problems start manifesting themselves when this return to the originating object exercises code whose initialization has not yet completed.
When topological sorting detects cycles, the members of the cycles will have their .init's fired in reverse load order. With dynamic .init calling, this order may be a fine starting point, but without dynamic .init calling this order may not be sufficient to prepare for the execution path of code throughout the cyclic objects.
A similar issue has arisen between different threads of control. One thread may be in the process of calling an .init and get preempted for another thread that references the same object. In fact, the runtime linker is fully capable of using condition variables to synchronize .inituse. However, this functionality is not enabled by default because of cyclic dependencies, and the possible deadlock conditions that can result.
When we added the reevaluation and firing of .init's if any loaded objects where referenced by a dlopen(), dlopen(0) fell under the same model. And, although this model was integrated into Solaris a couple of years ago, it wasn't until recently that some existing applications were observed to fail because of the .init reevaluation.
The problem was that unexpected recursion was occurring. .init's were being fired, and then the code within a .init called dlopen(0). This caused a reevaluation of the outstanding .init sections, some .init's to fire, and a thread of execution lead to running code within an object whose .init had not yet completed.
Yet another refinement was added. Now, if any dlopen() operation only references objects that are already loaded, and that dlopen() operation does not promote any objects relocation requirements, i.e., doesn't use RTLD_NOW, then the thread of initialization that must presently be in existence is left to finish the job. If however, the dlopen() operation adds new objects, or promotes any existing objects relocations, then the family of objects referenced will have their .init's reevaluated, and a new thread of initialization is kicked off to process this family.
As you can now see, the dynamic initialization of objects within a process is quite complex, and no-one in the right mind would have ever designed it this way. This complexity has evolved, from some incredibly vague starting point, to an implementation that has become necessary to satisfy existing dynamic objects. Whether a lot of this complexity is by accident or by design, is open to debate. And whether developers have ever considered initialization requirements or designed to some goal, seems doubtful. Given the inability to initialize groups of cyclic dependencies in a correct order, you have to wonder if users ever meant to create such an environment. Personally I think most initialization "requirements" have worked more through luck than judgment,
I've tried to provide documentation to educate users of the issues facing object initialization. However, the Linker and Libraries Guide isn't the first place folks look. Documentation should probably start with the compiler documents, which are the first place developers usually go. Documenting how users should use .init's and .fini's might also be useful. Although this really falls back to the languages, like C++, to document constructor and deconstructor use.
Better methods of analyzing initialization dependencies may also be useful. The runtime linkers debugging capabilities are a start. Objects can also be exercised under ldd using the -i and -d/r options. But this still seems a little too late in the development cycle. It would be nice if we could flag things like cyclic dependencies during object development. Specifically, the cyclic dependencies of .init's and .fini's. However, one problem with todays applications is that they are not all created in one build environment. Many components, asynchronously delivered from different ISV's are brought together to produce a complete application.
The bottom line is keep it simple. Try and get rid of .init code. It's amazing how much exists, and what it does (dlopen()/dlclose(), firing off new threads, I've even seen .init sections fork()/exec() processes). .init code often initializes an object for all eventualities, whereas much of the initialization is never used for a particular thread of execution. Make the initialization self contained by eliminating and reducing the references to external objects from the initialization code. Reducing exported interfaces can help too. By keeping things simple you can avoid much of the initialization interactions that the runtime linkers implementation has evolved to handle. Your process will start up faster too. Folks often comment how long it takes for a process to get to main. Have a look how much initialization processing comes before this!
| reducing dlsym() overhead|| Very Slow Link - get patch|