Interface Creation - using the compilers

Rod Evans — Sunday January 02, 2005

Surfing with the Linker-Aliens

In a previous posting, I covered how the interface of a dynamic object could be established by defining the interface symbols within a mapfile, and feeding this file to the link-edit of the final object. Establishing an object's interface in this manner hides all non-interface symbols, making the object more robust and less vulnerable to symbol name space pollution. This symbol reduction also goes a long way toward reducing the runtime relocation cost of the dynamic object.

This mapfile technique, while useful with languages such as C, can be challenging to exploit with languages such as C++. There are two major difficulties.

First, the link-editor only processes symbols in their mangled form. For example, even a simple interface such as:

    void foo(int bar)

has a C++ symbolic representation of:

    % elfdump -s foo.o
    ....
    [2]  .... FUNC GLOB  D   0 .text  __1cDfoo6Fi_v_

As no tool other than the compilers themselves can determine a symbol's mangled name, trying to establish definitions of this sort within a mapfile is no simple task.
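The reverse direction, from mangled name back to signature, is the part that tools do handle (elfdump -C is used for this later in the post). As a rough illustration only, the sketch below does the same thing programmatically with abi::__cxa_demangle from the GCC/Clang C++ runtime. That runtime follows a different mangling scheme from the Sun compilers, so the name it understands for void foo(int) is _Z3fooi, not __1cDfoo6Fi_v_. The helper name demangle is hypothetical.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang runtime, not Sun Studio)
#include <cstdlib>    // std::free
#include <string>

// Return the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if the runtime cannot demangle it.
std::string demangle(const char *mangled) {
    int status = 0;
    char *text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr)
        return mangled;
    std::string result(text);
    std::free(text);   // the returned buffer is malloc()-allocated
    return result;
}
```

Note that nothing here goes from foo(int) to the mangled spelling; that mapping still has to come from the compiler, which is the difficulty described above.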

The second issue is that some interfaces created by languages such as C++ expose implementation details of the language itself. These implementation interfaces often must remain global within a group of similar dynamic objects, as one interface must interpose on all the others for the correct execution of the application. As users generally are not aware of what implementation symbols are created, they can unknowingly demote these symbols to local when applying any interface definitions with a mapfile. Even the use of linker options like -B symbolic is discouraged with C++, as these options can lead to implementation symbols being created that are non-interposable.

Thankfully, some recent ELF extension work carried out with various Unix vendors has established a set of visibility attributes that can be applied to ELF symbol table entries. These attributes are maintained within the symbol entry's st_other field, and are fully documented under "ELF Symbol Visibility" in the "Object File Format" chapter of the Linker and Libraries Guide.
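To make the st_other encoding a little more concrete, here is a minimal sketch that decodes the visibility from that field. The numeric values follow the generic ELF ABI (STV_DEFAULT 0, STV_INTERNAL 1, STV_HIDDEN 2, STV_PROTECTED 3) and the two-bit mask is the generic one. The enum and function names are hypothetical and are declared locally rather than taken from any platform header, and a given platform may define further values. The D and H letters correspond to the visibility column in the elfdump listings in this post; the letters used for the other two values are my assumption.

```cpp
#include <cstdint>

// Visibility values as defined by the generic ELF ABI, redeclared here
// (hypothetical names) so the sketch does not depend on a platform <elf.h>.
enum class SymVisibility : std::uint8_t {
    Default   = 0,  // STV_DEFAULT: visibility follows the symbol binding
    Internal  = 1,  // STV_INTERNAL: processor-specific hidden class
    Hidden    = 2,  // STV_HIDDEN: not visible outside the object
    Protected = 3   // STV_PROTECTED: visible, but not interposable
};

// Extract the visibility from the low two bits of a symbol's st_other byte.
constexpr SymVisibility visibility_of(std::uint8_t st_other) {
    return static_cast<SymVisibility>(st_other & 0x3);
}

// One-letter code in the spirit of the elfdump visibility column.
// 'I' and 'P' are assumed spellings for this sketch.
constexpr char visibility_letter(SymVisibility v) {
    return v == SymVisibility::Default  ? 'D'
         : v == SymVisibility::Internal ? 'I'
         : v == SymVisibility::Hidden   ? 'H'
         :                                'P';
}
```

For example, visibility_letter(visibility_of(0x02)) yields 'H', the marking carried by a symbol that has been reduced to hidden.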

The compilers, starting with Sun ONE Studio 8, are now capable of describing symbol visibility. These definitions are encoded in the symbol table, and used by ld(1) in much the same way as definitions read from a mapfile. Using a combination of code "definitions" and command line options, you can now define the runtime interface of a C++ object.

As with any interface definition technique, this compilation method can greatly reduce the number of symbols that would normally be employed in runtime relocations. Given the number and size of C++ symbols, this technique can produce runtime relocation reductions that far exceed those that would be found in similar C objects. In addition, as the compiler knows what implementation symbols must remain global within the final object, these symbols are given the appropriate visibility attribute to ensure their correct use.

Presently there are two recommendations for establishing an object's interface. The first is to define all interface symbols using the __global directive, and reduce all other symbols to local using the -xldscope=hidden compiler option. This model provides the most flexibility. All global symbols are interposable, and allow for any copy relocations¹ to be processed correctly.

The second model is to define all interface symbols using the __symbolic directive, and again reduce all other symbols to local using the -xldscope=hidden compiler option. Symbolic symbols (also termed protected) are globally visible, but references to them from within the object have already been bound internally. This means that these symbols do not require symbolic runtime relocation, but they cannot be interposed upon, nor can copy relocations be created against them.

In practice, I'd expect to see significant savings in the runtime relocation of any modules that used either model. However, the savings between using the __global or the __symbolic model may be harder to measure. In a nutshell, if you do not want a user to interpose upon your interfaces, and don't export data items, you can probably go with __symbolic. If in doubt, stick with the more flexible use of __global.
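One way to keep the choice between the two models in a single place is to wrap the directives in macros. The following is only a sketch under several assumptions: the macro names API_GLOBAL and API_SYMBOLIC and the three functions are hypothetical, the GCC branch uses __attribute__((visibility(...))) as a rough counterpart to the Sun directives, and the object is assumed to be built with -xldscope=hidden (or -fvisibility=hidden with GCC) so that unmarked symbols are reduced to local.

```cpp
// Hypothetical wrapper macros for the two models described above.
#if defined(__SUNPRO_CC)
#  define API_GLOBAL   __global
#  define API_SYMBOLIC __symbolic
#elif defined(__GNUC__)
#  define API_GLOBAL   __attribute__((visibility("default")))
#  define API_SYMBOLIC __attribute__((visibility("protected")))
#else
#  define API_GLOBAL
#  define API_SYMBOLIC
#endif

// Not part of the interface: carries no marker, so it is demoted to
// local when hidden is the default scope for the compilation.
int scale(int value) { return value * 2; }

// First model: globally visible and interposable.
API_GLOBAL int api_interposable(int value) { return scale(value) + 1; }

// Second model: globally visible, but bound within the object.
API_SYMBOLIC int api_bound(int value) { return scale(value) - 1; }
```

Switching an interface from one model to the other then becomes a one-word change at its definition, which fits the advice to start with the more flexible global model when in doubt.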

The following example uses C++ code that was furnished to me as being representative of what users may develop.

    % cat interface.h
    class item {
    protected:
        item();
    public:
        virtual void method1() = 0;
        virtual void method2() = 0;
        virtual ~item();
    };

    extern item *make_item();

    % cat implementation.cc
    #include "interface.h"

    class __global item; /* Ensures global linkage for any
                            implicitly generated members. */

    item::item() { }
    item::~item() { }

    class item_impl : public item {
        void method1();
        void method2();
    };

    void item_impl::method1() { }
    void item_impl::method2() { }

    void helper_func() { }

    __global item *make_item() {
        helper_func();
        return new item_impl;
    }

All interface symbols have employed the __global attribute. Compiling this module with -xldscope=hidden reveals the following symbol table entries.

    % elfdump -CsN.symtab implementation.so.1
    ...
    [31]  .... FUNC LOCL  H  0 .text  void helper_func()
    [32]  .... OBJT LOCL  H  0 .data  item_impl::__vtbl
    [35]  .... FUNC LOCL  H  0 .text  void item_impl::method1()
    [36]  .... FUNC LOCL  H  0 .text  void item_impl::method2()
    [54]  .... OBJT GLOB  D  0 .data  item::__vtbl
    [55]  .... FUNC GLOB  D  0 .text  item::item()
    [58]  .... FUNC GLOB  D  0 .text  item::~item #Nvariant 1()
    [59]  .... FUNC GLOB  D  0 .text  item*make_item()
    [61]  .... OBJT GLOB  D  0 .data  _edata
    [67]  .... FUNC GLOB  D  0 .text  item::~item()
    [77]  .... FUNC GLOB  D  0 .text  item::item #Nvariant 1()

Notice that the first four symbols, now local (LOCL), would normally have been defined as global had the symbol definitions and compiler option not been used. This is a simple example. As implementations get more complex, expect to see a larger fraction of symbols demoted to locals.
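Demoting the item_impl symbols costs a client nothing, because a client only ever names the abstract class and the factory function. The sketch below is a trimmed, self-contained variation of the example above, written to show that calling pattern. It is an illustration, not the original code: the methods return strings so the effect can be observed, the function use_item is hypothetical, and the __global markers are left out so that it compiles with any C++11 compiler.

```cpp
#include <memory>
#include <string>

// Interface: the only names a client refers to directly.
class item {
public:
    virtual std::string method1() = 0;
    virtual ~item() {}
};

item *make_item();

// Implementation: reached only through the virtual table, so its
// symbols do not need to be visible outside the object.
class item_impl : public item {
    std::string method1() override { return "item_impl::method1"; }
};

item *make_item() { return new item_impl; }

// Hypothetical client code: it never mentions item_impl.
std::string use_item() {
    std::unique_ptr<item> p(make_item());
    return p->method1();
}
```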

For a definition of other related compiler options, at least how they relate to C++, see Linker Scoping².


1 Copy relocations are a technique employed to allow references from non-pic code to external data items, while maintaining the read-only permission of a typical text segment. The use of this relocation, and its overhead, can be avoided by designing shared objects that do not export data interfaces.

2 It's rumored the compiler folks are also working on __declspec and GCC __attribute__ clause implementations. These should aid porting code and interface definitions from other platforms.


An Update - Sunday May 29, 2005

Giri Mandalika has posted a very detailed article on this topic, including the __declspec implementation.


Comments

guest — Monday January 03, 2005
Re ²: The gcc folks seem to want to stick with __attribute__ ((visibility ())) although it's only implemented for a couple of platforms... __global looks much nicer, of course... ;-)
Giri Mandalika — Monday February 14, 2005
2 It's rumored the compiler folks are also working on __declspec ..

__declspec was already implemented in Sun Studio 9 to facilitate easy porting of Windows applications to Solaris. __declspec(dllexport) maps to __symbolic and __declspec(dllimport) maps to __global.

Unfortunately the documentation at docs.sun.com was not updated completely to reflect the new additions to the compiler.

Thanks for the great post, Rod

Published Elsewhere

https://blogs.sun.com/rie/entry/interface_creation_using_the_compilers/
https://blogs.oracle.com/rie/entry/interface_creation_using_the_compilers/
https://blogs.oracle.com/rie/interface-creation-using-the-compilers/
