The Problem(s) With Solaris SVR4 Link-Editor Mapfiles

Ali Bahrami — Wednesday January 06, 2010

Surfing with the Linker-Aliens

Until recently, I've never really felt that I fully understood the mapfile language used by the Solaris link-editor (ld), despite having used it for years. It's a terse and arbitrary language that does not encourage intuition, full of special cases and odd twists. No matter how many times you read the mapfile chapter of the Linker and Libraries Guide, you're left with a sneaking suspicion that some things just don't fit, or that you've missed something.

Lately, I've been working on a new mapfile syntax to replace this original language, which Solaris inherited as part of its System V Release 4 origins. In the process, I've examined every line of the manual, and of the code, many times. I believe I understand it all the way down now, and I'd like to record some of what I've learned here. My main reason for doing this is as justification for undertaking a replacement language. Oddly enough though, I believe that this information will make it easier to decode, use, and write these older mapfiles. Once you understand the quirks, you can work around them.

This discussion will not cover the new syntax — that will come in a subsequent installment. However, I do want to reassure you that full support for the original mapfile language will remain in place. We're not about to force anyone to rewrite 20+ years worth of mapfiles. The goal is to freeze the old support in its current form, provide a better alternative, and gradually move the world to it over a period of years.

Terse To A Fault / Not Extensible

The core of the old syntax is simple enough: You can create segments, set attributes for them, and assign sections to them. One can easily believe that it seemed adequate and reasonable to its creators. Their primary design decision was to make SVR4 Mapfiles a magic character language. The purpose of a given statement is specified using special characters (=, :, |, @). Options to these statements are further distinguished from each other using other special characters (?, $, ...), or single letter prefixes.

Languages face continuous pressure to expand and provide new features. The initial language may have seemed spare and elegant, but it failed to provide a scalable mechanism for expansion, and this has proven to be a terrible weakness:

There are only a limited number of magic characters available, mainly on the top row of the keyboard.
Only a few of these characters have mnemonic meanings that make intuitive sense in the context of object linking. And those that do have such a meaning can easily imply more than one thing. For example, in the SVR4 syntax, '=' means segment creation, and ':' means section to segment assignment. The reverse would make just as much sense: ':' could have meant segment creation, and '=' could have meant assign sections to segments. No one would have found this less intuitive. I used to constantly get these backwards, and would have to look at the manual or another mapfile to remember which character has which meaning. That's pretty sad, considering that '=' is probably the most mnemonic character in the language.
After the first few good characters (=, :) are taken, the remaining assignments become rather arbitrary. For instance, the | character is used to specify section order, while @ specifies the creation of a "segment size symbol". Neither of these evokes meaning. To the extent that they do, it's a negative effect, such as the fact that '|' evokes shell pipelines, but means nothing like that in mapfiles.
Some characters are overloaded, having different meanings in different contexts.

Most of these characters have no mnemonic value. The human mind struggles to remember what they stand for, resulting in frequent trips to the reference manual to decode them. The problem gets worse as the number of supported features grows, and is exacerbated by the fact that most people only read mapfiles on an occasional basis. The syntax is not reinforced by constant use the way some other terse languages are.

SVR4 Mapfile Syntax As Evolution

For me, the best way to understand the SVR4 mapfile syntax has been to start with its original form, and then consider how and where each subsequent feature has been added.

In the late 1980s, starting around 1986 or so, Unix System V Release 4 (SVR4) was being developed at AT&T. They created a new linking format (ELF), to resolve the inadequacies of the previous format (COFF) used in SVR3. SVR3 had a rather elaborate mapfile syntax. Rather than stay with this syntax, the SVR4 people designed a new, smaller, and simpler replacement. We don't know their reasons for this decision, and can only guess that they didn't think the SVR3 language was necessary, or a good fit with their new ELF based link-editor. As an aside, while researching different mapfile languages during the design of the replacement syntax for Solaris, I discovered that there is a notable similarity between the SVR3 mapfile language and GNU ld linker scripts. SVR3 lives on, as does much of Unix, in its influence on later systems.

The original SVR4 language was very small, consisting of four different possible statements. All of these have the form:

name magic ... ;

where name is a segment name, and magic is a character that determines what the directive does:

Segment Definition (=): Create segments and/or modify their attributes
Section to Segment Assignment (:): Specify how sections are assigned to segments
Section-Within-Segment Ordering (|): Specify the order in which output sections are ordered within a segment
Segment size symbols (@): Create absolute symbols containing final segment size in output object.

Solaris started with the original SVR4 code base. Since then, Sun has added three more top level statements:

Symbol scope/version definition ({}): Assign symbols to versions
File Control Directives (-): Specify which versions can be used from sharable object dependencies
Hardware/Software capabilities (=): Augment/override the capabilities from objects.

File Control Directives and Capabilities use the same form of syntax as the original four directives. Symbol scope/version blazed a new path, using {} to group symbol names within:

[version-name] {
    scope:
	symbol [= ...];
	*;
} [inherited-version-name...];
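
To make concrete how much of a statement's meaning hangs on a single character, here is a small Python sketch that classifies one mapfile statement by its magic character. It is only an illustration: the function name classify_statement, the whitespace-based tokenizing, and the sample names (text_size, libc.so.1, SUNW_1.1) are assumptions made for this example, and none of it reflects how ld actually parses mapfiles.

```python
# Illustrative sketch only: classify one SVR4 mapfile statement by its magic
# character. classify_statement and its whitespace tokenizing are assumptions
# made for this example, not the link-editor's parser.

MAGIC = {
    ":": "section to segment assignment",
    "|": "section-within-segment ordering",
    "@": "segment size symbol",
    "-": "file control directive",
}

# These two names are matched exactly, case included.
CAPABILITY_NAMES = ("hwcap_1", "sfcap_1")


def classify_statement(statement: str) -> str:
    tokens = statement.split()
    # '{' is the first token when no version name is given, else the second.
    if "{" in tokens[:2]:
        return "symbol scope/version definition"
    name, magic = tokens[0], tokens[1]
    if magic == "=":
        if name in CAPABILITY_NAMES:
            return "hardware/software capabilities"
        return "segment definition"
    return MAGIC[magic]


for stmt in (
    "text = LOAD ?RXO;",
    "text : .text%foo;",
    "text | .text;",
    "text @ text_size;",
    "libc.so.1 - SUNW_1.1;",
    "hwcap_1 = V0x12;",
    "SUNW_1.1 { global: foo; local: *; };",
):
    print(f"{stmt:40} {classify_statement(stmt)}")
```

Note that '=' is the only character for which the name on the left changes the kind of statement.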

In the following subsections, I will present a brief description of each of these top level mapfile statements, and discuss the various odd or unfortunate aspects of each. If you're not familiar with the mapfile language, it may be helpful to have the Linker and Libraries Guide available as well.

Segment Definition (=)

Segment definition statements can be used to create a new segment, or to modify existing ones:

segment_name = segment-attribute-value... ;

If a segment-attribute-value is one of (LOAD, NOTE, NULL, STACK), then it defines the type of segment being created:

LOAD segments are regions of the object to be mapped into the process address space at runtime, represented by PT_LOAD program header entries. LOAD segments were part of the original AT&T language.
NOTE segments are regions of the object to contain note sections, represented by PT_NOTE program header entries. NOTE segments were part of the original AT&T language.
NULL segments are a concept added by Sun. As far as we can determine, they were created in order to have a type of segment that won't be eliminated from the output object if no sections are assigned to them. Their actual meaning depends on whether the ?E flag is set or not, as described below.
STACK "segments" were added by Sun relatively recently in order to allow modifying the attributes of the process stack. This is not really a segment at all: It does not specify a memory mapping, and sections cannot be assigned to it. Rather, it allows access to the PT_SUNWSTACK program header, setting the header flags to specify stack permissions. Only the flags can be set on a stack "segment". The representation of this concept as a segment was done to simplify the underlying implementation of the feature.

If a segment-attribute-value starts with the '?' character, then it is a segment flag:

?R, ?W, ?X
Set the Read (PF_R), Write (PF_W), and eXecute (PF_X) program header flags, respectively. This is a feature of the original SVR4 syntax, and is self explanatory.
?E
The Empty flag can be used with either a LOAD or NULL segment:

Applied to a LOAD segment, the ?E flag creates a "reservation". This is an obscure and little used feature by which a program header is written to the output object, "reserving" a region of the address space for use by the program, which presumably knows how to locate it and do something useful. Sections cannot be assigned to such a segment.
Applied to a NULL segment, the ?E flag adds extra PT_NULL program headers to the end of the program header array. This feature is useful for post optimizers which rewrite objects to add segments, and need a place to create corresponding PT_LOAD program headers for them.
The ?E flag is meaningless when applied to NOTE or STACK segments.

The Empty flag was added by Sun. It should be noted that ?E does not correspond to an actual program header flag. Its treatment as a flag in the mapfile syntax, rather than as a different sort of option (using a magic character other than '?' as a prefix), was primarily a matter of implementation convenience.

?N
Normally, the link-editor makes the ELF and program headers part of the first loadable segment in the object. The ?N flag, if set on the first loadable segment, prevents this from occurring. The headers are still placed in the output object, but are not part of a segment, and therefore not available at runtime. It is meaningless to apply ?N to a non-LOAD segment.
This flag was added by Sun. As with ?E, it does not correspond to a real program header flag. Its representation as a flag is a matter of implementation convenience.
?O
This is another flag, added by Sun, that does not correspond to a real program header flag. It is used to control the order in which sections from input files are placed within the output sections in the segment. Sections are assigned to segments via the ':' mapfile directive. Normally, sections are added in the order seen by the link-editor. When ?O is set, the order of the input sections matches the order in which these assignment directives are found in the mapfile.
This feature was added to support the use of the -xF option to the compiler. That option causes each function to be placed in its own section, rather than all of the functions from a given source file going into a single generic text section. Then, their order can be specified using a mapfile, as with this example taken from the Linker and Libraries Guide:
text = LOAD ?RXO;
text : .text%foo;
text : .text%bar;
text : .text%main;
	
The result of using this mapfile will be for foo(), bar(), and main() to be placed adjacent to each other at the head of the segment, in that order. This feature can be used to put routines that call each other close together, to enhance cache performance. It is worth noting that it was also necessary to set the R and X flags, even though they already are RX on a text segment. This is a quirk of the SVR4 syntax: Any change to the flags replaces the previous value, so we have to specify the flags we want to keep (RX) as well as the one we want to set (O).
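
The replace-not-add behavior is easy to get wrong, so here is a tiny Python model of it. This is a sketch of the rule as described above, not linker code; apply_flag_spec is a hypothetical helper, and representing flags as a set of letters is an assumption made for illustration.

```python
# Hypothetical model of the flag-replacement quirk: a '?' flag string in a
# segment definition replaces the segment's flags; it does not add to them.


def apply_flag_spec(current: frozenset, spec: str) -> frozenset:
    """Return the flags in effect after applying a spec such as '?RXO'."""
    if not spec.startswith("?"):
        raise ValueError(f"not a flag spec: {spec!r}")
    # 'current' is deliberately ignored: the new spec replaces it wholesale.
    return frozenset(spec[1:])


text_flags = frozenset("RX")  # a text segment starts out read + execute

only_o = apply_flag_spec(text_flags, "?O")
with_rx = apply_flag_spec(text_flags, "?RXO")

print(sorted(only_o))   # R and X are gone
print(sorted(with_rx))  # R and X survive because they were restated
```

Writing ?O alone would therefore quietly strip the read and execute permissions from the segment.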

A segment-attribute-value can also be a numeric value, prefixed with one of the letters (A, L, P, R, V), to set the Alignment, maximum Length, Physical address, Rounding, or Virtual address of a LOAD segment, respectively.

The syntax for segment definition suffers from a variety of issues:

Most of the options can only be applied to specific segment types, primarily to type LOAD. However, the syntax does nothing to prevent you from trying to apply attributes that are invalid for the type of segment in question. For instance, you might try to assign an address to a STACK segment. The link-editor contains a fair amount of code dedicated to detecting such uses and issuing errors. A better syntax would not allow you to specify nonsensical options in the first place.
STACK "segments" are not really segments at all, but simply a convenient way to manipulate a specific program header. This is confusing, and can lead the user to believe they can control aspects of the stack (such as its address) that are not user settable.
An ELF object can only have one PT_SUNWSTACK program header. The segment notation used by mapfiles requires the user to give their stack "segment" a name, perhaps causing a user to think they might be able to create more than one stack by specifying more than one mapfile directive using different names. The link-editor contains code dedicated to catching this and turning it into an error.
The ?E, ?N, and ?O flags are confusing: they are terse, their semantics are obscure, and they do not correspond directly to ELF program header flags.
There is a syntactic ambiguity with capability directives, which use the same magic character (=) as segment definitions, but which are otherwise unrelated. See the discussion of capability directives below for details.
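
The first point is worth a sketch. Because the syntax accepts any attribute on any segment type, the checking has to happen after parsing. The following Python fragment models that, encoding only the one rule stated in this article (numeric attributes belong on LOAD segments). The function names, the dictionary layout, and the segment name mystack are all hypothetical, and the real link-editor checks a good deal more than this.

```python
# Hypothetical sketch: parse a segment definition, then check it afterwards.
# Only the rule stated in the article is encoded; everything else is assumed.

SEGMENT_TYPES = {"LOAD", "NOTE", "NULL", "STACK"}
NUMERIC_PREFIXES = {
    "A": "alignment",
    "L": "maximum length",
    "P": "physical address",
    "R": "rounding",
    "V": "virtual address",
}


def parse_segment_definition(statement: str) -> dict:
    """Split 'name = value... ;' into type, flags, and numeric attributes."""
    name, _, rest = statement.rstrip("; \t").partition("=")
    seg = {"name": name.strip(), "type": None, "flags": None, "numeric": {}}
    for value in rest.split():
        if value in SEGMENT_TYPES:          # checked first: LOAD starts with L
            seg["type"] = value
        elif value.startswith("?"):
            seg["flags"] = set(value[1:])
        elif value[0] in NUMERIC_PREFIXES:
            # int(text, 0) lets Python infer the base from a 0x prefix
            seg["numeric"][NUMERIC_PREFIXES[value[0]]] = int(value[1:], 0)
        else:
            raise ValueError(f"unrecognized attribute: {value}")
    return seg


def check(seg: dict) -> list:
    """Return complaints that can only be raised after parsing."""
    if seg["type"] in (None, "LOAD"):
        return []  # type not given in this statement, or numerics are allowed
    return [f"{what} is not valid on a {seg['type']} segment"
            for what in seg["numeric"]]


bad = parse_segment_definition("mystack = STACK ?RW V0x8000000;")
print(check(bad))
```

The statement parses without complaint; only the separate check notices that a stack cannot be given an address.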

Section to Segment Assignment (:)

The link-editor contains an internal list of entrance criteria, each of which contains section attributes. To place a section in an output segment, the link-editor compares the section to each entry in this list. If a section matches all of the attributes in a given entry, then the section is assigned to the corresponding segment, and the search ends.

Sections can be assigned to a specific segment via the following syntax, which uses the ':' magic character. The result of such a statement is to place a new entrance criteria on the internal list:

segment_name : section-attribute-value... [: file-name...];

If a section-attribute-value starts with a '$' prefix, then it specifies a section type. This can be one of ($PROGBITS, $SYMTAB, $STRTAB, $REL, $RELA, $NOTE, $NOBITS).

If a section-attribute-value starts with a '?' prefix, then it specifies one or more section header flags: A (SHF_ALLOC), W (SHF_WRITE), or X (SHF_EXECINSTR). To specify that a given flag must not be present, you can prepend it with the '!' character.

A section-attribute-value that does not start with a '$' or '?' prefix is a section name.

If there is a second colon (':') character on the line, then all items following it are file paths, and if any of these match the path for the input file containing the section to be assigned, it is considered to be a match. If the path name is prefixed with a '*', then the basename of the path is compared to the given name rather than the entire path.
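
The matching process described above amounts to a first-match search. The Python sketch below models it under stated assumptions: the Criteria class, its field names, the dictionary used for a section, and the segment names (hot, rodata, text) are all invented for illustration, and the list order here is simply the order in which the example builds it. The real link-editor also has built-in criteria that are not modeled.

```python
# Hypothetical model of the entrance criteria list. Field names are
# illustrative; only the matching rules come from the description above.
import posixpath
from dataclasses import dataclass


@dataclass
class Criteria:
    segment: str
    sec_type: str = None                   # e.g. "PROGBITS"; None = any type
    flags_set: frozenset = frozenset()     # flags that must be present
    flags_clear: frozenset = frozenset()   # flags written with a '!' prefix
    name: str = None                       # section name; None = any name
    files: tuple = ()                      # empty = any input file

    def file_matches(self, path):
        if not self.files:
            return True
        for pattern in self.files:
            if pattern.startswith("*"):
                if posixpath.basename(path) == pattern[1:]:
                    return True
            elif path == pattern:
                return True
        return False

    def matches(self, sec):
        return ((self.sec_type is None or self.sec_type == sec["type"])
                and self.flags_set <= sec["flags"]
                and not (self.flags_clear & sec["flags"])
                and (self.name is None or self.name == sec["name"])
                and self.file_matches(sec["file"]))


def assign(section, criteria_list):
    """Return the segment of the first matching entry, or None."""
    for crit in criteria_list:
        if crit.matches(section):
            return crit.segment
    return None


criteria = [
    # hot : $PROGBITS ?AX : *hot.o;
    Criteria("hot", sec_type="PROGBITS", flags_set=frozenset("AX"),
             files=("*hot.o",)),
    # rodata : $PROGBITS ?A!W!X;
    Criteria("rodata", sec_type="PROGBITS", flags_set=frozenset("A"),
             flags_clear=frozenset("WX")),
    # text : ?AX;
    Criteria("text", flags_set=frozenset("AX")),
]

hot_text = {"name": ".text", "type": "PROGBITS",
            "flags": frozenset("AX"), "file": "obj/hot.o"}
cold_text = dict(hot_text, file="obj/cold.o")
print(assign(hot_text, criteria), assign(cold_text, criteria))
```

Because the search stops at the first match, the position of an entry in the list matters as much as its contents.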

Odd aspects of section to segment assignment:

The list of section types is incomplete. ELF defines many more section types than the 7 listed above. Apparently this feature isn't used, at least with types outside of $PROGBITS or $NOBITS, because I've never heard of a complaint about it. In any event, all ELF section types should be supported.
The use of a '*' prefix to mean 'basename' in file paths is odd. Conditioned by common shell idioms, any Unix user would expect a '*' within a filepath to be a standard glob wildcard, expanded as the shell would. You'd expect it to match an arbitrary number of characters, and to be usable in the middle of the name, not just at the beginning. Another reasonable assumption would be that it is a regular expression, with the implication that other regular expression features are also possible. None of that applies: A '*' prefix means 'basename', and only if it is the first character.
The fact that segment definition, and the assignment rules, are two separate statements creates the potential for a class of error where the user attempts to assign sections to a segment that cannot accept them (e.g. STACK). Having lured you in with a syntax that suggests something that isn't possible, the link-editor contains code to detect and refuse such assignments. Better syntax could prevent this.
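
To show how far the '*' prefix is from what a shell user expects, the following Python sketch compares it with ordinary glob matching. mapfile_file_match is a hypothetical helper that implements the rule as described in this article; treating a '*' that is not the first character as a literal is an inference from that description. The glob side uses Python's standard fnmatch module as a stand-in for shell behavior.

```python
# Contrast the mapfile '*' prefix, as described above, with glob matching.
# mapfile_file_match is a hypothetical helper, not link-editor code.
import fnmatch
import posixpath


def mapfile_file_match(pattern: str, path: str) -> bool:
    if pattern.startswith("*"):
        # A leading '*' means: compare against the basename only.
        return posixpath.basename(path) == pattern[1:]
    # Otherwise the whole path must match, and '*' is not special.
    return path == pattern


path = "build/obj/foo.o"
for pattern in ("*foo.o", "*.o", "build/*/foo.o"):
    print(f"{pattern:15}",
          "mapfile:", mapfile_file_match(pattern, path),
          "glob:", fnmatch.fnmatchcase(path, pattern))
```

Only the first pattern matches under the mapfile rule, while all three match as globs.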

Section-Within-Segment Ordering (|)

Section within segment ordering can be used to cause the link-editor to order output sections within a segment in a specified order. The specification is done by section name:

segment_name | section_name1;
segment_name | section_name2;
segment_name | section_name3;

The named sections are placed at the head of the segment in the order listed.
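
A minimal Python sketch of that ordering rule follows. The function name is hypothetical, and two details are assumptions rather than documented behavior: named sections that do not exist in the output are ignored, and the remaining sections keep their original relative order.

```python
# Hypothetical sketch of '|' ordering: named sections first, in mapfile order.


def order_output_sections(present, requested):
    """Place requested sections first; keep the rest in their original order."""
    head = [name for name in requested if name in present]
    tail = [name for name in present if name not in requested]
    return head + tail


present = [".text", ".rodata", ".init", ".fini"]
requested = [".init", ".fini"]  # from: text | .init;  text | .fini;
print(order_output_sections(present, requested))
# -> ['.init', '.fini', '.text', '.rodata']
```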

One might expect to be able to put more than one section on a line (you can't), and the use of '|' may cause a Unix user to make some invalid assumptions about shell pipes, or the C bitwise OR operator. However, there's nothing really terrible about this directive.

It's also not terribly useful --- I'm not sure I've ever seen it used outside of our link-editor tests.

Segment size symbols (@)

The '@' magic character is used to create an absolute symbol containing the length of the associated segment, in bytes:

segment_name @ symbol_name;

There is no corresponding mechanism to create a symbol containing the starting address of a segment, so it is debatable how useful the length is. Perhaps the user is expected to know the name of the first item (possibly a function in a text segment) and use that. In any case, we've never seen this feature used outside of our own tests.

The use of '@' carries no useful mnemonic information, but that's not unique to this particular directive.

Symbol Scope/Version Definition ({})

Symbol scope/versioning directives allow you to build objects that group symbols into named versions. When objects are built, they record the versions they require from dependencies, and at runtime, the runtime linker ld.so.1 validates that the necessary versions are present. Versioning was introduced in Solaris 2.5, and was later adopted (with extensions) by the GNU/Linux developers in a manner compatible with Solaris. This is easily the most successful part of the mapfile language, and has proven to be a very useful feature. Today, most mapfiles we encounter contain only symbol versioning.

Scope/versioning definitions have the form:

[version-name] {
    scope:
	symbol [= ...];
	*;
} [inherited-version-name...];

If no version-name is specified, it's a simple scope operation, where global names are assigned to the unnamed "global" version. If a version name is given, the symbols within are assigned to that version, and the version can specify other versions that it inherits from.

Within the {} braces, one can encounter three different types of item:

A symbol scope name (default/global, eliminate, exported, hidden/local, protected, singleton, symbolic), followed by a colon. These statements change the current scope, which starts as global, to the one specified. Any symbols listed after a scope declaration receive that scope, until changed by a following scope definition.
A '*', which is called the scope auto-reduction operator. All global symbols in the final object not explicitly listed in a scope/version directive are given the current scope, which must be hidden/local, or eliminate. Auto-reduction is a powerful tool for preventing implementation details of an object from becoming visible to other objects.
A symbol name, optionally followed by a '=' operator and attributes, finally terminated with a ';'.
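
The way the current scope carries forward from item to item can be sketched in a few lines of Python. This is a rough model of the three item types listed above, not the link-editor's parser: walk_scope_body is a hypothetical name, splitting on ';' and ':' is a simplification, and symbol attributes after '=' are simply discarded.

```python
# Hypothetical walk over the body of a scope/version block, tracking the
# current scope as described above. Attributes after '=' are ignored here.

SCOPES = {"default", "global", "eliminate", "exported", "hidden", "local",
          "protected", "singleton", "symbolic"}


def walk_scope_body(body: str):
    current = "global"      # the scope in effect before any declaration
    symbols = {}            # symbol name -> scope it received
    auto_reduce = None      # scope in effect when '*' was seen, if any
    for item in body.split(";"):
        item = item.strip()
        # A 'scope:' declaration may sit in front of the next symbol.
        while ":" in item:
            head, _, item = item.partition(":")
            head, item = head.strip(), item.strip()
            if head not in SCOPES:
                raise ValueError(f"unknown scope: {head}")
            current = head
        if not item:
            continue
        if item == "*":
            auto_reduce = current
        else:
            symbols[item.split("=")[0].strip()] = current
    return symbols, auto_reduce


symbols, auto_reduce = walk_scope_body(
    "global: foo; bar = FUNCTION; local: baz; *;")
print(symbols)
print(auto_reduce)
```

Here foo and bar are global, baz is local, and every unlisted global symbol is reduced to local.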

The attributes that are allowed for a symbol are:

A numeric value, prefixed with a 'V', giving the symbol value.
A numeric value, prefixed with a 'S', giving the symbol size.
One of 'FUNCTION', 'DATA', or 'COMMON', specifying the type of the symbol.
'FILTER', or 'AUX', specifying that the symbol is a standard or auxiliary filter, followed by the name of the object supplying the filtee. The two tokens are separated by whitespace.
A large number of flags that specify various attributes: 'PARENT', 'EXTERN', 'DIRECT', 'NODIRECT', 'INTERPOSE', 'DYNSORT', 'NODYNSORT'.

The scope/symbol directives are by far the most successful part of the SVR4 mapfile language, and there is relatively little to complain about. However, there are aspects of the way the symbol attributes work that could certainly be improved, caused in my opinion by an evident attempt to fit things stylistically with the rest of the language:

The use of 'V' and 'S' prefixes for value and size, contrasted with the full keywords 'FILTER' and 'AUX' is odd.
The lack of some sort of connecting syntax between FILTER/AUX and the associated object is confusing, and leads to mistakes that go undiagnosed. For example, a statement like 'filter function' is probably intended to say that the symbol is a function, and also a filter, but will be interpreted as being a filter to a library named 'function', drawing no error from the link-editor. A syntax such as 'filter=object' might have been better.
The syntax does not distinguish between the type and flag values. This is generally not a problem, but a syntax that did would be more precise, and possibly helpful.
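
The FILTER problem is easier to see in code. In the Python sketch below, FILTER and AUX consume whatever token follows them as the filtee name, which is the behavior described above. The function name, the result layout, and the object name libfilter.so.1 are hypothetical, keywords are compared case-insensitively on the assumption that the lowercase 'filter function' example implies this, and the 'V' and 'S' numeric attributes are left out.

```python
# Hypothetical sketch of symbol attribute parsing, showing why
# 'filter function' silently means something unintended.

TYPES = {"FUNCTION", "DATA", "COMMON"}
FLAGS = {"PARENT", "EXTERN", "DIRECT", "NODIRECT", "INTERPOSE",
         "DYNSORT", "NODYNSORT"}


def parse_symbol_attributes(text: str) -> dict:
    attrs = {"type": None, "filter": None, "aux": None, "flags": set()}
    tokens = iter(text.split())
    for tok in tokens:
        keyword = tok.upper()
        if keyword in ("FILTER", "AUX"):
            # Nothing ties the keyword to its object, so the next token is
            # taken as the filtee name no matter what it looks like.
            filtee = next(tokens, None)
            if filtee is None:
                raise ValueError(f"{keyword} requires an object name")
            attrs[keyword.lower()] = filtee
        elif keyword in TYPES:
            attrs["type"] = keyword
        elif keyword in FLAGS:
            attrs["flags"].add(keyword)
        else:
            raise ValueError(f"unrecognized attribute: {tok}")
    return attrs


print(parse_symbol_attributes("function filter libfilter.so.1"))
print(parse_symbol_attributes("filter function"))
```

The second call returns a filter on an object named 'function' and no symbol type at all, without raising any error.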

File Control Directives (-)

File control directives allow you to tell the link-editor to restrict the symbol versions available from a sharable object dependency being linked to the output object. The most common use for this feature is to limit your object to a set of functionality associated with a specific release of the operating system:

shared_object_name - version_name [version_name ...];

where each version_name is the name of a version found within the shared object.

When a given shared object is specified with one of these directives, the link-editor will only consider using symbols from the object that come from the listed versions, or the versions they inherit. The link-editor will then make the versions actually used dependencies for the output object.

Alternatively, a version_name can be specified using the form:

$ADDVERS=version_name

In this case, the specified name is made a dependency for the output object whether or not it was actually needed by the link.
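
As a rough model of these rules, the Python sketch below parses a file control directive and computes which versions would end up recorded as dependencies. Everything here is illustrative: the function names, the library name libfoo.so.1, the version names, and the inheritance table are invented, and treating a $ADDVERS version as also being an allowed version is my reading of the description rather than something verified against the link-editor.

```python
# Hypothetical model of a file control directive. Names and the inheritance
# table are invented; only the rules come from the description above.


def parse_file_control(statement: str):
    tokens = statement.rstrip("; \t").split()
    if len(tokens) < 3 or tokens[1] != "-":
        raise ValueError("expected: shared_object_name - version_name...")
    allowed, forced = [], []
    for tok in tokens[2:]:
        if tok.startswith("$ADDVERS="):
            tok = tok[len("$ADDVERS="):]
            forced.append(tok)      # recorded whether or not it is used
        allowed.append(tok)
    return tokens[0], allowed, forced


def recorded_versions(allowed, forced, used, inherits):
    """Versions recorded as dependencies of the output object."""
    visible, pending = set(), list(allowed)
    while pending:                  # listed versions plus what they inherit
        version = pending.pop()
        if version not in visible:
            visible.add(version)
            pending.extend(inherits.get(version, ()))
    return (set(used) & visible) | set(forced)


inherits = {"V_1.3": ["V_1.2"], "V_1.2": ["V_1.1"]}
obj, allowed, forced = parse_file_control(
    "libfoo.so.1 - V_1.2 $ADDVERS=V_1.1;")
print(obj, sorted(recorded_versions(allowed, forced, {"V_1.2"}, inherits)))
```

In this example V_1.1 is recorded even though nothing in the link used it, because it was named with $ADDVERS.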

There are some odd aspects to file control directives:

The '-' magic character has no mnemonic value (as usual).
The use of the '$' character in $ADDVERS, to create a type of optional attribute, is unusual, and represents an overloading of '$' relative to other directives.
The use of '$' aside, the $ADDVERS= notation is unusual relative to the rest of the language, which might have used another magic character instead.

Hardware/Software Capabilities (=)

The hardware and software capabilities of an object can be augmented, or replaced, using mapfile capability directives:

hwcap_1 = capitem...;
sfcap_1 = capitem...;

where the values on the right hand side of the '=' operator can be one of the following:

The name of a capability
A numeric value, prefixed with a 'V' to indicate that it is a number rather than a name.
$OVERRIDE, instructing the link-editor that the capabilities specified in the mapfile should completely replace those provided by the input objects, rather than add to them.

Perhaps the most unfortunate fact about the capability directives is that they use the '=' magic character, which normally indicates a segment definition. This has some odd ramifications:

The names 'hwcap_1', and 'sfcap_1' have been stolen from the segment namespace, and cannot be used to name segments.
As new capabilities are added to the system, it may become necessary to introduce new capability directives. For example, it is clear that 'hwcap_2' will soon be needed on X86 platforms. When this happens, the new name will also be taken from the segment namespace. Existing mapfiles using that name for a segment will break. One might reasonably expect that there are no such mapfiles, but that is a poor justification.
These names are case sensitive. Although segments cannot be named 'hwcap_1', or 'sfcap_1', they can have these names using any other case. For instance, 'HWCAP_1' will be interpreted as a segment, not as a capability.

One can understand the temptation to reuse '=' for capabilities, instead of picking some other unused magic character. Which one would you pick to convey the idea of 'capability'? I don't find any of the available characters (%, ^, &, ~) compelling in the least. Still, this overloading of '=' is a problem.

As a demonstration of how very similar mapfile lines can have wildly different meanings, consider the following example, which uses the debug feature of the link-editor to show us how mapfile lines are interpreted:

% cat hello.c
#include <stdio.h>

int
main(int argc, char **argv)
{
        printf("hello\n");
        return (0);
}
% cat mapfile-cap
HwCaP_1 = LOAD ?RWX;		# A segment
hwcap_1 = V0x12;                # A capability
% LD_OPTIONS=-Dmap cc hello.c -Mmapfile-cap
debug: 
debug: map file=mapfile-cap
debug: segment declaration (=), segment added:
debug: 
debug: segment[3] sg_name:  HwCaP_1
debug:     p_vaddr:      0           p_flags:    [ PF_X PF_W PF_R ]
debug:     p_paddr:      0           p_type:     [ PT_LOAD ]
debug:     p_filesz:     0           p_memsz:    0
debug:     p_offset:     0           p_align:    0x10000
debug:     sg_length:    0
debug:     sg_flags:     [ FLG_SG_ALIGN FLG_SG_FLAGS FLG_SG_TYPE ]
debug: 
debug: hardware/software declaration (=), capabilities added:
debug:

Other misfeatures of the capability syntax are the overloading of the '$' prefix to indicate an instruction to the link-editor ($OVERRIDE), and the use of the 'V' prefix in front of numeric values. These prefixes have different, though similar, meanings elsewhere, which makes the language hard to understand.
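
The pieces of a capability directive can be pulled apart with a short Python sketch. It is a hypothetical model, not the link-editor's code: parse_capability and its result layout are invented, and because no table of real capability names is included, an item starting with 'V' is treated as a number only if the remainder parses as one, which approximates the names-first rule.

```python
# Hypothetical sketch of capability directive parsing. No real capability
# name table is used; 'V' items are numbers only if the rest parses as one.


def parse_capability(statement: str):
    text = statement.split("#")[0].rstrip("; \t")   # drop comment and ';'
    name, _, rest = text.partition("=")
    name = name.strip()
    if name not in ("hwcap_1", "sfcap_1"):          # exact, case-sensitive
        return None                                 # a segment definition
    result = {"directive": name, "names": [], "values": [], "override": False}
    for item in rest.split():
        if item == "$OVERRIDE":
            result["override"] = True
        elif item.startswith("V"):
            try:
                result["values"].append(int(item[1:], 0))
            except ValueError:
                result["names"].append(item)        # a name that begins with V
        else:
            result["names"].append(item)
    return result


print(parse_capability("hwcap_1 = V0x12;                # A capability"))
print(parse_capability("HwCaP_1 = LOAD ?RWX;            # A segment"))
```

The second call returns None: a change of case is all it takes to turn a capability directive into a segment definition.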

Mapfile Magic Character Decoder Ring

Another strategy for understanding SVR4 mapfiles is to organize things by magic character.

Most mapfile directives have the form:

name magic ... ;

where name is generally (but not always) a segment name, and magic is a character that determines what the directive does.

The following is a comprehensive list, in no particular order, of the magic characters and related syntactic elements used in the current SVR4 mapfile language:

Character Meaning

=

Create a new segment, or modify the attributes of an existing one, as long as the segment is not named 'hwcap_1', or 'sfcap_1'.
If '=' is used to reference a "segment" named 'hwcap_1', or 'sfcap_1', then this is a hardware or software capabilities directive, and not a segment directive at all. This means that you cannot create a segment named 'hwcap_1', or 'sfcap_1'. However, these names are case sensitive, so you can create segments of those names using any other case. For example, HWcap_1 would name a segment rather than refer to hardware capabilities.
Within a symbol scope/version, associate a symbol name to one or more following attributes.
Within a "File Control Directive", associate the $ADDVERS option (a use of the '$' magic character) with a version name, causing the given version to be added to the output object even if it is not directly used.

:

Assign sections to segments.
If used twice in a section to segment assignment directive, the second one indicates that the items following it are not section names, as they have been to that point, but are file paths from which the previous sections can come.

| Specify output section ordering within a segment. It does not mean "pipe" as it would in the shell, nor does it mean 'OR' as it would in a C-style programming language.

@ Create a "size symbol" for the specified segment, containing the length of the segment. It is not clear how useful these are, since there is no corresponding "address symbol" that might be used to locate the start of the segment for which we have a size. We've never seen it used.

- A "File Control Directive", used to specify the version definitions to be used from the sharable objects linked to the output object.

{ } Grouping, used to contain the symbols within a scope/version directive.

; Terminates all directives, similar to its purpose in the C programming language.

*

Following the second ':' character in a section to segment assignment directive (:), as a prefix to the file names specified following the ':', specifies that the link-editor should compare the basename of the file providing the input section to the prefixed string, rather than comparing the full file path. The use of '*' in a file path is easily confused with the Unix shell "glob" wildcard character. However, this use in the mapfile is not a glob, and only has its special basename meaning if seen as the first character in the name.
Within a symbol scope/version directive, the scope auto-reduction operator, which causes all symbols not otherwise assigned to a symbol version to be reduced to the current scope, which must be local/hidden, eliminate, or protected.

?

Within a segment directive (=), indicates segment flags: 'E' (Empty), 'N' (Nohdr), 'O' (Order), 'R' (Read), 'W' (Write), and 'X' (eXecute). Note that only RWX represent real program header flags. The others (ENO) are not really segment flags but communicate segment related information to the link-editor. This is an example of overloading --- they are "flag like", so it was convenient to treat them as flags rather than use some other magic character to represent them.
Within a section to segment assignment directive (:), indicates section flags: 'A' (Allocable), 'W' (Writable), and 'X' (eXecinstr). Within these flags, the '!' character can be used to specify that the following flag must not be set in the candidate section.

$

Within a section to segment mapping directive (:), a prefix used to indicate that the name following is a section type (PROGBITS, SYMTAB, etc) rather than a section name.
Within a "File Control Directive", a prefix used to indicate that the following name is a special option to be applied to a version. Currently, the only such option is $ADDVERS.

! Within a section to segment mapping directive (:), and within the specification of section flags (?), negates the meaning of a given flag, indicating that the flag must not be set.

A

When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a segment alignment.
When used in section to segment assignment (:) flag value (?), specifies the SHF_ALLOC section header flag.

E When used within a segment definition (=) for a flag (?) value, alters the meaning of LOAD or NULL segments. When applied to a LOAD segment, ?E specifies that this segment is to be reserved (Empty). No sections are assigned to it, but a program header is generated and at runtime, the region is available to the running program to use. This is an obscure and little used feature. When applied to a NULL segment, reserves an additional PT_NULL program header, for the use of post optimizers that will add segments to the object. Note that this "flag" does not correspond to an actual program header flag.

L When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a maximum segment size.

N When used within a segment definition (=) flag value (?): By default, the first segment in an object, which is usually the text segment, contains the ELF header found at the start of the file, making the ELF header available to the runtime linker. The ?N flag specifies that if this segment is the first in the file, it should omit the ELF header. Note that this "flag" does not correspond to an actual program header flag, and that it has no meaning if the segment does not end up being first.

O When used within a segment definition (=) flag value (?): Input sections assigned to the segment should be ordered within their output sections in the order that section assignment directives (:) for the segment are encountered within the mapfile. Note that this "flag" does not correspond to an actual program header flag.

P When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a physical address.

R

When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a segment rounding value.
When used within a segment definition (=) flag value (?), specifies the READ (PF_R) program header flag value.

S When used in a symbol scope/version directive, as a prefix to a numeric value in a symbol's attributes, specifies that the number provides the symbol size (st_size).

V

When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a virtual address.
When used in a hwcap_1 or sfcap_1 capabilities definition (=), as a prefix to a value that has not been recognized as a hardware or software capability name, indicates that the item is a number.
When used in a symbol scope/version directive, as a prefix to a numeric value in a symbol's attributes, specifies that the number provides the symbol value (st_value).

W

When used within a segment definition (=) flag value (?), specifies the WRITE (PF_W) program header flag value.
When used in section to segment assignment (:) flag value (?), specifies the SHF_WRITE section header flag.

X

When used within a segment definition (=) flag value (?), specifies the EXECUTE (PF_X) program header flag value.
When used in section to segment assignment (:) flag value (?), specifies the SHF_EXECINSTR section header flag.
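
To make the overloading concrete, here is a minimal Python sketch of how the same letters decode differently depending on the directive they appear in. It is not how ld parses mapfiles. The function and table names are hypothetical, the number parsing (Python's int() with base 0) is an assumption, and only the letters listed above are handled. The PF_ and SHF_ values are the standard ELF constants.

```python
# Illustrative sketch only: what each single letter means depends on the
# directive it appears in. All names below are hypothetical.

PF_X, PF_W, PF_R = 0x1, 0x2, 0x4                     # ELF program header flags
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4  # ELF section header flags

# Letters used as numeric prefixes in a segment (=) directive.
SEGMENT_NUMERIC = {
    "A": "alignment",
    "L": "maximum size",
    "P": "physical address",
    "R": "rounding",
    "V": "virtual address",
}

# Letters that follow '?' in a segment (=) directive.
SEGMENT_FLAGS = {"R": PF_R, "W": PF_W, "X": PF_X}
SEGMENT_PSEUDO_FLAGS = {"E": "empty", "N": "nohdr", "O": "order"}

# Letters that follow '?' in a section to segment assignment (:).
SECTION_FLAGS = {"A": SHF_ALLOC, "W": SHF_WRITE, "X": SHF_EXECINSTR}


def decode_segment_token(token):
    """Decode one '?flags' or letter-prefixed number from a segment directive."""
    if token.startswith("?"):
        bits, pseudo = 0, []
        for ch in token[1:]:
            if ch in SEGMENT_FLAGS:
                bits |= SEGMENT_FLAGS[ch]
            elif ch in SEGMENT_PSEUDO_FLAGS:
                # E, N, and O are not real program header flags.
                pseudo.append(SEGMENT_PSEUDO_FLAGS[ch])
            else:
                raise ValueError(f"unknown segment flag {ch!r}")
        return ("flags", bits, pseudo)
    if token[:1] in SEGMENT_NUMERIC:
        # Assumption: base 0 accepts both decimal and 0x-prefixed hex.
        return (SEGMENT_NUMERIC[token[0]], int(token[1:], 0))
    raise ValueError(f"unrecognized segment token {token!r}")


def decode_section_flags(token):
    """Decode '?flags' from a section assignment into (must_set, must_clear)."""
    if not token.startswith("?"):
        raise ValueError(f"not a flag token: {token!r}")
    must_set = must_clear = 0
    negate = False
    for ch in token[1:]:
        if ch == "!":
            negate = True  # the next flag must NOT be set in the section
            continue
        if ch not in SECTION_FLAGS:
            raise ValueError(f"unknown section flag {ch!r}")
        if negate:
            must_clear |= SECTION_FLAGS[ch]
        else:
            must_set |= SECTION_FLAGS[ch]
        negate = False
    return must_set, must_clear
```

Note how 'R' is a rounding value in decode_segment_token("R0x1000") but the PF_R bit in decode_segment_token("?RX"), and how 'W' and 'X' map to different numeric bits in SEGMENT_FLAGS and SECTION_FLAGS. Nothing in the letter itself tells you which meaning applies. Only the surrounding directive does.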


Time For A Fresh Start

The original mapfile language inherited from AT&T was no beauty, but it was good enough to go forward with. We've continued to build on it for two decades for a variety of good reasons, primarily that it was getting the job done, that it wasn't preventing progress, and that there has been plenty of other work to do. The sort of users who write mapfiles are up to dealing with a little ugliness, and perhaps have been a bit more tolerant than the situation deserves. The "mapfile situation" has been a concern for years. Put simply, it is not asking too much that a programmer with a reasonable (not necessarily deep) grasp of linker concepts be able to read and understand the intent of a mapfile without resorting to a reference manual or linker source code. Nor should it be a difficult chore to fit a new feature into the language cleanly.

The mapfile syntax issue usually comes up in the context of wanting to add a new feature, and despairing at the ugliness of what that implies. One is usually in the middle of solving a considerably more focused and urgent problem, and not willing or able to take an extensive detour to replace underlying infrastructure. And so we've moved forward, adding one thing, and then another, with the situation slowly, but not catastrophically, getting worse each time. The current state of our mapfile language is such that we shy away from adding new features, and we are aware of other projects that may need some link-editor support in the near future. The right infrastructure simplifies everything it touches, and as we know all too well, the reverse is also true.

We've known for quite a while that eventually it would be necessary to tackle this issue systematically and produce a new mapfile language for Solaris. That time has finally arrived.

Comments

Michael Ernest — Wednesday January 06, 2010

This is great news, and solid background work.

As a matter of personal interest, I've tried once or twice to divine what's going on with ld. I never got too far, owing to other demands, but neither could I remember that I ever got a foothold from the attempts.

These blog articles are saving me a ton of time in background reading, at the very least. At their best, they're turning on a number of lights for me. Thanks! And keep up the great work.

Published Elsewhere

https://blogs.sun.com/ali/entry/the_problem_s_with_solaris/
https://blogs.oracle.com/ali/entry/the_problem_s_with_solaris/
https://blogs.oracle.com/ali/the-problems-with-solaris-svr4-link-editor-mapfiles/
