The Bits Between the Bits

How we get to main()

Matt Godbolt
CppCon 2018

"Good, but non-essential"

© Ozric Tentacles, used under Fair Use.

A program

                int main() {}

                    $ gcc -Os empty.c -o c/empty
                    $ g++ -Os empty.cpp -o cpp/empty
$ ls -l c/empty cpp/empty
7976 c/empty
7976 cpp/empty

What's in it?

                    $ objdump --no-show-raw-insn -dC cpp/empty

What's in it?

                    $ readelf -a cpp/empty

The ELF file format


  • .text — code
  • .rodata — read-only data
  • .data — read/write data
  • .bss — zero-initialised data

How we get to main()

A (slightly) more interesting program
#include <iostream>

struct Foo {
  static int numFoos;
  Foo() {
    numFoos++;
  }
  ~Foo() {
    numFoos--;
  }
};

int Foo::numFoos;
Foo globalFoo;

int main() {
  std::cout << "numFoos = "
      << Foo::numFoos << "\n";
}

What does it print?

$ g++ -O0 -g global.cpp -o global
$ ./global
numFoos = 1

Code Archaeology - Part 1

Matt Godbolt [CC BY-SA 3.0]

Call stack

#0 Foo::Foo (this=0x601050 <global>) at global.cpp:6
#1 0x000000000040079d in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at global.cpp:14
#2 0x00000000004007b3 in _GLOBAL__sub_I_global.cpp(void) () at global.cpp:18
#3 0x000000000040082d in __libc_csu_init ()
#4 0x00007ffff70c1b28 in __libc_start_main (main=0x400702 <main()>, argc=1, ... at ../csu/libc-start.c:266
#5 0x000000000040064a in _start ()

Where do these functions come from?

Who calls this function?

libc spelunking

// Paraphrased from glibc/csu/elf-init.c
typedef void (*init_func)(int, char **, char **);
// Both symbols are defined by the linker script, not by any source file.
extern init_func __init_array_start[];
extern init_func __init_array_end[];
void __libc_csu_init(int argc, char **argv, char **envp) {
  const size_t size = __init_array_end - __init_array_start;
  // Call every registered initialiser in order.
  for (size_t i = 0; i < size; i++)
    (*__init_array_start[i])(argc, argv, envp);
}

Ok but...


What's going on here?

.section .init_array,"aw"
.align   8
.quad    _GLOBAL__sub_I_Foo::numFoos

The Linker

What does it do?

  • Resolves references between .o files
  • Determines the layout of an executable
  • Writes metadata

A more representative program

// hello.cpp
#include <iostream>
extern const char *getMessage();
void greet() {
  std::cout << getMessage() << "\n";
}
int main() {
  greet();
}

// message.cpp
const char *getMessage() {
  return "Hello world";
}

                $ g++ -Os -c -o hello.o hello.cpp
                $ g++ -Os -c -o message.o message.cpp
                $ g++ -Os -o hello message.o hello.o
                $ ./hello
                Hello world

Object files

$ file hello hello.o message.o
hello:     ELF 64-bit LSB executable, x86-64,
           dynamically linked, interpreter /lib64/,
           for GNU/Linux 3.2.0, not stripped
hello.o:   ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
message.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

What's in an object file?

                    $ objdump -dC hello.o


What's in an object file?

                    $ objdump --reloc -dC hello.o


  • Relocations come in different types
  • Relocations are also used within the same object file



                    $ objdump --syms -C hello.o


                    $ objdump --syms -C message.o


  • Reads all the inputs
  • Identifies symbols
  • Applies relocations
[Diagram: executable layout showing Program Headers, greet() {} and "Hello world"]

Linker Scripts

$ g++ -o /dev/null -x c /dev/null -Wl,--verbose

  .init_array     :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }

Now we know!

  • Compiler:
    • emits a "static init" function for each TU
    • places a pointer to this function in .init_array
  • Linker:
    • gathers all the .init_array sections together
    • script defines symbols pointing at the beginning and end of the combined array
  • C runtime walks the array and calls each entry

Stuff to know

  • You can write your own linker scripts
  • Linker can discard unused sections: -Wl,--gc-sections
  • Compiler flags: -ffunction-sections, -fdata-sections

Dynamic linking

$ ls -l dynamic/hello static/hello
8,688      dynamic/hello*
2,406,632  static/hello*

Another hello world

                $ g++ -Os -fPIC -c -o message.o message.cpp
                $ g++ -shared -o libhello.so message.o
                $ g++ -Os -c -o hello.o hello.cpp
                $ g++ -Os -o hello hello.o -L. -lhello
                $ LD_LIBRARY_PATH=. ./hello
                Hello world

More ELF headers

$ readelf --dynamic --program-headers dynamic-dso/hello

Code Archaeology - Part 2

0x4006b0:  jmpq  *0x200962(%rip) # 0x601018
0x4006b6:  pushq $0x0
0x4006bb:  jmpq  0x4006a0 ; ultimately resolves symbol 0
0x601018:  .quad 0x4006b6

0x4006b0:  jmpq  *0x200962(%rip) # 0x601018
0x4006b6:  pushq $0x0
0x4006bb:  jmpq  0x4006a0
0x601018:  .quad 0x7ffff7bd35d5 ; now resolved to getMessage()


  • LD_BIND_NOW (and -Wl,-znow)
  • ldd and LD_DEBUG

I wish I had more time

  • Weak references
  • ODR violations
  • LTO

More reading

Special Training Event

  • Summer 2019 — Denver Area
  • Charley Bay, Jason Turner and me together for 3 days
  • C++20, error handling and performance
  • Check out for more info