strangecpp/compile_pipeline.md
2026-02-24 11:49:10 +01:00


What Happens When You Run g++ app.cpp -o app

At first glance, it looks like a single command — but under the hood, g++ is orchestrating a multi-stage pipeline. Here's what actually happens, step by step.


The Full Pipeline

```text
app.cpp
   │
   ▼  [1] Preprocessor (built into cc1plus)
app.ii          ← expanded source (macros, #includes resolved)
   │
   ▼  [2] Compiler (cc1plus)
app.s           ← assembly code
   │
   ▼  [3] Assembler (as)
app.o           ← relocatable object file (ELF / Mach-O / COFF)
   │
   ▼  [4] Linker (ld, run via collect2)
app             ← final executable
```

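g++ is a driver: it runs each of these tools in sequence and passes the right flags between them (g++ -v app.cpp -o app prints every command it runs). In a normal build, stages 1 and 2 happen inside a single cc1plus process, so no .ii file is ever produced. The assembly and object code go to temporary files (or through pipes, with -pipe) that are deleted afterwards. To see the intermediates, stop early (e.g., g++ -S app.cpp to stop at assembly) or keep all of them with -save-temps.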
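To make the driver's job concrete, the function below models which programs a GCC-style g++ runs for each stop flag on a typical Linux setup. It is a simplified sketch under stated assumptions, not GCC's own logic. The name tools_invoked is hypothetical, and the model ignores -pipe, -save-temps, and the long argument lists that g++ -v reveals.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified model of what a GCC-style g++ driver runs on a
// typical Linux setup, given the flag that says where to stop.
// Preprocessing is not a separate program here: it happens inside cc1plus.
std::vector<std::string> tools_invoked(std::string_view stop_flag) {
    if (stop_flag == "-E") {
        return {"cc1plus -E"};  // preprocess only
    }
    if (stop_flag == "-S") {
        return {"cc1plus"};  // preprocess + compile, keep the .s file
    }
    if (stop_flag == "-c") {
        return {"cc1plus", "as"};  // ... + assemble, keep the .o file
    }
    return {"cc1plus", "as", "collect2"};  // full build: ... + link (collect2 runs ld)
}
```

For example, tools_invoked("-c") yields {"cc1plus", "as"}. The preprocessor never appears as a program of its own because, in this model as in GCC, cc1plus contains it.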


Stage 1 — Preprocessing

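Tool: GCC's built-in preprocessor (libcpp). In a normal build it runs inside cc1plus rather than as a separate cpp process; the standalone cpp command uses the same code.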

The preprocessor handles all directives that start with #:

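  • #include <iostream>: literally pastes the contents of the iostream header (and everything it includes) in place of the directive
  • #define FOO 42: from that point on, replaces every occurrence of the token FOO with 42 (plain token substitution, with no knowledge of C++ types or scopes)
  • #ifdef / #ifndef / #endif: conditionally keeps or drops blocks of code before the compiler ever sees them
  • #pragma once: stops a header from being included more than once in the same translation unit (a widely supported extension; #ifndef include guards are the standard alternative)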
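Here is a small illustration of what "textual substitution" means in practice. FOO mirrors the list above. SQUARE_UNSAFE and SQUARE_SAFE are hypothetical macros added for this sketch. Because the preprocessor swaps tokens before the compiler runs, it ignores C++ precedence entirely. The #ifdef decides which declaration the compiler ever gets to see.

```cpp
// FOO mirrors the article's example; the SQUARE_* macros are hypothetical.
#define FOO 42
#define SQUARE_UNSAFE(x) x * x      // SQUARE_UNSAFE(1 + 2) expands to 1 + 2 * 1 + 2
#define SQUARE_SAFE(x) ((x) * (x))  // SQUARE_SAFE(1 + 2) expands to ((1 + 2) * (1 + 2))

constexpr int answer = FOO;                   // the compiler only ever sees 42
constexpr int unsafe = SQUARE_UNSAFE(1 + 2);  // precedence bites: evaluates to 5, not 9
constexpr int safe = SQUARE_SAFE(1 + 2);      // evaluates to 9

#ifdef FOO
constexpr bool foo_defined = true;   // this branch survives preprocessing
#else
constexpr bool foo_defined = false;  // this branch is removed before compilation
#endif
```

Running g++ -E on this snippet shows the expanded expressions described in the comments, with no trace of the macro names left.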

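The output is a single, flat .ii file: pure C++ source with every macro expanded and no #include, #define, or #if directives left, potentially tens of thousands of lines long even for a small program. The only # lines that remain are linemarkers such as # 1 "app.cpp", which tell the compiler which file and line each chunk came from; add -P to drop them.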

```bash
# You can inspect this stage yourself:
g++ -E app.cpp -o app.ii
```

Stage 2 — Compilation

Tool: cc1plus (GCC's C++ compiler proper: front end, optimizer, and code generator in one program)

The compiler takes the preprocessed source and:

  1. Parses it into an Abstract Syntax Tree (AST)
  2. Type-checks — validates that types match, overloads resolve, templates instantiate correctly
  3. Optimizes — applies transformations based on the -O level
  4. Emits assembly — produces human-readable .s text in the target ISA (x86-64, ARM, etc.)
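
```bash
# Stop after compilation, get assembly:
g++ -S app.cpp -o app.s

# GCC writes AT&T syntax by default; the listing below uses Intel syntax:
g++ -S -masm=intel app.cpp -o app.s
```

```asm
# Trimmed -O0 output for int add(int a, int b) { return a + b; }
# (assembler directives such as .globl and .cfi_* removed)
_Z3addii:
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-4], edi
    mov     DWORD PTR [rbp-8], esi
    mov     edx, DWORD PTR [rbp-4]
    mov     eax, DWORD PTR [rbp-8]
    add     eax, edx
    pop     rbp
    ret
```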
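The article does not show the C++ source behind this listing. A function like the one below produces it at GCC's default -O0; this source is an assumption, reconstructed from the listing and its mangled name. Both arguments arrive in edi and esi, are spilled to the stack, reloaded, and added in eax, the return register.

```cpp
// Assumed source for the listing above (not shown in the original article).
// At -O0, GCC stores a (edi) and b (esi) to the stack, reloads them into
// edx and eax, and leaves the sum in eax as the return value.
int add(int a, int b) {
    return a + b;
}
```

Compiling the same function with -O2 typically collapses the body to a single lea instruction followed by ret. Comparing the two listings is an easy way to watch the optimization step at work.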

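Note the mangled name _Z3addii: it is add(int, int) after C++ name mangling has encoded the parameter types into the symbol name. Under the Itanium C++ ABI used by GCC and Clang, _Z marks a mangled name, 3add is the length-prefixed name add, and ii lists the two int parameters.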
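The same encoding can be reversed at run time, which is what c++filt and nm -C do. The minimal sketch below uses abi::__cxa_demangle from <cxxabi.h>, which both GCC's libstdc++ and LLVM's libc++abi provide. The wrapper name demangle is our own.

```cpp
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang, Itanium C++ ABI)
#include <string>

// Turns an Itanium-ABI mangled symbol back into a readable name,
// e.g. "_Z3addii" -> "add(int, int)". Returns the input unchanged
// if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && readable != nullptr) ? readable : mangled;
    std::free(readable);  // free(nullptr) is a harmless no-op
    return result;
}
```

For example, demangle("_ZSt4cout") returns "std::cout": St is the compressed prefix for the std namespace, and 4cout is the length-prefixed name.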


Stage 3 — Assembly

Tool: as (the GNU assembler); Clang instead uses its integrated assembler, built on the same LLVM MC layer as llvm-mc

The assembler converts the .s text file into a binary object file (.o). This is a relocatable binary — it contains:

  • Machine code for all functions defined in this translation unit
  • A symbol table listing every symbol defined here and every symbol referenced but not yet defined
  • Relocation entries — placeholders saying "at this byte offset, fill in the final address of symbol X"

The object file is not yet executable because:

  • References to functions/globals in other .cpp files are unresolved
  • Absolute memory addresses haven't been assigned yet
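On Linux, the "not yet executable" status is written into the file itself. The ELF header's e_type field is ET_REL (1) for an object file, ET_EXEC (2) for a fixed-address executable, and ET_DYN (3) for shared libraries and position-independent executables. Many distributions build PIE by default, so the final app is often ET_DYN too. The helper elf_kind below is a hypothetical sketch that reads only that field, using the header offsets from the ELF specification. It does not handle Mach-O or COFF.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Classifies an ELF file from its first bytes, roughly as `file` does.
// Bytes 0-3 hold the magic, byte 5 the data encoding (1 = little-endian,
// 2 = big-endian), and bytes 16-17 the e_type field.
std::string elf_kind(const std::vector<std::uint8_t>& header) {
    if (header.size() < 18 || header[0] != 0x7f || header[1] != 'E' ||
        header[2] != 'L' || header[3] != 'F') {
        return "not ELF";
    }
    const bool little_endian = header[5] == 1;
    const std::uint16_t e_type = little_endian
        ? static_cast<std::uint16_t>(header[16] | (header[17] << 8))
        : static_cast<std::uint16_t>((header[16] << 8) | header[17]);
    switch (e_type) {
        case 1: return "relocatable";           // ET_REL: a .o file
        case 2: return "executable";            // ET_EXEC: fixed-address executable
        case 3: return "shared object or PIE";  // ET_DYN: .so or position-independent executable
        case 4: return "core dump";             // ET_CORE
        default: return "other";
    }
}
```

Fed the first 18 bytes of app.o, this would report "relocatable". Fed the bytes of the linked app, it would report "executable" or "shared object or PIE", depending on whether the toolchain builds PIE.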
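
```bash
# Stop at object file:
g++ -c app.cpp -o app.o

# Inspect the symbol table (the cout line appears only if app.cpp uses std::cout):
nm app.o
# U _ZSt4cout   ← U = undefined, still unresolved
# T _Z3addii    ← T = defined in text (code) section

# Same table with demangled names (std::cout, add(int, int)):
nm -C app.o
```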
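nm's single-letter codes go beyond U and T. The helper below is a hypothetical lookup covering common letters described in the GNU nm manual: uppercase means global, lowercase means local to the object file. It is not exhaustive; for instance, GNU nm uses lowercase u for a different purpose, so it falls through to "other" here.

```cpp
#include <string>

// Hypothetical lookup for common GNU nm symbol-type letters.
// Uppercase letters are global symbols; lowercase ones are local to the file.
std::string nm_type(char letter) {
    switch (letter) {
        case 'U': return "undefined: referenced here, defined elsewhere";
        case 'T': return "global code (.text)";
        case 't': return "local code (.text)";
        case 'D': return "global initialized data (.data)";
        case 'd': return "local initialized data (.data)";
        case 'B': return "global zero-initialized data (.bss)";
        case 'b': return "local zero-initialized data (.bss)";
        case 'R': return "global read-only data (.rodata)";
        case 'r': return "local read-only data (.rodata)";
        case 'W': return "weak definition: a normal definition elsewhere overrides it";
        default:  return "other (see the nm manual)";
    }
}
```

In C++ objects, inline functions and template instantiations typically show up as W. Because they are weak definitions, the linker can accept the identical copies that every translation unit using them emits.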

Stage 4 — Linking

Tool: ld (GNU ld, which g++ runs through its collect2 wrapper); other toolchains use lld (LLVM) or link.exe (MSVC)

This is where everything comes together. The linker:

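  1. Collects its inputs: your .o files, the startup objects, and the libraries named with -l (g++ adds -lstdc++ on its own). From a static archive (.a) it pulls in only the member .o files that define symbols still needed; a shared library (.so) is recorded as a runtime dependency instead
  2. Resolves symbols: for every U (undefined) symbol in any object, finds a definition (T for code; D, B, or R for data) in another object or library, and fails with an "undefined reference" error if there is none
  3. Lays out sections: merges the .text, .data, .bss, and .rodata sections of all objects into one output section of each kind, which fixes every symbol's final address
  4. Applies relocations: patches all those placeholder bytes with the now-known addresses; calls into shared libraries go through PLT stubs and GOT slots that the dynamic loader fills in at run time
  5. Writes the executable: outputs an ELF (Linux), Mach-O (macOS), or PE (Windows) binary whose entry point (on Linux, _start from the startup code) eventually calls main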

```text
app.o          ← your code
   +
libstdc++.so   ← C++ standard library (iostream, string, etc.)
   +
libc.so        ← C library (malloc, printf, etc.)
   +
crt1.o         ← startup code: _start passes argc/argv to __libc_start_main, which calls main()
   │
   ▼
app            ← fully linked executable
```

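Even though you only wrote app.cpp, the link pulls in the C++ standard library, the C library, and the platform startup objects (crt1.o, or Scrt1.o for PIE builds, together with crti.o, crtbegin.o, crtend.o, and crtn.o). The startup objects are copied into app, but the shared libraries are not: the linker only records them as dependencies, and the dynamic loader maps them into memory when app starts.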

```bash
# Print every command g++ runs (cc1plus, as, collect2):
g++ -v app.cpp -o app

# See what the linker actually pulls in:
g++ app.cpp -o app -Wl,--verbose 2>&1 | less

# Or check what shared libraries the final binary depends on:
ldd app
```
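The linker's core steps (resolve symbols, lay out sections, apply relocations) can be sketched as a toy. Everything here is hypothetical and heavily simplified. Each object has a single .text section, and every relocation wants an 8-byte absolute address, in the spirit of x86-64's R_X86_64_64. There are no shared libraries. Unlike real linkers, which resolve symbols in a separate pass, the toy builds its symbol table during layout. The error strings imitate GNU ld's "undefined reference to" and "multiple definition of" messages.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy relocatable object: one .text blob, the symbols it defines
// (name -> offset into its .text), and relocations that each ask for the
// 8-byte absolute address of a symbol to be written at an offset.
struct Relocation {
    std::size_t offset;
    std::string symbol;
};

struct ObjectFile {
    std::vector<std::uint8_t> text;
    std::map<std::string, std::size_t> defined;
    std::vector<Relocation> relocs;
};

// Links the objects into a single .text image that will be loaded at base.
std::vector<std::uint8_t> toy_link(const std::vector<ObjectFile>& objects,
                                   std::uint64_t base) {
    std::vector<std::uint8_t> image;
    std::vector<std::size_t> start;               // where each object's .text begins
    std::map<std::string, std::uint64_t> symtab;  // global symbol table

    // Lay out sections: append each .text and give every symbol its final address.
    for (const ObjectFile& obj : objects) {
        start.push_back(image.size());
        for (const auto& [name, off] : obj.defined) {
            if (!symtab.emplace(name, base + image.size() + off).second) {
                throw std::runtime_error("multiple definition of " + name);
            }
        }
        image.insert(image.end(), obj.text.begin(), obj.text.end());
    }

    // Resolve each referenced symbol and patch its placeholder in place.
    for (std::size_t i = 0; i < objects.size(); ++i) {
        for (const Relocation& r : objects[i].relocs) {
            auto it = symtab.find(r.symbol);
            if (it == symtab.end()) {
                throw std::runtime_error("undefined reference to " + r.symbol);
            }
            const std::size_t at = start[i] + r.offset;
            if (at + 8 > image.size()) {
                throw std::out_of_range("relocation outside .text");
            }
            for (std::size_t b = 0; b < 8; ++b) {  // little-endian, as on x86-64
                image[at + b] = static_cast<std::uint8_t>(it->second >> (8 * b));
            }
        }
    }
    return image;
}
```

Suppose an object with 16 bytes of .text and a relocation at offset 8 for add is linked before a second object that defines add, using the arbitrary base 0x401000. Then add resolves to 0x401010, and bytes 8 to 15 of the image become 10 10 40 00 00 00 00 00. Linking the first object alone throws, which is the toy's version of an undefined-reference error.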

Quick Summary

| Stage | Input | Output | Key job |
|---|---|---|---|
| Preprocess | app.cpp | app.ii | Expand macros and #includes |
| Compile | app.ii | app.s | Parse, type-check, optimize, emit asm |
| Assemble | app.s | app.o | Encode asm as binary machine code |
| Link | app.o + libs | app | Resolve symbols, assign addresses |

When you run g++ app.cpp -o app, all four stages happen invisibly in sequence, the first two inside a single cc1plus process. The -o app flag names only the final output, not any of the intermediates.