# What Happens When You Run `g++ app.cpp -o app`

At first glance it looks like a single command, but under the hood `g++` orchestrates a multi-stage pipeline. Here is what happens, step by step.

---

## The Full Pipeline

```
app.cpp
   │
   ▼  [1] Preprocessor (cpp)
app.ii        ← expanded source (macros, #includes resolved)
   │
   ▼  [2] Compiler (cc1plus)
app.s         ← assembly code
   │
   ▼  [3] Assembler (as)
app.o         ← relocatable object file (ELF / Mach-O / COFF)
   │
   ▼  [4] Linker (ld / lld / link.exe)
app           ← final executable
```

`g++` is a driver: it calls each of these tools in sequence and passes the right flags between them. None of the intermediate files are left on disk unless you explicitly ask. By default GCC keeps them in temporary files that it deletes afterwards, or passes them through pipes if you give it `-pipe`. Use `-save-temps` to keep them all, or a flag such as `-S` to stop at a given stage.

---

## Stage 1: Preprocessing

**Tool:** `cpp` (the C preprocessor; modern GCC runs this step inside `cc1plus` rather than as a separate process)

The preprocessor handles all directives that start with `#`:

- `#include <iostream>`: pastes the contents of the `iostream` header (and everything it includes) into your source
- `#define FOO 42`: replaces every later occurrence of the token `FOO` with `42`
- `#ifdef` / `#ifndef` / `#endif`: conditionally include or exclude blocks of code
- `#pragma once`: a widely supported, non-standard directive that prevents a header from being included more than once per translation unit

The output is a single, flat `.ii` file: C++ source with every macro and `#include` expanded, plus `# <line> "<file>"` line markers that record where each piece came from. Even for a small program it can run to tens of thousands of lines.

```bash
# You can inspect this stage yourself:
g++ -E app.cpp -o app.ii
```

---

## Stage 2: Compilation

**Tool:** `cc1plus` (GCC's C++ compiler proper)

The compiler takes the preprocessed source and:

1. **Parses** it into an Abstract Syntax Tree (AST)
2. **Type-checks**: validates that types match, overloads resolve, and templates instantiate correctly
3. **Optimizes**: applies transformations based on the `-O` level
4. **Emits assembly**: produces human-readable `.s` text for the target ISA (x86-64, ARM, etc.)
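
To make these four steps concrete, here is a minimal sketch of a translation unit that exercises each of them. The article does not show the real `app.cpp`, so the file contents and the names `FOO` and `add` are illustrative assumptions, chosen to match the `#define` example from Stage 1 and the `_Z3addii` symbol used in this stage.

```cpp
// Hypothetical fragment of app.cpp (no main(), so build it with -S or -c).

#define FOO 42  // preprocessor: every later FOO token becomes 42

// Parsed and type-checked as a function taking two ints and returning int.
// On Itanium-ABI targets such as x86-64 Linux its symbol is _Z3addii.
int add(int a, int b) {
    return a + b;
}

// A second overload: overload resolution picks it for double arguments,
// and the mangled name differs (_Z3adddd) so the two never collide.
double add(double a, double b) {
    return a + b;
}

// Checked at compile time: the compiler proper sees 42 here, never FOO.
static_assert(FOO == 42, "macro was expanded before compilation");
```

Compiling just this fragment with `g++ -S` or `g++ -c` should be enough to see one symbol per overload; linking it on its own would fail because there is no `main()`.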

```bash
# Stop after compilation, get assembly:
g++ -S app.cpp -o app.s
```

```asm
# Example of what app.s might contain for a simple function
# (unoptimized, Intel syntax via -masm=intel; GCC emits AT&T syntax by default)
_Z3addii:
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-4], edi
    mov     DWORD PTR [rbp-8], esi
    mov     edx, DWORD PTR [rbp-4]
    mov     eax, DWORD PTR [rbp-8]
    add     eax, edx
    pop     rbp
    ret
```

Note the mangled name `_Z3addii`: that is `add(int, int)` after C++ name mangling has encoded the parameter types into the symbol name.

---

## Stage 3: Assembly

**Tool:** `as` (the GNU assembler; Clang uses its own integrated assembler instead)

The assembler converts the `.s` text file into a binary **object file** (`.o`). This is a relocatable binary that contains:

- **Machine code** for all functions defined in this translation unit
- **A symbol table** listing every symbol defined here and every symbol referenced but not yet defined
- **Relocation entries**: placeholders saying "at this byte offset, fill in the final address of symbol `X`"

The object file is **not yet executable** because:

- References to functions and globals defined in other `.cpp` files or libraries are unresolved
- Final memory addresses haven't been assigned yet

```bash
# Stop at object file:
g++ -c app.cpp -o app.o

# Inspect the symbol table:
nm app.o
# U _ZSt4cout   ← U = undefined, still unresolved
# T _Z3addii    ← T = defined in text (code) section
```

---

## Stage 4: Linking

**Tool:** `ld` (GNU, the usual default on Linux), `lld` (LLVM), or `link.exe` (MSVC)

This is where everything comes together. The linker:

1. **Collects** all `.o` files (yours plus any pulled in from `-l` libraries)
2. **Resolves symbols**: for every `U` (undefined) symbol in any object, finds the matching definition (such as a `T` symbol) in another object or library
3. **Applies relocations**: patches all those placeholder bytes with real addresses
4. **Lays out sections**: merges the `.text`, `.data`, `.bss`, and `.rodata` sections from all objects into one of each
5. **Writes the executable**: outputs an ELF (Linux), Mach-O (macOS), or PE (Windows) binary with a proper entry point

```
app.o             ← your code
  + libstdc++.so  ← C++ standard library (iostream, string, etc.)
  + libc.so       ← C standard library (malloc, printf, etc.)
  + crt1.o        ← C runtime startup (sets up argc/argv, calls main())
       │
       ▼
      app         ← fully linked executable
```

Even though you only wrote `app.cpp`, the final binary relies on the C++ standard library, the C library, and the platform startup objects. The startup objects and any static libraries are copied into the executable. Shared libraries such as `libstdc++.so` and `libc.so` are only recorded as dependencies and are loaded by the dynamic loader at run time.

```bash
# See what the linker actually pulls in:
g++ app.cpp -o app -Wl,--verbose 2>&1 | less

# Or check what shared libraries the final binary depends on:
ldd app
```

---

## Quick Summary

| Stage      | Input          | Output   | Key job                               |
|------------|----------------|----------|---------------------------------------|
| Preprocess | `app.cpp`      | `app.ii` | Expand macros and `#include`s         |
| Compile    | `app.ii`       | `app.s`  | Parse, type-check, optimize, emit asm |
| Assemble   | `app.s`        | `app.o`  | Encode asm as binary machine code     |
| Link       | `app.o` + libs | `app`    | Resolve symbols, assign addresses     |

When you run `g++ app.cpp -o app`, all four stages happen invisibly in sequence. The `-o app` flag names only the final output, not any of the intermediates.
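
The mangled names that appear in the assembly and `nm` listings above can also be decoded programmatically. The following is a minimal sketch, assuming a GCC or Clang toolchain that follows the Itanium C++ ABI and ships `<cxxabi.h>`; the helper name `demangle` is hypothetical and not part of any library.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Return the human-readable form of an Itanium-ABI mangled name,
// or the input unchanged if the runtime reports a failure.
std::string demangle(const char* mangled) {
    int status = 0;
    // Passing null for the buffer and length asks the runtime to malloc the result.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the caller owns the malloc'd buffer
    return result;
}
```

With this helper, `demangle("_Z3addii")` yields `add(int, int)` and `demangle("_ZSt4cout")` yields `std::cout`. On the command line, `nm -C app.o` or piping names through `c++filt` gives the same result without writing any code.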