# What Happens When You Run `g++ app.cpp -o app`

At first glance it looks like a single command, but under the hood `g++` orchestrates a multi-stage pipeline. Here is what happens, step by step.

---

## The Full Pipeline
```
app.cpp
   │
   ▼  [1] Preprocessor (cpp)
app.ii        ← expanded source (macros, #includes resolved)
   │
   ▼  [2] Compiler (cc1plus)
app.s         ← assembly code
   │
   ▼  [3] Assembler (as)
app.o         ← relocatable object file (ELF / Mach-O / COFF)
   │
   ▼  [4] Linker (ld / lld / link.exe)
app           ← final executable
```
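Each arrow in the diagram can also be run by hand, because `g++` picks up the pipeline at whatever stage the input file's extension implies (`.ii`, `.s`, `.o`). The sketch below simply builds the four command lines; the helper name `stage_commands` is made up for illustration and is not part of any toolchain.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns one g++ command per pipeline stage for a
// source file named <stem>.cpp. Each command stops after a single stage.
std::vector<std::string> stage_commands(const std::string& stem) {
    return {
        "g++ -E " + stem + ".cpp -o " + stem + ".ii",  // [1] preprocess only
        "g++ -S " + stem + ".ii -o " + stem + ".s",    // [2] compile to assembly
        "g++ -c " + stem + ".s -o " + stem + ".o",     // [3] assemble to an object file
        "g++ " + stem + ".o -o " + stem,               // [4] link into an executable
    };
}
```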
`g++` is a driver: it calls each of these tools in sequence and passes the right flags between them. The intermediate files normally go to temporary files (or pipes, with `-pipe`) that are deleted afterwards, so none of them appear in your working directory unless you ask for them, for example with `g++ -S app.cpp` to stop at assembly or `-save-temps` to keep every intermediate.

---

## Stage 1 — Preprocessing
**Tool:** `cpp` (the C preprocessor; modern GCC runs it inside `cc1plus` rather than as a separate process)

The preprocessor handles all directives that start with `#`:

- `#include <iostream>`: literally pastes the content of `iostream` (and everything it includes) into your source
- `#define FOO 42`: performs textual substitution of `FOO` from that point to the end of the translation unit
- `#ifdef` / `#ifndef` / `#endif`: conditionally includes or excludes blocks of code
- `#pragma once`: prevents a header from being included more than once (non-standard, but supported by all major compilers)

The output is a single, flat `.ii` file: C++ source with every macro and `#include` expanded, plus `# <line> "<file>"` linemarkers that record where each chunk came from. It can run to tens of thousands of lines even for a small program.
```bash
# You can inspect this stage yourself:
g++ -E app.cpp -o app.ii
```
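Because `#define` substitution is purely textual, the preprocessor knows nothing about operator precedence. Here is a minimal sketch of the classic consequence; the macro and function names are invented for this example.

```cpp
// Textual substitution: the argument is pasted in verbatim.
#define SQUARE_BAD(x) x * x
// Parenthesizing the parameter and the whole body keeps the grouping intact.
#define SQUARE_OK(x) ((x) * (x))

// Expands to 1 + 2 * 1 + 2, which is 5 rather than 9.
int square_bad_of_sum() { return SQUARE_BAD(1 + 2); }

// Expands to ((1 + 2) * (1 + 2)), which is 9.
int square_ok_of_sum() { return SQUARE_OK(1 + 2); }
```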
---

## Stage 2 — Compilation

**Tool:** `cc1plus` (GCC's C++ compiler proper)

The compiler takes the preprocessed source and:

1. **Parses** it into an Abstract Syntax Tree (AST)
2. **Type-checks**: validates that types match, overloads resolve, and templates instantiate correctly
3. **Optimizes**: applies transformations based on the `-O` level
4. **Emits assembly**: produces human-readable `.s` text in the target ISA (x86-64, ARM, etc.)
```bash
# Stop after compilation, get assembly:
g++ -S app.cpp -o app.s
```

```asm
# Example of what app.s might contain for a simple function
# (unoptimized x86-64, Intel syntax as produced by -masm=intel)
_Z3addii:
    push rbp
    mov  rbp, rsp
    mov  DWORD PTR [rbp-4], edi
    mov  DWORD PTR [rbp-8], esi
    mov  edx, DWORD PTR [rbp-4]
    mov  eax, DWORD PTR [rbp-8]
    add  eax, edx
    pop  rbp
    ret
```
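For context, a translation unit along the following lines would be consistent with that listing. It is an assumed example, not the original `app.cpp`: `add` matches the label `_Z3addii`, while `print_sum` is a hypothetical helper added only to show a use of `std::cout`, which `<iostream>` declares but does not define.

```cpp
#include <iostream>

// Compiles to the symbol _Z3addii; on x86-64 Linux the two arguments
// arrive in edi and esi and the sum is returned in eax.
int add(int a, int b) {
    return a + b;
}

// Hypothetical caller. std::cout is declared by <iostream> but defined in
// the standard library, so this file refers to it without defining it.
void print_sum(int a, int b) {
    std::cout << add(a, b) << '\n';
}
```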
Note the mangled name `_Z3addii`: that is `add(int, int)` after C++ name mangling has encoded the parameter types into the symbol name (`3add` is the length-prefixed function name, and each `i` stands for one `int` parameter).

---

## Stage 3 — Assembly
**Tool:** `as` (the GNU assembler; Clang uses its built-in integrated assembler by default)

The assembler converts the `.s` text file into a binary **object file** (`.o`). This is a relocatable binary that contains:

- **Machine code** for all functions defined in this translation unit
- **A symbol table** listing every symbol defined here and every symbol referenced but not yet defined
- **Relocation entries**: placeholders saying "at this byte offset, fill in the final address of symbol `X`"

The object file is **not yet executable** because:

- References to functions and globals defined in other object files or libraries are unresolved
- Absolute memory addresses haven't been assigned yet
```bash
# Stop at object file:
g++ -c app.cpp -o app.o

# Inspect the symbol table:
nm app.o
#   U _ZSt4cout   ← U = undefined, still unresolved
#   T _Z3addii    ← T = defined in text (code) section
```
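The mangled names in that output can be decoded programmatically. The sketch below assumes GCC or Clang on a platform using the Itanium C++ ABI, where `<cxxabi.h>` provides `abi::__cxa_demangle`; the wrapper name `demangle` is made up for this example. With it, `demangle("_Z3addii")` yields `add(int, int)` and `demangle("_ZSt4cout")` yields `std::cout`. On the command line, `c++filt` does the same job.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical wrapper around abi::__cxa_demangle (GCC/Clang only).
// Returns the readable form of a mangled symbol, or the input unchanged
// if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With null buffer arguments the function allocates the result with
    // malloc, so the caller is responsible for calling free() on it.
    char* readable = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```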
---

## Stage 4 — Linking

**Tool:** `ld` (GNU, on Linux), `lld` (LLVM), or `link.exe` (MSVC)

This is where everything comes together. The linker:

1. **Collects** all `.o` files (yours + any from `-l` libraries)
2. **Resolves symbols**: for every `U` (undefined) symbol in any object, finds the matching definition (`T` for code, `D` or `B` for data) in another object or library
3. **Lays out sections**: merges the `.text`, `.data`, `.bss`, and `.rodata` sections from all objects and assigns each one its final address
4. **Applies relocations**: patches all those placeholder bytes with the addresses fixed during layout
5. **Writes the executable**: outputs an ELF (Linux), Mach-O (macOS), or PE (Windows) binary with a proper entry point
```
app.o          ← your code
  +
libstdc++.so   ← C++ standard library (iostream, string, etc.)
  +
libc.so        ← C runtime (malloc, printf, etc.)
  +
crt1.o         ← C runtime startup (calls main(), handles argc/argv)
  │
  ▼
app            ← fully linked executable
```
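Symbol resolution can be pictured as set arithmetic over the inputs in the diagram. The following toy model is only an illustration: `ObjectFile` and `unresolved_symbols` are invented names, and a real linker also deals with archive search order, weak symbols, and duplicate definitions.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for one linker input: the symbols it defines and the
// symbols it references without defining (the `U` entries in `nm`).
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Returns every referenced symbol that no input defines. A real linker
// would report each of these as an "undefined reference" error.
std::set<std::string> unresolved_symbols(const std::vector<ObjectFile>& inputs) {
    // First pass: gather every definition from every input.
    std::set<std::string> all_defined;
    for (const ObjectFile& obj : inputs) {
        all_defined.insert(obj.defined.begin(), obj.defined.end());
    }

    // Second pass: keep the references that found no definition.
    std::set<std::string> unresolved;
    for (const ObjectFile& obj : inputs) {
        for (const std::string& sym : obj.undefined) {
            if (all_defined.count(sym) == 0) {
                unresolved.insert(sym);
            }
        }
    }
    return unresolved;
}
```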
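Even though you only wrote `app.cpp`, the final program relies on code from the C++ standard library, the C runtime, and the platform startup objects, all tied together by the linker. The startup objects are copied into the executable; the `.so` libraries are linked dynamically by default, so the executable only records a dependency on them and the dynamic loader maps them in at run time.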
```bash
# See what the linker actually pulls in:
g++ app.cpp -o app -Wl,--verbose 2>&1 | less

# Or check what shared libraries the final binary depends on:
ldd app
```
---

## Quick Summary

| Stage      | Input          | Output   | Key job                               |
|------------|----------------|----------|---------------------------------------|
| Preprocess | `app.cpp`      | `app.ii` | Expand macros and `#include`s         |
| Compile    | `app.ii`       | `app.s`  | Parse, type-check, optimize, emit asm |
| Assemble   | `app.s`        | `app.o`  | Encode asm as binary machine code     |
| Link       | `app.o` + libs | `app`    | Resolve symbols, assign addresses     |

When you run `g++ app.cpp -o app`, all four stages happen invisibly in sequence. The `-o app` flag names only the final output, not any of the intermediates.